- R for Data Science Cookbook
- Yu Wei Chiu (David Chiu)
- 402 words
- 2021-07-14 10:51:24
Downloading open data
Before conducting any data analysis, an essential step is to collect high-quality, meaningful data. One important data source is open data, which is selected, organized, and freely available to the public. Most open data is published online either as text files or through APIs. Here, we introduce how to download an open data file in text format with the download.file function.
Getting ready
In this recipe, you need to prepare your environment with R installed and a computer that can access the Internet.
How to do it…
Please perform the following steps to download open data from the Internet:
- First, visit the http://finance.yahoo.com/q/hp?s=%5EGSPC+Historical+Prices link to view the historical price of the S&P 500 in Yahoo Finance:
Figure 1: Historical price of S&P 500
- Scroll down to the bottom of the page, right-click and copy the link in Download to Spreadsheet (the link should appear similar to http://real-chart.finance.yahoo.com/table.csv?s=%5EGSPC&d=6&e=3&f=2015&g=d&a=0&b=3&c=1950&ignore=.csv):
Figure 2: Download to Spreadsheet
- Download this file with the download.file function:
> download.file('http://real-chart.finance.yahoo.com/table.csv?s=%5EGSPC&d=6&e=3&f=2015&g=d&a=0&b=3&c=1950&ignore=.csv', 'snp500.csv')
- You can now use the getwd function to determine the current directory, and then use list.files to search for the downloaded file:
> getwd()
> list.files('./')
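The steps above can be sketched end to end as follows. Because the Yahoo Finance link may not be reachable from every environment, this minimal sketch first writes a tiny stand-in CSV (made-up rows, for illustration only) to a temporary directory and downloads it through a file:// URL, which download.file also accepts. To fetch real data, replace src_url with the link you copied. The names src_file, src_url, and dest_file are assumptions for this example.

```r
# Create a small stand-in CSV so the sketch runs without network access.
# The two data rows are made-up illustration values, not real S&P 500 prices.
src_file <- file.path(tempdir(), "source.csv")
writeLines(c("Date,Open,Close",
             "2015-07-01,100.0,101.5",
             "2015-07-02,101.5,100.8"), src_file)

# Build a file:// URL for the stand-in file; swap in the copied http:// link here.
src_url <- paste0("file://", normalizePath(src_file, winslash = "/"))
dest_file <- file.path(tempdir(), "snp500.csv")

# download.file returns 0 (invisibly) when the transfer succeeds.
status <- download.file(src_url, dest_file, quiet = TRUE)

# Confirm that the file arrived, then load it for a first look.
file.exists(dest_file)
snp500 <- read.csv(dest_file, stringsAsFactors = FALSE)
head(snp500)
```

Writing to tempdir() keeps the example from cluttering the working directory; in the recipe itself, the file is saved under the name snp500.csv in the directory reported by getwd.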
How it works…
In this recipe, we demonstrated how to download a file using download.file in R. First, we used Yahoo Finance to view historical prices of the S&P 500. At the bottom of the page, we found a link with an http:// URL prefix. The http:// prefix indicates the Hypertext Transfer Protocol (HTTP), which is used to transmit and receive information over the Internet. We can therefore pass the link address to download.file, which sends a request to the remote server and saves the returned file under the given name in our current working directory.
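Since a request to a remote server can fail (for example, when the link has moved), it may help to check the outcome of download.file. The following is a minimal sketch, assuming a hypothetical helper named safe_download that is not part of base R or of this recipe. It relies on download.file returning 0 on success and signaling a warning or an error otherwise, and it uses file:// URLs so that it runs without network access.

```r
# Hypothetical helper (not part of base R): TRUE on success, FALSE otherwise.
safe_download <- function(url, destfile) {
  tryCatch({
    # mode = "wb" avoids line-ending translation on Windows.
    status <- download.file(url, destfile, mode = "wb", quiet = TRUE)
    status == 0
  },
  warning = function(w) {
    message("Download warning: ", conditionMessage(w))
    FALSE
  },
  error = function(e) {
    message("Download failed: ", conditionMessage(e))
    FALSE
  })
}

# A local stand-in source file, so the sketch needs no Internet connection.
src <- tempfile(fileext = ".csv")
writeLines("a,b", src)

# One request that should succeed and one that points at a missing file.
ok  <- safe_download(paste0("file://", normalizePath(src, winslash = "/")),
                     tempfile(fileext = ".csv"))
bad <- safe_download("file:///no/such/file.csv", tempfile(fileext = ".csv"))
```

Returning a logical value instead of stopping lets a script decide whether to retry, skip the file, or abort.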
There's more…
Apart from using the download.file function, you can use the RCurl package to download a file from a URL with either an HTTP or an HTTPS prefix:
- First, go to the https://nycopendata.socrata.com/Social-Services/NYC-Wi-Fi-Hotspot-Locations/a9we-mtpn? link to explore the Wi-Fi hotspot location file in the NYC open data:
Figure 3: Wi-Fi hotspot location of NYC
- Next, click on Export and find the CSV download link:
Figure 4: Downloading the CSV format of the Wi-Fi hotspot location
- You can then install and load the RCurl package:
> install.packages("RCurl")
> library(RCurl)
- Finally, download the file from the HTTPS URL by using the getURL function:
> rows <- getURL("https://nycopendata.socrata.com/api/views/jd4g-ks2z/rows.csv?accessType=DOWNLOAD")
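Unlike download.file, getURL does not write a file; it returns the response body as a character string, so rows still has to be parsed or saved. The following is a minimal sketch of both options. The stand-in string, its column names, and the output file name are made up for illustration so that the example runs without RCurl or network access; in practice, rows would come from the getURL call in the last step.

```r
# Stand-in for the string returned by getURL(); columns and values are made up.
rows <- "id,name,borough\n1,Hotspot A,Manhattan\n2,Hotspot B,Brooklyn\n"

# Parse the CSV text directly into a data frame.
wifi <- read.csv(text = rows, stringsAsFactors = FALSE)
str(wifi)

# Optionally save the raw text to disk, similar to what download.file produces.
out_file <- file.path(tempdir(), "wifi_hotspot.csv")
cat(rows, file = out_file)
```

Parsing with read.csv(text = ...) avoids a round trip through the file system when the data is only needed in memory.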