
Downloading open data

Before conducting any data analysis, an essential step is to collect high-quality, meaningful data. One important data source is open data, which is selected, organized, and freely available to the public. Most open data is published online either as text files or through APIs. Here, we show how to download an open data file in text format with the download.file function.

Getting ready

For this recipe, you need a computer with R installed and access to the Internet.

How to do it…

Please perform the following steps to download open data from the Internet:

  1. First, visit the http://finance.yahoo.com/q/hp?s=%5EGSPC+Historical+Prices link to view the historical price of the S&P 500 in Yahoo Finance:

    Figure 1: Historical price of S&P 500

  2. Scroll down to the bottom of the page, right-click and copy the link in Download to Spreadsheet (the link should appear similar to http://real-chart.finance.yahoo.com/table.csv?s=%5EGSPC&d=6&e=3&f=2015&g=d&a=0&b=3&c=1950&ignore=.csv):

    Figure 2: Download to Spreadsheet

  3. Download this file with the download.file function:
    > download.file('http://real-chart.finance.yahoo.com/table.csv?s=%5EGSPC&d=6&e=3&f=2015&g=d&a=0&b=3&c=1950&ignore=.csv', 'snp500.csv')
    
  4. You can now use the getwd function to determine the current directory, and then use list.files to confirm that the downloaded file is there:
    > getwd()
    > list.files('./')
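
Beyond listing the directory, it can help to confirm the download programmatically and load the file into R. The following is a minimal sketch and not part of the original recipe. Because the Yahoo Finance link may not be reachable, it uses a small local file addressed by a file:// URL as a stand-in for the remote CSV. The column names, the values, and the snp500_demo.csv file name are made up for illustration:

```r
# Stand-in for the remote CSV: write a tiny file locally (invented values).
src <- tempfile(fileext = ".csv")
writeLines(c("Date,Close", "2015-07-01,100.5", "2015-07-02,101.25"), src)

# Build a file:// URL for the stand-in file (handles Unix and Windows paths).
path <- normalizePath(src, winslash = "/")
src_url <- paste0(if (startsWith(path, "/")) "file://" else "file:///", path)

# Download it the same way as in step 3, but into the temporary directory.
dest <- file.path(tempdir(), "snp500_demo.csv")
status <- download.file(src_url, dest, quiet = TRUE)

# Confirm that the file exists, then read it into a data frame.
file.exists(dest)
prices <- read.csv(dest, stringsAsFactors = FALSE)
head(prices)
```

With the real link, you would replace src_url with the copied URL and dest with 'snp500.csv'.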
    

How it works…

In this recipe, we demonstrated how to download a file using download.file in R. First, we used Yahoo Finance to view historical prices of the S&P 500. At the bottom of the page, we found a link with an http:// URL prefix. The http:// prefix stands for Hypertext Transfer Protocol (HTTP), a protocol for transmitting and receiving information over the Internet. Because the link is an ordinary HTTP address, we can pass it to download.file, which sends a request to the remote server and saves the returned file under the name given in the second argument (here, snp500.csv in the current working directory).
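
The download.file function returns an integer status code (invisibly), where 0 indicates success, and it signals an error or a warning when the request fails. The following is a minimal sketch of how a script might handle both outcomes. The safe_download helper is hypothetical and not part of base R, and the sketch uses local file:// URLs (one valid, one deliberately missing) in place of a remote server so that it can run offline:

```r
# safe_download is a hypothetical helper: TRUE on success, FALSE otherwise.
safe_download <- function(url, destfile) {
  tryCatch({
    status <- download.file(url, destfile, quiet = TRUE)
    status == 0                      # 0 is the documented success code
  },
  warning = function(w) FALSE,       # treat any warning as a failure
  error = function(e) FALSE)         # e.g. the URL cannot be opened
}

# A small local file standing in for a remote resource (invented content).
src <- tempfile(fileext = ".csv")
writeLines(c("Date,Close", "2015-07-01,100.5"), src)
path <- normalizePath(src, winslash = "/")
good_url <- paste0(if (startsWith(path, "/")) "file://" else "file:///", path)

# A URL that points to a file that does not exist.
bad_url <- "file:///this/path/does/not/exist.csv"

ok  <- safe_download(good_url, tempfile(fileext = ".csv"))
bad <- safe_download(bad_url, tempfile(fileext = ".csv"))
```

Here, ok should be TRUE and bad should be FALSE, so a script can decide whether to continue with the analysis or stop.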

There's more…

Apart from the download.file function, you can use the RCurl package to download a file from a URL with either an HTTP or an HTTPS prefix:

  1. First, go to the https://nycopendata.socrata.com/Social-Services/NYC-Wi-Fi-Hotspot-Locations/a9we-mtpn? link to explore the Wi-Fi hotspot location file in the NYC open data:

    Figure 3: Wi-Fi hotspot location of NYC

  2. Next, click on Export and find the CSV download link:

    Figure 4: Downloading the CSV format of the Wi-Fi hotspot location

  3. You can then install and load the RCurl package:
    > install.packages("RCurl")
    > library(RCurl)
    
  4. Finally, download the file from its HTTPS URL by using the getURL function:
    > rows <- getURL("https://nycopendata.socrata.com/api/views/jd4g-ks2z/rows.csv?accessType=DOWNLOAD")
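
Unlike download.file, getURL does not write a file. It returns the response body as a character string, so one more step is needed to obtain a data frame or a file on disk. The following is a minimal sketch of that step. To keep it runnable offline, it uses a short made-up string in place of the real response. The column names, the values, and the wifi_hotspots_demo.csv file name are assumptions for illustration only:

```r
# Stand-in for the string returned by getURL(); the real response has
# different columns and many more rows. These values are invented.
rows <- "id,name\n1,Alpha\n2,Beta\n"

# Parse the CSV text directly into a data frame.
hotspots <- read.csv(text = rows, stringsAsFactors = FALSE)

# Alternatively, save the raw text to disk so that it can be reused later.
out <- file.path(tempdir(), "wifi_hotspots_demo.csv")
cat(rows, file = out)

# Reading the saved file gives the same table.
saved <- read.csv(out, stringsAsFactors = FALSE)
```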
    