官术网_书友最值得收藏!

Working with Excel files

Excel is another popular tool used to store and analyze data. Of course, one can convert Excel files to CSV files or other text formats by using Excel. Alternatively, to simplify the process, you can use install and load the xlsx package to read and process Excel data in R.

Getting ready

In this recipe, you need to prepare your environment with R installed and a computer that can access the Internet.

How to do it…

Please perform the following steps to read Excel documents:

  1. First, install and load the xlsx package:
    > install.packages("xlsx")
    > library(xlsx)
    
  2. Access www.data.worldbank.org/topic/economy-and-growth to find world economy indicator data in Excel:

    Figure 6: World economy indicator

  3. Download world economy indicator data from the following URL using download.file:
    > download.file("http://api.worldbank.org/v2/en/topic/3?downloadformat=excel", "worldbank.xls", mode="wb")
    
  4. Examine the downloaded file with Excel (or Open Office):

    Figure 7: Using Excel to examine the downloaded file

  5. You can use read.xlsx2 to read data from the downloaded Excel file:
    > options(java.parameters = "-Xmx2000m")
    > wb <- read.xlsx2("worldbank.xls", sheetIndex = 1, startRow = 4)
    
  6. Select the country name, country code, indicator name, indicator code, and data for 2014 out of the read data:
    > wb2 <- wb[c("Country.Name","Country.Code","Indicator.Name","Indicator.Code", "X2014")]
    
  7. Subsequently, you can examine the dimension of the file using the dim function:
    > dim(wb2)
    
  8. Finally, you can write filtered data back to a file named 2014wbdata.xlsx:
    > write.xlsx2(wb2, "2014wbdata.xlsx", sheetName = "Sheet1")
    

How it works…

In this recipe, we covered how to read and write an Excel file containing world development indicators with the xlsx package. In the first step, we needed to install and load the xlsx package, which enables the user to read and write Excel files in the R command prompt through the use of a Java POI package. Thus, to utilize the Java POI package, the installation process will also install rJava and xlsxjars at the same time. You can find the Java POI .jar file under <R Installed Path>\library\xlsx]jars\java. Using the author's computer as an example, which has the Windows 7 operating system installed, the .jar files are located at the C:\Program Files\R\R-3.2.1\library\xlsxjars\java path.

Next, we downloaded world economy indicator data from the link (http://data.worldbank.org/topic/economy-and-growth) with the download.file function. By default, download.file downloads the file in ASCII mode. To download the file in binary mode, we need to set download mode to wb.

After the Excel file is downloaded, we can examine it with Excel. The screenshot of the Excel file shows that the economy indicator starts from row 4 in Sheet 1. Therefore, we can use the read.xlsx2 function to read world economy indicators from this location. The xlsx package provides two functions to read data from Excel: read.xlsx and read.xlsx2. As the read.xlsx2 function mainly processes data in Java, read.xlsx2 performs better (in particular, read.xlsx2 processes data considerably faster on sheets with more than 100,000 cells).

After we have read the contents of the worksheet into an R data frame, we can select variables Country.Name, Country.Code, Indicator.Name, Indicator.Code, and X2014 out of the extracted R data frame. Next, we can use the dim function to examine the dimensions of the data frame. Finally, we can use write.xlsx2 to write transformed data to an Excel file, 2014wbdata.xlsx.

主站蜘蛛池模板: 石柱| 苏尼特右旗| 庆城县| 永安市| 靖宇县| 巨鹿县| 惠水县| 封开县| 江阴市| 天镇县| 铜梁县| 奉节县| 班玛县| 霍州市| 什邡市| 肥东县| 古交市| 华蓥市| 仁怀市| 竹北市| 弋阳县| 西充县| 柳河县| 铜山县| 浦城县| 南靖县| 宜阳县| 崇阳县| 清远市| 红安县| 临澧县| 五华县| 宁德市| 平塘县| 镇雄县| 海丰县| 吉安县| 武汉市| 漳平市| 合阳县| 西林县|