- R for Data Science Cookbook
- Yu Wei Chiu (David Chiu)
- 2021-07-14 10:51:25
Scanning text files
In previous recipes, we introduced how to use read.table and read.csv to load data into an R session. However, read.table and read.csv are best suited to files in which every line has the same number of columns and the data size is small. For more flexible data processing, we will demonstrate how to use the scan function to read data from a file.
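As a minimal illustration of scan on its own, the sketch below reads made-up values from its text argument instead of a file. With the default what = double(), scan returns a numeric vector. Supplying a character prototype (what = "") together with sep returns strings instead.

```r
# Default what = double(): whitespace-separated values become a numeric vector.
nums <- scan(text = "3 1 4 1 5", quiet = TRUE)
print(nums)

# A character prototype plus a separator reads comma-separated strings.
words <- scan(text = "alpha,beta,gamma", what = "", sep = ",", quiet = TRUE)
print(words)
```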
Getting ready
To follow this recipe, you need to have completed the previous recipes and downloaded snp500.csv into the current working directory.
How to do it…
Please perform the following steps to scan data from the CSV file:
- First, use the scan function to read data from snp500.csv:

      > stock_data3 <- scan('snp500.csv', sep = ',',
      +     what = list(Date = '', Open = 0, High = 0, Low = 0,
      +                 Close = 0, Volume = 0, Adj_Close = 0),
      +     skip = 1, fill = TRUE)
      Read 16481 records
- You can then examine the loaded data with mode and str:

      > mode(stock_data3)
      [1] "list"
      > str(stock_data3)
      List of 7
       $ Date     : chr [1:16481] "2015-07-02" "2015-07-01" "2015-06-30" "2015-06-29" ...
       $ Open     : num [1:16481] 2078 2067 2061 2099 2103 ...
       $ High     : num [1:16481] 2085 2083 2074 2099 2109 ...
       $ Low      : num [1:16481] 2071 2067 2056 2057 2095 ...
       $ Close    : num [1:16481] 2077 2077 2063 2058 2102 ...
       $ Volume   : num [1:16481] 3.00e+09 3.73e+09 4.08e+09 3.68e+09 5.03e+09 ...
       $ Adj_Close: num [1:16481] 2077 2077 2063 2058 2102 ...
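The steps above can be sketched without snp500.csv by passing a few lines of text to scan through its text argument. The rows below are hypothetical values that only mimic the seven-column layout of the what list. The header text is also invented, since skip = 1 discards it anyway. quiet = TRUE merely suppresses the "Read N records" message.

```r
# Hypothetical sample rows in the seven-column layout used above
# (illustrative values only, not real market data).
csv_text <- c(
  "Date,Open,High,Low,Close,Volume,Adj Close",
  "2015-07-02,2078.0,2085.0,2071.0,2077.0,3000000000,2077.0",
  "2015-07-01,2067.0,2083.0,2067.0,2077.0,3730000000,2077.0"
)

stock_sample <- scan(
  text = csv_text,
  sep = ",",
  what = list(Date = "", Open = 0, High = 0, Low = 0,
              Close = 0, Volume = 0, Adj_Close = 0),
  skip = 1,      # skip the header line
  fill = TRUE,   # pad short lines with empty fields
  quiet = TRUE   # suppress the "Read N records" message
)

str(stock_sample)
```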
How it works…
Compared with read.csv and read.table, the scan function is more flexible and efficient at reading data. Here, we specify the name and type of each field in a list passed to the what parameter. In this case, the first field is of character type and the rest are numeric. Therefore, we set the Date prototype to an empty string ('' or "") and the remaining fields to 0. We need to skip the header row and automatically add empty fields to any line with fewer fields than there are columns. For that reason, we set skip to 1 and fill to TRUE.
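The sketch below shows what fill = TRUE does, using hypothetical input that is deliberately ragged. scan pads the short line with empty fields, and empty numeric fields are read as NA. The example also uses character() and numeric() as prototypes, which scan treats the same way as '' and 0 because only the type of each list element matters.

```r
# Hypothetical ragged input: the second record stops after two fields.
ragged <- c("2015-07-02,2078,2085,2071,2077",
            "2015-07-01,2067")

res <- scan(
  text = ragged, sep = ",",
  # character() and numeric() act as type prototypes, like '' and 0
  what = list(Date = character(), Open = numeric(), High = numeric(),
              Low = numeric(), Close = numeric()),
  fill = TRUE,   # pad the short line with empty fields
  quiet = TRUE
)

print(res$Open)   # both values are present
print(res$High)   # the padded (empty) numeric field is read as NA
```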
We can then examine the data with built-in functions. mode returns the type of the object, and str displays the structure of the data.
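As mode confirms, scan returns a list rather than a data frame. A common next step is therefore to bind the equal-length fields into a data frame. The following is a minimal sketch on two hypothetical rows. It assumes you also want the Date strings converted with as.Date, a step the recipe itself does not perform.

```r
# Two hypothetical records scanned into a named list, as in the recipe.
scanned <- scan(text = c("2015-07-02,2077", "2015-07-01,2077"),
                sep = ",", what = list(Date = "", Close = 0),
                quiet = TRUE)
print(mode(scanned))   # a list with one element per field
str(scanned)

# The fields have equal length, so data.frame() can bind them as columns.
stock_df <- data.frame(scanned, stringsAsFactors = FALSE)
# Assumed extra step: convert the ISO date strings into Date objects.
stock_df$Date <- as.Date(stock_df$Date)
str(stock_df)
```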
There's more…
On some occasions, the data is separated by fixed width rather than fixed delimiter. To specify the width of each column, you can use the read.fwf
function:
- First, use download.file to download weather.op from the author's GitHub page:

      > download.file("https://github.com/ywchiu/rcookbook/raw/master/chapter2/weather.op", "weather.op")
- You can then examine the data with the file editor:
Figure 5: Using the file editor to examine the file
- Read the data by specifying the width of each column in widths and the column names in col.names, and skip the first row by setting skip to 1:

      > weather <- read.fwf("weather.op", widths = c(6, 6, 10, 11, 9, 8),
      +     col.names = c("STN", "WBAN", "YEARMODA", "TEMP", "MAX", "MIN"),
      +     skip = 1)
- Lastly, examine the data using the head and names functions:

      > head(weather)
         STN  WBAN YEARMODA    TEMP    MAX    MIN
      1 8403 99999 20140101 85.8 24 102.7*  69.3*
      2 8403 99999 20140102 86.3 24 102.9*  71.1*
      3 8403 99999 20140103 85.9 24 101.1*  72.0*
      4 8403 99999 20140104 85.6 24 102.7*  70.5*
      5 8403 99999 20140105 84.8 23 102.0*  66.6*
      6 8403 99999 20140106 86.8 23 102.0*  70.9*
      > names(weather)
      [1] "STN"      "WBAN"     "YEARMODA" "TEMP"     "MAX"
      [6] "MIN"
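You can reproduce the read.fwf steps without downloading weather.op by writing a small fixed-width file to a temporary location. The ID/CITY/TEMP layout below is invented for illustration. In this example, widths counts the characters in each field. strip.white = TRUE is forwarded to read.table to trim padding spaces. A negative width skips that many characters, so the corresponding field is dropped.

```r
# Hypothetical fixed-width layout (invented for illustration):
# ID = 4 chars, CITY = 8 chars, TEMP = 5 chars, plus one header line.
fwf_lines <- c(
  "ID  CITY     TEMP",
  "0001Taipei   25.3",
  "0002Tainan   28.1"
)
tmp <- tempfile(fileext = ".txt")
writeLines(fwf_lines, tmp)

# widths: characters per field; skip = 1 drops the header line;
# strip.white = TRUE is passed on to read.table to trim padding spaces.
wx <- read.fwf(tmp, widths = c(4, 8, 5),
               col.names = c("ID", "CITY", "TEMP"),
               skip = 1, strip.white = TRUE)
print(wx)

# A negative width skips that many characters, so the ID field is dropped.
wx_no_id <- read.fwf(tmp, widths = c(-4, 8, 5),
                     col.names = c("CITY", "TEMP"),
                     skip = 1, strip.white = TRUE)
print(wx_no_id)

unlink(tmp)  # remove the temporary file
```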