
Dataframes, lists, arrays, and matrices

Dataframes have several important features that make them useful for data analysis:

  • They are rectangular data structures, typically with cases (for example, the days in one month) listed down the rows and variables (page views, unique visitors, or referrers) listed along the columns
  • A mix of data types is supported. A typical dataframe might include variables containing dates, numbers (integers or floats), and text
  • R provides extensive built-in functionality for subsetting rows and extracting variables from a dataframe
  • Many functions include a data argument, which makes it simple to pass a dataframe into a function and process only the relevant variables and cases, resulting in cleaner code
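
As a minimal sketch of the points above, the following builds a small dataframe with mixed column types and passes it to a function through the data argument. The name exampleData, its columns, and its values are hypothetical stand-ins, not the analyticsData used in this chapter:

```r
# Hypothetical stand-in data; column names and values are illustrative only
exampleData <- data.frame(
  date = as.Date("2024-01-01") + 0:4,
  pageViews = c(836L, 676L, 940L, 689L, 647L),
  uniqueVisitors = c(400L, 310L, 450L, 330L, 300L),
  source = c("search", "direct", "search", "social", "direct"),
  stringsAsFactors = FALSE
)

# Each column keeps its own type (Date, integer, integer, character)
columnClasses <- sapply(exampleData, class)
print(columnClasses)

# The data argument lets aggregate() look up pageViews and source
# by name inside the dataframe
meanViews <- aggregate(pageViews ~ source, data = exampleData, FUN = mean)
print(meanViews)
```

Here aggregate() is only one example of a function with a data argument; modelling functions such as lm() accept one in the same way.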

We can inspect the first few rows of the dataframe using the head(analyticsData) command. The following screenshot shows the output of this command:

As you can see, there are four variables within the dataframe: one contains dates, two contain integers, and one contains numeric values.

Variables can be extracted from dataframes very simply using the $ operator, as follows:

> analyticsData$pageViews
[1] 836 676 940 689 647 899 934 718 776 570 651 816
[13] 731 604 627 946 634 990 994 599 657 642 894 983
[25] 646 540 756 989 965 821

Variables can also be extracted from dataframes using [], as shown in the following command:

> analyticsData[, "pageViews"]

Note the use of a comma with nothing before it to indicate that all rows are required. In general, dataframes can be accessed using dataObject[x,y], with x being the number(s) or name(s) of the rows required and y being the number(s) or name(s) of the columns required. For example, if the first 10 rows were required from the pageViews column, it could be achieved like this:

> analyticsData[1:10,"pageViews"]
[1] 836 676 940 689 647 899 934 718 776 570
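
The dataObject[x,y] pattern can be sketched on a small self-contained example. The dataframe exampleData below is a hypothetical stand-in for analyticsData, and its values are illustrative only:

```r
# Hypothetical stand-in for analyticsData; values are illustrative only
exampleData <- data.frame(
  date = as.Date("2024-01-01") + 0:4,
  pageViews = c(836L, 676L, 940L, 689L, 647L),
  uniqueVisitors = c(400L, 310L, 450L, 330L, 300L)
)

# Rows 1 to 3 of a single named column: returns a plain vector
firstViews <- exampleData[1:3, "pageViews"]

# Rows 2 and 4 of two named columns: returns a dataframe
twoCols <- exampleData[c(2, 4), c("date", "pageViews")]

# Columns can also be selected by position (pageViews is column 2)
byPosition <- exampleData[1:3, 2]

print(firstViews)
print(twoCols)
print(identical(firstViews, byPosition))  # TRUE
```

Selecting a single column returns a vector, whereas selecting two or more columns returns a dataframe.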

Leaving the space before the comma blank returns all rows, and leaving the space after the comma blank returns all variables. For example, the following command returns the first three rows of all variables:

> analyticsData[1:3,]

The following screenshot shows the output of this command:

Dataframes are a special type of list. Lists can hold many different types of data, including lists. As with many data types in R, their elements can be named, which can be useful to write code that is easy to understand. Let's make a list of the options for dinner, with drink quantities expressed in milliliters.

In the following example, note the use of the c() function, which combines its comma-separated arguments into a vector. R picks an appropriate class for the result: character for vectors that contain strings, numeric for those that contain only numbers, logical for Boolean values, and so on. The list itself is created with the list() function:

> dinnerList <- list("Vegetables" =
  c("Potatoes", "Cabbage", "Carrots"),
  "Dessert" = c("Ice cream", "Apple pie"),
  "Drinks" = c(250, 330, 500)
  )

Note that code is indented throughout, although entering code directly into the console will not produce indentations; it is done for readability.
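
The class that c() gives its result can be checked with class(). The following is a short sketch using values from the dinner example; the variable names are illustrative only:

```r
# c() builds a vector whose class depends on what it contains
vegetables <- c("Potatoes", "Cabbage", "Carrots")
drinks <- c(250, 330, 500)
flags <- c(TRUE, FALSE)

print(class(vegetables))  # "character"
print(class(drinks))      # "numeric"
print(class(flags))       # "logical"

# Mixing types in one vector coerces everything to a common class
mixed <- c(250, "Cabbage")
print(class(mixed))       # "character"

# list() keeps each element's own class instead of coercing
kept <- list(250, "Cabbage")
print(class(kept[[1]]))   # "numeric"
```

This coercion is why the dinner options are stored in a list: a single vector could not hold both the text and the numeric quantities without converting the numbers to text.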

Indexing lists is similar to indexing dataframes (which are, after all, just a special kind of list). Lists can be indexed by number, as shown in the following command:

> dinnerList[1:2]
$Vegetables
[1] "Potatoes" "Cabbage"  "Carrots"
    
$Dessert
[1] "Ice cream" "Apple pie"

This returns a list. To extract the element itself, with its own class, use [[]]:

> dinnerList[[3]]
[1] 250 330 500

In this case, a numeric vector is returned. Lists can also be indexed by name, as shown in the following code:

> dinnerList["Drinks"]
$Drinks
[1] 250 330 500

Note that this also returns a list.
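
The difference between the indexing forms can be sketched by comparing what each one returns for the same element. The list is rebuilt here so that the example is self-contained; the variable names asList, asVector, and viaDollar are illustrative only:

```r
dinnerList <- list(
  "Vegetables" = c("Potatoes", "Cabbage", "Carrots"),
  "Dessert" = c("Ice cream", "Apple pie"),
  "Drinks" = c(250, 330, 500)
)

# Single brackets return a list containing the selected element
asList <- dinnerList["Drinks"]
print(class(asList))    # "list"

# Double brackets return the element itself
asVector <- dinnerList[["Drinks"]]
print(class(asVector))  # "numeric"

# The $ operator is equivalent to [[ ]] with a name
viaDollar <- dinnerList$Drinks
print(identical(asVector, viaDollar))  # TRUE

# Only the extracted vector can be used directly in arithmetic
print(mean(asVector))   # 360
```

Calling mean() on asList would not work as intended, because a list is not a numeric vector; extracting with [[]] or $ first avoids this.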

Matrices and arrays, unlike dataframes, hold only one type of data. They also use square brackets for indexing: analyticsMatrix[, 3:6] returns all rows of the third to sixth columns, analyticsMatrix[1, 3] returns the single element in the first row of the third column, and analyticsArray[1, 2, ] returns the elements in the first row and second column across every level of the third dimension.
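
The three indexing forms can be tried on small objects. The names exampleMatrix and exampleArray, along with their dimensions and contents, are assumptions made for illustration; they are not the analyticsMatrix and analyticsArray objects mentioned above:

```r
# A 2 x 6 matrix filled column by column with the integers 1 to 12
exampleMatrix <- matrix(1:12, nrow = 2, ncol = 6)

# All rows of columns 3 to 6: a 2 x 4 matrix
lastCols <- exampleMatrix[, 3:6]
print(dim(lastCols))  # 2 4

# The single element in row 1, column 3
oneCell <- exampleMatrix[1, 3]
print(oneCell)        # 5

# A 2 x 3 x 4 array filled with the integers 1 to 24
exampleArray <- array(1:24, dim = c(2, 3, 4))

# Row 1, column 2, across all four levels of the third dimension
acrossThird <- exampleArray[1, 2, ]
print(acrossThird)    # 3 9 15 21
```

As with dataframes, leaving an index position blank selects everything along that dimension.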
