
For loop

For loops are often used to go through every column or row of a dataframe in R.

Say, for example, that we're interested in the mean of each numeric column of the built-in iris dataset. Four of its five columns are numeric; the exception is Species, a factor variable whose labels indicate the species of each iris. We could type mean(iris$Sepal.Length) four times, changing the variable name each time. However, a far more efficient way to complete this exercise is to use a for loop.
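For comparison, the repetitive approach described above might look like the following sketch, where only the column name changes from one line to the next:

```r
# The manual approach: one call per numeric column of the built-in iris data
mean(iris$Sepal.Length)
mean(iris$Sepal.Width)
mean(iris$Petal.Length)
mean(iris$Petal.Width)
```

Four lines are manageable, but the same pattern quickly becomes tedious for a dataset with dozens of columns.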

If we simply want to print the means to the console, we could use a for loop as follows:

for(i in seq_along(iris)){
  # print the mean of column i
  print(mean(iris[[i]]))
}

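The output will be as follows:

[1] 5.843333
[1] 3.057333
[1] 3.758
[1] 1.199333
[1] NA
Warning message:
In mean.default(iris[[i]]) :
  argument is not numeric or logical: returning NA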

We'll come back to the output, especially that warning message, in a second. First, let's break down the components of the for loop. The general syntax is as follows:

for(i in a range of numbers){
  some_action
}

In this particular for loop, we chose i as our iterator variable. A for loop in R advances this variable automatically: every time it reaches the end of the loop body, i takes the next value in the sequence, which here means it increases by one. You might have noticed that once the loop finished, i was added to the global environment as a Value, 5L (the L means it's an integer, the number 5). Our iterator will always remain in the environment when a loop concludes.

It is displayed on the screen, as shown in the following screenshot:

The R function seq_along() is very helpful in for loops, because it generates the sequence 1, 2, and so on up to the number of columns of the dataframe (if that's the input) or, more generally, up to the number of items contained in whatever is passed to it.
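As a small illustration of both points, the sketch below calls seq_along() on a vector and on iris, then checks the value the iterator is left with after a loop. The name letters_vec is a hypothetical example vector, not something from the iris data.

```r
# seq_along() returns the integer positions 1, 2, ..., length(x)
letters_vec <- c("a", "b", "c")   # hypothetical example vector
seq_along(letters_vec)            # 1 2 3

# A data frame is a list of columns, so its length is its column count
length(iris)                      # 5
seq_along(iris)                   # 1 2 3 4 5

# The iterator keeps its last value after the loop finishes
for (i in seq_along(iris)) {
  # empty body: we only care about the final value of i
}
i                                 # 5
is.integer(i)                     # TRUE
```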

We also chose to print the mean of each column in this particular for loop. Accessing the columns is done using indexing, so when i = 1, iris[[i]] is equal to the Sepal.Length variable, which is column 1, and so on. We got a warning, and NA instead of a mean, for column 5, because it isn't numeric (it's the Species variable). Species doesn't have a mean, because it's a factor variable.
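To see why column 5 behaves differently, it may help to inspect it directly. The following is a minimal sketch; species_mean is a name introduced here for illustration, and suppressWarnings() is used only to keep the console quiet while we look at the returned value.

```r
# Species is stored as a factor, not as a numeric vector
class(iris$Species)        # "factor"
is.numeric(iris$Species)   # FALSE

# mean() cannot average a factor: it warns and returns NA
species_mean <- suppressWarnings(mean(iris$Species))
is.na(species_mean)        # TRUE
```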

This is actually a great example of where we can combine for loops with an if statement. Take a look at the following code:

for(i in seq_along(iris)){
  # only compute the mean when column i is numeric
  if(class(iris[[i]]) == "numeric"){
    print(mean(iris[[i]]))
  }
}

The if statement here prints the mean of an iris column only if the class of that column is numeric (which makes sense, since only numeric columns should have means). The output is now only as follows:

[1] 5.843333
[1] 3.057333
[1] 3.758
[1] 1.199333

If we're really feeling fancy, we could even add an else statement with a different message for when the class of a column isn't numeric, such as in this loop:

for(i in seq_along(iris)){
  if(class(iris[[i]]) == "numeric"){
    print(mean(iris[[i]]))
  }else{
    # report the position of any non-numeric column
    print(paste("Variable", i, "isn't numeric"))
  }
}

The output is as follows:

[1] 5.843333
[1] 3.057333
[1] 3.758
[1] 1.199333
[1] "Variable 5 isn't numeric"

seq_along() returns a sequence of numbers and makes for loops more straightforward. However, if you build the sequence with any other function, the syntax of the for statement will change slightly. The following code will print the Sepal.Width value in every row of iris:

for(i in 1:nrow(iris)){
  # print the Sepal.Width value in row i
  print(iris[i, "Sepal.Width"])
}

You have to explicitly use 1:nrow(iris) in the for statement. If you wrote for(i in nrow(iris)) instead, the loop would run only once, with i equal to 150, because nrow() simply returns the number of rows of iris as a single number, whereas seq_along() returns the entire sequence of column positions, as shown below:

nrow(iris)
[1] 150
seq_along(iris)
[1] 1 2 3 4 5
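The difference can be checked by counting how many times each loop body runs. This is a minimal sketch: count_single, count_rows, and empty_df are names introduced here for illustration. It also shows seq_len(), a base R alternative to 1:nrow() that some readers may prefer because it copes with a data frame that has zero rows.

```r
# nrow(iris) is the single number 150, so this loop body runs exactly once
count_single <- 0
for (i in nrow(iris)) {
  count_single <- count_single + 1
}
count_single        # 1

# 1:nrow(iris) is the sequence 1, 2, ..., 150, so the body runs once per row
count_rows <- 0
for (i in 1:nrow(iris)) {
  count_rows <- count_rows + 1
}
count_rows          # 150

# seq_len() builds the same sequence, and returns an empty one for zero rows,
# whereas 1:0 would count down through 1 and 0
empty_df <- iris[0, ]
length(seq_len(nrow(empty_df)))   # 0
length(1:nrow(empty_df))          # 2
```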