
Sorting data

Sorting lets us arrange data so that we can analyze it more efficiently. In a database, we can use an ORDER BY clause to sort data by specified columns. In R, we can use the sort and order functions to arrange data.

Getting ready

Refer to the Converting data types recipe and convert each attribute of the imported data into the proper data type. Also, rename the columns of the employees and salaries datasets by following the steps in the Renaming the data variable recipe.

How to do it…

Perform the following steps to sort the salaries dataset:

  1. First, we can use the sort function to sort data:
    > a <- c(5,1,4,3,2,6,3)
    > sort(a)
    [1] 1 2 3 3 4 5 6
    > sort(a, decreasing=TRUE)
    [1] 6 5 4 3 3 2 1
    
  2. Next, we can see how the order function works on the same input vector:
    > order(a)
    [1] 2 5 4 7 3 1 6
    > order(a, decreasing = TRUE)
    [1] 6 1 3 4 7 5 2
    
  3. To sort a data frame by a specific column, we first obtain the ordered index and then employ the index to retrieve the sorted dataset:
    > sorted_salaries <- salaries[order(salaries$salary, decreasing = TRUE),]
    > head(sorted_salaries)
        emp_no salary  from_date    to_date
    684  10068 113229 2001-08-03 9999-01-01
    683  10068 112470 2000-08-03 2001-08-03
    682  10068 111623 1999-08-04 2000-08-03
    681  10068 108345 1998-08-04 1999-08-04
    680  10068 106204 1997-08-04 1998-08-04
    679  10068 105533 1996-08-04 1997-08-04
    
  4. Besides sorting data by a single column, we can sort data by multiple columns:
    > sorted_salaries2 <- salaries[order(salaries$salary, salaries$from_date, decreasing = TRUE),]
    > head(sorted_salaries2)
        emp_no salary  from_date    to_date
    684  10068 113229 2001-08-03 9999-01-01
    683  10068 112470 2000-08-03 2001-08-03
    682  10068 111623 1999-08-04 2000-08-03
    681  10068 108345 1998-08-04 1999-08-04
    680  10068 106204 1997-08-04 1998-08-04
    679  10068 105533 1996-08-04 1997-08-04
    

How it works…

R provides two functions for sorting data: sort and order. The sort function returns the sorted vector itself. In our first case, we created a numeric vector, a, with seven elements and applied the sort function to it, which yielded a sorted vector as the output. By default, the sorted vector is in ascending order. However, we can reverse the order by setting decreasing to TRUE. The order function, on the other hand, returns a vector of indices: the positions of the elements in the original vector, listed in sorted order. Here too, we can use decreasing to choose between ascending and descending order.
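
To make the relationship between the two functions concrete, the following minimal sketch reuses the vector a from the recipe and checks that indexing a with the result of order reproduces the result of sort. The variable name idx is introduced here for illustration only:

```r
# Same vector as in the recipe
a <- c(5, 1, 4, 3, 2, 6, 3)

# order() returns positions in the original vector, not values
idx <- order(a)

# Indexing with those positions gives the sorted values
a[idx]                      # 1 2 3 3 4 5 6
identical(a[idx], sort(a))  # TRUE

# The same relationship holds in descending order
identical(a[order(a, decreasing = TRUE)], sort(a, decreasing = TRUE))  # TRUE
```

Because order returns positions rather than values, the same index can be used to rearrange anything that lines up with a, such as the rows of a data frame.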

To arrange the elements of a vector in ascending or descending order, we can simply use the sort function. However, to arrange the records of a data frame by a specific column, we should use the order function. In our example, we first obtained the ordering index of the salary attribute in descending order and then used that index to retrieve the rows of salaries. As a result, the records in salaries are arranged by salary. Besides sorting records by a single attribute, we can sort records by multiple attributes. All we need to do is pass the salary and from_date attributes, in that order, to the order function. Later attributes are used only to break ties in the earlier ones, and a single decreasing value applies to all of them.
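
In the salaries data, ties in salary may be rare, so the effect of the second sort key is hard to see. The sketch below uses a small, hypothetical data frame named toy (not part of the recipe's data) with deliberate ties. It also shows one possible way to mix sort directions in base R by negating a key. This relies on xtfrm to turn the dates into numbers and is only one of several approaches:

```r
# Hypothetical toy data with tied salaries (not the recipe's salaries data)
toy <- data.frame(
  emp_no    = c(1, 2, 3, 4),
  salary    = c(500, 700, 500, 700),
  from_date = as.Date(c("2001-01-01", "1999-01-01", "2000-01-01", "2002-01-01"))
)

# Both keys descending: salary first, from_date breaks the ties
both_desc <- toy[order(toy$salary, toy$from_date, decreasing = TRUE), ]
both_desc$emp_no  # 4 2 1 3

# Mixed directions: salary ascending, from_date descending.
# xtfrm() converts the dates to numbers so that they can be negated.
mixed <- toy[order(toy$salary, -xtfrm(toy$from_date)), ]
mixed$emp_no      # 1 3 4 2
```

Negating a key only works for values that can be represented as numbers, which is why the dates are passed through xtfrm first.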

There's more…

You can use the arrange function from the plyr package to sort the salaries data by salary in ascending order and by from_date in descending order:

> library(plyr)
> arranged_salaries <- arrange(salaries, salary, desc(from_date))
> head(arranged_salaries)
  emp_no salary  from_date    to_date
1  10048  39507 1986-02-24 1987-01-27
2  10027  39520 1996-04-01 1997-04-01
3  10064  39551 1986-11-20 1987-11-20
4  10072  39567 1990-05-21 1991-05-21
5  10072  39724 1991-05-21 1992-05-20
6  10049  39735 1993-05-04 1994-05-04