## 2.4 Sort, Rank, & Order

In the past two sections, you have learned how to create vectors of different types, including numeric, character, and logical vectors, as well as vectors that follow patterns. A vector usually contains more than one element, and sometimes you want to arrange those elements in a particular order. In this section, we introduce several important functions for ordering the elements of a vector.

### 2.4.1 Numeric vectors

Let’s start with numeric vectors. Firstly, let’s create a numeric vector which will be used throughout this part.

```
x <- c(2, 3, 2, 0, 4, 7)
x # check the value of x
#> [1] 2 3 2 0 4 7
```

*a. Sort vectors*

The first function we will introduce is `sort()`. By default, the `sort()` function **sorts** the elements of a vector in ascending order, namely from the smallest to the largest.

```
sort(x)
#> [1] 0 2 2 3 4 7
```

If you want to sort the vector in descending order, namely from the largest to the smallest, you can set the argument `decreasing = TRUE`.

```
sort(x, decreasing = TRUE)
#> [1] 7 4 3 2 2 0
```
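Setting `decreasing = TRUE` gives the same result as sorting in ascending order and then reversing the result with `rev()`. The short check below is only an illustration of that equivalence; `sort(x, decreasing = TRUE)` remains the more direct way to write it.

```r
x <- c(2, 3, 2, 0, 4, 7)

# Sort in descending order directly...
desc_direct <- sort(x, decreasing = TRUE)
# ...and compare with sorting in ascending order, then reversing
desc_reversed <- rev(sort(x))

desc_direct
#> [1] 7 4 3 2 2 0
identical(desc_direct, desc_reversed)
#> [1] TRUE
```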

*b. Ranks of vectors*

Next, let’s talk about ranks. The `rank()` function gives the **rank** of each element of the vector, namely its position when the vector is sorted in **ascending order**.

```
rank(x)
#> [1] 2.5 4.0 2.5 1.0 5.0 6.0
```

If you check the values of `x`, you can see that the smallest value of `x` is 0, which is the fourth element. Thus, the fourth element has rank 1. The second smallest value of `x` is 2, which is shared by the first and the third elements, resulting in a **tie** (elements with the same value result in a tie). Without the tie, these two elements would have ranks 2 and 3. To handle the tie, the `rank()` function by default assigns all the elements involved in the tie (the first and third elements in this example) the same rank, which is the **average** of their ranks (the average of 2 and 3, which is 2.5). In addition to this default behavior for handling ties, `rank()` provides other options through the `ties.method` argument.
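To make the default averaging concrete, the snippet below finds the positions the two tied 2s would occupy in the sorted vector, takes their mean by hand, and compares it with what `rank()` returns for the first and third elements. The variable name `tied_positions` is just an illustrative choice.

```r
x <- c(2, 3, 2, 0, 4, 7)

# Positions the two tied 2s occupy in sort(x)
tied_positions <- which(sort(x) == 2)
tied_positions
#> [1] 2 3

# The default ties.method = "average" gives both elements the mean of these positions
mean(tied_positions)
#> [1] 2.5
rank(x)[c(1, 3)]
#> [1] 2.5 2.5
```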

If you set `ties.method = "min"`, all the tied elements will have the *minimum rank* instead of the average rank. In this case, the minimum rank is 2.

```
rank(x, ties.method = "min")
#> [1] 2 4 2 1 5 6
```

If you want to break the ties by the order in which the elements appear in the vector, you can set `ties.method = "first"`. Then the element that appears earlier gets the smaller rank. In this example, the first element has rank 2 and the third element has rank 3, since the first element appears earlier than the third element. There are other options for handling ties, which you can look up in the documentation of `rank()` if interested.

```
rank(x, ties.method = "first")
#> [1] 2 4 3 1 5 6
```
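As one example of those other options, `ties.method = "max"` is the mirror image of `"min"`: every tied element gets the largest of the ranks they share, which is 3 here.

```r
x <- c(2, 3, 2, 0, 4, 7)

# Both tied 2s receive the larger of their two candidate ranks (2 and 3)
rank(x, ties.method = "max")
#> [1] 3 4 3 1 5 6
```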

Unlike `sort()`, `rank()` has no `decreasing` argument, so you can’t get ranks in descending order by adding `decreasing = TRUE` to `rank()`.
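If you do need descending ranks (the largest value gets rank 1), a common workaround for numeric vectors is to rank the negated vector, since negating reverses the ordering of the values. This is a sketch of that idiom only; it works for numeric vectors but not for character vectors, which can’t be negated.

```r
x <- c(2, 3, 2, 0, 4, 7)

# Negating x flips the ordering, so the largest value (7) gets rank 1;
# the tied 2s still share the average rank
rank(-x)
#> [1] 4.5 3.0 4.5 6.0 2.0 1.0
```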

*c. Order of vectors*

The next function we want to introduce is `order()`. Note that its name can be a bit misleading, since "ordering" elements usually means the same thing as sorting them. However, although it is related to sorting, `order()` is a very *different* function from `sort()`.

Let’s recall the values of `x` and apply `order()` to `x`.

```
x
#> [1] 2 3 2 0 4 7
order(x)
#> [1] 4 1 3 2 5 6
```

From the result, you can see that the `order()` function returns the **indices** of the elements arranged in ascending order, namely from the smallest to the largest. For example, the first output is 4, indicating that the 4th element of `x` is the smallest. The second output is 1, showing that the 1st element of `x` is the second smallest.
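Because `order()` returns positions, you can pass its result back into square brackets `[ ]`, which pick elements of a vector by position, to rearrange the original vector. Indexing `x` by `order(x)` therefore reproduces `sort(x)`, which is a handy way to see how the two functions are related. The name `idx` is just an illustrative choice.

```r
x <- c(2, 3, 2, 0, 4, 7)

idx <- order(x)   # positions of the elements, smallest value first
x[idx]            # pick the elements of x in that order
#> [1] 0 2 2 3 4 7
identical(x[idx], sort(x))
#> [1] TRUE
```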

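Unlike `rank()`, which gives tied elements their average rank by default, the `order()` function breaks ties by order of appearance by default. In this example, the tied 2s are the 1st and 3rd elements, so index 1 appears before index 3 in the output.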
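Because `order()` breaks ties by order of appearance, applying it twice turns the sorted positions back into ranks, with ties broken the same way. The result, shown below as an illustration, matches `rank()` with `ties.method = "first"`.

```r
x <- c(2, 3, 2, 0, 4, 7)

# order(x) is 4 1 3 2 5 6; ordering that result inverts the permutation,
# giving each element's position in the sorted vector
order(order(x))
#> [1] 2 4 3 1 5 6

# Same as ranking with ties broken by order of appearance
rank(x, ties.method = "first")
#> [1] 2 4 3 1 5 6
```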

If you want the indices corresponding to the descending order, you can set `decreasing = TRUE`, just as we did with the `sort()` function.

```
order(x, decreasing = TRUE)
#> [1] 6 5 2 1 3 4
```

So far, we have covered the `sort()`, `rank()`, and `order()` functions for numeric vectors. Here is a brief summary.

- The `sort()` function sorts the elements of a vector.
- The `rank()` function gives the rank of each element of the vector.
- The `order()` function returns the indices of the elements in sorted order.

### 2.4.2 Character vectors

Now, let’s move on to character vectors. For character vectors, R uses **lexicographical ordering**, sometimes called dictionary order since it is the order used in a dictionary. The exact rules depend on the locale (language and region settings) of your system; the results shown in this section are typical of an English locale. As with numeric vectors, let’s first prepare a character vector. Note that the strings in a character vector can contain letters, numbers, or symbols.

```
char_vec <- c("a", "A", "B", "b", "ab", "aC", "1c", ".a", "1a", "2a", ".a", "&u", "3", "_4")
```

*a. Ordering rules*

First, let’s discuss the ordering of single characters, including symbols, digits, and letters. There are a few important ordering rules:

- symbols < digits < letters: symbols appear first, followed by digits, and letters come last.
- symbols are ordered in the following way.

```
syms <- c(" ",",",";","_","(",")","!","[","]","{","}","-","*","/","#","$","%","^","&","`","@","+","=","|","?","<",">",".")
sort(syms)
#> [1] " " "_" "-" "," ";" "!" "?" "." "(" ")" "[" "]" "{" "}" "@" "*" "/" "&" "#"
#> [20] "%" "`" "^" "+" "<" "=" ">" "|" "$"
```

- digits are in ascending order: smaller digits appear earlier than bigger ones.

```
nums <- as.character(0:9) # the digits 0 to 9 stored as characters
sort(nums)
#> [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"
```

- letters are ordered alphabetically; for the same letter, the lowercase version comes first.

```
all_letters <- c(letters, LETTERS)
sort(all_letters)
#> [1] "a" "A" "b" "B" "c" "C" "d" "D" "e" "E" "f" "F" "g" "G" "h" "H" "i" "I" "j"
#> [20] "J" "k" "K" "l" "L" "m" "M" "n" "N" "o" "O" "p" "P" "q" "Q" "r" "R" "s" "S"
#> [39] "t" "T" "u" "U" "v" "V" "w" "W" "x" "X" "y" "Y" "z" "Z"
```

Here, `letters` is a character vector built into R that contains all 26 letters of the alphabet in lowercase, and `LETTERS` is another built-in character vector that contains all 26 letters in uppercase.
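If your results differ from those shown here, the collation locale is the likely reason. The sketch below, which is illustrative rather than something this section relies on, prints the collation locale of the current R session with `Sys.getlocale("LC_COLLATE")` and then compares the default `sort()` with `method = "radix"`, an option that ignores the locale and uses C-locale order, in which every uppercase letter comes before every lowercase one. The vector name `mixed` is hypothetical.

```r
# Which collation rules is R using? (the result depends on your system)
Sys.getlocale("LC_COLLATE")

mixed <- c("b", "A", "a", "B")

# Default: follows the locale; typical English locales give "a" "A" "b" "B"
sort(mixed)

# method = "radix" ignores the locale and uses C-locale order,
# so all uppercase letters come before all lowercase letters
sort(mixed, method = "radix")
#> [1] "A" "B" "a" "b"
```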

*b. Sort vectors*

As before, you can apply `sort()` to character vectors. Basically, the elements of a character vector are ordered by their first characters. If two elements share the same first character (a tie), the comparison moves on to the second character, and so on, until the tie is broken or the characters run out.

```
sort(char_vec)
#> [1] "_4" ".a" ".a" "&u" "1a" "1c" "2a" "3" "a" "A" "ab" "aC" "b" "B"
```

We have the following observations.

- Symbols appear first, followed by digits, and letters come last.
- According to the ordering rule for symbols, `_4` comes first, followed by the two copies of `.a`, and then `&u`.
- `1a` and `1c` have the same first character; since `a` comes before `c`, `1a` comes before `1c`.
- `ab` and `aC` have the same first character; since `b` comes before `C` (regardless of case), `ab` comes before `aC`.
- Letter case is only used to break ties between strings that are otherwise identical: `a` comes right before `A`, and both come before `ab` because, ignoring case, they are a prefix of `ab`.
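The character-by-character comparison also handles strings of different lengths: when one string runs out of characters and everything before that point matches, the shorter string comes first. The example below uses only lowercase letters so that its result does not depend on the locale; the vector name `words` is just for illustration.

```r
words <- c("abc", "a", "ab")

# "a" is a prefix of "ab", which is a prefix of "abc"
sort(words)
#> [1] "a"   "ab"  "abc"

# The comparison operators use the same ordering
"ab" < "abc"
#> [1] TRUE
```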

Of course, we can also reverse the order by setting `decreasing = TRUE`.

```
sort(char_vec, decreasing = TRUE)
#> [1] "B" "b" "aC" "ab" "A" "a" "3" "2a" "1c" "1a" "&u" ".a" ".a" "_4"
```

*c. Ranks of vectors*

Similarly, you can look at the rank of each element according to the ordering rules. Here, the element with rank 1 is `_4`. The two `.a` elements would take ranks 2 and 3; just like with numeric vectors, elements with the same value tie and, by default, each receives the average of those ranks, 2.5.

```
rank(char_vec)
#> [1] 9.0 10.0 14.0 13.0 11.0 12.0 6.0 2.5 5.0 7.0 2.5 4.0 8.0 1.0
```

As expected, you can set the `ties.method` argument in `rank()` to use other methods for handling ties.

```
rank(char_vec, ties.method = "min")
rank(char_vec, ties.method = "first")
```

*d. Order of vectors*

Again, you can get the indices of the elements of a character vector with the same `order()` function used for numeric vectors. As before, `order()` breaks ties by order of appearance by default: the two `.a` elements, at positions 8 and 11, are listed in that order.

```
order(char_vec)
#> [1] 14 8 11 12 9 7 10 13 1 2 5 6 4 3
```

The `decreasing` argument still works for `order()`!

```
order(char_vec, decreasing = TRUE)
#> [1] 3 4 6 5 2 1 13 10 7 9 12 8 11 14
```

### 2.4.3 Logical vectors

Since logical vectors have only two possible values, `TRUE` and `FALSE`, it is straightforward to sort them once you know that `FALSE < TRUE`. You can try the following example.

```
logi_vec <- c(TRUE, FALSE, FALSE, TRUE, TRUE)
sort(logi_vec)
rank(logi_vec)
order(logi_vec)
```
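Why is `FALSE < TRUE`? When logical values are compared or converted to numbers, R treats `FALSE` as 0 and `TRUE` as 1, so ordering a logical vector works just like ordering a vector of 0s and 1s.

```r
as.integer(c(FALSE, TRUE))  # FALSE becomes 0, TRUE becomes 1
#> [1] 0 1
FALSE < TRUE
#> [1] TRUE
```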

### 2.4.4 Exercises

Write R code to solve the following problems.

1. Create a numeric vector named `exe` with values `2, 0, -3, 0, 5, 6` and sort `exe` from the largest to the smallest.
2. In `exe`, what are the ranks of `2` and the first `0`?
3. For `exe`, get the indices of the elements in ascending order.
4. Create a character vector with values `"&5", "Nd", "9iC", "3df", "df", "nd", "_5", "9ic"` and sort it in ascending order. Then, explain 1) why `3df` goes before `9ic`; 2) why `&5` goes before `3df`; 3) why `9ic` goes before `9iC`.