2.4 Sort, Rank, & Order
In the past two sections, you learned how to create vectors of different types, including numeric, character, and logical vectors. You also know how to create vectors with patterns. A vector usually contains more than one element, and sometimes you want to arrange its elements in various ways. In this section, we will introduce important functions for sorting, ranking, and ordering the elements of a vector.
2.4.1 Numeric vectors
Let’s start with numeric vectors. Firstly, let’s create a numeric vector which will be used throughout this part.
x <- c(2, 3, 2, 0, 4, 7)
x # check the value of x
#> [1] 2 3 2 0 4 7
a. Sort vectors
The first function we will introduce is sort(). By default, sort() arranges the elements of a vector in ascending order, namely from the smallest to the largest.
sort(x)
#> [1] 0 2 2 3 4 7
If you want to sort the vector in descending order, namely from the largest to the smallest, you can set the argument decreasing = TRUE.
sort(x, decreasing = TRUE)
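As a quick check, the sketch below shows the result of the call above and confirms that, for this numeric vector, sorting with decreasing = TRUE gives the same result as reversing the ascending sort with rev(). The name desc is just an illustrative variable.

```r
x <- c(2, 3, 2, 0, 4, 7)

# sort from the largest to the smallest
desc <- sort(x, decreasing = TRUE)
desc
#> [1] 7 4 3 2 2 0

# reversing the ascending result gives the same vector here
identical(desc, rev(sort(x)))
#> [1] TRUE
```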
b. Ranks of vectors
Next, let’s talk about ranks. The rank() function gives the rank of each element of the vector, namely its position when the vector is sorted in ascending order.
rank(x)
#> [1] 2.5 4.0 2.5 1.0 5.0 6.0
If you check the values of x, you can see that the smallest value of x is 0, which is the fourth element. Thus, the fourth element has rank 1. The second smallest value of x is 2, which is shared by the first and third elements, resulting in a tie (elements with the same value form a tie). Without the tie, these two elements would have ranks 2 and 3. By default, rank() handles a tie by giving all the tied elements (the first and third elements in this example) the same rank, namely the average of their ranks: (2 + 3) / 2 = 2.5. In addition to this default behavior, rank() provides other ways of handling ties through the ties.method argument.
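To make the averaging concrete, the following sketch finds the positions that the value 2 occupies in the sorted vector and averages them. The name tied_pos is only illustrative.

```r
x <- c(2, 3, 2, 0, 4, 7)

# positions taken by the value 2 once x is sorted (0 2 2 3 4 7)
tied_pos <- which(sort(x) == 2)
tied_pos
#> [1] 2 3

# the default rank of each tied element is the mean of those positions
mean(tied_pos)
#> [1] 2.5
rank(x)[x == 2]
#> [1] 2.5 2.5
```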
If you set ties.method = "min", all the tied elements get the minimum rank instead of the average rank. In this case, the minimum rank is 2.
rank(x, ties.method = "min")
#> [1] 2 4 2 1 5 6
If you want to break ties by the order in which the elements appear in the vector, you can set ties.method = "first". Then the element that appears earlier gets the smaller rank. In this example, the first element has rank 2 and the third element has rank 3, since the first element appears earlier than the third. There are other options for handling ties, which you can look up in the documentation of rank() (run ?rank) if interested.
rank(x, ties.method = "first")
#> [1] 2 4 3 1 5 6
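Besides "average", "min", and "first", the documentation of rank() lists "last", "max", and "random" as values for ties.method. The sketch below puts the deterministic options side by side for x, one column per method; "random" is left out because it breaks ties randomly and can give different results on each run. The names tie_methods and rank_table are illustrative.

```r
x <- c(2, 3, 2, 0, 4, 7)

tie_methods <- c("average", "first", "last", "min", "max")

# one column of ranks per tie-handling method, named after the method
rank_table <- sapply(tie_methods, function(m) rank(x, ties.method = m))
rank_table
```

Only the rows for the first and third elements (the two tied 2s) differ across the columns; every other element has the same rank under all methods.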
Unlike sort(), rank() cannot directly give positions in descending order: it has no decreasing argument, so you can’t add decreasing = TRUE to rank().
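If you do need ranks in descending order for a numeric vector, a common workaround (not an argument of rank() itself) is to rank the negated vector, so that the largest value gets rank 1. Subtracting the ascending ranks from length(x) + 1 gives the same answer.

```r
x <- c(2, 3, 2, 0, 4, 7)

# negating x reverses its order, so 7 (the largest value) gets rank 1
rank(-x)
#> [1] 4.5 3.0 4.5 6.0 2.0 1.0

# equivalent: flip the ascending ranks
length(x) + 1 - rank(x)
#> [1] 4.5 3.0 4.5 6.0 2.0 1.0
```

Negation only makes sense for numeric vectors, whereas the second form also works for character vectors with the default tie handling.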
c. Order of vectors
The next function we want to introduce is order(). Note that its name could be a bit misleading, since "ordering" elements can also mean sorting them. However, although it is related to sorting, order() is a very different function from sort().
Let’s recall the values of x and apply order() to x.
x
#> [1] 2 3 2 0 4 7
order(x)
#> [1] 4 1 3 2 5 6
From the result, you can see that order() returns the indices of the elements arranged in ascending order, namely from the smallest to the largest. For example, the first output is 4, indicating that the 4th element of x is the smallest. The second output is 1, showing that the 1st element of x is the second smallest.
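Because order() returns indices, you can use its result to subset x. The sketch below shows that this rebuilds the sorted vector, which is exactly what sort() returns. The name idx is illustrative.

```r
x <- c(2, 3, 2, 0, 4, 7)

idx <- order(x)
idx
#> [1] 4 1 3 2 5 6

# picking the elements of x in the order given by idx sorts x
x[idx]
#> [1] 0 2 2 3 4 7
identical(x[idx], sort(x))
#> [1] TRUE
```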
Unlike rank(), order() breaks ties by the order of appearance by default: the tied first and third elements of x show up in the output as 1 before 3.
If you want the indices corresponding to the descending order, you can set decreasing = TRUE, just as we did with sort().
order(x, decreasing = TRUE)
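A common use of order() is to rearrange one vector according to the values of another. In the minimal sketch below, scores holds the same values as x and students is a hypothetical vector of names, one per score. Because order() is stable, the two tied scores (the 1st and 3rd) keep their original relative order.

```r
scores <- c(2, 3, 2, 0, 4, 7)
students <- c("Ann", "Ben", "Cal", "Dee", "Eve", "Fay")  # hypothetical names

# indices of the scores from the largest to the smallest
idx <- order(scores, decreasing = TRUE)
idx
#> [1] 6 5 2 1 3 4

# list the students from the highest score to the lowest
students[idx]
#> [1] "Fay" "Eve" "Ben" "Ann" "Cal" "Dee"
```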
So far, we have covered the sort(), rank(), and order() functions for numeric vectors. Here is a brief summary.
- The sort() function sorts the elements of a vector.
- The rank() function gives the rank of each element of the vector.
- The order() function returns the indices of the elements in sorted order.
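The three functions are closely linked. order() tells you which element belongs in each sorted position, while rank() tells you which sorted position each element moves to. As the sketch below shows for x, applying order() to its own output recovers the ranks computed with ties.method = "first".

```r
x <- c(2, 3, 2, 0, 4, 7)

# which element goes in each sorted position?
order(x)
#> [1] 4 1 3 2 5 6

# which sorted position does each element go to?
rank(x, ties.method = "first")
#> [1] 2 4 3 1 5 6

# ordering the indices turns them back into ranks
order(order(x))
#> [1] 2 4 3 1 5 6
```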
2.4.2 Character vectors
Now, let’s move on to character vectors. For character vectors, R uses lexicographic ordering, sometimes called dictionary order since it is the order used in a dictionary. The exact ordering depends on the locale (language) settings of your computer, so your results may differ slightly from those shown here. As with numeric vectors, let’s first prepare a character vector. Note that the strings in a character vector can contain letters, digits, or symbols.
char_vec <- c("a", "A", "B", "b", "ab", "aC", "1c", ".a", "1a", "2a", ".a", "&u", "3", "_4")
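Since character ordering follows the locale, it can help to check which collation setting your R session uses. The sketch below prints it with Sys.getlocale() (the output varies by machine, so none is shown) and contrasts it with method = "radix", which, according to the documentation of sort(), does not respect the locale and compares strings in C-locale order, where all uppercase letters come before all lowercase ones.

```r
# the collation locale used when comparing strings (varies by machine)
Sys.getlocale("LC_COLLATE")

# method = "radix" sorts strings in C-locale order, regardless of your settings
sort(c("a", "A", "b", "B"), method = "radix")
#> [1] "A" "B" "a" "b"
```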
a. Ordering rules
First, let’s discuss the ordering of single characters, including symbols, digits, and letters. The important ordering rules are as follows.
- symbols < digits < letters: symbols appear first, followed by digits, and letters come last.
- symbols are ordered in the following way.
syms <- c(" ", ",", ";", "_", "(", ")", "!", "[", "]", "{", "}", "-", "*", "/", "#", "$", "%", "^", "&", "`", "@", "+", "=", "|", "?", "<", ">", ".")
sort(syms)
#> [1] " " "_" "-" "," ";" "!" "?" "." "(" ")" "[" "]" "{" "}" "@" "*" "/" "&" "#"
#> [20] "%" "`" "^" "+" "<" "=" ">" "|" "$"
- digits are in ascending order: smaller digits appear earlier than bigger ones.
nums <- as.character(0:9)
sort(nums)
#> [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"
- letters are ordered alphabetically; for the same letter, the lowercase one comes first.
all_letters <- c(letters, LETTERS)
sort(all_letters)
#> [1] "a" "A" "b" "B" "c" "C" "d" "D" "e" "E" "f" "F" "g" "G" "h" "H" "i" "I" "j"
#> [20] "J" "k" "K" "l" "L" "m" "M" "n" "N" "o" "O" "p" "P" "q" "Q" "r" "R" "s" "S"
#> [39] "t" "T" "u" "U" "v" "V" "w" "W" "x" "X" "y" "Y" "z" "Z"
Here, letters is a built-in character vector in R containing all 26 lowercase letters of the alphabet, and LETTERS is another built-in character vector containing all 26 uppercase letters.
b. Sort vectors
As before, you can apply sort() to character vectors. Basically, strings are compared by their first characters; if the first characters are the same (a tie), the second characters are compared, and so on, until the tie is broken or one string runs out of characters, in which case the shorter string comes first.
sort(char_vec)
#> [1] "_4" ".a" ".a" "&u" "1a" "1c" "2a" "3" "a" "A" "ab" "aC" "b" "B"
We have the following observations.
- Symbols appear first, followed by digits, and letters come last.
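- According to the ordering rule of symbols, _4 comes first, the two copies of .a come next, and &u follows them.
- 1a and 1c have the same first character; since a comes before c, 1a comes before 1c.
- ab and aC have the same first character; since b comes before c (regardless of case), ab comes before aC.
- In the locale used here, case only matters when two strings are identical apart from case. This is why A comes before ab, even though a lowercase a normally comes before an uppercase A: ignoring case, A reads as a, which is shorter than ab.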
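The same first-character-then-next rule explains a common surprise: numbers stored as strings are compared digit by digit, not by their numeric value. Here is a small sketch with an illustrative vector num_strings.

```r
num_strings <- c("10", "9", "100", "2")

# "100" comes before "2" because "1" comes before "2"
sort(num_strings)
#> [1] "10"  "100" "2"   "9"

# convert to numbers first to sort by value
sort(as.numeric(num_strings))
#> [1]   2   9  10 100
```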
Of course, we can also reverse the order by setting decreasing = TRUE.
sort(char_vec, decreasing = TRUE)
c. Ranks of vectors
Similarly, you can look at the rank of each element according to the ordering rules. Here, the element with rank 1 is _4. The value .a appears twice and takes the 2nd and 3rd positions, so, just like tied elements in numeric vectors, both copies get the same rank by default: the average of the corresponding ranks, 2.5.
rank(char_vec)
#> [1] 9.0 10.0 14.0 13.0 11.0 12.0 6.0 2.5 5.0 7.0 2.5 4.0 8.0 1.0
As expected, you can set the ties.method argument in rank() to use other methods for handling ties.
rank(char_vec, ties.method = "min")
rank(char_vec, ties.method = "first")
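To see what these options change, focus on the two copies of .a, which are the 8th and 11th elements of char_vec. Since the full rank vectors depend on your locale, the sketch below compares the two tied positions directly instead of printing everything. The names r_min and r_first are illustrative.

```r
char_vec <- c("a", "A", "B", "b", "ab", "aC", "1c", ".a", "1a", "2a", ".a", "&u", "3", "_4")

# "min": both copies of ".a" share the same (smaller) rank
r_min <- rank(char_vec, ties.method = "min")
r_min[8] == r_min[11]
#> [1] TRUE

# "first": the earlier copy (8th element) gets the smaller of two consecutive ranks
r_first <- rank(char_vec, ties.method = "first")
r_first[11] - r_first[8]
#> [1] 1
```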
d. Order of vectors
Again, you can get the indices of the elements of a character vector in sorted order with the same order() function used for numeric vectors. Also, order() breaks ties by the order of appearance by default.
order(char_vec)
#> [1] 14 8 11 12 9 7 10 13 1 2 5 6 4 3
The decreasing argument still works for order()!
order(char_vec, decreasing = TRUE)
#> [1] 3 4 6 5 2 1 13 10 7 9 12 8 11 14
2.4.3 Logical vectors
Since a logical vector has only two possible values, TRUE and FALSE, it is straightforward to sort one once you know that FALSE < TRUE. You can try the following example.
logi_vec <- c(TRUE, FALSE, FALSE, TRUE, TRUE)
sort(logi_vec)
rank(logi_vec)
order(logi_vec)
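One way to see why FALSE < TRUE is that R treats FALSE as 0 and TRUE as 1 when it converts logical values to numbers. The sketch below checks that ordering logi_vec gives the same indices as ordering its 0/1 version.

```r
logi_vec <- c(TRUE, FALSE, FALSE, TRUE, TRUE)

# FALSE becomes 0 and TRUE becomes 1
as.integer(logi_vec)
#> [1] 1 0 0 1 1

# the logical vector is ordered exactly like its 0/1 version
identical(order(logi_vec), order(as.integer(logi_vec)))
#> [1] TRUE
```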
2.4.4 Exercises
Write R code to solve the following problems.
- Create a numeric vector named exe with values 2, 0, -3, 0, 5, 6, and sort exe from the largest to the smallest.
- In exe, what are the ranks of 2 and the first 0?
- For exe, get the indices of the elements in ascending order.
- Create a character vector with values "&5", "Nd", "9iC", "3df", "df", "nd", "_5", "9ic" and sort it in ascending order. Then, explain 1) why 3df goes before 9ic; 2) why &5 goes before 3df; 3) why 9ic goes before 9iC.