7.3 Reorder Observations

Now, let’s look at the third task: find the 10 countries with the highest life expectancy in the year 2008. To order observations, you can use the function arrange() in the dplyr package.

First, let’s create a new dataset called gm_2008 that only contains the observations in 2008 and the variables country, life_expectancy, and GDP_per_capita.

library(r02pro)
library(dplyr)
library(tibble)
gm_2008 <- gm %>%
    filter(year == 2008) %>%
    select(country, life_expectancy, GDP_per_capita)

To arrange the observations in ascending order of life expectancy (life_expectancy), pass life_expectancy as an argument to the arrange() function. To arrange them in descending order, wrap the variable in desc().

gm_2008 %>%
    arrange(life_expectancy)  #arrange in the ascending order of life_expectancy
#> # A tibble: 236 × 3
#>    country                  life_expectancy GDP_per_capita
#>    <chr>                              <dbl>          <dbl>
#>  1 Eswatini                            46.4          3.21 
#>  2 Central African Republic            47.4          0.514
#>  3 Lesotho                             47.4          0.948
#>  4 Zimbabwe                            50.2          0.941
#>  5 Mozambique                          53.5          0.463
#>  6 Somalia                             54.6         NA    
#>  7 Malawi                              55            0.346
#>  8 Sierra Leone                        55.3          0.52 
#>  9 Guinea-Bissau                       55.5          0.574
#> 10 South Africa                        55.7          5.48 
#> # ℹ 226 more rows
gm_2008 %>%
    arrange(desc(life_expectancy))  #arrange in the descending order of life_expectancy
#> # A tibble: 236 × 3
#>    country          life_expectancy GDP_per_capita
#>    <chr>                      <dbl>          <dbl>
#>  1 Japan                       83.3           31.7
#>  2 Hong Kong, China            82.7           35.9
#>  3 Switzerland                 82.5           80.3
#>  4 Singapore                   82.5           43.3
#>  5 Iceland                     82.4           49.5
#>  6 Australia                   81.9           53.3
#>  7 Andorra                     81.8           35.4
#>  8 Spain                       81.8           25.8
#>  9 Italy                       81.8           31.6
#> 10 San Marino                  81.8           56.7
#> # ℹ 226 more rows
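
As a side note, desc() is not limited to numeric variables. The following minimal sketch uses a small made-up tibble (toy is a hypothetical name, and its values are for illustration only, not taken from gm) to show desc() on a character variable. For numeric variables, a unary minus gives the same ordering as desc().

```r
library(dplyr)
library(tibble)

# Hypothetical toy data for illustration only (not taken from gm)
toy <- tibble(
    country = c("Italy", "Japan", "Spain"),
    life_expectancy = c(81.8, 83.3, 81.8)
)

# desc() works on a character variable: reverse alphabetical order
by_name_desc <- toy %>%
    arrange(desc(country))

# For a numeric variable, a unary minus orders the rows like desc()
by_le_minus <- toy %>%
    arrange(-life_expectancy)

by_name_desc
by_le_minus
```

The unary minus only works for numeric variables, so desc() is the more general choice.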

You may observe from the results that there are several countries with the same life_expectancy value, leading to a tie. To break the tie, you can supply additional variables in the arrange() function, which will arrange the observations within the tie according to the additional variables in the order they are supplied.

gm_2008 %>%
    arrange(life_expectancy, GDP_per_capita)
#> # A tibble: 236 × 3
#>    country                  life_expectancy GDP_per_capita
#>    <chr>                              <dbl>          <dbl>
#>  1 Eswatini                            46.4          3.21 
#>  2 Central African Republic            47.4          0.514
#>  3 Lesotho                             47.4          0.948
#>  4 Zimbabwe                            50.2          0.941
#>  5 Mozambique                          53.5          0.463
#>  6 Somalia                             54.6         NA    
#>  7 Malawi                              55            0.346
#>  8 Sierra Leone                        55.3          0.52 
#>  9 Guinea-Bissau                       55.5          0.574
#> 10 Zambia                              55.7          1.13 
#> # ℹ 226 more rows

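Here, the observations are arranged in ascending order of life_expectancy, and ties are broken in ascending order of GDP_per_capita. Note that observations with an NA value in a sorting variable are always placed last with respect to that variable. When the variable is only a tie-breaker, as GDP_per_capita is here, an NA observation is placed last within its tie group rather than at the end of the whole output, which is why Somalia still appears in row 6.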
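
To see the NA behavior in isolation, here is a minimal sketch on a small made-up tibble (toy, toy_asc, and toy_desc are hypothetical names, and the values are for illustration only). It sorts on a variable that contains a missing value, once in ascending and once in descending order.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data for illustration only: country B has a missing value
toy <- tibble(
    country = c("A", "B", "C"),
    GDP_per_capita = c(2.5, NA, 0.8)
)

# Ascending order: the NA row comes last
toy_asc <- toy %>%
    arrange(GDP_per_capita)

# Descending order: the NA row still comes last
toy_desc <- toy %>%
    arrange(desc(GDP_per_capita))

toy_asc$country
toy_desc$country
```

In both results, country B is the last row, so wrapping the variable in desc() does not move NA values to the top.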

If you want to break the tie in the descending order of GDP_per_capita, you can use desc() around the variable.

gm_2008 %>%
    arrange(life_expectancy, desc(GDP_per_capita))
#> # A tibble: 236 × 3
#>    country                  life_expectancy GDP_per_capita
#>    <chr>                              <dbl>          <dbl>
#>  1 Eswatini                            46.4          3.21 
#>  2 Lesotho                             47.4          0.948
#>  3 Central African Republic            47.4          0.514
#>  4 Zimbabwe                            50.2          0.941
#>  5 Mozambique                          53.5          0.463
#>  6 Somalia                             54.6         NA    
#>  7 Malawi                              55            0.346
#>  8 Sierra Leone                        55.3          0.52 
#>  9 Guinea-Bissau                       55.5          0.574
#> 10 South Africa                        55.7          5.48 
#> # ℹ 226 more rows

Now, we are ready to present the 10 countries with the highest life expectancy in 2008.

gm_2008 %>%
    arrange(desc(life_expectancy)) %>%
    head(10)
#> # A tibble: 10 × 3
#>    country          life_expectancy GDP_per_capita
#>    <chr>                      <dbl>          <dbl>
#>  1 Japan                       83.3           31.7
#>  2 Hong Kong, China            82.7           35.9
#>  3 Switzerland                 82.5           80.3
#>  4 Singapore                   82.5           43.3
#>  5 Iceland                     82.4           49.5
#>  6 Australia                   81.9           53.3
#>  7 Andorra                     81.8           35.4
#>  8 Spain                       81.8           25.8
#>  9 Italy                       81.8           31.6
#> 10 San Marino                  81.8           56.7

Here, head(10) returns the first 10 observations of the arranged dataset.
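
Because head() always cuts the result at a fixed number of rows, a tie at the cutoff can be split. As an alternative (not the approach used above), dplyr also provides slice_max(), which keeps tied observations by default. The sketch below compares the two on a small made-up tibble. The names toy, top3_head, and top3_ties are hypothetical, and the values are for illustration only.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data for illustration only: C and D tie at 81.8
toy <- tibble(
    country = c("A", "B", "C", "D", "E"),
    life_expectancy = c(83.3, 82.5, 81.8, 81.8, 70.1)
)

# arrange() + head() returns exactly 3 rows, so one tied country is cut off
top3_head <- toy %>%
    arrange(desc(life_expectancy)) %>%
    head(3)

# slice_max() keeps ties by default (with_ties = TRUE), so both C and D stay
top3_ties <- toy %>%
    slice_max(life_expectancy, n = 3)

nrow(top3_head)
nrow(top3_ties)
```

Which one to use depends on whether you need exactly n rows or every observation that qualifies for the top n.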

7.3.1 Exercises

Using the ahp dataset,

Find all houses built in 2008 with house_style as "2Story", then arrange the observations in the ascending order of remodel year, and break the ties in the descending order of sale_price.
Find all houses sold in 2009 with house_style as "1Story", and arrange the observations in the descending order of sale_price.