6.2 Reorder Observations

Now, let’s look at the second task: find the 10 houses with the highest sale prices. To order observations, you can use the function arrange() in the dplyr package.
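
As a preview of where this section is heading, here is a minimal sketch of a top-n query on a made-up tibble. The houses tibble, its id and sale_price columns, and all of the values are hypothetical and are not taken from ahp. desc() is explained later in this section, and head() is the base R function for keeping the first rows.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: ids and prices are made up for illustration only
houses <- tibble(
  id         = 1:6,
  sale_price = c(180, 325, 150, 410, 275, 230)
)

# Sort from the highest price to the lowest, then keep the first 3 rows
top3 <- head(arrange(houses, desc(sale_price)), 3)
top3
```

The same pattern with 10 in place of 3 should carry over to the full data, assuming you substitute the actual name of the sale price column in ahp.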

To make the ordering easier to check, we will first focus on a smaller data set that contains only the houses that were sold in January 2009 with an overall condition equal to 5.

library(r02pro)
library(dplyr)
library(tibble)
jan09 <- filter(ahp, yr_sold == 2009, mo_sold == 1, oa_cond == 5)
jan09

To arrange the observations in ascending order of the year the house was built (yr_built), pass yr_built as the second argument of the arrange() function. To arrange them in descending order instead, wrap the variable in desc().

arrange(jan09, yr_built)         #arrange in the ascending order of yr_built
#> # A tibble: 8 × 56
#>   dt_sold    yr_sold mo_sold yr_built yr_remodel bldg_class bldg_type
#>   <date>       <dbl>   <dbl>    <dbl>      <dbl>      <dbl> <chr>
#> 1 2009-01-09    2009       1     1920       1950         30 1Fam
#> 2 2009-01-07    2009       1     1979       1998         20 1Fam
#> 3 2009-01-17    2009       1     2004       2004         20 1Fam
#> 4 2009-01-28    2009       1     2004       2004         60 1Fam
#> 5 2009-01-07    2009       1     2004       2005         20 1Fam
#> 6 2009-01-18    2009       1     2008       2008         20 1Fam
#> 7 2009-01-07    2009       1     2008       2009         60 1Fam
#> 8 2009-01-16    2009       1       NA       2007         20 1Fam
#> # … with 49 more variables: house_style <chr>, zoning <chr>, neighborhd <chr>,
#> #   oa_cond <dbl>, oa_qual <dbl>, func <chr>, liv_area <dbl>, 1fl_area <dbl>,
#> #   2fl_area <dbl>, tot_rms <dbl>, bedroom <dbl>, bathroom <dbl>, kit <dbl>,
#> #   kit_qual <chr>, central_air <chr>, elect <chr>, bsmt_area <dbl>,
#> #   bsmt_cond <chr>, bsmt_exp <chr>, bsmt_fin_qual <chr>, bsmt_ht <chr>,
#> #   ext_cond <chr>, ext_cover <chr>, ext_qual <chr>, fdn <chr>, fence <chr>,
#> #   fp <dbl>, fp_qual <chr>, gar_area <dbl>, gar_car <dbl>, gar_cond <chr>, …
arrange(jan09, desc(yr_built))   #arrange in the descending order of yr_built
#> # A tibble: 8 × 56
#>   dt_sold    yr_sold mo_sold yr_built yr_remodel bldg_class bldg_type
#>   <date>       <dbl>   <dbl>    <dbl>      <dbl>      <dbl> <chr>
#> 1 2009-01-18    2009       1     2008       2008         20 1Fam
#> 2 2009-01-07    2009       1     2008       2009         60 1Fam
#> 3 2009-01-17    2009       1     2004       2004         20 1Fam
#> 4 2009-01-28    2009       1     2004       2004         60 1Fam
#> 5 2009-01-07    2009       1     2004       2005         20 1Fam
#> 6 2009-01-07    2009       1     1979       1998         20 1Fam
#> 7 2009-01-09    2009       1     1920       1950         30 1Fam
#> 8 2009-01-16    2009       1       NA       2007         20 1Fam
#> # … with 49 more variables: house_style <chr>, zoning <chr>, neighborhd <chr>,
#> #   oa_cond <dbl>, oa_qual <dbl>, func <chr>, liv_area <dbl>, 1fl_area <dbl>,
#> #   2fl_area <dbl>, tot_rms <dbl>, bedroom <dbl>, bathroom <dbl>, kit <dbl>,
#> #   kit_qual <chr>, central_air <chr>, elect <chr>, bsmt_area <dbl>,
#> #   bsmt_cond <chr>, bsmt_exp <chr>, bsmt_fin_qual <chr>, bsmt_ht <chr>,
#> #   ext_cond <chr>, ext_cover <chr>, ext_qual <chr>, fdn <chr>, fence <chr>,
#> #   fp <dbl>, fp_qual <chr>, gar_area <dbl>, gar_car <dbl>, gar_cond <chr>, …

You may observe from the results that several houses share the same yr_built value, leading to ties. To break the ties, you can supply additional variables to the arrange() function. Observations that are tied on the first variable are then ordered by the second variable, any remaining ties by the third, and so on.

arrange(jan09, desc(yr_built), bldg_class)
#> # A tibble: 8 × 56
#>   dt_sold    yr_sold mo_sold yr_built yr_remodel bldg_class bldg_type
#>   <date>       <dbl>   <dbl>    <dbl>      <dbl>      <dbl> <chr>
#> 1 2009-01-18    2009       1     2008       2008         20 1Fam
#> 2 2009-01-07    2009       1     2008       2009         60 1Fam
#> 3 2009-01-17    2009       1     2004       2004         20 1Fam
#> 4 2009-01-07    2009       1     2004       2005         20 1Fam
#> 5 2009-01-28    2009       1     2004       2004         60 1Fam
#> 6 2009-01-07    2009       1     1979       1998         20 1Fam
#> 7 2009-01-09    2009       1     1920       1950         30 1Fam
#> 8 2009-01-16    2009       1       NA       2007         20 1Fam
#> # … with 49 more variables: house_style <chr>, zoning <chr>, neighborhd <chr>,
#> #   oa_cond <dbl>, oa_qual <dbl>, func <chr>, liv_area <dbl>, 1fl_area <dbl>,
#> #   2fl_area <dbl>, tot_rms <dbl>, bedroom <dbl>, bathroom <dbl>, kit <dbl>,
#> #   kit_qual <chr>, central_air <chr>, elect <chr>, bsmt_area <dbl>,
#> #   bsmt_cond <chr>, bsmt_exp <chr>, bsmt_fin_qual <chr>, bsmt_ht <chr>,
#> #   ext_cond <chr>, ext_cover <chr>, ext_qual <chr>, fdn <chr>, fence <chr>,
#> #   fp <dbl>, fp_qual <chr>, gar_area <dbl>, gar_car <dbl>, gar_cond <chr>, …

Here, the observations are arranged in descending order of yr_built, and the ties are broken in ascending order of bldg_class. You can supply as many variables as needed to the arrange() function. It is also important to note that observations with an NA value in the specified variable are always placed at the end of the output, whether the order is ascending or descending.
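
The NA behavior described above can be checked on a tiny tibble. The sketch below copies four dt_sold and yr_built pairs from the jan09 output, including the row with a missing yr_built. The name toy is hypothetical, and the last call is one possible idiom for listing missing values first; it is not something the ahp examples above rely on.

```r
library(dplyr)
library(tibble)

# Four rows copied from the jan09 output above, one with a missing yr_built
toy <- tibble(
  dt_sold  = as.Date(c("2009-01-09", "2009-01-16", "2009-01-18", "2009-01-07")),
  yr_built = c(1920, NA, 2008, 1979)
)

# The NA row ends up last in both the ascending and the descending order
asc <- arrange(toy, yr_built)
dsc <- arrange(toy, desc(yr_built))

# Possible idiom for putting NA first: sort on "is not missing" first
# (FALSE sorts before TRUE), then on the variable itself
na_first <- arrange(toy, !is.na(yr_built), yr_built)
```

Printing asc, dsc, and na_first side by side makes it easy to see where the row sold on 2009-01-16 lands in each case.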

6.2.1 Exercises

Using the ahp dataset,

1. Find all houses built in 2008 with a house style of 2Story, then arrange the observations in ascending order of remodel year, breaking ties in descending order of sale price.