6.2 Reorder Observations
Now, let’s look at the second task: find the 10 houses with the highest sale prices. To order observations, you can use the function arrange() in the dplyr package.
To make the ordering easier to check, we will first focus on a smaller data set containing the houses that were sold in January 2009 with an overall condition equal to 5.
library(r02pro)
library(dplyr)
library(tibble)
jan09 <- filter(ahp, yr_sold == 2009, mo_sold == 1, oa_cond == 5)
jan09
To arrange the observations in ascending order of the year the house was built (yr_built), you just need to add yr_built as the second argument of the arrange() function. To arrange them in descending order, you just need to wrap the variable in desc().
arrange(jan09, yr_built) #arrange in the ascending order of yr_built
#> # A tibble: 8 × 56
#> dt_sold yr_sold mo_sold yr_built yr_remodel bldg_class bldg_type
#> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 2009-01-09 2009 1 1920 1950 30 1Fam
#> 2 2009-01-07 2009 1 1979 1998 20 1Fam
#> 3 2009-01-17 2009 1 2004 2004 20 1Fam
#> 4 2009-01-28 2009 1 2004 2004 60 1Fam
#> 5 2009-01-07 2009 1 2004 2005 20 1Fam
#> 6 2009-01-18 2009 1 2008 2008 20 1Fam
#> 7 2009-01-07 2009 1 2008 2009 60 1Fam
#> 8 2009-01-16 2009 1 NA 2007 20 1Fam
#> # … with 49 more variables: house_style <chr>, zoning <chr>, neighborhd <chr>,
#> # oa_cond <dbl>, oa_qual <dbl>, func <chr>, liv_area <dbl>, `1fl_area` <dbl>,
#> # `2fl_area` <dbl>, tot_rms <dbl>, bedroom <dbl>, bathroom <dbl>, kit <dbl>,
#> # kit_qual <chr>, central_air <chr>, elect <chr>, bsmt_area <dbl>,
#> # bsmt_cond <chr>, bsmt_exp <chr>, bsmt_fin_qual <chr>, bsmt_ht <chr>,
#> # ext_cond <chr>, ext_cover <chr>, ext_qual <chr>, fdn <chr>, fence <chr>,
#> # fp <dbl>, fp_qual <chr>, gar_area <dbl>, gar_car <dbl>, gar_cond <chr>, …
arrange(jan09, desc(yr_built)) #arrange in the descending order of yr_built
#> # A tibble: 8 × 56
#> dt_sold yr_sold mo_sold yr_built yr_remodel bldg_class bldg_type
#> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 2009-01-18 2009 1 2008 2008 20 1Fam
#> 2 2009-01-07 2009 1 2008 2009 60 1Fam
#> 3 2009-01-17 2009 1 2004 2004 20 1Fam
#> 4 2009-01-28 2009 1 2004 2004 60 1Fam
#> 5 2009-01-07 2009 1 2004 2005 20 1Fam
#> 6 2009-01-07 2009 1 1979 1998 20 1Fam
#> 7 2009-01-09 2009 1 1920 1950 30 1Fam
#> 8 2009-01-16 2009 1 NA 2007 20 1Fam
#> # … with 49 more variables: house_style <chr>, zoning <chr>, neighborhd <chr>,
#> # oa_cond <dbl>, oa_qual <dbl>, func <chr>, liv_area <dbl>, `1fl_area` <dbl>,
#> # `2fl_area` <dbl>, tot_rms <dbl>, bedroom <dbl>, bathroom <dbl>, kit <dbl>,
#> # kit_qual <chr>, central_air <chr>, elect <chr>, bsmt_area <dbl>,
#> # bsmt_cond <chr>, bsmt_exp <chr>, bsmt_fin_qual <chr>, bsmt_ht <chr>,
#> # ext_cond <chr>, ext_cover <chr>, ext_qual <chr>, fdn <chr>, fence <chr>,
#> # fp <dbl>, fp_qual <chr>, gar_area <dbl>, gar_car <dbl>, gar_cond <chr>, …
You may observe from the results that several houses have the same yr_built value, leading to ties. To break the ties, you can supply additional variables to the arrange() function; the tied observations are then ordered by those additional variables in sequence.
arrange(jan09, desc(yr_built), bldg_class)
#> # A tibble: 8 × 56
#> dt_sold yr_sold mo_sold yr_built yr_remodel bldg_class bldg_type
#> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 2009-01-18 2009 1 2008 2008 20 1Fam
#> 2 2009-01-07 2009 1 2008 2009 60 1Fam
#> 3 2009-01-17 2009 1 2004 2004 20 1Fam
#> 4 2009-01-07 2009 1 2004 2005 20 1Fam
#> 5 2009-01-28 2009 1 2004 2004 60 1Fam
#> 6 2009-01-07 2009 1 1979 1998 20 1Fam
#> 7 2009-01-09 2009 1 1920 1950 30 1Fam
#> 8 2009-01-16 2009 1 NA 2007 20 1Fam
#> # … with 49 more variables: house_style <chr>, zoning <chr>, neighborhd <chr>,
#> # oa_cond <dbl>, oa_qual <dbl>, func <chr>, liv_area <dbl>, `1fl_area` <dbl>,
#> # `2fl_area` <dbl>, tot_rms <dbl>, bedroom <dbl>, bathroom <dbl>, kit <dbl>,
#> # kit_qual <chr>, central_air <chr>, elect <chr>, bsmt_area <dbl>,
#> # bsmt_cond <chr>, bsmt_exp <chr>, bsmt_fin_qual <chr>, bsmt_ht <chr>,
#> # ext_cond <chr>, ext_cover <chr>, ext_qual <chr>, fdn <chr>, fence <chr>,
#> # fp <dbl>, fp_qual <chr>, gar_area <dbl>, gar_car <dbl>, gar_cond <chr>, …
Here, the observations are arranged in descending order of yr_built, and the ties are broken in ascending order of bldg_class. You can supply as many variables as needed to the arrange() function. It is also important to note that observations that have an NA value in the specified variable are always placed at the end of the output, whether the order is ascending or descending.
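The behavior described above can be checked on a small example, which also sketches how the original "top houses by price" task might be finished. The tibble houses and its columns id, price, and year are made up for illustration and are not part of ahp. Using head() to keep the first rows and sorting on an is.na() indicator to move missing values to the front are just one possible approach.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data (not from ahp): five houses, one with a missing price
houses <- tibble(
  id    = c("a", "b", "c", "d", "e"),
  price = c(250, NA, 410, 180, 410),
  year  = c(1999, 2005, 2010, 1987, 2001)
)

# Ascending order of price: the NA row is placed last
asc <- arrange(houses, price)

# Descending order of price, ties broken by ascending year:
# the NA row is still placed last
dsc <- arrange(houses, desc(price), year)

# Top 3 houses by price: sort in descending order, then keep the first rows
top3 <- head(dsc, 3)

# To list missing values first, sort on an is.na() indicator before the variable
# (FALSE sorts before TRUE, so rows where price is NA come first)
na_first <- arrange(houses, !is.na(price), price)
```

With the real data, the same pattern would apply with the sale price column of ahp in place of price and 10 in place of 3.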