6.7 Grouped Mutate and Filter via group_by()
In Section 6.6, you have seen the power of group_by()
and summarize()
which can help to create grouped summaries. In turns out the group_by()
can be combined with filter()
and mutate()
to filter observations by group and create new variables by group.
Let’s say we have to focus on the overall conditions where the number of corresponding houses are at least 30. Then, summarize the number of observations and the average sale price for each overall condition.
library(r02pro)
library(dplyr)
%>%
ahp group_by(oa_cond) %>%
filter(n() >= 30) %>%
summarize(count = n(), ave_price = mean(sale_price, na.rm = TRUE))
#> # A tibble: 6 × 3
#> oa_cond count ave_price
#> <dbl> <int> <dbl>
#> 1 3 35 99.8
#> 2 4 70 114.
#> 3 5 1165 207.
#> 4 6 367 149.
#> 5 7 270 155.
#> 6 8 101 156.
The code works in the following steps.
- Group the observations according to
oa_cond
. - Filter the observations that has a
oa_cond
which the number of houses having such value is at least 30. - For the filtered groups, create two new variables represents the number of houses, and the average sale price, respectively.
Next, let’s try to find the top two expensive houses that was remodel each year, show only the remodel year and sale price, and arrange the result in the ascending order of remodel year. To achieve this task, we need to do the following.
- Group the observations by the remodel year.
- Compute the rank of each observation in terms of decreasing sale price within each remodel year.
- Filter the observations the has a rank less than or equal to 2.
- Select the remodel year and sale price.
- Arrange the observations according to the ascending order of remodel year.
%>%
ahp group_by(yr_remodel) %>%
mutate(r = rank(desc(sale_price), ties.method = "first")) %>%
filter(r <= 2 ) %>%
select(yr_remodel, sale_price) %>%
arrange(yr_remodel)
#> # A tibble: 122 × 2
#> # Groups: yr_remodel [61]
#> yr_remodel sale_price
#> <dbl> <dbl>
#> 1 1950 257.
#> 2 1950 256
#> 3 1951 155
#> 4 1951 141
#> 5 1952 146.
#> 6 1952 166
#> 7 1953 217
#> 8 1953 225
#> 9 1954 156.
#> 10 1954 150.
#> # … with 112 more rows
The final example is to group the houses by the combination of kitchen quality (kit_qual
) and central AC (central_air
), find the two houses with the minimum and maximum sale prices for each group, and compute the IQR of the sale prices for each group. Finally, show the kit_qual
, central_air
and the three values and arrange the observations in the descending order of maximum sale price for each group. The following is the sequence of operations necessary.
- Group the houses by
kit_qual
andcentral_air
. - Create three new variables representing the minimum, maximum, and IQR of the sale prices of each group.
- Select variables
kit_qual
andcentral_air
and the three newly created ones. - Arrange the observations in the descending order of maximum sale price for each group.
%>%
ahp group_by(kit_qual, central_air) %>%
summarize(max_price = max(sale_price, na.rm = TRUE), min_price = min(sale_price, na.rm = TRUE), IQR_price = IQR(sale_price, na.rm = TRUE)) %>%
select(kit_qual, central_air, max_price, min_price, IQR_price) %>%
arrange(desc(max_price))
#> `summarise()` has grouped output by 'kit_qual'. You can override using the
#> `.groups` argument.