4.4 Smoothline Fits
Now you know how to create scatterplots and customize them by specifying different aesthetics. Another very useful type of plot, which captures the trend in the relationship between two variables, is the smoothline fit.
4.4.1 Creating Smoothline Fits using geom_smooth()
To create a smoothline fit, you can use the geom_smooth() function in the ggplot2 package. Let’s say you want to find the trend between the sale price and the living area of a house.
library(ggplot2)
library(r02pro)
ggplot(data = sahp) +
geom_smooth(mapping = aes(x = liv_area, y = sale_price))
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
It may be helpful to review the code for generating a scatterplot of liv_area and sale_price.
ggplot(data = sahp) +
geom_point(mapping = aes(x = liv_area, y = sale_price))
We can see that the only difference is the geom used. geom_smooth() fits a smooth line to the points of the given variable pair. By default, for a dataset of this size (fewer than 1,000 observations), it uses the loess method (locally estimated scatterplot smoothing), which is a popular nonparametric regression technique. In addition to the smoothline, it also draws a shaded area representing the confidence interval (95% by default) around the fitted smoothline. Note that se is not an aesthetic but an argument of geom_smooth(): to hide the shaded area, add se = FALSE outside of aes().
ggplot(data = sahp) +
geom_smooth(mapping = aes(x = liv_area, y = sale_price),
se = FALSE)
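To make the default behavior more concrete, the following is a rough sketch of a comparable fit done by hand with the base R function loess(). It assumes that the loess() defaults (span = 0.75, local quadratic fit) are close to what geom_smooth() uses here. The names d, grid, and pred are illustrative, and the result approximates the plotted curve and band rather than reproducing the ggplot2 internals exactly.

```r
library(r02pro)

# Keep only rows where both variables are observed.
d <- sahp[complete.cases(sahp[, c("liv_area", "sale_price")]), ]

# Local regression of sale price on living area (default span = 0.75).
fit <- loess(sale_price ~ liv_area, data = d)

# Evaluate the fitted curve on an evenly spaced grid of living areas.
grid <- data.frame(
  liv_area = seq(min(d$liv_area), max(d$liv_area), length.out = 80)
)
pred <- predict(fit, newdata = grid, se = TRUE)

# Pointwise 95% confidence band, analogous to the shaded area in the plot.
grid$fit   <- pred$fit
grid$lower <- pred$fit - qt(0.975, pred$df) * pred$se.fit
grid$upper <- pred$fit + qt(0.975, pred$df) * pred$se.fit

head(grid)
```

Setting se = FALSE in geom_smooth() simply skips drawing the band between lower and upper; the fitted curve itself is unchanged.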
In addition to the default loess method, geom_smooth() also provides other smoothing methods. For example, we can set method = "lm" to fit a straight line by linear regression.
ggplot(data = sahp) +
geom_smooth(mapping = aes(x = liv_area, y = sale_price),
method = "lm")
#> `geom_smooth()` using formula 'y ~ x'
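For reference, the straight line drawn with method = "lm" should correspond to an ordinary least-squares fit of sale_price on liv_area. A minimal sketch with base R's lm(), where the name fit_lm is illustrative:

```r
library(r02pro)

# Ordinary least-squares fit of sale price on living area
# (lm() drops rows with missing values by default).
fit_lm <- lm(sale_price ~ liv_area, data = sahp)

# Intercept and slope of the fitted line
coef(fit_lm)
```

The slope is the estimated change in sale price per additional unit of living area, which is what the steepness of the plotted line reflects.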
4.4.2 Aesthetics in Smoothline Fits
As in scatterplots, you can set global aesthetics as well as map variables to aesthetics in smoothline fits. Let’s begin with mapping variables to aesthetics. We first add a new logical variable good_qual to sahp, which is TRUE when oa_qual > 5.
sahp$good_qual <- sahp$oa_qual > 5
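A quick way to check the new variable is to tabulate it. This small sketch recreates good_qual so that it runs on its own; useNA = "ifany" also reports missing values, in case oa_qual contains any.

```r
library(r02pro)

# TRUE for houses whose overall quality is above 5
sahp$good_qual <- sahp$oa_qual > 5

# Number of houses in each group (plus NA, if present)
table(sahp$good_qual, useNA = "ifany")
```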
a. Group
When we map a variable to the group aesthetic, geom_smooth() will first divide all the data points into different groups according to the variable value, and then fit a separate smoothline for each group.
ggplot(data = sahp) +
geom_smooth(mapping = aes(x = liv_area,
y = sale_price,
group = good_qual))
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
You can see that two smoothlines are generated. However, it is not clear from the plot which group each smoothline corresponds to. To tell the two smoothlines apart, you can map the variable to other aesthetics.
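If you want to confirm how the data were split, one option is to inspect the values ggplot2 computed for the layer. The sketch below uses ggplot_build(); the names p and smooth_data are illustrative.

```r
library(ggplot2)
library(r02pro)

sahp$good_qual <- sahp$oa_qual > 5

p <- ggplot(data = sahp) +
  geom_smooth(mapping = aes(x = liv_area,
                            y = sale_price,
                            group = good_qual))

# Computed data for the first (and only) layer; each fitted
# smoothline is stored as a set of rows sharing one group id.
smooth_data <- ggplot_build(p)$data[[1]]
table(smooth_data$group)
```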
b. Color
As in geom_point(), we can map the variable to the color aesthetic.
ggplot(data = sahp) +
geom_smooth(mapping = aes(x = liv_area,
y = sale_price,
color = good_qual))
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
This plot is more informative than the one using the group aesthetic, since the two smoothlines now have different colors according to the value of good_qual.
c. Line Type
Another useful aesthetic, which was not applicable in geom_point(), is linetype, which controls the line type of each smoothline.
ggplot(data = sahp) +
geom_smooth(mapping = aes(x = liv_area,
y = sale_price,
linetype = good_qual))
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
The plot shows a dashed line for the smoothline corresponding to good_qual == TRUE, and a solid line for the smoothline corresponding to good_qual == FALSE.
d. Size
You can also map good_qual to the size aesthetic, which controls the width of each smoothline fit.
ggplot(data = sahp) +
geom_smooth(mapping = aes(x = liv_area,
y = sale_price,
size = good_qual))
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
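In ggplot2 3.4.0 and later, the width of lines is controlled by the linewidth aesthetic, and using size for lines produces a deprecation warning. Assuming such a version is installed, a sketch of the same plot would be the following (the name p_width is illustrative):

```r
library(ggplot2)
library(r02pro)

sahp$good_qual <- sahp$oa_qual > 5

# linewidth replaces size for line widths (assumes ggplot2 >= 3.4.0)
p_width <- ggplot(data = sahp) +
  geom_smooth(mapping = aes(x = liv_area,
                            y = sale_price,
                            linewidth = good_qual))
p_width
```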
It is worth mentioning that shape is not a valid aesthetic for geom_smooth(), as it doesn’t make sense to talk about the shape of a line.
ggplot(data = sahp) +
geom_smooth(mapping = aes(x = liv_area,
y = sale_price,
shape = good_qual))
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
When you try to map a variable to the shape aesthetic, geom_smooth() will show the warning message “Warning: Ignoring unknown aesthetics: shape” and use the group aesthetic instead.
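Naturally, you can also set global aesthetics, and it is straightforward to combine multiple aesthetics in the same plot. In the following example, color is mapped to good_qual, while linetype = 2 (dashed) is set globally for both smoothlines.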
ggplot(data = sahp) +
geom_smooth(mapping = aes(x = liv_area,
y = sale_price,
color = good_qual),
linetype = 2)
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
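Several mapped aesthetics can also point to the same variable. As a sketch, the following maps good_qual to both color and linetype and hides the confidence bands with se = FALSE, so that the two smoothlines stay distinguishable even in a grayscale printout. The name p_combined is illustrative.

```r
library(ggplot2)
library(r02pro)

sahp$good_qual <- sahp$oa_qual > 5

# Two mapped aesthetics (inside aes) plus one argument set outside aes
p_combined <- ggplot(data = sahp) +
  geom_smooth(mapping = aes(x = liv_area,
                            y = sale_price,
                            color = good_qual,
                            linetype = good_qual),
              se = FALSE)
p_combined
```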
4.4.3 Exercises
Using the sahp dataset with the ggplot2 package, answer the following questions.
- Create a smoothline fit to visualize the relationship between lot_area (on the x-axis) and sale_price (on the y-axis).
- Create several smoothlines with different colors corresponding to the value of kit_qual to visualize the relationship between lot_area (on the x-axis) and sale_price (on the y-axis).
- Create smoothlines without the surrounding confidence intervals and with different linetypes, distinguishing whether the house has more than 2 bedrooms, to visualize the relationship between lot_area (on the x-axis) and sale_price (on the y-axis).