4.4 Smoothline Fits

Now you know how to create scatterplots and customize them by specifying different aesthetics. Another very useful type of plot, which captures the trend in the relationship between a pair of variables, is the smoothline fit.

4.4.1 Creating Smoothline Fits using geom_smooth()

To create a smoothline fit, you can use the geom_smooth() function in the ggplot2 package. Let’s say you want to find the trend between the sale price and the living area of a house.

library(ggplot2)
library(r02pro)
ggplot(data = sahp) + 
  geom_smooth(mapping = aes(x = liv_area, y = sale_price))
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'

It is helpful to compare this with the code for generating a scatterplot of liv_area and sale_price.

ggplot(data = sahp) + 
  geom_point(mapping = aes(x = liv_area, y = sale_price))

We can see that the only difference is the geom used. The mechanism of geom_smooth() is that it fits a smooth curve to the points of the given variable pair. For datasets with fewer than 1,000 observations, it uses the loess method (locally estimated scatterplot smoothing) by default, which is a popular nonparametric regression technique. In addition to the smoothline, it also draws a shaded area representing the confidence interval (95% by default) around the fitted smoothline. To hide this shaded area, you can pass se = FALSE as an argument of geom_smooth(), outside aes().

ggplot(data = sahp) + 
  geom_smooth(mapping = aes(x = liv_area, y = sale_price), 
              se = FALSE)
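
To make the mechanism more concrete, the following is a rough sketch of the kind of computation behind the smoothline and its shaded band, done by hand with loess() and predict() from base R. It is not the exact internal code of ggplot2; the grid size of 80 points and the object names d, grid, and band are assumptions chosen for illustration.

```r
library(r02pro)

# Keep only rows where both variables are observed
d <- sahp[complete.cases(sahp[, c("liv_area", "sale_price")]), ]

# Local regression of sale_price on liv_area (loess() default span of 0.75)
fit <- loess(sale_price ~ liv_area, data = d)

# Evaluate the fit on an evenly spaced grid over the observed range
grid <- data.frame(
  liv_area = seq(min(d$liv_area), max(d$liv_area), length.out = 80)
)
pred <- predict(fit, newdata = grid, se = TRUE)

# Pointwise 95% band: fitted value plus or minus a t quantile times the standard error
crit <- qt(0.975, pred$df)
band <- data.frame(
  liv_area = grid$liv_area,
  fit      = pred$fit,
  lower    = pred$fit - crit * pred$se.fit,
  upper    = pred$fit + crit * pred$se.fit
)
head(band)
```

The fit column plays the role of the smoothline, and lower and upper play the role of the edges of the shaded area.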

In addition to the default loess method, geom_smooth() provides other smoothing methods. For example, we can set method = "lm" to fit a straight line by linear regression.

ggplot(data = sahp) + 
  geom_smooth(mapping = aes(x = liv_area, y = sale_price), 
              method = "lm")
#> `geom_smooth()` using formula 'y ~ x'
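
With the formula y ~ x shown in the message, the straight line corresponds to a simple least-squares regression of sale_price on liv_area. As a small sketch, you can obtain the intercept and slope of such a line directly with lm(); the object name fit is only illustrative.

```r
library(r02pro)

# Least-squares regression of sale price on living area;
# rows with missing values are dropped by lm() by default
fit <- lm(sale_price ~ liv_area, data = sahp)

# Intercept and slope of the fitted line
coef(fit)
```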

4.4.2 Aesthetics in Smoothline Fits

As in scatterplots, you can set global aesthetics as well as map variables to aesthetics in smoothline fits. Let’s begin with mapping variables to aesthetics. We first add a new logical variable good_qual to sahp, which is TRUE when oa_qual > 5.

sahp$good_qual <- sahp$oa_qual > 5 

a. Group

When we map a variable to the group aesthetic, geom_smooth() first divides the data points into groups according to the value of the variable, and then fits a separate smoothline for each group.

ggplot(data = sahp) + 
  geom_smooth(mapping = aes(x = liv_area, 
                            y = sale_price, 
                            group = good_qual))
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'

You can see that two smoothlines are generated. However, it is not clear from the plot which group each smoothline corresponds to. To distinguish the two smoothlines, you can map the variable to other aesthetics.
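
If you want to check how many separate fits were computed, one option is to inspect the values that ggplot2 calculated for the layer, using layer_data(). This is only a sketch; the object names p and fits are illustrative.

```r
library(ggplot2)
library(r02pro)

sahp$good_qual <- sahp$oa_qual > 5

p <- ggplot(data = sahp) +
  geom_smooth(mapping = aes(x = liv_area,
                            y = sale_price,
                            group = good_qual))

# layer_data() returns the computed values for the first layer of the plot
fits <- layer_data(p)

# Number of fitted points per group; each group is one smoothline
table(fits$group)
```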

b. Color

As in geom_point(), we can map the variable to the color aesthetic.

ggplot(data = sahp) + 
  geom_smooth(mapping = aes(x = liv_area, 
                            y = sale_price, 
                            color = good_qual))
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'

This plot is more informative than the one using the group aesthetic: the two smoothlines have different colors according to the value of good_qual, and the legend shows which color corresponds to which value.

c. Line Type

Another useful aesthetic that was not applicable in geom_point() is linetype, which controls the line type of each smoothline.

ggplot(data = sahp) + 
  geom_smooth(mapping = aes(x = liv_area, 
                            y = sale_price, 
                            linetype = good_qual))
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'

The plot shows a dashed line for the smoothline corresponding to good_qual == TRUE, and a solid line for the smoothline corresponding to good_qual == FALSE.

d. Size

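You can also map good_qual to the size aesthetic, which controls the width of each smoothline. Note that in ggplot2 3.4.0 and later, the width of lines is controlled by the linewidth aesthetic; size still works for lines but produces a deprecation warning.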

ggplot(data = sahp) + 
  geom_smooth(mapping = aes(x = liv_area, 
                            y = sale_price, 
                            size = good_qual))
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
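
For readers using a recent version of ggplot2, the same plot can be sketched with the linewidth aesthetic. This assumes ggplot2 3.4.0 or later is installed; ggplot2 may also warn that using line width for a discrete variable is not advised.

```r
library(ggplot2)
library(r02pro)

sahp$good_qual <- sahp$oa_qual > 5

# linewidth is the aesthetic for line width in ggplot2 >= 3.4.0 (assumed here)
p <- ggplot(data = sahp) +
  geom_smooth(mapping = aes(x = liv_area,
                            y = sale_price,
                            linewidth = good_qual))
p
```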

It is worth mentioning that shape is not a valid aesthetic for geom_smooth(), as it doesn’t make sense to talk about the shape of a line.

ggplot(data = sahp) + 
  geom_smooth(mapping = aes(x = liv_area, 
                            y = sale_price, 
                            shape = good_qual))
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'

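When you try to map a variable to the shape aesthetic, geom_smooth() shows the warning message “Warning: Ignoring unknown aesthetics: shape” and ignores the shape mapping. The data points are still split into groups by that variable, so the result looks the same as the plot using the group aesthetic.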
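
The warning is raised as soon as the layer is created, before anything is drawn. As a small sketch, you can capture the message with tryCatch() to see it explicitly; the object name msg is illustrative.

```r
library(ggplot2)

# Creating the layer is enough to trigger the warning; no data is needed yet
msg <- tryCatch(
  {
    geom_smooth(mapping = aes(x = liv_area,
                              y = sale_price,
                              shape = good_qual))
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
msg
```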

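Naturally, you can also set global aesthetics, and it is straightforward to combine multiple aesthetics in the same plot. In the following example, color is mapped to good_qual, while linetype = 2 (dashed) is applied to both smoothlines.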

ggplot(data = sahp) + 
  geom_smooth(mapping = aes(x = liv_area, 
                            y = sale_price, 
                            color = good_qual), 
              linetype = 2)
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
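
As one more sketch of combining aesthetics, the same variable can be mapped to both color and linetype, with the confidence band turned off via se = FALSE. The object name p is illustrative.

```r
library(ggplot2)
library(r02pro)

sahp$good_qual <- sahp$oa_qual > 5

# Map good_qual to two aesthetics at once; se = FALSE is set outside aes()
p <- ggplot(data = sahp) +
  geom_smooth(mapping = aes(x = liv_area,
                            y = sale_price,
                            color = good_qual,
                            linetype = good_qual),
              se = FALSE)
p
```

Because both aesthetics are mapped to the same variable, the two smoothlines remain distinguishable even when the plot is printed in black and white.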

4.4.3 Exercises

Using the sahp dataset with the ggplot2 package, answer the following questions.

  1. Create a smoothline fit to visualize the relationship between lot_area (on the x-axis) and sale_price (on the y-axis).
  2. Create several smoothlines with different colors corresponding to the value of kit_qual to visualize the relationship between lot_area (on the x-axis) and sale_price (on the y-axis).
  3. Create smoothlines without the surrounding confidence interval, using different linetypes to distinguish whether the house has more than 2 bedrooms, to visualize the relationship between lot_area (on the x-axis) and sale_price (on the y-axis).