4.7 Line Plots

In this section, we introduce the line plot, which is useful for visualizing trends and is commonly used for time series data.

In the sahp dataset, it would be interesting to generate a plot showing the trend of the sale price as a function of the sold date of the house.

4.7.1 Line Plots via plot()

To generate a line plot, you can use the plot() function with the argument type = "l". Before calling plot(), you first need to sort the observations (see Section 2.4) by the variable on the x-axis, which is dt_sold in this example.

library(r02pro)
dt_sold_order <- order(sahp$dt_sold)
plot(sahp$dt_sold[dt_sold_order], 
     sahp$sale_price[dt_sold_order], 
     type = "l", 
     xlab = "Date Sold", 
     ylab = "Sale Price")

The plot() function works by first locating each pair of dt_sold and sale_price values, and then connecting the points in the order of the observations. This is why the observations need to be sorted by dt_sold beforehand.
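To see why the sorting step matters, here is a minimal sketch on made-up data. The toy data frame and its values are hypothetical and are not taken from sahp; the rows are deliberately stored out of date order.

```r
# Hypothetical toy data: the dates are deliberately out of order
toy <- data.frame(
  dt = as.Date(c("2006-03-01", "2006-01-01", "2006-04-01", "2006-02-01")),
  price = c(130, 100, 150, 120)
)

# Unsorted: plot() joins the points row by row, so the line doubles back
plot(toy$dt, toy$price, type = "l", xlab = "Date", ylab = "Price")

# order() returns the row indices that arrange dt in increasing order
ord <- order(toy$dt)

# Sorted: the line now moves from left to right along the x-axis
plot(toy$dt[ord], toy$price[ord], type = "l", xlab = "Date", ylab = "Price")
```

Comparing the two plots shows that only the sorted version reads as a trend over time.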

For line plots, there are two additional graphical parameters that we can customize in the plot() function.

Parameter   Meaning          Example
lty         The line type    "dashed"
lwd         The line width   2

The line type can be either an integer from 0 to 6 or the corresponding character string in the following table.

Integers    Strings
lty = 0     "blank"
lty = 1     "solid"
lty = 2     "dashed"
lty = 3     "dotted"
lty = 4     "dotdash"
lty = 5     "longdash"
lty = 6     "twodash"

Let’s see an example with the two parameters.

plot(sahp$dt_sold[dt_sold_order], 
     sahp$sale_price[dt_sold_order], 
     type = "l", 
     xlab = "Date Sold", 
     ylab = "Sale Price", 
     lty = 2, 
     lwd = 2)
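To compare the line types listed in the table above side by side, one possible sketch draws a horizontal line for each of the types 1 to 6 (type 0, "blank", draws nothing). The vector name lty_names and the plot layout are illustrative choices, not part of the original example.

```r
# Names of line types 1 to 6, in the same order as the integer codes
lty_names <- c("solid", "dashed", "dotted", "dotdash", "longdash", "twodash")

# Set up an empty plotting region without drawing any points
plot(c(0, 1), c(1, 6), type = "n", ylim = c(0.5, 6.8),
     xlab = "", ylab = "lty", yaxt = "n")
axis(side = 2, at = 1:6, labels = 1:6, las = 1)

# Draw one horizontal line per line type and label it just above
for (i in 1:6) {
  lines(c(0, 1), c(i, i), lty = i, lwd = 2)
  text(0.5, i + 0.3, labels = paste0("lty = ", i, " or \"", lty_names[i], "\""))
}
```

Replacing lty = i with lty = lty_names[i] inside the loop should give the same picture, since each integer and its string name refer to the same line type.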

The plot() function can also show the points and the line on the same plot by setting type = "b".

plot(sahp$dt_sold[dt_sold_order],
     sahp$sale_price[dt_sold_order], 
     type = "b", 
     xlab = "Date Sold", 
     ylab = "Sale Price")

4.7.2 Line Plots via geom_line()

In addition to the plot() function, we can use the geom_line() function from the ggplot2 package.

library(r02pro)
library(tidyverse)
ggplot(data = sahp) + 
  geom_line(mapping = aes(x = dt_sold, y = sale_price))

The generated line plot looks essentially the same as the one generated by plot(). It is worth noting that here the points are connected not in the order of the observations but in the order of the variable on the x-axis, i.e. dt_sold, so there is no need to sort the observations beforehand.
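The ordering behaviour can be checked with a small sketch on made-up data. It contrasts geom_line() with geom_path(), another ggplot2 geom that connects the points in the order in which they appear in the data. The toy data frame is hypothetical and not taken from sahp.

```r
library(ggplot2)

# Hypothetical toy data: the dates are deliberately out of order
toy <- data.frame(
  dt = as.Date(c("2006-03-01", "2006-01-01", "2006-04-01", "2006-02-01")),
  price = c(130, 100, 150, 120)
)

# geom_line(): points are joined in increasing order of x, no sorting needed
p_line <- ggplot(data = toy) +
  geom_line(mapping = aes(x = dt, y = price))

# geom_path(): points are joined in row order, like plot() with type = "l"
p_path <- ggplot(data = toy) +
  geom_path(mapping = aes(x = dt, y = price))

print(p_line)
print(p_path)
```

On this unsorted data, geom_line() should produce a clean left-to-right line, while geom_path() doubles back in the same way as an unsorted call to plot().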

To get a better view of how geom_line() works, let’s focus on the houses that were sold before 2007.

sahp_2006 <- sahp[as.numeric(format(sahp$dt_sold, "%Y")) < 2007, ] #all houses sold before 2007
ggplot(data = sahp_2006) + 
  geom_line(mapping = aes(x = dt_sold, y = sale_price))

Next, we add a scatterplot on top of the line plot, using global mappings so that both geoms share the same x and y variables.

ggplot(data = sahp_2006, 
       mapping = aes(x = dt_sold, y = sale_price)) + 
  geom_line() + 
  geom_point()

4.7.3 Aesthetics in Line Plots

As expected, we can also map variables to aesthetics in line plots.

ggplot(data = sahp) + 
  geom_line(mapping = aes(x = dt_sold, 
                          y = sale_price, 
                          color = kit_qual))

Here, the observations are divided into groups according to the value of kit_qual, and a separate line is drawn for each group in a different color. We can also map variables to the linetype aesthetic, as with the geom_smooth() function.

ggplot(data = sahp) + 
  geom_line(mapping = aes(x = dt_sold, 
                          y = sale_price, 
                          linetype = kit_qual))
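The two mappings can also be combined, so that each group differs in both color and line type. The following is a minimal sketch on made-up data; the toy data frame and the column name qual are hypothetical stand-ins for sahp and kit_qual.

```r
library(ggplot2)

# Hypothetical toy data: two quality groups observed on the same dates
toy <- data.frame(
  dt = rep(as.Date(c("2006-01-01", "2006-02-01", "2006-03-01")), times = 2),
  price = c(100, 110, 120, 150, 145, 160),
  qual = rep(c("Average", "Good"), each = 3)
)

# Mapping color and linetype to the same variable gives one line per group,
# and each line differs in both color and line type
p <- ggplot(data = toy) +
  geom_line(mapping = aes(x = dt,
                          y = price,
                          color = qual,
                          linetype = qual))
print(p)
```

Because both aesthetics are mapped to the same variable, the observations are still split into one group per value of qual.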

In addition to mapping aesthetics, you can also set global aesthetics as before, by specifying them as arguments of geom_line() outside aes().

ggplot(data = sahp) + 
  geom_line(mapping = aes(x = dt_sold, y = sale_price), 
            color = "blue", 
            linetype = 3)
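The difference between setting and mapping an aesthetic can be made concrete with a short sketch on made-up data (the toy data frame is hypothetical). It contrasts a constant given outside aes() with the same string placed inside aes(), where ggplot2 treats it as a data value rather than a color name.

```r
library(ggplot2)

# Hypothetical toy data
toy <- data.frame(
  dt = as.Date(c("2006-01-01", "2006-02-01", "2006-03-01")),
  price = c(100, 110, 120)
)

# Setting: the constant is given outside aes(), so the line is drawn in blue
p_set <- ggplot(data = toy) +
  geom_line(mapping = aes(x = dt, y = price),
            color = "blue")

# Mapping: inside aes(), "blue" is treated as a one-level variable, so the
# line gets a default palette color and a legend entry labelled "blue"
p_map <- ggplot(data = toy) +
  geom_line(mapping = aes(x = dt, y = price, color = "blue"))

print(p_set)
print(p_map)
```

If a line does not appear in the color you asked for, checking whether the value ended up inside aes() is a reasonable first step.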

4.7.4 Exercises

First, create a data set sahp_2006 by running the following code.

sahp_2006 <- sahp[as.numeric(format(sahp$dt_sold, "%Y")) < 2007, ] #all houses sold before 2007

Then, use sahp_2006 to answer the following questions.

  1. Use plot() to create a line plot of dt_sold (on the x-axis) and sale_price (on the y-axis) that shows the trend of the sale price as a function of the sold date of the house. Give the x-axis the title “DS” and the y-axis the title “SP”, and draw the line with the “twodash” line type.

  2. Use the ggplot2 package to create a line plot of dt_sold (on the x-axis) and sale_price (on the y-axis) with different line types depending on the value of house_style.