7.1 Convert Between Names and Values
First, let’s create an artificial dataset which contains the weights of a sheep and a pig for years 2019, 2020, and 2021.
library(tibble)
<- rep(c("sheep","pig"), c(3,3))
animal <- rep(2019:2021, 2)
year <- c(110, 120, 140, NA, 300, 800)
weight <- tibble(animal, year, weight)
animal_tidy
animal_tidy#> # A tibble: 6 × 3
#> animal year weight
#> <chr> <int> <dbl>
#> 1 sheep 2019 110
#> 2 sheep 2020 120
#> 3 sheep 2021 140
#> 4 pig 2019 NA
#> 5 pig 2020 300
#> 6 pig 2021 800
By checking the definition of tidy data, it is clear animal_tidy
is indeed tidy. Let’s make it untidy.
7.1.1 Convert Values into Column Names
In animal_tidy
, each row contains the year when the weight measurement was taken. Suppose we want to convert the year value into column names. You can use the pivot_wider()
function in tidyr package, another member of the tidyverse package.
In pivot_wider()
, you need to specify two arguments:
names_from
denotes which column in the original tibble contains the values of the new column names, -values_from
denotes which column in the original tibble contains the values for each cell in the new tibble.
The reason why the function is called pivot_wider()
is due to the fact that it will create a wider dataset than the orginal one, containing more columns.
library(tidyr)
<- animal_tidy %>% pivot_wider(names_from = year, values_from = weight)
animal_wide #untidy animal: wide
animal_wide #> # A tibble: 2 × 4
#> animal `2019` `2020` `2021`
#> <chr> <dbl> <dbl> <dbl>
#> 1 sheep 110 120 140
#> 2 pig NA 300 800
In animal_wide
, we have the columns names 2019
, 2020
, and 2021
coming from the year
variable, and the values 110
, 120
, 140
, NA
, 300
, and 800
from the weight
variable, both of which are in the original tibble animal_tidy
. The animal_wide
is clearly untidy, since neither the weight
nor the year
information is contained in a single column. Note that this data format is commonly encountered in practice.
As it is clear from the resulting tibble, the name weight
is lost during the pivoting process, which is not desirable. Fortunately, you can add a prefix “weight” to the column names via an argument names_prefix
in the pivot_wider()
function.
<- animal_tidy %>% pivot_wider(names_from = year,
animal_wide_weight names_prefix = "weight",
values_from = weight)
animal_wide_weight #> # A tibble: 2 × 4
#> animal weight2019 weight2020 weight2021
#> <chr> <dbl> <dbl> <dbl>
#> 1 sheep 110 120 140
#> 2 pig NA 300 800
7.1.2 Convert Column Names into Values
Now, you will learn how to tidy animal_wide
into a tidy data. To do this, you can use the pivot_longer()
function to convert the columns names 2019
, 2020
, and 2021
into values of a variable, for example, year
.
%>%
animal_wide pivot_longer(cols = -1,
names_to = "year",
values_to = "weight")
#> # A tibble: 6 × 3
#> animal year weight
#> <chr> <chr> <dbl>
#> 1 sheep 2019 110
#> 2 sheep 2020 120
#> 3 sheep 2021 140
#> 4 pig 2019 NA
#> 5 pig 2020 300
#> 6 pig 2021 800
In pivot_longer()
, cols
specifies the column names that you want to convert from, which accept the same format as that in dplyr::select()
introduced in Section 6.4. names_to
specifies the variable name you want to use for the column names. Finally, values_to
specifies the variable name for holding the values in the selected columns.
You can see that we have recovered the animal_tidy
through the tidy process. To tidy animal_wide_weight
, we can use the same function pivot_longer()
along with the argument names_prefix
as below.
%>%
animal_wide_weight pivot_longer(cols = -1,
names_to = "year",
names_prefix = "weight",
values_to = "weight")
#> # A tibble: 6 × 3
#> animal year weight
#> <chr> <chr> <dbl>
#> 1 sheep 2019 110
#> 2 sheep 2020 120
#> 3 sheep 2021 140
#> 4 pig 2019 NA
#> 5 pig 2020 300
#> 6 pig 2021 800
In this regards, pivot_wider()
and pivot_longer()
can be viewed as opposite functions.