7.1 Convert Between Names and Values

First, let’s create an artificial dataset which contains the weights of a sheep and a pig for years 2019, 2020, and 2021.

library(tibble)
animal <- rep(c("sheep","pig"), c(3,3))
year <- rep(2019:2021, 2)
weight <- c(110, 120, 140, NA, 300, 800)
animal_tidy <- tibble(animal, year, weight)
animal_tidy
#> # A tibble: 6 × 3
#>   animal  year weight
#>   <chr>  <int>  <dbl>
#> 1 sheep   2019    110
#> 2 sheep   2020    120
#> 3 sheep   2021    140
#> 4 pig     2019     NA
#> 5 pig     2020    300
#> 6 pig     2021    800

By checking the definition of tidy data, it is clear animal_tidy is indeed tidy. Let’s make it untidy.

7.1.1 Convert Values into Column Names

In animal_tidy, each row contains the year when the weight measurement was taken. Suppose we want to convert the year value into column names. You can use the pivot_wider() function in tidyr package, another member of the tidyverse package.

In pivot_wider(), you need to specify two arguments:

  • names_from denotes which column in the original tibble contains the values of the new column names, - values_from denotes which column in the original tibble contains the values for each cell in the new tibble.

The reason why the function is called pivot_wider() is due to the fact that it will create a wider dataset than the orginal one, containing more columns.

library(tidyr)
animal_wide <- animal_tidy %>% pivot_wider(names_from = year,                                  values_from = weight) 
animal_wide   #untidy animal: wide
#> # A tibble: 2 × 4
#>   animal `2019` `2020` `2021`
#>   <chr>   <dbl>  <dbl>  <dbl>
#> 1 sheep     110    120    140
#> 2 pig        NA    300    800

In animal_wide, we have the columns names 2019, 2020, and 2021 coming from the year variable, and the values 110, 120, 140, NA, 300, and 800 from the weight variable, both of which are in the original tibble animal_tidy. The animal_wide is clearly untidy, since neither the weight nor the year information is contained in a single column. Note that this data format is commonly encountered in practice.

As it is clear from the resulting tibble, the name weight is lost during the pivoting process, which is not desirable. Fortunately, you can add a prefix “weight” to the column names via an argument names_prefix in the pivot_wider() function.

animal_wide_weight <- animal_tidy %>% pivot_wider(names_from = year,
            names_prefix = "weight",
            values_from = weight) 
animal_wide_weight   
#> # A tibble: 2 × 4
#>   animal weight2019 weight2020 weight2021
#>   <chr>       <dbl>      <dbl>      <dbl>
#> 1 sheep         110        120        140
#> 2 pig            NA        300        800

7.1.2 Convert Column Names into Values

Now, you will learn how to tidy animal_wide into a tidy data. To do this, you can use the pivot_longer() function to convert the columns names 2019, 2020, and 2021 into values of a variable, for example, year.

animal_wide %>% 
  pivot_longer(cols = -1, 
               names_to = "year", 
               values_to = "weight")
#> # A tibble: 6 × 3
#>   animal year  weight
#>   <chr>  <chr>  <dbl>
#> 1 sheep  2019     110
#> 2 sheep  2020     120
#> 3 sheep  2021     140
#> 4 pig    2019      NA
#> 5 pig    2020     300
#> 6 pig    2021     800

In pivot_longer(), cols specifies the column names that you want to convert from, which accept the same format as that in dplyr::select() introduced in Section 6.4. names_to specifies the variable name you want to use for the column names. Finally, values_to specifies the variable name for holding the values in the selected columns. You can see that we have recovered the animal_tidy through the tidy process. To tidy animal_wide_weight, we can use the same function pivot_longer() along with the argument names_prefix as below.

animal_wide_weight %>% 
  pivot_longer(cols = -1, 
               names_to = "year",
               names_prefix = "weight", 
               values_to = "weight")
#> # A tibble: 6 × 3
#>   animal year  weight
#>   <chr>  <chr>  <dbl>
#> 1 sheep  2019     110
#> 2 sheep  2020     120
#> 3 sheep  2021     140
#> 4 pig    2019      NA
#> 5 pig    2020     300
#> 6 pig    2021     800

In this regards, pivot_wider() and pivot_longer() can be viewed as opposite functions.