7.2 Separate and Combine Columns via separate() and unite()

To introduce the separate and and unite operation, let’s first create a tidy data set containing the weight and height of year 2019 to 2021 for a sheep and a pig.

library(tidyr)
library(r02pro)
animal <- rep(c("sheep","pig"), c(3,3))
year <- rep(2019:2021, 2)
weight <- c(110, 120, 140, NA, 300, 800)
height <- c(2.2, 2.4, 2.7, 2, 2.1, 2.3)
animal_tidy <- tibble(animal, year, weight, height)
animal_tidy
#> # A tibble: 6 × 4
#>   animal  year weight height
#>   <chr>  <int>  <dbl>  <dbl>
#> 1 sheep   2019    110    2.2
#> 2 sheep   2020    120    2.4
#> 3 sheep   2021    140    2.7
#> 4 pig     2019     NA    2  
#> 5 pig     2020    300    2.1
#> 6 pig     2021    800    2.3

7.2.1 Combine Columns via unite()

Sometimes, we may want to combine multiple columns into a single one. To do this, you can use the unite() function in the tidyr package.

In the unite() function, you need to specify the name of the to-be created column (col), the columns to be combined, and the separator between the columns (sep).

animal_ratio <- animal_tidy %>% 
  unite(col = "ratio", c("weight", "height"), sep = "/") #unite the weight and height into the ratio
animal_ratio
#> # A tibble: 6 × 3
#>   animal  year ratio  
#>   <chr>  <int> <chr>  
#> 1 sheep   2019 110/2.2
#> 2 sheep   2020 120/2.4
#> 3 sheep   2021 140/2.7
#> 4 pig     2019 NA/2   
#> 5 pig     2020 300/2.1
#> 6 pig     2021 800/2.3

Now, the animal_ratio has a new column named ratio which is displayed as the weight divided by the height.

7.2.2 Separate One Column into Multiple Columns Using separate()

We know the unite() function combine multiple columns into a single column. Sometimes, we need to do the opposite operation, namely separate one column into multiple ones. To do this, you can use the separate() function.

Let’s look at the animal_ratio, which is a untidy dataset, since the column ratio contains the information of two variables. To tidy it, we can use the separate() function. In the separate function, you need to specify - col: the name of the column to be separated - into: the desired names of the to-be created columns after separation - sep: the separator during the separation process

animal_ratio %>% 
  separate(col = "ratio", into = c("weight", "height"), sep = "/")
#> # A tibble: 6 × 4
#>   animal  year weight height
#>   <chr>  <int> <chr>  <chr> 
#> 1 sheep   2019 110    2.2   
#> 2 sheep   2020 120    2.4   
#> 3 sheep   2021 140    2.7   
#> 4 pig     2019 NA     2     
#> 5 pig     2020 300    2.1   
#> 6 pig     2021 800    2.3

From the result, you can see that the original tidy tibble is recovered via the separate() function. After a careful look, you may notice that the column types of weight and height are both character, which may not be what we want. To convert the types automatically, you can add the convert = TRUE argument in the separate() function.

animal_ratio %>% 
  separate(col = "ratio", into = c("weight", "height"), sep = "/", convert = TRUE)
#> # A tibble: 6 × 4
#>   animal  year weight height
#>   <chr>  <int>  <int>  <dbl>
#> 1 sheep   2019    110    2.2
#> 2 sheep   2020    120    2.4
#> 3 sheep   2021    140    2.7
#> 4 pig     2019     NA    2  
#> 5 pig     2020    300    2.1
#> 6 pig     2021    800    2.3

Now, weight is interger and height is double.

In addition to separating via the separator, you can also separate by the position. For example, to separate the year into century (first two digits) and 2-digit year (last 2 digits), you can separate the year after position 2.

animal_tidy %>% separate(col = "year", into = c("y12","y34"), sep = 2, convert = TRUE)
#> # A tibble: 6 × 5
#>   animal   y12   y34 weight height
#>   <chr>  <int> <int>  <dbl>  <dbl>
#> 1 sheep     20    19    110    2.2
#> 2 sheep     20    20    120    2.4
#> 3 sheep     20    21    140    2.7
#> 4 pig       20    19     NA    2  
#> 5 pig       20    20    300    2.1
#> 6 pig       20    21    800    2.3

You can also separate at multiple locations. For example, to separate the year into four columns, representing four digits, you can use multiple positions to separate.

anmial_digits <- animal_tidy %>% separate(col = "year", into = c("y1","y2","y3","y4"), sep = 1:3, convert = TRUE)
anmial_digits
#> # A tibble: 6 × 7
#>   animal    y1    y2    y3    y4 weight height
#>   <chr>  <int> <int> <int> <int>  <dbl>  <dbl>
#> 1 sheep      2     0     1     9    110    2.2
#> 2 sheep      2     0     2     0    120    2.4
#> 3 sheep      2     0     2     1    140    2.7
#> 4 pig        2     0     1     9     NA    2  
#> 5 pig        2     0     2     0    300    2.1
#> 6 pig        2     0     2     1    800    2.3

Using the animal_digits, you can also combine all the four digits into a column named year.

anmial_digits %>% unite(col = "year", y1:y4, sep = "")
#> # A tibble: 6 × 4
#>   animal year  weight height
#>   <chr>  <chr>  <dbl>  <dbl>
#> 1 sheep  2019     110    2.2
#> 2 sheep  2020     120    2.4
#> 3 sheep  2021     140    2.7
#> 4 pig    2019      NA    2  
#> 5 pig    2020     300    2.1
#> 6 pig    2021     800    2.3

Note that here, the seperator is set to be the empty string since we don’t want any separator within the four-digit year.