7.2 Separate and Combine Columns via separate()
and unite()
To introduce the separate and and unite operation, let’s first create a tidy data set containing the weight and height of year 2019 to 2021 for a sheep and a pig.
library(tidyr)
library(r02pro)
<- rep(c("sheep","pig"), c(3,3))
animal <- rep(2019:2021, 2)
year <- c(110, 120, 140, NA, 300, 800)
weight <- c(2.2, 2.4, 2.7, 2, 2.1, 2.3)
height <- tibble(animal, year, weight, height)
animal_tidy
animal_tidy#> # A tibble: 6 × 4
#> animal year weight height
#> <chr> <int> <dbl> <dbl>
#> 1 sheep 2019 110 2.2
#> 2 sheep 2020 120 2.4
#> 3 sheep 2021 140 2.7
#> 4 pig 2019 NA 2
#> 5 pig 2020 300 2.1
#> 6 pig 2021 800 2.3
7.2.1 Combine Columns via unite()
Sometimes, we may want to combine multiple columns into a single one. To do this, you can use the unite()
function in the tidyr package.
In the unite()
function, you need to specify the name of the to-be created column (col
), the columns to be combined, and the separator between the columns (sep
).
<- animal_tidy %>%
animal_ratio unite(col = "ratio", c("weight", "height"), sep = "/") #unite the weight and height into the ratio
animal_ratio#> # A tibble: 6 × 3
#> animal year ratio
#> <chr> <int> <chr>
#> 1 sheep 2019 110/2.2
#> 2 sheep 2020 120/2.4
#> 3 sheep 2021 140/2.7
#> 4 pig 2019 NA/2
#> 5 pig 2020 300/2.1
#> 6 pig 2021 800/2.3
Now, the animal_ratio
has a new column named ratio
which is displayed as the weight divided by the height.
7.2.2 Separate One Column into Multiple Columns Using separate()
We know the unite()
function combine multiple columns into a single column. Sometimes, we need to do the opposite operation, namely separate one column into multiple ones. To do this, you can use the separate()
function.
Let’s look at the animal_ratio
, which is a untidy dataset, since the column ratio
contains the information of two variables. To tidy it, we can use the separate()
function. In the separate function, you need to specify
- col
: the name of the column to be separated
- into
: the desired names of the to-be created columns after separation
- sep
: the separator during the separation process
%>%
animal_ratio separate(col = "ratio", into = c("weight", "height"), sep = "/")
#> # A tibble: 6 × 4
#> animal year weight height
#> <chr> <int> <chr> <chr>
#> 1 sheep 2019 110 2.2
#> 2 sheep 2020 120 2.4
#> 3 sheep 2021 140 2.7
#> 4 pig 2019 NA 2
#> 5 pig 2020 300 2.1
#> 6 pig 2021 800 2.3
From the result, you can see that the original tidy tibble is recovered via the separate()
function. After a careful look, you may notice that the column types of weight
and height
are both character
, which may not be what we want. To convert the types automatically, you can add the convert = TRUE
argument in the separate()
function.
%>%
animal_ratio separate(col = "ratio", into = c("weight", "height"), sep = "/", convert = TRUE)
#> # A tibble: 6 × 4
#> animal year weight height
#> <chr> <int> <int> <dbl>
#> 1 sheep 2019 110 2.2
#> 2 sheep 2020 120 2.4
#> 3 sheep 2021 140 2.7
#> 4 pig 2019 NA 2
#> 5 pig 2020 300 2.1
#> 6 pig 2021 800 2.3
Now, weight
is interger and height
is double.
In addition to separating via the separator, you can also separate by the position. For example, to separate the year into century (first two digits) and 2-digit year (last 2 digits), you can separate the year
after position 2.
%>% separate(col = "year", into = c("y12","y34"), sep = 2, convert = TRUE)
animal_tidy #> # A tibble: 6 × 5
#> animal y12 y34 weight height
#> <chr> <int> <int> <dbl> <dbl>
#> 1 sheep 20 19 110 2.2
#> 2 sheep 20 20 120 2.4
#> 3 sheep 20 21 140 2.7
#> 4 pig 20 19 NA 2
#> 5 pig 20 20 300 2.1
#> 6 pig 20 21 800 2.3
You can also separate at multiple locations. For example, to separate the year into four columns, representing four digits, you can use multiple positions to separate.
<- animal_tidy %>% separate(col = "year", into = c("y1","y2","y3","y4"), sep = 1:3, convert = TRUE)
anmial_digits
anmial_digits#> # A tibble: 6 × 7
#> animal y1 y2 y3 y4 weight height
#> <chr> <int> <int> <int> <int> <dbl> <dbl>
#> 1 sheep 2 0 1 9 110 2.2
#> 2 sheep 2 0 2 0 120 2.4
#> 3 sheep 2 0 2 1 140 2.7
#> 4 pig 2 0 1 9 NA 2
#> 5 pig 2 0 2 0 300 2.1
#> 6 pig 2 0 2 1 800 2.3
Using the animal_digits
, you can also combine all the four digits into a column named year
.
%>% unite(col = "year", y1:y4, sep = "")
anmial_digits #> # A tibble: 6 × 4
#> animal year weight height
#> <chr> <chr> <dbl> <dbl>
#> 1 sheep 2019 110 2.2
#> 2 sheep 2020 120 2.4
#> 3 sheep 2021 140 2.7
#> 4 pig 2019 NA 2
#> 5 pig 2020 300 2.1
#> 6 pig 2021 800 2.3
Note that here, the seperator is set to be the empty string since we don’t want any separator within the four-digit year.