9.1 Concatenate and Subset Strings
9.1.1 Concatenate Strings
First, let’s introduce how to concatenate multiple strings into a single string. To do this, we will use the str_c() function in the stringr package.
library(stringr)
str_c("I", "am", "learning", "R!") #equivalent to `paste0()`
#> [1] "IamlearningR!"
Similar to the paste() function, you can specify a separator between the input strings with the sep argument.
str_c("I", "am", "learning", "R!", sep = " ")
#> [1] "I am learning R!"
str_c("apple", "banana", "cherry", sep = "&")
#> [1] "apple&banana&cherry"
The str_c() function also accepts vectors of length greater than one as arguments, in which case it concatenates them element by element.
str_c("There are", 3:5, c("apples", "bananas", "cherries"), "in the kitchen.", sep = " ")
#> [1] "There are 3 apples in the kitchen."
#> [2] "There are 4 bananas in the kitchen."
#> [3] "There are 5 cherries in the kitchen."
As you can see from the result, the first and fourth arguments, each a single string, are recycled to match the length of the second and third arguments, which both have three elements.
Next, let’s introduce how to concatenate the strings inside a character vector. First, let’s try to apply the str_c() function on a character vector.
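The call in question can be sketched as follows, assuming `fruits` is the character vector of fruit names implied by the outputs later in this section:

```r
library(stringr)

# Assumed definition, inferred from the outputs shown in this section
fruits <- c("apples", "bananas", "cherries")

# With a single vector and no `collapse`, there is nothing to join,
# so each element comes back unchanged
str_c(fruits)
#> [1] "apples"   "bananas"  "cherries"
```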
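Apparently, we didn’t get a single concatenated string: with only one vector as input, str_c() returns its elements unchanged. To join the elements into one string, we need to supply the collapse argument, which is used as the separator.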
str_c(fruits, collapse = "|")
#> [1] "apples|bananas|cherries"
str_c(fruits, collapse = "-")
#> [1] "apples-bananas-cherries"
When the str_c() function receives multiple character vectors together with the collapse argument, the vectors are first concatenated element by element (using sep, which defaults to the empty string), and the resulting strings are then joined into a single string with the separator given in the collapse argument.
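As a minimal sketch of this two-step behavior, assuming the same `fruits` vector as above and passing it to str_c() twice:

```r
library(stringr)

# Assumed definition, inferred from the outputs shown in this section
fruits <- c("apples", "bananas", "cherries")

# Step 1: elementwise concatenation (default sep = "") gives
#         "applesapples", "bananasbananas", "cherriescherries"
# Step 2: those three strings are joined with the collapse separator
str_c(fruits, fruits, collapse = "|")
#> [1] "applesapples|bananasbananas|cherriescherries"
```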
Here, the str_c() function first concatenates the fruits vector with itself element by element, and then joins the resulting strings with the separator |.
9.1.2 Create and Modify Substrings
It is often of interest to create substrings from an existing string. A substring is a string formed by a consecutive sequence of characters within the existing string.
To get a substring, you can use the str_sub() function. The start and end arguments represent the starting and ending positions of the desired substring.
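A minimal sketch of such a call, assuming `fruits` is the vector of fruit names used earlier in this section:

```r
library(stringr)

# Assumed definition, inferred from the outputs shown in this section
fruits <- c("apples", "bananas", "cherries")

# Characters 1 through 3 of every element
str_sub(fruits, start = 1, end = 3)
#> [1] "app" "ban" "che"
```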
Here, you will get a substring from position 1 to position 3 for each element of the original character vector. In addition to positive integers, you can also use negative integers, which count positions backward from the last character.
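For instance, under the same assumption about `fruits`, a negative end position could be used like this:

```r
library(stringr)

# Assumed definition, inferred from the outputs shown in this section
fruits <- c("apples", "bananas", "cherries")

# end = -2 refers to the second-to-last character of each string,
# so only the final character is dropped
str_sub(fruits, start = 1, end = -2)
#> [1] "apple"   "banana"  "cherrie"
```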
Here, for each string, we extract the substring from position 1 to the second-to-last position. In addition to using a single integer for start and end, you can also supply vectors of the same length as the input vector, so that a different substring is extracted from each element.
Let’s look at the result together.
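A sketch of the vectorized call, assuming the same `fruits` vector and the start and end positions used in the replacement example later in this section:

```r
library(stringr)

# Assumed definition, inferred from the outputs shown in this section
fruits <- c("apples", "bananas", "cherries")

# Element i uses start[i] and end[i]:
#   "apples"   from 1 to -1, "bananas" from 2 to -2, "cherries" from 3 to -3
str_sub(fruits, start = 1:3, end = (-1):(-3))
#> [1] "apples" "anana"  "erri"
```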
From the results, we can see that the substrings are extracted as follows:
- The first element is the first to the last position of "apples".
- The second element is the second to the second-to-last position of "bananas".
- The third element is the third to the third-to-last position of "cherries".
Once we know how to create a substring, it is straightforward to modify that substring in place within the original string by assigning new values to the result of str_sub(). Let’s try to update the substrings.
fruits_new <- fruits
str_sub(fruits_new, start = 1:3, end = (-1):(-3)) <- c("ns_1", "ns_2", "ns_3")
fruits_new
#> [1] "ns_1" "bns_2s" "chns_3es"
fruits
#> [1] "apples" "bananas" "cherries"
From the results, we can see that the corresponding substrings in fruits_new are replaced by "ns_1", "ns_2", and "ns_3", respectively, while the original fruits vector is unchanged.
9.1.3 Exercises
- (String Concatenation) Create a character vector fruits with the elements "apple", "banana", and "cherry". Use the str_c() function to concatenate the elements in the fruits vector with the separator " & ".
- (Create Substrings) Create a character vector colors with the elements "red", "green", "blue", "yellow", and "purple". Use the str_sub() function to extract the substrings from the second to the fourth position of each element in the colors vector.
- (Update Substring) Create a character vector animals with the elements "cat", "dog", "elephant", and "giraffe". Use the str_sub() function to update the substrings from the second to the fourth position of each element in the animals vector to "at", "og", "leph", and "iraf", respectively. Verify the contents.