8.1 Concatenate and Subset Strings

8.1.1 Concatenate Strings

First, let’s introduce how to concatenate multiple strings into a single string. To do this, we will use the str_c() function in the stringr package.

library(stringr)
str_c("sheep", "pig", "monkey")    #equivalent to `paste0()`
#> [1] "sheeppigmonkey"

Similar to the paste() function, you can specify a separator between the input strings.

str_c("sheep", "pig", "monkey", sep = " ")
#> [1] "sheep pig monkey"
str_c("sheep", "pig", "monkey", sep = "&")
#> [1] "sheep&pig&monkey"

The str_c() can also accept vectors of length greater than one as its arguments, where it will conduct the concatenating operation elementwisely.

str_c("There are ", c("sheep", "pig", "monkey"), "s in the zoo.") 
#> [1] "There are sheeps in the zoo."  "There are pigs in the zoo."   
#> [3] "There are monkeys in the zoo."

As you can see from the result, the first and third string are recycled to match the length of the second string.

Next, let’s introduce how to concatenate the strings inside a chracter vector. First, let’s try to apply the str_c() function on a character vector.

animals <- c("sheep", "pig", "monkey")
str_c(animals)
#> [1] "sheep"  "pig"    "monkey"

Apparently, we didn’t get the expected concatenated string. To do so, we need to add an argument collapse as the separator.

str_c(animals, collapse = "|")
#> [1] "sheep|pig|monkey"
str_c(animals, collapse = "-")
#> [1] "sheep-pig-monkey"

When we have multiple character vectors as the arguments in the str_c() function with the collapse argument, the character vectors will first be concatenated respectively, and then the concatenated strings will be further concatenated with the separator as in the collapse argument.

str_c(animals, animals, collapse = "|")
#> [1] "sheepsheep|pigpig|monkeymonkey"
str_c(animals, collapse = "-")
#> [1] "sheep-pig-monkey"

8.1.2 Create and Modify Substrings

It is often of interest to create substrings from an existing string. A substring is a string formed by a consecutive sequence of characters within the existing string.

To get a substring, you can use the substr() function. The start and end arguments represent the starting and ending position of the desired substrings.

str_sub(animals, start = 1, end = 3)
#> [1] "she" "pig" "mon"

Here, you will get a substring from position 1 to poisition 3 for each element of the original character vector. In addition to using the positive integer to represent the position, you can also use negative integer to represent the relative position from the last symbol.

str_sub(animals, start = 1, end = -2)
#> [1] "shee"  "pi"    "monke"

Here, for each string, we extract the substring from position 1 to the second to last position. In addition to using one integer for start and end, you can also supply a vector of the same length as the string to be subsetted.

str_sub(animals, start = 1:3, end = (-1):(-3))
#> [1] "sheep" "i"     "nk"

Let’s look at the result together.

  • The first element is the first position to the last position of "sheep".
  • The second element is the second position to the second to last position of "pig".
  • The third element is the third position to the third to last position of "monkey".

Once we know how to create the substring, it is straightforward to modify the substing in place of the original string. Let’s try to update the substrings.

animals
str_sub(animals, start = 1:3, end = (-1):(-3)) <- 
        c("ns_1", "ns_2", "ns_3")
animals

From the results, we can see the corresponding substrings are replaced by "ns_1", "ns_2", "ns_3", respectively.