2.2 Vectors: Storage Types, Attributes and Coercion

Building on the vectors introduced in Section 2.1, this section first introduces an important concept called storage types, using two commonly used numeric types (integers and doubles) as examples. It then introduces the concept of attributes through named vectors, and finally discusses the coercion rule that applies when you combine values of different types into a single vector.

2.2.1 Storage Types

Now that you have seen numeric, character, and logical vectors, it is time to look at how they are stored in R. To find the internal storage type of an R object, you can use the typeof() function. First, let’s see an example of a numeric vector.

my_double <- c(1, 3, 4)
class(my_double)          #class
#> [1] "numeric"
is.numeric(my_double)
#> [1] TRUE
typeof(my_double)         #storage type
#> [1] "double"

We can see that the internal storage type of my_double is double, meaning that its values are stored as double-precision numbers. Looking at the values of my_double, it is easy to see that they are all integers. You may be wondering whether it is necessary to store integers in a double type. The answer is no. You can store them in an integer type, which uses about half the memory of doubles (4 bytes per value instead of 8). The tricky part is that you usually need to explicitly tell R that you are storing them as integers.

To create an integer vector, you can still use the c() function with the integers separated by commas as arguments. However, you need to put an “L” after each integer. Let’s create an integer vector and check its storage type with typeof().

my_int <- c(1L, 3L, 4L)
typeof(my_int)
#> [1] "integer"
class(my_int)
#> [1] "integer"
is.numeric(my_int)
#> [1] TRUE

You can see that my_int is indeed of integer type, and its class is integer as well. Note that is.numeric() returns TRUE for both integer and double vectors.
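The “L” suffix is not the only way to obtain integer storage. As a brief sketch (the object names from_colon and from_double are chosen here purely for illustration), the colon operator and as.integer() also produce integer vectors:

```r
# The colon operator produces integers when its endpoints are whole numbers
from_colon <- 1:3
typeof(from_colon)
#> [1] "integer"

# as.integer() converts an existing double vector to integer storage
from_double <- as.integer(c(1, 3, 4))
typeof(from_double)
#> [1] "integer"

# Without the L suffix, a whole number is still stored as a double
typeof(3)
#> [1] "double"
```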

It is also worth noting that the displayed values of my_double and my_int are the same.

my_double
#> [1] 1 3 4
my_int
#> [1] 1 3 4

In addition to class() and typeof(), another useful function is str(), which gives the detailed structure of an R object along with the first few values.

Whenever we have an R object, it is useful to apply class(), typeof(), and str() to it.

str(my_double)
#>  num [1:3] 1 3 4
str(my_int)
#>  int [1:3] 1 3 4

From the str() results, my_int is stored as integers while my_double is stored as double precision numeric values.

Despite the differences between integers and doubles, you can usually ignore their differences unless you are working on a very big data set. R will automatically convert objects between integers and doubles when necessary.
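To get a rough feel for both points, the following sketch compares the memory used by an integer vector and a double vector of the same length, and then shows one case of automatic conversion. The object names big_int and big_dbl are made up for this illustration, and the exact sizes reported by object.size() may vary slightly across platforms, so no output is shown for those calls.

```r
# Two vectors of one million zeros, differing only in storage type
big_int <- integer(1e6)
big_dbl <- numeric(1e6)

# object.size() estimates the memory used by an object; the double
# vector takes roughly twice as much space as the integer vector
object.size(big_int)
object.size(big_dbl)

# Automatic conversion: dividing two integers returns a double
typeof(7L / 2L)
#> [1] "double"
7L / 2L
#> [1] 3.5
```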

Let’s look at the internal storage type and structure of character and logical vectors.

my_char <- c("pig", "monkey")
my_logi <- c(TRUE, FALSE, TRUE)
typeof(my_char)
#> [1] "character"
typeof(my_logi)
#> [1] "logical"
str(my_char)
#>  chr [1:2] "pig" "monkey"
str(my_logi)
#>  logi [1:3] TRUE FALSE TRUE

The results align with our expectations: character vectors are stored as "character" and logical vectors are stored as "logical".

2.2.2 Named Vectors and Attributes

In addition to storing the values of a vector, you can also give names to its elements, creating a named vector. The first option is to give each element a name in the process of creating the vector, using the form name = value.

x_wo_name <- c(165, 60, 22)
x_wo_name
#> [1] 165  60  22
x_w_name <- c(height = 165, weight = 60, BMI = 22)
x_w_name
#> height weight    BMI 
#>    165     60     22

For a named vector, you can access its elements via the names, and update the values via the assignment operator.

x_w_name["height"]
#> height 
#>    165
x_w_name["weight"] <- x_w_name["weight"] + 10
x_w_name
#> height weight    BMI 
#>    165     70     22

A second way to assign names to a vector is to use the names() function. For example, suppose we want to represent whether it snows on each day using a logical vector.

y <- c(TRUE, FALSE, TRUE)
y
#> [1]  TRUE FALSE  TRUE
names(y) <- c("Jan 1", "Jan 2", "Jan 3")
y
#> Jan 1 Jan 2 Jan 3 
#>  TRUE FALSE  TRUE

Note that assigning names looks similar to ordinary object assignment, except that names(y) appears on the left-hand side. The values for names need to be a character vector.

The names of a vector are one type of attribute of an R object. We will introduce other types of attributes as we encounter them. The names attribute provides additional information regarding the meaning of each element, and enables us to extract values using the names (to be discussed in Section 2.6.3).

To examine the attributes of an R object, you can use the attributes() function. The str() function also displays the attributes.

attributes(x_w_name)
#> $names
#> [1] "height" "weight" "BMI"
str(x_w_name)
#>  Named num [1:3] 165 70 22
#>  - attr(*, "names")= chr [1:3] "height" "weight" "BMI"
str(x_wo_name)
#>  num [1:3] 165 60 22

You can see that x_w_name is a named numeric vector with the names attribute. In contrast, the str() function tells us that x_wo_name is a plain numeric vector with no attributes.

To directly extract a specific attribute of an R object, you can use the attr() function, with the second argument being the name of the attribute you wish to extract.

attr(x_w_name, "names")
#> [1] "height" "weight" "BMI"
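As a small sketch of how names() and attr() relate, the example below uses a hypothetical vector scores (not part of the examples above). It checks that both functions return the same names, and that a vector whose names have been removed with unname() has no attributes left.

```r
# A hypothetical named vector used only for this illustration
scores <- c(math = 90, art = 85)

# names() returns the same value as attr() with "names"
identical(names(scores), attr(scores, "names"))
#> [1] TRUE

# unname() returns a copy without names, which has no attributes
plain_scores <- unname(scores)
attributes(plain_scores)
#> NULL
```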

2.2.3 The Coercion Rule

So far, you know that vectors are objects whose values all have the same type, such as numeric values (integers or doubles), strings, or logical values. In practice, however, you may have values of different types. If you still want to combine them into a vector, R will convert all values to the most complex type among them, which is usually called the coercion rule. Specifically, R uses the following order of complexity (from simple to complex). \[\mbox{logical} < \mbox{numeric} < \mbox{character}\]

Let’s see a few examples to learn how the coercion works. The first example mixes logical values with numbers.

mix_1 <- c(TRUE, 7, 24, FALSE)
mix_1 
#> [1]  1  7 24  0
typeof(mix_1)
#> [1] "double"
class(mix_1)
#> [1] "numeric"

You can see that the logical values are converted to numbers: TRUE becomes 1 and FALSE becomes 0 when they appear with numbers. That’s because numbers are more complex than logical values, and R unifies all values into the most complex type. As a result, mix_1 is a numeric vector with four numbers. This is the most common use of the coercion rule in R.

Besides the coercion rule, which automatically converts all elements into the most complex type, you can also use functions to do the conversion manually. In particular, as.numeric() converts its argument into numeric type, and as.logical() converts its argument into logical values.
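One place where this conversion is handy is counting. The sketch below uses a hypothetical logical vector snow (whether it snowed on each of five days) to show that sum() and mean() treat TRUE as 1 and FALSE as 0.

```r
# Hypothetical record of whether it snowed on each of five days
snow <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

# sum() treats TRUE as 1 and FALSE as 0, so it counts the TRUE values
sum(snow)
#> [1] 3

# mean() gives the proportion of TRUE values
mean(snow)
#> [1] 0.6
```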

as.numeric(c(TRUE, FALSE))
#> [1] 1 0
as.logical(c(1, 0))
#> [1]  TRUE FALSE
as.logical(c("TRUE", "FALSE"))
#> [1]  TRUE FALSE
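Manual conversion does not always succeed. The following sketch shows a few boundary cases. The suppressWarnings() wrapper is used here only to keep the output clean, since R issues a warning when a string cannot be interpreted as a number.

```r
# A string that looks like a number converts cleanly
as.numeric("3.14")
#> [1] 3.14

# A string that does not look like a number becomes NA
# (without suppressWarnings(), R would also print a warning)
suppressWarnings(as.numeric("pig"))
#> [1] NA

# Any non-zero number converts to TRUE; only 0 converts to FALSE
as.logical(c(2, -1, 0))
#> [1]  TRUE  TRUE FALSE
```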

In addition to the combine function c(), the coercion rule also applies when you apply other operators or functions to values of two different types.

typeof(7 * TRUE)
#> [1] "double"
typeof(5 ^ FALSE)
#> [1] "double"
paste(7, TRUE)
#> [1] "7 TRUE"
paste(c(7, TRUE), collapse = " ")
#> [1] "7 1"

It is worth noting the mechanism of paste(). It automatically converts all the arguments into strings and then concatenates them. In the second paste() call, the coercion occurred during the creation of c(7, TRUE), resulting in a 1 instead of TRUE in the final string.

The second example mixes numbers with strings.

mix_2 <- c(8, "happy", 26, "string")
mix_2 
#> [1] "8"      "happy"  "26"     "string"
class(mix_2)
#> [1] "character"
class(paste(8, "happy"))
#> [1] "character"

You can see that both 8 and 26 are converted into strings since strings are more complex than numbers. Then mix_2 will be a character vector. The same thing happens when you use the paste() function with arguments being a number and a string.

To manually convert an input into a character type, you can use the as.character() function.

as.character(1:5)
#> [1] "1" "2" "3" "4" "5"

The next example mixes logical values, numbers and strings.

mix_3 <- c(16, TRUE, "pig")
mix_3
#> [1] "16"   "TRUE" "pig"
class(mix_3)
#> [1] "character"
mix_4 <- paste(16, TRUE, "pig")
mix_4
#> [1] "16 TRUE pig"
class(mix_4)
#> [1] "character"

You can see that in both mix_3 and mix_4, both 16 and TRUE are converted to strings! That’s because values of character type are the most complex among all values.

You can also apply as.character() directly to logical values.

as.character(c(TRUE, FALSE))
#> [1] "TRUE"  "FALSE"

Next, let’s see an interesting example in which we have two layers of coercion.

mix_5 <- c(c(16, TRUE), "pig")
mix_5
#> [1] "16"  "1"   "pig"

In mix_5, the inner c(16, TRUE) is evaluated first and converted to c(16, 1), since numbers are more complex than logical values. Then c(16, 1) is converted to c("16", "1") when you combine it with "pig", leading to the result of mix_5. This differs from mix_3, where TRUE was converted directly to "TRUE".
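The two layers can be made explicit by storing the intermediate result. This is a minimal sketch; the names inner_vec and outer_vec are introduced here only for illustration.

```r
# Layer 1: the inner c() mixes a number and a logical, giving a double vector
inner_vec <- c(16, TRUE)
inner_vec
#> [1] 16  1
typeof(inner_vec)
#> [1] "double"

# Layer 2: the outer c() mixes that double vector with a string,
# so the already-converted 1 becomes "1" rather than "TRUE"
outer_vec <- c(inner_vec, "pig")
outer_vec
#> [1] "16"  "1"   "pig"
```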

Lastly, let’s talk about coercion within numeric values. In particular, we have learned that numeric values can be stored as one of two types: integers and doubles. In the coercion rule, we have \[\mbox{integer} < \mbox{double}.\]

Let’s see the following examples.

typeof(c(1, 5L))
#> [1] "double"
typeof(c(TRUE, 5L))
#> [1] "integer"
typeof(1 + 5L)
#> [1] "double"

Let’s now summarize the coercion rule of all types we have learned. \[\mbox{logical} < \mbox{integer} < \mbox{double} < \mbox{character}\]
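As a quick check of this ordering, the sketch below combines neighboring types pairwise and then all four at once. In each case the result takes the most complex type present.

```r
# logical + integer -> integer
typeof(c(TRUE, 1L))
#> [1] "integer"

# integer + double -> double
typeof(c(1L, 2.5))
#> [1] "double"

# double + character -> character
typeof(c(2.5, "pig"))
#> [1] "character"

# All four types together -> character, the most complex type
typeof(c(TRUE, 1L, 2.5, "pig"))
#> [1] "character"
```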

2.2.4 Complex Vectors

Another vector class in R is complex, which stores complex numbers.

my_complex <- c(1 + 2i, 3 + 4i, -3 - 4i)
my_complex
#> [1]  1+2i  3+4i -3-4i
class(my_complex)
#> [1] "complex"
typeof(my_complex)
#> [1] "complex"
str(my_complex)
#>  cplx [1:3] 1+2i 3+4i -3-4i

You can use the functions Re(), Im(), and Mod() to get the real part, imaginary part, and the modulus of the complex vector, respectively.

Re(my_complex)
#> [1]  1  3 -3
Im(my_complex)
#> [1]  2  4 -4
Mod(my_complex)
#> [1] 2.236068 5.000000 5.000000
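The modulus is the square root of the sum of the squared real and imaginary parts. As a brief sketch (the name z is used here only for illustration), you can verify this relationship directly. The last line also shows that combining a double with a complex value gives a complex vector, so the coercion rule extends to complex numbers as well.

```r
# Same values as my_complex, redefined so this block runs on its own
z <- c(1 + 2i, 3 + 4i, -3 - 4i)

# Mod() agrees with sqrt(Re^2 + Im^2), up to floating-point tolerance
all.equal(Mod(z), sqrt(Re(z)^2 + Im(z)^2))
#> [1] TRUE

# Combining a double with a complex value yields a complex vector
typeof(c(1.5, 2i))
#> [1] "complex"
```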

2.2.5 Exercises

  1. After running the following code, what do you think are the storage types of z and w? Explain the reason.

x <- 5L
y <- 6L
z <- x + y
w <- x + 1.1

  2. If there are 3 lions, 5 tigers, 7 birds and 2 monkeys in the zoo, please write R code to create a named numeric vector zoo_1 to represent the information. Then, get its "names" attribute.

  3. The zoo manager found out that a tiger ran away. Use the assignment operator to update the named vector zoo_1.

  4. Looking at the following code without running it in R, what are the storage types of mix_1, mix_2, mix_3, mix_4, mix_5, mix_6, and mix_7? Verify your answers by running the code in R and explain the reason.

int_1 <- 5L
int_2 <- 6L
num_1 <- 2
char_1 <- "pig"
logi_1 <- TRUE
mix_1 <- int_1 + int_2 
mix_2 <- int_1 + num_1
mix_3 <- int_1/int_2
mix_4 <- c(num_1, char_1)
mix_5 <- c(num_1, logi_1)
mix_6 <- c(num_1, char_1, logi_1)
mix_7 <- paste(int_1, logi_1)