2.2 Vectors: Storage Types, Attributes and Coercion
Having learned about vectors in Section 2.1, we now introduce an important concept called storage types, using two common numeric types (integers and doubles) as examples. We then introduce attributes through named vectors, and discuss the coercion rule that applies when you combine values of different types into a single vector.
2.2.1 Storage Types
Having learned the numeric vector, character vector, and logical vector, it is time to introduce how they are stored in R. To find the internal storage type of an R object, you can use the typeof() function. First, let’s see an example of a numeric vector.
my_double <- c(1, 3, 4)
class(my_double) # class
#> [1] "numeric"
is.numeric(my_double)
#> [1] TRUE
typeof(my_double) #storage type
#> [1] "double"
We can see that the internal storage type of my_double is double, meaning that my_double is stored as double-precision numeric values. Looking at the values of my_double, it is easy to see that they are all integers. You may be wondering whether it is necessary to store integers in a double type. The answer is no. You can store them in an integer type instead, which uses about half the memory of doubles (4 bytes per value instead of 8). The tricky part is that you usually need to explicitly tell R that you are storing them as integers.
To create an integer vector, you can still use the c() function with the integers separated by commas as arguments. However, you need to put an “L” after each integer. Let’s create an integer vector and check its typeof().
my_int <- c(1L, 3L, 4L)
typeof(my_int)
#> [1] "integer"
class(my_int)
#> [1] "integer"
is.numeric(my_int)
#> [1] TRUE
You can see that my_int is indeed of integer type, with its class being integer as well.
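As a small sketch of how the “L” suffix changes the storage type, the lines below compare a few ways of writing whole numbers. The colon operator and as.integer() have not been introduced in this section; they are included here only for illustration.

```r
typeof(5)                        # no suffix, so R stores a double
#> [1] "double"
typeof(5L)                       # the L suffix requests an integer
#> [1] "integer"
typeof(1:3)                      # the colon operator produces integers
#> [1] "integer"
typeof(as.integer(c(1, 3, 4)))   # explicit conversion of a double vector
#> [1] "integer"
```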
It is also worth noting that the displayed values of my_double and my_int are the same.
my_double
#> [1] 1 3 4
my_int
#> [1] 1 3 4
In addition to class() and typeof(), another useful function is str(), which gives the detailed structure of an R object along with the first few values.
Whenever we have an R object, it is useful to apply class(), typeof(), and str() on it.
str(my_double)
#> num [1:3] 1 3 4
str(my_int)
#> int [1:3] 1 3 4
From the str() results, my_int is stored as integers while my_double is stored as double-precision numeric values.
Despite the differences between integers and doubles, you can usually ignore their differences unless you are working on a very big data set. R will automatically convert objects between integers and doubles when necessary.
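To make the memory difference concrete, here is a rough sketch that compares two large vectors. The names big_int and big_dbl and the length of one million are illustrative assumptions; integer(), numeric(), and object.size() are base R functions that have not been introduced in this section.

```r
# Two hypothetical vectors of one million zeros each.
n <- 1e6
big_int <- integer(n)   # stored as integers
big_dbl <- numeric(n)   # stored as doubles

typeof(big_int)
#> [1] "integer"
typeof(big_dbl)
#> [1] "double"

# object.size() reports an estimate of the memory used, in bytes.
size_int <- as.numeric(object.size(big_int))
size_dbl <- as.numeric(object.size(big_dbl))
size_dbl / size_int     # close to 2

# Arithmetic that mixes the two types silently produces a double.
typeof(big_int[1:3] + 0.5)
#> [1] "double"
```

For vectors of a few elements the difference is negligible, which is why it can usually be ignored.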
Let’s look at the internal storage type and structure of character and logical vectors.
my_char <- c("pig", "monkey")
my_logi <- c(TRUE, FALSE, TRUE)
typeof(my_char)
typeof(my_logi)
str(my_char)
str(my_logi)
The results align with our expectations: character vectors are stored as "character" and logical vectors are stored as "logical".
2.2.2 Named Vectors and Attributes
In addition to storing the values of a vector, you can also create named vectors. To do that, the first option is to give each element a name in the process of creating the vector, using the form name = value.
x_wo_name <- c(165, 60, 22)
x_wo_name
#> [1] 165  60  22
x_w_name <- c(height = 165, weight = 60, BMI = 22)
x_w_name
#> height weight    BMI
#>    165     60     22
For a named vector, you can access its elements via the names, and update the values via the assignment operator.
x_w_name["height"]
#> height
#>    165
x_w_name["weight"] <- x_w_name["weight"] + 10
x_w_name
#> height weight    BMI
#>    165     70     22
A second way to assign names to a vector is to use the names() function. For example, suppose we want to use a logical vector to represent whether it snows on each day.
y <- c(TRUE, FALSE, TRUE)
y
#> [1]  TRUE FALSE  TRUE
names(y) <- c("Jan 1", "Jan 2", "Jan 3")
y
#> Jan 1 Jan 2 Jan 3
#>  TRUE FALSE  TRUE
Note that this assignment looks similar to an ordinary object assignment, except that names(y) appears on the left-hand side. The values assigned as names need to form a character vector.
The names of a vector are one kind of attribute of an R object. We will introduce other kinds of attributes as we encounter them. The names attribute provides additional information regarding the meaning of each element, and enables us to extract values using the names (to be discussed in Section 2.6.3).
To examine the attributes of an R object, you can use the attributes() function. The str() function also displays the attributes.
attributes(x_w_name)
#> $names
#> [1] "height" "weight" "BMI"
str(x_w_name)
#> Named num [1:3] 165 70 22
#> - attr(*, "names")= chr [1:3] "height" "weight" "BMI"
str(x_wo_name)
#> num [1:3] 165 60 22
You can see that x_w_name is a named numeric vector, with the names attribute. In contrast, the str() function tells us x_wo_name is a plain numeric vector with no attributes.
To directly extract a certain attribute of an R object, you can use the attr() function on it, with the second argument being the specific attribute you wish to extract.
attr(x_w_name, "names")
#> [1] "height" "weight" "BMI"
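The relationship between names(), attr(), and attributes() can be sketched as follows. The vector v mirrors the named vector used above, and "unit_note" is a made-up attribute name chosen only for this illustration; R attaches no special meaning to it.

```r
# Illustrative named vector, mirroring the one used in the text.
v <- c(height = 165, weight = 60, BMI = 22)

# names() reads the same information as attr(v, "names").
identical(names(v), attr(v, "names"))
#> [1] TRUE

# attr() can also set an attribute. "unit_note" is a hypothetical name.
attr(v, "unit_note") <- "height in cm, weight in kg"
names(attributes(v))
#> [1] "names"     "unit_note"

# Assigning NULL to the names removes that attribute; the values stay intact.
names(v) <- NULL
names(attributes(v))
#> [1] "unit_note"
```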
2.2.3 The Coercion Rule
So far, you know that a vector holds values of the same type: numeric values (integers or doubles), strings, or logical values. In practice, however, you may have values of different types. If you still want to combine them into a vector, R will convert all values to the most complex type present, a behavior usually called the coercion rule. Specifically, R uses the following order of complexity (from simple to complex). \[\mbox{logical} < \mbox{numeric} < \mbox{character}\]
Let’s see a few examples to learn how the coercion works. The first example mixes logical values with numbers.
mix_1 <- c(TRUE, 7, 24, FALSE)
mix_1
#> [1]  1  7 24  0
typeof(mix_1)
#> [1] "double"
class(mix_1)
#> [1] "numeric"
You can see that the logical values are converted to numbers: TRUE is converted to 1 and FALSE is converted to 0 when they appear with numbers. That’s because numbers are more complex than logical values, and R unifies all values into the most complex type. As a result, mix_1 is a numeric vector with four numbers. This is the most common use of the coercion rule in R.
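One practical consequence of converting TRUE to 1 and FALSE to 0 is that logical vectors can be counted and averaged directly. The sketch below uses a hypothetical vector snowed; sum() and mean() are base R functions not yet covered in this section.

```r
# Hypothetical record of whether it snowed on each of five days.
snowed <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

sum(snowed)    # number of snowy days: each TRUE counts as 1
#> [1] 3
mean(snowed)   # proportion of snowy days
#> [1] 0.6
```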
Besides the coercion rule, which automatically converts all elements into the most complex type, you can also use functions to do the conversion manually. In particular, as.numeric() converts its argument into numeric type, and as.logical() converts its argument into logical values.
as.numeric(c(TRUE, FALSE))
#> [1] 1 0
as.logical(c(1, 0))
#> [1] TRUE FALSE
as.logical(c("TRUE", "FALSE"))
#> [1] TRUE FALSE
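Manual conversion does not always succeed. The following sketch shows a few edge cases; the input values are arbitrary illustrations, and suppressWarnings() is used only to keep the output clean.

```r
as.numeric("3.14")                    # a string that looks like a number
#> [1] 3.14
as.logical(c(2, -1, 0))               # any non-zero number becomes TRUE
#> [1]  TRUE  TRUE FALSE
as.logical(c("true", "T", "yes"))     # "yes" is not recognized, giving NA
#> [1] TRUE TRUE   NA
suppressWarnings(as.numeric("pig"))   # NA; R would normally also warn here
#> [1] NA
```

A result of NA signals that R could not interpret the input as the requested type.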
In addition to the combine function c(), the coercion rule also applies when you apply other operators or functions to values of two different types.
typeof(7 * TRUE)
#> [1] "double"
typeof(5 ^ FALSE)
#> [1] "double"
paste(7, TRUE)
#> [1] "7 TRUE"
paste(c(7, TRUE), collapse = " ")
#> [1] "7 1"
It is worth noting the mechanism of paste(). It automatically converts all the arguments into strings and then concatenates them. In the second paste() call, the coercion occurred during the creation of c(7, TRUE), resulting in a 1 instead of TRUE in the final string.
The second example mixes numbers with strings.
mix_2 <- c(8, "happy", 26, "string")
mix_2
#> [1] "8"      "happy"  "26"     "string"
class(mix_2)
#> [1] "character"
class(paste(8, "happy"))
#> [1] "character"
You can see that both 8 and 26 are converted into strings since strings are more complex than numbers. As a result, mix_2 is a character vector.
The same thing happens when you use the paste() function with a number and a string as arguments.
To manually convert an input into a character type, you can use the as.character() function.
as.character(1:5)
#> [1] "1" "2" "3" "4" "5"
The next example mixes logical values, numbers and strings.
mix_3 <- c(16, TRUE, "pig")
mix_3
#> [1] "16"   "TRUE" "pig"
class(mix_3)
#> [1] "character"
mix_4 <- paste(16, TRUE, "pig")
mix_4
#> [1] "16 TRUE pig"
class(mix_4)
#> [1] "character"
You can see that in both mix_3 and mix_4, both 16 and TRUE are converted to strings! That’s because values of character type are the most complex among all values.
Here, you can also use as.character() on logical values.
as.character(c(TRUE, FALSE))
#> [1] "TRUE" "FALSE"
Next, let’s see an interesting example in which we have two layers of coercion.
mix_5 <- c(c(16, TRUE), "pig")
mix_5
#> [1] "16"  "1"   "pig"
In contrast to mix_3, when you create the vector mix_5, you first have c(16, TRUE), which is converted to c(16, 1) since numbers are more complex than logical values. Then, c(16, 1) is converted to c("16", "1") when you combine it with "pig", leading to the result in mix_5.
Lastly, let’s talk about coercion within numeric values. We have learned that numeric values can be stored as one of two types, namely integers and doubles. In the coercion rule, we have
\[\mbox{integer} < \mbox{double}.\]
Let’s see the following examples.
typeof(c(1, 5L))
#> [1] "double"
typeof(c(TRUE, 5L))
#> [1] "integer"
typeof(1 + 5L)
#> [1] "double"
Let’s now summarize the coercion rule of all types we have learned. \[\mbox{logical} < \mbox{integer} < \mbox{double} < \mbox{character}\]
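The full ordering can be checked with a short sketch that adds one more complex value at each step. The variable names step_1 to step_4 are illustrative only.

```r
step_1 <- c(TRUE, FALSE)           # logical values only
step_2 <- c(TRUE, 2L)              # add an integer
step_3 <- c(TRUE, 2L, 3.5)         # add a double
step_4 <- c(TRUE, 2L, 3.5, "pig")  # add a string

typeof(step_1)
#> [1] "logical"
typeof(step_2)
#> [1] "integer"
typeof(step_3)
#> [1] "double"
typeof(step_4)
#> [1] "character"
```

At each step the whole vector takes the type of its most complex element.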
2.2.4 Complex Vectors
Another vector class in R is complex, which stores complex numbers.
my_complex <- c(1 + 2i, 3 + 4i, -3 - 4i)
my_complex
#> [1]  1+2i  3+4i -3-4i
class(my_complex)
#> [1] "complex"
typeof(my_complex)
#> [1] "complex"
str(my_complex)
#> cplx [1:3] 1+2i 3+4i -3-4i
You can use the functions Re(), Im(), and Mod() to get the real part, imaginary part, and modulus of each element of the complex vector, respectively.
Re(my_complex)
#> [1] 1 3 -3
Im(my_complex)
#> [1] 2 4 -4
Mod(my_complex)
#> [1] 2.236068 5.000000 5.000000
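As a quick check of how these functions relate, the modulus of a complex number equals the square root of the sum of its squared real and imaginary parts. The sketch below recomputes it by hand; the names z and manual_mod are illustrative, and sqrt() and all.equal() are base R functions not covered in this section.

```r
z <- c(1 + 2i, 3 + 4i, -3 - 4i)

# Modulus computed from the real and imaginary parts.
manual_mod <- sqrt(Re(z)^2 + Im(z)^2)

# all.equal() compares numbers up to a small numerical tolerance.
all.equal(manual_mod, Mod(z))
#> [1] TRUE
```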
2.2.5 Exercises
- After running the following code, what do you think are the storage types of z and w? Explain the reason.
x <- 5L
y <- 6L
z <- x + y
w <- x + 1.1
- If there are 3 lions, 5 tigers, 7 birds and 2 monkeys in the zoo, please write R code to create a named numeric vector zoo_1 to represent the information. Then, get its "names" attribute.
- The zoo manager found out a tiger ran away. Use the assignment operator to update the named vector zoo_1.
- Looking at the following code without running it in R, what are the storage types of mix_1, mix_2, mix_3, mix_4, mix_5, mix_6, and mix_7? Verify your answers by running the code in R and explain the reason.
int_1 <- 5L
int_2 <- 6L
num_1 <- 2
char_1 <- "pig"
logi_1 <- TRUE
mix_1 <- int_1 + int_2
mix_2 <- int_1 + num_1
mix_3 <- int_1/int_2
mix_4 <- c(num_1, char_1)
mix_5 <- c(num_1, logi_1)
mix_6 <- c(num_1, char_1, logi_1)
mix_7 <- paste(int_1, logi_1)