3.4 Tibble

Having covered data frames in Section 3.3, we now introduce a modern version of the data frame, called a tibble. Tibbles are data frames with modifications that make coding easier. To use the tibble class, you need to install the tibble package, which is part of the tidyverse collection of packages.

install.packages("tibble")

3.4.1 Introduction to tibbles

After installing the tibble package, you can load it and create a tibble by passing vectors as arguments to the tibble() function, much as you would create a data frame with data.frame().

library(tibble)
animal <- rep(c("sheep", "pig"), c(3, 3))
year <- rep(2019:2021, 2)
healthy <- c(rep(TRUE, 5), FALSE)
my_tibble <- tibble(animal, year, healthy)
my_tibble
#> # A tibble: 6 × 3
#>   animal  year healthy
#>   <chr>  <int> <lgl>  
#> 1 sheep   2019 TRUE   
#> 2 sheep   2020 TRUE   
#> 3 sheep   2021 TRUE   
#> 4 pig     2019 TRUE   
#> 5 pig     2020 TRUE   
#> 6 pig     2021 FALSE

Another way to create a tibble is to apply the as_tibble() function to an existing data frame.

my_data_frame <- data.frame(animal, year, healthy)
as_tibble(my_data_frame)
#> # A tibble: 6 × 3
#>   animal  year healthy
#>   <chr>  <int> <lgl>  
#> 1 sheep   2019 TRUE   
#> 2 sheep   2020 TRUE   
#> 3 sheep   2021 TRUE   
#> 4 pig     2019 TRUE   
#> 5 pig     2020 TRUE   
#> 6 pig     2021 FALSE

From the output, we can see that the type of each column is shown under its name, which is very helpful. Another useful feature of tibbles compared to data frames is that printing a tibble shows at most the first 10 rows and only as many columns as fit the width of the output window, which keeps the console from being overcrowded.

x <- 1:1e+05
tibble(id = x, value = sin(x))
#> # A tibble: 100,000 × 2
#>       id  value
#>    <int>  <dbl>
#>  1     1  0.841
#>  2     2  0.909
#>  3     3  0.141
#>  4     4 -0.757
#>  5     5 -0.959
#>  6     6 -0.279
#>  7     7  0.657
#>  8     8  0.989
#>  9     9  0.412
#> 10    10 -0.544
#> # ℹ 99,990 more rows

Be warned that running the following code will flood your console with numbers.

data.frame(id = x, value = sin(x))
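If ten rows are too few, the number of rows printed can be set explicitly. The following is a minimal sketch that relies on the n argument of the print method for tibbles and on head() from base R; the object names big, big_df, and first_rows are ours, chosen only for illustration.

```r
library(tibble)

x <- 1:1e+05
big <- tibble(id = x, value = sin(x))

# Show the first 15 rows of the tibble instead of the default 10
print(big, n = 15)

# For an ordinary data frame, head() keeps the console readable
big_df <- data.frame(id = x, value = sin(x))
first_rows <- head(big_df, 5)
first_rows
```

Using head() on the data frame gives a similar preview without converting it to a tibble.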

Once we have a tibble, let’s examine its class and structure.

class(my_tibble)
#> [1] "tbl_df"     "tbl"        "data.frame"
str(my_tibble)
#> tibble [6 × 3] (S3: tbl_df/tbl/data.frame)
#>  $ animal : chr [1:6] "sheep" "sheep" "sheep" "pig" ...
#>  $ year   : int [1:6] 2019 2020 2021 2019 2020 2021
#>  $ healthy: logi [1:6] TRUE TRUE TRUE TRUE TRUE FALSE

From the result, you can see that in addition to "data.frame", the tibble also has the classes "tbl_df" and "tbl", for which many useful functions are defined. We will use tibbles extensively throughout the rest of the book because of their advantages over ordinary data frames.
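Because a tibble carries both sets of classes, it passes the checks for a tibble and for a data frame. The short sketch below illustrates this with is_tibble() from the tibble package and is.data.frame() from base R; the name plain is ours.

```r
library(tibble)

my_tibble <- tibble(
  animal = rep(c("sheep", "pig"), c(3, 3)),
  year = rep(2019:2021, 2),
  healthy = c(rep(TRUE, 5), FALSE)
)

is_tibble(my_tibble)
#> [1] TRUE
is.data.frame(my_tibble)
#> [1] TRUE

# Converting back to an ordinary data frame drops the tibble classes
plain <- as.data.frame(my_tibble)
class(plain)
#> [1] "data.frame"
is_tibble(plain)
#> [1] FALSE
```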

For your convenience, we’ve summarized below the different variable types a tibble can include.

Abbreviation	Type
`<chr>`	character vector
`<int>`	integer
`<dbl>`	double
`<ord>`	ordered factor
`<fct>`	unordered factor
`<lgl>`	logical vector
`<date>`	dates
`<dttm>`	date-times
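To see these abbreviations in practice, the sketch below builds a small tibble with one column of each type listed above. The tibble types_demo and all of its values are made up for illustration; the comments mark the abbreviation we expect under each column name when it is printed.

```r
library(tibble)

types_demo <- tibble(
  name  = c("a", "b"),                                   # <chr>
  count = c(1L, 2L),                                     # <int>
  ratio = c(0.5, 1.5),                                   # <dbl>
  size  = factor(c("small", "large"),
                 levels = c("small", "large"),
                 ordered = TRUE),                        # <ord>
  group = factor(c("x", "y")),                           # <fct>
  flag  = c(TRUE, FALSE),                                # <lgl>
  day   = as.Date(c("2021-01-01", "2021-01-02")),        # <date>
  stamp = as.POSIXct(c("2021-01-01 10:00:00",
                       "2021-01-02 12:30:00"),
                     tz = "UTC")                         # <dttm>
)
types_demo

# The first class of each column, as reported by base R
col_classes <- vapply(types_demo, function(col) class(col)[1], character(1))
col_classes
```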

Since a tibble is also a data frame, the functions we learned for data frames, including those for adding observations or variables and for subsetting, can be used in the same way, apart from the subsetting difference described in Section 3.4.3. In addition, the tibble package offers functions that make some tasks easier.
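As a quick check that the familiar data frame tools carry over, the sketch below applies nrow(), logical row subsetting, and the $ operator to a tibble. The object name sick and the weight values are ours and serve only as an illustration.

```r
library(tibble)

my_tibble <- tibble(
  animal = rep(c("sheep", "pig"), c(3, 3)),
  year = rep(2019:2021, 2),
  healthy = c(rep(TRUE, 5), FALSE)
)

nrow(my_tibble)
#> [1] 6

# Logical row subsetting works as it does for a data frame
sick <- my_tibble[!my_tibble$healthy, ]
sick

# Adding a variable with the $ operator also works
my_tibble$weight <- c(110, 120, 140, NA, 300, 800)
names(my_tibble)
#> [1] "animal"  "year"    "healthy" "weight"
```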

3.4.2 Adding observations or variables in tibbles

Adding observations to a tibble is easier than adding them to a data frame, thanks to the add_row() function in the tibble package.

add_row(my_tibble, animal = "pig", year = c(2017, 2018), healthy = TRUE)
#> # A tibble: 8 × 3
#>   animal  year healthy
#>   <chr>  <dbl> <lgl>  
#> 1 sheep   2019 TRUE   
#> 2 sheep   2020 TRUE   
#> 3 sheep   2021 TRUE   
#> 4 pig     2019 TRUE   
#> 5 pig     2020 TRUE   
#> 6 pig     2021 FALSE  
#> 7 pig     2017 TRUE   
#> 8 pig     2018 TRUE

From the result, we can see that multiple rows can be added at once by specifying the values for each variable by name. Note that the recycling rule applies to variables for which only one value is specified. Note also that year is now shown as <dbl> rather than <int>, because 2017 and 2018 were supplied as doubles.
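Two details of add_row() are worth trying out: supplying the years as integer literals (with the L suffix) so that the column stays an integer, and using the .before argument to choose where the new rows go. The sketch below is one way to do this; the name more_pigs is ours. It also shows that add_row() returns a new tibble and leaves my_tibble unchanged.

```r
library(tibble)

my_tibble <- tibble(
  animal = rep(c("sheep", "pig"), c(3, 3)),
  year = rep(2019:2021, 2),
  healthy = c(rep(TRUE, 5), FALSE)
)

# Integer years keep the column type; .before = 4 places the new rows
# ahead of the original pig observations
more_pigs <- add_row(my_tibble,
                     animal = "pig",
                     year = c(2017L, 2018L),
                     healthy = TRUE,
                     .before = 4)

more_pigs$year
#> [1] 2019 2020 2021 2017 2018 2019 2020 2021
typeof(more_pigs$year)
#> [1] "integer"

# The original tibble still has six rows
nrow(my_tibble)
#> [1] 6
```

To keep the added rows, assign the result back, for example my_tibble <- add_row(my_tibble, ...).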

To add a variable, besides using the $ operator followed by a name as in data frames, you can also use the add_column() function.

add_column(my_tibble,
           weight = c(110, 120, 140, NA, 300, 800),
           height = c(2.2, 2.4, 2.7, 2, 2.1, 2.3))
#> # A tibble: 6 × 5
#>   animal  year healthy weight height
#>   <chr>  <int> <lgl>    <dbl>  <dbl>
#> 1 sheep   2019 TRUE       110    2.2
#> 2 sheep   2020 TRUE       120    2.4
#> 3 sheep   2021 TRUE       140    2.7
#> 4 pig     2019 TRUE        NA    2  
#> 5 pig     2020 TRUE       300    2.1
#> 6 pig     2021 FALSE      800    2.3
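By default the new columns are appended on the right. The sketch below uses the .after argument of add_column() to place a new column next to an existing one; the name with_weight is ours. As with add_row(), the original tibble is not modified.

```r
library(tibble)

my_tibble <- tibble(
  animal = rep(c("sheep", "pig"), c(3, 3)),
  year = rep(2019:2021, 2),
  healthy = c(rep(TRUE, 5), FALSE)
)

# Insert weight immediately after the animal column
with_weight <- add_column(my_tibble,
                          weight = c(110, 120, 140, NA, 300, 800),
                          .after = "animal")
names(with_weight)
#> [1] "animal"  "weight"  "year"    "healthy"

# my_tibble itself still has its three original columns
names(my_tibble)
#> [1] "animal"  "year"    "healthy"
```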

3.4.3 Tibble subsetting and modifying

While subsetting and modifying a tibble is very similar to doing so for a data frame, we would like to point out a key difference.

When you use [ and ] to subset a tibble, the result is always a tibble by default, even if only one column is selected. This differs from data frames, where my_data_frame[, 1] returns a vector. If you want to select a single column and return it as a vector, add drop = TRUE to the subsetting call. Note that drop = TRUE only takes effect when exactly one column is selected; subsetting a single row of a tibble with several columns still returns a one-row tibble.

my_tibble[, 1]  # a 6 × 1 tibble
#> # A tibble: 6 × 1
#>   animal
#>   <chr> 
#> 1 sheep 
#> 2 sheep 
#> 3 sheep 
#> 4 pig   
#> 5 pig   
#> 6 pig
my_data_frame[, 1]  # a vector
#> [1] "sheep" "sheep" "sheep" "pig"   "pig"   "pig"
my_tibble[, 1, drop = TRUE]  # a vector
#> [1] "sheep" "sheep" "sheep" "pig"   "pig"   "pig"
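Besides drop = TRUE, a single column can be extracted as a vector with [[ or $, just as for a data frame. The sketch below compares the three approaches; the names by_name, by_dollar, by_drop, and one_row are ours.

```r
library(tibble)

my_tibble <- tibble(
  animal = rep(c("sheep", "pig"), c(3, 3)),
  year = rep(2019:2021, 2),
  healthy = c(rep(TRUE, 5), FALSE)
)

by_name   <- my_tibble[["animal"]]               # extract by name with [[
by_dollar <- my_tibble$animal                    # extract with $
by_drop   <- my_tibble[, "animal", drop = TRUE]  # extract with drop = TRUE

identical(by_name, by_drop)
#> [1] TRUE
identical(by_dollar, by_drop)
#> [1] TRUE

# Selecting a single row keeps the tibble structure
one_row <- my_tibble[1, ]
is_tibble(one_row)
#> [1] TRUE
```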

3.4.4 Exercises

Consider the following tibble and data frame:

animal <- rep(c("sheep", "pig"), c(3, 3))
weight <- c(110, NA, 140, NA, 300, 800)
condition <- c("excellent", "good", NA, "excellent", "good", "average")
healthy <- c(rep(TRUE, 5), FALSE)
my_tibble <- tibble(animal, weight, condition, healthy)
my_data_frame <- data.frame(animal, weight, condition, healthy)
my_tibble
#> # A tibble: 6 × 4
#>   animal weight condition healthy
#>   <chr>   <dbl> <chr>     <lgl>  
#> 1 sheep     110 excellent TRUE   
#> 2 sheep      NA good      TRUE   
#> 3 sheep     140 <NA>      TRUE   
#> 4 pig        NA excellent TRUE   
#> 5 pig       300 good      TRUE   
#> 6 pig       800 average   FALSE

Add the following observation to my_tibble: animal = "pig", weight = 900, condition = "average", and healthy = FALSE.
Without running the code in R, what do you think is the difference between my_tibble[, 1] and my_data_frame[, 1]? How can you reproduce my_data_frame[, 1] by subsetting my_tibble?