3.4 Tibble
Having covered data frames in Section 3.3, we now introduce a modern version of the data frame, called a tibble. Tibbles are data frames with modifications that make coding easier. To use the tibble class, you need to install the tibble package, which is part of the tidyverse collection of packages.
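As a minimal sketch of the installation step, the code below installs tibble only if it is not already available and then loads it. The check with requireNamespace() is our own convenience rather than a requirement; running install.packages("tidyverse") instead would install tibble along with the other tidyverse packages.

```r
# Install tibble only if it is missing (this check is a convenience, not required)
if (!requireNamespace("tibble", quietly = TRUE)) {
  install.packages("tibble")
}

# Load the package so that tibble() and related functions are available
library(tibble)
```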
3.4.1 Introduction to tibbles
After installing the tibble package, you can load it and create a tibble by passing vectors as arguments to the tibble() function, much as you would create a data frame.
library(tibble)
animal <- rep(c("sheep", "pig"), c(3, 3))
year <- rep(2019:2021, 2)
healthy <- c(rep(TRUE, 5), FALSE)
my_tibble <- tibble(animal, year, healthy)
my_tibble
#> # A tibble: 6 × 3
#> animal year healthy
#> <chr> <int> <lgl>
#> 1 sheep 2019 TRUE
#> 2 sheep 2020 TRUE
#> 3 sheep 2021 TRUE
#> 4 pig 2019 TRUE
#> 5 pig 2020 TRUE
#> 6 pig 2021 FALSE
Another way to create a tibble is to apply the as_tibble() function to an existing data frame.
my_data_frame <- data.frame(animal, year, healthy)
as_tibble(my_data_frame)
#> # A tibble: 6 × 3
#> animal year healthy
#> <chr> <int> <lgl>
#> 1 sheep 2019 TRUE
#> 2 sheep 2020 TRUE
#> 3 sheep 2021 TRUE
#> 4 pig 2019 TRUE
#> 5 pig 2020 TRUE
#> 6 pig 2021 FALSE
From the output, we can see that the type of each column is shown under its name, which is very helpful. Another useful feature of a tibble compared to a data frame is that, when a tibble has more than 20 rows, printing it shows by default only the first 10 rows and only as many columns as fit in the output window, which keeps the console from being overcrowded.
x <- 1:1e+05
tibble(id = x, value = sin(x))
#> # A tibble: 100,000 × 2
#> id value
#> <int> <dbl>
#> 1 1 0.841
#> 2 2 0.909
#> 3 3 0.141
#> 4 4 -0.757
#> 5 5 -0.959
#> 6 6 -0.279
#> 7 7 0.657
#> 8 8 0.989
#> 9 9 0.412
#> 10 10 -0.544
#> # … with 99,990 more rows
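If ten rows are not enough, the number of printed rows can be adjusted. The sketch below assumes the tibble package is installed and stores the tibble above under the illustrative name big_tibble; it uses the n argument of print() to request more rows.

```r
library(tibble)

x <- 1:1e5
big_tibble <- tibble(id = x, value = sin(x))  # big_tibble is an illustrative name

# Printing shows only the first 10 rows by default
big_tibble

# Ask for the first 15 rows instead
print(big_tibble, n = 15)

# The tibble itself still holds all rows; only the display is shortened
nrow(big_tibble)
```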
Printing a data frame of the same size, for example data.frame(id = x, value = sin(x)), would instead flood your console with numbers, so be prepared before you try it.
Now that we have a tibble, let’s examine its class and structure.
class(my_tibble)
#> [1] "tbl_df" "tbl" "data.frame"
str(my_tibble)
#> tibble [6 × 3] (S3: tbl_df/tbl/data.frame)
#> $ animal : chr [1:6] "sheep" "sheep" "sheep" "pig" ...
#> $ year : int [1:6] 2019 2020 2021 2019 2020 2021
#> $ healthy: logi [1:6] TRUE TRUE TRUE TRUE TRUE FALSE
From the result, you can see that in addition to "data.frame", the tibble also has the classes "tbl_df" and "tbl", for which many useful methods are defined. We will use tibbles extensively throughout the rest of the book because of their advantages over the original data frames.
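Because "data.frame" is among its classes, a tibble passes the usual data frame checks. The short sketch below, which rebuilds my_tibble so that it can run on its own, illustrates this with is.data.frame(), inherits(), and tibble's is_tibble().

```r
library(tibble)

animal <- rep(c("sheep", "pig"), c(3, 3))
year <- rep(2019:2021, 2)
healthy <- c(rep(TRUE, 5), FALSE)
my_tibble <- tibble(animal, year, healthy)
my_data_frame <- data.frame(animal, year, healthy)

# A tibble is a data frame, so base R treats it as one
is.data.frame(my_tibble)

# It also carries the tibble-specific class "tbl_df"
inherits(my_tibble, "tbl_df")

# An ordinary data frame, by contrast, is not a tibble
is_tibble(my_data_frame)
```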
For your convenience, we’ve summarized below the different variable types a tibble can include.
Type | Description |
---|---|
<chr> | character vector |
<int> | integer |
<dbl> | double |
<ord> | ordered factor |
<fct> | unordered factor |
<lgl> | logical vector |
<date> | dates |
<dttm> | date-times |
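To see these abbreviations in one place, the sketch below builds a small tibble with one column per type. The column names and values are made up for illustration.

```r
library(tibble)

# Column names and values are illustrative only
types_demo <- tibble(
  name     = c("a", "b", "c"),                        # <chr>
  count    = 1:3,                                     # <int>
  score    = c(1.5, 2.5, 3.5),                        # <dbl>
  grade    = factor(c("low", "mid", "high"),
                    levels = c("low", "mid", "high"),
                    ordered = TRUE),                  # <ord>
  group    = factor(c("x", "y", "x")),                # <fct>
  passed   = c(TRUE, FALSE, TRUE),                    # <lgl>
  day      = as.Date(c("2021-01-01", "2021-01-02",
                       "2021-01-03")),                # <date>
  measured = as.POSIXct(c("2021-01-01 08:00:00",
                          "2021-01-02 09:30:00",
                          "2021-01-03 17:45:00"),
                        tz = "UTC")                   # <dttm>
)

# The row under the column names shows the type abbreviations
types_demo
```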
Since a tibble is also a data frame, the functions we learned for data frames, including adding observations or variables and subsetting, can be used in largely the same way. However, the tibble package offers additional functions that make some tasks easier.
3.4.2 Adding observations or variables in tibbles
Adding observations to a tibble is easier than adding them to a data frame, thanks to the add_row() function in the tibble package.
add_row(my_tibble, animal = "pig", year = c(2017, 2018), healthy = TRUE)
#> # A tibble: 8 × 3
#> animal year healthy
#> <chr> <dbl> <lgl>
#> 1 sheep 2019 TRUE
#> 2 sheep 2020 TRUE
#> 3 sheep 2021 TRUE
#> 4 pig 2019 TRUE
#> 5 pig 2020 TRUE
#> 6 pig 2021 FALSE
#> 7 pig 2017 TRUE
#> 8 pig 2018 TRUE
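From the results, we can see that multiple rows can be added at once by specifying the values for each variable by name. Note that the recycling rule applies to variables with only one value specified, such as animal and healthy here. Also note that year is now shown as <dbl> rather than <int>, because the new values c(2017, 2018) are doubles.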
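add_row() also lets you choose where the new rows go. The sketch below rebuilds my_tibble so that it runs on its own and uses the .before argument to place a new observation at the top; writing the year as 2018L keeps the column an integer. The added values are made up for illustration.

```r
library(tibble)

animal <- rep(c("sheep", "pig"), c(3, 3))
year <- rep(2019:2021, 2)
healthy <- c(rep(TRUE, 5), FALSE)
my_tibble <- tibble(animal, year, healthy)

# Insert one new observation before the current first row;
# 2018L is an integer, so year stays <int>
with_new_row <- add_row(my_tibble, animal = "sheep", year = 2018L,
                        healthy = TRUE, .before = 1)
with_new_row

# add_row() returns a new tibble; my_tibble itself is unchanged
nrow(my_tibble)
```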
To add a new variable, in addition to using the $ operator followed by a name as with data frames, you can also use the add_column() function.
add_column(my_tibble, weight = c(110, 120, 140, NA, 300, 800), height = c(2.2, 2.4,
2.7, 2, 2.1, 2.3))
#> # A tibble: 6 × 5
#> animal year healthy weight height
#> <chr> <int> <lgl> <dbl> <dbl>
#> 1 sheep 2019 TRUE 110 2.2
#> 2 sheep 2020 TRUE 120 2.4
#> 3 sheep 2021 TRUE 140 2.7
#> 4 pig 2019 TRUE NA 2
#> 5 pig 2020 TRUE 300 2.1
#> 6 pig 2021 FALSE 800 2.3
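Both ways of adding a variable can be sketched side by side. The example rebuilds my_tibble so that it runs on its own; the id column and the .before argument of add_column(), which controls where the new column is placed, are included for illustration.

```r
library(tibble)

animal <- rep(c("sheep", "pig"), c(3, 3))
year <- rep(2019:2021, 2)
healthy <- c(rep(TRUE, 5), FALSE)
my_tibble <- tibble(animal, year, healthy)

# Option 1: assign a new column with the $ operator, as with data frames
with_weight <- my_tibble
with_weight$weight <- c(110, 120, 140, NA, 300, 800)
names(with_weight)

# Option 2: add_column() returns a new tibble and can position the column
with_id <- add_column(my_tibble, id = 1:6, .before = "animal")
names(with_id)
```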
3.4.3 Tibble subsetting and modifying
While subsetting and modifying a tibble is very similar to doing so for a data frame, we would like to point out a key difference.
When you use [ and ] to subset a tibble, the result is always a tibble by default, even if only one column is selected. This behavior differs from subsetting data frames using [ and ] with a row and a column index, where selecting a single column returns a vector by default. If you want to select only one column and return it as a vector, you need to add drop = TRUE in the subsetting process. Note that drop = TRUE only takes effect when the result has a single column, so a single row spanning several columns is still returned as a tibble.
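The difference in subsetting behavior can be checked directly. The sketch below rebuilds my_tibble and the matching data frame so that it runs on its own, then compares several ways of selecting the year column.

```r
library(tibble)

animal <- rep(c("sheep", "pig"), c(3, 3))
year <- rep(2019:2021, 2)
healthy <- c(rep(TRUE, 5), FALSE)
my_tibble <- tibble(animal, year, healthy)
my_data_frame <- data.frame(animal, year, healthy)

# Selecting one column of a tibble with [ ] still gives a tibble
is_tibble(my_tibble[, "year"])

# The same selection on a data frame gives a plain vector
is.vector(my_data_frame[, "year"])

# drop = TRUE makes the tibble return a vector as well
my_tibble[, "year", drop = TRUE]

# [[ ]] and $ always extract the column as a vector
identical(my_tibble[["year"]], my_tibble$year)
```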
3.4.4 Exercises
Consider the following tibble,
animal <- rep(c("sheep", "pig"), c(3, 3))
weight <- c(110, NA, 140, NA, 300, 800)
condition <- c("excellent", "good", NA, "excellent", "good", "average")
healthy <- c(rep(TRUE, 5), FALSE)
my_tibble <- tibble(animal, weight, condition, healthy)
my_data_frame <- data.frame(animal, weight, condition, healthy)
my_tibble
#> # A tibble: 6 × 4
#> animal weight condition healthy
#> <chr> <dbl> <chr> <lgl>
#> 1 sheep 110 excellent TRUE
#> 2 sheep NA good TRUE
#> 3 sheep 140 <NA> TRUE
#> 4 pig NA excellent TRUE
#> 5 pig 300 good TRUE
#> 6 pig 800 average FALSE
- Add the following observation to my_tibble: animal = "pig", weight = 900, condition = "average", and healthy = FALSE.
- Without running the code in R, what do you think is the difference between my_tibble[, 1] and my_data_frame[, 1]? How can you reproduce my_data_frame[, 1] by subsetting my_tibble?