12.4 The Apply Family and Purrr
In Chapters 11 and 12, you learned how to write loops and define functions. A natural next question is: how can we apply a function to every element of a vector or list without writing an explicit for loop? R provides two powerful approaches: the base R apply family of functions and the purrr package from the tidyverse.
12.4.1 The Apply Family
The base R apply family includes several functions that apply a given function to elements of a data structure. The most commonly used ones are lapply(), sapply(), and vapply().
12.4.1.1 lapply(): Apply and Return a List
The function lapply(X, FUN) applies the function FUN to each element of X and always returns a list.
my_vec <- c(1, 4, 9, 16, 25)
lapply(my_vec, sqrt)
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 2
#>
#> [[3]]
#> [1] 3
#>
#> [[4]]
#> [1] 4
#>
#> [[5]]
#> [1] 5
You can also use lapply() on a list.
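The list `my_list` used in the purrr examples later in this section is not defined in the text shown here. The sketch below assumes a hypothetical definition, `my_list <- list(a = 1:5, b = 6:10, c = 11:15)`, which was chosen because its element means (3, 8, 13) match the outputs printed later. Calling `lapply()` on a named list returns a list that keeps the names:

```r
# Hypothetical list; its element means (3, 8, 13) match the outputs shown later
my_list <- list(a = 1:5, b = 6:10, c = 11:15)

# lapply() returns a list with the same names as the input
res <- lapply(my_list, mean)
res
#> $a
#> [1] 3
#>
#> $b
#> [1] 8
#>
#> $c
#> [1] 13
```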
12.4.1.2 sapply(): A Simplified lapply()
The function sapply() works like lapply(), but tries to simplify the result into a vector or matrix when possible.
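As a sketch, assume the same hypothetical `my_list <- list(a = 1:5, b = 6:10, c = 11:15)`. This definition is an assumption and is not given in the text. When every call returns a single value, `sapply()` simplifies to a vector. When every call returns several values of the same length, it builds a matrix with one column per element:

```r
# Hypothetical list (assumed, not from the original text)
my_list <- list(a = 1:5, b = 6:10, c = 11:15)

# Each mean() call returns one value, so the result is a named numeric vector
res_s <- sapply(my_list, mean)
res_s
#>  a  b  c
#>  3  8 13

# Each range() call returns two values, so sapply() builds a 2 x 3 matrix
res_m <- sapply(my_list, range)
dim(res_m)
#> [1] 2 3
```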
Notice that sapply() returns a named numeric vector instead of a list, which is often more convenient.
12.4.1.3 vapply(): A Safer sapply()
The function vapply() is similar to sapply(), but requires you to specify the expected return value through its FUN.VALUE argument. This makes it safer for programming: it throws an error if any result doesn’t match the expected type and length, instead of silently returning something else.
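A minimal sketch, again assuming the hypothetical `my_list <- list(a = 1:5, b = 6:10, c = 11:15)`. The first call succeeds. The second call returns character strings where doubles are expected, so `vapply()` stops with an error. `tryCatch()` is used here only to capture that error message so the script keeps running:

```r
# Hypothetical list (assumed, not from the original text)
my_list <- list(a = 1:5, b = 6:10, c = 11:15)

# FUN.VALUE = numeric(1): every result must be a single double
res_v <- vapply(my_list, mean, FUN.VALUE = numeric(1))
res_v
#>  a  b  c
#>  3  8 13

# paste() returns a character string, not a double, so vapply() raises an error
err <- tryCatch(
  vapply(my_list, function(x) paste("Mean:", mean(x)), FUN.VALUE = numeric(1)),
  error = function(e) conditionMessage(e)
)
is.character(err)  # TRUE: the error message was captured
```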
Here, numeric(1) tells R that each result should be a single numeric value.
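The exercises at the end of this section also use apply(), the family member that works on matrices. Its second argument, MARGIN, selects rows (1) or columns (2). Here is a short sketch on a small fixed matrix:

```r
# A 2 x 3 matrix filled column by column: columns are (1, 2), (3, 4), (5, 6)
m <- matrix(1:6, nrow = 2)

row_sums  <- apply(m, 1, sum)   # MARGIN = 1: one result per row
col_means <- apply(m, 2, mean)  # MARGIN = 2: one result per column
row_sums
#> [1]  9 12
col_means
#> [1] 1.5 3.5 5.5
```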
12.4.2 Introduction to Purrr
The purrr package provides a more consistent and flexible alternative to the apply family. Its core function is map(), which, like lapply(), always returns a list. Load the package with library(purrr) before using it.
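A minimal sketch of map(), assuming the hypothetical `my_list <- list(a = 1:5, b = 6:10, c = 11:15)`. The definition is not given in the text. The result is the same named list that lapply() would return:

```r
library(purrr)

# Hypothetical list (assumed, not from the original text)
my_list <- list(a = 1:5, b = 6:10, c = 11:15)

# map() always returns a list, keeping the input names
res_map <- map(my_list, mean)
str(res_map)
#> List of 3
#>  $ a: num 3
#>  $ b: num 8
#>  $ c: num 13
```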
12.4.2.1 Typed Map Functions
One advantage of purrr is that it offers typed variants of map() that specify the output type:
- map_dbl(): returns a double (numeric) vector
- map_chr(): returns a character vector
- map_lgl(): returns a logical vector
- map_int(): returns an integer vector
library(purrr)
map_dbl(my_list, mean)
#> a b c
#> 3 8 13
map_chr(my_list, ~ paste("Mean:", round(mean(.x), 1)))
#> a b c
#> "Mean: 3" "Mean: 8" "Mean: 13"
In the second example, we used formula notation (~ ...) to define an anonymous function inline. Inside the formula, .x refers to the current element being processed.
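The formula `~ ...` is a shortcut for an ordinary anonymous function. R 4.1 and later also accept the shorthand `\(x) ...`. The sketch below assumes the hypothetical `my_list <- list(a = 1:5, b = 6:10, c = 11:15)`. It checks that the formula and `function(x)` forms give identical results, and it also tries the two typed variants from the list above that the examples did not use:

```r
library(purrr)

# Hypothetical list (assumed, not from the original text)
my_list <- list(a = 1:5, b = 6:10, c = 11:15)

# Two equivalent ways to write "square the mean" as an anonymous function
f1 <- map_dbl(my_list, ~ mean(.x)^2)           # purrr formula; .x is the element
f2 <- map_dbl(my_list, function(x) mean(x)^2)  # ordinary anonymous function
identical(f1, f2)
#> [1] TRUE

# map_int(): length() returns an integer; here a = 5, b = 5, c = 5
lens <- map_int(my_list, length)
# map_lgl(): only element c contains values above 10, so FALSE, FALSE, TRUE
big <- map_lgl(my_list, ~ any(.x > 10))
```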
12.4.3 Comparison: For Loops vs. Apply vs. Purrr
Let’s compare the three approaches for computing the mean of each column in a data frame.
library(r02pro)  # provides the sahp dataset
library(purrr)   # provides map_dbl()
sahp_numeric <- sahp[, c("sale_price", "liv_area", "lot_area")]
# Using a for loop
result_loop <- numeric(ncol(sahp_numeric))
for (i in seq_along(sahp_numeric)) {
result_loop[i] <- mean(sahp_numeric[[i]], na.rm = TRUE)
}
names(result_loop) <- names(sahp_numeric)
result_loop
#> sale_price liv_area lot_area
#> 179.8901 1481.2667 9832.3152
# Using sapply
sapply(sahp_numeric, mean, na.rm = TRUE)
#> sale_price liv_area lot_area
#> 179.8901 1481.2667 9832.3152
# Using purrr
map_dbl(sahp_numeric, mean, na.rm = TRUE)
#> sale_price liv_area lot_area
#> 179.8901 1481.2667 9832.3152
All three approaches produce the same result. The sapply() and map_dbl() versions are more concise and clearly express the intent: apply a function to each element.
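In the sapply() and map_dbl() calls above, na.rm = TRUE is written after the function name. Both functions forward such extra arguments to mean() on every call. Here is a small sketch using a hypothetical list with one missing value in each element:

```r
library(purrr)

# Hypothetical list with one NA in each element
vals <- list(p = c(1, NA, 3), q = c(4, 5, NA))

no_rm   <- sapply(vals, mean)                # NA propagates: both results are NA
with_rm <- sapply(vals, mean, na.rm = TRUE)  # na.rm = TRUE is forwarded to mean()
with_rm_purrr <- map_dbl(vals, mean, na.rm = TRUE)
with_rm
#>   p   q
#> 2.0 4.5
identical(with_rm, with_rm_purrr)
#> [1] TRUE
```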
When to use which:
- Use for loops when the iteration has side effects or when each step depends on the previous one.
- Use sapply() for quick, base-R-only scripts, and vapply() when base-R code must guarantee its output type.
- Use map() and its variants in tidyverse workflows, especially when chaining steps with the pipe operator.
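The first bullet can be made concrete with a hypothetical recurrence in which each value is twice the previous one minus 10. Each step needs the result of the step before it, so the elements cannot be computed independently, and a for loop is the natural fit:

```r
# Hypothetical recurrence: x[i] = 2 * x[i - 1] - 10, starting from 20
x <- numeric(6)
x[1] <- 20
for (i in 2:6) {
  x[i] <- 2 * x[i - 1] - 10  # uses the value computed in the previous step
}
x
#> [1]  20  30  50  90 170 330
```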
12.4.4 Exercises
1. Given the list my_data <- list(x = c(1, 3, 5, NA), y = c(2, 4, 6, 8), z = c(10, 20, 30)), use sapply() to compute the mean of each element, handling NA values with na.rm = TRUE.
2. Repeat Exercise 1 using map_dbl() from the purrr package.
3. Create a matrix with 4 rows and 5 columns filled with random numbers (use matrix(rnorm(20), nrow = 4)). Use apply() to compute the standard deviation of each column.
4. Using the sahp dataset, select the numeric columns sale_price, liv_area, lot_area, and oa_qual. Use map_df() to create a summary data frame with the mean, median, and standard deviation of each variable.
5. Write a function scale_vec(x) that scales a numeric vector to have mean 0 and standard deviation 1. Use lapply() to apply this function to each numeric column of the sahp subset from Exercise 4.