12.4 The Apply Family and Purrr

In Chapters 11 and 12, you learned how to write loops and define functions. A natural next question is: how can we apply a function to every element of a vector or list without writing an explicit for loop? R provides two powerful approaches: the base R apply family of functions and the purrr package from the tidyverse.

12.4.1 The Apply Family

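The base R apply family includes several functions that apply a given function to the elements of a data structure. This section covers lapply(), sapply(), and vapply(), which iterate over the elements of a vector or list, and apply(), which iterates over the rows or columns of a matrix.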

12.4.1.1 lapply(): Apply and Return a List

The function lapply(X, FUN) applies the function FUN to each element of X and always returns a list of the same length as X, keeping any names that X has.

my_vec <- c(1, 4, 9, 16, 25)
lapply(my_vec, sqrt)
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 2
#> 
#> [[3]]
#> [1] 3
#> 
#> [[4]]
#> [1] 4
#> 
#> [[5]]
#> [1] 5
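Conceptually, lapply() does the same work as a for loop that fills a preallocated list. The sketch below is only an illustration: my_lapply() is a hypothetical helper written for this example, and the real lapply() does its looping in compiled code.

```r
# Hypothetical helper: a rough for-loop equivalent of lapply(X, FUN, ...)
my_lapply <- function(X, FUN, ...) {
  out <- vector("list", length(X))   # preallocate a list of the right length
  for (i in seq_along(X)) {
    out[[i]] <- FUN(X[[i]], ...)     # call FUN on each element, forwarding extra arguments
  }
  names(out) <- names(X)             # keep the names of X, as lapply() does
  out
}

my_vec <- c(1, 4, 9, 16, 25)
my_list <- list(a = 1:5, b = 6:10, c = 11:15)

identical(my_lapply(my_vec, sqrt), lapply(my_vec, sqrt))
#> [1] TRUE
identical(my_lapply(my_list, mean), lapply(my_list, mean))
#> [1] TRUE
```

In everyday code you would simply call lapply(); writing the loop out once just makes clear what "apply a function to every element" means.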

You can also use lapply() on a list.

my_list <- list(a = 1:5, b = 6:10, c = 11:15)
lapply(my_list, mean)
#> $a
#> [1] 3
#> 
#> $b
#> [1] 8
#> 
#> $c
#> [1] 13
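Any arguments you give after FUN are passed on to every call of FUN. This is how you can handle missing values without writing an anonymous function; the list scores below is made-up data for illustration.

```r
# Made-up data: element a contains a missing value
scores <- list(a = c(1, 2, NA), b = c(4, 5, 6))

# Without na.rm, the mean of a is NA
lapply(scores, mean)$a
#> [1] NA

# na.rm = TRUE is forwarded to each call of mean()
res <- lapply(scores, mean, na.rm = TRUE)
res
#> $a
#> [1] 1.5
#> 
#> $b
#> [1] 5
```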

12.4.1.2 sapply(): A Simplified lapply()

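The function sapply() works like lapply(), but tries to simplify the result: if every result has length 1, it returns a vector; if every result has the same length greater than 1, it returns a matrix; otherwise it returns a list, just like lapply().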

sapply(my_vec, sqrt)
#> [1] 1 2 3 4 5
sapply(my_list, mean)
#>  a  b  c 
#>  3  8 13

Notice that sapply(my_list, mean) returns a named numeric vector instead of a list, which is often more convenient to print and to use in further calculations.
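Because sapply() decides the shape of its output from the results it happens to get, the same kind of call can return different types for different inputs. The small lists below are made up to show the three cases, plus the empty-input case that often catches people out.

```r
# Each result has length 1: simplified to a named numeric vector
len_1 <- sapply(list(a = 1:3, b = 4:6), mean)

# Each result has length 2: simplified to a 2 x 2 matrix (one column per element)
len_2 <- sapply(list(a = 1:3, b = 4:6), range)

# Results have different lengths (2 and 3): no simplification, a list is returned
ragged <- sapply(list(a = 1:3, b = 4:6), function(x) x[x > 1])

# Empty input: an empty list, not an empty numeric vector
empty <- sapply(list(), mean)
```

This unpredictability is harmless at the console, where you can see the result, but it can break code that expects a vector every time.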

12.4.1.3 vapply(): A Safer sapply()

The function vapply() is similar to sapply(), but requires a third argument, FUN.VALUE, that specifies the type and length of each result. This makes it safer for programming: vapply() throws an error if any result does not match FUN.VALUE, instead of silently returning a different type.

vapply(my_list, mean, numeric(1))
#>  a  b  c 
#>  3  8 13

Here, numeric(1) tells R that each result should be a single numeric value.
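To see the safety check in action, the sketch below (using a made-up list, mixed) asks vapply() for one number per element when one element yields a character string. It also shows that an empty input still returns a numeric vector. tryCatch() is used only so the error message can be captured instead of stopping the script.

```r
mixed <- list(a = 1:3, b = c("x", "y"))

# The first element of b is a character string, so vapply() stops with an error
msg <- tryCatch(
  vapply(mixed, function(x) x[1], numeric(1)),
  error = function(e) conditionMessage(e)
)
msg   # an error message describing the type mismatch

# Unlike sapply(), vapply() returns a numeric vector even for empty input
empty <- vapply(list(), mean, numeric(1))
empty
#> numeric(0)
```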

12.4.1.4 apply(): For Matrices and Arrays

The function apply(X, MARGIN, FUN) applies a function over rows (MARGIN = 1) or columns (MARGIN = 2) of a matrix.

my_mat <- matrix(1:12, nrow = 3)
my_mat
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    4    7   10
#> [2,]    2    5    8   11
#> [3,]    3    6    9   12
apply(my_mat, 1, sum)  # row sums
#> [1] 22 26 30
apply(my_mat, 2, sum)  # column sums
#> [1]  6 15 24 33
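Two details are worth knowing about apply(). For plain sums and means, base R already provides vectorized shortcuts: rowSums(), colSums(), rowMeans(), and colMeans(). And when FUN returns more than one value, apply() places each result in a column, so summarizing the rows produces a matrix that is transposed relative to what you might expect.

```r
my_mat <- matrix(1:12, nrow = 3)

# Vectorized equivalents of apply(my_mat, 1, sum) and apply(my_mat, 2, sum)
row_totals <- rowSums(my_mat)   # 22 26 30
col_totals <- colSums(my_mat)   # 6 15 24 33

# range() returns two values, so each row's result becomes a COLUMN:
# the result is 2 x 3, one column per row of my_mat
row_ranges <- apply(my_mat, 1, range)
row_ranges
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
#> [2,]   10   11   12
```

Wrapping the call in t() gives one row per original row, if that is the layout you need.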

12.4.2 Introduction to Purrr

The purrr package provides a more consistent and flexible alternative to the apply family. Its core function is map(), which always returns a list.

library(purrr)
map(my_list, mean)
#> $a
#> [1] 3
#> 
#> $b
#> [1] 8
#> 
#> $c
#> [1] 13
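For a plain function such as mean(), map() gives exactly the same result as lapply(), and, like lapply(), it forwards extra arguments such as na.rm = TRUE to the function. The check below assumes the purrr package is installed; with_na is made-up data.

```r
library(purrr)

my_list <- list(a = 1:5, b = 6:10, c = 11:15)

# map() and lapply() return the same named list here
identical(map(my_list, mean), lapply(my_list, mean))
#> [1] TRUE

# Extra arguments after the function are passed on to each call
with_na <- list(x = c(1, NA, 3), y = c(4, 5, 6))
map(with_na, mean, na.rm = TRUE)$x
#> [1] 2
```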

12.4.2.1 Typed Map Functions

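One advantage of purrr is that it offers typed variants of map() that, like vapply(), always return an atomic vector of a fixed type and throw an error if a result cannot be converted to that type: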

  • map_dbl(): returns a double (numeric) vector
  • map_chr(): returns a character vector
  • map_lgl(): returns a logical vector
  • map_int(): returns an integer vector

map_dbl(my_list, mean)
#>  a  b  c 
#>  3  8 13
map_chr(my_list, ~ paste("Mean:", round(mean(.x), 1)))
#>          a          b          c 
#>  "Mean: 3"  "Mean: 8" "Mean: 13"

In the second example, we used a formula notation (~ ...) to define an anonymous function inline. The .x refers to the current element being processed.
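The formula shortcut is specific to purrr. The same function can also be written as a regular anonymous function or, in R 4.1.0 and later, with the backslash shorthand \(x); the purrr 1.0 documentation suggests keeping the formula form mainly for compatibility with older versions of R. This sketch assumes purrr is installed and R is at least 4.1.0.

```r
library(purrr)

my_list <- list(a = 1:5, b = 6:10, c = 11:15)

# purrr formula: .x is the current element
with_formula  <- map_chr(my_list, ~ paste("Mean:", round(mean(.x), 1)))

# Regular anonymous function
with_function <- map_chr(my_list, function(x) paste("Mean:", round(mean(x), 1)))

# Backslash shorthand for function(x), available since R 4.1.0
with_lambda   <- map_chr(my_list, \(x) paste("Mean:", round(mean(x), 1)))
```

All three produce the same named character vector; the anonymous-function forms also work unchanged with lapply() and sapply().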

12.4.2.2 map_df(): Return a Data Frame

When the function returns a data frame (or a named vector) for each element, map_df() row-binds the results into a single data frame. The .id argument adds a column, here named group, that holds the names of the input list.

library(tibble)  # tibble() comes from the tibble package; map_df() also needs dplyr installed
map_df(my_list, ~ tibble(mean = mean(.x), sd = sd(.x)), .id = "group")
#> # A tibble: 3 × 3
#>   group  mean    sd
#>   <chr> <dbl> <dbl>
#> 1 a         3  1.58
#> 2 b         8  1.58
#> 3 c        13  1.58
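As of purrr 1.0.0, map_df() is marked as superseded: it still works, but the documented alternative is to call map() and then row-bind the resulting list of data frames with list_rbind(), whose names_to argument plays the role of .id. A sketch, assuming purrr 1.0.0 or later, the tibble package, and R 4.1.0 or later (for the native pipe |> and the \(x) shorthand):

```r
library(purrr)
library(tibble)

my_list <- list(a = 1:5, b = 6:10, c = 11:15)

summary_tbl <- my_list |>
  map(\(x) tibble(mean = mean(x), sd = sd(x))) |>  # one one-row tibble per element
  list_rbind(names_to = "group")                    # stack them; list names go in "group"
summary_tbl
```

The result should contain the same group, mean, and sd values as the map_df() call above.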

12.4.3 Comparison: For Loops vs. Apply vs. Purrr

Let’s compare the three approaches for computing the mean of each column in a data frame.

library(r02pro)
sahp_numeric <- sahp[, c("sale_price", "liv_area", "lot_area")]

# Using a for loop
result_loop <- numeric(ncol(sahp_numeric))
for (i in seq_along(sahp_numeric)) {
  result_loop[i] <- mean(sahp_numeric[[i]], na.rm = TRUE)
}
names(result_loop) <- names(sahp_numeric)
result_loop
#> sale_price   liv_area   lot_area 
#>   179.8901  1481.2667  9832.3152

# Using sapply
sapply(sahp_numeric, mean, na.rm = TRUE)
#> sale_price   liv_area   lot_area 
#>   179.8901  1481.2667  9832.3152

# Using purrr
map_dbl(sahp_numeric, mean, na.rm = TRUE)
#> sale_price   liv_area   lot_area 
#>   179.8901  1481.2667  9832.3152

All three approaches produce the same result. The sapply() and map_dbl() versions are more concise and clearly express the intent: apply a function to each element.

When to use which:

  • Use for loops when the iteration has side effects or when each step depends on the previous one.
  • Use sapply() for quick, base-R-only scripts and interactive work, and vapply() when writing functions, where a guaranteed output type matters.
  • Use map() and its variants for tidyverse workflows, especially when chaining with the pipe operator.
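The first guideline is easiest to see with an example. In the hypothetical calculation below, each year's balance is computed from the previous year's, so the iterations are not independent. A for loop expresses this directly, whereas sapply() and map() are designed for elements that can be processed independently.

```r
# Hypothetical example: a starting balance of 1000 grows by 5% each year,
# and a fixed deposit of 100 is added at the end of each year
n_years <- 5
balance <- numeric(n_years)
balance[1] <- 1000
for (i in 2:n_years) {
  # each step uses the result of the previous step
  balance[i] <- balance[i - 1] * 1.05 + 100
}
balance
#> [1] 1000.000 1150.000 1307.500 1472.875 1646.519
```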

12.4.4 Exercises

  1. Given the list my_data <- list(x = c(1, 3, 5, NA), y = c(2, 4, 6, 8), z = c(10, 20, 30)), use sapply() to compute the mean of each element, handling NA values with na.rm = TRUE.

  2. Repeat Exercise 1 using map_dbl() from the purrr package.

  3. Create a matrix with 4 rows and 5 columns filled with random numbers (use matrix(rnorm(20), nrow = 4)). Use apply() to compute the standard deviation of each column.

  4. Using the sahp dataset, select the numeric columns sale_price, liv_area, lot_area, and oa_qual. Use map_df() to create a summary data frame with the mean, median, and standard deviation of each variable.

  5. Write a function scale_vec(x) that scales a numeric vector to have mean 0 and standard deviation 1. Use lapply() to apply this function to each numeric column of the sahp subset from Exercise 4.

