Iterations in R with lapply (and variants) and purrr

Mark Andrews. July 6, 2021

A common feature of all programming languages is the ability to perform iterations whereby a code statement is executed repeatedly. In most programming languages, though not all, the standard way to do this is with a loop, usually a so-called for-loop. R allows us to write for-loops, and these are commonly used. However, R also provides us with various functionals, which are functions that takes functions as input. In many cases, functionals can replace for-loops with single short statements that are often far easier to read, write, and understand. In base R, functionals are provided by the *apply family, including lapply, sapply, vapply, etc. More recently, the purrr package provides replacements for these *apply functionals that are arguably simultaneously more powerful and easier to use than the *apply family.

The aim of this post is to ultimately introduce the purrr functionals, which are powerful tools that every serious R user should know about and use. To properly introduce purrr functionals, it is necessary and otherwise generally useful to also know about the *apply functionals they aim to replace. And to understand the *apply functionals, it is necessary to understand for-loops.

for-loops

As a simple but non-trivial example of for-loops in R, let us consider how we would use the following sawtooth function.

sawtooth <- function(x){
  if (x < 0){
    0
  } else if (x < 1/3) {
    3 * x
  } else if (x < 2/3) {
    3 * x - 1
  } else if (x <= 1) {
    3 * x - 2
  } else {
    0
  }
}

This function implements a piecewise linear map: for any value \(x \in \mathbb{R}\), it maps it in a piecewise linear manner, to a value between 0 and 1. It is illustrated in the following figure.

Suppose we had the following vector of 1000 elements to which we wished to apply the sawtooth function.

N <- 1000
x <- seq(-0.1, 1.1, length.out = N)

Although it may seem like the obvious choice, we can not do the following because of the if ... else statements in the function’s body.

sawtooth(x) # this won't work

In order to apply sawtooth to all elements of x, we could in principle apply sawtooth to eac element of x, one at a time, as follows.

# Create a vector of 0's of same length as x
# This can also be done with `y <- vector('double', N)`
y <- numeric(N)

y[1] <- sawtooth(x[1])
y[2] <- sawtooth(x[2])
y[3] <- sawtooth(x[3])
...
y[N] <- sawtooth(x[N]) # where N = 1000

It should be obvious that we want to avoid this code duplication at all costs. Instead, we can create a for-loop as follows.

y <- numeric(N)
for (i in 1:N) {
  y[i] <- sawtooth(x[i])
}

Essentially, this loop repeatedly executes the statement y[i] <- sawtooth(x[i]). On the first iteration, i takes the value of 1. On the second iteration, i takes the value of 2, and so on, until the final iteration where i takes the values of N. In other words, for each value of i from 1 to N, we execute y[i] <- sawtooth(x[i]).

The general form of a for loop is as follows.

for (<var> in <sequence>) {
  <code body>
}

The for loop iteration begins with the for keyword followed by a round bracketed expression of the form (<var> in <sequence>) where <var> is what we’ll call the loop variable and <sequence> is (usually) a vector or list of items. After the round brackets is some code enclosed by {}. For each value in the <sequence>, the <var> is set to this value and the <code body> is executed. We could write this in pseudo-code as follows.

for each value in <sequence>
  set <var> equal to this value
  execute <code body>

In other words, the for loop executes the <code body> for each value in <sequence>, setting <var> to this value on each iteration.

From for-loops to apply functionals

Functionals are functions that take a function as input and return a vector. They play an important role in programming in R, can can replace many, though not all, for-loop constructions. There are many functionals in the base R language. Here, we will look at the most useful or widely used ones.

`lapply`

One of the most widely used functionals in base R is the lapply function, which takes two required arguments, a vector or list and a function, and then applies the function to each element in the vector or list and returns a new list. As an example, instead of using a for-loop, we could use lapply to apply the sawtooth function to each element of a vector x. Here’s the original for loop.

y <- numeric(N)
for (i in 1:N) {
  y[i] <- sawtooth(x[i])
}

We can replace this entirely using lapply as follows.

y <- lapply(x, sawtooth)

The lapply function returns a list with as many elements as there are elements in x. Should we prefer that y be a vector rather than a list, we can unlist it, but there are other options too for returning a vector from a functional, as we will see shortly.

The general form of the lapply function is as follows.

returned_list <- lapply(<vector or list>, <function>)

Sometimes, we may need to use a function that takes arguments in lapply. As an example, let us imagine that we wish to calculate the trimmed mean of each vector, and also ignoring missing values, in a list of vectors. The trimmed mean is where we compute the mean after a certain proportion of the high and low elements have been trimmed. The trimmed mean of a vector, where we remove 5% of values on the upper and lower extremes of the vector, and also ignoring missing values, is as follows.

s <- c(10, 20, NA, 125, 35, 15)
mean(s, trim = 0.05, na.rm = T)

## [1] 41

To use the trim = 0.05 and na.rm = T arguments when we use lapply, we supply them as optional arguments after the function name as follows.

data_vectors <- list(x = c(NA, 10, 11, 12, 1001, -20),
                     y = c(5, 10, 7, 2500, 6),
                     z = c(2, 4, 1000, 8, 5, 7, NA)
)

lapply(data_vectors, mean, trim = 0.05, na.rm = T)

## $x
## [1] 202.8
## 
## $y
## [1] 505.6
## 
## $z
## [1] 171

Note that because data frames (and tibbles) are essentially lists of vectors, lapply can be easily used to apply a function to all columns of a data frame.

library(tidyverse)

data_df <- tibble(x = rnorm(10),
                  y = rnorm(10),
                  z = rnorm(10))

lapply(data_df, mean) %>%
  unlist()

##           x           y           z 
##  0.24504010 -0.43967492 -0.05399682

The above function can also be accomplished with dplyr’s summarise, except that summarise returns a tibble.

data_df %>%
  summarise(across(everything(), mean))

## # A tibble: 1 × 3
##       x      y       z
##   <dbl>  <dbl>   <dbl>
## 1 0.245 -0.440 -0.0540

`sapply` & `vapply`

As we’ve seen, lapply always returns a list. Sometimes, this returned list is a list of single values or list of vectors of the same length and type. In these cases, it would be preferable to convert these lists to vectors or matrices. We saw this in the case of using sawtooth with lapply above, where we mentioned that we could manually convert the returned list into a vector using unlist. Variants of lapply, sapply and vapply, can facilitate doing these conversions. The sapply function works like lapply but will attempt to simplify the list as a vector or a matrix if possible. In the following example, we use sapply to apply sawtooth to each element of x.

y <- sapply(x, sawtooth)
head(y, 5)

## [1] 0 0 0 0 0

Here, because the list that would have been returned by lapply, had we used it here, is a list of length N of single numeric values (or numeric vectors of length 1), sapply can produce and return a numeric vector of length N.

If the results of lapply is a list of numeric vectors, then this list could be simplified to a matrix. In the following example, the list returned by lapply is like this.

data_df <- tibble(x = rnorm(100),
                  y = rnorm(100),
                  z = rnorm(100))
lapply(data_df, quantile)

## $x
##          0%         25%         50%         75%        100% 
## -2.31932737 -0.70960382 -0.05431694  0.72976278  2.13348636 
## 
## $y
##          0%         25%         50%         75%        100% 
## -2.82300012 -0.48106048  0.04061442  0.61636766  2.13286973 
## 
## $z
##         0%        25%        50%        75%       100% 
## -2.4662529 -0.6690109 -0.1111398  0.5306826  2.1873352

If we use sapply here instead, the result is a matrix.

sapply(data_df, quantile)

##                x           y          z
## 0%   -2.31932737 -2.82300012 -2.4662529
## 25%  -0.70960382 -0.48106048 -0.6690109
## 50%  -0.05431694  0.04061442 -0.1111398
## 75%   0.72976278  0.61636766  0.5306826
## 100%  2.13348636  2.13286973  2.1873352

Note that in some cases, sapply can not simplify the list. For example, in the following example, the list returned by lapply has elements of different lengths, and sapply can not simplify it.

X <- list(x = rnorm(2),
          y = rnorm(3),
          z = rnorm(4)
)
sapply(X, function(x) x^2)

## $x
## [1] 0.0503656 0.1913169
## 
## $y
## [1] 1.7629691 0.2194715 1.7954867
## 
## $z
## [1] 2.7736662 0.2904671 0.9987699 0.1166150

The vapply function is a safer version of sapply because it specifies the nature of the returned values of each application of the function. For example, we know the default returned value of quantile will be a numeric vector of length 5. We can specify this as the FUN.VALUE argument to vapply.

vapply(data_df, quantile, FUN.VALUE=numeric(5))

##                x           y          z
## 0%   -2.31932737 -2.82300012 -2.4662529
## 25%  -0.70960382 -0.48106048 -0.6690109
## 50%  -0.05431694  0.04061442 -0.1111398
## 75%   0.72976278  0.61636766  0.5306826
## 100%  2.13348636  2.13286973  2.1873352

`mapply` & `Map`

The functionals lapply, sapply, vapply take a function and apply it to each element in a vector or list. In other words, each element in the vector or list is supplied as the argument to the function. While we may, as we described above, have other arguments to the function set to fixed values for each function application, we can not use lapply, sapply or vapply to apply functions to two or more lists or vectors at the same time. Consider the following function.

power <- function(x, k) x^k

If we had a vector of x values, and set k to e.g. 3, we could do the following.

x <- c(2, 3, 4, 5)
sapply(x, power, k=5)

## [1]   32  243 1024 3125

However, if we had the vector of x values and a vector of k values, and wish to apply each element of x and the corresponding element of k to power, we need to use mapply, as in the following example.

x <- c(2, 3, 4, 5)
k <- c(2, 3, 2, 2)
mapply(power, x = x, k = k)

## [1]  4 27 16 25

This mapply is therefore equivalent to the following for loop.

for (i in seq_along(x)){
  power(x[i], k[i])
}

With mapply, we can iterate over any number of lists of input arguments simultaneously. As an example, the random number generator rnorm function takes 3 arguments: n, mean, and sd. In the following, we apply rnorm to each value of of lists of these three arguments.

n <- c(2, 3, 5)
mu <- c(10, 100, 200)
sigma <- c(1, 10, 10)
mapply(rnorm, n = n, mean = mu, sd = sigma)

## [[1]]
## [1] 8.941373 9.958401
## 
## [[2]]
## [1]  87.07172  91.06136 105.41909
## 
## [[3]]
## [1] 209.6964 197.5501 208.6018 193.2416 194.9996

As we can see, we effectively execute rnorm(n=3, mean=10, sd=1), rnorm(n=5, mean=5, sd = 10), and so on.

The Map function works just like mapply, with minor differences, such as not ever simplifying the results.

set.seed(101)
Map(rnorm, n = n, mean = mu, sd = sigma)

## [[1]]
## [1]  9.673964 10.552462
## 
## [[2]]
## [1]  93.25056 102.14359 103.10769
## 
## [[3]]
## [1] 211.7397 206.1879 198.8727 209.1703 197.7674

As we can see, Map and mapply are identical in their usage and in what they do. However, by default, mapply will attempt to simply its output like sapply, if possible. We saw this with the use of the power function with mapply above. However, if we replace mapply with Map, no simplification is applied, and we obtain a list as output.

Functionals with `purrr`

The purrr package in the tidyverse provides functionals like those just covered, but which are consistent with one another in terms of how they are used, and also with how other tidyverse functions are used. In addition, purrr provides additional functional tools beyond those in base R. We can load purrr with library(purrr), but it is also loaded by library(tidyverse).

`map`

One of the main tools in purrr is map and its variants. It is very similar to lapply. It takes a list (or vector) and a function, and applies the function to each element of the list. To re-use an example from above, in the following, we apply sawtooth to each element in a vector x and return the results in a list.

y <- map(x, sawtooth)

This is identical to how we use lapply. As with lapply, we can supply arguments for the function being applied as optional arguments. For example, the trimmed mean example above where we used lapply can also be done with map.

map(data_vectors, mean, trim = 0.05, na.rm = T)

## $x
## [1] 202.8
## 
## $y
## [1] 505.6
## 
## $z
## [1] 171

When the list that is returned by map or its variants can be simplified to a vector, we can use the map_dbl, map_int, map_lgl, map_chr variants of map to simplify the list to a vector of doubles, integers, Booleans, or characters, respectively. In the following example, we apply functions that produce integers, logicals, doubles, or characters, to each column of a data frame.

data_lst <- list(x = c(1, 2, 3),
                 y = c(1L, 2L, 3L, 4L),
                 z = rnorm(100)
)

map_int(data_lst, length)

##   x   y   z 
##   3   4 100

map_lgl(data_lst, is_integer)

##     x     y     z 
## FALSE  TRUE FALSE

map_dbl(data_lst, mean)

##           x           y           z 
##  2.00000000  2.50000000 -0.03547806

map_chr(data_lst, class)

##         x         y         z 
## "numeric" "integer" "numeric"

purrr style anonymous functions

We saw above that we can use anonymous functions in *apply family functionals. This can be done in purrr functionals like the map family too, as we see in the following example.

map_dbl(data_lst,
        function(x) mean(log(abs(x)))
)

##          x          y          z 
##  0.5972532  0.7945135 -0.6096516

However, purrr provides syntactic sugar to allow us to rewrite this as follows.

map_dbl(data_lst,
        ~ mean(log(abs(.)))
)

##          x          y          z 
##  0.5972532  0.7945135 -0.6096516

In other words, in place of the function(x), we have ~, and in place of the anonymous function’s input variable we have ..

`map2` & `pmap`

When we have two or more sets of input arguments, we can use map2 and pmap, respectively. Both of these functions also have the _lgl, _int, _dbl, _chr, and other variants that we saw with map.

As an example of a map2 function, we’ll use the power function that takes two input arguments.

x <- c(2, 3, 4, 5)
k <- c(2, 3, 2, 2)
map2_dbl(x, k, power)

## [1]  4 27 16 25

As an example of a pmap function, we can reimplement the rnorm based sampler that we originally wrote with mapply. The first argument to pmap is a list whose length is the number of arguments being passed to the function. If want to iterate over different values of the n, mean and sd arguments for rnorm, as we did in the example above, we’d set up a list like the following.

args <- list(n = c(2, 3, 5),
             mean = c(10, 100, 200),
             sd = c(1, 10, 10))

We then use pmap as follows.

pmap(args, rnorm)

## [[1]]
## [1] 9.432079 9.953257
## 
## [[2]]
## [1]  98.43019 116.02242 107.68654
## 
## [[3]]
## [1] 192.2837 193.6932 191.6972 194.0889 209.8109

Iterations in R with lapply (and variants) and purrr

Mark Andrews. July 6, 2021

for-loops

From for-loops to apply functionals

lapply

sapply & vapply

mapply & Map

Functionals with purrr

map