10.3 Random Permutation and Random Sampling

Sections 10.1 and 10.2 introduced the four useful functions that R provides for each distribution. This section covers random permutations and random sampling in R. These techniques are widely used in statistics, machine learning, and data analysis for tasks such as model validation and resampling.

10.3.1 Random Permutation

A random permutation rearranges the elements of a vector in a random order. It is commonly used in machine learning to shuffle data before splitting it into training and validation sets, and in statistics for permutation tests.

10.3.1.1 Example in R

To generate a random permutation of a vector, use the sample() function:

x <- 6:10
set.seed(0)  # Set seed for reproducibility
sample(x)  # Random permutation of x
#> [1]  6  9  8 10  7

You can reproduce the same random permutation by using the same seed:

set.seed(0)
sample(x)  # Reproduces the random permutation
#> [1]  6  9  8 10  7
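
As a minimal sketch of the training/validation use case mentioned above, the following code permutes the row indices of a hypothetical data set and cuts the permutation into two parts. The number of observations, the 70/30 split, the seed, and the names perm, train_idx, and valid_idx are all assumptions made for illustration.

```r
n <- 10                       # hypothetical number of observations
n_train <- 7                  # assumed size of the training set (70/30 split)
set.seed(2)                   # arbitrary seed, used only for reproducibility
perm <- sample(n)             # random permutation of 1:n
train_idx <- perm[1:n_train]          # first 7 shuffled indices
valid_idx <- perm[(n_train + 1):n]    # remaining 3 shuffled indices
# The two index sets do not overlap and together cover 1:n
length(intersect(train_idx, valid_idx))
#> [1] 0
sort(c(train_idx, valid_idx))
#>  [1]  1  2  3  4  5  6  7  8  9 10
```

Because both index sets come from a single permutation, every observation lands in exactly one of the two sets.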

10.3.2 Random Sampling Without Replacement

Sampling without replacement selects elements from a vector so that no element is chosen more than once. This is useful when you need a random subset of the vector.

Let’s randomly sample 3 elements from x:

set.seed(1)
sample(x, size = 3, replace = FALSE)
#> [1] 6 9 8

Here, the size argument specifies the number of elements to sample. With replace = FALSE, requesting a size greater than the length of the vector produces an error:

sample(x, size = 6, replace = FALSE)
#> Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'

Instead of a vector, the first argument of sample() can also be a single positive integer (e.g., 10), which is equivalent to x = 1:10. The two calls in the following code draw from the same set of integers; their outputs differ only because each call uses fresh random numbers.

sample(10, size = 4)  # Sample 4 integers from 1 to 10
#> [1] 1 2 5 7
sample(1:10, size = 4)  # Sample 4 integers from 1 to 10
#> [1] 2 3 1 5
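
The integer shortcut has a side effect worth knowing: when the first argument is a numeric vector of length one, sample() treats it as the upper end of 1:x rather than as a one-element vector. The sketch below illustrates this and shows one possible workaround that samples positions instead of values. The helper name safe_permute is hypothetical and not part of base R.

```r
y <- 10                       # a numeric vector of length one
set.seed(3)                   # arbitrary seed
length(sample(y))             # sample(10) permutes 1:10, so the result has length 10
#> [1] 10
# Hypothetical helper: permute positions, then index into the vector
safe_permute <- function(v) v[sample.int(length(v))]
safe_permute(y)               # a one-element vector is returned unchanged
#> [1] 10
```

This matters mainly when the vector passed to sample() is computed elsewhere and may shrink to a single element.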

10.3.3 Random Sampling with Replacement

Sometimes you may want to sample with replacement. You still use the sample() function, but with the argument replace = TRUE. The following code samples 10 elements from x with replacement.

sample(x, size = 10, replace = TRUE)
#>  [1] 10  7  7  6 10 10  6  6 10 10

Because 10 elements are drawn from only 5 distinct values, the output vector necessarily contains duplicated elements.

A very important application of random sampling with replacement is the bootstrap. A bootstrap sample is a sample drawn with replacement that has the same size as the original data. So, to get a bootstrap sample from x, you sample 5 elements from x with replacement.

sample(x, replace = TRUE)  # A bootstrap sample
#> [1] 7 7 6 9 6

Note that when the size argument is not provided, it defaults to the length of x.
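
To show how bootstrap samples are typically used, here is a minimal sketch that repeats the resampling step many times and records the mean of each bootstrap sample. The spread of these means gives a rough estimate of the standard error of mean(x). The number of replications B, the seed, and the name boot_means are assumptions made for illustration.

```r
x <- 6:10
B <- 1000                     # assumed number of bootstrap samples
set.seed(4)                   # arbitrary seed, used only for reproducibility
# Draw B bootstrap samples from x and compute the mean of each one
boot_means <- replicate(B, mean(sample(x, replace = TRUE)))
length(boot_means)
#> [1] 1000
sd(boot_means)                # bootstrap estimate of the standard error of mean(x)
```

The exact value of sd(boot_means) depends on the seed and on B, so no output is shown for that line.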

10.3.4 Random Sampling with Unequal Probabilities

By default, the sample() function draws each element with the same probability. In some cases, you may want to assign different probabilities to different elements.

One way to draw elements with different probabilities is to use the random number generator (RNG) for the Binomial distribution. A Bernoulli distribution is a Binomial distribution with size = 1. Let’s say we want to draw 100 observations from a Bernoulli distribution with success probability \(p=0.2\).

rbinom(100, size = 1, prob = 0.2)
#>   [1] 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0
#>  [38] 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0
#>  [75] 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0

In addition to the rbinom() function introduced in Section 10.2, you can use the sample() function with its prob argument to achieve the same goal.

sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.8, 0.2))
#>   [1] 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 1
#>  [38] 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
#>  [75] 1 0 0 1 1 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0

Here, you sample 100 elements with replacement from c(0, 1), where the probability of drawing 0 is 0.8 and the probability of drawing 1 is 0.2.
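
The prob argument is not limited to two outcomes. The following sketch, using hypothetical categories and assumed probabilities, draws a large sample from three outcomes and compares the empirical proportions with the probabilities supplied. The names outcomes, w, and draws are chosen only for illustration.

```r
outcomes <- c("a", "b", "c")  # hypothetical categories
w <- c(0.5, 0.3, 0.2)         # assumed probabilities, one per category
set.seed(5)                   # arbitrary seed, used only for reproducibility
draws <- sample(outcomes, size = 10000, replace = TRUE, prob = w)
# Empirical proportion of each category; these should be close to w
prop.table(table(draws))
```

With 10000 draws, the proportions typically differ from w only in the second or third decimal place.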

10.3.5 Exercise

  1. Randomly permute the vector x <- 1:20. Set the seed to 66 and verify that the permutation is reproducible.
  2. Create a random sample of 5 elements from the vector letters without replacement.
  3. Draw a random sample of size 1000 with replacement from the vector 1:10. Compute the frequency of each number in the sample.
  4. Split the vector 1:100 into two random subsets of size 70 and 30 without replacement. Calculate the mean of each subset.