2.14 Set Operations on Vectors
2.14.1 Set operations on vectors
We’ve spent a couple of sections introducing numeric, character, and logical vectors separately. In this section, we’ll discuss set operations between two vectors of the same type. All the set operations in this section apply to all three types of vectors. For later convenience, let’s create some numeric, character, and logical vectors now.
num1 <- c(1, 2, 1, 3, 1)
num2 <- c(1, 1, 3, 4, 4, 5)
char1 <- c("sheep", "monkey", "sheep", "chicken")
char2 <- c("sheep", "pig", "pig")
log1 <- c(T, F, F, T)
log2 <- c(T, T, T)
a. Intersection
To find the values that appear in both vectors, you can use the intersect() function. This works the same way for numeric, character, and logical vectors.
intersect(num1, num2)
#> [1] 1 3
intersect(char1, char2)
#> [1] "sheep"
intersect(log1, log2)
#> [1] TRUE
Taking a closer look at the outputs, you will notice that intersect() discards duplicate values. Each value shared by the two vectors appears only once in the output, even if it occurs several times in either vector.
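One way to see why duplicates disappear is to rebuild the intersection from simpler pieces. The sketch below assumes plain vectors like the ones created above. It uses a helper called my_intersect, which is a hypothetical name written for illustration and not a base R function. The helper keeps the elements of x that also occur in y, then drops repeats with unique().

```r
num1 <- c(1, 2, 1, 3, 1)
num2 <- c(1, 1, 3, 4, 4, 5)

# Hypothetical helper (not part of base R): keep the elements of x
# that also occur somewhere in y, then drop repeated values
my_intersect <- function(x, y) {
  unique(x[x %in% y])
}

my_intersect(num1, num2)
#> [1] 1 3

# Compare with the built-in function
identical(my_intersect(num1, num2), intersect(num1, num2))
#> [1] TRUE
```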
b. Union
While intersect() gives the values that appear in both vectors, the union() function returns all values that appear in at least one of the two vectors.
union(num1, num2)
#> [1] 1 2 3 4 5
union(char1, char2)
#> [1] "sheep" "monkey" "chicken" "pig"
union(log1, log2)
#> [1] TRUE FALSE
Again, only one copy of each value is retained in the output. The values are listed in the order in which they first appear.
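For plain vectors like these, union(x, y) gives the same result as combining the two vectors with c() and then removing duplicates with unique(). That equivalence also explains why the values come out in order of first appearance. The following is a quick check using the vectors from the start of this section.

```r
num1 <- c(1, 2, 1, 3, 1)
num2 <- c(1, 1, 3, 4, 4, 5)
char1 <- c("sheep", "monkey", "sheep", "chicken")
char2 <- c("sheep", "pig", "pig")

# Combine both vectors, then keep only the first copy of each value
unique(c(num1, num2))
#> [1] 1 2 3 4 5

# Same results as union()
identical(unique(c(num1, num2)), union(num1, num2))
#> [1] TRUE
identical(unique(c(char1, char2)), union(char1, char2))
#> [1] TRUE
```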
c. Set difference
To get the values that appear in one vector but not in the other, you can use the setdiff() function. Keep in mind that the order of the arguments matters. setdiff(x, y) returns the values of x that do not appear in y, so the first argument should be the vector whose unique values you want to inspect.
Let’s start with num1 and num2, the two numeric vectors we created at the beginning of this section. To get the values that appear in num1 but not in num2, num1 becomes the first argument and num2 the second.
setdiff(num1, num2)
#> [1] 2
The result is 2. Notice that 1 is missing from the output even though it appears three times in num1 and only twice in num2. setdiff() only checks whether a value appears in the second vector at all, not how often it appears in either vector. This is because setdiff() effectively compares the unique values of its arguments when taking the difference.
Similarly, to get the values in num2 but not in num1, put num2 first and num1 second. These rules also apply to character vectors and logical vectors.
setdiff(num2, num1)
#> [1] 4 5
setdiff(char1, char2)
#> [1] "monkey" "chicken"
setdiff(log2, log1)
#> logical(0)
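Because argument order matters, setdiff(x, y) and setdiff(y, x) generally return different values. The sketch below compares the two directions and rebuilds setdiff() from %in% and unique(). my_setdiff is a hypothetical helper written for illustration, not a base R function. The last line combines both directions to get the values that appear in exactly one of the two vectors, using only the functions covered in this section.

```r
num1 <- c(1, 2, 1, 3, 1)
num2 <- c(1, 1, 3, 4, 4, 5)

# Hypothetical helper: keep the elements of x that never occur in y,
# then drop repeated values
my_setdiff <- function(x, y) {
  unique(x[!(x %in% y)])
}

my_setdiff(num1, num2)
#> [1] 2
my_setdiff(num2, num1)
#> [1] 4 5

# The two directions differ, so argument order matters
identical(setdiff(num1, num2), setdiff(num2, num1))
#> [1] FALSE
identical(my_setdiff(num2, num1), setdiff(num2, num1))
#> [1] TRUE

# Values that appear in exactly one of the two vectors
union(setdiff(num1, num2), setdiff(num2, num1))
#> [1] 2 4 5
```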
d. Set equality
To check whether two vectors contain the same values, you can use the setequal() function. Like setdiff(), setequal() ignores duplicates and order. It only checks whether the two vectors have the same set of unique values.
setequal(num1, num2)
#> [1] FALSE
setequal(char1, char2)
#> [1] FALSE
setequal(log1, log2)
#> [1] FALSE
Of course, you will get FALSE in each operation. num1 contains 2 while num2 does not, and num2 contains 4 and 5 while num1 does not. char1 and char2 each have values that the other lacks. FALSE appears only in log1. However, you will get TRUE in the following examples:
setequal(c(1, 1, 2), c(1, 2))
#> [1] TRUE
setequal(c("apple", "apple", "peach"), c("apple", "peach"))
#> [1] TRUE
setequal(c(TRUE, TRUE, FALSE, FALSE), c(TRUE, FALSE))
#> [1] TRUE
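setequal() is not the same as asking whether two vectors are identical. The sketch below contrasts it with identical(), which compares vectors element by element, including their length and order. It also rebuilds setequal() from setdiff(). my_setequal is a hypothetical helper used only for illustration.

```r
a <- c(1, 1, 2)
b <- c(1, 2)

# identical() compares element by element, so length and order matter
identical(a, b)
#> [1] FALSE
identical(c(2, 1), c(1, 2))
#> [1] FALSE

# setequal() only compares the sets of unique values
setequal(a, b)
#> [1] TRUE
setequal(c(2, 1), c(1, 2))
#> [1] TRUE

# Hypothetical helper: two vectors are set-equal when neither one
# has a value that is missing from the other
my_setequal <- function(x, y) {
  length(setdiff(x, y)) == 0 && length(setdiff(y, x)) == 0
}
my_setequal(a, b)
#> [1] TRUE
my_setequal(c(1, 2, 1, 3, 1), c(1, 1, 3, 4, 4, 5))
#> [1] FALSE
```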
e. Membership determination
Finally, to check whether each element of one vector appears in another vector, you can use the is.element() function or the %in% operator. is.element(x, y) is equivalent to x %in% y. The order of the vectors also matters for membership determination.
is.element(num1, num2)
#> [1] TRUE FALSE TRUE TRUE TRUE
char2 %in% char1
#> [1] TRUE FALSE FALSE
log1 %in% log2
#> [1] TRUE FALSE FALSE TRUE
For both is.element() and %in%, the output is a logical vector with the same length as the first argument.
In the first example above, the output is a logical vector of length 5, the same length as num1. The first element of num1 is 1, and num2 also contains the value 1, so the first element of the output is TRUE. The second element of num1 is 2, but num2 does not contain 2, hence the result is FALSE. You can verify the other elements yourself.
The position of a value in num2 does not matter. As long as 1 appears somewhere in num2, the first element of the output is TRUE, whether or not 1 is the first element of num2. In other words, is.element() and %in% do not take the positions (indices) of the values into account.
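The %in% operator is documented as a thin wrapper around match(). match() returns the position of the first match of each element of its first argument in its second argument, or NA when there is none. The sketch below shows that connection and a common use of membership tests: subsetting one vector by another. my_in is a hypothetical name used only for illustration.

```r
num1 <- c(1, 2, 1, 3, 1)
num2 <- c(1, 1, 3, 4, 4, 5)

# Position of the first match of each element of num1 in num2 (NA = no match)
match(num1, num2)
#> [1]  1 NA  1  3  1

# Hypothetical helper: any match counts, whatever its position
my_in <- function(x, table) match(x, table, nomatch = 0) > 0
my_in(num1, num2)
#> [1]  TRUE FALSE  TRUE  TRUE  TRUE

identical(my_in(num1, num2), num1 %in% num2)
#> [1] TRUE
identical(is.element(num1, num2), num1 %in% num2)
#> [1] TRUE

# A common use: keep only the elements of num1 that also occur in num2
num1[num1 %in% num2]
#> [1] 1 1 3 1
```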
2.14.2 Application of the coercion rule
At the beginning of this section, we stressed that set operations are performed on two vectors of the same type. What about set operations on two vectors of different types? Are such operations possible?
The answer is yes! In Section 2.4, you learned the coercion rule, which states that R converts all elements to the most complex type present. When you apply a set operation to two vectors of different types, R coerces the vector of the simpler type to the type of the more complex one. It then performs the set operation on two vectors of the same type. Below is R’s ordering of the vector types, from the simplest to the most complex. \[\mbox{logical} < \mbox{numeric} < \mbox{character}\] Let’s try some examples.
x <- 1:6
y <- c(T, T, F, T)
intersect(x, y)
#> [1] 1
union(x, y)
#> [1] 1 2 3 4 5 6 0
setdiff(x, y)
#> [1] 2 3 4 5 6
setequal(x, y)
#> [1] FALSE
is.element(x, y)
#> [1] TRUE FALSE FALSE FALSE FALSE FALSE
In the example above, x and y are a numeric vector and a logical vector, respectively. When they are passed to a set operation, the vector of the simpler type, y, is coerced to the more complex type, numeric in this case. Specifically, TRUE is coerced to 1 and FALSE is coerced to 0. This is why intersect() returns 1 and setdiff() drops 1 from x. It is also why union() returns a 0, even though 0 does not literally appear in either x or y: it is the coerced FALSE from y. From there, the set operations are essentially performed on two numeric vectors.
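To make the coercion visible, the sketch below converts y by hand with as.integer() and checks that union() gives the same result. It uses as.integer() because 1:6 is an integer vector, which keeps the types identical. The last call shows the numeric and character case. There, the numbers are matched as strings and the result comes back as a character vector, in line with the ordering above.

```r
x <- 1:6
y <- c(TRUE, TRUE, FALSE, TRUE)

# Coercing y by hand: TRUE becomes 1 and FALSE becomes 0
as.integer(y)
#> [1] 1 1 0 1

# union() on the hand-coerced vector gives the same result
union(x, as.integer(y))
#> [1] 1 2 3 4 5 6 0
identical(union(x, y), union(x, as.integer(y)))
#> [1] TRUE

# Numeric vs character: the numbers are compared as strings
intersect(c(1, 2, 3), c("1", "3"))
#> [1] "1" "3"
```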
2.14.3 Summary
The following table summarizes the set operations between x and y, where x and y are two vectors of the same type. If the two vectors are of different types, the coercion rule is applied as introduced above.
| Operation | Code | Argument order |
|---|---|---|
| Intersection | intersect(x, y) | insensitive |
| Union | union(x, y) | insensitive |
| Set Difference | setdiff(x, y) | sensitive |
| Set Equality | setequal(x, y) | insensitive |
| Membership Determination | is.element(x, y) or x %in% y | sensitive |
2.14.4 Exercises
1. Consider the vector s1 <- seq(from = 1, to = 100, length.out = 7).
   a. Compare s1 to 50 to see whether the values of s1 are bigger than 50, then assign the result to the name s2.
   b. Compare s1 to 80 to see whether the values of s1 are less than or equal to 80, then assign the result to the name s3.
   c. Use two methods (logical operators and set operations) to find the subvector of s1 with values bigger than 50 and less than or equal to 80.
2. For x <- 1:200, use two methods (logical operators and set operations) to find the subvector of x that is divisible by 7 but not divisible by 2.