&& and ||

This tutorial extends the Control Flow tutorial, where you learned how to use `if`, `else`, `return()`, and `stop()`.

Here you will learn how to:

1. combine logical tests in an if statement
2. write if statements that work with vectors, which is a prerequisite if you want to write vectorized functions.

Here’s what `clean()` looked like at the end of the Control Flow tutorial. Do you notice that all of the if statements have the same outcome?

``````clean <- function(x) {
stopifnot(!is.null(x))
if (x == -99) return(NA)
if (x == ".") return(NA)
if (x == "NaN") return(NA)
x
}``````

Let’s use your knowledge of logical tests to trim them down to a single if statement.

• Write a logical test that returns TRUE when x is -99 OR x is “.” (Let’s ignore the “NaN” case to keep things simple). Then click Submit Answer.
``"You can combine two logical tests in R with `&` (and) and `|` (or), e.g. x > 0 & x < 1."``
``x == -99 | x == "."``
``"This is the correct way to combine logical tests in R, but it has some downsides when you use it in an if statement."``

& and |

`&` and `|` are R’s boolean operators for combining logical tests.

• `&` stands for “and”. It returns `TRUE` if both tests return `TRUE` and `FALSE` otherwise.
• `|` stands for “or”. It returns `TRUE` if one or both tests return `TRUE` and `FALSE` otherwise.

So,

``````x <- -99
x == -99 | x == "."``````
``## [1] TRUE``
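To see both operators at once, the sketch below tabulates every combination of two logical values. The names `a`, `b`, `and_result`, and `or_result` are made up for this illustration.

```r
# Every combination of two logical values (illustrative names)
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

and_result <- a & b  # TRUE only where both a and b are TRUE
or_result  <- a | b  # TRUE where at least one of a and b is TRUE

# Show the inputs and the results side by side
data.frame(a, b, and_result, or_result)
```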

However, it is bad practice to use `&` and `|` to combine logical tests within an `if` condition. Why? Because:

1. there is something better (as you’ll see in a minute)
2. `&` and `|` tend to generate warning messages (errors in R 4.2.0 and later) when used with `if`

As R operators, both `&` and `|` are vectorized which means that you can use them with vectors. This is very useful.

``````x <- c(-99, 0 , 1)
x == -99``````
``## [1]  TRUE FALSE FALSE``
``x == "."``
``## [1] FALSE FALSE FALSE``
``x == -99 | x == "."``
``## [1]  TRUE FALSE FALSE``

However, `if` conditions are not vectorized. `if` expects the logical test contained within its parentheses to return a single `TRUE` or `FALSE`. If the condition returns a vector of `TRUE`s and `FALSE`s, versions of R before 4.2.0 use the first value and show a warning message, as in the output below. R 4.2.0 and later stop with an error instead.

``````x <- c(-99, 0 , 1)
if (x == -99 | x == ".") NA``````
``````## Warning in if (x == -99 | x == ".") NA: the condition has length > 1 and
## only the first element will be used``````
``## [1] NA``
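How R reacts to a vector condition depends on the version: releases before 4.2.0 warn and use the first element, while 4.2.0 and later stop with an error. The sketch below records which of the two your installation does without interrupting the session. The name `outcome` is invented for this example.

```r
x <- c(-99, 0, 1)

# Run the vectorized condition inside tryCatch() so that neither a
# warning nor an error interrupts the script. `outcome` is an
# illustrative name, not part of the tutorial's code.
outcome <- tryCatch(
  {
    if (x == -99 | x == ".") NA
    "no condition signalled"
  },
  warning = function(w) "warning",  # expected before R 4.2.0
  error   = function(e) "error"     # expected in R 4.2.0 and later
)

outcome
```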

&& and ||

You can avoid this by always using `&&` and `||` within your `if` conditions. `&&` and `||` are lazy substitutes for `&` and `|`. They are lazy in two ways.

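First, `&&` and `||` always return a single `TRUE` or `FALSE`. In versions of R before 4.2.0, if you give `&&` or `||` vectors, they compare only the first elements of the vectors and do not return a warning message, as in the example below. R 4.2.0 started to warn about this and R 4.3.0 made it an error, so in current R you should give `&&` and `||` single values.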

``````x <- c(-99, 0 , 1)
x == -99 || x == "."``````
``## [1] TRUE``

Use ||

Let’s use this to our immediate advantage.

• Replace the two `if` statements below with a single statement that tests whether x is `-99` or `"."` without throwing error messages.
``````clean <- function(x) {
stopifnot(!is.null(x))
if (x == -99) return(NA)
if (x == ".") return(NA)
x
}``````
``"Like |, || expects a _complete_ logical test on each side of ||."``
``````clean <- function(x) {
stopifnot(!is.null(x))
if (x == -99 || x == ".") return(NA)
x
}``````
``strict_check("Now let's see what happens if you use clean() with a vector of values.")``
``````clean <- function(x) {
stopifnot(!is.null(x))
if (x == -99 || x == ".") return(NA)
x
}``````
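A quick check of the combined test, as a sketch: the single-value calls below behave the same in every version of R, while the vector call is wrapped in `tryCatch()` because its outcome depends on the version (silent in older releases, a warning in R 4.2, an error from R 4.3.0). `vector_outcome` is a name invented for this example.

```r
clean <- function(x) {
  stopifnot(!is.null(x))
  if (x == -99 || x == ".") return(NA)
  x
}

clean(-99)  # NA
clean(".")  # NA
clean(5)    # 5

# Record how this version of R reacts to a vector argument
vector_outcome <- tryCatch(
  {
    clean(c(-99, 0, 1))
    "silent"
  },
  warning = function(w) "warning",
  error   = function(e) "error"
)

vector_outcome
```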

Computation

The most important reason to use `||` instead of `|` is that `||` saves unnecessary computation when possible. This is the second way that `&&` and `||` are lazy.

When possible, `&&` and `||` jump to the correct conclusion after evaluating the first of the two logical tests (not so with `&` and `|`).

• `&&` will return `FALSE` if the test on the left returns `FALSE` (because the combined test would return `FALSE`).
• `||` will return `TRUE` if the test on the left returns `TRUE` (because the combined test would return `TRUE`).

In either case, `&&` and `||` will not evaluate the test on the right.

``````x <- -99
if (x == -99 || stop("if you evaluate this.")) "I didn't evaluate stop()."``````
``## [1] "I didn't evaluate stop()."``
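One way to observe the short circuit directly is to count evaluations. This is only a sketch: `noisy_true()` is a hypothetical helper that returns `TRUE` and increments a counter each time it runs.

```r
calls <- 0

# Hypothetical helper: returns TRUE and records that it was evaluated
noisy_true <- function() {
  calls <<- calls + 1
  TRUE
}

lazy_and <- FALSE && noisy_true()  # left side is FALSE, right side skipped
lazy_or  <- TRUE  || noisy_true()  # left side is TRUE, right side skipped
calls_after_lazy <- calls          # still 0

eager_and <- FALSE & noisy_true()  # & evaluates both sides
calls_after_eager <- calls         # now 1
```

The counter only moves when `&` is used, even though all three expressions could have been decided from the left-hand side alone.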

How could you use this?

Remember how this code returns an error because `if` cannot handle the result of `NULL == -99`?

``````clean <- function(x) {
if (x == -99) return(NA)
x
}
clean(NULL)``````
``## Error in if (x == -99) return(NA): argument is of length zero``

Quiz

Suppose we redefine `clean()` like this:

``````clean <- function(x) {
if (is.null(x) || x == -99) return(NA)
x
}``````
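A useful thing to try with this definition is `clean(NULL)`. As a sketch of what to expect: because `is.null(x)` is already `TRUE`, `||` never evaluates `x == -99`, so `if` receives a single `TRUE` rather than a zero-length result. `eager_test` is an illustrative name.

```r
clean <- function(x) {
  if (is.null(x) || x == -99) return(NA)
  x
}

clean(NULL)  # NA: the right-hand test is skipped
clean(-99)   # NA
clean(3)     # 3

# With | both sides are evaluated, and the zero-length result of
# NULL == -99 makes the combined test zero-length as well.
eager_test <- is.null(NULL) | NULL == -99
length(eager_test)  # 0, which `if` cannot handle
```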

Vectorized if

Buried in the last section is an interesting question: what if you do want `clean()` to work with vectors? i.e.

``clean(c(-99, 0, 1))``
``## [1] NA  0  1``

That would be a handy way to clean whole columns of data. How could you do it?

Compare these two functions (one should seem familiar). What is different?

``````clean <- function(x) {
if (x == -99) NA else x
}

clean2 <- function(x) {
ifelse(x == -99, NA, x)
}``````

ifelse()

`ifelse()` is a function that replicates an if else statement. It takes three arguments: a logical test followed by two pieces of code. If the test returns `TRUE`, `ifelse()` will return the results of the first piece of code. If the test returns `FALSE`, `ifelse()` will return the results of the second piece of code.

So `clean(-99)` and `clean2(-99)` both return `NA`.

``clean(-99)``
``## [1] NA``
``clean2(-99)``
``## [1] NA``

However, unlike `if` and `else`, `ifelse` is vectorized. As a result, you can pass `ifelse()` a vector of values and it will apply the implied if else statement separately to each element of the vector.

``````x <- c(-99, 0, 1)
ifelse(x == -99, NA, x)``````
``## [1] NA  0  1``

`clean2()` inherits this vectorized property from `ifelse()`.

``clean2(c(-99, 0, 1))``
``## [1] NA  0  1``

Compare that to `clean()`, which is not vectorized because it relies on `if` and `else`. (The warning below comes from a version of R before 4.2.0; later versions stop with an error instead.)

``clean(c(-99, 0, 1))``
``````## Warning in if (x == -99) NA else x: the condition has length > 1 and only
## the first element will be used``````
``## [1] NA``
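Since `clean2()` is vectorized, it can clean a whole column in one call. A minimal sketch, using a made-up data frame `readings` whose `temp` column uses -99 as its missing-value code:

```r
clean2 <- function(x) {
  ifelse(x == -99, NA, x)
}

# Hypothetical data: -99 marks a missing temperature
readings <- data.frame(
  id   = 1:4,
  temp = c(21.5, -99, 19.0, -99)
)

# Replace the whole column with its cleaned version
readings$temp <- clean2(readings$temp)
readings$temp
```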

if_else

The dplyr package offers a slight improvement on `ifelse()` named `if_else()`. `if_else()` is faster than `ifelse()`, but it requires you to make sure that each case in the if else statement returns the same type of object. For example, the statement needs to return a real number (or a string, or a logical, etc.) whether or not the condition is `TRUE`.

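No big deal, right? Well, kind of. The call below fails in the version of dplyr used here (dplyr 1.1.0 and later relax this check and accept a plain `NA`).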

``````x <- c(-99, 0, 1)
if_else(x == -99, NA, x)``````
``## Error: `false` must be a logical vector, not a double vector``

NA

What happened? Recall that data in R comes in six atomic types.

By itself, `NA` is stored as a logical:

``typeof(NA)``
``## [1] "logical"``

So when you write `if_else(x == -99, NA, x)`, `if_else()` returns a logical in the first case and a double (real number) in the second (assuming `x` is a real number).

You can get around this mishap in two ways:

1. Stick to `ifelse()`
2. Use an `NA` that comes with a type

Types of NA

You may not realize it, but R comes with five types of NA. They all appear as `NA` when printed, but they are each saved with a separate data type. These are:

``NA # logical``
``## [1] NA``
``NA_integer_ # integer``
``## [1] NA``
``NA_real_ # double``
``## [1] NA``
``NA_complex_ # complex``
``## [1] NA``
``NA_character_ # character``
``## [1] NA``
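Printing hides the difference, but `typeof()` reveals it. The sketch below collects the type of each `NA` into a named vector (`na_types` is an illustrative name) and then shows base R converting a plain `NA` on its own.

```r
# Ask R for the storage type of each flavour of NA
na_types <- c(
  logical   = typeof(NA),
  integer   = typeof(NA_integer_),
  double    = typeof(NA_real_),
  complex   = typeof(NA_complex_),
  character = typeof(NA_character_)
)
na_types

# Base R quietly converts a plain NA to match its neighbours
typeof(c(NA, 1.5))  # "double"
```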

You can fix `if_else()` by being precise about which NA to use (most other R functions will convert the type of NA without bothering you).

``````x <- c(-99, 0, 1)
if_else(x == -99, NA_real_, x)``````
``## [1] NA  0  1``

Use if_else

• Fix the `if_else()` statement of `clean2()` to work with real numbers. Then click Submit Answer.
``````clean2 <- function(x) {
  if_else(x == -99, NA, x)
}``````
``````clean2 <- function(x) {
  if_else(x == -99, NA_real_, x)
}``````
``strict_check("Notice that this version of `clean2()` will work with real numbers, but not other types of data like characters. What if you want `clean2()` to work with all types of data? That's simple: stick with `ifelse()`.")``

Vectorized else if

What if you want to write a vectorized version of a multi-part if else tree? Like the tree in this function:

``````clean <- function(x) {
if (x == -99) NA
else if (x == ".") NA
else if (x == "") NA
else if (x == "NaN") NA
else x
}``````

In this case, neither `ifelse()` nor `if_else()` will do. Why? Because each can only handle a single if condition, but our tree has four.

case_when()

You can vectorize multi-part if else statements with dplyr’s `case_when()` function. Here is how you would use `case_when()` to rewrite our `foo()` function from the Control Flow tutorial.

Here is the masterpiece in its original form:

``````foo <- function(x) {
if (x > 2) "a"
else if (x < 2) "b"
else if (x == 1) "c"
else "d"
}``````

And here it is with `case_when()`.

``````foo2 <- function(x) {
case_when(
x > 2  ~ "a",
x < 2  ~ "b",
x == 1 ~ "c",
TRUE   ~ "d"
)
}``````

And here are our foos in action to prove that `foo2()` is vectorized. (In R 4.2.0 and later, `foo(x)` stops with an error rather than the warning shown.)

``````x <- c(3, 2, 1)
foo(x)``````
``````## Warning in if (x > 2) "a" else if (x < 2) "b" else if (x == 1) "c" else
## "d": the condition has length > 1 and only the first element will be used``````
``## [1] "a"``
``foo2(x)``
``## [1] "a" "d" "b"``

Notice that

1. `case_when()` returns a single case for each element, the first case whose left hand side evaluates to `TRUE`
2. The left hand side of the last case evaluates to `TRUE` no matter what the value of `x` is (In fact, the left hand side is `TRUE`). This is an easy way to add an `else` clause to the end of `case_when()`.

Now let’s look at the unusual syntax of `case_when()`.

case_when() syntax

``````foo2 <- function(x) {
case_when(
x > 2  ~ "a",
x < 2  ~ "b",
x == 1 ~ "c",
TRUE   ~ "d"
)
}``````

Each argument of `case_when()` is a pair that consists of a logical test on the left hand side and a piece of code on the right hand side. The two are always separated by a `~`.

Like `if_else()`, `case_when()` expects each case to return the same type of output. So keep those NA types handy: `NA`, `NA_integer_`, `NA_real_`, `NA_complex_`, `NA_character_`.
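As a sketch of a typed `NA` inside `case_when()`, the hypothetical function `grade()` below labels scores and returns a missing label for missing scores. The function name and the cutoffs are invented for illustration, and the code is skipped when dplyr is not installed.

```r
# `grade()` and its cutoffs are illustrative, not part of the tutorial
if (requireNamespace("dplyr", quietly = TRUE)) {
  grade <- function(score) {
    dplyr::case_when(
      is.na(score) ~ NA_character_,  # typed NA matches the other strings
      score >= 90  ~ "high",
      score >= 50  ~ "medium",
      TRUE         ~ "low"
    )
  }

  grades <- grade(c(95, 60, 10, NA))
  grades
}
```

Every right-hand side here is a character value, so `NA_character_` is the matching choice. For a function that returns real numbers, `NA_real_` would take its place.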

Final Challenge

• Rewrite the multi-part version of `clean()` to use `case_when()`, which will allow `clean()` to handle vectors. Retain each case. Assume where necessary that `clean()` will only work with real numbers. Then click Submit Answer.
``````clean <- function(x) {
if (x == -99) NA
else if (x == ".") NA
else if (x == "") NA
else if (x == "NaN") NA
else x
}``````
``"Use NA's that have the right type."``
``````clean <- function(x) {
case_when(
x == -99 ~ NA_real_,
x == "." ~ NA_real_,
x == "" ~ NA_real_,
x == "NaN" ~ NA_real_,
TRUE ~ x
)
}``````
``strict_check('And if you noticed that a vector of real numbers would never contain ".", "", and "NaN" because they are strings, you are of course right. Thanks for playing along with the charade.')``

Congratulations!

You’ve learned how to alter the control flow of your functions with:

• `if`
• `else`
• `return()`
• `stop()`
• `stopifnot()`
• `ifelse()`

Not only that, you tackled two advanced methods: dplyr’s `if_else()` and dplyr’s `case_when()`.