Tidy Eval Meets ggplot2

The Bang Bang Plots

Almost a year ago I wrote My First Steps into The World of Tidy Eval. At the end, I tweeted to ask Hadley Wickham and Lionel Henry whether ggplot2 was compatible with tidy eval. They said that it was on the to-do list.

Finally, ggplot2 3.0.0 was released last week with support for tidy eval, so I thought it was time to write about it!

In this post, I will go over the traditional ways of using ggplot2, then highlight some cases in which tidy eval solves problems and provides more flexibility.

Dataset

In all the following examples we will use a simple dataset, tips, from the reshape2 package. The dataset contains information about tips received by a waiter over a period of a few months. We will add a new column, tip_percent, to use in our plots.

## Load packages
library(reshape2)
library(tidyverse)
## add a new column with tip_percent
tips <- tips %>% 
  mutate(tip_percent = tip/total_bill)

as_tibble(tips)
# A tibble: 244 x 8
   total_bill   tip sex    smoker day   time    size tip_percent
        <dbl> <dbl> <fct>  <fct>  <fct> <fct>  <int>       <dbl>
 1      17.0   1.01 Female No     Sun   Dinner     2      0.0594
 2      10.3   1.66 Male   No     Sun   Dinner     3      0.161 
 3      21.0   3.5  Male   No     Sun   Dinner     3      0.167 
 4      23.7   3.31 Male   No     Sun   Dinner     2      0.140 
 5      24.6   3.61 Female No     Sun   Dinner     4      0.147 
 6      25.3   4.71 Male   No     Sun   Dinner     4      0.186 
 7       8.77  2    Male   No     Sun   Dinner     2      0.228 
 8      26.9   3.12 Male   No     Sun   Dinner     4      0.116 
 9      15.0   1.96 Male   No     Sun   Dinner     2      0.130 
10      14.8   3.23 Male   No     Sun   Dinner     2      0.219 
# ... with 234 more rows

ggplot2 the common way

aes()

The common way of using ggplot2 functions, and the first example most people try, is non-standard evaluation (NSE): passing variable names unquoted. For instance, if you want to plot tip_percent versus total_bill, you could simply write the following chunk of code.

ggplot(tips, aes(x = total_bill,
                 y = tip_percent,
                 alpha = 0.5))+
  geom_point()+
  theme_minimal()+
  guides(alpha = FALSE)

aes_string()

Another way is to use aes_string() instead of aes(). In this case, you pass the column names as quoted strings, as follows, which gives the same result.

ggplot(tips, aes_string(x = "total_bill",
                        y = "tip_percent",
                        alpha = 0.5))+
  geom_point()+
  theme_minimal()+
  guides(alpha = FALSE)

This version is helpful when you want to pass variables to ggplot2 functions, for instance when you have predefined variables to use, or when you take the variable names from a user through an interface. The following example works fine and gives the same result.

## define variables
x_var <- "total_bill"
y_var <- "tip_percent"
alpha_var <- 0.5

ggplot(tips, aes_string(x = x_var,
                        y = y_var,
                        alpha = alpha_var))+
  geom_point()+
  theme_minimal()+
  guides(alpha = FALSE)

The usual use case for this is writing a reusable function that wraps several ggplot2 layers and works with whatever variables are passed to it. For example, you can wrap the previous chunk in a function as follows.

plot_points <- function(x, y, alpha){
  ggplot(tips, aes_string(x = x, y = y, alpha = alpha))+
    geom_point()+
    theme_minimal()+
    guides(alpha = FALSE)
}

Then you can pass either the column names directly or variables that hold them.

## pass column names directly
plot_points(x = "total_bill",
            y = "tip_percent",
            alpha = 0.5)

## pass variable names
plot_points(x = x_var,
            y = y_var,
            alpha = alpha_var)

You can also reuse the function to create similar plots with different variables, for instance with tip on the y-axis instead of tip_percent.

## plot tip vs. total_bill
plot_points(x = "total_bill",
            y = "tip",
            alpha = 0.5)

ggplot2 the tidy eval way

The previous options are good and valid in many cases, but there are some limitations. For instance, you cannot create a function and pass column names unquoted as follows.

plot_points2 <- function(x, y, alpha){
  ggplot(tips, aes(x = x, y = y, alpha = alpha))+
    geom_point()+
    theme_minimal()+
    guides(alpha = FALSE)
}

## invalid way
plot_points2(x = total_bill,
             y = tip_percent,
             alpha = 0.5)

This fails with an error like:

Error in FUN(X[[i]], ...) : object 'total_bill' not found

Previously there was no straightforward way to handle this, but with tidy eval it became possible and easy.
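One way to see why the plain wrapper fails is to compare what gets captured with and without enquo(), the rlang function that the next section relies on. The sketch below uses two hypothetical helpers, capture_symbol() and capture_input(), which are not part of ggplot2 or of this post's code. Here quo() stands in, roughly, for what aes(x = x) does inside plot_points2().

```r
library(rlang)

# Hypothetical helpers, for illustration only.
# quo() quotes the literal argument name `x` written in the function body.
capture_symbol <- function(x) quo(x)
# enquo() quotes the expression that the caller supplied for `x`.
capture_input <- function(x) enquo(x)

# total_bill is never evaluated here, so no data is needed.
as_label(capture_symbol(total_bill))
as_label(capture_input(total_bill))
```

The first call reports x and the second reports total_bill. In the plain wrapper, ggplot2 is left holding the symbol x; when that is evaluated later, it resolves to the wrapper's argument, which then looks for total_bill outside the data, hence the "object not found" error.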

Unquoted column names / NSE

You can use enquo() and the bang bang operator !! to create a function like plot_points_tidyeval_01() which takes the column names unquoted, and it will work properly.

plot_points_tidyeval_01 <- function(x, y, alpha){
  
  x_var <- enquo(x)
  y_var <- enquo(y)
  
  ggplot(tips, aes(x = !!x_var, y = !!y_var, alpha = alpha))+
    geom_point()+
    theme_minimal()+
    guides(alpha = FALSE)
}

## pass column names unquoted 
plot_points_tidyeval_01(x = total_bill,
                        y = tip_percent,
                        alpha = 0.5)
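As a side note, later rlang releases (0.4.0 and up) added the embrace operator {{ }}, which combines enquo() and !! in one step. The sketch below is a hypothetical variant of plot_points_tidyeval_01(), not code from the original post, and it assumes a recent enough rlang is installed alongside ggplot2.

```r
library(ggplot2)
library(reshape2)  # provides the tips data

tips <- transform(reshape2::tips, tip_percent = tip / total_bill)

# Hypothetical variant: {{ x }} is shorthand for !!enquo(x).
plot_points_embrace <- function(x, y, alpha) {
  ggplot(tips, aes(x = {{ x }}, y = {{ y }}, alpha = alpha)) +
    geom_point() +
    theme_minimal() +
    guides(alpha = "none")  # "none" is the spelling newer ggplot2 prefers over FALSE
}

## pass column names unquoted, as before
p_embrace <- plot_points_embrace(x = total_bill,
                                 y = tip_percent,
                                 alpha = 0.5)
p_embrace
```

The explicit enquo() and !! pair is still useful when you need to inspect or modify the captured expression before using it.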

You might still see this as a simple use case that could be handled by aes_string(). But there are more things you can do, like variable facets.
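On the aes_string() comparison: when the column names really do arrive as strings, tidy eval has its own answer in the .data pronoun, which can be subset with [[ inside aes(). The function below is a hypothetical sketch, not part of the original post, and assumes ggplot2 3.0.0 or later.

```r
library(ggplot2)
library(reshape2)  # provides the tips data

tips <- transform(reshape2::tips, tip_percent = tip / total_bill)

# Hypothetical wrapper: x and y are column names given as strings,
# looked up in the plot data through the .data pronoun.
plot_points_data <- function(x, y, alpha) {
  ggplot(tips, aes(x = .data[[x]], y = .data[[y]], alpha = alpha)) +
    geom_point() +
    theme_minimal() +
    guides(alpha = "none")  # "none" is the spelling newer ggplot2 prefers over FALSE
}

## pass column names as strings
p_str <- plot_points_data(x = "total_bill",
                          y = "tip_percent",
                          alpha = 0.5)
p_str
```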

Variable Facets

Suppose you want the faceting variable itself to be an argument of your function. You wouldn’t have this flexibility even with aes_string(), but now you can implement it as follows.

Notice that vars() is a new helper, used inside facet_grid() in place of the ~ formula.

plot_points_tidyeval_02 <- function(x, y, alpha, facet){
  
  x_var <- enquo(x)
  y_var <- enquo(y)
  facet_var <- enquo(facet)
  
  ggplot(tips, aes(x = !!x_var, y = !!y_var, alpha = alpha))+
    geom_point()+
    theme_minimal()+
    guides(alpha = FALSE)+
    facet_grid(cols = vars(!!facet_var))
}

So you can pass sex to the facet variable in the plot_points_tidyeval_02() function.

## use sex for facet
plot_points_tidyeval_02(x = total_bill,
                        y = tip_percent,
                        alpha = 0.5,
                        facet = sex)

Or you can pass a different variable, like day.

## use day for facet
plot_points_tidyeval_02(x = total_bill,
                        y = tip_percent,
                        alpha = 0.5,
                        facet = day)

You can even use the dots ... to have more flexibility and allow the function to take more than one variable for faceting. You can pass the dots ... directly to vars() inside facet_wrap() as follows.

plot_points_tidyeval_03 <- function(x, y, alpha, ...){
  
  x_var <- enquo(x)
  y_var <- enquo(y)
  
  ggplot(tips, aes(x = !!x_var, y = !!y_var, alpha = alpha))+
    geom_point()+
    theme_minimal()+
    guides(alpha = FALSE)+
    facet_wrap(vars(...), nrow = 2)
}


## use both sex and day for facets
plot_points_tidyeval_03(x = total_bill,
                        y = tip_percent,
                        alpha = 0.5,
                        sex, day)
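To make the dots forwarding a little less magical, the sketch below inspects what vars(...) receives when it is called from inside a wrapper. facet_labels() is a hypothetical helper introduced only for this illustration; it is not a ggplot2 function.

```r
library(ggplot2)
library(rlang)

# Hypothetical helper: forward the dots to vars() and report, as strings,
# the expressions that arrive there. Nothing is evaluated against data.
facet_labels <- function(...) {
  unname(vapply(vars(...), as_label, character(1)))
}

facet_labels(sex, day)
```

The call returns the two names sex and day, which suggests that vars() quotes each forwarded argument separately. That is why facet_wrap() in plot_points_tidyeval_03() can treat them as two faceting variables.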

In Conclusion

Tidy eval with ggplot2 offers new options and more capabilities. It solves problems people used to struggle with and gives a higher level of flexibility in writing functions and programming. I have covered only a couple of options in this post, but I am sure I will discover more and hopefully write about them later.

