`ggplot2`

, as I’ve already made clear, is one of my favourite packages for `R`

. And since that original post about `ggplot2`

remains one of my most frequently visited, I thought I would proceed with starting a series of posts called “Make It Pretty”, all about sharing ways of visualizing data that I think are attractive/effective/comprehensive. So with this inaugural MIP post, I will be covering how to plot 2-way interactions using `ggplot2`

.

2-way interactions can come in one of three general forms, and I will be providing code for plotting each. This will be a pretty lengthy post (lots of code/explanation), so if you’re only interested in learning how to plot a particular form, just click the the one below. Oh, and each uses an APA format theme that I’ve shared before, but just click here to quickly flip to the code if you need it.

- Two categorical variables (e.g., an experimental manipulation and participant gender) interact to predict some outcome
- One categorical variable (e.g., an experimental manipulation) interacts with a continuous variable (e.g., participant age) to predict some outcome
- Two continuous variables (e.g., participant age and participant income) interact to predict some outcome (Updated Mar. 22, 2016)

**Disclaimer About My Plotting Philosophy**

I think it is important for visualizations of data to plot both signal and noise. Often times, I encounter plots–mainly from regression analyses–where all I can see are the trends (e.g., lines of best fit), with no attempts made to visualize uncertainty or error in the those trends. As such, you will notice that with my scatterplots, in particular, I am adamant about plotting the raw data points along linear trends, as I think it is important to recognize that significant associations between variables often still leave much to be explained. This approach may not be for you–I’ve had some colleagues say it’s not a “clean” approach to visualization (especially for continuous x continuous interactions)–but it is how I will proceed, and wanted to at least provide some rationale for this practice.

**2-Way Interactions with Two Categorical Variables**

2-way interactions between categorical variables will most commonly be analyzed using a factorial ANOVA approach. Visualizing 2-way interactions from this kind of design actually takes more coding effort, because you will not be plotting the raw data. Instead, you will need to first summarize the data (means, standard deviations, n per group) via the `psych`

package, then calculate a standard error for each group’s mean, and finally, create a bar graph.

Start by installing and calling the necessary packages (`psych`

for summary statistics, and `ggplot2`

for plotting). For all the example figures, I will be using the `mtcars`

dataset, which we will store in a data frame called `dat`

.

install.packages('psych') install.packages('ggplot2') library(psych) library(ggplot2) dat=mtcars

For this particular example, we are going to plot the interaction between the transmission type (variable name: `am`

; 0 = automatic, 1 = manual) and V/S* (variable name: `vs`

), for vehicle fuel efficiency (variable name: `mpg`

). A precautionary first step in creating a bar graph is ensuring that `R`

knows to treat your independent variables as categorical factors and not continuous variables (this matters for plotting with `ggplot2`

):

dat$am=as.factor(dat$am) dat$vs=as.factor(dat$vs)

Next, we are going to use the `psych`

package’s `describeBy`

function, in order to calculate summary statistics for `mpg`

, for each combination of `am`

and `vs`

levels, and we will store these summary statistics in a new data frame called `dat2`

.

dat2 = describeBy(dat$mpg,list(dat$am,dat$vs), mat=TRUE,digits=2)

Have a look at the new data frame, `dat2`

. `describeBY`

will give very generic names (“group1” and “group2”) to your two independent variables. The next few lines of code will give more meaningful names to your variables (`Transmission`

and `VS`

, respectively, because we listed `am`

before `vs`

in the previous line of code), and then we will recode the levels of `Transmission`

to be more descriptive (from 0 and 1 to Automatic and Manual).

names(dat2)[names(dat2) == 'group1'] = 'Transmission' names(dat2)[names(dat2) == 'group2'] = 'VS' levels(dat2$Transmission)[levels(dat2$Transmission)=='0'] = 'Automatic' levels(dat2$Transmission)[levels(dat2$Transmission)=='1'] = 'Manual'

Lastly, we will create a new variable for the standard errors of each group (called `se`

) by dividing the group standard deviations (`sd`

) by the sqrt group sizes (`n`

):

dat2$se = dat2$sd/sqrt(dat2$n)

Now we can finally dive into some plotting-related code. We are going to quickly define values for our 95% confidence interval bars (in `limits`

), the spacing for the bars in our bar graph (in `dodge`

), and a theme that encapsulate a number of formatting changes to make our figure compliant with APA style (in `apatheme`

):

limits = aes(ymax = mean + (1.96*se), ymin=mean - (1.96*se)) dodge = position_dodge(width=0.9) apatheme=theme_bw()+ theme(panel.grid.major=element_blank(), panel.grid.minor=element_blank(), panel.border=element_blank(), axis.line=element_line(), text=element_text(family='Times'))

And finally, the main event: coding the plot itself (into an object called `p`

). We will map levels of Transmission to the x-axis, the mean of MPG to the y-axis, and fill bars with different color depending on levels of VS. We will then add bars to the plot (via `geom_bar`

), spacing them by the `dodge`

parameter we specified earlier. Then add the 95% CIs (via `geom_errorbar`

), also spaced by the same `dodge`

parameter. Finally, take care of some minor formatting by applying our APA theme, relabeling the y-axis, and using a grey-scale color-scheme for the bars, appropriate for publication.

p=ggplot(dat2, aes(x = Transmission, y = mean, fill = VS))+ geom_bar(stat='identity', position=dodge)+ geom_errorbar(limits, position=dodge, width=0.25)+ apatheme+ ylab('Mean MPG')+ scale_fill_grey() p

If you like how it looks, use the `ggsave`

command to save a high-res .png image file to your home directory.

ggsave('barplot.png', width=6, height=6, unit='in', dpi=300)

**2-Way Interactions with One Categorical and One Continuous Variable**

Visualizing an interaction between a categorical variable and a continuous variable is the easiest of the three types of 2-way interactions to code (usually done in regression models). In this example, we will visualize the interaction between the same transmission type variable as before (variable name: `am`

) and the weight of vehicle (variable name: `wt`

) for vehicle fuel efficiency (variable name: `mpg`

).

We will start off, as before, by ensuring that R accurately treats transmission type as a categorical variable:

dat$am=as.factor(dat$am)

Then, we will actually need to run the analysis (note the significant interaction, using the `summary`

function) underlying this visualization (output stored in `model1`

), in order to create a column of predicted values (called `predicted`

) in our data frame, for plotting lines of best fit:

model1=lm(mpg~am*wt, data=dat) summary(model1) dat$predicted=predict(model1)

Next, we’ll store the same APA-format template to use for this plot:

apatheme=theme_bw()+ theme(panel.grid.major=element_blank(), panel.grid.minor=element_blank(), panel.border=element_blank(), axis.line=element_line(), text=element_text(family='Times'))

And then we can go right into creating our scatterplot (called `p`

), mapping weight to the x-axis, mpg to the y-axis, and using different shapes for the data points based on transmission type. We’ll add data points (via `geom_point`

), and manually specify shape numbers corresponding to solid and hollow circles (good for two-group plots; hollow for Automatic cars, solid for Manual cars), and creating a more informative legend with descriptive labels. Add a separate line of best fit each group (solid for Automatic, and dashed for Manual), relabel the x- and y-axes, and apply the APA template:

p=ggplot(dat, aes(x=wt, y=mpg, shape=am))+ geom_point()+ scale_shape_manual(values=c(1,16), name='Transmission', labels=c('Automatic','Manual'))+ geom_line(aes(x = wt, y = predicted, linetype=am)) + scale_linetype_discrete(name='Transmission', labels=c('Automatic','Manual'))+ labs(x = 'Vehicle Weight', y = 'Vehicle MPG')+ apatheme p

Save the plot, and Bob’s your Uncle:

ggsave('scatter1.png', width=6, height=6, unit='in', dpi=300)

One word of caution: be sure to use the same name and labels when specifying the legend details for the shape and line-type parameters, otherwise it will create two separate legends (in this case: … name=”Transmission”, labels=c(“Automatic”,”Manual”)).

**2-Way Interactions with Two Continuous Variables (updated Mar. 22, 2016)**

Visualizing an interaction between two continuous variables presents a bit of a problem. Specifically, because there is a continuous range of moderator values, it’s not as if there are discrete lines of best–there are no groups here. Instead, there are a number of simple slopes–presumably varying in magnitude and direction–for the entire range of the moderator variable. Often times, only a couple simple slopes are plotted, from one standard deviation above and below the mean of the moderator variable. I am going to keep with this practice, though I am very open to suggestions for how to improve this plot (and either of the others too!).

We are going to be looking at the interaction between vehicle weight (variable name: `wt`

) and number of gears (variable name: `gear`

) for predicting vehicle fuel efficiency (variable name: `mpg`

). A quick disclaimer: `gear`

is a variable with only three different values: 3, 4, and 5. As such, it’s pretty tenuous to call it continuous, but I’m just using it for the sake of a plotting example.

The first thing we will need to do is run our regression model in order to calculate the simple slopes of `wt`

on `mpg`

at different levels of `gear`

. I like to use Kris Preacher’s simple intercepts/slopes calculator, but do what works for you–but you need BOTH the simple slopes AND (less commonly reported by some programs, like the PROCESS Macro) the simple intercepts.

model2=lm(mpg~gear*wt, data=dat)

After running the model, I used the `vcov`

command to get the parameter variances and covariances for Kris’s interaction utility. Then, I got the degrees of freedom for the model from the `summary`

of the regression model, and used the `psych`

package’s `describe`

function to get the mean and standard deviation of `gear`

, so that I could plug in the values for 1 SD below the mean, the mean itself, and 1 SD above the mean, as the conditional values of the moderator. I also scooped the min and max of `wt`

as the x-variable points.

summary(model2) vcov(model2) describe(dat)

Click “Calculate”, and scroll down to the “Simple Intercepts and Slopes at Conditional Values of Z” section of output in the first window of the interaction utility–these are the values we will use to plot our simple slope lines.

Thanks to a question/suggestion by James Conigrave, I’ve been able to update the code make the plot more pretty. Make a plot called `p`

, and map `wt`

to the x-axis, `mpg`

to the y-axis, and our continuous moderator `gear`

to size–this will produce smaller or bigger points in our scatter plot depending on the value of `gear`

. Give your axes more informative labels using the `labs`

command. We’re going to be creating a legend for different styles of regression lines, and data point size is a pretty intuitive visualization (you can just mention its interpretation in a table note, if necessary), so silence the default size-based legend using “guide = FALSE” in the `scale_size_continuous`

command (otherwise the plot looks cluttered). Now we manually add our simple slopes using the `geom_abline`

command, for which we need to–at the very least in this case–specify intercept and slope values. We will do this three times: first for the simple intercept and slope of `wt`

on `mpg`

at 1 SD below the mean of `gear`

, then at the mean of `gear`

, and then at 1 SD above the mean of `gear`

. In order to specify a different looking line for each (so we can distinguish them), we will enter labels for our lines in the “linetype” sub-command; this simultaneously assigns different linestyles to each simple slope, while controlling how the lines are labeled in the resulting legend. Then, we use `scale_linetype_manual`

in order to specify the particular styles of lines we want for each simple slope, the order we wish for each to appear in our legend, and a name for our legend. Lastly, add our APA format theme, and call the plot.

p=ggplot(dat, aes(x = wt, y = mpg, size = gear))+ geom_point()+ labs(x = 'Vehicle Weight', y = 'Vehicle MPG')+ scale_size_continuous(guide = FALSE)+ geom_abline(aes(intercept=33.965, slope=-4.3985, linetype='-1SD Gear'))+ geom_abline(aes(intercept=38.1208, slope=-5.854, linetype='Mean Gear'))+ geom_abline(aes(intercept=42.2767, slope=-7.3095, linetype='+1SD Gear'))+ scale_linetype_manual(values=c('dotted','dashed','solid'), breaks=c('-1SD Gear','Mean Gear','+1SD Gear'),name='Simple\nSlope')+ apatheme p

If you’re happy, save the plot.

ggsave('scatter2.png', width=6, height=6, unit='in', dpi=300)

**Wrap Up and Extras**

We’ve made some pretty plots, but you might have suggestions for how to improve them.

My buddy Matt Baldwin, for example, likes to put a vertical line(s) (based on the Johnson-Neyman technique; see Preacher, Curran, & Bauer, 2006, for a nice writeup) showing at what value(s) for a moderator that simple slopes diverge (see below). This can be easily done by using the `geom_vline()`

command, and entering in the particular J-N value for the `xintercept`

value (e.g., `geom_vline(xintercept=4.45)`

).

I’m sure there are a number of other ways to improve these plots. Feel free to post suggestions and/or code in the comments; I’ll amend the code in the core of the post based on really good suggestions, or for others, might provide example code and plots down here in the extras. And if you have any requests for a Make It Pretty post, let me know–I’m thinking I might do forest plots next.

**Footnotes**

*Full disclosure: I have no idea what V/S refers to. I’m pretty car-stupid, and the online documentation for the `mtcars`

dataset isn’t helpful for this particular variable. I simply selected it for the sake of example, as it produces a significant interaction with transmission type.

These are beautiful thank you for this post, I’ve been battling to add a legend for the three lines to your latter interaction plot.

Any ideas?

LikeLiked by 1 person

Thank you–glad you like it! I ended up finding a solution with the help of this post on StacksOverflow (http://stackoverflow.com/questions/18060116/adding-legend-to-ggplot-when-lines-were-added-manually). Amended code is below (and Ill make the default in the post later, because it’s much prettier than the original (so thank you for indirectly improving the plot)!

In a nutshell, for each abline you add, use the aesthetics subcommand to capture the intercept and slope values, but then for the linetype aestethics, specify text for the legend labels you’d like for each line. ggplot2 will automatically assign different linetypes to ablines with different text, so we add a scale_linetype_manual option, where we manually specify the linetypes we want for each simple slope (I think solid line for mean makes most sense), reorder the legend using the breaks option, and give the legend a sensible name. I also silenced the sizing legend because it already was ugly (sizing interpretation can just be conveyed in a note under your table), and it looks cluttered with both present–do this by adding the “guide=false) option to scale_size_continuous:

p=ggplot(dat, aes(x = wt, y = mpg, size = gear))+

geom_point()+

labs(x = ‘Vehicle Weight’, y = ‘Vehicle MPG’)+

scale_size_continuous(name=’# of Gears’,

guide = FALSE)+

geom_abline(aes(intercept=33.965,

slope=-4.3985,

linetype=’-1SD’))+

geom_abline(aes(intercept=38.1208,

slope=-5.854,

linetype=’Mean’))+

geom_abline(aes(intercept=42.2767,

slope=-7.3095,

linetype=’+1SD’))+

scale_linetype_manual(values=c(‘dotted’,’dashed’,’solid’),

breaks=c(‘-1SD’,’Mean’,’+1SD’),

name=’Simple\nSlope’)+

apatheme

p

LikeLike

Thank you, this is a great help!

LikeLiked by 1 person

Hi, I am really thankful for such a nice explanation about interaction mode. Could you please do a favor by describing the statistical significant effect of each graph you created? More precise, how can I understand, looking at the figure, whether the result is statistically significant particularly for the one continuous and one categorical interaction? Many thanks again!

LikeLike

Hi Winnipeg,

Unfortunately, there is no simple, foolproof way to eye-ball the figure to determine what, if anything, is significant. To get this kind of information, you first need to test the interaction term (in regression) to see if it is significant, and then, if so, follow up by testing the simple-slopes for significance. In the present example, this amounts to evaluating whether the slope of weight is significant among automatic cars, and then separately, whether the slope of weight is significant among manual cars. You can do this using some available simple-slope calculators (such as the one, here, supplied by Kris Preacher: http://quantpsy.org/interact/index.htm), or you can use a re-centering method. With the re-centering method, you simply conduct the same regression model two different ways, changing the way the categorical variable is dummy coded each time. For example, if you coded automatic as 0 and manual as 1, the slope of weight in the full model would represent the simple slope of weight for automatic cars; changing the dummy coding so that automatic is 1 and manual is 0 and running the same model again, will result in the slope of weight representing the simple slope of weight for manual cars.

Hope that helps! If you need a more thorough walkthrough of testing simple slopes, I recommend checking out Andrew Hayes’s newish book on testing mediation, moderation, and conditional process models in regression.

LikeLike

Hi jksakaluk! Very much appreciated for the explanation. Thanks again!

LikeLike

Hi, I was wondering if you could do an example for a three-way interaction plot, e.g. continuous x continuous x categorical interaction. Thanks!

LikeLike

Hi Syuhada, I’ll consider it, but even if I do, it probably won’t happen for awhile (I’m in the midst of preparing to move). However, I’m a little apprehensive; 3-way interactions are UGLY, and oftentimes research (at least in my area–psychology) is too terribly underpowered to be even attempting to look at such nuanced effects, though I appreciate this might not be the case in other areas of research. I’ll think it over. If you wanted to give it a shot yourself, though, I would think about creating separate plots (using facets) showing the 2-way continuous interactions (like I’ve plotted above) for separate levels of your third continuous moderators (e.g., a 2-way interaction plot for 1 SD below, a 2-way interaction plot for mean, and a 2-way interactionplot for 1 SD above, for the third moderator)

LikeLike

Thanks for the quick reply and your suggestions! I am in psychology too, though specifically psycholinguistics. I’ll try out your suggestion. 🙂 Hope your move goes smoothly!

LikeLike

[…] post. I’ve been quietly thrilled with how well my other two Make It Pretty posts have done. My post on visualizing various 2-way interactions (easily my most popular not-current-issue post) has been viewed over 1000 times, and more […]

LikeLike

[…] it pretty: Plotting 2-way interactions with GGplot2. Nice tutorial & […]

LikeLike