January. Is. Over. Though the world is on fire/everything is spiders, in my work-life, I am quite relieved: I have escaped the most gruelling month of continuos large deadlines that I’ve ever had to manage (#NewFacultyLife), while (mostly) getting everything done that I needed to. Hurray!
So now that I am entering a month when I will encounter the thing-of-myth referred to as, free time, I wanted to quickly hammer out a blog post to start getting back in the swing of posting more regularly. With that, it’s a new MIP post, this time focusing on making DIY “Pirate Plots”.
I’ve been wanting to blog about DIY Pirate-Plotting for awhile, but initially felt some ambivalence about it. Why? Because someone named Nathaniel Phillips has already made a package for creating Pirate Plots. And I don’t want to seem like I’m trying to throw shade, because dude seems really nice, talented, and willing to create free helpful resources (I’ve heard really good things about this book) for the R-using community. But I made a Twitter poll to solicit input for whether blogging about how to make a Pirate Plot, manually, using
ggplot2(), is a dick-move, and the overwhelming majority of the folks who answered indicated that it’s not.
Still, all glory and good-things to Nathaniel for introducing us to Pirate Plots because they look like an awesome thing. But then you might be asking yourselves, “Why’s he writing/sharing code to do something there’s already a package for…?” My reasoning is this: I don’t know the ins-and-outs of the
yarrr() package; I do know the ins-and-outs (relatively speaking) of
ggplot2(); and sometimes folks might want to create/tweak Pirate Plots and with code they are already familiar with (e.g., someone who prefers making visualizations in base-R might want to write/share code for that, if they aren’t down with
ggplot2()). Different folks, strokes, and all that good stuff. So with that…
Pirate Plots, as Phillips describes them, are a form of “RDI” visualization (Raw data, Descriptive statistic, and Inferential statistic). Check out the image from his site, below, and let the plotting goodness sink in:
There’s lots to like about this kind of plot, I think, if you’re trying to visualize group comparisons. The bar/band combo, in effect, reproduces what you would get with a boring ol’ bar graph with 95% CIs. But that bean! And those raw data! Not only do you get a visually intuitive characterization of the shape the distribution of data in each condition, but you also get raw data points to help with visual inspection of potentially influential cases. So much information, crammed into one visualization. What’s not to like?
Okay, so let’s start off with some made-up data: you want to visualize the data on some dependent variable (“dv”) by some grouping variable (“group”; I know; I’m killing it with these creative variable names).
dv = c(2,4,2,3,4,1,1,1,3,5, 3,4,2,4,5,6,7,7,4,3, 8,9,2,8,10,10,4,6,7,8) group = c('A','A','A','A','A','A','A','A','A','A', 'B','B','B','B','B','B','B','B','B','B', 'C','C','C','C','C','C','C','C','C','C') dat = data.frame(dv, group)
One of the problems needing a solution, with Pirate Plotting with
ggplot2() is that we don’t just want to visualize the Raw data above; we also need to visualize the Descriptive and Inferential statistics (i.e., group means and 95% CIs). To do this, we’ll make use of the
describeBy function, in order to produce descriptive statistics (we’ll only use the mean and standard error) of
dv for each level of
group. We’ll then save these summary statistics in a separate data frame called “descriptives”, and save the relevant bits we want in a another data frame called “plotdat”.
library(psych) #Save descriptive statistics by group in data frame called 'descriptives' descriptives = describeBy(x = dat$dv, group = dat$group) #Save vectors of group names, and the group means/se's from 'descriptives'. #Store them in 'plotdat' group = c('A', 'B', 'C') means = c(descriptives$A$mean, descriptives$B$mean,descriptives$C$mean) se = c(descriptives$A$se, descriptives$B$se,descriptives$C$se) plotdat=data.frame(group, means, se)
If you were so inclined, at this point you might run an ANOVA to compare the three groups, and do some follow up tests (e.g., with Tukey’s HSD).
model1 = aov(data = dat, dv ~ group) summary(model1) TukeyHSD(model1)
Great. We reject the null that the groups are equal, and Tukey’s HSD tests inform us that the significance differences lie between groups A and C, and groups B and C. But what about our pretty Pirate Plot?
We’ll start, as per usual, with saving an APA-format compliant theme, which we’ll use later when we specify the plot:
library(ggplot2) apatheme=theme_bw()+ theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), panel.border = element_blank(), text=element_text(family='Arial'), legend.title=element_blank(), legend.position=c(.7,.8), axis.line.x = element_line(color='black'), axis.line.y = element_line(color='black'))
As for the Pirate Plot syntax, there’s not much to it. We’ll specify the major elements of the plot based on the data frame with our summary statistics,
plotdat , mapping
groups to the x-axis, and
means to the y-axis. Then we’ll add our bean and raw data elements (via
geom_jitter(), respectively–the jitter will provide us with a bit of random scatter to better see the raw data point), but we’ll specify the coordinates of these elements using the data frame containing our raw data,
dat. Then, we can revert back to using our
plotdat data frame, and add larger data points for our group means, via
geom_point(), and 95% CIs using
geom_errorbar()(by multiplying each standard error by +/-1.96 and adding that value to each group mean). Finally, the
apatheme object will make everything nice and pretty for your manuscript submission. Just remember to export a high-res version using
#Use data frame of summary statistics ('plotdat') #and map group to x-axis and means to y-axis p1 = ggplot(data = plotdat, aes(x = group, y = means))+ #Insert bean plot based on data frame of raw data ('dat') #and map group to x-axis and raw dv scores to y-axis geom_violin(data= dat, aes(x = group, y = dv))+ #Likewise, add raw data points (jittered) with same mappings geom_jitter(data= dat, aes(x = group, y = dv), shape = 1, width = .1)+ #Add data points (with no unique mapping manually specified, the data #points will fall back on reflecting our originally-specified data, #the means of each group) geom_point(size = 3)+ #Add error bars for 95% CIs of each group mean geom_errorbar(ymax= means+(1.96*se), ymin=means+(-1.96*se), width = 0.25)+ #Apply the APA-format theme object apatheme #Save high-res .png formatted version of your plot in your working directory ggsave("pirate.png", width=6, height=6, unit="in", dpi=300) p1
Granted, this is a pretty simple/sparse dataset. If you want to see what this code will yield with bigger datasets, here’s the same code (with one exception) applied to a dataset I have with Relationship Quality scores for four different types of sexual relationships:
The *only* slight alteration I made to the above code, was that I added
alpha = .3 to the
geom_jitter() layer, which helps to make datapoints a bit transparent, so that the raw data don’t overwhelm the entire plot–you might want to consider this option if you have lots of raw data to plot.
Note, that if you are of the Bayes persuasion, I can’t help you. Despite Alex Etz’s best efforts (seriously, have him give a Bayes/JASP workshop for your department/program/lab. He’s both a Bayes wizard AND gives examples about wizards!!!), I can’t even tell you what a Bayesian HDI is, let alone show you how to calculate one. So if you like HDIs, stick to the
yarrr() package, or write your own code for doing this in
ggplot2() if you are so inclined.
*This actually never happened, but it’s funny to think about Etz walking around in Gandalf-garb, screaming at people about Bayesian stuff.