15. Make It Pretty: DIY Pirate Plots in ggplot2

January. Is. Over. Though the world is on fire/everything is spiders, in my work-life, I am quite relieved: I have escaped the most gruelling month of continuos large deadlines that I’ve ever had to manage (#NewFacultyLife), while (mostly) getting everything done that I needed to. Hurray!

depression xalt.png
DJT is the Lord of Spiders

So now that I am entering a month when I will encounter the thing-of-myth referred to as, free time, I wanted to quickly hammer out a blog post to start getting back in the swing of posting more regularly. With that, it’s a new MIP post, this time focusing on making DIY “Pirate Plots”.

I’ve been wanting to blog about DIY Pirate-Plotting for awhile, but initially felt some ambivalence about it. Why? Because someone named Nathaniel Phillips has already made a package for creating Pirate Plots. And I don’t want to seem like I’m trying to throw shade, because dude seems really nice, talented, and willing to create free helpful resources (I’ve heard really good things about this book) for the R-using community. But I made a Twitter poll to solicit input for whether blogging about how to make a Pirate Plot, manually, using ggplot2(), is a dick-move, and the overwhelming majority of the folks who answered indicated that it’s not.

Still, all glory and good-things to Nathaniel for introducing us to Pirate Plots because they look like an awesome thing. But then you might be asking yourselves, “Why’s he writing/sharing code to do something there’s already a package for…?” My reasoning is this: I don’t know the ins-and-outs of the yarrr() package; I do know the ins-and-outs (relatively speaking) of ggplot2(); and sometimes folks might want to create/tweak Pirate Plots and with code they are already familiar with (e.g., someone who prefers making visualizations in base-R might want to write/share code for that, if they aren’t down with ggplot2()). Different folks, strokes, and all that good stuff. So with that…

Pirate Plots, as Phillips describes them, are a form of “RDI” visualization (Raw data, Descriptive statistic, and Inferential statistic). Check out the image from his site, below, and let the plotting goodness sink in:

Pirate.png

There’s lots to like about this kind of plot, I think, if you’re trying to visualize group comparisons. The bar/band combo, in effect, reproduces what you would get with a boring ol’ bar graph with 95% CIs. But that bean! And those raw data! Not only do you get a visually intuitive characterization of the shape the distribution of data in each condition, but you also get raw data points to help with visual inspection of potentially influential cases. So much information, crammed into one visualization. What’s not to like?

Okay, so let’s start off with some made-up data: you want to visualize the data on some dependent variable (“dv”) by some grouping variable (“group”; I know; I’m killing it with these creative variable names).

dv = c(2,4,2,3,4,1,1,1,3,5,
3,4,2,4,5,6,7,7,4,3,
8,9,2,8,10,10,4,6,7,8)

group = c('A','A','A','A','A','A','A','A','A','A',
'B','B','B','B','B','B','B','B','B','B',
'C','C','C','C','C','C','C','C','C','C')

dat = data.frame(dv, group)

One of the problems needing a solution, with Pirate Plotting with ggplot2() is that we don’t just want to visualize the Raw data above; we also need to visualize the Descriptive and Inferential statistics (i.e., group means and 95% CIs). To do this, we’ll make use of the psych() package’s describeBy  function, in order to produce descriptive statistics (we’ll only use the mean and standard error) of dv for each level of group. We’ll then save these summary statistics in a separate data frame called “descriptives”, and save the relevant bits we want in a another data frame called “plotdat”.

library(psych)
#Save descriptive statistics by group in data frame called 'descriptives'
descriptives = describeBy(x = dat$dv, group = dat$group)

#Save vectors of group names, and the group means/se's from 'descriptives'.
#Store them in 'plotdat'
group = c('A', 'B', 'C')
means = c(descriptives$A$mean, descriptives$B$mean,descriptives$C$mean)
se = c(descriptives$A$se, descriptives$B$se,descriptives$C$se)
plotdat=data.frame(group, means, se)

If you were so inclined, at this point you might run an ANOVA to compare the three groups, and do some follow up tests (e.g., with Tukey’s HSD).

model1 = aov(data = dat, dv ~ group)
summary(model1)
TukeyHSD(model1)

Great. We reject the null that the groups are equal, and Tukey’s HSD tests inform us that the significance differences lie between groups A and C, and groups B and C. But what about our pretty Pirate Plot?

We’ll start, as per usual, with saving an APA-format compliant theme, which we’ll use later when we specify the plot:

library(ggplot2)

apatheme=theme_bw()+
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        text=element_text(family='Arial'),
        legend.title=element_blank(),
        legend.position=c(.7,.8),
        axis.line.x = element_line(color='black'),
        axis.line.y = element_line(color='black'))

As for the Pirate Plot syntax, there’s not much to it. We’ll specify the major elements of the plot based on the data frame with our summary statistics,plotdat , mapping groups to the x-axis, and means to the y-axis. Then we’ll add our bean and raw data elements (via geom_violin() and geom_jitter(), respectively–the jitter will provide us with a bit of random scatter to better see the raw data point), but we’ll specify the coordinates of these elements using the data frame containing our raw data, dat. Then, we can revert back to using our plotdat data frame, and add larger data points for our group means, via geom_point(), and 95% CIs using geom_errorbar()(by multiplying each standard error by +/-1.96 and adding that value to each group mean). Finally, the apatheme object will make everything nice and pretty for your manuscript submission. Just remember to export a high-res version using ggsave().

#Use data frame of summary statistics ('plotdat')
#and map group to x-axis and means to y-axis
p1 = ggplot(data = plotdat, aes(x = group, y = means))+
#Insert bean plot based on data frame of raw data ('dat')
#and map group to x-axis and raw dv scores to y-axis
geom_violin(data= dat, aes(x = group, y = dv))+
#Likewise, add raw data points (jittered) with same mappings
geom_jitter(data= dat, aes(x = group, y = dv), shape = 1, width = .1)+
#Add data points (with no unique mapping manually specified, the data
#points will fall back on reflecting our originally-specified data,
#the means of each group)
geom_point(size = 3)+
#Add error bars for 95% CIs of each group mean
geom_errorbar(ymax= means+(1.96*se), ymin=means+(-1.96*se), width = 0.25)+
#Apply the APA-format theme object
apatheme

#Save high-res .png formatted version of your plot in your working directory
ggsave("pirate.png", width=6, height=6, unit="in", dpi=300)
p1

Very nice.

pirate.png
Yarrrr! B be less than C, see?

Granted, this is a pretty simple/sparse dataset. If you want to see what this code will yield with bigger datasets, here’s the same code (with one exception) applied to a dataset I have with Relationship Quality scores for four different types of sexual relationships:

pirate.png

The *only* slight alteration I made to the above code, was that I added alpha = .3 to the geom_jitter() layer, which helps to make datapoints a bit transparent, so that the raw data don’t overwhelm the entire plot–you might want to consider this option if you have lots of raw data to plot.

Note, that if you are of the Bayes persuasion, I can’t help you. Despite Alex Etz’s best efforts (seriously, have him give a Bayes/JASP workshop for your department/program/lab. He’s both a Bayes wizard AND gives examples about wizards!!!), I can’t even tell you what a Bayesian HDI is, let alone show you how to calculate one. So if you like HDIs, stick to the yarrr() package, or write your own code for doing this in ggplot2() if you are so inclined.

1ive07.jpg
Alex the Bayes has banned me from using Bayes/JASP until I know more*

*This actually never happened, but it’s funny to think about Etz walking around in Gandalf-garb, screaming at people about Bayesian stuff.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s