I am a frequent R user, but this wasn’t always the case. Though I have had R installed on my computer for about four years now, it has only been in the last year and a half that I turned a corner and began to use R on the near-daily. The ggplot2 package was a huge reason why, and I think if you are looking to become a more regular R user too, ggplot2 is a fantastic place to start.
ggplot2 is an R package for creating attractive visualizations of data. The first time I really noticed ggplot2 was when I stumbled across this blog post on Academic Inflation in Academic Literature. Damn. Those are some pretty figures. Even better, ggplot2 syntax is, in my opinion, some of the easiest R code to understand.
Let’s say, for example, that I had some experimental data on the association between jealousy and relationship satisfaction; pretend I randomly assigned participants to an attachment security priming condition (vs. a control condition) to see if that would decrease the level of jealousy they feel, thereby increasing the level of satisfaction they feel. At the end of the post, there is some code you can use to simulate data in R to follow along with this example.
One of the main qualities that I love about ggplot2 is just how little code is needed to produce an initial visualization of data; often just two lines are needed. The first line is common to many different types of ggplots, and involves you mapping variables from your dataset to the x and y axes of your to-be-made plot. After that, you can just add box-plots, data points, error bars, etc., as you see fit. For example, in six lines of code, I have made a box plot, a violin plot, and a scatter plot.
#If you haven't already, install and call the ggplot2 package
install.packages('ggplot2')
library(ggplot2)

#create a plot with security condition on x, and jealousy score on y
bp=ggplot(dat, aes(x=security, y=jealousy))+
  #add a boxplot
  geom_boxplot()
#show the boxplot
bp

#create a plot with security condition on x, and jealousy score on y
vp=ggplot(dat, aes(x=security, y=jealousy))+
  #add a violin plot
  geom_violin()
#show the violin plot
vp

#create a plot with jealousy score on x, and satisfaction score on y
sp=ggplot(dat, aes(x=jealousy, y=satisfaction))+
  #add data points
  geom_point()
#show the scatter plot
sp
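Because the first line is shared, you can also store the mapping once and add different layers to it. Here is a minimal sketch of that idea; the object names (base, bp2, vp2, both) are just illustrative, and the data are re-simulated inside the snippet so it runs by itself:

```r
library(ggplot2)

# Stand-in data with the same columns the plots above use
set.seed(705)
dat <- data.frame(security = rbinom(n = 100, size = 1, prob = .5))
dat$jealousy <- -.5 * dat$security + rnorm(100, sd = 1)
dat$security <- factor(dat$security, levels = c(0, 1),
                       labels = c('Control', 'Security'))

# Map the variables to the axes once and store the result
# ('base' is a hypothetical name, not something ggplot2 requires)
base <- ggplot(dat, aes(x = security, y = jealousy))

# Each plot is the shared base plus one layer
bp2 <- base + geom_boxplot()
vp2 <- base + geom_violin()

# Layers are drawn in the order they are added:
# violins first, then semi-transparent raw data points on top
both <- base + geom_violin() + geom_point(alpha = .4)
both
```

Nothing is drawn until you print the object, so you can keep adding layers to a stored plot before showing it.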
But the other quality of ggplot2 that I really love is how much control you have over the way your plots are created. And I must warn you, it gets addictive to start fussing over your plots until they appear just the way you want them. As academics, we have so few sources of immediate gratification in the research process; I’ve found tweaking code and finally getting my plots to appear as I envisioned to be really encouraging.
For example, let’s dress up our violin plots a bit, by combining violin and box plots, adding some color, and improving the formatting:
#fancy violin plot: map security condition to x axis and jealousy score to y axis
vp.fancy=ggplot(dat, aes(x=security, y=jealousy))+
  #add violins, but scale their size to the number of observations in each,
  #and fill them in color based on security condition
  geom_violin(scale='count', aes(fill=security))+
  #overlay boxplots over each violin, color them black, and whiskers grey, and notch them at median
  geom_boxplot(width=.12, fill=I('black'), notch=TRUE, col='grey40')+
  #add a 'point' representing the mean, and color it white
  #(ggplot2 3.3.0 and later call this argument 'fun'; older versions used 'fun.y')
  stat_summary(fun='mean', geom='point', shape=20, col='white')+
  #label your axes with full titles
  labs(x='Priming Condition', y='Jealousy Rating')+
  #the next line controls the coloring/styling--this is a pre-programmed palette
  theme_classic()+
  #remove the redundant legend since we already have condition mapped to x
  theme(legend.position='none')
#show the plot
vp.fancy
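The white dot in the fancy violin plot comes from stat_summary(), and the same function can draw other summaries, such as the error bars mentioned earlier. As a rough sketch (the object name mp is only for illustration, the data are re-simulated so the snippet runs by itself, and it assumes ggplot2 3.3.0 or later for the fun argument), here are group means with standard-error bars using ggplot2's mean_se helper:

```r
library(ggplot2)

# Stand-in data with the same columns as the violin plot example
set.seed(705)
dat <- data.frame(security = rbinom(n = 100, size = 1, prob = .5))
dat$jealousy <- -.5 * dat$security + rnorm(100, sd = 1)
dat$security <- factor(dat$security, levels = c(0, 1),
                       labels = c('Control', 'Security'))

mp <- ggplot(dat, aes(x = security, y = jealousy)) +
  # one point per condition at the group mean
  stat_summary(fun = mean, geom = 'point', size = 3) +
  # error bars spanning the mean plus/minus one standard error
  stat_summary(fun.data = mean_se, geom = 'errorbar', width = .15) +
  labs(x = 'Priming Condition', y = 'Jealousy Rating') +
  theme_classic()
mp
```

fun takes a function that returns a single value per group, while fun.data takes one that returns y, ymin, and ymax together, which is what the error bar geom needs.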
Colorful plots like these are nice for online visualizations of data, but in print, we’re often confined to using black and white figures that must match a number of other discipline-specific formatting requirements. I created an APA-themed template to make formatting my ggplots quicker and easier:
#Save some time and store APA format-related code in an object so you can easily
#use it in multiple plots
apatheme=theme_bw()+
  theme(panel.grid.major=element_blank(),
        panel.grid.minor=element_blank(),
        panel.border=element_blank(),
        axis.line=element_line(),
        text=element_text(family='Times'),
        legend.title=element_blank())
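Once the theme is stored, applying it is just one more "+". A minimal sketch, assuming the same theme settings as above (repeated here, along with re-simulated data, so the snippet runs by itself; bp.apa is a made-up name, and whether 'Times' renders depends on the fonts your graphics device has available):

```r
library(ggplot2)

# Stand-in data with the same columns as the box plot example
set.seed(705)
dat <- data.frame(security = rbinom(n = 100, size = 1, prob = .5))
dat$jealousy <- -.5 * dat$security + rnorm(100, sd = 1)
dat$security <- factor(dat$security, levels = c(0, 1),
                       labels = c('Control', 'Security'))

# The APA-style theme from the post, repeated for self-containment
apatheme <- theme_bw() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        axis.line = element_line(),
        text = element_text(family = 'Times'),
        legend.title = element_blank())

# Add the stored theme to a plain box plot as the last step
bp.apa <- ggplot(dat, aes(x = security, y = jealousy)) +
  geom_boxplot() +
  labs(x = 'Priming Condition', y = 'Jealousy Rating') +
  apatheme
bp.apa
```

Adding the theme last means its settings are not overwritten by an earlier complete theme such as theme_classic().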
For example, if I wanted to make my simple scatter plot a bit fancier, perhaps by showing which data points belong to which condition, and plotting fitted regression lines for both groups, all the while using APA format, I could use:
#Create fancy scatter plot: map jealousy to x, satisfaction to y.
#We also want to control the appearance of regression lines (linetype),
#data points (shape), and the color of both (color), based on security condition.
sp.fancy=ggplot(dat, aes(x=jealousy, y=satisfaction, linetype=security, shape=security, color=security))+
  #Add data points
  geom_point()+
  #Manually set the color of both groups to black;
  #otherwise ggplot2 gives each group its own color
  scale_color_manual(values=c('black','black'))+
  #Manually set the shape to hollow and solid dots;
  #though these might look like the same shape with different colors,
  #they actually constitute different shapes
  scale_shape_manual(values=c(1,16))+
  #Add fitted regression lines for each group.
  #Remove 'se=FALSE' to create confidence bands.
  #Remove 'fullrange=TRUE' to create unextrapolated lines.
  geom_smooth(method=lm, se=FALSE, fullrange=TRUE)+
  #Label x and y axes
  labs(x='Jealousy Rating', y='Satisfaction Rating')+
  #Apply APA theme
  apatheme
#Show the plot
sp.fancy
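The comments in the scatter plot code mention two variations: confidence bands and unextrapolated lines. Here is a sketch of what that variant might look like, relying on geom_smooth()'s defaults (se=TRUE, fullrange=FALSE). The name sp.bands is made up, the color is set to a constant black instead of going through a manual color scale, theme_bw() stands in for the APA theme, and the data are re-simulated so the snippet runs by itself:

```r
library(ggplot2)

# Stand-in data with the same columns as the scatter plot example
set.seed(705)
dat <- data.frame(security = rbinom(n = 100, size = 1, prob = .5))
dat$jealousy <- -.5 * dat$security + rnorm(100, sd = 1)
dat$satisfaction <- -.2 * dat$jealousy + rnorm(100, sd = 1)
dat$security <- factor(dat$security, levels = c(0, 1),
                       labels = c('Control', 'Security'))

sp.bands <- ggplot(dat, aes(x = jealousy, y = satisfaction,
                            linetype = security, shape = security)) +
  geom_point() +
  # hollow and solid dots for the two conditions
  scale_shape_manual(values = c(1, 16)) +
  # no 'se' or 'fullrange' arguments, so the defaults apply:
  # confidence bands are drawn and each line stops at its group's data range
  geom_smooth(method = lm, formula = y ~ x, color = 'black') +
  labs(x = 'Jealousy Rating', y = 'Satisfaction Rating') +
  theme_bw()
sp.bands
```

Mapping linetype and shape to security is enough to get one fitted line per condition, since ggplot2 groups by the discrete variables in the mapping.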
Next time you have some data that you’d like to visualize for your next conference presentation or journal submission, I dare you to try making your figure with ggplot2 and not find yourself addicted to making beautiful visualizations of data. If you need a good reference to get you started, I’ve found the R Graphics Cookbook to be an invaluable resource; it’s a book I now recommend to any of my friends getting into R. And StackOverflow’s ggplot2 site is always there for Q&A.
Happy plotting 🙂
Use the code below to generate the same data I used for the plots in this post.
#Set the randomization seed to 705, so you will get the same values as I do
set.seed(705)

#First simulate 100 participants randomly assigned to security (1) or control (0) condition in a dataframe called "dat"...
dat=data.frame(security = rbinom(n = 100, size = 1, prob = .5))

#then simulate their scores on jealousy, based on their condition (security = less jealousy)...
dat$jealousy=0 + -.5*dat$security + rnorm(100, sd = 1)

#and simulate relationship satisfaction scores, based on their level of jealousy (jealousy = less satisfaction)
dat$satisfaction=0 + -.2*dat$jealousy + rnorm(100, sd=1)

#finally, label values of 1 as the security condition, and values of 0 as the control condition
dat$security=factor(dat$security, levels = c(0,1), labels = c('Control', 'Security'))

#Check to make sure your first 6 rows are the same as mine
head(dat)
The code above should produce a dataset of 100 participants with values identical to the ones I’m working with. Check the output of your head(dat) command; the plotting code in this post will still work if you have different values, but it will probably be more straightforward if you’re working with the exact same data.
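If you want a quick look at the simulated data beyond head(dat), a few base R calls can confirm its structure and the direction of the built-in effects. This is only a sketch of one way to check; it repeats the simulation so it runs by itself, and it deliberately shows no expected output, since the point is to inspect what you get on your own machine:

```r
# Re-create the simulated data exactly as in the code above
set.seed(705)
dat <- data.frame(security = rbinom(n = 100, size = 1, prob = .5))
dat$jealousy <- 0 + -.5 * dat$security + rnorm(100, sd = 1)
dat$satisfaction <- 0 + -.2 * dat$jealousy + rnorm(100, sd = 1)
dat$security <- factor(dat$security, levels = c(0, 1),
                       labels = c('Control', 'Security'))

# Structure: 100 rows, one factor and two numeric columns
str(dat)

# How many participants landed in each condition
table(dat$security)

# Mean jealousy per condition (simulated to be lower under security priming)
aggregate(jealousy ~ security, data = dat, FUN = mean)

# Correlation between jealousy and satisfaction (simulated to be negative)
cor(dat$jealousy, dat$satisfaction)
```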