2. The ggplot2 package: Your Gateway Drug to Becoming an R User

I am a frequent R user, but this wasn’t always the case. Though I have had R installed on my computer for about four years now, it has only been in the last year and a half that I turned a corner and began to use R on the near-daily. The ggplot2 package was a huge reason why, and I think if you are looking to become a more regular R user too, ggplot2 is a fantastic place to start.

ggplot2 is an R package for creating attractive visualizations of data. The first time I really noticed ggplot2 was when I stumbled across this blog post on Academic Inflation in Academic Literature. Damn. Those are some pretty figures. Even better, ggplot2 syntax is, in my opinion, some of the easiest R code to understand.

Let’s say, for example, that I had some experimental data on the association between jealousy and relationship satisfaction. Pretend I randomly assigned participants to an attachment security priming condition (vs. a control condition) to see if that would decrease the level of jealousy they feel, thereby increasing the level of satisfaction they feel. At the end of the post, there is some code you can use to simulate data in R to follow along with this example.

One of the main qualities that I love about ggplot2 is just how little code is needed to produce an initial visualization of data–often just two lines are needed. The first line is common to many different types of ggplots, and involves you mapping variables from your dataset to the x and y axes of your to-be-made plot. After that, you can just add box-plots, data points, error bars, etc., as you see fit. For example, in six lines of code, I have made a box plot, a violin plot, and a scatter plot.

#If you haven't already, install the ggplot2 package (only needed once), then load it
install.packages('ggplot2')
library(ggplot2)

#create a plot with security condition on x, and jealousy score on y
bp=ggplot(dat, aes(x=security, y=jealousy))+
  #add a boxplot
  geom_boxplot()
#show the boxplot
bp

#create a plot with security condition on x, and jealousy score on y
vp=ggplot(dat, aes(x=security, y=jealousy))+
  #add a violin plot
  geom_violin()
#show the violin plot
vp

#create a plot with jealousy score on x, and satisfaction score on y
sp=ggplot(dat, aes(x=jealousy, y=satisfaction))+
  #add data points
  geom_point()
#show the scatter plot
sp
Simple box, violin, and scatter plots. Oooooh!
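
Because every geom is just another layer added with +, the same two-line starting point can carry several layers at once. Here is a minimal sketch of that idea, not code from the original analysis: it stacks a box plot, jittered raw data points, and error bars for the mean on one plot. The object name bp.layers is made up for illustration, the styling values are arbitrary, and the data are only recreated (with the simulation code from the end of this post) if dat is not already in your workspace.

```r
library(ggplot2)

#Recreate the post's simulated data only if 'dat' is not already available
if (!exists('dat')) {
  set.seed(705)
  dat=data.frame(security = rbinom(n = 100, size = 1, prob = .5))
  dat$jealousy=0 + -.5*dat$security + rnorm(100, sd = 1)
  dat$satisfaction=0 + -.2*dat$jealousy + rnorm(100, sd = 1)
  dat$security=factor(dat$security, levels = c(0,1),
                      labels = c('Control', 'Security'))
}

#same first line as before: security condition on x, jealousy score on y
bp.layers=ggplot(dat, aes(x=security, y=jealousy))+
  #layer 1: boxplots, with outlier dots hidden because raw points come next
  geom_boxplot(outlier.shape=NA)+
  #layer 2: the raw data points, nudged sideways so they overlap less
  geom_jitter(width=.1, alpha=.5)+
  #layer 3: error bars spanning the mean plus/minus one standard error
  stat_summary(fun.data=mean_se, geom='errorbar', width=.2, col='red')
#show the layered plot
bp.layers
```

Dropping or reordering any of the three layer lines changes what is drawn without touching the first line, which is what makes this kind of tinkering so quick.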

But the other quality of ggplot2 that I really love is how much control you have over the way your plots are created. And I must warn you, it gets addictive to start fussing over your plots until they appear just the way you want them. As academics, we have so few sources of immediate gratification in the research process; I’ve found tweaking code and finally getting my plots to appear as I envisioned to be really encouraging.

For example, let’s dress up our violin plots a bit, by combining violin and box plots, adding some color, and improving the formatting:

#fancy violin plot: map security condition to x axis and jealousy score to y axis
vp.fancy=ggplot(dat, aes(x=security, y=jealousy))+
  #add violins, but scale their size to the number of observations in each,
  #and fill them in color based on security condition
  geom_violin(scale='count',aes(fill=security))+
  #overlay boxplots over each violin, color them black, and whiskers grey, and notch them at median
  geom_boxplot(width=.12, fill=I('black'), notch=TRUE, col='grey40')+
  #add a 'point' representing the mean, and color it white
  #(recent ggplot2 versions use 'fun' here; older versions called it 'fun.y')
  stat_summary(fun='mean', geom='point', shape=20, col='white')+
  #label your axes with full titles
  labs(x='Priming Condition', y='Jealousy Rating')+
  #the next line controls the overall styling--this is a pre-programmed theme
  theme_classic()+
  #remove the redundant legend since we already have condition mapped to x
  theme(legend.position='none')
#show the plot
vp.fancy
Oh so pretty.

Colorful plots like these are nice for online visualizations of data, but in print, we’re often confined to using black and white figures that must match a number of other discipline-specific formatting requirements. I created an APA-themed template to make formatting my ggplots quicker and easier:

#Save some time and store APA format-related code in an object so you can easily
#use it in multiple plots
apatheme=theme_bw()+
  theme(panel.grid.major=element_blank(),
        panel.grid.minor=element_blank(),
        panel.border=element_blank(),
        axis.line=element_line(),
        text=element_text(family='Times'),
        legend.title=element_blank())
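
A reader reports in the comments that the x and y axis lines went missing when this theme was run on a newer ggplot2. If that happens to you, one thing to try is setting the axis lines for each axis explicitly. The sketch below is only an assumed workaround, not the updated theme from the later post mentioned in the comments: apatheme.explicit is a made-up name, and the code assumes a ggplot2 version that has the axis.line.x and axis.line.y theme elements (2.0.0 or later).

```r
library(ggplot2)

#Hypothetical variant of the APA theme: same settings as above, but the
#axis lines are also set separately for x and y (an assumed workaround)
apatheme.explicit=theme_bw()+
  theme(panel.grid.major=element_blank(),
        panel.grid.minor=element_blank(),
        panel.border=element_blank(),
        axis.line=element_line(),
        axis.line.x=element_line(),
        axis.line.y=element_line(),
        text=element_text(family='Times'),
        legend.title=element_blank())

#Quick visual check on a dataset that ships with R
check.plot=ggplot(mtcars, aes(x=wt, y=mpg))+
  geom_point()+
  apatheme.explicit
#Both axis lines should be visible in the resulting plot
check.plot
```

If the original apatheme already draws both axes on your installation, the two extra lines are harmless and you can ignore this variant.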

For example, if I wanted to make my simple scatter plot a bit fancier, perhaps by showing which data points belong to which condition, and plotting fitted regression lines for both groups, all the while using APA format, I could use:

#Create fancy scatter plot: map jealousy to x, satisfaction to y.
#We also want to control the appearance of regression lines (linetype),
#data points (shape), and the color of both (color), based on security condition.
sp.fancy=ggplot(dat, aes(x=jealousy, y=satisfaction, linetype=security, shape=security, color=security))+
  #Add data points
  geom_point()+
  #Manually set color of lines to both be black; they are blue by default
  scale_color_manual(values=c('black','black'))+
  #Manually set the shape to hollow and solid dots;
  #though these might look like the same shape with different colors,
  #they actually constitute different shapes
  scale_shape_manual(values=c(1,16)) +
  #Add fitted regression lines for each group.
  #Remove 'se=FALSE' to create confidence bands.
  #Remove 'fullrange=TRUE' to create unextrapolated lines.
  geom_smooth(method=lm, se=FALSE, fullrange=TRUE)+
  #Label x and y axes
  labs(x='Jealousy Rating', y='Satisfaction Rating')+
  #Apply APA theme
  apatheme
#Show the plot
sp.fancy
Ta-da! A much fancier APA-formatted scatterplot
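
Since the point of an APA-style figure is usually to get it into a manuscript, here is a minimal sketch of applying the same theme to the simple box plot and writing it to a file with ggsave(). The object name bp.apa, the file name, and the 5 x 4 inch size are all placeholders chosen for illustration; check your target journal's figure requirements for the real values. The data are only recreated if dat is not already in your workspace.

```r
library(ggplot2)

#Recreate the post's simulated data only if 'dat' is not already available
if (!exists('dat')) {
  set.seed(705)
  dat=data.frame(security = rbinom(n = 100, size = 1, prob = .5))
  dat$jealousy=0 + -.5*dat$security + rnorm(100, sd = 1)
  dat$satisfaction=0 + -.2*dat$jealousy + rnorm(100, sd = 1)
  dat$security=factor(dat$security, levels = c(0,1),
                      labels = c('Control', 'Security'))
}

#Same APA theme as defined earlier in the post
apatheme=theme_bw()+
  theme(panel.grid.major=element_blank(),
        panel.grid.minor=element_blank(),
        panel.border=element_blank(),
        axis.line=element_line(),
        text=element_text(family='Times'),
        legend.title=element_blank())

#APA-styled version of the simple box plot
bp.apa=ggplot(dat, aes(x=security, y=jealousy))+
  geom_boxplot()+
  labs(x='Priming Condition', y='Jealousy Rating')+
  apatheme

#Write the figure to a PDF; the path and dimensions are placeholders
out=file.path(tempdir(), 'jealousy_boxplot.pdf')
ggsave(out, plot=bp.apa, width=5, height=4, units='in')
```

ggsave() picks the file type from the extension, so swapping '.pdf' for '.png' produces a raster image instead; in that case its dpi argument controls the resolution.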

The next time you have some data to visualize for a conference presentation or journal submission, I dare you to try making your figure with ggplot2 and not find yourself addicted to making beautiful visualizations of data. If you need a good reference to get you started, I’ve found the R Graphics Cookbook to be an invaluable resource–it’s a book I now recommend to any of my friends getting into R. And StackOverflow’s ggplot2 site is always there for Q&A.

Happy plotting 🙂

Vince from Slap Plot: “Stop making boring plots; stop having a boring life!”

**********************************************************

Use the code below to generate the same data I used for the plots in this post.

#Set the randomization seed to 705, so you will get the same values as I do
set.seed(705)

#First simulate 100 participants randomly assigned to security (1) or control (0) condition in a data frame called "dat"...
dat=data.frame(security = rbinom(n = 100, size = 1, prob = .5))

#then simulate their scores on jealousy, based on their condition (security = less jealousy)...
dat$jealousy=0 + -.5*dat$security + rnorm(100, sd = 1)

#and simulate relationship satisfaction scores, based on their level of jealousy (jealousy = less satisfaction)
dat$satisfaction=0 + -.2*dat$jealousy + rnorm(100, sd=1)

#finally, label values of 1 as the security condition, and values of 0 as the control condition
dat$security=factor(dat$security,
                       levels = c(0,1),
                       labels = c('Control', 'Security'))

#Check to make sure your first 6 rows are the same as mine
head(dat)

The code above should produce a dataset of 100 participants with values identical to the ones I’m working with. Check the output of your head(dat) command. The plotting code in this post will still work if you have different values, but it will probably be more straightforward to follow along if you’re working with the exact same data.

If your first six rows match mine, you’re good to go.
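
Beyond eyeballing the first six rows, a few base R summaries can confirm that the simulated data behave the way the simulation code intends. This is only a quick sanity-check sketch: the population effects built into the simulation are -.5 (security on jealousy) and -.2 (jealousy on satisfaction), and with 100 noisy observations the sample estimates will land near those values rather than exactly on them. The data are only recreated if dat is not already in your workspace.

```r
#Recreate the post's simulated data only if 'dat' is not already available
if (!exists('dat')) {
  set.seed(705)
  dat=data.frame(security = rbinom(n = 100, size = 1, prob = .5))
  dat$jealousy=0 + -.5*dat$security + rnorm(100, sd = 1)
  dat$satisfaction=0 + -.2*dat$jealousy + rnorm(100, sd = 1)
  dat$security=factor(dat$security, levels = c(0,1),
                      labels = c('Control', 'Security'))
}

#How many participants ended up in each condition?
table(dat$security)

#Mean jealousy by condition; the security group was simulated to score lower
aggregate(jealousy ~ security, data=dat, FUN=mean)

#Correlation between jealousy and satisfaction; simulated to be negative
cor(dat$jealousy, dat$satisfaction)
```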

7 thoughts on “2. The ggplot2 package: Your Gateway Drug to Becoming an R User”

  1. hi, Jksakaluk, thank you so much for sharing this!
    I am trying to use ggplot2 to get an APA sytle graph now, but I copied you code and run in my own PC, and found that the x and y axis are missing. Hope that you could give me some clues to fix it.
    thanks a lot!

    • Hi Chuanpeng–I *think* the problem is that ggplot2 updated and the APA theme code is now broken because of a few changes. Try using the new APA theme code in my most recent MIP post (scree plot & parallel analysis) and re-running the plotting code and see if that works for you.

      • Hi, John, Thank you so much for you quick reply!!
        Shame of me to get your name wrong, sorry for that! I’ve read your post about “nobody and somebody”!!
        Thanks for sharing your experience with R.
