7. Make It Pretty: Forest and Funnel Plots for Meta-Analysis Using ggplot2

By now, I’ve made it pretty clear: I absolutely love the ggplot2 package for plotting visualizations of data. In fact, I’m pretty sure I’m addicted. But in the last couple of years, I’ve discovered another love–meta-analysis. Meta-analyses are often accompanied by two popular forms of data visualization: forest plots and funnel plots. In this post, I’ll show how quick-and-dirty forest and funnel plots can be created with the metafor package. After, I’ll show how we can instead use the ggplot2 package to create forest plots and use the ggplot2 package to create funnel plots, so that we can have pretty plots that are easy to change/stylize, and that can be produced regardless of which meta-analysis package for R that you elect to use.

Forest plots help to visualize both the raw data (alongside citation information) and summary statistics of a given meta-analysis. Individual effect sizes and their confidence intervals (usually 95%) are plotted for each study in the meta-analysis, as well as the meta-analytic average effect size and its confidence interval. And sometimes sub-group meta-analytic averages are plotted too (see below for an example).

burns1Funnel plots, alternatively, help to visualize the presence (or absence) of publication bias.  Typically, a measure of effect size precision (most commonly, standard error)  is plotted on the y-axis, and effect sizes are plotted on the x-axis. As studies increase in precision, their location around the meta-analytic average tends to narrow, which produces a funnel shape. Funnel plots are then assessed–either subjectively, or empirically–for asymmetry, which can be indicative of publication bias and other small study effects (see below for an example of both a symmetrical and asymmetrical funnel plot).

kjhugr-15-73-g002-l

 

Creating Quick-and-Dirty Forest/Funnel Plots with metafor

metafor (Viechtbauer, 2010) is a package for conducting meta-analysis in R that can make both forest and funnel plots quickly, and easily:

#Install and call the metafor and ggplot2 packages (we'll use ggplot2 later)
install.packages('metafor')
install.packages('ggplot2')
library(metafor)
library(ggplot2)

#Load data from Raudenbush's 1984 meta-analysis of teacher
#expectation effects on student IQ scores. yi contains standardized mean differences,
#sei contains standard error of standardized mean differences. There are also a number of moderator
#variables (some we'll use), and bibliographic info
dat = get(data(dat.raudenbush1985))

#Fit a simple random-effects meta-analytic model
random = rma(yi, sei=se, data=dat)
random

The rma function returns an estimated mean standardized difference of 0.0837, 95% CI (-0.0175, 0.1849). If we want to get forest and funnel plots for this model, we simply request the following:

#Provide forest plot for the model "random"
forest(random)
#Provide funnel plot for the model "random"
funnel(random)

These functions, respectively, return the following plots:

Not bad, but by no means would I call the plots created by these quick functions “pretty”. And in the case of the funnel plot, things get out of hand pretty quickly if you have many effect sizes (see below for one from a meta-analysis of my own with >200 effect sizes).  Further, though it is possible, I find the metafor syntax for manually altering the appearance of these plots to be really inaccessible. What if I want to show moderation of effect sizes by some categorical quality of the articles? Or change the axes? Or what if I want to produce forest and funnel plots for models that arent fit using metafor (e.g., I’m using metaSEM for mine)?

Rplot07
Can you tell if this plot is symmetrical or not? Good luck…

Creating Forest Plots with ggplot2

In order to code a pretty Forest Plot, I called in for help from my buddy Matt Baldwin. Awhile back, Matt was working on a meta-analysis and I supplied him with some forest plot code. But since then, Matt has made some changes that make for a much prettier plot than the one I had originally generated.

Making a pretty forest plot starts with some code to add the meta-analytic estimates for the overall dataset, and any subgroups of moderators that you are interested in (I’m using the Raudenbush (1985) example dataset that is built into the metafor package. The coding is a bit arduous, so I’m going to punt it to the end of the post. It’s probably more expeditious to just manually amend your meta-analytic datafile using your favourite spreadsheet software…

#Make a plot called 'p', and map citation data to y-axis, effect sizes to x-axis
#specify the min and max of the CIs, and give different shapes based on levels of tester
p=ggplot(dat, aes(y=cite, x=yi, xmin=lowerci, xmax=upperci, shape = tester))+
  #Add data points and color them black
  geom_point(color = 'black')+
  #Add 'special' points for the summary estimates, by making them diamond shaped
  geom_point(data=subset(dat, tester=='Summary'), color='black', shape=18, size=4)+
  #add the CI error bars
  geom_errorbarh(height=.1)+
  #Specify the limits of the x-axis and relabel it to something more meaningful
  scale_x_continuous(limits=c(-2,2), name='Standardized Mean Difference (d)')+
  #Give y-axis a meaningful label
  ylab('Reference')+
  #Add a vertical dashed line indicating an effect size of zero, for reference
  geom_vline(xintercept=0, color='black', linetype='dashed')+
  #Create sub-plots (i.e., facets) based on levels of setting
  #And allow them to have their own unique axes (so authors don't redundantly repeat)
  facet_grid(setting~., scales= 'free', space='free')+
  #Apply my APA theme
  apatheme
p

#Save plot in your working directory
ggsave(p, file='ggforest.png', width = 8, height=8, dpi=300)
ggforest
Pretty forest plot–thanks Baldwin!

Creating Funnel Plots with ggplot2

Creating basic funnel plots with ggplot2 is simple enough; they are, after all, just scatter plots with precision (e.g., standard error) on the y-axis, and effect size on the x-axis. But a plot so basic leaves much to be desired (see below for an example). Namely, a 95% confidence interval region for the meta-analytic estimate–as indicated by the shaded funnel-shaped region (see other example below)–is missing. And dammit, I wanted my pretty funnel-shaped region!

My plot’s salvation, much to my delight, was to be found–where else?–in a post on CrossValidated (check out the second answer if you’re interested). Their code needs a bit of a face-lift: (1) we typically prefer to plot standard error as our measure of precision (not sample size), (2) we typically orient with effect size plotted on the x-axis, not y-axis (3) the plot falls a bit short of APA-format compliant, (4) the plot would be prettier if the line indicating the meta-analytic effect was the same length as the lines indicating the 95% CI region, and (5) the plot could be more informative if the 95% CI for the meta-analytic estimate itself was plotted. Here we go…assume a data frame named “dat” with a vector of Fisher-transformed correlations (Zr), and a vector of standard errors for those transformed correlations (corr_zi_se).

#Call the ggplot2 package, if you haven't already done so
library(ggplot2)

#Store the meta-analytic estimate and its standard error from whatever model you run (substitute your own values)
estimate = 0.0892
se = 0.0035

#Store a vector of values that spans the range from 0
#to the max value of impression (standard error) in your dataset.
#Make the increment (the final value) small enough (I choose 0.001)
#to ensure your whole range of data is captured
se.seq=seq(0, max(dat$corr_zi_se), 0.001)

#Now, compute vectors of the lower-limit and upper limit values for
#the 95% CI region, using the range of SE that you generated in the previous step, and the stored value of your meta-analytic estimate.
ll95 = estimate-(1.96*se.seq)
ul95 = estimate+(1.96*se.seq)

#You can do this for a 99% CI region too
ll99 = estimate-(3.29*se.seq)
ul99 = estimate+(3.29*se.seq)

#And finally, do the same thing except now calculating the confidence interval
#for your meta-analytic estimate based on the stored value of its standard error
meanll95 = estimate-(1.96*se)
meanul95 = estimate+(1.96*se)

#Now, smash all of those calculated values into one data frame (called 'dfCI').
#You might get a warning about '...row names were found from a short variable...'
#You can ignore it.
dfCI = data.frame(ll95, ul95, ll99, ul99, se.seq, estimate, meanll95, meanul95)

#Now we can actually make the funnel plot.
#Using your original data-frame, map standard error to your x-axis (for now) and Zr to your y-axis
fp = ggplot(aes(x = se, y = Zr), data = dat) +
#Add your data-points to the scatterplot
  geom_point(shape = 1) +
#Give the x- and y- axes informative labels
  xlab('Standard Error') + ylab('Zr')+
#Now using the 'dfCI' data-frame we created, plot dotted lines corresponding
#to the lower and upper limits of your 95% CI region,
#And dashed lines corresponding to your 99% CI region
  geom_line(aes(x = se.seq, y = ll95), linetype = 'dotted', data = dfCI) +
  geom_line(aes(x = se.seq, y = ul95), linetype = 'dotted', data = dfCI) +
  geom_line(aes(x = se.seq, y = ll99), linetype = 'dashed', data = dfCI) +
  geom_line(aes(x = se.seq, y = ul99), linetype = 'dashed', data = dfCI) +
#Now plot dotted lines corresponding to the 95% CI of your meta-analytic estimate
   geom_segment(aes(x = min(se.seq), y = meanll95, xend = max(se.seq), yend = meanll95), linetype='dotted', data=dfCI) +
  geom_segment(aes(x = min(se.seq), y = meanul95, xend = max(se.seq), yend = meanul95), linetype='dotted', data=dfCI) +
#Reverse the x-axis ordering (se) so that the tip of the funnel will appear
#at the top of the figure once we swap the x- and y-axes...
  scale_x_reverse()+
#Specify the range and interval for the tick-marks of the y-axis (Zr);
#Choose values that work for you based on your data
  scale_y_continuous(breaks=seq(-1.25,2,0.25))+
#And now we flip the axes so that SE is on y- and Zr is on x-
  coord_flip()+
#Finally, apply my APA-format theme (see code at end of post).
#You could, alternatively, specify theme_bw() instead.
  apatheme

#Call the pretty funnel plot
fp

#Save the funnel plot to working directory using ggsave
ggsave('funnel.png', width=6, height=6, unit="in", dpi=300)
funnel
Putting the “pretty fun” in “pretty funnel plot”

Much better–pretty CI regions for both 95% and 99% confidence. From here, it would be pretty easy to amend the above code to add any number of additional features. For example, perhaps you will want to map the shape of data-points to some categorical moderator of effect sizes. Or maybe you have identified effect-sizes that are outliers, and want to highlight them in a different color by mapping outlier-codes to the fill or color of data points. Or maybe you’d like another confidence region around an effect size of zero. Whatever you do to your forest and funnel plots from this point onward, stay pretty my friends.

4282183.jpg

Below are the ggplot2 settings that I program into my APA-format theme.

#My APA-format theme
apatheme=theme_bw()+
  theme(panel.grid.major=element_blank(),
        panel.grid.minor=element_blank(),
        panel.border=element_blank(),
        axis.line=element_line(),
        text=element_text(family='Times'),
        legend.position='none')

Below is the code for whipping the Raudenbush data into shape for pretty forest-plotting–you should be able to easily amend for your own dataset.

#Call the metafor package
library(metafor)

#Use the Raudenbush1985 dataset
dat=get(data(dat.raudenbush1985))

#Create 'cite' vector by merging author and year columns
dat$cite=NA
dat$cite=paste(dat$author, dat$year)

#Reorder bibliographic info based on value of d (yi), so effect sizes can be plotted in descending order
dat$cite=reorder(dat$cite, dat$yi, FUN=mean)

#Look at your overall meta-analytic estimate (random-effects)
random=rma(yi, vi, data=dat)
random

#If you want to look at estimates for levels of categorical moderators (e.g., tester and setting) simply subtract the intercept from each model
setting=rma(yi, vi, mods= ~factor(setting)-1, data=dat)
setting
tester=rma(yi, vi, mods= ~factor(tester)-1, data=dat)
tester

#Turn off annoying option that prevents you from binding rows of text
options(stringsAsFactors = FALSE)

#You need to create a matrix for the new data row you wish to insert, give it the same column names as dat
#and then bind it to dat. You will need to do this for each row you wish to insert.
#In this case, 4 groups + 1 overall estimate = 5 entries...
group.row = matrix(c(NA, NA, NA, NA, 'Summary', 'Summary', setting$b[1], setting$se[1]^2, 'Group Studies'), nrow = 1)
group.row.df = as.data.frame (group.row)
names(group.row.df) = names(dat)
dat=rbind(dat, group.row.df)

individual.row = matrix(c(NA, NA, NA, NA, 'Summary', 'Summary', setting$b[2], setting$se[2]^2, 'Individual Studies'), nrow = 1)
individual.row.df = as.data.frame (individual.row)
names(individual.row.df) = names(dat)
dat=rbind(dat, individual.row.df)

aware.row = matrix(c(NA, NA, NA, NA, 'Summary', 'Summary', tester$b[1], tester$se[1]^2, 'Aware Studies'), nrow = 1)
aware.row.df = as.data.frame (aware.row)
names(aware.row.df) = names(dat)
dat=rbind(dat, aware.row.df)

blind.row = matrix(c(NA, NA, NA, NA, 'Summary', 'Summary', tester$b[2], tester$se[2]^2, 'Blind Studies'), nrow = 1)
blind.row.df = as.data.frame (blind.row)
names(blind.row.df) = names(dat)
dat=rbind(dat, blind.row.df)

overall.row = matrix(c(NA, NA, NA, NA, 'Summary', 'Summary', random$b, random$se^2, 'Overall Effect'), nrow = 1)
overall.row.df = as.data.frame (overall.row)
names(overall.row.df) = names(dat)
dat=rbind(dat, overall.row.df)

#Make sure everything that is numeric is numeric, and everything that is a factor is a factor
dat$yi = as.numeric (dat$yi)
dat$vi = as.numeric (dat$vi)
dat$setting = as.factor (dat$setting)
dat$tester = as.factor (dat$tester)
dat$cite= as.factor (dat$cite)

#Get standard errors from variances
dat$se = sqrt(dat$vi)

#Calculate 95% CI values
dat$lowerci = (-1.96*dat$se)+dat$yi
dat$upperci = (1.96*dat$se)+dat$yi

#Rename levels of Setting factor
levels(dat$setting) = c('Groups', 'Individ.', 'Summary')
Advertisements

10 thoughts on “7. Make It Pretty: Forest and Funnel Plots for Meta-Analysis Using ggplot2

  1. Great code! I used it to obtain funnel plots for my upcoming paper. A thought though: Do you think that you should name ‘se’, the one below ‘estimate’ differently than the ‘se’ in the data used for plotting purposes. If I use your code directly, I was getting points (open circles) all at the value of the first ‘se’ in the graph.

    Liked by 1 person

    • Thanks–glad you find it useful! Good call on the naming conventions. I’ll admit that this is not my best example of easy reproducibility, as the code is somewhat bound to the example data that I was working from. I just moved to the West Coast for a new job, but I’ll work on updating the code here soon in order to make it more seamlessly adaptable to others’ data!

      Like

  2. Something is wrong with this line: geom_point(data=subset(dat, tester==’Summary’), color=’black’, shape=18, size=4)+

    Like

  3. With this line I always got “Error: Aesthetics must be either length 1 or the same as the data (19): x, y, xmin, xmax”. When I deleted it, the code worked but I couldn’t get the summary section.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s