By now, I’ve made it pretty clear: I absolutely love the
ggplot2 package for plotting visualizations of data. In fact, I’m pretty sure I’m addicted. But in the last couple of years, I’ve discovered another love–meta-analysis. Meta-analyses are often accompanied by two popular forms of data visualization: forest plots and funnel plots. In this post, I’ll show how quick-and-dirty forest and funnel plots can be created with the
metafor package. After, I’ll show how we can instead use the
ggplot2 package to create forest plots and use the
ggplot2 package to create funnel plots, so that we can have pretty plots that are easy to change/stylize, and that can be produced regardless of which meta-analysis package for
R that you elect to use.
Forest plots help to visualize both the raw data (alongside citation information) and summary statistics of a given meta-analysis. Individual effect sizes and their confidence intervals (usually 95%) are plotted for each study in the meta-analysis, as well as the meta-analytic average effect size and its confidence interval. And sometimes sub-group meta-analytic averages are plotted too (see below for an example).
Funnel plots, alternatively, help to visualize the presence (or absence) of publication bias. Typically, a measure of effect size precision (most commonly, standard error) is plotted on the y-axis, and effect sizes are plotted on the x-axis. As studies increase in precision, their location around the meta-analytic average tends to narrow, which produces a funnel shape. Funnel plots are then assessed–either subjectively, or empirically–for asymmetry, which can be indicative of publication bias and other small study effects (see below for an example of both a symmetrical and asymmetrical funnel plot).
Creating Quick-and-Dirty Forest/Funnel Plots with metafor
metafor (Viechtbauer, 2010) is a package for conducting meta-analysis in R that can make both forest and funnel plots quickly, and easily:
#Install and call the metafor and ggplot2 packages (we'll use ggplot2 later) install.packages('metafor') install.packages('ggplot2') library(metafor) library(ggplot2) #Load data from Raudenbush's 1984 meta-analysis of teacher #expectation effects on student IQ scores. yi contains standardized mean differences, #sei contains standard error of standardized mean differences. There are also a number of moderator #variables (some we'll use), and bibliographic info dat = get(data(dat.raudenbush1985)) #Fit a simple random-effects meta-analytic model random = rma(yi, sei=se, data=dat) random
rma function returns an estimated mean standardized difference of 0.0837, 95% CI (-0.0175, 0.1849). If we want to get forest and funnel plots for this model, we simply request the following:
#Provide forest plot for the model &amp;amp;amp;amp;amp;quot;random&amp;amp;amp;amp;amp;quot; forest(random) #Provide funnel plot for the model &amp;amp;amp;amp;amp;quot;random&amp;amp;amp;amp;amp;quot; funnel(random)
These functions, respectively, return the following plots:
Not bad, but by no means would I call the plots created by these quick functions “pretty”. And in the case of the funnel plot, things get out of hand pretty quickly if you have many effect sizes (see below for one from a meta-analysis of my own with >200 effect sizes). Further, though it is possible, I find the
metafor syntax for manually altering the appearance of these plots to be really inaccessible. What if I want to show moderation of effect sizes by some categorical quality of the articles? Or change the axes? Or what if I want to produce forest and funnel plots for models that aren‘t fit using
metafor (e.g., I’m using
metaSEM for mine)?
Creating Forest Plots with ggplot2
In order to code a pretty Forest Plot, I called in for help from my buddy Matt Baldwin. Awhile back, Matt was working on a meta-analysis and I supplied him with some forest plot code. But since then, Matt has made some changes that make for a much prettier plot than the one I had originally generated.
Making a pretty forest plot starts with some code to add the meta-analytic estimates for the overall dataset, and any subgroups of moderators that you are interested in (I’m using the Raudenbush (1985) example dataset that is built into the
metafor package. The coding is a bit arduous, so I’m going to punt it to the end of the post. It’s probably more expeditious to just manually amend your meta-analytic datafile using your favourite spreadsheet software…
#Make a plot called 'p', and map citation data to y-axis, effect sizes to x-axis #specify the min and max of the CIs, and give different shapes based on levels of tester p=ggplot(dat, aes(y=cite, x=yi, xmin=lowerci, xmax=upperci, shape = tester))+ #Add data points and color them black geom_point(color = 'black')+ #Add 'special' points for the summary estimates, by making them diamond shaped geom_point(data=subset(dat, tester=='Summary'), color='black', shape=18, size=4)+ #add the CI error bars geom_errorbarh(height=.1)+ #Specify the limits of the x-axis and relabel it to something more meaningful scale_x_continuous(limits=c(-2,2), name='Standardized Mean Difference (d)')+ #Give y-axis a meaningful label ylab('Reference')+ #Add a vertical dashed line indicating an effect size of zero, for reference geom_vline(xintercept=0, color='black', linetype='dashed')+ #Create sub-plots (i.e., facets) based on levels of setting #And allow them to have their own unique axes (so authors don't redundantly repeat) facet_grid(setting~., scales= 'free', space='free')+ #Apply my APA theme apatheme p #Save plot in your working directory ggsave(p, file='ggforest.png', width = 8, height=8, dpi=300)
Creating Funnel Plots with ggplot2
Creating basic funnel plots with ggplot2 is simple enough; they are, after all, just scatter plots with precision (e.g., standard error) on the y-axis, and effect size on the x-axis. But a plot so basic leaves much to be desired (see below for an example). Namely, a 95% confidence interval region for the meta-analytic estimate–as indicated by the shaded funnel-shaped region (see other example below)–is missing. And dammit, I wanted my pretty funnel-shaped region!
My plot’s salvation, much to my delight, was to be found–where else?–in a post on CrossValidated (check out the second answer if you’re interested). Their code needs a bit of a face-lift: (1) we typically prefer to plot standard error as our measure of precision (not sample size), (2) we typically orient with effect size plotted on the x-axis, not y-axis (3) the plot falls a bit short of APA-format compliant, (4) the plot would be prettier if the line indicating the meta-analytic effect was the same length as the lines indicating the 95% CI region, and (5) the plot could be more informative if the 95% CI for the meta-analytic estimate itself was plotted. Here we go…assume a data frame named “dat” with a vector of Fisher-transformed correlations (Zr), and a vector of standard errors for those transformed correlations (corr_zi_se).
#Call the ggplot2 package, if you haven't already done so library(ggplot2) #Store the meta-analytic estimate and its standard error from whatever model you run (substitute your own values) estimate = 0.0892 se = 0.0035 #Store a vector of values that spans the range from 0 #to the max value of impression (standard error) in your dataset. #Make the increment (the final value) small enough (I choose 0.001) #to ensure your whole range of data is captured se.seq=seq(0, max(dat$corr_zi_se), 0.001) #Now, compute vectors of the lower-limit and upper limit values for #the 95% CI region, using the range of SE that you generated in the previous step, and the stored value of your meta-analytic estimate. ll95 = estimate-(1.96*se.seq) ul95 = estimate+(1.96*se.seq) #You can do this for a 99% CI region too ll99 = estimate-(3.29*se.seq) ul99 = estimate+(3.29*se.seq) #And finally, do the same thing except now calculating the confidence interval #for your meta-analytic estimate based on the stored value of its standard error meanll95 = estimate-(1.96*se) meanul95 = estimate+(1.96*se) #Now, smash all of those calculated values into one data frame (called 'dfCI'). #You might get a warning about '...row names were found from a short variable...' #You can ignore it. dfCI = data.frame(ll95, ul95, ll99, ul99, se.seq, estimate, meanll95, meanul95) #Now we can actually make the funnel plot. #Using your original data-frame, map standard error to your x-axis (for now) and Zr to your y-axis fp = ggplot(aes(x = se, y = Zr), data = dat) + #Add your data-points to the scatterplot geom_point(shape = 1) + #Give the x- and y- axes informative labels xlab('Standard Error') + ylab('Zr')+ #Now using the 'dfCI' data-frame we created, plot dotted lines corresponding #to the lower and upper limits of your 95% CI region, #And dashed lines corresponding to your 99% CI region geom_line(aes(x = se.seq, y = ll95), linetype = 'dotted', data = dfCI) + geom_line(aes(x = se.seq, y = ul95), linetype = 'dotted', data = dfCI) + geom_line(aes(x = se.seq, y = ll99), linetype = 'dashed', data = dfCI) + geom_line(aes(x = se.seq, y = ul99), linetype = 'dashed', data = dfCI) + #Now plot dotted lines corresponding to the 95% CI of your meta-analytic estimate geom_segment(aes(x = min(se.seq), y = meanll95, xend = max(se.seq), yend = meanll95), linetype='dotted', data=dfCI) + geom_segment(aes(x = min(se.seq), y = meanul95, xend = max(se.seq), yend = meanul95), linetype='dotted', data=dfCI) + #Reverse the x-axis ordering (se) so that the tip of the funnel will appear #at the top of the figure once we swap the x- and y-axes... scale_x_reverse()+ #Specify the range and interval for the tick-marks of the y-axis (Zr); #Choose values that work for you based on your data scale_y_continuous(breaks=seq(-1.25,2,0.25))+ #And now we flip the axes so that SE is on y- and Zr is on x- coord_flip()+ #Finally, apply my APA-format theme (see code at end of post). #You could, alternatively, specify theme_bw() instead. apatheme #Call the pretty funnel plot fp #Save the funnel plot to working directory using ggsave ggsave('funnel.png', width=6, height=6, unit=&amp;amp;amp;quot;in&amp;amp;amp;quot;, dpi=300)
Much better–pretty CI regions for both 95% and 99% confidence. From here, it would be pretty easy to amend the above code to add any number of additional features. For example, perhaps you will want to map the shape of data-points to some categorical moderator of effect sizes. Or maybe you have identified effect-sizes that are outliers, and want to highlight them in a different color by mapping outlier-codes to the fill or color of data points. Or maybe you’d like another confidence region around an effect size of zero. Whatever you do to your forest and funnel plots from this point onward, stay pretty my friends.
Below are the
ggplot2 settings that I program into my APA-format theme.
#My APA-format theme apatheme=theme_bw()+ theme(panel.grid.major=element_blank(), panel.grid.minor=element_blank(), panel.border=element_blank(), axis.line=element_line(), text=element_text(family='Times'), legend.position='none')
Below is the code for whipping the Raudenbush data into shape for pretty forest-plotting–you should be able to easily amend for your own dataset.
#Call the metafor package library(metafor) #Use the Raudenbush1985 dataset dat=get(data(dat.raudenbush1985)) #Create 'cite' vector by merging author and year columns dat$cite=NA dat$cite=paste(dat$author, dat$year) #Reorder bibliographic info based on value of d (yi), so effect sizes can be plotted in descending order dat$cite=reorder(dat$cite, dat$yi, FUN=mean) #Look at your overall meta-analytic estimate (random-effects) random=rma(yi, vi, data=dat) random #If you want to look at estimates for levels of categorical moderators (e.g., tester and setting) simply subtract the intercept from each model setting=rma(yi, vi, mods= ~factor(setting)-1, data=dat) setting tester=rma(yi, vi, mods= ~factor(tester)-1, data=dat) tester #Turn off annoying option that prevents you from binding rows of text options(stringsAsFactors = FALSE) #You need to create a matrix for the new data row you wish to insert, give it the same column names as dat #and then bind it to dat. You will need to do this for each row you wish to insert. #In this case, 4 groups + 1 overall estimate = 5 entries... group.row = matrix(c(NA, NA, NA, NA, 'Summary', 'Summary', setting$b, setting$se^2, 'Group Studies'), nrow = 1) group.row.df = as.data.frame (group.row) names(group.row.df) = names(dat) dat=rbind(dat, group.row.df) individual.row = matrix(c(NA, NA, NA, NA, 'Summary', 'Summary', setting$b, setting$se^2, 'Individual Studies'), nrow = 1) individual.row.df = as.data.frame (individual.row) names(individual.row.df) = names(dat) dat=rbind(dat, individual.row.df) aware.row = matrix(c(NA, NA, NA, NA, 'Summary', 'Summary', tester$b, tester$se^2, 'Aware Studies'), nrow = 1) aware.row.df = as.data.frame (aware.row) names(aware.row.df) = names(dat) dat=rbind(dat, aware.row.df) blind.row = matrix(c(NA, NA, NA, NA, 'Summary', 'Summary', tester$b, tester$se^2, 'Blind Studies'), nrow = 1) blind.row.df = as.data.frame (blind.row) names(blind.row.df) = names(dat) dat=rbind(dat, blind.row.df) overall.row = matrix(c(NA, NA, NA, NA, 'Summary', 'Summary', random$b, random$se^2, 'Overall Effect'), nrow = 1) overall.row.df = as.data.frame (overall.row) names(overall.row.df) = names(dat) dat=rbind(dat, overall.row.df) #Make sure everything that is numeric is numeric, and everything that is a factor is a factor dat$yi = as.numeric (dat$yi) dat$vi = as.numeric (dat$vi) dat$setting = as.factor (dat$setting) dat$tester = as.factor (dat$tester) dat$cite= as.factor (dat$cite) #Get standard errors from variances dat$se = sqrt(dat$vi) #Calculate 95% CI values dat$lowerci = (-1.96*dat$se)+dat$yi dat$upperci = (1.96*dat$se)+dat$yi #Rename levels of Setting factor levels(dat$setting) = c('Groups', 'Individ.', 'Summary')