“Researchers in your field receive your manuscript for peer review, and make semi-arbitrary recommendations to the editor.”
Peer review seems like a good idea. To publish in a certain journal, your work should pass a certain standard of quality associated with that journal. Peers in your field would seem like the most natural judges of your work, given that they have the expertise to do so. Peer review is also widely supported in the scientific community: in a 2008 survey, 85% of scientists agreed that peer review greatly aids scientific communication, and 93% disagreed that peer review is unnecessary.
Some in the media like to focus on peer review scandals: Sixty-four articles retracted after peer reviews found to be fake! Peer-review ring discovered in Taiwan, resulting in resignation of Taiwan cabinet minister! Chinese company found to be selling peer reviews!
But all this attention on outright fraud might obscure a perhaps more consequential question: Does peer review actually result in the selective publication of higher-quality studies?
The evidence bearing on this question is very limited. I was able to find two systematic reviews on the efficacy of peer review, the conclusions of which basically amounted to, “Uh, we don’t really know. More research is needed.” But while we don’t have much direct information on the effects of peer review on study quality, we do have some studies on other things that are relevant to that question.
Like reviewer agreement. If peer reviewers could reliably discern some measure of study quality in the papers they were reviewing, then you’d expect them to agree often in their recommendations. But this isn’t actually what you see: A recent meta-analysis of 48 studies on reviewer agreement concluded that, overall, reviewers agreed only 17% more often than would be predicted by chance alone. So while you might hope that your paper will be accepted or rejected solely on its merits, you’ll also be contending with a significant amount of randomness.
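That “more than chance” framing comes from chance-corrected agreement statistics such as Cohen’s kappa, which is the standard tool for this kind of analysis (the meta-analysis’s exact method isn’t specified here). A minimal sketch with made-up recommendations, not data from any actual study:

```python
# Chance-corrected agreement (Cohen's kappa) between two reviewers.
# The verdicts below are hypothetical, for illustration only.

def cohens_kappa(rater_a, rater_b):
    """Observed agreement, corrected for agreement expected by chance."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    labels = set(rater_a) | set(rater_b)
    # Chance agreement: probability both raters independently pick the same label.
    expected = sum(
        (rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels
    )
    return (observed - expected) / (1 - expected)

# Two reviewers' verdicts on the same ten papers (hypothetical):
r1 = ["accept", "reject", "reject", "accept", "reject",
      "accept", "reject", "reject", "accept", "reject"]
r2 = ["accept", "reject", "accept", "accept", "reject",
      "reject", "reject", "accept", "accept", "reject"]

print(round(cohens_kappa(r1, r2), 2))  # 0.4
```

A kappa of 0 means the reviewers agree no more than two coin-flippers would; 1 means perfect agreement. The two hypothetical reviewers above agree on 7 of 10 papers, which sounds decent until you notice that chance alone predicts 5.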
…And bias. In a now-famous 1982 study, Peters & Ceci took 12 already-published psychology papers written by authors from prestigious psychology departments, replaced the original authors’ names and institutions with fictitious ones (e.g., “Dr. Wade M. Johnston” from the “Tri-Valley Center for Human Potential”), and re-submitted them to the same journals. As you might expect, some of the editors noticed something fishy going on; three of the papers were rejected because they were just resubmissions. For the remaining nine, you would expect a high acceptance rate if peer reviewers were judging on quality and not on authors’ institutions, given that all of these papers were accepted the first time around. But this time, 8 out of 9 were rejected! Apparently, the reviewers found serious problems with the resubmissions. Said one reviewer: “It is all very confusing…I think the entire presentation of the results needs to be planned more carefully and organized.” Said another: “It is not clear what the results of this study demonstrate…mainly because of several methodological defects in the design of the study.” And finally: “Apparently, this is intended to be a summary. However, the style of writing leaves much to be desired in terms of communicating to the reader.”
Out of the 16 reviewers on the 8 rejected papers, all 16 recommended against publication. Remember, reviewer agreement is supposed to be not much higher than would be predicted by chance! But apparently, at least in this one study, bias against non-prestigious authors and institutions was so strong that it outweighed the inherent randomness in peer review. (To be fair, Dr. Wade M. Johnston from the Tri-Valley Center for Human Potential does sound pretty sketchy.)
One neat way to uncover bias is to compare blinded and open peer review. We can draw an analogy here with the Pepsi Challenge. The Pepsi Challenge blinds participants so they don’t know which brands of soda they’re drinking, asks them to drink two sodas (Pepsi and Coke), and then has them say which soda they preferred. Presumably, participant preference should come down to taste alone. In an open Pepsi Challenge, where participants know which soda they’re drinking beforehand, brand affiliation can bias the results one way or another. If you run a blinded Pepsi Challenge and find that people generally prefer Pepsi, and then run an open Pepsi Challenge and find that people generally prefer Coke, then you can infer some sort of anti-Pepsi bias.
Similarly, if you find in a study that a non-US author’s abstract to an American Heart Association meeting is 22% less likely to be accepted in an open peer review than in a blind peer review, you can attribute that 22% to bias against non-US authors.4 …You can probably guess that this was a real study.
You might object that papers written by authors outside and inside the US could differ in quality, but the blind-versus-open comparison already controls for that difference. You couldn’t dispute the hypothetical finding of anti-Pepsi bias above by saying, “Well, maybe Coke does taste better,” because we already know from the blinded Pepsi Challenge that people preferred Pepsi.
A bias of 22% might seem modest, but remember that this is an effect due solely to your address, which takes up a single line in your abstract. If you could write a single line in your abstract that would make it 22% more likely for your abstract to be accepted, you would write that line every time. And now we know that line ends in “…USA.”5
Man, you can feel bad for Dr. Wade M. Johnston all you want, but at least that guy is American.6
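The 22%-versus-28% arithmetic in the footnote is just a change of baseline: the same gap in acceptance rates yields a different percentage depending on which group’s rate you divide by. A quick sketch with hypothetical acceptance rates, chosen only to reproduce the 22% figure:

```python
# Why "non-US authors 22% less likely" and "US authors 28% more likely"
# describe the same gap: the percentage depends on the baseline rate.
# These rates are hypothetical, not from the study.

us_rate = 0.50          # acceptance rate with a US address (hypothetical)
non_us_rate = 0.39      # 22% lower than the US rate

drop = (us_rate - non_us_rate) / us_rate       # baseline: US authors
boost = (us_rate - non_us_rate) / non_us_rate  # baseline: non-US authors

print(f"{drop:.0%} less likely")   # 22% less likely
print(f"{boost:.0%} more likely")  # 28% more likely
```

Dividing the same 11-point gap by the smaller (non-US) rate produces the bigger-sounding number, which is why the two phrasings disagree.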
This doesn’t mean that everything would be fine and dandy if we just started instituting double-blinded peer reviews, where author information wouldn’t be known to the reviewers. Bias can be content-based, as well. Remember researcher allegiance bias? Well, we’re about to see Researcher Allegiance Bias, 2.0: Peer-Review Edition.
In this pioneering study, 75 peer reviewers reviewed manuscripts with identical methodologies, but with results tweaked to be either “positive” (i.e., consistent with the reviewer’s perspective) or “negative” (i.e., contradicting the reviewer’s perspective). The reviewers were then asked to give their recommendation for the manuscript they reviewed. If you’ve made it this far, you’ve probably become cynical enough to guess what the study found. Per the authors:
“Identical manuscripts suffered very different fates depending on the direction of their data. When they were positive, the usual recommendation was to accept with moderate revisions. Negative results earned a significantly lower evaluation, with the average reviewer urging either rejection or major revision.”
Interestingly enough, when the reviewers were asked to rate just the methodology section on a 6-point scale, manuscripts with “positive” results received an average rating of 4.2, while manuscripts with “negative” results received an average rating of 2.4: a gap of 1.8 points, or 30% of the scale. Remember, the methodologies were identical.
You can imagine how this sort of confirmation bias can quickly lead to the stagnation of a field. Once a scientific paradigm has been established, then studies confirming the paradigm will always have an advantage in publication compared to studies contradicting it. These confirmatory studies will keep piling up, while researchers deviating from the consensus will have trouble publishing, and eventually lose their funding and/or status, even if their studies were just as well-conducted. Most of the time, the prevailing paradigm is pretty much correct, since otherwise the paradigm wouldn’t have become widely accepted in the first place. But if it’s not, we need to be able to have contradictory results see the light of day to let us know that. We need the data to be able to slap us in the face. A slap in the face is pretty hard to feel when you’re wearing a helmet called Confirmation Bias.
So to wrap up this section: we started off with the question of whether the peer review process improves average study quality. While there’s not much direct evidence on that question (a fact that is itself troubling, given how central the peer review process is to current scientific practice), we found that the process is riddled with randomness and bias of all sorts. For peer review to improve study quality, its algorithm needs to hew as closely as possible to “Publish high-quality studies, reject low-quality studies.” But right now, that algorithm is pretty corrupted.
4Among the pool of non-US-authors, the situation was even worse for authors from non-English-speaking countries, with an additional 15% of bias tacked on.↵
5Actually, while non-US authors were 22% less likely to have their abstracts accepted, US authors were 28% more likely to have theirs accepted, because math: the same gap reads differently depending on the baseline. A non-US rate that is 78% of the US rate means the US rate is 1/0.78 ≈ 1.28 times the non-US rate.↵
6I’m assuming, here. He is fictional, after all.↵