Meta-Science 101, Part 8: Snakes at the top

Now that we’re at the top of the epistemic pyramid, let’s see what diamonds we can find up here.  Here’s one systematic review that says smoking is causally related to lung cancer.  Here’s another meta-analysis that fails to find a link between childhood vaccinations and the development of autism.  And this meta-analysis shows that people can psychically detect the future via as-yet-unknown nonphysical processes.

Wait, what?

This story starts in 2011, when esteemed Cornell psychologist Daryl Bem published a paper in the Journal of Personality and Social Psychology that reported the results of nine experiments testing for the existence of “retroactive influence,” or the idea that events in the future can affect people physiologically or behaviorally in the present.

A priori, this idea seems absurd.  Events in the future haven’t happened yet, so how could they affect the present?  The causal arrow goes in only one temporal direction, and to find otherwise would seem to fly in the face of everything we know about physics.

But eight out of nine of Bem’s experiments found statistically significant evidence for “retroactive influence.”

Let’s get a sense of what these experiments entailed.  In a standard psych experiment not testing for retroactive influence, a participant might be asked to practice rehearsing 25 randomly selected words out of a list of 50 words.  Later, they would be asked to type as many of the words from the list as they could remember.  Of course, as you might expect, they generally recall the words that they practiced rehearsing more readily.

Bem basically just time-reverses this protocol.  The participants are still given the list of 50 words initially, but then are asked immediately in a surprise test to type as many of the 50 words as they can remember.  Only after that are they asked to practice rehearsing 25 randomly selected words.  What Bem found is that the participants were better at recalling words in the initial surprise test if they were going to practice rehearsing them in the future.

So saying words in the future apparently helps you remember them right now.  Huh?

Bem’s paper consisted of eight other experiments similar to this one, and like I said, eight out of nine found evidence of retroactive influence.  Upon publication, the field of experimental psychology got all in a tizzy, because this felt like someone was making a mockery out of their field.  Article after article followed with criticism of Bem’s statistical techniques, with Bem responding accordingly.  Now, I’m not a statistician, so I have no idea whether or not these criticisms make valid points.  But it all became moot soon, as a couple of high-profile replication attempts failed to reproduce Bem’s findings, and instead concluded that, hey, retroactive influence wasn’t a thing after all.  The lesson that psychologists took away from this was This Is Why Replications Are Important, and they stopped talking about Bem not too long after.

But Bem didn’t just go away.  Like a good storybook villain, he just bided his time…and returned four years later, this time even stronger.

This is where the real fun begins.

In 2015, Bem came out with a meta-analysis of 90 different studies on retroactive influence: 10 of these were Bem’s own studies, 69 were either exact or modified replications of Bem’s original studies, and 11 tested for retroactive influence in alternative ways.  The upshot is that this conglomeration of studies found statistically significant evidence for retroactive influence, p = 1.2 × 10^-10.

That’s not a typo.  That really is p = 0.00000000012.

So psychology…you wanted replications?  Well Bem just got a whole truck-full and delivered them to your front door, and together they blew through your standard threshold of p < 0.05 by a factor of more than 400 million.  Satisfied yet?
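To get a feel for how a pile of individually unimpressive studies can add up to a number like that, here is a minimal sketch using Stouffer's method for pooling z-scores.  This is an illustration only and not necessarily the procedure used in Bem's meta-analysis; the 90 identical z-scores of 0.67 are a made-up input chosen to land in the same ballpark, not real data.

```python
import math

def one_sided_p(z):
    """Upper-tail p-value for a standard normal z-score."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def stouffer_combined_z(z_scores):
    """Stouffer's method: sum of the z-scores divided by sqrt(number of studies)."""
    return sum(z_scores) / math.sqrt(len(z_scores))

# Hypothetical input: 90 studies that each show the same weak effect.
z_each = 0.67
z_scores = [z_each] * 90

p_single = one_sided_p(z_each)              # nowhere near p < 0.05 on its own
z_combined = stouffer_combined_z(z_scores)  # grows like sqrt(90)
p_combined = one_sided_p(z_combined)

print(f"single study:  z = {z_each:.2f}, p = {p_single:.3f}")
print(f"all 90 pooled: z = {z_combined:.2f}, p = {p_combined:.1e}")
```

The pooled z-score grows with the square root of the number of studies, so a consistent nudge in one direction gets amplified enormously.  The catch is that the arithmetic treats every study as an unbiased draw, and it will amplify a small systematic bias just as happily as a small real effect.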

A standard response here might be that p-values are suboptimal measures anyway.  I mean, didn’t I have a whole thing in Part 3 about p-values measuring the probability of getting the data given the null hypothesis, not the other way around?

The most frequently espoused alternative to p-values would be to use Bayes factors.  Bayes factors tell you how much more likely you are to get the evidence you did in a world where your hypothesis was true, compared to a world in which your hypothesis was false.  For example, say I do a coin-flipping experiment testing to see if a coin is biased, and I flip 10 heads in a row.  Without doing any math, let’s just assume the Bayes factor comes out to 5 in this case.  This would mean that this result (10 heads in a row) was 5 times more likely to happen in a world where the coin was biased than in a world where the coin was fair, so I should revise my beliefs toward the hypothesis that the coin was biased (which doesn’t mean I have to change my mind entirely, just incrementally update my beliefs in proportion to the strength of the evidence).  Generally speaking, obtaining a Bayes factor of 100 is considered to be decisive evidence for a hypothesis.
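For the curious, here is a rough sketch of what "doing the math" could look like for the coin example.  It rests on an assumption the text doesn't make: that a "biased" coin means one whose heads probability is equally likely to be anything between 0 and 1.  Under that particular assumption the Bayes factor for 10 heads in 10 flips comes out near 93 rather than the illustrative 5 used above; a different model of "biased" would give a different number.  The prior odds of 1 to 1000 are likewise hypothetical.

```python
from fractions import Fraction
from math import comb

def likelihood_fair(heads, flips):
    """Probability of the observed head count if the coin is fair."""
    return Fraction(comb(flips, heads), 2 ** flips)

def likelihood_biased_uniform(heads, flips):
    """Probability of the observed head count, averaged over a uniform prior
    on the heads probability (assumed model of a 'biased' coin).
    The integral of C(n,k) p^k (1-p)^(n-k) over [0, 1] is 1 / (n + 1)."""
    return Fraction(1, flips + 1)

def bayes_factor(heads, flips):
    """How much more likely the data are under 'biased' than under 'fair'."""
    return likelihood_biased_uniform(heads, flips) / likelihood_fair(heads, flips)

bf = bayes_factor(10, 10)                 # 1024/11, roughly 93

# Incremental updating: posterior odds = prior odds * Bayes factor.
prior_odds = Fraction(1, 1000)            # hypothetical: biased coins are rare
posterior_odds = prior_odds * bf
posterior_prob = posterior_odds / (1 + posterior_odds)

print(f"Bayes factor:   {float(bf):.1f}")
print(f"posterior odds: {float(posterior_odds):.3f}")
print(f"P(biased):      {float(posterior_prob):.3f}")
```

Note that even a Bayes factor in the nineties leaves the probability of a biased coin under 10% if you started out thinking biased coins were a one-in-a-thousand thing.  The Bayes factor tells you how far to move, not where you end up.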

Bem’s hypothesis was that retroactive influence is a thing.  So what was the Bayes factor in Bem’s meta-analysis?

Oh, you know, just a casual 5.1 billion.


…But p-hacking!  Publication bias!!  These are things that could have made this a bad meta-analysis, right?

Bem is a careful guy and thought of this, too.  He carried out nine statistical tests to test for the presence of p-hacking or publication bias.  Eight out of nine came up empty.  The ninth, which was testing for p-hacking, didn’t prove that there was any, but rather was inconclusive.

So we’re put in the uncomfortable situation where an extremely well-done meta-analysis, certainly in the top tier of meta-analyses in general, came up with pretty damn strong evidence for a phenomenon that’s physically impossible.

We found a snake at the top of the pyramid.

Now, I don’t want to give the impression that nobody had anything to say about this.  There have been a few responses trying to figure out what exactly happened here, some of which I find plausible and most of which I don’t understand.

But the larger point is this: If Bem could, in full conformity with common scientific practice, carry out a meta-analysis that is probably higher quality than 95% of meta-analyses out there and find evidence for something non-real, then what does this say about science?

And it’s not just Bem.  Plenty of high-quality meta-analyses in the past have found evidence for spooky phenomena: This one finds evidence of precognition, with subjects being able to predict which of several potential targets (e.g., faces on a die, cards in a deck) will be selected by a random number generator in the future (sample size = 309 studies, p = 6.3 × 10^-25, publication bias tested for and not found to be significant).  And this one finds that people can influence random number generators to become non-random just by thinking about them hard enough (sample size = 832 studies, publication bias tested for and not found to be significant).

In fact, there’s a whole field of parapsychology out there, with its own journals and everything, where researchers test for the existence of psychic phenomena and are able to publish significant findings all the time.  Mainstream scientists keep trying to make them go away, and parapsychologists respond that they’re following all the rules of modern science better than most scientists, so they should be allowed to play, too.

So…what does this mean for mainstream science?

A couple people have put forth the idea of thinking of parapsychology as a control group for science; basically, whatever percentage of papers are published by parapsychologists for non-real phenomena is the percentage of papers that we have to expect are wrong in mainstream science, too.  I think this is a neat idea.[10]  As we’ve discussed in the previous seven parts of this blog post, there are a ton of problems with modern science.  If all these problems are present in parapsychology, too, then parapsychology gives us a great example of the number of false findings that would be generated as a result of these problems alone.
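One way to picture that baseline is a toy simulation of a field where the true effect is exactly zero.  Everything here is an assumption made for illustration: the sample size of 50, the five outcome measures per study, and the "report whichever measure looks best" habit are stand-ins for the kinds of flexibility discussed in earlier parts, not a description of what any actual parapsychologist does.

```python
import math
import random

def study_z(rng, n=50):
    """z-statistic for one outcome measure when the true effect is zero."""
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return sum(sample) / math.sqrt(n)

def false_positive_rate(rng, n_studies, outcomes_per_study, z_crit=1.645):
    """Fraction of null studies that report p < 0.05 (one-sided) when the
    researcher picks the best of several independent outcome measures."""
    hits = 0
    for _ in range(n_studies):
        best = max(study_z(rng) for _ in range(outcomes_per_study))
        if best > z_crit:
            hits += 1
    return hits / n_studies

rng = random.Random(0)  # fixed seed so the run is repeatable

honest = false_positive_rate(rng, n_studies=4000, outcomes_per_study=1)
flexible = false_positive_rate(rng, n_studies=4000, outcomes_per_study=5)

print(f"one pre-planned outcome:  {honest:.1%} of null studies 'significant'")
print(f"best of five outcomes:    {flexible:.1%} of null studies 'significant'")
```

With one pre-planned outcome the rate of spurious hits should hover around the nominal 5%; with five to choose from, the expected rate is 1 - 0.95^5, or about 23%.  Nothing real is being measured in either case, which is exactly the sense in which a field studying a non-existent effect can serve as a control group.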

Like a sugar pill, parapsychology is devoid of any content whatsoever, and yet it keeps producing significant effects.  We’d be wise to take this base level of activity into account when evaluating the rest of science.

(…Oh, speaking of sugar pills, here’s another snake at the top that says homeopathic treatments tend to do quite a bit better than placebo.  I think dealing with parapsychology is enough for one day, though, so let’s leave that one alone.)

Continue to Part 9: A conclusion >>>

[10] At least in theory, as a way of adjusting your model of science in your mind.  I have no idea what it would even look like in practice.

