The Hastings Center Report, a prominent bioethics publication, published a special supplement (March/April 2014) on "Interpreting Neuroimaging," which contained an especially useful article by Dr. Martha J. Farah of the University of Pennsylvania Center for Neuroscience and Society. (If you want to know what this has to do with the topic of this blog, hang in with me for a little bit.)
What Dr. Farah is basically up to here is finding a middle course between extreme advocates of the new wave of neuroimaging studies in psychology, who claim incredible insights into human brain function and thinking, and extreme critics, who attack these findings as nothing but misguided statistical manipulations and artifacts.
I'm especially grateful to Dr. Farah's article because it referred me to a fascinating paper by a group of psychologists led by Craig M. Bennett of UC-Santa Barbara (the full citation appears at the end of this post).
Dr. Bennett and colleagues are concerned about one particular error that they believe is all too common in the neuroimaging field (afflicting around a quarter of the papers in widely cited journals at the time their article was written). A typical functional magnetic resonance imaging (fMRI) scan of the human brain gathers data on roughly 130,000 small volumes of tissue, or voxels, and the analysis makes many tens of thousands of comparisons. When we set the standard for statistical significance at the traditional p < 0.05, we accept that any single comparison where no real effect exists still has a 1-in-20 chance of producing a positive result purely by chance. Spread over tens of thousands of comparisons, that 1-in-20 chance produces an enormous number of "findings" that are actually random noise.
Bennett and colleagues recommend two particular statistical corrections to avoid this problem, designed specifically for the way these large data sets behave. By contrast, the way some investigators try to resolve the problem is simply to set the bar for statistical significance higher, such as p < 0.001. But there are so many comparisons in a typical fMRI study, say Bennett et al., that this ploy does not work reliably.
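To see why simply raising the bar fails while a true correction succeeds, here is a toy simulation in Python (my own sketch, not the authors' code). It relies on the fact that when no real effect exists, p-values are uniformly distributed, so 130,000 null comparisons can be simulated directly. The Bonferroni and Benjamini-Hochberg procedures shown are standard examples of the familywise and false-discovery-rate corrections the paper advocates, not necessarily the exact implementations the authors used:

```python
import numpy as np

rng = np.random.default_rng(1)

# 130,000 comparisons in which nothing real is happening. Under the
# null hypothesis a p-value is uniform on [0, 1], so we can draw the
# p-values directly instead of simulating voxel data.
n_tests = 130_000
p = rng.uniform(size=n_tests)

naive = int((p < 0.05).sum())                 # uncorrected: thousands of "hits"
strict = int((p < 0.001).sum())               # raising the bar: still ~130 hits
bonferroni = int((p < 0.05 / n_tests).sum())  # familywise correction: ~0

# Benjamini-Hochberg step-up procedure: find the largest k such that
# the k-th smallest p-value is at most 0.05 * k / n_tests.
ranked = np.sort(p)
below = np.nonzero(ranked <= 0.05 * np.arange(1, n_tests + 1) / n_tests)[0]
fdr_hits = 0 if below.size == 0 else int(below[-1]) + 1

print(naive, strict, bonferroni, fdr_hits)
```

Raising the threshold to p < 0.001 still leaves on the order of a hundred spurious hits out of pure noise; the corrections leave essentially none.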
They chose to demonstrate what they are talking about by setting up a parody of a real psychological study, in which a human subject is put in an MRI machine, shown photos of people in social interactions, and then asked to imagine what emotions those people must be experiencing. In their experiment, the subject was a dead salmon (about 18 inches long). They put the salmon in the MRI machine, told it what to do when looking at the photos, and then scanned its brain.
When they processed the scans in the usual way, they found two areas of heightened activity, in the dead salmon's middle brain and upper spinal cord. Even at the stricter threshold for statistical significance, they still found these two areas of supposed activity. Only when they applied the corrections they recommend did the significant brain activity disappear. Bennett et al. concluded that unless you use the right statistical approaches, it is easy to get spurious results from brain scans.
As I read this study, I was reminded of one of the themes we've previously discussed many times in this blog--how industry-sponsored studies can be doctored to make new drugs look better than they are. One way we've seen this done is--you guessed it--multiple comparisons. If you look at enough different variables in the drug vs. placebo group, by chance, one is likely to be favorable to the drug. If you can spin the study to pretend that that particular measure was the one you were really interested in all along (instead of something you just happened to trip over when you analyzed the data at the end of the study), you can make a study sound impressively positive.
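The arithmetic behind this trick is simple. If each of k independent outcome measures has a 5 percent chance of looking "significant" by luck alone, the chance that at least one of them does is 1 - 0.95^k:

```python
# Chance that at least one of k independent endpoints comes up
# "statistically significant" (p < 0.05) by luck alone.
for k in (1, 5, 10, 20):
    print(k, round(1 - 0.95 ** k, 2))
```

With twenty endpoints the odds are nearly two in three that something comes out "positive." (Treating the endpoints as independent is my simplifying assumption; the qualitative point does not depend on it.)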
Bennett and colleagues mentioned in passing another statistical flaw that is not addressed by their favored methods, a form of circular reasoning. Because there are so danged many regions of the brain all firing off at various times, most of which have nothing whatever to do with the process psychologists want to investigate, it is tempting to use some statistical tools up front to narrow the field of vision to look only at the brain regions thought most likely to be informative. If one is not careful, one may then use the same statistical tools to decide that the information gleaned from those regions is of significance. (This is a variant of the classic joke about the drunk looking for his lost quarter under the lamppost because the light's better there.)
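Statisticians call this circular pattern "double dipping": using the same data both to select the regions of interest and to measure their effect. A toy sketch (mine, not from any of the papers discussed) shows how selection alone can manufacture an apparently large effect out of pure noise:

```python
import numpy as np

rng = np.random.default_rng(2)

# 10,000 "voxels" of pure noise, 20 scans each; no real signal anywhere.
data = rng.normal(size=(10_000, 20))

# Step 1: use the data to pick the 100 most "active" voxels.
selected = np.argsort(data.mean(axis=1))[-100:]

# Step 2 (the circular part): estimate the effect from the same data.
biased = float(data[selected].mean())   # looks like a solid positive effect

# An honest check on independent data shows the truth: nothing there.
fresh = float(rng.normal(size=(100, 20)).mean())

print(biased, fresh)
```

The selected voxels look strongly activated only because they were chosen for looking activated; on fresh data the effect evaporates.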
That trick, too, we have frequently seen in industry-sponsored pharma research. The major tools used are run-in and wash-out periods before the start of the formal data gathering. The pre-trial manipulations are designed to eliminate those subjects who respond too well to the placebo, or those who don't respond well to the drug, or other things the researcher does not want to see. This helps assure that when the "study" begins, the scale has already been tipped in the direction of the drug looking better than it really is.
Dr. Farah admits that neuroimaging data can be misinterpreted when insufficient attention is paid to these statistical booby traps, but she immediately adds that this does not make neuroimaging research unique (or uniquely unreliable when done well). She gives examples from other fields of research where the same statistical flaws can be found.
She did not mention pharmaceutical clinical trials. But she could have.
Bennett CM, Baird AA, Miller MB, Wolford GL. Neural correlates of interspecies perspective taking in the post-mortem Atlantic salmon: an argument for proper multiple comparisons correction. Journal of Serendipitous and Unexpected Results 2010; 1:1-5. (Note: for people suspicious that the dead salmon paper is nothing but a hoax, there does indeed appear to be a journal of this title; and one would think that someone as familiar with the neuroscience field as Dr. Farah would know if the paper were phony. For more on the "study" see: http://blogs.scientificamerican.com/scicurious-brain/2012/09/25/ignobel-prize-in-neuroscience-the-dead-salmon-study/)