A Flawed Statistical Method Was Just Banned From A Major Sports Science Journal

Sports performance is a difficult thing to study. There are only so many trained athletes available for experiments, and most of the measurements required to investigate human performance are time-consuming to collect. As a result, most sports science studies are small, and that means it can be difficult to tease out the signal from the noise. In 2006, Will Hopkins and Alan Batterham published a commentary proposing a method for making meaningful inferences in such situations.

Their method, “magnitude-based inference,” or MBI, was controversial from the start. It was rebutted in 2008 by two statisticians who concluded that it was generally unreliable and represented an improper use of existing statistical methods. In 2009, the flagship journal of the American College of Sports Medicine, Medicine and Science in Sports and Exercise, published a set of statistical guidelines for the journal that included a description of MBI, but the journal published it as an invited commentary after peer reviewers would not agree to accept it. Since then, MSSE has published two critiques of MBI that concluded the method was too flawed to be used (the most recent of which arose from reporting by FiveThirtyEight). Now FiveThirtyEight has learned that MSSE has decided to stop accepting papers that use MBI.

The journal’s editor-in-chief, L. Bruce Gladden, made the policy change after reviewing the published criticisms of MBI and consulting the journal’s editorial board, numerous statistical experts and ACSM leaders. “Science is self-correcting,” he said by way of explanation. The decision will be formalized in new instructions to authors, “but we’re putting the word out now, informally,” Gladden said.

Over the years, statisticians have identified numerous problems with MBI. A 2015 review of the method published in MSSE concluded that it applies a looser standard of evidence than traditional statistical methods and argued that the small sample sizes it allows are not justified. A preprint published on SportRxiv last November pointed out that a properly vetted account of the method has never been published in a recognized statistics journal (and recommended that researchers not use MBI until one is). If MBI were really the revolutionary new method that its inventors claim it is, it would have been taken up across many fields, but Gladden notes with concern that MBI is used only in sports and exercise science.

Stanford statistician Kristin Sainani became so worried about the consequences of using MBI that she wrote up a formal analysis of the method. Published in MSSE, her paper showed that MBI produces a false positive rate two to six times higher than that of traditional hypothesis testing. In other words, MBI is far more likely to report an effect where none exists.

What makes MBI so problematic is that it’s more likely than traditional methods to produce results that don’t stand up. “We need better reproducibility, not less,” Gladden said.

Will Hopkins, one of the creators of MBI, said MSSE’s decision — and FiveThirtyEight’s coverage of MBI — is misguided. “Apparently you and [MSSE] have the mistaken idea that authors and readers of the publications of such effects will consider that the effects are ‘real’ and that the literature is getting corrupted with fake findings. But such effects are published with probabilistic terms that properly reflect the uncertainty in the true magnitude: not only the confidence intervals but also qualitative terms such as possibly, likely, and so on.”

It’s true that MBI produces probability statements that estimate the likelihood of a finding accurately reflecting its true value. But these statements are based on very broad categories and give a false sense of certainty. In practice, MBI may deem an intervention “likely beneficial” even if the error bars show that it could be almost as likely to be useless. (For a more detailed explanation, see this video from Sainani.) There are existing statistical methods, such as Bayesian analysis and minimal effect testing, that achieve the things that Hopkins and Batterham claim they are trying to do, so their insistence on creating a new approach is puzzling.
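To make the "likely beneficial" problem concrete, here is a rough sketch in Python of how a probability label and an ordinary confidence interval can disagree. It is not Hopkins and Batterham's implementation. The function name, the example numbers, the normal approximation and the 75 percent cutoff for "likely" are all assumptions chosen for illustration.

```python
from statistics import NormalDist


def effect_probabilities(estimate, std_error, smallest_worthwhile):
    """Split the plausible values of the true effect into three bands.

    Assumption: uncertainty is modeled as a normal distribution centered
    on the observed estimate (a simplification, not the published method).
    """
    dist = NormalDist(mu=estimate, sigma=std_error)
    p_harmful = dist.cdf(-smallest_worthwhile)
    p_beneficial = 1.0 - dist.cdf(smallest_worthwhile)
    p_trivial = 1.0 - p_harmful - p_beneficial
    return p_beneficial, p_trivial, p_harmful


# Hypothetical study result; these numbers are invented for illustration.
estimate = 0.5             # observed improvement, arbitrary units
std_error = 0.4            # standard error of that improvement
smallest_worthwhile = 0.2  # smallest change considered to matter

p_ben, p_triv, p_harm = effect_probabilities(
    estimate, std_error, smallest_worthwhile
)

# Conventional 90 percent confidence interval for the same result.
z90 = NormalDist().inv_cdf(0.95)
ci_low = estimate - z90 * std_error
ci_high = estimate + z90 * std_error

# Two-sided p-value against the hypothesis of no effect at all.
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(estimate) / std_error))

# Assumed cutoff: call the effect "likely" beneficial above 75 percent.
label = "likely beneficial" if p_ben > 0.75 else "not clearly beneficial"

print(f"chance beneficial: {p_ben:.1%}")
print(f"chance trivial:    {p_triv:.1%}")
print(f"chance harmful:    {p_harm:.1%}")
print(f"90% interval:      {ci_low:.2f} to {ci_high:.2f}")
print(f"p-value vs. zero:  {p_value:.2f}")
print(f"label:             {label}")
```

With these made-up inputs, the result clears the assumed "likely beneficial" cutoff even though the confidence interval includes zero and a conventional test would not call the effect statistically significant.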

Sainani said the real issue is the unreliability of MBI. “We should all be clear that it has a higher false positive rate. Even they say that their method is less conservative. And that’s really the debate we should be having — is it ok to have higher false positive rates because it’s elite athletes instead of cancer patients?” Her answer, and now MSSE’s, is no.
