“The p-value was never intended to be a substitute for scientific reasoning,” said Ron Wasserstein, the ASA’s executive director. “Well-reasoned statistical arguments contain much more than the value of a single number and whether that number exceeds an arbitrary threshold. The ASA statement is intended to steer research into a ‘post p<0.05 era.’” He added, “This is the first time that the 177-year-old ASA has made explicit recommendations on such a foundational matter in statistics.” The society’s members had become increasingly concerned that the p-value was “being misapplied in ways that cast doubt on statistics generally.”
The p-value, or calculated probability, is the probability of finding the observed, or more extreme, results when the “null hypothesis” of a study question is true. What counts as ‘extreme’ depends on how the hypothesis is being tested. The null hypothesis is usually a hypothesis of “no difference”, e.g. no difference between blood pressures in group A and group B. A null hypothesis needs to be defined clearly for each study question before the study starts. By common convention, a p-value of 0.05 or less is considered statistically significant for a set of findings. However, the ASA statement notes that a finding is not necessarily true just because its p-value passes this threshold.
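To make the definition concrete, here is a minimal sketch of one way to estimate a p-value, a permutation test, for the blood-pressure example. It is only an illustration: the readings are invented, the function name `permutation_p_value` is hypothetical, and ‘extreme’ is taken to mean an absolute difference in group means at least as large as the one observed. If the null hypothesis of “no difference” holds, the group labels are interchangeable, so the code shuffles them many times and counts how often chance alone produces a gap that large.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the labels carry no information,
        # so any reshuffling of the pooled readings is equally likely.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical systolic blood pressures (mmHg), invented for illustration.
group_a = [118, 125, 130, 122, 135, 128, 121, 127]
group_b = [131, 138, 129, 142, 136, 133, 140, 134]

p = permutation_p_value(group_a, group_b)
print(f"estimated p-value: {p:.4f}")
```

A small value here says only that such a gap would be unusual if the two groups really did not differ. It does not say how likely the null hypothesis is, nor how much the difference matters.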
To address its concerns, the association convened a group of experts to formulate a document listing six “principles” regarding p-values for the guidance of “researchers, practitioners and science writers who are not primarily statisticians.” Of those six principles, the most pertinent for people in general (and science journalists in particular) is No. 5: “A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.” Moreover, researchers should describe not only the data analyses that produced statistically significant results, the society says, but all statistical tests and choices made in calculations. Otherwise, results may seem falsely robust.
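Principle No. 5 can be illustrated with a short calculation. The sketch below is an assumption-laden example rather than anything taken from the ASA statement: it uses a two-sided z-test with a known, common standard deviation, and the numbers (a 1 mmHg difference, a standard deviation of 10 mmHg) are hypothetical. The effect is identical in both studies; only the sample size changes.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means between two equal-sized
    groups, assuming a known common standard deviation (a simplification)."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical values: the same small 1 mmHg effect in both studies.
effect = 1.0   # difference in mean blood pressure (mmHg)
sd = 10.0      # assumed standard deviation within each group (mmHg)

p_small_study = two_sided_z_p_value(effect, sd, n_per_group=50)
p_large_study = two_sided_z_p_value(effect, sd, n_per_group=5000)

print(f"n = 50 per group:   p = {p_small_study:.3f}")
print(f"n = 5000 per group: p = {p_large_study:.2e}")
```

Under these assumptions the larger study crosses the 0.05 threshold and the smaller one does not, although the 1 mmHg effect is the same size, and arguably just as unimportant, in both.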
“Viewed alone, p-values calculated from a set of numbers and assuming a statistical model are of limited value and frequently are meaningless,” wrote biostatistician Donald Berry of MD Anderson Cancer Center in Houston. He cited the serious negative impact that misuse and misinterpretation of p-values have had not only on science, but also on society. “Patients with serious diseases have been harmed. Researchers have chased wild geese, finding too often that statistically significant conclusions could not be reproduced. The economic impacts of faulty statistical conclusions are great.”
A course correction has long been overdue, considering that criticism of the p-value has been circulating in scientific circles for a while now. In 2011, a study deliberately manipulated its analysis to “prove” that listening to music by the Beatles makes undergraduates younger. This was an attempt to raise awareness about false positives. More recently, in 2015, an article (later retracted) published results from a purposely sloppy clinical trial to show that eating chocolate helps people lose weight.
“Over time it appears the p-value has become a gatekeeper for whether work is publishable, at least in some fields,” said Jessica Utts, ASA president. “This apparent editorial bias leads to the ‘file-drawer effect,’ in which research with statistically significant outcomes are much more likely to get published, while other work that might well be just as important scientifically is never seen in print. It also leads to practices called by such names as ‘p-hacking’ and ‘data dredging’ that emphasize the search for small p-values over other statistical and scientific reasoning.” Giovanni Parmigiani, a biostatistician at the Dana Farber Cancer Institute in Boston, Massachusetts, added, “Surely if this happened twenty years ago, biomedical research could be in a better place now.”
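A back-of-the-envelope calculation shows why searching for small p-values is risky. The sketch below assumes, purely for illustration, that a researcher runs several independent tests on data where every null hypothesis is true. The function name `chance_of_false_positive` is hypothetical.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent tests of true null
    hypotheses yields p < alpha purely by chance."""
    return 1 - (1 - alpha) ** n_tests

for n_tests in (1, 5, 20):
    rate = chance_of_false_positive(n_tests)
    print(f"{n_tests:>2} tests: {rate:.1%} chance of at least one 'significant' result")
```

With twenty such tests, the chance of at least one spuriously “significant” result is well over one half, which is why reporting only the tests that crossed the threshold can make results look falsely robust.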
On the other hand, drastic steps such as a ban on publishing papers that contain p-values could prove counter-productive, says Andrew Vickers, a biostatistician at Memorial Sloan Kettering Cancer Center in New York City. What is needed now is a better understanding of the p-value. Researchers should not get carried away and use statistics to create an impossible level of confidence.