It’s easy to misunderstand and misuse one of the most common — and important — terms in academic research: statistical significance. We created this tipsheet to help journalists avoid some of the most common errors, which even trained researchers make sometimes.
When scholars analyze data, they look for patterns and relationships between and among the variables they’re studying. For example, they might look at data on playground accidents to figure out whether children with certain characteristics are more likely than others to suffer serious injuries. A high-quality statistical analysis will include separate calculations that researchers use to determine statistical significance, a form of evidence that indicates how consistent the data is with a research hypothesis.
Statistical significance is a highly technical, nuanced concept, but journalists covering research should have a basic understanding of what it represents. Health researchers Steven Tenny and Ibrahim Abdelgawad framed statistical significance like this: “In science, researchers can never prove any statement as there are infinite alternatives as to why the outcome may have occurred. They can only try to disprove a specific hypothesis.”
Researchers try to disprove what’s called the null hypothesis, which is “typically the inverse statement of the hypothesis,” Tenny and Abdelgawad wrote. Statistical significance indicates how inconsistent the data being examined is with the null hypothesis.
If researchers studying playground accidents hypothesize that children under five years old suffer more serious injuries than older kids, the null hypothesis could be that there is no relationship between a child’s age and playground injuries. If a statistical analysis uncovers a relationship between the two variables, and researchers determine that relationship to be statistically significant, the data is not consistent with the null hypothesis.
To be clear, statistical significance is evidence used to decide whether to reject, or fail to reject, the null hypothesis. Getting a statistically significant result doesn’t prove anything.
Here are some other things journalists should know about statistical significance before reporting on academic research:
1. In Academic Research, Significant ≠ Important
Sometimes, journalists mistakenly assume that research findings described as “significant” are important or noteworthy, and therefore newsworthy. That’s typically not correct. To reiterate, when researchers call a result “statistically significant,” or simply “significant,” they’re indicating how inconsistent the data is with the null hypothesis.
It’s worth noting that a finding can be statistically significant but have little or no clinical or practical significance. Let’s say researchers conclude that a new drug drastically reduces tooth pain, but only for a few minutes. Or that students who complete an expensive tutoring program earn higher scores on the college-entrance exam — but only two more points, on average. Although these findings might be significant in a mathematical sense, they’re not very meaningful in the real world.
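The tutoring example can be made concrete with a short sketch. Every number below is hypothetical and invented for illustration, and the test used (a two-sided z-test that assumes the spread of scores is known) is a simplification chosen to keep the arithmetic visible, not a method taken from any study mentioned here.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(mean_a, mean_b, sd, n_per_group):
    """Two-sided z-test for a difference in means, assuming a known, shared SD."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = (mean_a - mean_b) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical exam results (assumed values, not real data).
tutored_mean = 1052.0       # average score of tutored students
comparison_mean = 1050.0    # average score of everyone else
score_sd = 100.0            # assumed spread of scores in both groups
students_per_group = 50_000

z, p_value = two_sample_z_test(
    tutored_mean, comparison_mean, score_sd, students_per_group
)

# Standardized effect size: the gap in means measured in standard deviations.
effect_size = (tutored_mean - comparison_mean) / score_sd

print(f"p-value: {p_value:.4f}")
print(f"effect size: {effect_size:.2f} standard deviations")
```

With a sample this large, a two-point gap falls below the usual 0.05 cutoff even though the gap is only a small fraction of the typical variation among students. The p-value and the size of the effect answer different questions.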
2. Researchers Can Manipulate the Process for Gauging Statistical Significance
Researchers use sophisticated software to analyze data. For each pattern or relationship detected in the data — for instance, one variable increases as another decreases — the software calculates what’s known as a probability value, or p-value.
P-values range from 0 to 1. If a p-value falls under a certain threshold, researchers deem the pattern or relationship statistically significant. If the p-value is greater than the cutoff, that pattern or relationship is not statistically significant. That’s why researchers hope for low p-values.
Generally speaking, p-values smaller than 0.05 are considered statistically significant.
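One way to see what a p-value measures is a permutation test, sketched below. This is only one of many methods and is not necessarily what any given study uses. The injury scores are made up for illustration, and the function name and the 0.05 cutoff variable are choices made for this sketch.

```python
import random
from statistics import mean

ALPHA = 0.05  # the conventional cutoff described above


def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    If the null hypothesis is true, the group labels are arbitrary.
    So shuffle the labels many times and count how often a gap at
    least as large as the observed one shows up.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        gap = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if gap >= observed - 1e-12:  # small tolerance for rounding
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_shuffles + 1)


# Hypothetical injury-severity scores (invented for this example).
under_five = [7, 9, 6, 8, 9, 7, 8, 10]
older_kids = [4, 6, 5, 3, 6, 5, 4, 5]

p_value = permutation_p_value(under_five, older_kids)
print(f"p-value: {p_value:.4f}")
print("statistically significant" if p_value < ALPHA else "not statistically significant")
```

The result describes how surprising the observed gap would be if age made no difference. It does not say how likely it is that age matters.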
“P-values are the gatekeepers of statistical significance,” science writer Regina Nuzzo, a statistics professor at Gallaudet University in Washington, D.C., wrote in her tipsheet, Tips for Communicating Statistical Significance.
She added: “What’s most important to keep in mind? That we use p-values to alert us to surprising data results, not to give a final answer on anything.”
Journalists should understand that p-values are not the probability that the hypothesis is true. P-values also do not reflect the probability that the relationships in the data being studied are the result of chance. The American Statistical Association warned against repeating these and other errors in its Statement on Statistical Significance and P-Values.
And p-values can be manipulated. One form of manipulation is p-hacking, when a researcher “persistently analyzes the data, in different ways, until a statistically significant outcome is obtained,” explained psychiatrist Chittaranjan Andrade, a senior professor at the National Institute of Mental Health and Neurosciences in India, in a 2021 paper in The Journal of Clinical Psychiatry.
He added that “the analysis stops either when a significant result is obtained or when the researcher runs out of options.”
Ways researchers can p-hack include:
- Halting a study or experiment to examine the data and then deciding whether to gather more.
- Collecting data after a study or experiment is finished, with the goal of changing the result.
- Putting off decisions that could influence calculations, such as whether to include outliers, until after the data has been analyzed.
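The first item on that list, pausing to check the data and then deciding whether to keep collecting, can be simulated. The sketch below is a toy model under stated assumptions: the data are pure random noise, so there is no real effect to find, and the study sizes, the number of simulated studies and all function names are invented for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

ALPHA = 0.05
STANDARD_NORMAL = NormalDist()


def z_test_p_value(values):
    """Two-sided p-value for 'the true mean is 0', assuming a known SD of 1."""
    z = (sum(values) / len(values)) * sqrt(len(values))
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


def study_reports_significance(rng, peek_every, max_n):
    """Simulate one study on pure noise; True means a false alarm."""
    values = []
    while len(values) < max_n:
        values.extend(rng.gauss(0, 1) for _ in range(peek_every))
        if z_test_p_value(values) < ALPHA:
            return True  # stop collecting as soon as p dips below the cutoff
    return False


def false_alarm_rate(peek_every, max_n=100, n_studies=2000, seed=1):
    """Share of simulated noise-only studies that end up 'significant'."""
    rng = random.Random(seed)
    hits = sum(
        study_reports_significance(rng, peek_every, max_n)
        for _ in range(n_studies)
    )
    return hits / n_studies


rate_single_look = false_alarm_rate(peek_every=100)  # analyze once, at the end
rate_with_peeking = false_alarm_rate(peek_every=10)  # check after every 10 data points

print(f"false alarms, one planned analysis: {rate_single_look:.1%}")
print(f"false alarms, repeated peeking:     {rate_with_peeking:.1%}")
```

Analyzing the data once keeps false alarms near the 5% the cutoff is designed to allow. Checking repeatedly and stopping at the first significant result pushes that rate noticeably higher, even though nothing real is being measured.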
As a real-world example, many news outlets reported on problems found in studies by Cornell University researcher Brian Wansink, who announced his retirement shortly after JAMA, the flagship journal of the American Medical Association, and two affiliated journals retracted six of his papers in 2018.
Stephanie Lee, a science reporter at BuzzFeed News, described emails between Wansink and his collaborators at the Cornell Food and Brand Lab, showing they “discussed and even joked about exhaustively mining datasets for impressive-looking results.”
3. Researchers Face Intense Pressure to Produce Statistically Significant Results
Researchers build their careers largely on how often their work is published, and the prestige of the academic journals that publish it. “‘Publish or perish’ is tattooed on the mind of every academic,” Ione Fine, a psychology professor at the University of Washington, and Alicia Shen, a doctoral student there, wrote in a March 2018 article in The Conversation. “Like it or loathe it, publishing in high-profile journals is the fast track to positions in prestigious universities with illustrious colleagues and lavish resources, celebrated awards, and plentiful grant funding.”
Because academic journals often prioritize research with statistically significant results, researchers often focus their efforts in that direction. Multiple studies suggest journals are more likely to publish papers featuring statistically significant findings.
For example, a paper published in Science in 2014 found “a strong relationship between the results of a study and whether it was published.” Of the 221 studies examined, about half were published. Only 20% of studies without statistically significant results were published.
The authors learned that most studies without statistically significant findings weren’t even written up, sometimes because researchers, predicting their results would not be published, abandoned their work.
“When researchers fail to find a statistically significant result, it’s often treated as exactly that — a failure,” science writer Jon Brock wrote in a 2019 article for Nature Index. “Non-significant results are difficult to publish in scientific journals and, as a result, researchers often choose not to submit them for publication.”
4. Many People — Even Researchers — Make Errors Explaining Statistical Significance to Non-Scientists
“With its many technicalities, significance testing is not inherently ready for public consumption,” Jeffrey Spence and David Stanley, associate professors of psychology at the University of Guelph in Canada, wrote in the journal Frontiers in Psychology. “Properly understanding technically correct definitions is challenging even for trained researchers, as it is well documented that statistical significance is frequently misunderstood and misinterpreted by researchers who rely on it.”
Spence and Stanley pointed out three common misinterpretations, which journalists should look out for and avoid. Statistical significance, they noted, does not mean:
- “There is a low probability that the result was due to chance.”
- “There is less than a 5% chance that the null hypothesis is true.”
- “There is a 95% chance of finding the same result in a replication.”
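A rough calculation shows why the second statement in that list does not follow from a 0.05 cutoff. The sketch below rests on simplifying assumptions that are not from Spence and Stanley: each tested hypothesis is treated as either a real effect or no effect at all, and the share of real effects and the studies' power are made-up inputs.

```python
def share_of_significant_results_with_true_null(prior_real_effect, power, alpha=0.05):
    """Among results with p < alpha, the fraction where the null is actually true.

    prior_real_effect: assumed share of tested hypotheses that are real effects
    power: assumed chance a study detects a real effect when there is one
    alpha: the significance cutoff
    """
    false_positives = (1 - prior_real_effect) * alpha
    true_positives = prior_real_effect * power
    return false_positives / (false_positives + true_positives)


# Hypothetical scenario: 1 in 10 tested ideas is a real effect, and studies
# detect real effects 80% of the time.
share = share_of_significant_results_with_true_null(
    prior_real_effect=0.10, power=0.80
)
print(f"significant results where the null is true: {share:.0%}")
```

Under these assumed inputs, well over 5% of statistically significant results come from cases where the null hypothesis is true. Different assumptions give different answers, which is the point: the p-value alone cannot supply that probability.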
Spence and Stanley offered two suggestions for describing statistical significance. Although both are concise, many journalists (or their editors) might consider them too vague to use in news stories.
If all study results are significant, Spence and Stanley suggested writing either:
- “All of the results were statistically significant (indicating that the true effects may not be zero).”
- “All of the results were statistically significant (which suggests that there is reason to doubt that the true effects are zero).”
5. Academics Have Long Debated How to Rethink the Influence of Statistical Significance
Scholars have written about the problems associated with determining and reporting statistical significance for decades. In 2019, the academic journal Nature published a letter, signed by more than 800 researchers and other professionals from fields that rely on statistical modeling, that called “for the entire concept of statistical significance to be abandoned.”
The same year, The American Statistician, a journal of the American Statistical Association, published Statistical Inference in the 21st Century: A World Beyond p < 0.05 — a special edition featuring 43 papers dedicated to the issue. Many propose alternatives to using p-values and designated thresholds to test for statistical significance.
“As we venture down this path, we will begin to see fewer false alarms, fewer overlooked discoveries, and the development of more customized statistical strategies,” three researchers wrote in an editorial that appears on the front page of the issue. “Researchers will be free to communicate all their findings in all their glorious uncertainty, knowing their work is to be judged by the quality and effective communication of their science, and not by their p-values.”
John Ioannidis, a Stanford Medicine professor and vice president of the Association of American Physicians, has argued against ditching the process. P-values and statistical significance can provide valuable information when used and interpreted correctly, he wrote in a 2019 letter published in JAMA. He acknowledged improvements are needed, such as better and “less gameable filters” for gauging significance. He also noted “the statistical numeracy of the scientific workforce requires improvement.”
Professors Deborah Mayo of Virginia Tech and David Hand of Imperial College London asserted that “recent recommendations to replace, abandon, or retire statistical significance undermine a central function of statistics in science.” Researchers need, instead, to call out misuse and avoid it, they wrote in their May 2022 paper, Statistical Significance and Its Critics: Practicing Damaging Science, or Damaging Scientific Practice?
“The fact that a tool can be misunderstood and misused is not a sufficient justification for discarding that tool,” they wrote.
Denise-Marie Ordway is managing editor of The Journalist’s Resource, which she joined in 2015 after working as a reporter for newspapers and radio stations in the US and Central America. Her work has appeared in USA TODAY, The New York Times, and The Washington Post. She was a Pulitzer Prize finalist in 2013 and a 2014-15 Harvard Nieman Fellow.