A concise defense of statistical significance

Tags: statistics, reproducibility

Recent arguments against the use of p-values and significance testing are mostly weak. The weak ones are actually arguments against making decisions or mistakes in general, which is impossible.

Joshua Loftus https://joshualoftus.com/

A letter, signed by over 800 scientists and published in Nature, called for an end to using p-values to decide whether data refute or support a scientific hypothesis. The letter has received widespread coverage and reignited an old debate.

Weaker arguments

Most of the objections to p-values or the p < 0.05 threshold in these articles can be summarized into two categories:

Banning p-values or “p < 0.05” thresholds wouldn’t address these objections. We will still have to make decisions; we can’t just report a Bayes factor (or a p-value) and refuse to decide whether a drug trial should continue or not. So our decisions will still sometimes be wrong, and in both directions.
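To make the “wrong in both directions” point concrete, here is a minimal simulation (my illustration, not from the original post) of one-sample t-tests with the conventional alpha = 0.05 threshold: under a true null some tests still reject (Type I errors), and under a true effect some fail to reject (Type II errors). No threshold eliminates both.

```python
# Minimal sketch: any decision threshold produces errors in both directions.
# Illustrative parameters (n, effect size, trial count) are arbitrary choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, trials, alpha = 30, 2000, 0.05

# Under the null (true mean 0): rejections are false positives (Type I).
null_p = [stats.ttest_1samp(rng.normal(0, 1, n), 0).pvalue for _ in range(trials)]
type1 = np.mean(np.array(null_p) < alpha)

# Under an alternative (true mean 0.3): non-rejections are false negatives (Type II).
alt_p = [stats.ttest_1samp(rng.normal(0.3, 1, n), 0).pvalue for _ in range(trials)]
type2 = np.mean(np.array(alt_p) >= alpha)

print(f"Type I error rate ~ {type1:.3f} (close to alpha = {alpha})")
print(f"Type II error rate ~ {type2:.3f} (power ~ {1 - type2:.3f})")
```

Changing the threshold only trades one error rate against the other; it cannot drive both to zero.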

Stronger arguments

The last kind of objection is more sensible, though less often the focal point of this debate, and I would summarize it as:

I more or less agree with this, but it is not an objection to using p-values or p < 0.05; it’s an objection to misusing them or using them in isolation.

And any method can be misused! A scientist could do many tests and only report the significant results, or they could compute many effect size estimates or Bayes factors and only report the largest ones.
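A quick simulation (my illustration, not from the original post) shows why selective reporting is the real problem, independent of the statistic used. If a scientist runs 20 independent tests when every null is true and reports only the smallest p-value, a “significant” result appears most of the time by chance alone.

```python
# Minimal sketch: selective reporting inflates apparent significance.
# 20 independent null t-tests per "study"; report only the minimum p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, tests, trials, alpha = 30, 20, 1000, 0.05

any_hit = 0
for _ in range(trials):
    pvals = [stats.ttest_1samp(rng.normal(0, 1, n), 0).pvalue for _ in range(tests)]
    any_hit += min(pvals) < alpha  # at least one "significant" result?

rate = any_hit / trials
print(f"P(at least one p < {alpha} | all nulls true) ~ {rate:.2f}")
# Theory: 1 - 0.95**20 ≈ 0.64
```

The same selection effect would inflate the largest of 20 effect-size estimates or Bayes factors; the fault lies in the selective reporting, not in the p-value.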

Real roots of the problem

I think the p-value debate is mostly a distraction from two underlying problems. The first is the rote, uncritical application of any kind of statistical method; as Jack Schwartz put it, “… the simple-mindedness … to dress scientific brilliancies and scientific absurdities alike in the impressive uniform of formulae and theorems. Unfortunately however, an absurdity in uniform is far more persuasive than an absurdity unclad.” The second is the incentive structure of scientific publication.

From Why Most Published Research Findings Are False (Ioannidis):

The burden of the burden of proof

Lastly, it’s unfortunate but amusing that the Herald article mentioned that many people misunderstand p-values, and the New Yorker article had this correction: “An earlier version of this article incorrectly defined p-value.”
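For reference, the definition those articles stumbled over can be checked directly by simulation. Here is a minimal sketch (my illustration, not from the original post): a p-value is the probability, computed assuming the null hypothesis is true, of a test statistic at least as extreme as the one observed. Estimating that probability by brute force reproduces what the t-test reports.

```python
# Minimal sketch: verify the definition of a p-value by simulation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 50
observed = rng.normal(0, 1, n)  # one dataset, generated under the null
t_obs = stats.ttest_1samp(observed, 0).statistic

# Estimate P(|T| >= |t_obs| | null true) by simulating many null datasets...
sim_t = np.array([stats.ttest_1samp(rng.normal(0, 1, n), 0).statistic
                  for _ in range(5000)])
p_sim = np.mean(np.abs(sim_t) >= abs(t_obs))

# ...and compare with the p-value the t-test computes analytically.
p_test = stats.ttest_1samp(observed, 0).pvalue
print(f"simulated: {p_sim:.3f}, reported: {p_test:.3f}")
```

Note what the definition does not say: it is not the probability the null is true, and not the probability the result is a fluke.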

I sympathize with researchers who have done good work but can’t get it published because the result isn’t “significant.” This is a problem with publication standards, not statistical methods. It should be normal to publish negative or unimpressive results, otherwise the literature has little to teach us about what doesn’t work.

It’s also a resource issue. Collecting data is costly. There are many barriers to research other than arbitrary publication standards involving p-values, including many other arbitrary publication standards. While discontent has been focused on p-values, we may find ourselves suddenly facing de facto expectations of applying (often unnecessary) “artificial intelligence” to massive datasets in order to be published.

No particular statistical methodology is the cause or solution of these issues. (More and better statistics education is necessary!)


Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".


For attribution, please cite this work as

Loftus (2019, Nov. 18). Neurath's Speedboat: A concise defense of statistical significance. Retrieved from http://joshualoftus.com/posts/2020-12-21-concise-defense-of-statistical-significance/

BibTeX citation

  @misc{loftus2019concise,
    author = {Loftus, Joshua},
    title = {Neurath's Speedboat: A concise defense of statistical significance},
    url = {http://joshualoftus.com/posts/2020-12-21-concise-defense-of-statistical-significance/},
    year = {2019}
  }