Tuesday, November 8, 2016

Ross Pomeroy Reminds Us of P-Value Problems

But It Is Much Worse

Ross Pomeroy’s article in yesterday’s Real Clear Science was a much-needed reminder of the dangers of statistical hypothesis testing. But while Pomeroy rightly points out important problems, particularly with the so-called P-value, out here on the ground the problem is much worse.

One of Pomeroy’s several legitimate concerns is the use of what is essentially a default threshold of 0.05 for P. Too often scientists don’t realize that, as David Colquhoun has pointed out, treating P < 0.05 as a discovery will lead to false conclusions at least 30 percent of the time. Pomeroy also points out the common fallacy of interpreting the P-value as the probability that the null hypothesis is true.
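The arithmetic behind this kind of warning fits in a few lines. What follows is a minimal Python sketch of the simple screening calculation, not Colquhoun’s own code. The fraction of tested effects that are real (`prior_real`) and the test’s power are assumptions chosen for illustration, and the answer moves a lot when they change. The sketch also shows why P is not the probability that the null hypothesis is true: among results that clear the 0.05 threshold, the share coming from true nulls can be far higher than 5 percent.

```python
def false_discovery_rate(prior_real: float, power: float, alpha: float) -> float:
    """Fraction of 'significant' results (P < alpha) that are false positives.

    prior_real: assumed fraction of tested hypotheses where a real effect exists
    power:      assumed probability of getting P < alpha when the effect is real
    alpha:      the significance threshold, e.g. 0.05
    """
    true_positives = prior_real * power          # real effects that pass the test
    false_positives = (1 - prior_real) * alpha   # null effects that pass anyway
    return false_positives / (true_positives + false_positives)


# Illustrative assumptions only: 10% of tested effects are real, 80% power.
fdr = false_discovery_rate(prior_real=0.1, power=0.8, alpha=0.05)
print(f"Share of 'significant' results that are false: {fdr:.0%}")
```

With these assumed inputs, about 36 percent of the "significant" results are false positives, even though every one of them passed the 0.05 threshold. A higher assumed prior gives a lower figure, so the exact number depends on how plausible the tested hypotheses were to begin with.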

The result of such mishandling of hypothesis testing is that, “Quite simply, a large amount of published research is false.”

Would that it ended there. Unfortunately, when it comes to evolutionary studies, fixing these problems is like rearranging the deck chairs on the Titanic. These concerns about choosing a good alpha value and understanding the nuances of what P actually means, while important, pale in comparison to a much larger infraction: using the P-value to mask what is, in fact, a strawman argument.

One of the key underlying assumptions in using the P-value is that there are only two alternatives, the null and alternative hypotheses. These two hypotheses must be complementary: they must be distinct, mutually exclusive, and exhaustive. In other words, one of them must be true and the other false. They cannot both be true or both be false. They cannot overlap, and there can be no other possibilities.

And while such a perfect pair of hypotheses is possible in simple academic problems, such as colored marbles in an urn, real-world problems are often more complicated. Take something as seemingly simple as the question of whether or not it will rain today. Is it not binary? Either it will rain or it will not. Right?

Well, no. The weather has a multitude of complexities. It varies in space and time, with an infinite degree of variation. What if it sprinkles? What if the rain evaporates before it reaches the ground? How do you define the time and location? What if it rains in one location but not another?

What the P-value and its null hypothesis allow is for trivial null hypotheses to be erected and easily knocked down like strawmen, thus “proving” one’s favored explanation.
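The strawman problem can be made concrete with a small simulation. In this minimal Python sketch every number is hypothetical: the data come from a process with mean 0.5, the null hypothesis says the mean is 0, and the favored explanation says it is 1.0. Because these two hypotheses are not exhaustive, a simple two-sided z-test (assuming a known standard deviation) easily rejects the null, yet the favored explanation fares no better against the same data.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(sample, hypothesized_mean, sigma):
    """Two-sided one-sample z-test P-value, assuming sigma is known."""
    n = len(sample)
    z = (fmean(sample) - hypothesized_mean) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


random.seed(1)  # fixed seed so the run is repeatable
SIGMA = 1.0
# Hypothetical "true" process: mean 0.5, a possibility neither hypothesis names.
data = [random.gauss(0.5, SIGMA) for _ in range(400)]

p_null = z_test_p_value(data, 0.0, SIGMA)     # strawman null: "no effect at all"
p_favored = z_test_p_value(data, 1.0, SIGMA)  # the favored explanation's prediction

print(f"null (mean = 0):    P = {p_null:.2g}")
print(f"favored (mean = 1): P = {p_favored:.2g}")
```

Both P-values come out far below 0.05. Knocking down the null lends no support to the favored explanation, which the same data also reject; the possibility that actually generated the data was never on the table.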

1 comment:

  1. There is nothing wrong with statistical analysis or the use of p-values. The problem arises from their misuse. The same can be said for r or r-squared values in correlation.

    These are all extremely valuable tools. But they must be taken in context. Much like many of the quote mines used by ID proponents to (supposedly) debunk evolution.