Some thoughts on Jason's argument in Statistical Inference chapter 7, "Are p-values informative about H?" These thoughts are not immediately relevant to the research question. They are less about whether traditional significance testing is inconsistent with the likelihood principle (LP), and more about whether significance testing induces strange behaviours because it seems to be inconsistent with LP.

As I understand it, Jason's argument goes like this. The intuition behind traditional significance testing is that, if I observe something that I probably would not have observed if the null hypothesis were true, then I have a reason to reject the null hypothesis. However, the probability of observing a given result if the null were true does not have much to do with the value of p. If you fiddle a bit with the null hypothesis sampling distribution, you can increase or decrease the value of p without changing the probability of the observed results under the null. So significance testing doesn't do what we want it to do. The reason is that it is inconsistent with LP.
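To check that I have this step right, here is a toy sketch in Python. The two null distributions are numbers I made up for illustration, not anything from the chapter, and the one-sided "at least as large" tail is my assumption about how p is computed. Both distributions give the observed outcome (x = 3) the same probability, 0.04, and differ only in outcomes that were not observed.

```python
from fractions import Fraction as F

# Two hypothetical null sampling distributions over outcomes 0..4.
# Both assign probability 4/100 to the observed outcome x = 3;
# they differ only in the probabilities of unobserved outcomes.
null_a = {0: F(50, 100), 1: F(30, 100), 2: F(15, 100), 3: F(4, 100), 4: F(1, 100)}
null_b = {0: F(50, 100), 1: F(30, 100), 2: F(6, 100), 3: F(4, 100), 4: F(10, 100)}


def p_value(null, observed):
    """One-sided p-value: null probability of an outcome >= the observed one."""
    return sum(prob for outcome, prob in null.items() if outcome >= observed)


observed = 3
p_a = p_value(null_a, observed)
p_b = p_value(null_b, observed)

print("P(x = 3 | null A) =", float(null_a[observed]), " p =", float(p_a))
print("P(x = 3 | null B) =", float(null_b[observed]), " p =", float(p_b))
```

Under the first distribution p is 0.05, under the second it is 0.14, yet the probability of what was actually observed is identical in both. If that matches the argument, then p is tracking the unobserved tail rather than the observed result.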

I think this argument goes too easy on significance testing. In most cases (at least in the social sciences), all possible observations count as strong evidence against LP. [Do you mean against H? Jason 2017-08-03] Where the variable under consideration is continuous, the probability of observing any given value of the variable is 0 (or infinitesimal?). Where the variable is discrete but can take on many different values (e.g. GDP), the probability of observing any given value is close to 0. In both cases, the intuition described above says the null should be rejected. (Jason, I think you allude to this point on p. 203 when you talk about the "continuous case" but don't develop it. Please tell me if I've got my maths horribly wrong.)
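A quick numerical check of the discrete case, as a sketch only. I use a fair binomial as a stand-in for "a discrete variable with many possible values"; that choice is mine and has nothing to do with how GDP is actually distributed. The function name is also just illustrative.

```python
from math import comb


def max_point_probability(n):
    """Probability of the single most likely outcome of a Binomial(n, 0.5)."""
    return comb(n, n // 2) / 2**n


# Even the most probable outcome becomes improbable as the number
# of possible values grows.
for n in (10, 100, 10_000):
    print(n, max_point_probability(n))
```

With n = 10 the most likely outcome has probability of about 0.25, but with n = 10,000 it is below 0.01. So if "improbable under the null" were enough to reject, every possible observation would lead to rejection.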

I think it's pretty clear from this argument that traditional significance testing doesn't adhere to LP. If significance testing is defensible, it's not because it captures our intuition that we should reject hypotheses if they fail to predict observations that have occurred. To be defensible, it would have to offer similar advice to another measure that does capture this intuition (perhaps Bayes factors, or something of that description). If it did, then even if traditional significance testing is not the most informative procedure, at least many scientific findings would be left intact.

[Yes, I agree. Now a tricky point is that I think a similar argument applies to confidence intervals, but it's harder to state for confidence intervals ... at least, I find it harder to state for confidence intervals. Which is one reason I like to go via the LP, because then it's clear how arguments apply to both. But I agree that's not the only way to do it. Jason 2017-08-03]