Notes On Berger And Sellke, Casella And Berger

[BY THE WAY, WHEN YOU WRITE THIS UP FORMALLY DON’T FORGET TO POINT OUT THAT THESE ARE DIFFERENT BERGERS. JASON 2017-08-08–11-59-10]

The main point that comes out of Berger and Sellke (B & S) and Casella and Berger (C & B) is that, under prior distributions that concentrate mass at a point null, the p-value is far more likely to understate the likelihood that the null is true, that is, to overstate the evidence against it (where “likelihood” has an epistemic interpretation). This is hardly surprising. But it is important, as it means that p-values are consistent with LP-procedures only for highly specific prior distributions.
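
A rough numerical sketch of the B & S point, under assumptions that are mine rather than anything fixed in these notes: a normal mean with known variance, prior mass pi0 = 0.5 on the point null, and a N(theta0, sigma^2) prior on the mean under the alternative. The function names are made up for illustration.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))

def posterior_null(z, n, pi0=0.5):
    """Posterior probability of a point null H0: theta = theta0.

    Assumes x_bar ~ N(theta, sigma^2 / n) and, under H1,
    theta ~ N(theta0, sigma^2). Both are illustrative choices.
    """
    # Bayes factor in favour of H0: ratio of the density of the data
    # under H0 to its marginal density under H1.
    b01 = math.sqrt(1 + n) * math.exp(-0.5 * z**2 * n / (1 + n))
    prior_odds = pi0 / (1 - pi0)
    post_odds = prior_odds * b01
    return post_odds / (1 + post_odds)

# Same z (so the same p-value) at several sample sizes.
for n in (1, 10, 100):
    print(n, round(two_sided_p(1.96), 3), round(posterior_null(1.96, n), 3))
```

With z = 1.96 the p-value is about 0.05, while the posterior probability of the null under these assumptions is about 0.35 for n = 1 and grows with n. So the p-value sits well below the posterior probability of the null.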

Suppose that the priors we are interested in are ‘impartial’ priors. Impartial priors are in some sense not biased in favour of any particular hypothesis. The word ‘bias’ is infelicitous (unless it has a technical meaning I’m unaware of!). It suggests that an impartial prior is merely one that does not systematically undervalue certain facts. Instead, an impartial prior may be best understood as a prior that incorporates no evidence for or against the null (or any other hypothesis). It might be useful to focus on impartial priors as they may approximate a researcher’s prior beliefs about a theory which has not been studied much or about which there is little agreement in the literature. I emphasise “might”: I can’t think of any other reason why you would care about the results from LP-procedures for impartial priors only.

[THAT WOULD BE A REALLY GOOD TOPIC TO THINK ABOUT FOR A SEPARATE PROJECT, THINKING ABOUT IT IN DIFFERENT CONTEXTS LIKE FOR EXAMPLE LEGAL CONTEXTS IN WHICH YOU MIGHT, ARGUABLY, WANT AN OBJECTIVE DEFINITION OF IMPARTIALITY. Jason 2017-08-08]

At first blush, it looks like an impartial prior is a flat (or nearly flat) prior. That is, every possible (or plausible) hypothesis is weighted equally. The trouble is that, given a flat prior, the prior probability of the null is negligible, possibly 0 in the continuous case. And it is hard to see how any observation could raise the likelihood of the null by much, even if it were close to the mode of the null distribution. So the p-value would seriously overestimate the likelihood of the null, since any non-zero p-value exceeds a posterior probability that is close to 0. (This is also probably an objection to all point null significance testing, not just traditional.)
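
To make the flat-prior worry concrete, here is a small sketch, assuming a normal mean with known standard deviation and an improper flat prior on theta, so that the posterior is N(x_bar, sigma^2 / n). It computes the posterior probability of a small interval around the null value. The half-width eps, the default sigma and n, and the function names are illustrative assumptions.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def posterior_near_null(x_bar, theta0, eps, sigma=1.0, n=25):
    """P(|theta - theta0| < eps | data) under a flat prior on theta."""
    se = sigma / math.sqrt(n)  # posterior standard deviation
    upper = (theta0 + eps - x_bar) / se
    lower = (theta0 - eps - x_bar) / se
    return norm_cdf(upper) - norm_cdf(lower)

# Best case for the null: the sample mean sits exactly on theta0.
for eps in (0.1, 0.01, 0.001):
    print(eps, posterior_near_null(x_bar=0.0, theta0=0.0, eps=eps))
```

Even with the sample mean exactly on the null value, the posterior probability shrinks towards 0 as the interval shrinks towards the point null.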

[RIGHT. WHICH IS ONE OF SEVERAL REASONS WHY BAYESIANS DON’T OFTEN TRY TO DO SIGNIFICANCE TESTING.

ALSO, A FLAT PRIOR IN ONE PARAMETERISATION IS NOT FLAT IN ANOTHER, AND NOBODY (I THINK) BELIEVES THAT YOUR CONCLUSIONS SHOULD DEPEND ON YOUR PARAMETERISATION. (THIS IS YET ANOTHER ARGUMENT IN FAVOUR OF THE LP, BY THE WAY, BUT THAT’S ANOTHER STORY.) ONE WAY TO SEE THIS IS TO CONSIDER THE BERTRAND PARADOX — HAVE A QUICK SKIM OF https://en.wikipedia.org/wiki/Bertrand_paradox_(probability). THERE’S A BIG LITERATURE ON THIS, ALTHOUGH I THINK A LOT OF THE LITERATURE IS NOT WORTH READING BECAUSE IT’S GOING ROUND IN CIRCLES (HA HA). Jason 2017-08-08]
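
The parameterisation point in the comment above can be sketched with a change of variables. This assumes, purely for illustration, a proportion theta with a flat prior on (0, 1) and the log-odds as the alternative parameterisation. The function name is made up.

```python
import math

def logodds_density_under_flat_theta(x):
    """Density of x = log(theta / (1 - theta)) when theta ~ Uniform(0, 1)."""
    theta = 1.0 / (1.0 + math.exp(-x))  # inverse of the log-odds map
    # Change of variables: flat density (equal to 1) times |d theta / d x|.
    return theta * (1.0 - theta)

for x in (-4.0, -2.0, 0.0, 2.0, 4.0):
    print(x, round(logodds_density_under_flat_theta(x), 4))
```

The density on the log-odds scale peaks at 0 and falls away in both tails, so a prior that is flat in theta is far from flat in the log-odds.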

One way of responding to this objection is to deny that p really does tell you the probability that the observed results would have occurred if the null were true in the actual population.

[????? THAT IS NOT THE DEFINITION OF A P VALUE. I THINK WHAT YOU’RE THINKING IS RIGHT, BUT I CAN THINK OF TWO DIFFERENT THINGS YOU MIGHT MEAN HERE, AND I’M TOO LAZY TO TYPE THEM BOTH OUT — SORRY — SO IF YOU CAN’T EASILY SEE HOW TO REWORD THIS TO BE ACCURATE THEN LET’S DISCUSS IT VERBALLY. Jason 2017-08-08]

[Sorry, that was sloppy. I should have said this: “One way of responding to this objection is to deny that p can tell us only about the actual population. That is, to deny that p is really just the probability that the observed results (or something more extreme) would have occurred if the sampling distribution of the actual population mean were the null distribution.” Does that cut it? Mitch 2017-08-08]

[THAT TOTALLY CUTS IT FOR THE POINT YOU’RE TRYING TO MAKE. BUT ON THE OTHER HAND THE DEFINITION OF A P-VALUE IS PRETTY MUCH FIXED IN STONE BY THE FACT THAT IT’S CALCULATED BY COMPUTER PROGRAMS, SO IF ANYONE WANTS IT TO MEAN ‘IN THE LONG RUN’ THEN THEY HAVE TO SAY THAT EXPLICITLY. NO? JASON 2017-08-08–11-57-41]

[You’re right, it’s not a question of how the p-value is defined. p is the probability that the observed result or something more extreme occurred under the null distribution of some population mean.

[ACTUALLY IT’S NOT EVEN THAT. A P-VALUE IS THE PROBABILITY THAT THE TEST STATISTIC APPLIED TO THE OBSERVED RESULT OR SOMETHING MORE EXTREME OCCURS ACCORDING TO THE DISTRIBUTION OF THE TEST STATISTIC ON THE NULL HYPOTHESIS. IN SOME CASES IT’S OBVIOUS WHAT THE TEST STATISTIC SHOULD BE, BUT IN OTHER CASES IT’S NOT AT ALL, SO THE WHOLE DEFINITION MATTERS.

A FUN PARTY TRICK IS TO RANDOMLY ASK SCIENTISTS TO DEFINE “P VALUE”. YOU CAN PRETTY MUCH GUARANTEE THEY’LL GET IT WRONG, UNLESS THEY’RE THEORETICAL STATISTICIANS. EVEN APPLIED STATISTICIANS OFTEN GET IT WRONG. JASON 2017-08-10–11-26-56]

There is nothing in the definition that says that this population must be the actual one. It could be the mean of “all possible populations” (I wonder if there’s a better way of expressing this). The question is whether the p-value is equivalent to an LP-procedure (e.g. the Bayes factor), or induces similar behaviours/cognitive states. And that depends on whether it is sensible to assign H0 a high enough prior probability. 2017-08-08–17-32]

[RIGHT. I AGREE. JASON 2017-08-10–11-27-27]
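
The definition of a p-value spelled out in the exchange above depends on the choice of test statistic. A minimal sketch, assuming made-up data (9 successes in 12 Bernoulli trials) and a null success probability of 0.5, with two candidate test statistics chosen only for illustration:

```python
from math import comb

def null_pmf(k, n=12, p0=0.5):
    """Binomial(n, p0) probability of exactly k successes under the null."""
    return comb(n, k) * p0**k * (1 - p0)**(n - k)

def p_value(statistic, observed, n=12, p0=0.5):
    """P(statistic(K) >= statistic(observed)) with K ~ Binomial(n, p0)."""
    t_obs = statistic(observed)
    return sum(null_pmf(k, n, p0) for k in range(n + 1) if statistic(k) >= t_obs)

# Statistic 1: the number of successes (large values count as extreme).
one_sided = p_value(lambda k: k, observed=9)
# Statistic 2: distance from the null expectation n * p0 = 6.
two_sided = p_value(lambda k: abs(k - 6), observed=9)
print(round(one_sided, 4), round(two_sided, 4))
```

The data and the null are the same in both calls, yet the second p-value is twice the first, because “more extreme” is defined by the test statistic and not by the data alone.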

Maybe the null hypothesis is really the hypothesis that there is no effect in the long run. If this were true, then the sampling distribution of the null would not consist only of possible samples from the actual population. It would consist of samples from all possible populations. The actual population would be only one realisation of a random quantity ranging over possible populations. It is plausible that the null hypothesis would be true in the long run if there were no causal relationship between the two variables under consideration. So the prior distribution would concentrate mass at the null.

[I LIKE THIS. Jason 2017-08-08]