P-Values Know When You're Cheating
This post is not "the solution to p-hacking" or even "the problem with p-hacking". But I had some recent insights into the thinking behind the p-value approach and wanted to see how they apply to p-hacking as an issue. While p-values are a flawed approach to reporting final results, they do provide a useful heuristic for comparing methodologies.

Basic idea: p-values implicitly punish sloppy methodology. Suppose there are two different ways of testing a hypothesis that happen to yield the same results. When updating our expectations of future experience, we won't see a difference, because they produced the same results. However, if we use p-values to measure our confidence in the result, they will differ relative to the methodology used. This is unhelpful for updating expectations, but valuable if you are trying to compare methods. The key here is that p-values compare all methods against a uniform yardstick: the chance of being fooled under the null hypothesis. If a method is more likely to be tricked by random variation, it gets a weaker p-value.

Consider, then, a typical example of p-hacking. Hart is a researcher who suspects that in a given situation dependent variable B is correlated with independent variable A. They run an experiment measuring B against A. But at the same time, they measure and record C, D, E, etc. So, if A ~ B ends up at p = 10%, Hart can keep testing variables until they find C ~ F at p = 4.3% and publish this 'statistically significant' result.

But here is the rub: the method that Hart used included testing variables until they found something significant. Even the flawed reporting method of the p-value approach has something to say about this. This is yet another motivated-stopping method. Suppose we measure eight variables in total. There are then 28 pair-wise interactions. These do not have 28 separate null hypotheses. When considering Hart's method as a whole, the relevant null hypothesis is that none of the variables are correlated with each other, and under that null the chance that at least one of the 28 comparisons comes out below p = 5% is far higher than 5% (roughly 1 - 0.95^28, about 76%, if the tests were independent). Measured against the uniform yardstick, Hart's method as a whole earns a much weaker p-value than the 4.3% it would like to report.
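To make that arithmetic concrete, here is a minimal simulation sketch (not from the original post) of Hart's method under the global null, assuming numpy and scipy are available: eight unrelated variables, all 28 pairwise correlations tested, and the experiment "succeeds" if any pair comes out below p = 5%. The sample size, seed, and trial count are illustrative choices.

```python
# Sketch: how often does "test pairs until something is significant"
# succeed when there is nothing to find?
import numpy as np
from itertools import combinations
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_samples = 30     # observations per variable (arbitrary choice)
n_vars = 8         # A, B, C, ... as in the example
n_trials = 5_000   # simulated "experiments"

false_positives = 0
for _ in range(n_trials):
    # Null world: all eight variables are independent noise.
    data = rng.normal(size=(n_samples, n_vars))
    # Test every one of the 28 pairwise correlations.
    p_values = [pearsonr(data[:, i], data[:, j])[1]
                for i, j in combinations(range(n_vars), 2)]
    # Hart publishes whichever pair looks best, if any clears 5%.
    if min(p_values) < 0.05:
        false_positives += 1

print(f"Chance the method 'finds' p < 5% under the null: "
      f"{false_positives / n_trials:.1%}")
```

If the 28 tests were fully independent the rate would be 1 - 0.95^28, about 76%; the shared variables make the tests mildly dependent, but the simulated rate should land in the same neighbourhood, and that order-of-magnitude number is the p-value Hart's method as a whole actually deserves.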
Quite so. I haven't read 'Skin in the Game' yet, but it was recommended by a friend who read an early version of this post. It looks like it conveys this point exactly.
In response to the caution you referred to, I would agree. In reality we should only be watching practitioners, not listening to them, and even then we can only treat the observation as Bayesian evidence.
One problem with this is that most people don't objectively summarize their lives and post all their consequences online. If we want to get more evidence than we can gather personally, we are going to have to listen to someone.