To me the fact that they feature so prominently raises the question of how much certain commitments to "bayesianism" reflect actual usage of bayesian methods vs a kind of pop-science version of bayesianism.
This is a valid concern. I'm new here and just going through the sequences (though I have a mathematics background), but I have yet to see a good framing of the Bayesian/frequentist debate as maximum likelihood vs. maximum a posteriori. (I welcome referrals.)
I think people like to make these "methodological" critiques
Yes, there is a methodological critique of strict p-value calculations, but in the absence of informative priors, p-values are a really good indicator for experiment design. I feel that in hyping up Bayesian updates, people are missing that and not offering a replacement. The focus on methods is a strength when you are talking about methods.
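To make "indicator for experiment design" concrete, here's a minimal sketch of what I mean (my own illustration, not from the post): sizing a trial with nothing but a null rate, a hoped-for effect, and a p-value threshold. The 50%/70% rates and the 0.05/0.8 targets are placeholder assumptions.

```python
# Sketch (my example): using a p-value threshold to size an experiment,
# no informative prior required.
# Null: success rate is 50%; hoped-for treatment effect: 70%.
from scipy.stats import binom

null_p, effect_p, alpha = 0.5, 0.7, 0.05

for n in range(10, 301, 10):
    # Smallest success count whose one-sided p-value beats alpha.
    # P(X >= k | null) = binom.sf(k - 1, n, null_p)
    k_crit = next(k for k in range(n + 1) if binom.sf(k - 1, n, null_p) <= alpha)
    power = binom.sf(k_crit - 1, n, effect_p)  # P(reject null | effect is real)
    if power >= 0.8:
        print(f"n={n}: reject null at {k_crit}+ successes, power={power:.2f}")
        break
```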
and that actually both experimenters precommitted to treat at least 100 patients.
That would be an interesting wrinkle. I haven't read the original source. Supposing this, I would think Mr. Frequentist would still say Bessel is more likely to be fooled by unlikely data than George (in the positive direction only), though honestly only by a very small amount. One could call that the trade-off for a method that won't be fooled by unlikely negative data.
Actually, why?
I was treating the description of Bessel as having a distinct stopping condition of 70%; otherwise he would have stopped at 69.7% like you said. If he was doing the tests one at a time, we know 70.7% at 99 didn't occur, because he stopped at 100.
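If it helps, here's a rough Monte Carlo sketch of the "fooled only by a very small amount" claim (my own, not from the original source; the 60% true rate and Bessel's budget cap are assumptions). Both precommit to at least 100 patients; George stops there, while Bessel keeps going until his running rate hits 70%.

```python
# Sketch (my simulation): how much more often does Bessel's stopping rule
# produce a ">=70% success" report than George's fixed-n design?
import random

TRUE_RATE = 0.6           # assumed true success rate, below the 70% claim
MIN_N, MAX_N = 100, 1000  # both precommit to >= 100; Bessel's budget cap
TRIALS = 20_000

def george():
    s = sum(random.random() < TRUE_RATE for _ in range(MIN_N))
    return s / MIN_N >= 0.7  # reports success only if 70/100 reached

def bessel():
    s = 0
    for n in range(1, MAX_N + 1):
        s += random.random() < TRUE_RATE
        if n >= MIN_N and s / n >= 0.7:
            return True  # stops the moment the running rate hits 70%
    return False

print("George fooled:", sum(george() for _ in range(TRIALS)) / TRIALS)
print("Bessel fooled:", sum(bessel() for _ in range(TRIALS)) / TRIALS)
```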
Correct: in reality the world doesn't change if we reorder our results. The point is that, for a frequentist, it feels like it should. Because the method is flawed, it seems right for the result to be less right. This is a bad way of analyzing results, but not as bad a way of evaluating methodologies.
Your valid concern about corrupted results stems from the correlation between bad behavior and what a frequentist calls a bad methodology.
Bessel's methodology is not inherently bad either. If Bessel believed that the treatment would save lives and needed to keep going to prove it, wouldn't he behave the same way?
We need a Bayesian methodology that can help evaluate methodologies with and without informative priors. This probably already exists in the literature, but we won't be able to overcome the use of p-values until it is common knowledge.
Exactly. For the purposes of the post I framed it as a single interaction, but my honest response would be 'Cool. Do you have a video?'
While I recommend looking up the real thing or the news coverage, in case you are curious: I started with a much longer version of the story before I realized it was distracting from the rest of the post:
"So I have an amazing fish. Marcy. She's an archerfish. You know, the kind of fish that can spit at flying insects? She can recognize human faces! You see, I trained her to spit at politicians.
"We play a game sometimes, where I tap a spot on some glass by the tank. If she hits it, she gets a treat. We'd been doing this for a couple months when I put a TV behind the glass to see if she would react. She didn't really; not at first. I turned it to the news though, and on a lark tapped the glass where that politician was talking. You know, the boring, long-winded one? She spat at 'em and got her treat.
"So I thought, why not? What if I tap all politicians faces when they come on? It became part of our game. Marcy couldn't predict when I'd tap the glass because I couldn't. She got a game and treats and I got to watch the news. Win-win.
"But then one day I got bored. I wasn't tapping all the politician faces. It was that day, remember, where the thing happened? News was getting sound bites from everyone and their dog? Pelosi and Trump came up and Marcy nailed 'em both. Dead center. I was already giving her the treat before I realized that I hadn't tapped the glass.
"So I test her, you know? She spitting at anyone now? Talking heads? Nope. Snoop? Nope. Vance? Dead aim. AOC? A little to the left but Marcy still got a treat. Every politician got hit. Well, not that Canadian guy. But I don't think she's seen that one before. So I figure she can spot faces. I mark the target and she spits 'em when she sees 'em.
"So what do you say? Come see her, please? I want to know if she'll do it for you, you know? Before I call a fish scientist? Promise it will be a hoot."
I totally agree with this for some/many use cases. I would caution against doing so in the following situations:
In reality it is a balancing act, and it would be best to avoid over-reliance on either approach: over-analysis or pattern heuristics.
Ha! Well done. I spent a week making sure my math was right and never thought of this. I agree that updating the truth probability is a better model of this situation, and I can confirm your numbers.
I suppose we could also update each day's success chance, with some kind of prior balancing updates to the truth probability vs. the success probability. Though by that point we are likely no longer "simplifying".
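For concreteness, here's a minimal sketch of the "update the truth probability" model (my numbers, not the post's: the 0.9/0.5 per-day success chances and the 0.1 prior are placeholders):

```python
# Sketch (assumed numbers): updating P(claim is true) from daily outcomes.
# If the claim is true, each day succeeds with rate P_TRUE;
# if false, with base rate P_FALSE.
P_TRUE, P_FALSE = 0.9, 0.5  # assumed per-day success chances
prior = 0.1                 # assumed prior that the claim is true

def update(p_truth, success):
    like_t = P_TRUE if success else 1 - P_TRUE
    like_f = P_FALSE if success else 1 - P_FALSE
    num = like_t * p_truth
    return num / (num + like_f * (1 - p_truth))

p = prior
for day, outcome in enumerate([True, True, False, True, True], 1):
    p = update(p, outcome)
    print(f"day {day}: P(true) = {p:.3f}")
```

The refinement I mentioned would replace the fixed P_TRUE with something like a Beta posterior that also updates each day, which is where the "simplifying" stops.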
"Doesn't exist, or doesn't give a fuck about suffering" is the answer that matches the data
I agree with you. (Though I might rephrase the second as 'doesn't care about suffering the way we do'. Either way, your point is valid.)
My point wasn't to say 'doesn't exist' is wrong, but that there is more than one possibility. If you or anyone has taken the time to evaluate the possibilities and come to the conclusion that 'doesn't exist' is the more likely/simple/predictive model, then I commend you. That is what rationality is about.
All I ask is the same courtesy, as I might be exploring a different set of models than you are.
Interesting. Presumably if Bessel never got the results he wanted, he could (assuming he's honest) continue until the negative data was enough to convince himself that he was wrong. Depending on his prior, that might never happen: he could run out of money or motivation before he gave up and published a negative result. Avoiding this seems related to the issues around publishing negative results and timely reporting of raw data.
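As a rough illustration of the "might never happen" worry, here's my own sketch (all numbers assumed: two simple hypotheses of a 70% vs. 50% success rate, a true rate of 50%, and an arbitrary budget):

```python
# Sketch (my illustration): how long until an honest Bessel concedes?
# Assumed hypotheses: "works" (70% success) vs "doesn't" (50%), with the
# true rate being 50%. Stronger priors take longer to overcome and may
# exceed the budget, i.e. the negative result never gets published.
import math, random

P_WORKS, P_NULL, TRUE_RATE = 0.7, 0.5, 0.5
CONCEDE_AT, BUDGET = 0.01, 200  # posterior P(works) to give up; patient cap

def patients_to_concede(prior_works):
    log_odds = math.log(prior_works / (1 - prior_works))
    for n in range(1, BUDGET + 1):
        success = random.random() < TRUE_RATE
        lr = (P_WORKS / P_NULL) if success else ((1 - P_WORKS) / (1 - P_NULL))
        log_odds += math.log(lr)  # Bayes update in log-odds form
        if log_odds < math.log(CONCEDE_AT / (1 - CONCEDE_AT)):
            return n
    return None  # ran out of budget before conceding

for prior in (0.5, 0.9, 0.99):
    runs = [patients_to_concede(prior) for _ in range(1000)]
    done = sorted(r for r in runs if r is not None)
    print(f"prior {prior}: median n = {done[len(done) // 2]}, "
          f"gave up within budget {len(done) / len(runs):.0%} of the time")
```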
With regard to the biased reporting, I'll just mention that we would have to adjust for known bias whether we were using Bayesian or frequentist methods.