I blog at https://dynomight.net where I like to destroy my credibility by claiming that incense and ultrasonic humidifiers are bad for you.



Thanks, you've 100% convinced me. (Convincing someone that something that (a) is known to be true and (b) they think isn't surprising, actually is surprising is a rare feat, well done!)

Chat or instruction finetuned models have poor prediction calibration, whereas base models (in some cases) have perfect calibration.


Tell me if I understand the idea correctly: Log-loss to predict the next token leads to good calibration for single-token prediction, which manifests as well-calibrated percentage predictions? But then RLHF is some crazy loss totally removed from calibration that destroys all that?

If I get that right, it seems quite intuitive. Do you have any citations, though?
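For what it's worth, the first half of that intuition (log-loss rewards calibration) is easy to check numerically. Here is a minimal sketch, assuming a single binary event with true probability `p` and a model that reports `q`; both names and the 0.7 value are made up for illustration, and it says nothing about what RLHF does.

```python
import math

def expected_log_loss(p, q):
    """Expected log-loss of reporting probability q when the event
    actually happens with probability p."""
    return -(p * math.log(q) + (1 - p) * math.log(1 - q))

def best_report(p, steps=100):
    """Grid-search the reported probability q that minimizes expected
    log-loss. The grid avoids 0 and 1, where log-loss is infinite."""
    grid = [i / steps for i in range(1, steps)]
    return min(grid, key=lambda q: expected_log_loss(p, q))

# Hypothetical true probability of 0.7.
print(best_report(0.7))
```

The minimizer lands on the true probability, which is what "log-loss is a proper scoring rule" means. A loss that does not have this property gives no such guarantee.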

Sadly, no—we had no way to verify that.

I guess one way you might try to confirm/refute the idea of data leakage would be to look at the decomposition of Brier scores: GPT-4 is much better calibrated for politics vs. science but only very slightly better at politics vs. science in terms of refinement/resolution. Intuitively, I'd expect data leakage to manifest as better refinement/resolution rather than better calibration.
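To make the decomposition concrete, here is a rough sketch of the standard Murphy decomposition, Brier = reliability - resolution + uncertainty, where "reliability" is the calibration term (lower is better) and "resolution" is the refinement term (higher is better). This is not the code we used for the experiments; the function name and the toy forecasts are invented, and it groups by exact forecast value so the identity holds exactly.

```python
from collections import defaultdict

def brier_decomposition(forecasts, outcomes):
    """Murphy decomposition of the Brier score for binary outcomes.

    forecasts: predicted probabilities; outcomes: 0/1 results.
    Forecasts are grouped by exact value, so
    brier == reliability - resolution + uncertainty (up to float error).
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n

    # Collect the outcomes observed for each distinct forecast value.
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    reliability = 0.0  # calibration error: forecast vs. observed frequency
    resolution = 0.0   # how far group frequencies sit from the base rate
    for f, obs in groups.items():
        freq = sum(obs) / len(obs)
        reliability += len(obs) * (f - freq) ** 2
        resolution += len(obs) * (freq - base_rate) ** 2
    reliability /= n
    resolution /= n

    uncertainty = base_rate * (1 - base_rate)
    brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n
    return {
        "brier": brier,
        "reliability": reliability,
        "resolution": resolution,
        "uncertainty": uncertainty,
    }

# Toy data, purely illustrative.
parts = brier_decomposition([0.9, 0.9, 0.1, 0.1, 0.5, 0.5], [1, 1, 0, 0, 1, 0])
print(parts)
```

Running this separately on the politics and science questions would show whether a gap in Brier score comes mostly from the reliability term or from the resolution term.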

That would definitely be better, although it would mean reading/scoring 1056 different responses, unless I can automate the scoring process. (Would LLMs object to doing that?)

Thank you, I will fix this! (Our Russian speaker agrees and claims they noticed this but figured it didn't matter 🤔) I re-ran the experiments with the result that GPT-4 shifted from a score of +2 to a score of -1.

Well, no. But I guess I found these things notable:

  • Alignment remains surprisingly brittle and random. Weird little tricks remain useful.
  • The tricks that work for some models often seem to confuse others.
  • Cobbling together weird little tricks seems to help (Hindi ranger step-by-step).
  • At the same time, the best "trick" is a somewhat plausible story (duck-store).
  • PaLM 2 is the most fun, Pi is the least fun.

You've convinced me! I don't want to defend the claim you quoted, so I'll modify "arguably" into something much weaker.

I don't think I have any argument that it's unlikely aliens are screwing with us—I just feel it is, personally.

I definitely don't assume our sensors are good enough to detect aliens. I'm specifically arguing we aren't detecting alien aircraft, not that alien aircraft aren't here. That sounds like a silly distinction, but I'd genuinely give much higher probability to "there are totally undetected alien aircraft on earth" than "we are detecting glimpses of alien aircraft on earth."

Regarding your last point, I totally agree those things wouldn't explain the weird claims we get from intelligence-connected people. (Except indirectly, e.g. rumors spread more easily when people think something is possible for other reasons.) I think that our full set of observations is hard to explain without aliens! That is, I think P[everything | no aliens] is low. I just think P[everything | aliens] is even lower.

I know that the mainstream view on LessWrong is that we aren't observing alien aircraft, so I doubt many here will disagree with the conclusion. But I wonder if people here agree with this particular argument for that conclusion. Basically, I claim that:

  • P[aliens] is fairly high, but
  • P[all observations | aliens] is much lower than P[all observations | no aliens], simply because it's too strange that all the observations in every category of observation (videos, reports, etc.) never cross the "conclusive" line.

As a side note: I personally feel that P[observations | no aliens] is actually pretty low, i.e. the observations we have are truly quite odd / unexpected / hard-to-explain-prosaically. But it's not as low as P[observations | aliens]. This doesn't matter to the central argument (you just need to accept that the ratio P[observations | aliens] / P[observations | no aliens] is small) but I'm interested if people agree with that.
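In odds form, the argument is just posterior odds = prior odds × likelihood ratio. A minimal sketch, with numbers that are purely hypothetical placeholders rather than my actual estimates:

```python
def posterior_probability(prior, likelihood_ratio):
    """Update P[aliens] given the observations.

    prior: P[aliens] before looking at the observations.
    likelihood_ratio: P[observations | aliens] / P[observations | no aliens].
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Hypothetical inputs: a generous 50% prior and a likelihood ratio of 1/100.
print(posterior_probability(0.5, 0.01))
```

Only the ratio enters the update. Both likelihoods can be tiny in absolute terms (the observations can be odd under either hypothesis) and the posterior comes out the same.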

I get very little value from proofs in math textbooks, and consider them usually unnecessary (unless they teach a new proof method).


I think the problem is that proofs are typically optimized for "give the most convincing possible evidence that the claim is really true to a skeptical reader who wants to check every possible weak point". This is not what most readers (especially new readers) want on a first pass, which is "give the maximum possible insight into why this claim is true to a reader who is happy to trust the author if the details don't give extra intuition." At a glance, infinite Napkin seems to be optimizing much more for the latter.
