Charlie Steiner

LW1.0 username Manfred. PhD in condensed matter physics. I am independently thinking and writing about value learning.


Alignment Hot Take Advent Calendar
Reducing Goodhart
Philosophy Corner

Yes, since you don't expect the copy of you on planet A to go anywhere, it would be paradoxical to decrease your probability that you're on planet A.

Which is why you have a 100% chance of being on planet A. At least in the third-person, we-live-in-a-causal-universe, things-go-places sense. Sure, in the subjective, internal sense, the copy of you that's on planet A can have a probability distribution over what's outside their door. But in the sense physics cares about, you have a 100% probability of being on planet A both before and after the split, so nothing went anywhere.

Subjectively, you always expected your estimate of what's outside the door to change at the time of the split. This doesn't require any causal interaction at the moment of the split, because you're just using information about timing. It's a lot like how, if I know the bus schedule, my probability of the bus being near my house "acausally" changes over time - except weirder, because an extra copy of you is added to the universe.
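The bus-schedule point can be made concrete with a toy model (my own sketch, not part of the original comment; the schedule, the 5-minute window, and the 0.9/0.05 probabilities are all made-up numbers): an agent who knows the schedule updates its probability purely as a function of the clock, with no causal signal arriving at the moment the probability changes.

```python
from datetime import time

# Made-up bus schedule and a made-up "near my house" window.
SCHEDULE = [time(8, 0), time(12, 0), time(17, 0)]  # scheduled arrivals
WINDOW_MIN = 5  # bus counts as "near" within 5 minutes of an arrival

def p_bus_near(now: time) -> float:
    """Probability the bus is near my house, updated purely from the clock.

    No observation of the bus is involved: the estimate changes over
    time "acausally", driven only by prior knowledge of the schedule.
    """
    now_min = now.hour * 60 + now.minute
    for arrival in SCHEDULE:
        arr_min = arrival.hour * 60 + arrival.minute
        if abs(now_min - arr_min) <= WINDOW_MIN:
            return 0.9   # high confidence near a scheduled arrival
    return 0.05          # residual uncertainty the rest of the day

print(p_bus_near(time(8, 2)))   # prints 0.9
print(p_bus_near(time(10, 0)))  # prints 0.05
```

Nothing interacts with the agent at 8:00; the estimate moves anyway, because the update was already baked into its prior information about timing - which is the sense in which the pre-split copy already expects its post-split estimate to change.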

  1. 1
  2. 1
  3. $50
  4. $33.33

I think there's certainly a question people want to ask when they talk about things like Q1 and Q2, but the standard way of asking them isn't right. If there is no magic essence of "you" zipping around from place to place in the universe, then the probability of "you" waking up in your body can only be 1.

My advice: rather than trying to hold the universe fixed and asking where "you" goes, hold the subjective information you have fixed and ask what the outside universe is like. When you walk out of the door, do you expect to see planet A or planet B? Etc.

I guess the big questions for me were "relative to what?", "did the board have good-faith reasons?", and "will the board win a pyrrhic victory or get totally routed?"

At the time of answering, I thought the answers to the last two were: the board probably had plausible reasons, and they would probably win a pyrrhic victory.

Both of these are getting less likely, the second faster than the first. So I think it's shaping up that I was wrong and this will end up net-negative.

Relative to what? I think I mean "relative to the board completely checking out at their jobs," not "relative to the board doing their job masterfully," which would be nice to hope for but is a bad baseline to compare against.

I mean, by that standard I'd say Elon Musk is the biggest name in AI. But yeah, jokes aside I think bringing on Altman even for a temporary period is going to be quite useful for Microsoft attracting talent and institutional knowledge from OpenAI, as well as reassuring investors.

What you are doing is training the AI to have an accurate model of itself, associated with language like "I" and "you". You can use your brain to figure out what will happen when you then ask "are you conscious?" without having previously trained in any position on similarly nebulous questions. The training text was written overwhelmingly by conscious things, so maybe it says yes because that answer is heavily favored by the training distribution. Or maybe you trained it to answer "you" questions as questions about nonfiction computer hardware, and it makes the association that nonfiction computer hardware is rarely conscious.

Basically, I don't think you can start out confused about consciousness and cheat by "just asking it." You'll still be confused about consciousness and the answer won't be useful.

I'm worried this is going to lead, either directly or indirectly, to training foundation models to have situational awareness, which we shouldn't be doing.

And perhaps you should be worried that having an accurate model of oneself, associated with language like "I" and "you", is in fact one of the ingredients in human consciousness, and maybe we shouldn't be making AIs more conscious.


I think one lesson of superposition research is that neural nets are the compressed version. The world is really complicated, and NNs that try to model it are incentivized to try to squeeze as much in as they can.
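A toy illustration of the packing intuition behind superposition (my sketch, not from the original comment; the dimension 128, feature count 1024, and interference threshold are arbitrary choices): in a d-dimensional space you can fit far more than d nearly-orthogonal directions, which is what lets a compressed network represent many more features than it has neurons.

```python
import numpy as np

# Toy superposition geometry: pack 8x more feature directions than
# dimensions and check that pairwise interference stays small.
rng = np.random.default_rng(0)
d, n_features = 128, 1024  # arbitrary: 8x overcomplete

# Random unit vectors are nearly orthogonal in high dimensions.
vecs = rng.normal(size=(n_features, d))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

# Worst-case overlap between any two distinct feature directions.
gram = vecs @ vecs.T
np.fill_diagonal(gram, 0.0)
max_interference = np.abs(gram).max()

# Despite 8x overcompleteness, no pair of directions overlaps much.
print(round(float(max_interference), 3))
```

The small worst-case overlap is why squeezing extra features in costs only a little noise per feature - an incentive that a network modeling a complicated world can exploit.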

I guess I also wrote a hot take about this.
