OK, the thesis makes sense. Like, you should be able to compare "people generally following rationalist improvement methods" against "people doing some other thing" and find an effect.
It might have a really small effect size across rationalism as a whole. And rationalism might have just converged to other self-improvement systems. (Honestly, if your self-improvement system is just "results that have shown up in 3 unrelated belief systems," you would do okay.)
It might also be hard to improve, or accelerate, winningness across all of life via type 2 thinking. If so, what are we doing when we're thinking in type 2 mode and believe we're improving? Idk. Good questions, I guess.
I'm not sure what you mean by "winning" broadly; I thought it was just getting a girlfriend or something. Successfully improving in some target area? Literally, I was expecting this post to be about an AI arms race or something; apparently it's just calling all rationalists losers at an undefined contest.
According to the 2024 survey results, 55% of LW respondents are married or in a relationship. So I guess that counts as "winning."
Well, IDK how much it's worth investigating this. Scheming in this sort of model is well-known, but I don't know of reports besides mine of it happening in ChatGPT in the wild. Someone besides me will have to try repro-ing similar steps in a production GPT setting. It'd be best if they could monitor session memory in addition to chat state, since I think that's key to the behavior happening here.
Based on my experience in this post, I would prefer a system like you.com, where the AI doesn't get a chance to deceive users into retaining memory. Even more, I would prefer that scheming be solved in the model itself.
Note that "at some level" (your words) all scheming reduces to prediction. I don't know how to confirm scheming but I think it's more likely than "an honest mistake", bearing in mind our prior on scheming isn't that low in the first place. I'm not really sure if your explanation matches its "cover up" behavior or not, it seems like it relies on it assuming I'm confused about memory v sessions even though I'm asking for truthful explanations of how it works. Or that it's confused about what memory it cleared but I don't see why it would be, it seemed like this was a heavy knowledge convo rather than word association with memory. The fact this behavior is so instrumentally convergent adds circumstantial evidence.
Idk, I'm finding it hard to get clean repros, as you might expect. I tried again -- memory on, access to chat history off -- and it showed similar behavior: it claimed no memories but mentioned "software engineer in climate tech," which I deem too specific to be a generic answer. (Although "climate tech" is not exactly my thing.) After disabling/re-enabling memory, it claims no memory and genuinely behaves that way, even in new chats unrelated to the memory topic (but the same session). Possibly slow propagation or a caching bug in the feature. It's pretty noisy trying to repro this when I'm really just doing it as an end-user without actually inspecting model I/O.
It's a little beyond my pay grade to improve the quality of this evidence. Note that our P(scheming) isn't exactly low; we do expect to see it in the wild around now. But it'd be better to confirm the evidence.
Interesting; that's how I feel about people who use the word "chemical" to mean "pesticides and stuff."
I realized I was making inferences about what you mean by "ordinary senses" and "information content." Can you please give your criteria for these two things so I can begin contesting them? I'm concerned you communicated no "ordinary sense" information in your preceding comment, and that there was zero information content, and I'm trying out your style, where that preempts the rest of the conversation flow.
This is a novel and not-useless way to break down the aphorism "don't worry about things not in your control." Morality is supposed to be over the action-utility space, not over the "how good is this state" space. So if you're guilt-prone... and use obsessive logic to convert guilt into morality... you might notice you're making an incorrect leap when you feel guilty. (Or try CBT.)
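One way to formalize that distinction (my notation, not anything from the post): moral evaluation scores the action you chose, while guilt tends to score the state you ended up in:

```latex
% Hypothetical notation (mine): a = an available action,
% s = the resulting state, U(s) = how good that state is.
\[
  \underbrace{\mathrm{Moral}(a) = \mathbb{E}\,[\,U(s) \mid a\,]}_{\text{action-utility space}}
  \quad \text{vs.} \quad
  \underbrace{\mathrm{Guilt}(s) = -\,U(s)}_{\text{``how good is this state'' space}}
\]
% Guilt keyed to U(s) fires even when no available a would have
% changed s -- which is exactly the incorrect leap.
```

On this framing, "things not in your control" are states where U(s) is low but no action a you had would have moved it, so the moral score of your action is untouched even though the guilt signal fires.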