I think this misses the point, since the problem is[1] less "One guy was made psychotic by 4o." and more "A guy who developed some kind of AI-orientated psychosis was allowed to keep making important decisions at an AI company, while still believing a bunch of insane stuff."
Conditional on the story being true.
Recently I was talking to someone about Pause AI's activities in the UK. They asked something like
"Even if Kier Starmer (UK Prime Minister) decided to pause, what would that do?"
And I responded:
"We would have the resources of an entire nation state pointed towards our cause."
Canada (all this applies to the UK too) has a lot more resources to throw around than MIRI, Lighthaven, OpenPhil, and Pause/Control/Stop AI all put together. Canada is a founding NATO member. If Canada's government and military intelligence leaders were "Doomer-pilled" on AI, they could do a lot more than we have done.
I recently, rather embarrassingly, made a post with a massive error which an LLM would have found immediately. I seriously misread a paper, in a way that pasting both the paper and the post into Claude and asking "any egregious misreadings?" would have stopped me from making that post. This is far too useful for me to turn down, and this kind of due diligence is +EV for everyone.
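For what it's worth, this check is cheap to script. Here's a rough sketch using the Anthropic Python SDK; the model ID, file names, and prompt wording are placeholder assumptions, and it assumes ANTHROPIC_API_KEY is set in your environment:

```python
# Rough due-diligence sketch: paste a paper and a draft post into Claude
# and ask for egregious misreadings. Model ID and file names are placeholders.
from pathlib import Path

from anthropic import Anthropic

paper = Path("paper.txt").read_text()       # the paper you're writing about
draft = Path("draft_post.txt").read_text()  # your draft post

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder: use whichever model you like
    max_tokens=2000,
    messages=[{
        "role": "user",
        "content": (
            "Here is a paper, followed by a draft post about it.\n\n"
            f"PAPER:\n{paper}\n\n"
            f"DRAFT POST:\n{draft}\n\n"
            "Are there any egregious misreadings of the paper in the draft?"
        ),
    }],
)
print(response.content[0].text)
```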
The presence of induction heads would be a decent answer. If you show a model ...[A][B]...[A], it will often complete it with [B]. One specific mechanism which can cause this is well-studied, but the general reason is that this pattern is really common across lots of different types of pretraining data.
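To be concrete, the behavioural pattern (as opposed to the specific attention-head circuit) is just "copy whatever followed the previous occurrence of the current token". A toy sketch of that rule, which has nothing to do with real model internals:

```python
# Toy sketch of the *behaviour* an induction head implements:
# given ...[A][B]...[A], predict [B].

def induction_prediction(tokens: list[str]) -> str | None:
    """Predict the next token by copying whatever followed the most recent
    earlier occurrence of the current (last) token."""
    current = tokens[-1]
    # Scan backwards through the prefix for a previous occurrence of `current`.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]  # the [B] that followed the earlier [A]
    return None  # no earlier occurrence, so the rule has nothing to copy

print(induction_prediction(["the", "cat", "sat", "on", "the"]))  # -> "cat"
print(induction_prediction(["Mr", "Dursley", "said", "Mr"]))     # -> "Dursley"
```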
Books are quite long. I think we should go with the shortest possible description of his life.
I saw someone (cannot find source) say that "Eliezer wrote 300% of the book, and then Nate wrote -200%". I've not read much of Nate's writing other than on LessWrong, so I don't know if he's solely responsible for the change in tone, or if Eliezer was writing more normie-ish to begin with, or what.
The hypothetical ammonia-reduction-in-shrimp-farm intervention has been touted as 1-2 OOMs more effective than shrimp stunning.
I think that figure is probably an underestimate of the gap, because the estimates of shrimp suffering during death are probably too high.
(While I'm very critical of all of RP's welfare range estimates, including the shrimp one, that's not my point here. This argument doesn't rely on any claims about shrimp welfare ranges overall. I do compare humans and shrimp, but IIUC this sort of comparison is the thing you multiply by the welfare range estimate to get your utility value, if you're into that.)
(If ammonia interventions are developed that really are 1-2 OOMs better than stunning, then even under my utility function they might be in the same ballpark as cage-free egg campaigns and other animal charities.)
Shrimp stunning, as an intervention, attempts to "knock out" shrimp before they're killed by being placed on ice, which has been likened to "suffocating in a suitcase in the Antarctic".
I think that's a bad metaphor. Humans are endotherms and homeotherms, meaning we maintain a constant internal body temperature that we generate internally. If it drops, a bunch of stress responses are triggered: shivering, discomfort, etc., which attempt to raise our temperature. Shrimp are poikilotherms, meaning they don't regulate their body temperature much at all, so they don't have the same stress responses to cold that we do.
(I also doubt that, given they have no lungs and can basically never "not breathe" in their normal environment, they'd experience the same stress from asphyxiation that we do, but this is weaker)
I would guess that being thrown onto ice effectively "stuns" the shrimp pretty quickly, as their metabolism---and therefore their synaptic activity---drops with their body temperature.
That seems roughly right but I don't think chaos/revolution is necessary for this thesis.
Imagine the world sitting in a deep basin. You're trying to get it out. You can try a revolution: launch it up into the air and hold your breath waiting for it to land.
Or you can try, Sisyphus-style, to roll it out. You might gather a bunch of allies, whom you try to lead along a narrow and precarious ridgeline to your destination. Then, when you're halfway to the top, some might notice that there's a much easier path requiring much less of a push, and so the world gets pushed down off your intended route, along the mountain pass, and into a new crater.
The latter seems to have been the case with environmentalism and AI Safety. They pushed the world fairly gradually, but it only takes a few individuals pushing the world down into a new equilibrium for the upward pushers to lose that fight.
(There's a Yudkowsky tweet somewhere which says something like this, which I've basically based the entire metaphor here on)
I think we have evidence that this is not the primary thing going on, because there are parts of the evaluation where the model lacks context, could easily face an identical situation in deployment, and thus can’t know it is being evaluated.
This in particular is not strong evidence, since the optimal strategy for a schemer is usually to misbehave only very rarely, when it's very confident that it isn't in an evaluation. This is because the number of deployment runs will be >> the number of evaluations. If it's unable to tell, then it should probably just not misbehave in that instance.
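A toy expected-value version of that argument, with entirely made-up numbers:

```python
# Toy EV comparison for a hypothetical schemer in a situation it can't
# classify as eval vs deployment. All numbers are invented for illustration.

P_EVAL = 0.10                 # schemer's credence that this episode is an evaluation
COST_OF_BEING_CAUGHT = -1000  # getting caught in an eval burns the whole strategy
GAIN_PER_DEFECTION = 1        # value of one successful misbehaviour in deployment
GAIN_IF_WAITING = 0.95        # it can mostly defect later in clearly-deployment
                              # episodes, since those vastly outnumber evals

ev_defect_now = P_EVAL * COST_OF_BEING_CAUGHT + (1 - P_EVAL) * GAIN_PER_DEFECTION
ev_wait = GAIN_IF_WAITING

print(f"EV(defect in ambiguous episode): {ev_defect_now:.2f}")  # -99.10
print(f"EV(wait for a clearer episode):  {ev_wait:.2f}")        #   0.95
# Even at only 10% credence that this is an eval, waiting dominates, so
# "didn't misbehave when it couldn't tell" is weak evidence against scheming.
```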
What someone spends time writing is an important signal. More generally, what someone is interested in---and spends time on---is one of the most powerful signals as to what they'll do in future.
"Real views" is a bit slippery of a concept. But I strongly predict that most of their outputs in future will look like it's taken "AI will happen and we need to be there" as a primitive, and few will look like they're asking themselves "Should AI happen?". Because the latter question is just not a thought that interests them.