He did not say that they made such claims on LessWrong, where he would be able to publicly cite them. (I have seen/heard those claims in other contexts.)
Curated! I found the evopsych theory interesting but (as you say) speculative; I think the primary value of this post comes from presenting a distinct frame by which to analyze the world, one which I, and probably many readers, either didn't have distinctly carved out or didn't have as part of our active toolkit. I'm not sure if this particular frame will prove useful enough to make it into my active rotation, but it has the shape of something that could, in theory.
I've had many similar experiences. I'm not confident, but I suspect a big part of this skill, at least for me, is something like "bucketing": it's easy to pick out the important line from a screenful of console logs if I'm familiar with the 20[1] different types of console logs I expect to see in a given context and know that I can safely ignore almost all of them as either console spam or irrelevant to the current issue. If you don't have that basically-instant recognition, which must necessarily be faster than "reading speed", the log output might as well be a black hole.
Becoming familiar with those 20 different types of console logs is some combination of general domain experience, project-specific experience, and native learning speed (for this kind of pattern matching).
There's a similar effect when reading code, and I suspect it's why some people care a seemingly disproportionate amount about coding standards/style/conventions: if your codebase doesn't follow a consistent style or set of conventions, you can end up paying a pretty large penalty from the absence of that speedup.
Made up number
Not having talked to any such people myself, I think I tentatively disbelieve that those are their true objections (despite their claims). My best guess as to what actual objection would be most likely to generate that external claim would be something like... "this is an extremely weird thing to be worried about, and very far outside of (my) Overton window, so I'm worried that your motivations for doing [x] are not true concern about model welfare but something bad that you don't want to say out loud".
This is, broadly speaking, the problem of corrigibility, and how to formalize it is currently an open research problem. (There's the separate question of whether it's possible to make systems robustly corrigible in practice without having a good formalized notion of what that even means; this seems tricky.)
Thanks for the heads-up, I've fixed it in the post.
Curated! I think that this post is one of the best attempts I've seen at concisely summarizing... the problem, as it were, in a way that highlights the important parts while remaining accessible to an educated lay audience. The (modern) examples scattered throughout were effective; in particular, the use of Golden Gate Claude as an example of the difficulty of making AIs believe false things was quite good.
I agree with Ryan that the claim re: speed of AI reaching superhuman capabilities is somewhat overstated. Unfortunately, this doesn't seem load-bearing for the argument; I don't feel that much more hopeful if we have 2-5 years to use/study/work with AI systems that are only slightly-superhuman at R&D (or some similar target). You could write an entire book about why this wouldn't be enough. (The sequences do cover a lot of the reasons.)
Mod note (for other readers): I think this is a good example of acceptable use of LLMs for translation purposes. The comment reads to me[1] like it was written by a human and then translated fairly literally, without the LLM performing edits that would make it sound unfortunately LLM-like (perhaps with the exception of the em-dashes).
"Written entirely by you, a human" and "translated literally, without any additional editing performed by the LLM" are the two desiderata, which, if fulfilled, I will usually consider sufficient to screen off the fact that the words technically came out of an LLM[2]. (If you do this, I strongly recommend using a reasoning model, which is much less likely to end up rewriting your comment in its own style. Also, I appreciate the disclaimer. I don't know if I'd want it present in every single comment; the first time seems good and maybe having one in one's profile after that is sufficient? Needs some more thought.) This might sometimes prove insufficient, but I don't expect people honestly trying and failing at achieving good outcomes here to substantially increase our moderation burden.
With the caveat that I only read the first few paragraphs closely and poked intermittently at the rest.
This doesn't mean the comment will necessarily be approved, but if I reject it, it probably won't be for that reason.