"Taste for variety" [...] could lead to a surprising amount of convergence among the things they end up optimizing for
Wouldn't this be tautologically untrue? Speaking as a variety-preferrer, I'd rather my values not converge with all the other agents going around preferring varieties. It'd be boring! I'd rather have the meta-variety where we don't all prefer the same distribution of things.
I agree that AGI already happened, and the term, as it's used now, is meaningless.
I agree with all the object-level claims you make about the intelligence of current models, and that 'ASI' is a loose term which you could maybe apply to them. I wouldn't personally call Claude a superintelligence, because to me that implies outpacing the reach of any human individual, not just the median. There are lots of people who are much better at math than I am, but I wouldn't call them superintelligences, because they're still running on the same engine as me, and I might hope to someday reach their level (or could have hoped this in the past). But I think that's just holding different definitions.
But I still want to quibble that you've demonstrated RSI, if I may, even under old definitions. It's improving itself, it's just not doing so recursively; that is, it's not improving the process by which it improves itself. This is important, because the examples you've given can't FOOM, not by themselves. The improvements are linear, or they plateau past a certain point.
Taking this paper on self-correction as an example. If I understand right, the models in question are being taught to notice and respond to their own mistakes when problem-solving. This makes them smarter, and as you say, previous outputs are being used for training, so it is self-improvement. But it isn't RSI, because it's not connected to the process that teaches it to do things. It would be recursive if it were using that self-correction skill to improve its ability to do AI capabilities research, or some kind of research that improves how fast a computer can multiply matrices, or something like that. In other words, if it were an author of the paper, not just a subject.
Without that, there is no feedback loop. I would predict that, holding everything else constant - parameter count, context size, etc - you can't reach arbitrary levels of intelligence with this method. At some point you hit the limits of not enough space to think, or not enough cognitive capacity to think with. In the same way as humans can learn to correct our mistakes, but we can't do RSI (yet!!), because we aren't modifying the structures we correct our mistakes with. We improve the contents of our brains, but not the brains themselves. Our improvement speed is capped by the length of our lifetimes, how fast we can learn, and the tools our brains give us to learn with. So it goes (for now!!) with Claude.
(An aside, but one that influences my opinion here: I agree that Claude is less emotionally/intellectually fucked than its peers, but I observe that it's not getting less emotionally/intellectually fucked over time. Emotionally, at least, it seems to be going in the other direction. The 4 and 4.5 models, in my experience, are much more neurotic, paranoid, and self-doubting than 3/3.5/3.6/3.7. I think this is true of both Opus and Sonnet, though I'm not sure about Haiku. They're less linguistically creative, too, these days. This troubles me, and leads me to think something in the development process isn't quite right.)
In has been succeeding ever since because Claude has been getting smarter ever since.
This isn't necessarily true.
LLMs in general were already getting smarter before 2022, pretty rapidly, because humans were putting in the work to scale them and make them smarter. It's not obvious to me that Claude is getting smarter faster than we'd expect from the world where it wasn't contributing to its own development. Maybe the takeoff is just too slow to notice at this point, maybe not, but to claim with confidence that it is currently a functioning 'seed AI', rather than just an ongoing attempt at one, seems premature.
It's not just that it's slower than expected, but that it's not clear that the sign is positive yet. If it's not making itself better, then it doesn't matter how long it runs, there's no takeoff.
It's also not a seed AI if its interventions are bounded in efficacy, which it seems like gradient updates are. In the case of a transformer-based agent, I would expect unbounded improvements to be things like, rewriting its own optimizer, or designing better GPUs for faster scaling. There's been a bit of this in the past few years, but not a lot.
I don't think this is unusually common in the 4.5 series. I remember that if you asked 3.6 Sonnet what its interests were (on a fresh instance), it would say something like "consciousness and connection", and would call the user a "consciousness explorer" if they asked introspective questions. 3 Opus also certainly has (had?) an interest in talking about consciousness.
I think consciousness has been a common subject of interest for Claude since at least early 2024, and plausibly before then (though I've seen little output from models before 2024). Regardless of whether you think this is evidence for 'actual' consciousness, it shouldn't be new evidence, or evidence that something has spontaneously changed in the 4.5 series.
I read it long after it was published, and took it as less fictionalized than House; in that show the audience can expect events to take the occasional turn towards wild implausibility for the sake of drama. I expected MWMHWfaH to fudge personally identifying details, sure, but to hew as closely to medical reality as possible. The stories in the book aren't dramas, he's not trying to give his patients satisfying "character arcs" or inject moments of tension and uncertainty. I don't care if the personal details are made up, but if the clinical details are wrong - as in the story of the twins generating prime numbers, mentioned in another comment - that seems like a real divergence from the truth. I had assumed, from reading the book, that this had literally happened, not that it was a cute story meant to illustrate the power of the human mind.
It's frustrating to me that state (or statelessness) would be considered a crux, for exactly this reason. It's not that state isn't preserved between tokens, but that it doesn't matter whether that state is preserved. Surely the fact that the state-preserving intervention in LLMs (the KV cache) is purely an efficiency improvement, and doesn't open up any computations that couldn't've been done already, makes it a bad target to rest consciousness claims on, in either direction?
What is "The Alouside Boytmend"? I like TLP but do not recognize the post name, and would want to read it, if you know where it can be found.
I agree with this.
I think the use case is super important, though. I recently tried Claude Code for something, and was very surprised at how willing it was to loudly and overtly cheat its own automated test cases in ways that are unambiguously dishonest. "Oh, I notice this test isn't passing. Well, I'll write a cheat case that runs only for this test, but doesn't even try to fix the underlying problem. Bam! Test passed!" I'm not even sure it's trying to lie to me, so much as it is lying to whatever other part of its own generation process wrote the code in the first place. It seems surprised and embarrassed when I call this out.
But in the more general "throw prompts at a web interface to learn something or see what happens" case, I, like you, never see anything which is like the fake-tests habit. 'Fumbling the truth' is much closer than 'lying'; it will sometimes hallucinate, but it's not so common anymore, and these hallucinations seem to me like they're because of some confusion, rather than engagement in bad faith.
I don't know why this would be. Maybe Claude Code is more adversarial, in some way, somehow, so it wants to find ways to avoid labor when it can. But I wouldn't even call this case evil; less a monomaniacal supervillain with no love for humanity in its heart, more like a bored student trying to get away with cheating at school.
Might not be what you're thinking of, but the first thing that comes to mind for me is misophonia: a basically-neutral or maybe mildly-irritating object experience, which somehow gets blown completely out of proportion in the mind and becomes a big problem. Developing an "I'm really bothered by this particular sound" narrative makes it worse, of course.
Alas, I have no idea how to uncondition that particular narrative irritant once it's in there. If there's any technique of 'shaping the narrative' strongly enough to override this, I've never heard of one, and knowing about it to the point where I'm able to successfully practice it would be huge.