Like most other meaningful concepts, it's not really categorical or binary. Trying to set a threshold and categorize something that more naturally lives on a spectrum (or at a point on several spectra) causes a lot of disagreement, because where to set thresholds and category boundaries is subjective.
Sure, people are going to disagree about exactly where to draw the boundaries of AGI, and yet AGI remains a useful concept, even if we can't fully agree on what counts as it. That's in part why I think the idea of "minimum viable AGI" is useful, to be able to point to this space where we're not so far along that everyone will agree it's AGI, but far enough that thinking of it as AGI is reasonable.
To put a finer point on it: AGI isn't a thing (it's a cloud of things) so debating whether "it's here" is a waste of time. What's important is discussing what's actually here (which you do) and the implications of whatever-this-is being here. Which you leave implicit.
FWIW I think your perspective is a little different since you're dealing with these systems mostly in the area they were most designed for, coding. Their competence falls off pretty steeply in other areas.
As for whether they're AGI: mu.
Isn't the definition of AGI the opposite of that? A computer program that is capable of any task that a human is capable of doing via operating a computer seems like a fairly strict definition, and certainly seems to preclude it being a "cloud of things".
You could make it stricter by applying a percentile to it explicitly[1], so that you can rigorously test it: "In all tasks we were able to define, AGI must perform better on the target metric than 50 percent of human participants". Still, either way, it's a binary thing. If you can define a computer operation task that a typical human can do but an AI system can't, then it's unambiguously narrow rather than general AI.
(I'd argue that the percentile is already there, it's just implied rather than stated outright because of the can of sociological worms it opens)
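To make that concrete, here's a minimal sketch of what such a test might look like in Python (the data and helper name are purely hypothetical illustrations, not an established benchmark):

```python
import statistics

def is_general_ai(task_results, percentile=50):
    """Binary test: the system counts as general only if it beats the
    given percentile of human participants on *every* task we defined.
    task_results: list of (ai_score, human_scores) pairs; higher is better."""
    for ai_score, human_scores in task_results:
        # Cut point for the target percentile of the human distribution.
        cutoff = statistics.quantiles(human_scores, n=100)[percentile - 1]
        if ai_score <= cutoff:
            return False  # one failing task makes it narrow, not general
    return True

# Toy data: three tasks, each with one AI score and four human scores.
tasks = [
    (0.9, [0.2, 0.5, 0.7, 0.8]),
    (0.6, [0.1, 0.4, 0.5, 0.9]),
    (0.8, [0.3, 0.6, 0.7, 0.75]),
]
print(is_general_ai(tasks))  # True: clears the median human on every task
```

The point of the sketch is the all-or-nothing return: a single definable task below the cutoff flips the answer, which is what makes the definition binary.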
The problem is that there are many definitions of AGI in circulation. Different people use the term different ways. Because you're speaking to many people when you write online, AGI doesn't mean one thing. It doesn't even mean a strictly defined thing for most of the people in the conversation, because they've absorbed much of the meaning from context. That's how brains work.
If we all agreed on a definition like that, then we wouldn't have this problem and it would be crisp.
Except in practice you'd find that there were a few things it couldn't do yet, but those things don't seem very important, so it's very tempting to say "well it meets that definition for most purposes, so we should think of it as mostly AGI".
In a complex space, even a crisp definition will become complex and therefore vague.
This is known as the descriptivist view of language, I believe. And I think it's simply correct. Words are used in complex ways that differ between people. Using them "correctly" means using them in ways your intended audience will understand. Unfortunately it's not possible to do this perfectly. I think this is just how brains and the world work.
Different people use the term different ways.
I think, at a certain point, a phrase is self-explanatory enough that you can write off a certain share of definitions as just being wrong. AGI exists as a term in contrast to Narrow AI, which means "AI that can do some things as well as a human, but not others". For either term to have any semantic significance at all, AGI can't have exceptions.
Using your example, a system that was very useful for doing three important things would be "a good narrow AI system", or just "a useful AI tool". No additional information is conveyed by calling it "AGI".
Right. When we're far away from things, treating them as points is a useful approximation. Take the question "Which way is my house?" When I am across the city, this is a useful question with a straightforward answer. When I am in the yard, or worse, inside it, I can no longer treat my house as a point.
It is precisely because we are near to AGI (I've felt "inside the house" since GPT-2) that questions that treat this construct as a point aren't very useful.
I like an analogy I heard Toby Ord use at EAG; hopefully he doesn't mind me borrowing it.
It's like being on a hike with friends up a mountain shrouded in clouds. At some point on the hike, things get a bit misty, and then vision drops to minimal. At what point did we enter the cloud? Was it when the first mists appeared? When visibility dropped below some metric? Maybe it doesn't matter, except insofar as being in a cloud requires taking actions, like tying ourselves together with rope so we don't get lost.
AGI can mean many things. I rather prefer Transformative AI (TAI). I can see a number of significant thresholds for that, none of which we have yet met, nor seem about to:
I suspect these four capability thresholds are fairly close together.
A major capability gap for all four of these is long-range planning and task performance — when the average maximum task you can carry out before needing external assistance is still only about 12 hours of work for a human, you're just not credibly even close to any of the above. Another major capability gap is continual learning: RAG and context summarization only get you so far. These two gaps are very likely interrelated: running out of context is probably one of the main reasons for the limited maximum effective task length.
A concerning thing here is that continual learning for LLMs is not a scaling-fixable issue, it's an architectural/framework issue. So it's rather hard to predict whether we'll have a great fix for it (or at least one significantly better than RAG and current context summarization) in 6 months, or whether it'll take 6 years. People have now spent several years working on this, and while existing solutions are a lot more capable than back when we had a 4k token context and then you were done, it's still a strong limitation — and from a "more time to solve alignment" point of view, a very helpful one: it strongly limits the scope of the possible threat from any AI not working with a human.
I suspect a weak-but-MVP-crossing form of continual learning is already here: models updating their memory notepads and skills, and working in groups. I bet if you locked LLM quality at current levels but let people improve harnesses a bunch, we'd have a borderline viable version of "AI able to reflect and self-improve in a crude way" (but worse than humans, and below the escape velocity necessary for RSI).
You might think of "the AI" as a group of agent-harnesses with an overseer-planning-agent and a "check if a weird loop has happened"-agent and various other object-level guys that spin up and down. Doesn't seem that different from human brains having various subprocesses.
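For concreteness, a toy sketch of that harness shape in Python (all names here are hypothetical; the plan and spawn functions stand in for actual LLM calls):

```python
from collections import Counter

class LoopDetector:
    """The "check if a weird loop has happened" agent: watches recent
    actions and flags when the group is stuck repeating itself."""
    def __init__(self, window=20, max_repeats=3):
        self.history, self.window, self.max_repeats = [], window, max_repeats

    def is_looping(self, action):
        self.history = (self.history + [action])[-self.window:]
        return Counter(self.history)[action] >= self.max_repeats

class Overseer:
    """The overseer-planning agent: decomposes a goal and spins
    object-level workers up and down as needed."""
    def __init__(self, plan, spawn_worker):
        self.plan = plan                  # goal -> list of subtasks
        self.spawn_worker = spawn_worker  # subtask -> callable worker
        self.loops = LoopDetector()

    def run(self, goal):
        for subtask in self.plan(goal):
            if self.loops.is_looping(subtask):
                subtask = "replan: " + subtask   # break out of the loop
            worker = self.spawn_worker(subtask)  # spin a worker up...
            result = worker(subtask)             # ...let it act...
            del worker                           # ...and spin it down
            yield subtask, result
```

In practice each worker would itself be an LLM harness; the point is just that the overseer and the loop detector are ordinary control flow wrapped around it.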
(But, I have some hesitation here: my previous models would have predicted this was possible a year ago, and I'm confused about why it took this long, so I mistrust this sense.)
The author says that although the usual and most grand conception of AGI is not here, it is meaningful and useful to go ahead and say that AGI is here. He does not explain why it is useful to say that; he just says it. He also says that the latest models meet criteria 1-3, but does not give supporting examples. Without these elaborations, this audaciously-titled post is another one of many found around the web lately that boil down to: "I have been using the latest models a heck of a lot lately, and I am blown away/scared."
Yeah, sure, I assume my readers are also using these models and can draw the same conclusions I am, or they will disagree. So far, people seem to mostly agree. It didn't seem necessary to write a 10-page paper to point out what most people can simply see for themselves.
Selection bias. I completely disagree,[1] I just don't know what to do with that, so I didn't comment. "No, I think you're wrong"? That doesn't seem like a very productive comment. "No, I think AGI can't do xyz yet"? Maybe better, but that would just open me up to another debate about this, and I have zero interest in debating it. Overall I didn't see what kind of comment was worth writing, so I didn't bother. This reaction is probably quite common, and at any rate, looking at the comments doesn't tell you otherwise.
I'm only saying something now to push back on the "reception indicates agreement" point.
I'm not sure even upvotes indicate net agreement because people will have stronger inhibitions for downvoting. Many people who agree will just upvote because they agree. I doubt everyone who disagrees will downvote without formulating a critique. Though idk, iirc mods have encouraged liberal downvoting, maybe I'm wrong and people do it freely.
For my view, I'll just refer to Steven Byrnes' formulation:
By “AGI” I mean here “a bundle of chips, algorithms, electricity, and/or teleoperated robots that can autonomously do the kinds of stuff that ambitious human adults can do—founding and running new companies, R&D, learning new skills, using arbitrary teleoperated robots after very little practice, etc.”
Yes I know, this does not exist yet! (Despite hype to the contrary.) Try asking an LLM to autonomously write a business plan, found a company, then run and grow it for years as CEO. Lol! It will crash and burn!
And I guess I'll add that I don't think LLMs are close to AGI, will lead to AGI, or anything like that, and yea I do also think claims to the contrary are net harmful for several reasons. ↩︎
Having written lots of things on Less Wrong that people don't agree with (and a few things they do agree with), my sense is that agreement/disagreement is highly correlated with up/down votes, but yes I have no formal study to verify this, just personal observation and inference.
That said, I wrote this post mostly because I'm freaking out about the state of things and this is my best attempt to crystallize the source of that freak out. I'm not actually trying to make a rigorous argument about what counts as AGI, and I don't even know if making a rigorous argument is worth it. You should probably read this post more as "Gordon is saying that he's feeling the AGI hard, and you should do with the information about his judgement as a 25 year veteran of AGI discourse what you will."
Strongly agree that it correlates. It's just not quite the same as most people agreeing with you. It prob measures something more like a lot of people strongly agreeing with you, with it mattering less how many people disagree with you. Like if on topic A, agreement/disagreement is 50/50, and on topic B it's 90/10, but for A people feel super strongly and B people are mostly indifferent, then A might get rapid upvotes whereas B would just disappear from visibility immediately.
Anyway, much more importantly:
That said, I wrote this post mostly because I'm freaking out about the state of things and this is my best attempt to crystallize the source of that freak out. I'm not actually trying to make a rigorous argument about what counts as AGI, and I don't even know if making a rigorous argument is worth it. You should probably read this post more as "Gordon is saying that he's feeling the AGI hard, and you should do with the information about his judgement as a 25 year veteran of AGI discourse what you will."
Big props for saying that! And this is actually one of the reasons why I perceive these posts as harmful: I think a lot of people are freaking out, this kind of post contributes to more people freaking out, and because I think AGI isn't very close, that just seems like a big net negative to me. But of course that depends entirely on your beliefs; if AGI were close, maybe it would be correct to freak out. This is what always makes it difficult to know what to say about posts like this; to me the entire pre-apocalypse vibe is a huge negative for the overall utility of lesswrong, but I think people genuinely believe it, so it's unclear to what extent I should or can push back.
Fwiw my ire is much more directed at people upvoting this than at you for writing it. This seems to me like a pretty clear case: even if the narrative were 100% true, I don't see how continuously broadcasting the vibe is a good idea. If I were an alignment researcher, I doubt it'd be good for my productivity. In fact I just talked to someone a few weeks ago on Discord who told me they don't check LW much anymore for mental health reasons even though they still completely believe in the narrative and are trying to work on it. (They even deleted their account, which I thought was a bad idea.)
Fair enough. My judgement is obviously different, in that I want other people to freak out (well, I don't actually want them to be anxious and fearful, but I can't control that): I want people to realize what I think is happening and, if they agree, take short-term actions that may buy us more time to do critical safety work. For example, now is a great time in my view to coordinate on a capabilities pause (though yes, I know there are many concerns with pauses, and that's a separate debate).
My judgement is obviously different, in that I want other people to freak out (well, I don't actually want them to be anxious and fearful, but I can't control that): I want people to realize what I think is happening and, if they agree, take short-term actions that may buy us more time to do critical safety work.
Already said this, but want to repeat that I think the perspective is totally valid.
But in terms of the cold consequentialist calculus, I don't know how you get to the "more alarming is a good idea" result. Maybe I'm biased because 2/2 cases I know well (myself and the friend I mentioned) low-key left the platform because the constant reminders are so crippling for mental health. I don't have a survey on how bad other people feel. But my impression is that I see a post about AI acceleration more than half the time I look at the frontpage. Valentine wrote Here's the Exit literally over three years ago! It was already so bad back then that people contemplated leaving the community over it. And it's been going on ever since.
I genuinely believe that even if your utility function has zero terms in it other than maximizing useful AI interventions (whether policy or technical safety work, or anything else), you should want fewer posts like this. Everyone got the memo that it's time to panic. I think the awareness-of-how-bad-it-is curve would have plateaued even if there were one fifth as many posts like this one, and the marginal effect of every other post is just to make people freak out more.
I appreciate your replies, Gordon, and your saying (or implying) that this post does indeed boil down to vibes ("feeling the AGI hard") and not some unassailable pronouncement. Given that, I do wonder, like Rafael, if the bombastic title "AGI is Here" is overstated and will lead many to undue anxiety. (Or due anxiety, but not actionable anxiety.)
I understand that since you believe full-strength AGI is near at hand, you believe it is meaningful and useful to overstate the present state of things a bit. So I wonder: what are people to do about this? Of course the whole society is grasping for the answer to this, and we cannot know it. Since you have been in this space for a long time (and I am not in it at all) I'd be curious to hear your thoughts in a later post.
You say your audience is other techies who are deep in this stuff, in agentic AI. I guess your message to them is something like, "Make sure you skill up so you are not left behind like all novice programmers certainly will be."
What would you advise non-techies (including former techies, like me) who are cautiously wowed by LLMs but are not especially worried about AGI to do with the alarm that you believe you rightly sound? I do not study AI safety or pay attention to the ongoing speculations about AGI, whatever that concept means. I do not frequent this site. I only made an account and commented here because a good friend who does frequent the site (and, like me, does not use agentic tools) sent me this post with an implied oh shit.
If non-specialists (my friend and I) are not the audience you want to engage, my apologies, and feel free to ignore my post. And I am sorry if my original post was not too productive. It's just that this week there was a lot of fear pervading the waters of the web. I'm trying to understand why, and I had hoped, given the title of this post, to find something more than that someone deep in this stuff is experiencing anxieties that "most people [in his audience] can simply see for themselves." In that case, I wonder why such a post is needed, other than for the audience to feel less alone: "Hey, I guess I'm not the only one with this deep worry."
Promised follow-up post is live: https://www.lesswrong.com/posts/bj6ffpD6Jzid6vFa8/what-to-do-about-agi
fyi I agree it's fairly important, but I'm not actually sure I know why you think it's important to say this. (I have guesses, but it seemed good to have it spelled out.)
You mean why I think it's good to claim that AGI is here given that I believe it is?
My theory is that saying this might wake some people up to the current situation and get them to act in ways that reduce existential risk.
Oh interesting. That's not why it seemed important to me.
I thought this was roughly priced in, and it still seems like 4.6 & co can't actually do many types of cognitive tasks. And in terms of "it can qualitatively do the things that you need to invent AGI now, whereas it couldn't before", I don't actually know that that's changed (much).
I would have thought last year's Opus 4 was able to do the range of things Opus 4.6 is able to do, because of how competent and planning-y and metacognition-y it seemed at coding. But, then when you put it in situations that were remotely outside its wheelhouse, it sputtered and got confused and flail-y. I eventually updated "okay, clearly something interesting is happening here, but, it's more like they have very-narrow-domain-specialized-metacognition."
Outside those narrow domains, Opus 4 has metacognition, but, none of the actual skills it needs connect it to useful things.
I think Opus 4.6 is another step along the chain, where it has many more narrow-focused-stacks-of-skills in coding that all reinforce each other, and also it's probably at least somewhat better at metacognition overall. But, I would bet against it turning out to be good enough at non-code non-math domains.
(has anyone tried running Openclaw agent swarms with overseers and detect-loop watchers with Opus 4? I am curious if they could have handled your big one-shot tasks)
...
Nonetheless, I think Opus 4.6 + "current gen Cursor scaffold" is sufficiently good at enough different things, with enough long-term planning, that I'm like:
"Okay, I feel like they have some kind of 'complete stack' here, but, due to the jagged frontier, the set of domains their stack is minimum-viable at is different from humans'." (My example wouldn't be "make a peanut butter sandwich", it'd be "navigating abstract domains with bad feedback loops.")
The thing I think is significant about this is "We have left the domain where 'AGI' is a particularly useful discriminator, and we need better ontology in order to navigate what's coming next."
I also think it's a useful time to say to everyone who'd been vaguely dismissing things as not real: "Bro, it is real. Whatever you were waiting for to think Shit's Real, it's clearly here by now." Which is closer to what you had in mind, I think. But, I still don't have particular things in mind for most people to do, if they weren't the sort of person to have already figured out it was real last year.
I expect the main thing most people can do is apply pressure on their governments to take policy action. Making this happen is no small feat, and is mostly a matter of sufficient awareness, so that everyone knows that everyone knows it's real and there are options to stop it until we finish more safety work. Coordination on this scale is not just a few people acting, it's getting people to organically come to the ideas and apply pressure, and doing that requires sufficiently credible signals that it's real: saying it's real, and having everyone believe it.
Nod. But, I think trying to warn "AGI is literally here" feels kinda like the wrong move to me anyways.
The move I would make is "AI keeps improving in ways that are on the path to generalization and strategic awareness. Here is where it was 3 years ago. Here's where it was last year. Here's where it was last month". I think that's consistently alarming whether or not people agree on what counts as AGI. (and, every few months there are more alarming things to point at).
I think it's currently at the point where people paying attention should notice "this sure doesn't seem to obviously NOT be AGI", but, I think it's still at a point where crying "AGI" might leave people underwhelmed and then get Boy Cried Wolf syndrome. (and meanwhile just focusing on its object-level capabilities seems more robustly good)
I like to say that AGI isn't "here", but it is "latent". That is, we don't have any model that could fairly be called AGI, but AGI can be built without any fundamentally new technologies. In a worst-case scenario for AI progress, which in my opinion is essentially "modest improvements in intelligence from here, accompanied by a modest reduction in cost", I think we could achieve a "good enough" facsimile of AGI by constructing a sufficiently advanced harness around current models.
This is my take as well. It is a general intelligence IMO, even if it doesn't yet hit everyone's goalpost for "AGI". My prediction is that it's one unhobbling away from that, and that will come within the next two years, though diffusion of it is likely to take considerably longer.
The raw intelligence is sufficient at this point (and will continue to improve). The long-term coherence and continual learning aspects are where it's hobbled. A sufficiently sophisticated scaffold can likely approximate those aspects well enough to be scary, and it's orders of magnitude faster to build and tune such things these days compared to a couple of years ago.
No nitpicks from Tampa. Cf. WHEN MUST I START KICKING AND SCREAMING AT YOU THAT IT IS FUCKING HAPPENING
I do tend to agree, and I have felt this way for a while. I do also think it is important to find the right ways to frame things, and framing existing tech as AGI is I think preferable. There is too much baggage associated with the word AI from its 80 years of history. And when we use the term AGI, we can kind of wipe the slate clean and be more clear about the differential between "this AI" and all the AI that had come before.
Regardless of the fact that it's still limited in various ways, I do think the technical criticisms of calling it AGI (e.g. Hendrycks et al from last year) are helpful and important. But, on the other hand, the non-technical criticisms of calling it AGI seem to be (1) mainly coming from the POV of "it's all a hoax," which is extremely counterproductive and unhelpful, or (2) mainly coming from the perspective of being really critical about how AI is being sold and used, which is indeed a profound criticism, but you can't make the problem go away by calling it something it's not. It's like trying to use an ad hominem attack against the concept.
I think perhaps a more useful, less subjective definition of AGI is something along the lines of:
Can perform most jobs that are:
Because if it really is generally intelligent, why can't it do that job?
I don't understand other people's workflows. These models still cannot follow clearly written and numbered instructions, even when split out by me manually. And the worst part is they declare victory after every single prompt, regardless of what happens.
Do you have an example? I admit that I do end up bridging gaps a lot, and they can only work totally autonomously on narrowly scoped tasks (and even then they often make mistakes on the first pass that need to be corrected in code review). But, again, this is not much different than assigning a task to a junior engineer.
Extracting sections from books and reformatting them is what I happen to be trying to do right now, and it sucks. I think you might be mistaking an LLM plus a LW user patiently piloting it for an AGI.
I'm interested in what setup exactly you're using (I think using Claude 4.6 in Cursor is noticeably smarter than me calling it from other contexts. I think the Cursor harness and (probably, haven't checked) Claude Code are particularly good).
(to be clear, the thing that feels like "AGI" is the LLM + harness, not the LLM by itself)
Hmm, interesting, I'd expect it to work better for you. I kind of wonder if there's something about your prompting that's not working, or if your tasks are too far outside looking like software engineering or other tasks it's been trained on.
For context, I have several hundred lines of instructions I hand it, plus a prompt of a few hundred words for what I want, and now it can one-shot many software tasks and 80% a great many more. The only tasks that still really require lots of intense human-AI iteration are those that exceed what it can reason about within the context window, like large refactors, which are often hard to break down into smaller chunks.
But the crux of my feelings comes from how good it's getting at looping. It does often take a bit of work to get the right initial prompt, but then it can iterate for hours or even days, if you have the tokens, on tasks that have clear, measurable objectives, which is why I am sad to say we now have functional paperclip maximizers, even if they are, for now, easily defeated.
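For illustration, the basic shape of that loop, with run_agent_step and measure as hypothetical stand-ins for the harness call and, say, a test suite:

```python
def objective_loop(run_agent_step, measure, target, token_budget):
    """Iterate an agent against a clear, measurable objective until the
    target is met or the token budget runs out. `run_agent_step` (takes
    feedback, returns tokens spent) and `measure` (returns the current
    score) are hypothetical stand-ins for a harness call and a scorer."""
    spent, score = 0, measure()
    while score < target and spent < token_budget:
        feedback = f"current score {score:.2f}, target {target:.2f}"
        spent += run_agent_step(feedback)  # agent acts; count the tokens
        score = measure()                  # re-check the objective
    return score, spent
```

Nothing here is clever; the alarming part is that the model in the middle is now good enough to make this dumb while-loop productive for days at a time.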
I'm expecting some things to work better once I have a separate computer running openclaw or something. It's all so annoyingly fiddly that I figured I'd wait a couple months for people to improve that.
I doubt you need that at all, Claude Code CLI or Codex CLI and you're most of the way there. Based on your other comment saying 3.1 I'm wondering whether or not you're using Claude/ChatGPT rather than Gemini? Gemini 3.0 at least was notably behind both of them, and while Gemini 3.1 has improved it still seems to struggle in comparison.
Extracting sections from books in my experience works pretty well; the main way they'll ever choke on that is if they decide to read a 200-page PDF into context, because they lack knowledge of their own limits at digesting that. Tell them to convert it to text first if they don't do that themselves?
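As a sketch, one minimal way to do that conversion up front (assuming the pypdf library; the file paths are hypothetical):

```python
from pypdf import PdfReader  # pip install pypdf

# One-time conversion so the agent never loads the raw PDF into context.
reader = PdfReader("book.pdf")
with open("book.txt", "w", encoding="utf-8") as out:
    for page in reader.pages:
        out.write((page.extract_text() or "") + "\n")
```

Then point the agent at book.txt, which it can grep and read in chunks instead of trying to digest the whole PDF at once.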
I'm somewhat hesitant to write this post because I worry its central claim will be misconstrued, but I think it's important to say now, so I'm writing it anyway.
Claude Opus 4.6 was released on February 5th. GPT-5.3 came out the same day. We've had a little over two weeks to use these models, and in the past day or so, I and others have started to realize: AGI is here.
Now, I don't want to overstate what I mean by this, so let me be clear on the criteria I'm using. If I were sitting back in 2018, before the release of GPT-2, and you asked me what AGI would be capable of, I'd probably have said something like this:
It's hard to deny that Opus 4.6 and GPT-5.3 are able to do 1-3. The only one up for real debate is 4, because there are things that I can do, like make a peanut butter sandwich, that Claude and ChatGPT cannot. But given the capabilities these models are demonstrating, this feels more like a limitation of their harnesses than the models themselves. Given a few weeks and some advances in robotics, I'm confident the current models could be used to make sandwiches, though perhaps at the cost of millions of tokens.
To be clear, these models aren't AGI the way we expected it. When people talk about AGI, they often use the word to mean the whole thing, with continuous and transfer learning completely solved, full-spectrum multimodal perception, and embodiment in the form of robot interfaces. Instead, what we have is more like minimum viable AGI, meaning it's an AI just general enough that we should meaningfully begin applying the AGI label.
It's possible that, in retrospect, we should have made this declaration earlier. Maybe it should have come when Opus 4 or GPT-5 were released, or maybe when Claude Code came out. But those models were worse on all four of my criteria in ways that made it harder to say they were across the AGI threshold, and those who did say it were easier to dismiss.
Now it's harder to deny the claims. I work with these models every day to write code, and the amount of work I can delegate to them is incredible, surpassing what I would expect of a junior engineer. They're even capable enough to build a just-barely-functioning paperclip maximizer, which is a terrifying sentence to write. In the coming weeks and months, these models are only going to get more powerful, and as they do, things are going to get weirder.
You may think I'm early in making a declaration of AGI, and perhaps I am. But I hope you can agree that, if it's not there yet, AGI is coming soon, and I fear that we are nowhere near ready for it.
Follow up post: What to Do About AGI