I originally started writing this as a message to a friend, to offer my personal timeline takes. It ended up getting kind of long, so I decided to pivot toward making this into a post. 

These are my personal impressions gathered while doing a bachelor's and a master's degree in artificial intelligence, as well as working for about a year and a half in the alignment space.

 

AI (and AI alignment) has been the center of my attention for a little over 8 years now. For most of that time, if you asked me about timelines I'd gesture at an FHI survey that suggested a median timeline of 2045-2050, and say "good chance it happens in my lifetime." When I thought about my future in AI safety, I imagined that I'd do a PhD, become a serious academic, and by the time we were getting close to general intelligence I would already have a long tenure of working in AI (and be well placed to help).

I also imagined that building AI would involve developing a real "science of intelligence," and I saw the work that people at my university (University of Groningen) were doing as pursuing this great project. People there were working on a wide range of machine learning methods (of which neural networks were just one idea), logic, knowledge systems, theory of mind, psychology, robotics, linguistics, social choice, argumentation theory, etc. I heard very often that "neural networks are not magic," and was encouraged to embrace an interdisciplinary approach to understanding how intelligence worked (which I did).

At the time, there was one big event that caused a lot of controversy: the success of AlphaGo (2016). To a lot of people, including myself, this seemed like "artificial intuition." People were not very impressed with the success of Deep Blue in chess, because this was "just brute force" and would obviously not scale. Real intelligence was about doing more than brute force. AlphaGo was clearly very different, though everyone disagreed on what the implications were. Many of my professors bet really hard against deep learning continuing to succeed, but over and over again they were proven wrong. In particular I remember OpenAI Five (2017/2018) as being an extremely big deal in my circles, and people were starting to look at OpenAI as potentially changing everything.

There was this other idea that I embraced, which was something adjacent to Moravec's paradox: AI would be good at the things humans are bad at, and vice versa. It would first learn to do a range of specialized tasks (which would be individually very impressive), gradually move toward more human-like systems, and the very last thing it would learn to do was master human language. This particular idea about language has been around since the Turing test: Mastering language would require general, human-level intelligence. If you had told me there would be powerful language models in less than a decade, I would have been quite skeptical.

When GPT happened, this dramatically changed my future plans. GPT-2 and especially GPT-3 were both extremely unnerving to me (though mostly exciting to all my peers). This was, in my view:

  1. "mastering language," which was not supposed to happen until we were very close to human level
  2. demonstrating general abilities. I can't overstate how big of a deal this was. GPT-2 could correctly use newly invented words, do some basic math, and a wide range of unusual things that we now call in-context learning. There was nothing even remotely close to this anywhere else in AI, and people around me struggled to understand how this was even possible.
  3. a result of scaling. When GPT-3 came out, this was especially scary, because they hadn't really done anything to improve upon the design of GPT-2, they just made it bigger. Instead of there being strongly diminishing returns, as was always the case for other AI algorithms, it was clearly a massive improvement made entirely by just scaling it up. This was a slap in the face to the project of building a “science of intelligence,” and strong evidence that building AGI would be a lot easier than anyone around me had originally imagined. 

I will say that, although this was enough evidence for me to give up my plans of getting a PhD, and led me to participate in the 2022 cohort of the AI Safety Camp and eventually become more directly involved in the alignment community, I still held out a lot of hope that this was all a dramatic over-correction. GPT was pretty weird, and I felt a fair amount of sympathy for a kind of "stochastic parrot" story. My professors were quick to bring up ELIZA, and sure, maybe imitating human language wasn't as big of a deal as I originally thought. People once thought that mastering chess would require general intelligence too.

During the AI safety camp I was mentored by janus, and that was my first interaction with people who had real experience interacting with language models. There I realized that GPT's weirdness made it especially hard to properly evaluate its actual capabilities, and in particular, would lead people to systematically underestimate how capable it was. This post by Nostalgebraist does a good job of articulating this. The basic idea is that GPT wasn’t trained to do the things you are evaluating it on, and any proficiency at any of your intelligence metrics is just a coincidence. The metrics are just (weakly) correlated with accurately predicting text typical to the internet. 

I went on to work for janus at Conjecture, and every day they would show me some GPT traces that blew my mind, and forced me to reevaluate what was possible. I didn’t understand what was really going on with GPT. I still continue to find GPT behavior extremely confusing and find it hard to draw strong conclusions from it. What felt unambiguously clear though, from everything they showed me (and what I basically already suspected since the GPT-2 paper), was that GPT was somehow doing general cognition. This part I find the hardest to justify to people, and I think there are others who are better at this. The core heuristic that compels me, however, is remembering just how absolutely inconceivable all of this was even 5 years ago. 

I don’t know what my timelines are, and I’m annoyed at people (including janus sometimes) who put really specific numbers on their predictions. I remember showing the FHI survey to my thesis supervisor during my bachelor's, and him responding with typical Dutch directness: “These numbers came from their ass.” That said, I’m even more baffled by anyone who is very confident that we won’t have some kind of takeoff this decade. Everything is moving so fast, and I don’t see any compelling evidence that things will significantly slow down.


this was all frustratingly predictable in advance for anyone who had simply looked at deep learning for real and taken a moment to understand what they were looking at. the only worlds where we don't have takeoff in the next decade that I put significant weight on are ones where strong takeoff never occurs because this is what intelligence must look like, and therefore weak takeoff - more of the same but bigger - is all there is to have. Alignment optimists will tell you this is now guaranteed; I am not yet convinced of that, and anyway it's little comfort if true. It should be embarrassing that so many people were still betting hard against deep learning by the time attention was showing promise in 2016 - the idea that you couldn't have general cognition from attention over modules and memory was already a clearly bad prediction. people who are continuing to bet against deep learning continuing to work in the same core ways it has been have not learned from recent history. We'll see how that wall deep learning is hitting looks after it's done hitting it.

Isn't it just plausible that current deep learning methods are universal but currently inefficient and thus it will take a huge amount of compute and/or algorithmic progress?

This can easily get you 10+ year timelines.

What's the current best case for that hypothesis? I'm mildly skeptical of claims about the data-inefficiency of current methods compared to the human brain, due to it not being an apples to apples comparison + due to things like EfficientZero which seem pretty darn data-efficient.

As far as I understand, the gears to ascension means that either we have fast takeoff sometime in the next ten years or we don't have it at all. We can get 10+ year timelines, but if so, it is going to be slow takeoff.

Ok, I guess I was unsure what "strong/weak" means here.

strong takeoff: general self modification allows AIs to be many, many orders of magnitude more intelligent than humans within a few development cycles, anything less than 5 years to being able to outmaneuver any human at anything.

weak takeoff: AIs can more or less match human capability, but to exceed it looks similar to humans exceeding other humans' capability. they eventually are drastically stronger than humans were, but there's never a moment where they can have a sudden insight that gives them this strength; it's all incremental progress all the way down.

strong takeoff is the thing yudkowsky still seems to be expecting: a from-our-perspective-godlike-machine that is so fast and accurate that we have absolutely no hope of even comprehending it, and which is so far beyond the capabilities of any intelligent entity now that they all look like a handwriting recognition model in comparison. it's not obvious to me that strong takeoff is possible. I'd give it 80% probability that it is right now, but there's significant weight on logarithmically diminishing returns such that the things that are stronger than us never get so much stronger that we have no hope of understanding what they're doing. eg, if we're 10% quality jpegs of the universe and the new model is a 70% quality jpeg, in contrast to the superai model where we're imagining a 99.999% quality jpeg or something.

but I don't see any model where AI isn't a training run away from stronger than human in every capability by like 2030 or so, and I think it's ridiculous that anyone thinks such a model is highly plausible. we're already so close, how can you think there's that much left to do! seems to me that the limits on ai capability are already pretty much just training data.

FWIW, people talking about "slow" or "continuous" takeoff don't typically expect that long between "human-ish level AI" and "god" if things go as fast as possible (like maybe 1 to 3 years).

See also What a compute-centric framework says about takeoff speeds

I've been saying for a few years that "slow takeoff" should be renamed "fast takeoff" and "fast takeoff" should be renamed "discontinuous takeoff".

mmmmore like "weak" takeoff = foom/singularity ~doesn't happen even slowly, "strong" takeoff = foom singularity ~happens, even if more slowly than expected.

weak takeoff is proposing "the curves saturate on ~data quality way sooner than you expected, and scaling further provides ongoing but diminishing returns".

something like, does intelligence have a "going critical"-type thing at all, vs whether increasingly ~large AIs are just trying harder and harder to squeeze a little more optimality out of the same amount of data.

my reason for thinking the singularity probably does happen is things in the genre of, eg, phi-2. but it's possible that that's just us not being all the way up a sigmoid that is going to saturate, and I generally have a high prior on sigmoid saturation sorts of dynamics in the growth of individual technologies. updating off of ai progress it seems like that sigmoid could be enormous! but I still have significant probability on it not being so.

I expect humans to be unequivocally beaten at ~everything in the next 10 years (and that's giving a fairly wide margin! I'd be surprised if it takes that long) - the question I was commenting on is what happens after that.

No singularity seems pretty unlikely to me (e.g. 10%) and also I can easily imagine AI taking a while (e.g. 20 years) while still having a singularity.

Separately, no singularity plausibly implies no hinge of history and thus maybe implies that current work isn't that important from a longtermist perspective

replying to edited version,

no hinge of history

Well, humans are still at risk of being wiped out by a war with a successor species and/or a war with each other aided by a successor species, regardless of which of these models is true. Not dying as an individual or species is sort of a continuous always-a-hinge-of-history sort of thing.

Separately, no singularity plausibly implies we lose most value in the universe from the longtermist perspective.

I don't really see why that would be the case. We can still go out and settle space, end disease, etc. it just means that starkly superintelligent machines turn out to not be alien minds by nature of their superintelligence.

(Sorry, I edited my comment because it was originally very unclear/misleading/wrong, does the edited version make more sense?)


it's not obvious to me that strong takeoff is possible. I'd give it 80% probability that it is right now, but there's significant weight on logarithmically diminishing returns such that the things that are stronger than us

Other than compute requirements, have you considered what kind of cognitive tasks you would assign a model to complete that would lead to developing this kind of superintelligence?

Remember you started with random numbers. In a sense, the annealing and SGD are saying in words "find the laziest function that will solve this regression robustly". For LLMs the regression is between a calculated number of input tokens and the next token continuation.

The input set is so large that the "laziest" algorithm that has the least error seems to mimic the cognitive process humans use to generate words, which then guesses the most tokens.

And then RL etc after that.

So when I think of the kinds of things a superintelligence is supposed to be able to do, I ask myself how you would build a valid test where the model will be able to do this in the real world.

This is harder than it sounds because in many cases, whether something is "correct" depends on information current sims can't model well. For example particle sims don't model armor perfectly, FSM sims don't model the wear of mechanical parts correctly, and we have only poor models for how a human would respond to a given set of words said a specific way.

So the flaw here is this: say you come up with tasks humans can't quite do on their own, but where the sim can check whether they were done right. "Design an airliner" for instance. So the sim models every bolt and the cockpit avionics and so on. All of it.

And an ASI model is trained and it can do this task. Humans cannot beat this "game" because there are millions of discrete parts.

But any aircraft the ASI designs have horrific failure modes and crash eventually. Because the sim is just a little off. And the cockpit avionics HMI is unusable because the model of what humans can perceive is slightly off.

So you collect more data and make the model better and so on, but see it's functionally "ceilinged" at just a bit better than humans because the model becomes just superintelligent enough to max out the airliner design task and no more, or just smart enough to max out the suite of similar tasks.

It's also not going to be able to ever 1 shot a real airplane, just get increasingly close.

yeah this sounds like a reasonable description of the importance of extremely high quality data. that training data limit I ended on is not trivial by any means.


80 percent confidence seems unsupported by evidence. There is evidence that human data is very poor: for clear and convincing evidence, look at all the meta-analyses of prior studies, or the constant "well actually" rebuttals to facts people thought they knew (and then a rebuttal to the rebuttal, and in the end nobody knows anything). A world where analyzing all the data humans know on a subject leaves most people less confident about anything than before they started is not one where we have the data to train a superintelligence.

Such a machine will be as confused as we are even if it has the memory to simultaneously assume every single assumption is both true and not true, and keep track of the combinatorial explosion of possibilities.

To describe the problem succinctly: if you have a problem that only a superintelligence can solve in front of you, and your beliefs about all the variables form a tree with hundreds of millions of possibilities (medical problems will be this way), you may have the cognitive capacity of a superintelligence but in actual effectiveness your actions will be barely better than humans. As in functionally not an ASI.

Getting the data is straightforward. You just need billions of robots. You replicate every study and experiment humans ever did, with robots this time, and you replicate human body failures with "reference bodies" that are artificial and consistent in behavior. All data analysis is done from raw data, all conclusions always take into account all prior experiments' data, no p-hacking.

We don't have the robots yet, though apparently Amazon robotics is on an exponential trajectory, having added 750k in the last 2 years, which is more than all prior years combined.

Assuming the trajectory continues, it will be 22 years until 1 billion robots. Takeoff but not foom.
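
The 22-year figure can be roughly reproduced with a back-of-the-envelope calculation. The following is only a sketch, and both of its inputs are assumptions read into the comment rather than stated in it: that the fleet starts at the 750k robots mentioned, and that it doubles every two years (one reading of "more than all prior years combined"). The function name `years_to_reach` is hypothetical.

```python
import math


def years_to_reach(target, start=750_000, doubling_years=2.0):
    """Years until a fleet that doubles every `doubling_years` reaches `target`.

    Counts whole doubling periods, so the result is rounded up
    to the next full period. Both defaults are assumptions.
    """
    doublings = math.ceil(math.log2(target / start))
    return doublings * doubling_years


# About 10.4 doublings are needed, rounded up to 11 periods of 2 years each.
print(years_to_reach(1_000_000_000))
```

Starting instead from a total fleet of about 1.5 million (the 750k plus a roughly equal number from prior years) shortens the estimate by one doubling period, so the number is sensitive to which starting point is assumed.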

there's significant weight on logarithmically diminishing returns such that the things that are stronger than us never get so much stronger that we have no hope of understanding what they're doing

If autonomous research level AGIs are still 2 OOMs faster than humans, that leads to massive scaling of hardware within years even if they are not smarter, at which point it's minds the size of cities. So the probable path to weak takeoff is a slow AGI that doesn't get faster on hardware of the near future, and being slow it won't soon help scale hardware.

Hey now, I spent several months doing loads of research and writing copious notes to produce my personal highly refined ass-numbers! They may smell, but at least I understand their provenance! Lol. Anyway, my story has a lot in common with yours. GPT-2 unnerved me, GPT-3 scared me by suggesting that scaling might take us all the way to a takeoff of some sort.

I breathed a big sigh of relief when GPT-4 came out and it was only in the middle-ish of my predicted capability range rather than near the top. I mean, it's still scary, but we probably have at least another year or two before things get really nuts. Maybe even as many as five years! What a blessing that would be!

Here's the simplest reason to take short timelines seriously:

We don't know how easy it might be to create AGI.

It's often said we don't know how hard, and that project times are routinely underestimated. But it's equally true that we don't know how easy it might turn out to be, and what synergies we'll find across techniques and tools that will speed progress.

More specific reasoning points to short timelines for me, and at the very least to the strong possibility.

We now have a system that performs like a 140 IQ human (give or take a lot) for most cognitive tasks framed in language. There are notable gaps where these systems perform worse than humans. We have other systems that can turn sensory input, both human and otherwise, into language.

How could anyone be sure it won't be easy to fill those gaps? There are obvious measures like episodic memory and executive functioning scaffolding, and putting those together with external tools including (but far from limited to) sensory networks.

I've been building neural network models of brain function since 1999. It looks to me like we need zero breakthroughs to reproduce (functionally, in a loose analog) human brain function in networks. The remainder is just schlep: scaling, scaffolding, and combinations of techniques. I could easily be wrong. But it seems like we should at least be taking the possibility very seriously.


What do you mean by "takeoff"? I assume you mean the beginning of the AI Singularity.

I think the biggest difference of opinion now isn't whether there will be AGI or not; Manifold and Metaculus keep ticking down their estimates.

It's primarily

(1) how far does it actually scale from there. What kind of things is greater intelligence actually capable of doing? How much compute will greater levels of intelligence take to host?

(2) how rapidly can the actual physical world be altered. This means robots, which have also been accelerated recently, and robots able to build each other, and doubling times.

There are people that predict things like 1 week doubling times but I feel they are speculating outside the range of possibilities because there are unavoidable latencies to manipulate the physical world.

Do you have opinions on either?


What futures where we don't get AGI within the next 10 years seem plausible to you?

Two possibilities have most of the "no agi in 10 years" probability mass for me:

  • The next gen of AI really starts to scare people, regulation takes off, and AI goes the way of nuclear reactors
  • Transformer-style AI goes the way of self-driving cars and turns out to be really hard to get from 99% reliable to the 99.9999% that you need for actual productive work

My take on self-driving taking forever is that driving is nearly AGI-complete. Humans drive roughly a million miles between fatal accidents; it would not be particularly surprising if in these million miles (where you are interacting with intelligent agents) you inevitably encounter near AGI-complete problems. Indeed, as the surviving self-driving companies are all moving to end-to-end approaches, self-driving research is beginning to resemble AGI research more and more.

Why would the latter prevent agi? It would just be both high skill and unreliable, yeah?


Correct. The easiest way to avoid slamming into the self-driving car bottlenecks would be to carefully deploy AGI to uses where the 1 percent failures won't cause unacceptable damages.

Any kind of human-free environment is like that. Robotic cleaning, shelving, hauling, loading/unloading, manufacturing, mining, farming. In each case you close the store and lock the doors, have a separate section of a warehouse for robots, robotic areas of a factory with safety barriers, or robot-only mines.

This to me looks like you could automate a significant chunk of the world economy, somewhere between 25-50 percent of it, just improving and scaling and integrating currently demonstrated systems.

You could also use AGI for tutoring, assisting with all the things it already does, as a better voice assistant, for media creation including visualization videos, and so on.

So when it hits the failure cases, when a robotic miner triggers a tunnel collapse, when a robotic cleaner breaks a toilet, when a shelver shoves over piles of goods, when a machine generated video has some porn - all these cases are ones where so long as the cost to fix the damage still makes it net cheaper than humans it's worth using the AGI.

Over time as the error rate slowly drops you could deploy to more and more uses, start letting humans into the areas with the robots, etc.

This is very different from self-driving cars, where there is this requirement for near perfection before anyone can deploy the cars or make any money.

By "reliable" I mean it in the same way as we think of it for self-driving cars. A self-driving car that is great 99% of the time and fatally crashes 1% of the time isn't really "high skill and unreliable" - part of having "skill" in driving is being reliable.

In the same way, I'm not sure I would want to employ an AI software engineer that 99% of the time was great, but 1% of the time had totally weird inexplicable failure modes that you'd never see with a human. It would just be stressful to supervise, to limit its potential harmful impact to the company, etc. So it seems to me that AIs won't be given control of lots of things, and therefore won't be transformative, until that reliability threshold is met.

So what if you don't want to employ it though? The question is when can it employ itself. It doesn't need to pass our reliability standards for that.


That is true only in the sense that it would pass the reliability standards we should have, not what we do have.

Let me explain: suppose it's a robot that assembles the gear assemblies used in other robots. If the robot screws up badly and trashes itself and surrounding equipment 1 percent of the time, it will destroy more in "cost" (cost not being dollars, but labor hours by other robots) than it contributes. This robot (software + hardware) package is too unreliable for any use.
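
The robot example amounts to an expected-value comparison. The sketch below is illustrative only: the function names and all numbers are hypothetical, and it assumes each run either contributes a fixed amount of robot labor hours or, with probability `p_fail`, destroys a fixed amount instead.

```python
def expected_net_value(p_fail, value_per_run, damage_per_failure):
    """Expected robot-hours gained per run (negative means a net loss)."""
    return (1 - p_fail) * value_per_run - p_fail * damage_per_failure


def break_even_failure_rate(value_per_run, damage_per_failure):
    """Failure probability at which the expected net value is exactly zero."""
    return value_per_run / (value_per_run + damage_per_failure)


# Hypothetical numbers: each good run contributes 1 robot-hour,
# each bad failure destroys 200 robot-hours of equipment.
print(expected_net_value(0.01, 1.0, 200.0))   # negative: too unreliable to use
print(break_even_failure_rate(1.0, 200.0))    # just under 0.5 percent
```

Under these made-up numbers a 1 percent failure rate is a net loss, while the same failure rate with much cheaper failures (say 50 robot-hours) would be worth running. The threshold depends on how destructive a failure is, not on the failure rate alone.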

Explaining the first paragraph: suppose the robot is profitable to run, but screws up in very dramatic ways. Then it's reliable enough that we should be using it. But upper management in an old company might fail to adopt the tech.