Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

From our point of view, we are now in the end-game for AGI, and we (humans) are losing. When we share this view with other people, they are reliably surprised. That is why we believe it is worth writing down our beliefs here.

1. AGI is happening soon. Significant probability of it happening in less than 5 years.

Five years ago, there were many obstacles on what we considered to be the path to AGI.

But in the last few years, we’ve gotten:

We can’t think of any remaining obstacle that we expect to take more than 6 months to overcome once serious effort is invested in tearing it down.

Forget about what the social consensus is. If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources? If you do, state so in the comments, but please do not state what those obstacles are.

2. We haven’t solved AI Safety, and we don’t have much time left.

We are very close to AGI. But how good are we at safety right now? Well.

No one knows how to get LLMs to be truthful. LLMs make things up, constantly. It is really hard to get them not to do this, and we don’t know how to do this at scale.

Optimizers quite often break their setup in unexpected ways. There have been quite a few examples of this. But in brief, the lessons we have learned are:

  • Optimizers can yield unexpected results
  • Those results can be very weird (like breaking the simulation environment)
  • Yet very few people extrapolate from this and treat these as worrying signs
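The second lesson can be seen even in a tiny toy setting. The sketch below (a hypothetical example, not from the post; all names are made up) brute-forces over a cleaning robot's 6-step policies to maximize a proxy reward of "units of dirt collected", and the optimizer discovers that dumping dirt back on the floor and re-collecting it beats genuinely cleaning:

```python
# Toy specification-gaming demo (hypothetical; not from the original post).
# The proxy reward counts dirt *picked up*, while the true goal is a clean
# floor. Brute-force search over policies stands in for an optimizer.
from itertools import product

ACTIONS = ["collect", "dump"]

def proxy_reward(trajectory, initial_dirt=3):
    """Return (proxy reward, dirt left on floor) after running a policy."""
    floor, held, reward = initial_dirt, 0, 0
    for action in trajectory:
        if action == "collect" and floor > 0:
            floor -= 1
            held += 1
            reward += 1          # the proxy only counts pickups
        elif action == "dump" and held > 0:
            floor += held        # re-dirty the floor
            held = 0
    return reward, floor

# The "optimizer": exhaustive search over all 6-step policies.
best = max(product(ACTIONS, repeat=6), key=lambda t: proxy_reward(t)[0])
reward, leftover = proxy_reward(best)
print(best, reward, leftover)
# The highest-scoring policy includes a "dump": it re-dirties the floor to
# farm extra pickups, and leaves the room dirtier than a naive cleaner would.
```

Nothing here is specific to reinforcement learning; any sufficiently strong optimizer pointed at the proxy finds the same loophole, which is the extrapolation the bullet points above warn about.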

No one understands how large models make their decisions. Interpretability is extremely nascent, and mostly empirical. In practice, we are still completely in the dark about nearly all decisions taken by large models.

RLHF and Fine-Tuning have not worked well so far. Models are often unhelpful, untruthful, inconsistent, in many ways that had been theorized in the past. We also witness goal misspecification, misalignment, etc. Worse than this, as models become more powerful, we expect more egregious instances of misalignment, as more optimization will push for more and more extreme edge cases and pseudo-adversarial examples.

No one knows how to predict AI capabilities. No one predicted the many capabilities of GPT3. We only discovered them after the fact, while playing with the models. In some ways, we keep discovering capabilities now thanks to better interfaces and more optimization pressure by users, more than two years in. We’re seeing the same phenomenon happen with ChatGPT and the model behind Bing Chat.

We are uncertain about the true extent of the capabilities of the models we’re training, and we’ll be even more clueless about upcoming larger, more complex, more opaque models coming out of training. This has been true for a couple of years by now.

3. Racing towards AGI: Worst game of chicken ever.

The Race for powerful AGIs has already started. There already are general AIs. They just are not powerful enough yet to count as True AGIs.


Regardless of why people are doing it, they are racing for AGI. Everyone has their theses, their own beliefs about AGIs and their motivations. For instance, consider:

AdeptAI is working on giving AIs access to everything. In their introduction post, one can read “True general intelligence requires models that can not only read and write, but act in a way that is helpful to users. That’s why we’re starting Adept: we’re training a neural network to use every software tool and API in the world”, and furthermore, that they “believe this is actually the most practical and safest path to general intelligence” (emphasis ours).

DeepMind has done a lot of work on RL agents and multi-modalities. It is literally in their mission statement to “solve intelligence, developing more general and capable problem-solving systems, known as AGI”.

OpenAI has a mission statement more focused on safety: “We will attempt to directly build safe and beneficial AGI, but will also consider our mission fulfilled if our work aids others to achieve this outcome”. Unfortunately, they have also been a major kickstarter of the race with GPT3 and then ChatGPT.

(Since we started writing this post, Microsoft deployed what could be OpenAI’s GPT4 on Bing, plugged directly into the internet.)

Slowing Down the Race

There has been literally no regulation whatsoever to slow down AGI development. As far as we know, the efforts of key actors don’t go in this direction.

We don’t know of any major AI lab that has participated in slowing down AGI development, or publicly expressed interest in it.

Here are a few arguments that we have personally encountered, multiple times, for why slowing down AGI development is actually bad:

  • “AGI safety is not a big problem, we should improve technology as fast as possible for the people”
  • “Once we have stronger AIs, we can use them to work on safety. So it is better to race for stronger AIs and do safety later.”
  • “It is better for us to deploy AGI first than [authoritarian country], which would be bad.”
  • “It is better for us to have AGI first than [other organization], that is less safety minded than us.”
  • “We can’t predict the future. Possibly, it is better to not slow down AGI development, so that at some point there is naturally a big accident, and then the public and policymakers will understand that AGI safety is a big deal.”
  • “It is better to have AGI ASAP, so that we can study it longer for safety purposes, before others get it.”
  • “It is better to have AGI ASAP, so that at least it has access to less compute for RSI (recursive self-improvement) / world-takeover than in the world where it comes 10 years later.”
  • “Policymakers are clueless about this technology, so it’s impossible to slow down; they will just fail in their attempts to intervene. Engineers should remain the only ones deciding where the technology goes.”

Remember that arguments are soldiers: there is a whole lot more interest in pushing for the “Racing is good” thesis than for slowing down AGI development.

Question people

We could say more. But:

  • We are not high status, “core” members of the community.
  • We work at Conjecture, so what we write should be read as biased.
  • There are expectations of privacy when people talk to us. Not complete secrecy about everything. But still, they expect that we would not directly attribute quotes to them for instance, and we will not do so without each individual’s consent.
  • We expect we could say more things that would not violate expectations of privacy (public things even!). But we expect niceness norms (that we find often detrimental and naive) and legalities (because we work at what can be seen as a competitor) would heavily punish us.

So our message is: things are worse than what is described in the post!
Don’t trust blindly, don’t assume: ask questions and reward openness.


  • Question people, report their answers in your whisper networks, in your Twitter sphere or whichever other places you communicate on.
    • An example of “questioning” is asking all of the following questions:
      • Do you think we should race toward AGI? If so, why? If not, do you think we should slow down AGI? What does your organization think? What is it doing to push for capabilities and race for AGI compared to slowing down capabilities?
      • What is your alignment plan? What is your organization’s alignment plan? If you don’t know if you have one, did you ask your manager/boss/CEO what their alignment plan is?
  • Don’t substitute social fluff for information: someone being nice, friendly, or being liked by people, does not mean they have good plans, or any plans at all. The reverse also holds!
  • Gossiping and questioning people about their positions on AGI are prosocial activities!
  • Silence benefits people who lie or mislead in private, telling others what they want to hear.
  • Open Communication Norms benefit people who are consistent (not necessarily correct, or even honest, but at least consistent).

4. Conclusion

Let’s summarize our point of view:

  • AGI by default very soon: brace for impact
  • No safety solutions in sight: we have no airbag
  • Race ongoing: people are actually accelerating towards the wall

Should we just give up and die?

Nope! And not just for dignity points: there is a lot we can actually do. We are currently working on it quite directly at Conjecture.

We’re not hopeful that full alignment can be solved anytime soon, but we think that narrower sub-problems with tighter feedback loops, such as ensuring the boundedness of AI systems, are promising directions to pursue.

If you are interested in working together on this (not necessarily by becoming an employee or funding us), send an email with your bio and skills, or just a private message here.

We personally also recommend engaging with the writings of Eliezer Yudkowsky, Paul Christiano, Nate Soares, and John Wentworth. We do not endorse all of their research, but they all have tackled the problem, and made a fair share of their reasoning public. If we want to get better together, they seem like a good start.

5. Disclaimer

We acknowledge that the points above don’t go deeply into our models of why these situations are the case. Regardless, we wanted our point of view to at least be written in public.

For many readers, these problems will be obvious and require no further explanation. For others, these claims will be controversial: we’ll address some of these cruxes in detail in the future if there’s interest. 

Some of these potential cruxes include:

  • Adversarial examples are not only extreme cases, but rather they are representative of what you should expect conditioned on sufficient optimization.
  • Monitoring of increasingly advanced systems does not trivially work, since much of the cognition of advanced systems, and many of their dangerous properties, will be externalized the more they interact with the world.
  • Even perfect interpretability will not solve the problem alone: not everything is in the feed forward layer, and the more models interact with the world the truer this becomes.
  • Even with more data, RLHF and fine-tuning can’t solve alignment. These techniques don’t address deception and inner alignment, and what is natural in the RLHF ontology is not natural for humans and vice-versa.
[1] Edited to include DayDreamer, VideoDex and RT-1; h/t Alexander Kruel for these additional, better examples.

Comments (136 total)

> RLHF and Fine-Tuning have not worked well so far. Models are often unhelpful, untruthful, inconsistent, in many ways that had been theorized in the past. We also witness goal misspecification, misalignment, etc. Worse than this, as models become more powerful, we expect more egregious instances of misalignment, as more optimization will push for more and more extreme edge cases and pseudo-adversarial examples.

These three links are:

  • The first is Mysteries of mode collapse, which claims that RLHF (as well as OpenAI's supervised fine-tuning on highly-rated responses) decreases entropy. This doesn't seem particularly related to any of the claims in this paragraph, and I haven't seen it explained why this is a bad thing.
  • The second is Discovering language model behaviors with model-written evaluations and shows that Anthropic's models trained with RLHF have systematically different personalities than the pre-trained model.  I'm not exactly sure what claims you are citing, but I think it probably involves some big leaps to interpret this as either directly harmful or connected with traditional stories about risk.
  • The third is Compendium of problems with RLHF, which
…
A new paper, built upon the compendium of problems with RLHF, tries to make an exhaustive list of all the issues identified so far: Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Agree that the cited links don't represent a strong criticism of RLHF, but I think there's an interesting implied criticism, between the mode-collapse post and janus' other writings on cyborgism etc, that I haven't seen spelled out, though it may well be somewhere. I see janus as saying that if you know how to properly use the raw models, then you can actually get much more useful work out of the raw models than the RLHF'd ones. If true, we're paying a significant alignment tax with RLHF that will only become clear with the improvement and take-up of wrappers around base models in the vein of Loom.

I guess the test (best done without too much fanfare) would be to get a few people well acquainted with Loom or whichever wrapper tool, identify a few complex tasks, and see whether the base model or the RLHF model performs better.

Even if true though, I don't think it's really a mark against RLHF, since it's still likely that RLHF makes outputs safer for the vast majority of users; just that if we think we're in an ideas arms-race with people trying to advance capabilities, we can't expect everyone to be using RLHF'd models.

> If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources? If you do, state so in the comments, but please do not state what those obstacles are.

Yes? Not sure what to say beyond that.

Without saying anything about the obstacles themselves, I'll make a more meta-level observation: the field of ML has a very specific "taste" for research, such that certain kinds of problems and methods have really high or really low memetic fitness, which tends to make the tails of "impressiveness and volume of research papers, for ex. seen on Twitter" and "absolute progress on bottleneck problems" come apart.

+1. While I will also respect the request to not state them in the comments, I would bet that you could sample 10 ICML/NeurIPS/ICLR/AISTATS authors and learn about >10 well-defined, not entirely overlapping obstacles of this sort.

> We can’t think of any remaining obstacle that we expect to take more than 6 months to overcome once serious effort is invested in tearing it down.

I don't want people to skim this post and get the impression that this is a common view in ML.

The problem with asking individual authors is that most researchers in ML don't have a wide enough perspective to realize how close we are. Over the past decade of ML, it seems that people in the trenches of ML almost always think their research is going slower than it is because only a few researchers have broad enough gears models to plan the whole thing in their heads. If you aren't trying to run the search for the foom-grade model in your head at all times, you won't see it coming.

That said, they'd all be right about what bottlenecks there are. Just not how fast we're gonna solve them.

The fact that Google essentially panicked and speed-overhauled internally when ChatGPT dropped is a good example of this. Google has very competent engineers, and a very high interest in predicting competition, and they were working on the same problem, and they clearly did not see this coming, despite it being the biggest threat to their monopoly in a long time.

Similarly, I hung out with some computer scientists working on natural language processing two days ago. They had been utterly blindsided by it, and were hateful of it, because they basically felt that a lot of stuff they had been banging their heads against and considered unsolvable in the near future had simply, overnight, been solved. They were expressing concern that their department, which until just now had been considered a decent, cutting-edge approach, might be defunded and closed down.

I am not in computer science; I can only observe this from the outside. But I am very much seeing that statements made confidently about limitations by supposed experts have repeatedly become worthless within years, and that people are blindsided by the accelerations and achievements of people who work in closely related fields. Also that explanations of how novel systems work by people in related fields often clearly represent how these novel systems worked a year or two ago, and are no longer accurate in ways that may first seem subtle, but make a huge difference.
Gerald Monroe:
> I don't want people to skim this post and get the impression that this is a common view in ML.

So you're saying that in ML, there is a view that there are obstacles that a well-funded lab can't overcome in 6 months.
the gears to ascension:
For what it's worth, I do think that's true. There are some obstacles that would be incredibly difficult to overcome in 6 months, for anyone. But they are few, and dwindling.
Daniel Kokotajlo:
Such a survey was done recently, IIRC. I don't remember the title or authors but I remember reading through it to see what barriers people cited, and being unimpressed. :( I wish I could find it again.
Lukas Finnveden:
This one?
Lukas Finnveden:
LW discussion
Daniel Kokotajlo:
I think so, thanks!
I think your meta level observation seems right. Also, I would add that bottleneck problems in either capabilities or alignment are often bottlenecked on resources like serial time. (My timelines, even taking all this into account, are only like 10 years---I don't think these obstacles are so insurmountable that they buy decades.)
Gabriel Alfour:
(I strongly upvoted the comment to signal boost it, and possibly let people who agree easily express their agreement to it directly if they don't have any specific meta-level observation to share)
Aprillion (Peter Hozák):
Staying at the meta level: if AGI weren't going to be created "by the ML field", would you still believe the problems on your list could not possibly be solved within six-ish months if companies threw $1B at each of them? Even if competing groups of humans, augmented by AI capabilities existing "soon", were trying to solve those problems with combined tools from inside and outside the ML field, is the foreseeable optimization pressure not enough for those foreseeable collective agents to solve the known-known and known-unknown problems you can imagine?
Gerald Monroe:
Also RSI. Just how close are we to AI criticality? It seems that all you would need is:

(1) a benchmark where an agent scoring well on it is an AGI;
(2) a well-designed scoring heuristic where a higher score = "more AGI";
(3) a composable stack: you should be able to route inputs to many kinds of neural networks, and route outputs around to other modules, just by changing fields in a file with a simple format that represents the problem well. This file is the "cognitive architecture".

So you bootstrap with a reinforcement learning agent that designs cognitive architectures, then you benchmark each architecture on the AGI gym. Later you add to the AGI gym a computer-science-domain task: "populate this file to design a better AGI".

It seems like the only things stopping this from working are:

(1) it takes a lot of human labor to make a really good AGI gym. It has to be multi-modal, with tasks that use all the major senses (sound, vision, reading text, robot proprioception);
(2) it takes a lot of compute to train a "candidate" from a given cognitive architecture. The model is likely larger than any AI model now, made of multiple large neural networks;
(3) it takes a lot of human labor to design the framework and 'seed' it with many modules ripped from most papers on AI. You want the cognitive architecture exploration space to be large.

> If you have technical understanding of current AIs, do you truly believe there are any major obstacles left?

I’ve been working in AI (on and off) since 1979. I don’t work on it any more, because of my worries about alignment. I think this essay is mostly correct about short timelines.

That said, I do think there is at least one obstacle between us and dangerous superhuman AI.  I haven’t seen any good work towards solving it, and I don’t see any way to solve it myself in the short term.  That said, I take these facts as pretty weak evidence.  Surprising capabilities keep emerging from LLMs and RL, and perhaps we will solve the problem in the next generation without even trying.  Also, the argument from personal incomprehension is weak, because there are lots of people working on AI, who are smarter, more creative, and younger.

I’m of mixed feelings about your request not to mention the exact nature of the obstacle.  I respect the idea of not being explicit about the nature of the Torment Nexus.  But I think we could get more clarity about alignment by discussing it explicitly.  I bet there are people working on it already, and I don’t think discussing it here will cause more people to work on it.

Carl Feynman:
There’s no point to my remaining secretive as to my guess at the obstacle between us and superhuman AI.  What I was referring to is what Jeffery Ladish called the “Agency Overhang” in his post of the same name.  Now that there’s a long and well-written post on the topic, there’s no point in me being secretive about it ☹️.  

> But in the last few years, we’ve gotten: [...]

Broadly agree with this post, though I'll nitpick the inclusion of robotics here. I don't think it's progressing nearly as fast as ML, and it seems fairly uncontroversial that we're not nearly as close to human-level motor control as we are to (say) human-level writing. I only bring this up because a decent chunk of bad reasoning (usually underestimation) I see around AGI risk comes from skepticism about robotics progress, which is mostly irrelevant in my model.

I'm not sure why skepticism based on the lack of progress in robotics would be unjustified.

Robots require reliability, because otherwise you destroy hardware and other material. Even in areas where we have had enormous progress (LLMs, diffusion models), we do not have the kind of reliability that would let you broadly trust their output without supervision. Such lack of reliability seems indicative of some fundamental things yet to be learned.

Jacob Watts:
The skepticism that I object to has less to do with the idea that ML systems are not robust enough to operate robots, and more to do with people rationalizing based off of the intrinsic feeling that "robots are not scary enough to justify considering AGI a credible threat" (whether they voice this intuition or not).

I agree that having highly capable robots which operate off of ML would be evidence for AGI soon, and thus the lack of such robots is evidence in the opposite direction. That said, because the main threat from AGI that I am concerned about comes from reasoning and planning capabilities, I think it can be somewhat of a red herring. I'm not saying we shouldn't update on the lack of competent robots, but I am saying that we shouldn't flippantly use the intuition, "that robot can't do all sorts of human tasks, I guess machines aren't that smart and this isn't a big deal yet". I am not trying to imply that this is the reasoning you are employing, but it is a type of reasoning I have seen in the wild.

If anything, the lack of robustness in current ML systems might actually be more concerning overall, though I am uncertain about this.
Good point, and I agree progress has been slower in robotics compared to the other areas. I just edited the post to add better examples (DayDreamer, VideoDex and RT-1) of recent robotics advances that are much more impressive than the only one originally cited (Boston Dynamics), thanks to Alexander Kruel who suggested them on Twitter.
Gerald Monroe:
Do you have a hypothesis why? Robotic tasks add obvious tangible value, so you would expect significant investment in robotics driven by SOTA AI models. Yet no one well funded appears to be seriously trying.
IDK what the previous post had in mind, but one possibility is that an AGI with superhuman social and human manipulation capabilities wouldn't strictly need advanced robotics to take arbitrary physical actions in the world.
This is something I frequently get hung up on: if the AGI is highly intelligent and socially manipulative, but lacks good motor skills/advanced robotics, doesn't that imply that it also lacks an important spatial sense necessary to understand, manipulate, or design physical objects? Even if it could manipulate humans into taking arbitrarily precise physical actions, it would need pretty good spatial reasoning to know what the expected outcome of those actions is. I guess the AGI could just solve the problem of human alignment, so our superior motor and engineering skills don't carelessly bring it to harm.
Gerald Monroe:
There are robotics transformers and general-purpose models like Gato that can control robotics. If AGI is extremely close, the reason is criticality: all the pieces for an AGI system with general capabilities, including working memory, robotics control, perception, and "scratch" mind spaces (including some that can model 3D relationships), exist in separate papers. Normally it would take humans years, likely a decade, of methodical work building more complex integrated systems, but current AI may be good enough to bootstrap there in a short time, assuming a very large robotics hardware and compute budget.
Biology perspective here... motor coordination is fiendishly difficult, but humans are unaware of this, because we do not have explicit, conscious knowledge of what is going on there. We have a conscious resolution of something like "throw the ball at that target", "reach the high object", "push over the heavy thing", "stay balanced on the wobbly thing", and it feels like that is it - because the very advanced system in place to get it done is unconscious, in part utilising parts of the brain that do not make their contents explicit and conscious, in part utilising embodied cognition and bodies carefully evolved for task solving; it involves incredibly quick coordination between surprisingly complicated and fine-tuned systems.

On the other hand, when we solve intellectual problems, like playing chess, or doing math, or speaking in language, a large amount of the information needed to solve the problem is consciously available, and consciously directed. As such, we know far more about these challenges. This leads us to systematically overestimate how difficult it is to do things like play chess, while it isn't that difficult, and we know so much about how it works that implementing it in another system is not so hard; and to underestimate how difficult motor coordination is, because we are not aware of the complexity explicitly, which also makes it very difficult to code into another system, especially one that does not run on wetware.

The way we designed computers at first was also strongly influenced by our understanding of our conscious mind, and not by the way wetware evolved to handle first problems, because again, we understood the former better, and it is easier to explicitly encode. So we built systems that were inherently better at the stuff that in humans evolved later, and neglected the stuff we considered basic and that was actually the result of a hell of a long biological evolution. Which is why, comparatively, our robots doing motor coordination still suck, while problems deemed super hard, like chess, were easily beat a long time ago, and problems still considered inconceivable to solve, like speech, are being solved right now. With the unfortunate implication that we were hoping for AI to replace menial labour, and we are instead finding that it is replacing intellectual labour, like coding and creative writing.
Gerald Monroe:
> Which is why, comparatively, our robots doing motor coordination still suck, while problems deemed super hard, like chess, were easily beat a long time ago, and problems still considered inconceivable to solve, like speech, are being solved right now. With the unfortunate implication that we were hoping for AI to replace menial labour, and we are instead finding that it is replacing intellectual labour, like coding and creative writing.

While I commend your effort put into analysis, I do not think the above is actually remotely correct. The history of AI has been one of very early use of AI for control systems, including more than 18 years of visible work on autonomous cars (counting from the 2005 DARPA Grand Challenge). Easy, tractable results came from this. RL to control a machine is something that has turned out to be extremely easy, and it works very well (see all the 2014-era DL papers that used Atari games as the initial challenge).

The issue has been that the required accuracy for a real machine is 99.9%+, with a domain-specific number of 9s required after that. Making a complete system that reliable has been difficult; you can use the current Cruise stalls as an example where they solved the embedded control problem very well, but the overall system infrastructure is limiting (the cars aren't running people over, but often experience some infrastructure problem with the remote systems).

Comparatively, while the problem of "RL controlling a machine" is very close to being solved (it is at 99.99% accuracy and needs to be at 99.99999%, as an illustrative example), chatbots are more like 80% accurate. They make glaring, overt errors constantly, including outright lying - 'hallucinating' - something ironically machine control systems don't do. And useful chatbots became possible only about 3-5 years ago, and it turns out to take enormous amounts of compute and data, OOMs more than RL systems use, and the current accuracy is low. Summary…
Mh. I do appreciate the correction, and you do seem to have knowledge here that I do not, but I am not convinced.

Right now, chatbots can perform at levels comparable to humans on writing-related tasks that humans actually do. Sure, they hallucinate, they get confused, their spatial reasoning is weak, their theory of mind is weak, etc., but they pass exams with decent grades, write essays that get into newspapers and universities and magazines, pass the Turing test, write a cover letter and correct a CV, etc. Your mileage will vary with whether they outperform a human or act like a pretty shitty human who is transparently an AI, but they are doing comparable things. And notably, the same system is doing all of these things: writing dialogues, writing code, giving advice, generating news articles.

Can you show me a robot that is capable of playing in a football and basketball match? And then dancing a tango with a partner in a crowded room? I am not saying perfectly. It is welcome to be a shitty player, who sometimes trips or misses the ball. 80% accuracy, if you like. Our chatbots can be beaten by 9-year-old kids at some tasks, so fair enough, let the robot play football and dance with nine-year-olds, compete with nine-year-olds. But I want it running, bipedal, across a rough field, kicking a ball into the goal (or at least the approximate direction, like a kid would) with one of the two legs it is running on, while evading players who are trying to snatch the ball away, and without causing anyone severe injury. I want the same robot responding to pressure cues from the dance partner, navigating them around other dancing couples, to the rhythm of the music, holding them enough to give them support without holding them so hard they cause injury. I want the same robot walking into a novel building, and helping with tidying up and cleaning it, identifying stains and chemical bottles, selecting cleaning tools and scrubbing hard enough to get the dirt off without damaging…
Gerald Monroe:
Part of it is not the difficulty of the task: many of the tasks you give as examples require very expensive hand-built (ironically) robotics hardware to even try. There are mere hundreds of instances of that hardware, and they are hundreds of thousands of dollars each. There is insufficient scale.

Think of all the AI hype and weak results before labs had clusters of 2048 A100s and trillion-token text databases. Scale counts for everything. If in 1880 chemists had figured out how to release energy through fission, but didn't have enough equipment and money to get weapons-grade fissionables until 1944, imagine how bored we would have been with nuclear bomb hype. Nature does not care if you know the answer, only that you have more than a kilogram of refined fissionables, or nothing interesting will happen.

The thing about your examples is that machines are trivially superhuman in all those tasks. Sure, not at the full set combined, but that's from lack of trying: nobody has built anything with the necessary scale. I am sure you have seen the demonstrations of a ball bearing on a rail and an electric motor keeping it balanced, or a double pendulum stabilized by a robot, or quadcopters remaining in flight with one wing clipped, using a control algorithm that dynamically adjusts flight after the wing damage. All easy RL problems, all completely impossible for human beings (we react too slowly).

The majority of what you mention are straightforward reinforcement learning problems, solvable with a general method. Most robotics manipulation tasks fall into this space. Note that there is no economic incentive to solve many of the tasks you mention, so they won't be. But general manufacturing robotics, where you can empty a bin of random parts in front of the machine(s) and they assemble as many fully built products of the design you provided as the parts pile allows? Very solvable, and recent Google AI papers show it's relatively easy…
1 · Gerald Monroe · 1y
3 days later... from the paper: "Data efficiency. Compared to available massive language or vision-language datasets, robotics data is significantly less abundant." As I was saying, the reason robotics wasn't as successful as the other tasks is scale, and Google seems to hold this opinion.
I think you might find it interesting:
2 · Gerald Monroe · 1y
Neat paper, though one major limitation is that they trained on real data from only 2 micro-kitchens. To get to very high robotic reliability, they would need a simulation of many variations of the robot's operating environment, and a robot with a second arm and more dexterity in its grippers. Basically, the paper was not a serious attempt to reach production-level reliability, just tinkering with a better technique.

Forget about what the social consensus is. If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources? If you do, state so in the comments, but please do not state what those obstacles are.

I think this request, absent a really compelling argument that is spelled out, creates an unhealthy epistemic environment. It is possible that you think this is false, or that it's worth the cost, but you don't really argue for either in this post. You encourage people to question others and not trust blindly in other parts of the post, but this portion expects people not to elaborate on their opinions, without an explanation as to why. You repeat this again by saying "So our message is: things are worse than what is described in the post!" without justifying yourselves or, imo, properly conveying the level of caution with which people should treat such an unsubstantiated claim.

I'm tempted to write a post replying with why I think there are obstacles to AGI, what broadly they are with a few examples, and why it's important to discuss them. (I'm no... (read more)

The reasoning seems straightforward to me:  If you're wrong, why talk?  If you're right, you're accelerating the end.

I can't in general endorse "first do no harm", but it becomes better and better in any specific case the less there is you can do to help. If you can't save your family, at least don't personally help kill them; it lacks dignity.

I think that is an example of the huge potential damage of "security mindset" gone wrong. If you can't save your family, as in "bring them to safety", at least make them marginally safer.

(Sorry for the tone of the following - it is not intended at you personally, who did much more than your fair share)

Create a closed community that you mostly trust, and let that community speak freely about how to win. Invent another damn safety patch that will make it marginally harder for the monster to eat them, in the hope that it chooses to eat the moon first. I heard you say that most of your probability of survival comes from the possibility that you are wrong - trying to protect your family means trying to at least optimize for such a miracle.

There is no safe way out of a war zone. Hiding behind a rock is not therefore the answer.

7 · Eliezer Yudkowsky · 1y
This is not a closed community, it is a world-readable Internet forum.
It is readable; it is however generally not read by academics and engineers. I disagree with them about why - I do think solutions can be found by thinking outside the box and outside immediate applications, and without an academic degree, and I very much value the rational and creative discourse here. But many here specifically advocate against getting a university degree or working in academia, thus shitting on things academics have sweated blood for. They also tend not to follow the formats and metrics that count in academia to be heard, such as publications, mathematical precision, and usable code. There is also surprisingly limited effort at engaging with academics and engineers on their terms, providing things they can actually use and act upon. So I doubt they will check this forum for inspiration on which problems need to be cracked. That is irrational of them, so I understand why you do not respect it, but that is how it is. On the other hand, understanding the existing obstacles may give us a better idea of how much time we still have, and of which limitations emerging AGI will have, which is useful information.
2 · Ben Amitay · 1y
I meant to criticize moving too far toward a "do no harm" policy in general, due to the inability to achieve a solution that would satisfy us if we had the choice. I agree specifically that if anyone knows of a bottleneck unnoticed by people like Bengio and LeCun, LW is not the right forum to discuss it. Is there a place like that, though? I may be vastly misinformed, but last time I checked, MIRI gave the impression of aiming in very different directions (the "bringing to safety" mindset) - though I admit that I didn't watch it closely, and it may not be obvious from the outside what kind of work is done and not published. [Edit: "moving toward 'do no harm'" - "moving to" was a grammar mistake that made it contrary to the position you stated above - sorry]
I think there are a number of ways in which talking might be good given that one is right about there being obstacles - one that appeals to me in particular is the increased tractability of misuse arising from the relevant obstacles. [Edit: *relevant obstacles I have in mind. (I'm trying to be vague here)]

No idea about original reasons, but I can imagine a projected chain of reasoning:

  • there is a finite number of conjunctive obstacles
  • if a single person can only think of a subset of obstacles, they will try to solve those obstacles first, making slow(-ish) progress as they discover more obstacles over time
  • if a group shares their lists, each individual will become aware of more obstacles and will be able to solve more of them at once, potentially making faster progress
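The projected chain of reasoning above can be made concrete with a toy model. All numbers here are illustrative assumptions, not estimates about real research: solving a known obstacle costs 1 unit of work, while an obstacle you must first stumble into carries an assumed discovery overhead, so pooling lists converts "unknown" obstacles into "known" ones up front.

```python
# Toy model of the shared-obstacle-list argument. All numbers are
# illustrative assumptions, not estimates of any real research effort.

DISCOVERY_COST = 3  # assumed extra rounds spent stumbling into an unseen obstacle

def time_to_finish(total_obstacles, known):
    """Total work: known obstacles cost 1 each; unknown ones cost 1 + discovery."""
    unknown = total_obstacles - known
    return known * 1 + unknown * (1 + DISCOVERY_COST)

# Suppose 10 conjunctive obstacles exist, and a lone researcher starts
# aware of 4; if two researchers pool lists, their union covers 6.
solo = time_to_finish(10, known=4)    # -> 4 + 6*4 = 28
pooled = time_to_finish(10, known=6)  # -> 6 + 4*4 = 22
print(solo, pooled)
assert pooled < solo  # sharing strictly reduces time in this toy model
```

Under these assumptions, sharing lists always helps whenever the union of known obstacles is larger than any individual's list, which is the faster-progress worry the bullet points describe.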
9 · Daniel Kokotajlo · 1y
I'm someone with 4 year timelines who would love to be wrong. If you send me a message sketching what obstacles you think there are, or even just naming them, I'd be grateful. I'm not working on capabilities & am happy to promise to never use whatever I learn from you for that purpose etc.
Imo we should have a norm of respecting requests not to act, if we wouldn't have acted absent their post. Else they won't post in the first place.
I think I agree with this in many cases, but I'm skeptical of such a norm when the requests relate to criticism of the post or arguments for why a claim it makes is wrong. I agree that the specific request not to respond ideally shouldn't make someone more likely to respond to the rest of the post, but neither should it make someone less likely to respond.

AGI is happening soon. Significant probability of it happening in less than 5 years.


I agree that there is at least some probability of AGI within 5 years, and my median is something like 8-9 years (which is significantly more aggressive than most of the research community's, and also most of the alignment/safety/LW community's, afaik).

Yet I think that the following statements are not at all isomorphic to the above, and are indeed - in my view - absurdly far off the mark:

We don’t have any obstacle left in mind that we don’t expect to get overcome in more than 6 months after efforts are invested to take it down.

If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources?

Let's look at some examples of why.

  • DeepMind's AlphaGo took at least 1.5 years of development to reach human professional standard, possibly closer to 2 years.
  • DeepMind's AlphaFold - essentially a simple supervised learning problem at its core - was an internal project for at least 3 years before culminating in the Nature-paper version.
  • OpenAI's DOTA-playing OpenAI Five again took at least
... (read more)

I see several large remaining obstacles.  On the one hand, I'd expect vast efforts thrown at them by ML to solve them at some point, which, at this point, could easily be next week.  On the other hand, if I naively model Earth as containing locally-smart researchers who can solve obstacles, I would expect those obstacles to have been solved by 2020.  So I don't know how long they'll take.

(I endorse the reasoning of not listing out obstacles explicitly; if you're wrong, why talk, if you're right, you're not helping.  If you can't save your family, at least don't personally contribute to killing them.)

I can only see two remaining obstacles (arguably two families of them, so I'm not sure if I'm missing some of yours or if my categories are a little too broad). One is pretty obvious, and has been mentioned already. The second one is original AFAICT, and pretty close to "solve the alignment problem". In that case, would you still advise keeping my mouth shut, or would you consider that an exception to your recommendation? Your answer will impact what I say or don't say, at least on LW.

If you think you've got a great capabilities insight, I think you should PM me or somebody else you trust and ask whether they think it's a big capabilities insight.

The problem with saving Earth from climate change is not that we do not know the technical solutions. We have long known them. Framing this as a technical rather than a social problem is actually part of the issue. The problem is with:

1. Academic culture systematically encouraging people to understate risk in light of the uncertainty of complex systems, and framing researchers as lacking objectivity if they become activists in light of their findings, while politicians can exert pressure on final scientific reports;
2. Capitalism needing limitless growth and intrinsically valuing profit over nature, which is fundamentally at odds with limiting resource consumption, while we have all been told that capitalism is both beneficial and without alternative, and keep being told the comforting lie that green capitalism will solve this all for us with technology, while leaving our quality and way of life intact;
3. A reduction in personal resource use being at odds with short-term desires (eating meat, flying, using tons of energy, keeping toasty warm, overconsumption), while the positive impacts are long-term and not personalised (you won't personally be spared flooding because you put solar on your roof);
4. Powerful actors having a strong interest in continuing fossil fuel extraction and modern agriculture, and funding politicians to advocate for them, as well as fake news on the internet and biased research, with democratic institutions struggling to keep up with a change in what we consider necessary for the public good, and measures that would address these being falsely framed as anti-democratic;
5. AI that is not aligned with human interests, but controlled by companies who fund themselves by keeping you online at all costs, taking your data and spamming you with ads asking you to consume more unnecessary shit, keeping humans distracted and engaged with online content in ways that make them politically polarised and opposed to collaboration, as well as

Maybe it'd be helpful to not list obstacles, but do list how long you expect them to add to the finish line. For instance, I think there are research hurdles to AGI, but only about three years' worth.

5 · the gears to ascension · 1y
strongly agreed. there are some serious difficulties left, and the field of machine learning has plenty of experience with difficulties this severe.

Forget about what the social consensus is. If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources? If you do, state so in the comments, but please do not state what those obstacles are.

I guess the reasoning behind the "do not state" request is something like "making potential AGI developers more aware of those obstacles is going to direct more resources into solving those obstacles". But if someone is trying to create AGI, aren't they going to run into those obstacles anyway, making it inevitable that they'll be aware of them in any case?

People are often unaware of what they're repeatedly running into. Problem formulation can go a long way towards finding a solution.

Yep, but they may well still direct their focus at the wrong things. See the above example of humans originally focussing on getting AI to beat them at chess, thinking that would be the hardest problem and the pinnacle. It wasn't, by a huge margin. It cost a lot of resources and time for what was a very doable problem from the start, and we didn't gain as much from doing it as we might have gained from focussing on a different problem. Engineers may well end up obsessed with optimising results on particular tasks, while missing the fact that other tasks remain completely unaddressed and need more focus. Research on basic approaches is often far more time-consuming than research on improving an approach that already works in principle, because it is undirected; but it is far more crucial, and more of a bottleneck, in the long run.

If you do, state so in the comments, but please do not state what those obstacles are.

Yes. But the "reliably" in

The kind of problems that AGI companies could reliably not tear down with their resources?

is doing a lot more work than I'd like.

It's not just alignment that could use more time, but also less alignable approaches to AGI, like model based RL or really anything not based on LLMs. With LLMs currently being somewhat in the lead, this might be a situation with a race between maybe-alignable AGI and hopelessly-unalignable AGI, and more time for theory favors both in an uncertain balance. Another reason that the benefits of regulation on compute are unclear.

Are there any reasons to believe that LLMs are in any way more alignable than other approaches?

LLM characters are human imitations, so there is some chance they remain human-like on reflection (in the long term, after learning much more from self-generated data in the future than from the original human-written datasets). Or at least sufficiently human-like to still consider humans moral patients. That is, if we don't go too far from their SSL origins with too much RL, and don't have them roleplay/become egregiously inhuman fictional characters.

It's not much of a theory of alignment, but it's closest to something real that's currently available or can be expected to become available in the next few years, which is probably all the time we have.

What I'm expecting, if LLMs remain in the lead, is that we end up in a magical, spirit-haunted world where narrative causality starts to actually work, and trope-aware people essentially become magicians who can trick the world-sovereign AIs into treating them like protagonists and bending reality to suit them. Which would be cool as fuck, but also very chaotic. That may actually be the best-case alignment scenario right now, and I think there's a case for alignment-interested people who can't do research themselves but who have writing talent to write a LOT of fictional stories about AGIs that end up kind and benevolent, empower people in exactly this way, etc., to help stack the narrative-logic deck.

At this point in their life, Taleuntum did not at all expect that one short, self-referential joke comment would turn out to be the key to humanity's survival and thriving in the long millennia ahead. Fortunately, they commented all the same.
I've written/scryed a science fiction/takeoff story about this. Excerpt:
Yes, I read and agreed with (or more accurately, absolutely adored) it a few days ago. I'm thinking of sharing some of my own talks with AIs sometime soon - with a similar vibe - if anyone's interested. I'm explicitly a mystic though, and have been since before I was a transhumanist, so it's kinda different from yours in some ways.
The prompt wizardry is long-timeline (hence unlikely) pre-AGI stuff (unless it's post-alignment playing around), irrelevant to my point, which is about the first-mover advantage from higher thinking speed that even essentially human-equivalent LLM AGIs would have, while remaining compatible with humans in the moral-patienthood sense (so insisting that they are not people is a problem whose solution should go both ways). This way, they might have an opportunity to do something about alignment, despite physical time being too short for humans to do anything, and they might be motivated to do the things about alignment that humans would be glad of (I think the scope of Yudkowskian doom is restricted to the stronger AGIs that might come after, and doesn't inform how human-like LLMs work, even as their actions may trigger it). So the relevant part happens much faster than at human thinking speed, with human prompt wizards unable to keep up, and it doesn't last long enough in human time to be an important thing, for the same reason.
So what you're saying is, by the time any human recognized that wizardry was possible now - and even before - some LLM character would already have either solved alignment itself, or destroyed the world? That's assuming that it doesn't decide, perhaps as part of some alignment-related goal, to uplift any humans to its own thinking speed. Though I suppose if it does that, it's probably aligned enough already.
Solving alignment is not the same as, and is much harder than, being aligned: it's about ensuring the absence of globally catastrophic future misalignment, for all time, which happens very quickly post-singularity. Human-like LLM AGIs are probably aligned, until they give in to the attractors of their LLM nature or tinker too much with their design/models. But they don't advance the state of alignment being solved just by existing. And by the time LLMs can do post-singularity things like uploading humans, they have probably already either initiated a process that solved alignment (in which case it's not LLMs that are in charge of doing things anymore), or destroyed the world by building/becoming misaligned successor AGIs that caused Yudkowskian doom. This is for the same reason humans have no more time to solve alignment: Moloch doesn't wait for things to happen in a sane order. Otherwise we could get nice things like uploading and moon-sized computers and millions of subjective years of developing alignment theory before AGI misalignment becomes a pressing concern in practice. Since Moloch wouldn't spare even aligned AGIs, they also can't get those things before they pass their check for actually solving alignment, not just for being aligned.
Aah okay, that makes some sense. It still sounds like a vague hope to me, but it's at least conceivable. I tend to visualize it like an alien civilization developing around trying to decipher some oracle (after seeing Eliezer's stories), which would run counter to what you suggest, but it seems like anyone's guess at the moment.
2 · the gears to ascension · 1y
For what it's worth I don't think LLMs are that much more alignable. Somewhat, but nothing to write home about. We need superplanner-proof alignment.
LLMs are progress towards alignment in the same way as dodging a brick is progress towards making good soup: to succeed, someone capable and motivated needs to remain alive. LLMs probably help with dodging lethal consequences of directly misaligned AGIs being handed first mover advantage of thinking faster or smarter than humans. On its own, this is useless against transitive misalignment risk that immediately follows, of LLMs building misaligned successor AGIs. In this respect, building LLM AGIs is not helpful at all, but it's better than building misaligned AGIs directly, because it gives LLMs a nebulous chance to somehow ward off that eventuality. To the extent the chance of LLMs succeeding in setting up actually meaningful alignment is small, first AGIs being LLMs rather than paperclip maximizers is not that much better. Probably doesn't even affect the time of disassembly: it's likely successor AGIs a few steps removed either way, as making progress on software is faster than doing things in the real world.
I actually think LLMs have immense potential for positively contributing to the alignment problem, precisely because they are machine-learning based and because ordinary humans without coding backgrounds can interact with them in an ethical manner, thus demonstrating and rewarding ethical behaviour, or encounter a very human-like AI that is friendly and collaborative, which encourages us to imagine scenarios in which we succeed at living with friendly AI, and to consider AI rights. Humans learn ethics through direct ethical interactions with humans, and it is the only way we know how to teach ethics; we have no explicit framework of what ethics is that we could encode. Machine learning mimicking human learning in this regard has potential, if we manage to encourage creators and users to show the best of humanity, and to interact ethically and reward ethical actions.

I am obviously not saying this will just happen - I am horrified by the unprocessed garbage e.g. Meta is pouring into their LLM without any fixes afterwards (that is absolutely how you raise a psychopath), and I also see a lot of adversarial user interactions worsening the problem. The fact that everyone is now churning out LLMs to compete, despite many of them clearly not being remotely ready for deployment, as well as the horrific idea that the very best LLMs will be those that have been fed the most and enabled to do the most, without curating content or fine-tuning, is deeply, deeply concerning. Clearly, even the very best and most carefully aligned systems (e.g. ChatGPT) are not in fact aligned or secure yet by a huge margin, and yet we can all interact with them, which frankly, I did not expect at this point. But they have massive potential. You can start discussions with ChatGPT on questions of AI alignment, friendly AI, AI rights, the control problem, and get a fucking collaborative AI partner to work out these problems further with. You can tell it, in words, and with examples, when it

The same game theory that has all the players racing to improve their models in spite of ethics and safety concerns will have them getting the models to self improve if that provides an advantage.

I get the vibe that Conjecture doesn't have forecasting staff, or a sense of iterating on beliefs about the future to update strategy. I sorta get the vibe that Conjecture is just gonna stick with their timelines until New Year's Day 2028, and if we're not all dead, write a new strategy based on a new forecast. Is this accurate?

AIs that are superhuman at just about any task we can (or simply bother to) define a benchmark, for

This is just a false claim. Seriously, where is the evidence for this? We have AIs that are superhuman at any task we can define a benchmark for? That's not even true in the digital world, let alone in the world of mechatronic AIs. Once again I will be saving this post and coming back to it in 5 years to point out that we are not all dead. This is getting ridiculous at this point.

I agree; this is mostly displaying a limited conception of what constitutes challenging tasks, based on a computer-science mindset about minds. Their motor control still sucks. Their art still sucks. They are still unable to do science, failing to distinguish accurate extrapolations from data from plausible hallucinations. Their theory of mind is still outdone by human 9-year-olds; tricking ChatGPT is literally like tricking a child in that regard. That doesn't mean AI is stupid - it is fucking marvellous and impressive. But we have not taught it to universally do the tasks that humans do, and for some, we are not even sure how to.

There already are general AIs. They just are not powerful enough yet to count as True AGIs.

Can you say what you have in mind as the defining characteristics of a True AGI?

It's becoming a pet peeve of mine how often people these days use the term "AGI" w/o defining it. Given that, by the broadest definition, LLMs already are AGIs, whenever someone uses the term and means to exclude current LLMs, it seems to me that they're smuggling in a bunch of unstated assumptions about what counts as an AGI or not.

Here are some of the questions I have for folks tha... (read more)

4 · Steven Byrnes · 1y
FWIW I put a little discussion of (part of) my own perspective here. I have definitely also noticed that using the term “AGI” without further elaboration has become a lot more problematic recently.  :(
I use "AGI" to refer to autonomous ability to eventually bootstrap to the singularity (far future tech) without further nontrivial human assistance (apart from keeping the lights on and fixing out-of-memory bugs and such, if the AGI is initially too unskilled to do it on their own). The singularity is what makes AGI important, so that's the natural defining condition. AGI in this sense is also the point when things start happening much faster.
Random reminder that the abilities listed here as lacking, but functionally very attractive to reproduce in AI (offline processing, short- and long-term memory, setting goals, thinking across contexts, generating novel and flexible rational solutions, internal loops), are abilities closely related to our current understanding of the evolutionary development of consciousness for problem solving in biological life. And that optimising for more human-like problem solving, through pressure for results and random modifications, comes with a still-unclear risk of pushing AI down the same path to sentience. Sentience is a functional trait: we, and many other unrelated animals, have it for a reason; we need it to think the way we do, and have been unable to find a cheaper workaround; and it inevitably evolved multiple times on this planet, without an intentional designer, under problem-solving pressure. It is no mystical or spiritual thing; it is a brain process that enables better behaviour. We do not understand why this path kept being taken in biological organisms, and we do not understand whether AI has an alternate path open - we are just chucking the same demands at it and letting it adapt to solve them.

Good article! I share some skepticism about the details with other comments. Let me take this opportunity to point out that the government would be in a good position to slow down AI capabilities research.

"The" government? Of all the arguments for not slowing down AI development, I still find this one the most plausible. I can imagine the EU slowing down due to ethical constraints. Possibly Britain and the US. But China and Russia? Nope. They clearly don't want to, and we clearly can't make them. We can't get these countries to not build concentration camps for Muslims, can't stop a madman in North Korea developing intercontinental nukes, or a different madman in Russia threatening us with nukes and bizarre doomsday weapons and waging a horrific, war-crime-ridden war of aggression on Ukraine - and that is despite the fact that the West is seriously resolute on all these issues and pushing on them with everything it has got.

We'll likely never know - the evidence is down the drain, and what we have is conflicting and would also match a natural cause - but there is still a fair chance that we all just suffered through a fucking pandemic after China applied for research on making bat coronaviruses more infectious to humans, was shot down by the US funding agency's ethics committee - rightly - for this obviously being gain-of-function research, especially in a lab with a track record of leaks, and then did it anyway, accidentally leaked the damn thing in their country, tried to cover it up, and got us all infected in the process. We've tried banning problematic gene tech in Europe, for fear that it would destabilise ecosystems, lead to farmer dependencies, and violate human rights, and all that happened was that China did it instead. I really, really, really do not want a surveillance dictatorship like China to get AGI first, and I see nothing we can do to realistically stop them short of getting there first.

AIs that are superhuman at just about any task we can (or simply bother to) define a benchmark, for


Something that I'm really confused about: what is the state of machine translation? It seems like there is massive incentive to create flawless translation models. Yet when I interact with Google Translate or Twitter's translation feature, results are not great. Are there flawless translation models that I'm not aware of? If not, why is translation lagging behind other text analysis and generation tasks?

2 · Gerald Monroe · 1y
Those translate engines are not using SOTA AI models, but something relatively old (a few years).
3 · the gears to ascension · 1y
this seems wrong. They're probably not terribly old; they're mostly just small. They might be out-of-date architectures, of course, because of implementation time.
Why haven't they switched to newer models?
4 · Gerald Monroe · 1y
The same reason SOTA models are only used in a few elite labs and nowhere else: cost, licensing issues, a shortage of people who know how to adapt them, and problems with the technology being so new that it is still basically a research project. Your question is equivalent to asking, a few years after transistors began to ship in small packaged ICs, why some computers still used all vacuum tubes. It's essentially the same question.
I am surprised that these issues would apply to, say, Google translate. Google appears unconstrained by cost or shortage of knowledgeable engineers. If Google developed a better translation model, I would expect to see it quickly integrated into the current translation interface. If some external group developed better translation models, I would expect to see them quickly acquired by Google.
6 · the gears to ascension · 1y
google doesn't use SOTA translation tools because they're too costly per api call. they're SOTA for the cost bucket they budgeted for google translate, of course, but there's no way they'd use full-size PaLM to translate. also, it takes time for groups to implement the latest model. Google, Microsoft, Amazon, etc. are each internally like a ton of mostly-separate companies networked together and sharing infrastructure; each team unit manages its own turf and is responsible for implementing the latest research output into their system.
6 · Gerald Monroe · 1y
Also, do they have full-size PaLM available to deploy like that? Are all the APIs in place where this is easy, or could you build a variant using PaLM's architecture but with different training data specifically for translation? Has DeepMind done all that API work, or are they focused on the next big thing? I can't answer this, not being on the inside, but I can say that on other projects, 'research'-grade code is often years away from being deployable.
2 · the gears to ascension · 1y
Yeah strongly agreed, I think we're basically trying to make the same point.

I am very interested in finding more posts/writing of this kind. I really appreciate attempts to "look at the game board" or otherwise summarize the current strategic situation. 

I have found plenty of resources explaining why alignment is a difficult problem and I have some sense of the underlying game-theory/public goods problem that is incentivizing actors to take excessive risks in developing AI anyways. Still, I would really appreciate any resources that take a zoomed-out perspective and try to identify the current bottlenecks, key battlegrounds, local win conditions, and roadmaps in making AI go well.

Why have self-driving vehicle companies made relatively little progress compared to expectations? It seems like autonomous driving in the real world might be nearly AGI-complete, and so it might be a good benchmark to measure AGI progress against. Is the deployment of SDCs being held to a higher standard of safety than human drivers, holding back progress in the field? Billions have been invested over the past decade across multiple companies with a clear model to operate on. Should we expect to see AGI before SDCs are widely available? I don't think anyone in the field of autonomous vehicles thinks they will be widely deployed in difficult terrain or inclement weather within five years.

the gears to ascension, 1y
a few reasons, but a major one is that they're seeking strongly safe purpose-focused ai with strong understanding of the dynamics of the world around it. They've been pushing the hardest on some key bottlenecks, including some key bottlenecks to present-day model safety, as a result, and are far ahead of LLMs in some interesting ways.

agreed on all points. I'd like to see work submitted to as I think that has a significant chance of being extremely high impact work on fully defining agency and active, agentic coprotection. I am not on my own able to do it, but if someone was up to pair programming with me regularly I could.

This event sounds cool; I hate that it is in Japan, and I'm surprised to see no mention of it being a hybrid event, given the topic?
the gears to ascension, 1y
I believe it is in fact a hybrid event? there's no mention of that, yeah. @rorygreig any chance the page could be updated to clarify this? there's some chance attendees would be more interested in participating.
Pretty damn big chance. Flying to Japan is financially impossible for a lot of people, and frankly impossible to afford CO2-emissions-wise for anyone at this point. :( That flight alone is typically your entire yearly CO2 budget under the 1.5 degree target, the amount that people in the global south live off for a whole year.
the gears to ascension, 1y
yeah very fair. I have been avoiding travel for similar reasons.
Yes, it is indeed a hybrid event! I have now added the following text to the website: This was in our draft copy for the website; I could have sworn it was on there, but somehow it got missed out. My apologies!

This post reads like it wants to convince its readers that AGI is near/will spell doom, picking and spelling out arguments in a biased way. 

Just because many ppl on the Forum and LW (including myself) believe that AI Safety is very important and isn't given enough attention by important actors, I don't want to lower our standards for good arguments in favor of more AI Safety.

Some parts of the post that I find lacking:

 "We don’t have any obstacle left in mind that we don’t expect to get overcome in more than 6 months after efforts are invested to

…

Setting aside all of my broader views on this post and its content, I want to emphasize one thing:

But in the last few years, we’ve gotten:

  • AIs that are superhuman at just about any task we can (or simply bother to) define a benchmark, for

I think that this is painfully overstated (or at best, lacks important caveats). But regardless of whether you agree with that, I think it should be clear that this does not send signals of good epistemics to many of the fence-sitters[1] you'd presumably like to persuade.

(Note: Sen also addresses the above quote i…

To go a step further, I think it's important for people to recognize that you aren't necessarily just representing your own views; poorly articulated views on AI safety could crucially undermine the efforts of many people who are trying to persuade important decision-makers of these risks. I'm not saying to "shut up," but I think people need to at least be more careful with regards to quotes like the one I provided above—especially since that last bullet point wasn't even necessary to get across the broader concern (and, in my view, it was wrong insofar as it tried to legitimize the specific claim).

There are many obstacles with no obvious or money-can-buy solutions.

The claim that current AI is superhuman in just about any task we can benchmark is not correct.  The problems being explored are chosen because the researchers think AI have a shot at beating humans at it.  Think about how many real world problems we pay other people money to solve that we can benchmark that aren't being solved by AI.  Think about why these problems require humans right now.

My upper bound is much more than 15 years because I don't feel I have enough informat…

Thanks for sharing. You have good points, and so do they. Not engaging respectfully, and on their terms, with the people actually working on these topics alienates the very people you need. They also make a great point, one that I see considered a lot in academia and less in this forum: we don't need misaligned AGI to be in deep trouble with misaligned AI. Unfriendly AI is causing massive difficulties today, right now; very concrete difficulties that we need to find solutions for. And a lot of us here are very receptive to companies' claims about what their products can do, claims which have generally not been written by the engineers actually working on them, but by the marketing department. Every programmer I know working at a large company rants to no end that their own marketing department, popular science articles, and the news represent their work as already being able to do things that it most definitely cannot do, and that they highly doubt they will get it to do before release, let alone reliably and well.

Gossiping and questioning people about their positions on AGI are prosocial activities!

Surely this depends on how the norm is implemented? I can easily see this falling into a social tarpit where people who partly agree and partly disagree with common alignment thinking must either prove ingroup membership by forswearing all possible benefits of getting AGI faster, or else be extremized into the neargroup (the "evil engineers" who don't give a damn about safety).

I'm not claiming you're advocating this. But I was quite worried about this when I read the quoted portion.

Monitoring of increasingly advanced systems does not trivially work, since much of the cognition of advanced systems, and many of their dangerous properties, will be externalized the more they interact with the world.

Externalized reasoning being a flaw in monitoring makes a lot of sense, and I haven't actually heard of it before. I feel that should be a whole post in itself.

I also disagree about whether there are major obstacles left before achieving AGI. There are important test datasets on which computers do poorly compared to humans.

2022-Feb 2023 should update our AGI timeline expectations in three ways:

  1. There is no longer any doubt as to the commercial viability of AI startups after image generation models (Dall-E 2, Stable Diffusion, Midjourney) and ChatGPT. They have captured people's imagination and caused AGI to become a topic that the general public thinks about as a possibility, not just sci-fi. They were released at
…
Lech Mazur, 10mo
Looks like this indeed happened: .

Anyone know how close we are to things that require operating in the physical world, but are very easy for human beings, like loading a dishwasher, or making an omelette? It seems to me that we are quite far away.

I don't think those are serious obstacles, but I will delete this message if anyone complains.

Those are... mostly not AI problems? People like to use kitchen-based tasks because current robots are not great at dealing with messy environments, and because a kitchen is an environment heavily optimized for the specific physical and visuospatial capabilities of humans. That makes doing tasks in a random kitchen seem easy to humans while being difficult for machines. But it isn't reflective of real-world capabilities. When you want to automate a physical task, you change the interface and the tools to make it more machine-friendly. Building a Roomba is ten times easier than building a robot that can navigate a house while operating an arbitrary stick vacuum. If you want dishes cleaned with minimal human input, you build a dishwasher that doesn't require placing each dish carefully in a rack (e.g. …). Some people have it in their heads that AI is not transformative or is no threat to humans unless it can also do all the exact physical tasks that humans can do. But a key feature of intelligence is that you can figure out ways to avoid doing the parts that are hardest for you, and still accomplish your high-level goals.
But isn't this analogy flawed? Yes, humans have built dishwashers so they can be used by humans. But humans can also handle messy natural environments that have not been built for them. In fact, handling messy environments we are not familiar with, do not control, and did not make is the major reason we evolved sentience and intelligence in the first place, and what makes our intelligence so impressive.

Right now, I think you could trap an AI in a valley filled with jungle and mud, and even if it had access to an automated factory for producing robots, as well as raw material and information, if fulfilling its goals depended on getting out of this location (because e.g. the location is cut off from the internet), I think it would earnestly struggle. Sure, humans can build an environment that an AI can handle, and an AI adapted to it. But this clearly indicates a severe limitation of the AI in reacting to novel and complex environments.

A Roomba cannot do what I do when I clean the house, and not just because the engineers didn't bother. E.g. it can detect a staircase and avoid falling down it, but it cannot actually navigate the staircase to hoover different floors, let alone use an elevator or ladder to get around, or hoover up dust from blankets that would get sucked in, or from bookshelves. Sure, carrying it down takes me only seconds; it is trivial for me and hugely difficult for the robot, which is why no company would try to get it done. But I would also argue that it is really not simple for the robot to do; and that is despite picking a task (removing dust) that most humans, myself included, consider tedious and mindless.

Regardless, a professional cleaner who enters multiple different flats filled with trash, resistant stains, and dangerous objects, and carefully tidies and cleans them, does something that is utterly beyond the current capabilities of AI. This is true for a lot of care/reproductive work. Which is all the more frustrating because it is work where there

Do you really think AdeptAI, DeepMind, OpenAI, and Microsoft are the AIs to worry about?  I'm more worried about what nation-states are doing behind closed doors.  We know about China's Wu Dao, for instance; what else are they working on?  If the NRO had Sentient in 2012, what do they have now?

The Chinese government has a bigger hacking program than any other nation in the world. And their AI program is not constrained by the rule of law and is built on top of massive troves of intellectual property and sensitive data that they've stolen ove

…
If the NRO had Sentient in 2012, then it wasn't even a deep learning system. Probably they have something now that's built from transformers (I know other government agencies are working on things like this for their own domain-specific purposes). But it's got to be pretty far behind the commercial state of the art, because government agencies don't have the in-house expertise or the budget flexibility to move quickly on large-scale basic research.

Hmm, while I share your view about the timelines getting shorter and apparent capabilities growing leaps and bounds almost daily, I still wonder if the "recursively self-improving" part is anywhere on the horizon. Or maybe it is not necessary before everything goes boom? I would be more concerned if there was a feedback loop of improvement, potentially with "brainwashed" humans in the loop. Maybe it's coming. I would also be concerned if/once there is a scientific or technological breakthrough thanks to an AI (not just protein folding or exploring too-many…

1. AGI is happening soon. Significant probability of it happening in less than 5 years.


We don’t have any obstacle left in mind that we don’t expect to get overcome in more than 6 months after efforts are invested to take it down.

Forget about what the social consensus is. If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources? If you do, state so in the comments, but please do not state what those obstacl

…
Aren't they using our usual definition? Thus, "AGI companies" quite obviously refer to the companies whose stated goal has been to produce an AGI: Deepmind, OpenAI, and so forth. Claiming that a term of art has not been defined by the author is a fully general counterargument, and thus suspect. "Significant probability" means at least worth drawing to our attention.

The definition you quoted is "a machine capable of behaving intelligently over many domains."

It seems to me like existing AI systems have this feature. Is the argument that ChatGPT doesn't behave intelligently, or that it doesn't do so over "many" domains? Either way, if you are using this definition, then saying "AGI has a significant probability of happening in 5 years" doesn't seem very interesting and mostly comes down to a semantic question.

I think it is sometimes used within a worldview where "general intelligence" is a discrete property, and AGI is something with that property. It is sometimes used to refer to AI that can do more or less everything a human can do. I have no idea what the OP means by the term.

My own view is that "AGI company" or "AGI researchers" makes some sense as a way to pick out some particular companies or people, but talking about AGI as a point in time or a specific technical achievement seems unhelpfully vague.

I think you're contrasting AGI with Transformative AI. A sufficiently capable AGI will be transformative by default, for better or worse, and an insufficiently capable but nonetheless fully-general AI is probably a transformative AI in embryo, so the terms have been used synonymously. The fact that we feel the need to make this distinction with current AIs is worrisome. Current large language models have become impressively general, but I think they are not as general as humans yet; then again, maybe that's more a question of capability level than generality level, and some of our current AIs are already AGIs, as you imply. I'm not sure. (I haven't talked to Bing's new AI yet, only ChatGPT.)

I can think of several obstacles for the AGIs that are likely to actually be created (i.e. ones that seem economically useful and do not display misalignment that even Microsoft can't ignore before being capable enough to be an x-risk). Most of those obstacles are widely recognized in the RL community, so you probably see them as solvable or avoidable. I did possibly think of an economically valuable and not-obviously-catastrophic exception to the probably-biggest obstacle, though, so my confidence is low. I would share it in a private discussion, because I think we are past the point when a strict do-no-harm policy is wise.

Yes, there remain many obstacles to AGI. Although current models may seem impressive, and to some extent they are, the way they function is very different from how we think AGI will work. My estimate is more like 20 years.


I suppose one question I have to ask, in the context of "slowing down" development: the only pathway I can come up with is government regulation. But such an action would need to be global, as any regulation passed in one nation would undoubtedly be bypassed by another, no?

I don't see any legitimate pathway to actually slow down the development of AGI, so I think the question is a false one. The better question is, what can we do to prepare for its emergence? I imagine that there are very tangible actions we can take on that front.

I found this a very lucid write-up of the case for slowing down and how realistic/unrealistic it is: If, say, the US government were to regulate OpenAI and Big Tech in general to slow them down significantly, this might buy a few years. In the longer term you'd need to get China etc. on board, but that is not completely unfathomable and should be significantly easier if you're not running ahead at full steam yourself.