Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I occasionally have some thoughts about why AGI might not be as near as a lot of people seem to think, but I'm confused about how/whether to talk about them in public.

The biggest reason for not talking about them is that one person's "here is a list of capabilities that I think an AGI would need to have, that I don't see there being progress on" is another person's "here's a roadmap of AGI capabilities that we should do focused research on". Any articulation of missing capabilities that is clear enough to be convincing, seems also clear enough to get people thinking about how to achieve those capabilities.

At the same time, the community thinking that AGI is closer than it really is (if that's indeed the case) has numerous costs, including at least:

  • Immense mental health costs to a huge number of people who think that AGI is imminent
  • People at large making bad strategic decisions that end up having major costs, e.g. not putting any money in savings because they expect it to not matter soon
  • Alignment people specifically making bad strategic decisions that end up having major costs, e.g. focusing on alignment approaches that might pay off in the short term and neglecting more foundational long-term research
  • Alignment people losing credibility and getting a reputation of crying wolf once predicted AGI advances fail to materialize

Having a better model of what exactly is missing could conceivably also make it easier to predict when AGI will actually be near. But I'm not sure to what extent this is actually the case, since the development of core AGI competencies feels like more of a question of insight than grind[1], and insight seems very hard to predict.

A benefit from this that does seem more plausible would be if the analysis of capabilities gave us information that we could use to figure out what a good future landscape would look like. For example, suppose that we aren't likely to get AGI soon and that the capabilities we currently have will create a society that looks more like the one described in Comprehensive AI Services, and that such services could safely be used to detect signs of actually dangerous AGIs. If this were the case, then it would be important to know that we may want to accelerate the deployment of technologies that are taking the world in a CAIS-like direction, and possibly e.g. promote rather than oppose things like open source LLMs.

One argument would be that if AGI really isn't near, then that's going to be obvious pretty soon, and it's unlikely that my arguments in particular for this would be all that unique - someone else would be likely to make them soon anyway. But I think this argument cuts both ways - if someone else is likely to make the same arguments soon anyway, then there's also limited benefit in writing them up. (Of course, if it saves people from significant mental anguish, even just making those arguments slightly earlier seems good, so overall this argument seems like it's weakly in favor of writing up the arguments.)

  1. ^

    From Armstrong & Sotala (2012)

    Some AI predictions claim that AI will result from grind: i.e. lots of hard work and money. Others claim that AI will need special insights: new unexpected ideas that will blow the field wide open (Deutsch 2012).

    In general, we are quite good at predicting grind. Project managers and various leaders are often quite good at estimating the length of projects (as long as they’re not directly involved in the project (Buehler, Griffin, and Ross 1994)). Even for relatively creative work, people have sufficient feedback to hazard reasonable guesses. Publication dates for video games, for instance, though often over-optimistic, are generally not ridiculously erroneous—even though video games involve a lot of creative design, play-testing, art, programing the game “AI,” etc. Moore’s law could be taken as an ultimate example of grind: we expect the global efforts of many engineers across many fields to average out to a rather predictable exponential growth.

    Predicting insight, on the other hand, seems a much more daunting task. Take the Riemann hypothesis, a well-established mathematical hypothesis from 1859 (Riemann 1859). How would one go about estimating how long it would take to solve? How about the P = NP hypothesis in computing? Mathematicians seldom try and predict when major problems will be solved, because they recognize that insight is very hard to predict. And even if predictions could be attempted (the age of the Riemann hypothesis hints that it probably isn’t right on the cusp of being solved), they would need much larger error bars than grind predictions. If AI requires insights, we are also handicapped by the fact of not knowing what these insights are (unlike the Riemann hypothesis, where the hypothesis is clearly stated, and only the proof is missing). This could be mitigated somewhat if we assumed there were several different insights, each of which could separately lead to AI. But we would need good grounds to assume that.



7 Answers

Obviously I think it's worth being careful, but I think in general it's actually relatively hard to accidentally advance capabilities too much by working specifically on alignment. Some reasons:

  1. Researchers of all fields tend to do this thing where they have really strong conviction in their direction and think everyone should work on their thing. Convincing them that some other direction is better is actually pretty hard even if you're trying to shove your ideas down their throats.
  2. Often the bottleneck is not that nobody realizes that something is a bottleneck, but rather that nobody knows how to fix it. In these cases, calling attention to the bottleneck doesn't really speed things up, whereas for thinking about alignment we can reason about what things would look like if it were to be solved.
  3. It's generally harder to make progress on something by accident than to make progress on it on purpose while trying really hard. I think this is true even if there is a lot of overlap. There's also an EMH argument one could make here but I won't spell it out.

I think the alignment community thinking correctly is essential for solving alignment. Especially because we will have very limited empirical evidence before AGI, and that evidence will not be obviously directly applicable without some associated abstract argument, any trustworthy alignment solution has to route through the community reasoning sanely.

Also to be clear I think the "advancing capabilities is actually good because it gives us more information on what AGI will look like" take is very bad and I am not defending it. The arguments I made above don't apply, because they basically hinge on work on alignment not actually advancing capabilities.

Hasn't the alignment community historically done a lot to fuel capabilities?

For example, here's an excerpt from a post I read recently

My guess is RLHF research has been pushing on a commercialization bottleneck and had a pretty large counterfactual effect on AI investment, causing a huge uptick in investment into AI and potentially an arms race between Microsoft and Google towards AGI: 

I don't think RLHF in particular had a very large counterfactual impact on commercialization or the arms race. The idea of non-RL instruction tuning for taking base models and making them more useful is very obvious for commercialization (there are multiple concurrent works to InstructGPT). PPO is better than just SFT or simpler approaches on top of SFT, but not groundbreakingly more so. You can compare text-davinci-002 (FeedME) and text-davinci-003 (PPO) to see. The arms race was directly caused by ChatGPT, which took off quite unexpectedly not because of model quality due to RLHF, but because the UI was much more intuitive to users than the Playground (instruction following GPT3.5 was already in the API and didn't take off in the same way). The tech tree from having a powerful base model to having a chatbot is not constrained on RLHF existing at all, either. To be clear, I happen to also not be very optimistic about the alignment relevance of RLHF work beyond the first few papers--certainly if someone were to publish a paper today making RLHF twice as data efficient or whatever I would consider this basically just a capabilities paper.

I think empirically EA has done a bunch to speed up capabilities accidentally. And I think theoretically we're at a point in history where simply sharing an idea can get it in the water supply faster than ever before.

A list of unsolved problems, if one of them is both true and underappreciated, can have a big impact.

The conversations I've had with people at DeepMind, OpenAI, and in academia make me very sure that lots of ideas on capabilities increases are already out there, so there's a high chance anything you suggest would be something people are already thinking about. Possibly running your ideas past someone in those circles, and sharing only what they think is unoriginal, would be safe-ish?

I think one of the big bottlenecks is a lack of ways to predict how much different ideas would help without actually trying them at costly large scale. Unfortunately, this is also a barrier to good alignment work. I don't have good ideas on making differential progress on this.

If {the reasoning for why AGI might not be near} comprises {a list of missing capabilities}, then my current guess is that the least-bad option would be to share that reasoning in private with a small number of relevant (and sufficiently trustworthy) people[1].

(More generally, my priors strongly suggest keeping any pointers to AGI-enabling capabilities private.)

  1. E.g. the most capable alignment researchers who seem (to you) to be making bad strategic decisions due to not having considered {the reasoning for why AGI might not be near}. ↩︎

I think that sharing the reasoning in private with a small number of people might somewhat help with the "Alignment people specifically making bad strategic decisions that end up having major costs" cost, but not the others, and even then it would only help a small number of the people working in alignment rather than the field in general.

I mostly agree. I also think that impact is very unevenly distributed over people; the most impactful 5% of people probably account for >70% of the impact[1]. And if so, then the difference in positive impact between {informing the top 5%} and {broadcasting to the field in general on the open Internet} is probably not very large[2].

Possibly also worth considering: Would (e.g.) writing a public post actually reach those few key people more effectively than (e.g.) sending a handful of direct/targeted emails?[3]

  1. Talking about AI (alignment) here, but I think something like this applies in many fields. I don't have a good quantification of "impact" in mind, though, so this is very hand-wavey. ↩︎
  2. Each approach has its downsides. The first approach requires identifying the relevant people, and is likely more effortful. The latter approach has the downside of putting potentially world-ending information in the hands of people who would use it to end the world (a bit sooner than they otherwise would). ↩︎
  3. What is in fact the most effective way to reach whoever needs to be reached? (I don't know.) ↩︎

From a broad policy perspective, it can be tricky to know what to communicate. I think it helps if we think a bit more about the effects of our communication and a bit less about correctly conveying our level of credence in particular claims. Let me explain.

If we communicate the simple idea that AGI is near, then it pushes people to work on safety projects that would be good to work on even if AGI is not near, while paying some costs in terms of reputation, mental health, and personal wealth.

If we communicate the simple idea that AGI is not near then people will feel less need to work on safety soon. This would let them not miss out on opportunities that would be good to take ahead of when they actually need to focus on AI safety.

We can only really communicate one thing at a time to people. Also, we should worry more about the tail risks of false positives (thinking we can build AGI safely when we cannot) than false negatives (thinking we can't build AGI safely when we can). Taking these two facts into consideration, I think the policy implication is clear: unless there is extremely strong evidence that AGI is not near, we must act and communicate as if AGI is near.

Reading Habryka's recent discussion might give some inspiration.

Whatever the probability of AGI in the reasonably near future (5-10 years), the probability of societal shifts due to implementation of highly capable yet sub-AGI AI is strictly higher. I think that regardless of where AI "lands" if progress slows down (if we do see an AI winter/fall), the application of systems that exist even just today, even if technological progress were to stop, is enough to merit appreciating that the world that is coming will be different to within the same order of magnitude as it would be with AGI.

I think it's almost impossible at this point to argue against the value of providence with respect to the rise of dumb (in the relative-to-AGI sense) but highly capable AI.

I think it is okay for you to be vague. Simply saying that you can see numerous bottlenecks, but don't wish to list them to avoid others working on them, is enough to cause some weaker update than a list would cause. 

2 comments

IME a lot of people's stated reasons for thinking AGI is near involve mistaken reasoning and those mistakes can be discussed without revealing capabilities ideas:

An alternative framing that might be useful: What do you see as the main bottleneck for people having better predictions of timelines (as you see it)?

Do you in fact think that having such a list is it?
