The movement to reduce AI x-risk is overly purist. This is leading to a proliferation of sects, each trying to maintain its own platonic level of purity, and it is actively (greatly) harming the cause.

How the Safety Sects Manifest

  • People suggest not publishing AI research
  • More recently, Jan and his team leaving OpenAI
  • Less recently, Paul Christiano leaving OpenAI to found ARC, whose evaluations team later spun out as METR[1]
  • Even less recently, Anthropic splitting off from OpenAI
  • A suggestion to blacklist anyone who decided to give $30 million (a paltry sum of money for a startup) to OpenAI. 
     

I think these were all legitimate responses to a perceived increase in risk, but they ultimately did, or will do, more harm than good. Disclaimer: of these, I am least sure that the formation of Anthropic increases p(doom), but I speculate that, post-AGI, it will be seen that way.

The Safetyists Played Their Hands Too Early

To a fundamentalist, it's unethical to ignore the risks that prompted those actions, but the world is a messy and unpredictable place. It isn't possible to get anything done without cooperating with some actors who may be deceitful or even harmful. For example, most corporations are filled with people who don't care about the mission and would hop to a higher-paying job. Despite this apparent mess of conflicting incentives (paying an employee more hurts the company's profits), most corporations are very good at making a lot of money. Maybe it isn't possible to align incentives for non-monetary goals, but I doubt it.

The ideal response to each of these examples is to wait until we're far closer to AGI to ring the alarm bells. If prediction markets are right, we still have ~8 years until we have something that meets their relatively weak definition of AGI. There is no momentum to be gained by being 8 years early; instead, the doom claims lose credibility, the same way the 1970s predictions of the Earth going underwater fell flat.[2] The same thing already happened with GPT-2.

I get that race dynamics are a factor in those decisions, but hardware is probably the key limiting factor, and it already follows an exponential curve.[3] If there is no global race to build AGI, it's far more likely that Google builds a bunch of bigger datacenters to train its content and ads algorithms. Then someone at DeepMind stumbles across AGI with little international scrutiny. Google leadership realizes this will make them a lot of money, then races to use it without the world being prepared at all. (Or Meta does this exact thing: its datacenter, built to compete with TikTok, is training Llama-3-400B.)

The Various Safety Sects Will Continue To Lose Relevance

If Jan and Ilya don't end up joining DeepMind, or if AGI does not come within 1-2 years, I will consider their inability to compromise on their safety beliefs in order to actually make an impact a net increase in p(doom). I predict Anthropic will lose relevance. They will likely never have access to the amount of compute DeepMind or OpenAI will. They are valued at ~1/5th of OpenAI's valuation, so I'm guessing whatever amount OpenAI raised is significantly more than what Anthropic has raised.[4] It is looking increasingly clear that the "safer" group of players has nowhere near as much compute as the "unsafe" group. The "unsafe" group will likely reach AGI first. Will Anthropic hold itself to its charter clause? No one knows, but I highly doubt it. I think the founders' egos will rationalize not needing to trigger the charter clause until it's too late.

Safetyism Can't Exist Without Strong Backers

Sidenote: Recently there was this comment. I think this viewpoint is a good example of what I'm arguing against. It will be impossible to do anything without money, compute, or clout. So to sum this post up, if your alignment plan doesn't involve OpenAI, DeepMind, or Anthropic solving it, it won't work.   

  1. ^

    I predict that, in hindsight, METR will have had little to no positive impact on catastrophic risks from AI. In fact, Paul's appointment at NIST was allegedly met with a "revolt" from some employees, which, if true, is very sad. I doubt this would have happened if he were still associated with OpenAI in some capacity. Clout matters.

  2. ^

    This comparison is highly charitable to AI doom claims, since the claimed negative impacts of climate change actually were happening at the time. There was plenty of in-your-face evidence, such as smog from coal plants.

  3. ^

    This sets aside Nvidia presenting reduced numerical precision as a performance "gain".

  4. ^

    OpenAI's most recent raise was not disclosed; however, I assume they will get lower rates for the Stargate datacenter.

    PS: 
    I think overall this is a positive interpretation of these sect splits. A more negative interpretation of Anthropic is that safetyism is a rationalization for wanting to create their own company to enrich themselves. Jan and Ilya's departures could be mainly due to a loss of internal influence after a failed coup that was really driven by a desire not to productize.


I think your model is a bit simplistic. METR has absolutely influenced the behavior of the big labs, including DeepMind. Even if all impact goes through the big labs, you could have more influence outside of a lab than as one of many employees within it. Being the head of a regulatory agency that oversees the labs sets policy in a much more direct way than a mid-level exec within a company can.

O O

Is there evidence that METR had more than nominal impact? I also think the lack of clout will limit his influence in the government. To some government employee, he’s just someone from a random startup they never heard of having outsized influence. Within that agency he's just a cog in some slow moving behemoth. Within OpenAI he is at least an influential voice in the safety org.

I work at DeepMind and have been influenced by METR. :)

O O

That is great to hear, but I find it probable that METR's evaluations will be ignored, lobbied against, or gamed when they go against business interests.

I think this post might suffer from the lack of distinction between karma and agreement/disagreement on the level of posts. I don't think it deserves negative karma, but with this range of topics, it is certain to elicit a lot of disagreement.


Of course, one meta-issue is the diversity of opinion, both in the AI community and in the AI existential safety community.

The diversity of opinion in the AI community is huge, but it is somewhat obfuscated by "money, compute, and SOTA success" effects, which tend to create an artificial impression of consensus when one looks from the outside. But people often move from leading orgs to pursue less standard approaches, in particular, because large orgs are often not so friendly to those non-standard approaches.

The diversity of opinion in the AI existential safety community is at least as big (and is probably even larger, which is natural given that the field is much younger, with its progress being much less certain), but, in addition to that, the diversity is less obfuscated, because it does not have anything resembling the Transformer-based LLM highly successful center around which people can consolidate.

I doubt that the diversity of opinion in the AI existential safety community is likely to decrease, and I doubt that such a decrease would be desirable.


Another meta-issue is how much we should agree on the super-importance of compute. On this meta-issue, the consensus in the AI community and in the AI existential safety community is very strong (and in the case of the AI existential safety community, the reason for this consensus is that compute is, at least, a lever one could plausibly hope to regulate).

But is it actually that unquestionable? Even with Microsoft backing OpenAI, Google should have always been ahead of OpenAI, if it were just a matter of raw compute.

The Llama-3-70B training run took only a few million GPU hours, so the cost of training can't much exceed 10 million dollars, and it is a model roughly equivalent to early GPT-4 in its power.
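The estimate above is a single multiplication: GPU hours times the price per GPU hour. Below is a minimal sketch of that arithmetic. The GPU-hour counts and rental prices are hypothetical placeholders standing in for "millions of GPU hours" at typical cloud rates; they are not figures reported by Meta.

```python
def training_cost_usd(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Back-of-envelope training cost: total GPU time multiplied by a rental price."""
    return gpu_hours * usd_per_gpu_hour


# Hypothetical inputs: a few million GPU hours at an assumed rental price per hour.
for gpu_hours in (2e6, 5e6):
    for price in (1.5, 2.0):
        cost = training_cost_usd(gpu_hours, price)
        print(f"{gpu_hours / 1e6:.0f}M GPU-hours at ${price:.2f}/h -> ${cost / 1e6:.1f}M")
```

The conclusion is sensitive to both inputs: a run closer to 10M GPU hours, or a higher assumed price, would push the total past the $10M mark, though still within the same order of magnitude.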

I think that non-standard architectural and algorithmic breakthroughs can easily make smaller players competitive, especially as inertia of adherence to "what has been proven before" will inhibit the largest players.


Then, finally, there is all this focus in conversations on "AGI", both in the AI community and in the AI existential safety community.

But for the purpose of existential safety, we should not focus on "AGI" (whatever that might be). We should focus on a much narrower ability of AI systems: the ability to accelerate AI research and development.

Here we are very close. For example, John Schulman said in his latest podcast with Dwarkesh:

Even in one or two years, we'll find that the models can do a lot more involved tasks than they can do now. For example, you could imagine having the models carry out a whole coding project instead of it giving you one suggestion on how to write a function. You could imagine the model taking high-level instructions on what to code and going out on its own, writing any files, and testing it, and looking at the output. It might even iterate on that a bit. So just much more complex tasks.

OK, so we are likely to have that (I don't think he is over-optimistic here), and the models are already very capable of discussing AI research papers and exhibit good comprehension of those papers (that's one of my main use cases for LLMs: to help me understand an AI research paper better and faster). And they will get better at that as well.

This combination of the coming ability of LLMs to do end-to-end software projects on their own and their increasing competence in comprehending AI research sounds like a good reason to anticipate AI systems accelerating AI research and development faster and faster in the very near future. Hence the anticipation of very short timelines by many people (although this is still a minority view, even in AI existential safety circles).

O O

The diversity of opinion in the AI existential safety community is at least as big (and is probably even larger, which is natural given that the field is much younger, with its progress being much less certain), but, in addition to that, the diversity is less obfuscated, because it does not have anything resembling the Transformer-based LLM highly successful center around which people can consolidate.


I still don't see how you can validate any alignment ideas without having a lot of compute to test them on precursor models, or how you can validate them without training misaligned models that won't be released to the public. That's why I have almost zero faith in any of these smaller players. Maybe it's good for small players to publish research so big labs can try to reproduce it. In that case, I still don't see why you would leave your lab to do that. Your idea is much more likely to be reproduced if you influence training runs or have access to more diverse models at a lab. I take this to be part of why, for example, Neel Nanda joined an AI lab.

But is it actually that unquestionable? Even with Microsoft backing OpenAI, Google should have always been ahead of OpenAI, if it were just a matter of raw compute.

Since hardware improvements are exponential, even if Google doesn't do "yolo runs" the way OpenAI does (dedicating a large portion of its compute to risky ideas), it can just wait a few years until its GPUs/TPUs get better and a GPT-4-sized model is much cheaper to train. I expect Google to race ahead of OpenAI sooner or later, at least until Stargate is finished. Arguably Google's demos have been more technically impressive, even if OpenAI's demos look shinier.
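To make "much cheaper" concrete under an exponential trend: if the price of a fixed amount of training compute halves every T years, a run costing C today costs C · 0.5^(t/T) after t years. The sketch below assumes a 2.5-year halving time purely for illustration; that figure is hypothetical, not a measured hardware trend.

```python
def cost_after(initial_cost: float, years: float, halving_years: float) -> float:
    """Cost of the same training run after `years`, assuming the price of a fixed
    amount of compute halves every `halving_years` years (assumed exponential trend)."""
    return initial_cost * 0.5 ** (years / halving_years)


# Hypothetical: a run costing 100 (arbitrary units) today, 2.5-year halving time.
for years in (0, 2.5, 5, 7.5):
    print(f"after {years} years: {cost_after(100.0, years, 2.5):.1f}")
```

Under these assumptions the same run costs half as much after one halving period and an eighth as much after three, which is the sense in which waiting can substitute for a risky large run.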
 

I think that non-standard architectural and algorithmic breakthroughs can easily make smaller players competitive, especially as inertia of adherence to "what has been proven before" will inhibit the largest players.


Do these exist? My model is that most ideas come from experimentally validating papers and hypotheses through trial and error (i.e., compute), and OpenAI wasn't a small player in terms of compute. They also used standard architectures and algorithms; they just took more risks than Google. But a risky "yolo run" now is a "standard training run" in a few years.

 

Even in one or two years, we'll find that the models can do a lot more involved tasks than they can do now. For example, you could imagine having the models carry out a whole coding project instead of it giving you one suggestion on how to write a function. You could imagine the model taking high-level instructions on what to code and going out on its own, writing any files, and testing it, and looking at the output. It might even iterate on that a bit. So just much more complex tasks.


OK, so we are likely to have that (I don't think he is over-optimistic here), and the models are already very capable of discussing AI research papers and exhibit good comprehension of those papers (that's one of my main use cases for LLMs: to help me understand an AI research paper better and faster). And they will get better at that as well.

This really does not sound like AGI to me (or at least highly depends on what a coding project means here), and prediction and stock markets don't buy that this will be AGI either. It just sounds like a marginally better GPT-4. I'd expect AI to become as transformative as people expect AGI to be only once it can automate hardware research.

It's certainly true that having a lot of hardware is super-useful. One can try more things, one can pull more resources towards things deemed more important, one can do longer runs if a training scheme does not saturate, but keeps improving.

:-) And yes, I don't think a laptop with a 4090 is existentially dangerous (yet), and even a single installation with 8 H100s is probably not enough (at the current and near-future state of algorithmic art) :-)

But take a configuration worth a few million dollars, and one starts having some chances...

Of course, if a place with more hardware decides to adopt a non-standard scheme invented by a relatively hardware-poor place, the place with more hardware would win. But a non-standard scheme might be non-public, and even if it is public, people often have strong opinions about what to try and what not to try, and those opinions might interfere with a timely attempt.

I think that non-standard architectural and algorithmic breakthroughs can easily make smaller players competitive, especially as inertia of adherence to "what has been proven before" will inhibit the largest players.

Do these exist?

Yes, of course, we are seeing a rich stream of promising new things, ranging from evolutionary schemas (many of which tend towards open-endedness and therefore might be particularly unsafe, while very promising) to various derivatives of Mamba to potentially more interpretable architectures (like Kolmogorov-Arnold networks or like recent Memory Mosaics, which is an academic collaboration with Meta, but which has not been a consumer of significant compute yet) to GFlowNet motifs from Bengio group, and so on.

These things are mostly coming from places which seem to have "medium compute" (although we don't have exact knowledge about their compute): Schmidhuber's group, Sakana AI, Zyphra AI, Liquid AI, and so on. And I doubt that Microsoft or Google have a program dedicated to "trying everything that look promising", even though it is true that they have manpower and hardware to do just that. But would they choose to do that?

OK, so we are likely to have that (I don't think he is over-optimistic here), and the models are already very capable of discussing AI research papers and exhibit good comprehension of those papers (that's one of my main use cases for LLMs: to help me understand an AI research paper better and faster). And they will get better at that as well.

This really does not sound like AGI to me (or at least highly depends on what a coding project means here)

If it's an open-ended AI project, it sounds like "foom before AGI", with AGI-strength appearing at some point on the trajectory as a side-effect.

The key here is that when people discuss "foom", they usually tend to focus on a (rather strong) argument that AGI is likely to be sufficient for "foom". But AGI is not necessary for "foom", one can have "foom" fully in progress before full AGI is achieved ("the road to superintelligence goes not via human equivalence, but around it").

O O

And I doubt that Microsoft or Google have a program dedicated to "trying everything that look promising", even though it is true that they have manpower and hardware to do just that. But would they choose to do that?

Actually I'm under the impression a lot of what they do is just sharing papers in a company slack and reproducing stuff at scale. Of course, they might intuitively rule out certain approaches that they think are dead ends but that turn out to be promising, but I wouldn't underestimate their agility at adopting new approaches if something unexpected is found.[1] My mental model is informed entirely by Dwarkesh's interview with DeepMind researchers. They talk about ruthless efficiency in trying out new ideas and seeing what works. They also talk about how having more compute would make them X times better researchers.

Yes, of course, we are seeing a rich stream of promising new things, ranging from evolutionary schemas (many of which tend towards open-endedness and therefore might be particularly unsafe, while very promising) to various derivatives of Mamba to potentially more interpretable architectures (like Kolmogorov-Arnold networks or like recent Memory Mosaics, which is an academic collaboration with Meta, but which has not been a consumer of significant compute yet) to GFlowNet motifs from Bengio group, and so on.

I think these are all, at best, marginal improvements that will be dwarfed by more compute,[2] or at least will only beat SOTA once given more compute. I think the space for algorithmic improvement at a given amount of compute saturates quickly. If anything, the average smaller lab will over-index on techniques that squeeze out a little extra performance on small models but fail at scale.

Of course, if a place with more hardware decides to adopt a non-standard scheme invented by a relatively hardware-poor place, the place with more hardware would win. 

My mental model of the hardware poor is they want to publicize their results as fast as they can so they get more clout, VC funding, or just getting essentially acquired by big tech. Academic recognition in the form of citations drive researchers. Getting rich drives the founders. 

The key here is that when people discuss "foom", they usually tend to focus on a (rather strong) argument that AGI is likely to be sufficient for "foom". But AGI is not necessary for "foom", one can have "foom" fully in progress before full AGI is achieved ("the road to superintelligence goes not via human equivalence, but around it").

Yes, I agree there is a small possibility, but I find this almost a "Pascal's mugging". I think a sticky AlphaGo-style model of things is informing some choices that are objectively bad in a world where the AlphaGo model doesn't hold. A fear response suited to the low-odds world is not appropriate for the high-odds world.

  1. ^

    I think the time it takes to deploy a model after training is making people think these labs are slower than they actually are. 

  2. ^

    For example, most of Llama-3's improvements came from simply training the models on more data (with more compute). Sora looks worse than SOTA approaches until you throw more compute at it.

And I doubt that Microsoft or Google have a program dedicated to "trying everything that look promising", even though it is true that they have manpower and hardware to do just that. But would they choose to do that?

Actually I'm under the impression a lot of what they do is just sharing papers in a company slack and reproducing stuff at scale.

I'd love to have a better feel for how many of the promising things they try to reproduce at scale...

Unfortunately, I don't have enough inside access for that...

My mental model of the hardware poor is they want to publicize their results as fast as they can so they get more clout, VC funding, or just getting essentially acquired by big tech. Academic recognition in the form of citations drive researchers. Getting rich drives the founders.

There are all kinds of people. I think Schmidhuber's group might be happy to deliberately create an uncontrollable foom, if they can (they have Saudi funding, so I have no idea how much hardware they actually have, or how many options for more hardware they have contingent on preliminary results). Some other people just don't think their methods are strong enough to be that unsafe. Some people do care about safety but still want to go ahead; some of those say "this is potentially risky, but in the future, not right now", and they might be right or wrong. Some people feel their approach increases safety (they might be right or wrong). A number of people are ideological: they feel that their preferred approach is not getting a fair shake from the research community, and they want to make a strong attempt to show that the community is wrong and myopic...

I think most places tend to publish some of their results for the reasons you've stated, but they are also likely to hold some of the stronger things back (at least for a while); after all, if one is after VC funding, one needs to show those VCs that there is some secret sauce which remains proprietary...

O O

Unfortunately, I don't have enough inside access for that...


Yeah, with you there. I am just speculating based on what I've heard online and through the grapevine, so take my model of their internal workings with a grain of salt. That said, I feel pretty confident in it.

if one is after VC funding, one needs to show those VCs that there is some secret sauce which remains proprietary

IMO software/algorithmic moat is pretty impossible to keep. Researchers tend to be smart enough to figure things out independently, even if a company manages to stop its researchers from leaving and diffusing knowledge. Some parallels:

  • The India trade done by Jane Street. They were making billions of dollars contingent on no one else knowing about the trade, but eventually their alpha diffused too.
  • TikTok's content algorithm, which the Chinese government doesn't want exported, took Meta/Google only a couple of months to replicate.

if one is after VC funding, one needs to show those VCs that there is some secret sauce which remains proprietary

IMO software/algorithmic moat is pretty impossible to keep.

Indeed.

That is, unless the situation is highly non-stationary (that is, algorithms and methods are modified fast without stopping; of course, a foom would be one such situation, but I can imagine a more pedestrian "rapid fire" evolution of methods which goes at a good clip, but does not accelerate beyond reason).