Remmelt

Research coordinator of Stop/Pause area at AI Safety Camp.

See explainer on why AGI could not be controlled enough to stay safe:
lesswrong.com/posts/xp6n2MG5vQkPpFEBH/the-control-problem-unsolved-or-unsolvable

 

Sequences

Bias in Evaluating AGI X-Risks
Developments toward Uncontrollable AI
Why Not Try Build Safe AGI?

Comments

Ah, thank you for the correction. I didn’t realise it could easily be interpreted that way.

I'd also suggest exploring what it may mean if we are unable to solve the alignment problem for fully autonomous learning machinery.

There will be a [new AI Safety Camp project](https://docs.google.com/document/d/198HoQA600pttXZA8Awo7IQmYHpyHLT49U-pDHbH3LVI/edit) about formalising a model of AGI uncontainability. 

Fixed it!  You can use either link now to share with your friends.

Remmelt

To clarify for future reference: I do think it’s likely (80%+) that at some point over the next 5 years there will be a large reduction in investment in AI and a corresponding crash in AI company stock prices, etc., and that both will persist for at least three months.

I.e. I think we are heading for an AI winter. It is not sustainable for the industry to invest 600+ billion dollars per year in infrastructure and teams in return for relatively little revenue and no resulting profit for the major AI labs.

At the same time, I think that within the next 20 years tech companies could both develop robotics that self-navigate across multiple domains and automate major sectors of physical work. That would put society on a path to causing the total extinction of current life on Earth. We should do everything we can to prevent it.

Remmelt

Not necessarily :)

Quite likely OpenAI and/or Anthropic would continue to exist, but their management would have to overhaul the business (no more freebies?) to curb the rate at which they are burning cash. Their attention would be turned inwards.

In that period, there could be more space for people to step in and advise stronger regulation of AI models, e.g. to enforce liability, privacy, and copyright protections.

Or maybe other opportunities open up. Curious if anyone has any ideas.

Remmelt

What's a good overview of those grounded arguments?

 

Thanks, I appreciate your question. The best overview I managed to write is the control problem post. It still takes quite some reading to put the different parts of the argument together, though.

Remmelt

The report is focussed on preventing harms to people who use or are affected by the technology.

It uses the FDA’s mandate of premarket approval, along with other processes, as examples of what could be applied to AI.

Restrictions on economic productivity and innovation are a fair point of discussion. I have my own views on this: generally, I think the market neglects the negative asymmetry of new scalable products being able to do massive harm. I’m glad the FDA exists to counteract that.

The FDA’s slow response in ramping up COVID vaccines during the pandemic is questionable though, as one example. I get the sense there are a lot of problems with bureaucracy and industry capture at the FDA.

The report does not focus on that though.

Remmelt

Curious about the 'delay the development' via regulation bit.

What is your sense of which near-term passable regulations would actually be enforceable? It has been difficult for large stakeholder groups facing threatening situations to enforce even established international treaties, such as the Geneva Conventions or the Berne three-step test.

Here are dimensions I've been thinking need to be constrained over time:

  • Input bandwidth to models (i.e. available training and run-time data, including from sensors).
  • Multi-domain work by/through models (i.e. preventing an automation race to the bottom).
  • Output bandwidth (incl. by requiring premarket approval for allowable safety-tested uses, as happens in other industries).
  • Compute bandwidth (through caps/embargoes on already resource-intensive supply chains).



(I'll skip the 'make humans smarter' part, which I worry would increase the problems we've seen around techno-solutionist initiatives.)

Remmelt

Appreciating your thoughtful comment.  

It's hard to pin down the ambiguity around how much alignment "techniques" make models more "usable", and how much that in turn enables more "scaling". This, along with the safety-washing concern, gets us into messy considerations. Though I generally agree that participants in MATS or AISC programs can cause much less harm through either than researchers working directly on aligning, e.g., OpenAI's models for release.

Our crux, though, is about the extent of progress that can be made on engineering fully autonomous machinery to control* its own effects in line with continued human safety. I agree with you that such a system can be engineered to start off performing more** of the tasks we want it to complete (i.e. progress on alignment is possible). At the same time, there are fundamental limits to controllability (i.e. progress on alignment is capped).

This is where I think we need more discussion:

  • Is the extent of AGI control that is possible at least as great as the extent of control needed
    (to prevent eventual convergence on causing human extinction)?



* I use the term "control" in the established control-theory sense, consistent with Yampolskiy's definition. This is just to avoid confusing people, as the term gets used in more specialised ways in the alignment community (e.g. in conversations about the shutdown problem or the control agenda).
** This is a rough way of stating it. It's also about the machinery performing fewer of the tasks we wouldn't want the system to complete. And the relevant measure is not so much the number of preferred tasks performed as the preferred consequences. Finally, this raises a question about who the 'we' is that can express the preferences the system is to act in line with, and whether coherent alignment with different persons' preferences, expressed from within different perceived contexts, is even a sound concept.
