Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This post has been written for the second Refine blog post day, at the end of the first week of iterating on ideas and concretely aiming at the alignment problem. Thanks to Adam Shimi, Paul Bricman, and Daniel Clothiaux for helpful discussion and comments.

Introduction

This post aims to provide a general introduction to technical safety as practised in engineering-related industries, describe some commonly used tools, and attempt to draw parallels between technical safety and AI safety. I am uncertain about the value of thinking in this framework for building safe AI, but I expect this post to at least be useful as a productive mistake.

To prevent confusion:

  • I use the term ‘technical safety’ to refer loosely to dealing with safety issues in engineering domains (I am only slightly familiar with technical safety in the upstream oil and gas sector).
  • I use the term ‘AI safety’ to mean ensuring AI deployment does not lead to catastrophic consequences, similar to Steve Byrnes’ definition.

For the purpose of this post, I will assume a slow takeoff scenario in which AI capabilities scale relatively slowly and AI systems are never fully aligned. At some point, a relatively powerful AI is created - powerful enough to do a lot of things but not powerful enough to instantly wipe us all out. Perhaps it has goals that do not generalize well, and it starts optimizing for goals that are misaligned with our true goals. This leads to bad outcomes, perhaps events that unfortunately cause some fatalities. There continues to be a series of tragic accidents that increase in severity as AIs become more and more capable over time, but we still have time to learn from these incidents and make safety improvements to future systems.

The concepts in this post may not be directly applicable to an AI FOOM scenario, where the first AGI recursively improves itself and suddenly becomes extremely powerful and wipes out humanity, leaving us no opportunity to course-correct.

Technical safety in a nutshell

The principle of technical safety is built upon Heinrich’s Law, which states “in a workplace, for every accident that causes a major injury, there are 29 accidents that cause minor injuries and 300 accidents that cause no injuries.” The numbers are unimportant, but the idea is that minimizing the near-misses (accidents that cause no injuries) leads to a reduction in minor accidents, and subsequently a reduction in major accidents. This can be illustrated with the safety pyramid, also known as the safety triangle, below. 

The bottom-most layer of the pyramid may be better represented by ‘hazards’ instead of ‘unsafe acts’ to also include potential dangers from non-behavioral causes. The top-most layer of the pyramid may also be better represented by ‘severe accidents’ instead of ‘death’ to also include extremely undesirable consequences other than loss of lives.

Hazard - A hazard is any agent (object, situation, or behavior - not in an ‘agentic’ sense) that has the potential to cause harm (injury, ill health, or damage to property or the environment), e.g. loose cables running across the floor are a trip hazard, since someone may trip over them.

Risk - A risk describes the likelihood of potential harms from a known and identified source of hazard, e.g. there is a high risk of dangerous gas release from a corroded gas pipe.

Technical safety can roughly be summed up as effective risk management, which is about minimizing hazards by identifying and eliminating them, as well as reducing the probability that any hazards lead to undesirable consequences. While the complete elimination of every potential accident cannot be guaranteed, meaningful efforts can be expended to reduce dangerous risks to a reasonably low level. 

Useful tools in technical safety

To the best of my knowledge there is no one-size-fits-all technical safety framework used across all areas of engineering, but there exists a variety of tools that serve different purposes. As each tool has its own weaknesses, other tools may be used to complement it.

HazID

A hazard identification (HazID) study is a procedure used to uncover and identify hazards, as well as to ensure appropriate mitigations are in place. HazIDs are commonly conducted prior to the execution of activities, from minor modifications to a system (e.g. replacing old pipes) to the construction of a large facility (e.g. building a large plant).

Mitigation - A mitigation (also known as recovery measure) describes any type of action that is used to reduce the severity of a negative consequence of some kind, e.g. a fire extinguisher that reduces the impact of a fire.

HazIDs typically involve the following steps (a rough sketch of a resulting register entry is given after the list).

  1. Identification of hazards according to categories, with the aid of guide words (e.g. severe weather, terrorist activity)
  2. Identification of all possible causes and threats of the hazards
  3. Determination of all potential credible undesirable consequences
  4. Ranking of the hazards according to the Risk Matrix
  5. Identification of mitigations in place
  6. Recommendation of new mitigations to further reduce risks
  7. Nomination of action parties for each recommended action
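
Putting these steps together, a single HazID register entry might look roughly like the sketch below. This is only my own illustration in Python; all field names and example values are hypothetical and not taken from any real HazID standard.

```python
from dataclasses import dataclass

@dataclass
class HazidEntry:
    """One row of a hypothetical HazID register."""
    hazard: str                       # step 1: identified hazard (with guide word category)
    causes: list[str]                 # step 2: possible causes / threats
    consequences: list[str]           # step 3: credible undesirable consequences
    risk_rank: str                    # step 4: ranking from the Risk Matrix
    existing_mitigations: list[str]   # step 5: mitigations already in place
    recommendations: list[str]        # step 6: new mitigations to further reduce risk
    action_party: str                 # step 7: who follows up on the recommendations

example = HazidEntry(
    hazard="Severe weather: high winds during a crane lift",
    causes=["storm front passing through the site"],
    consequences=["dropped load", "structural damage"],
    risk_rank="High",
    existing_mitigations=["weather forecast check before the lift"],
    recommendations=["define a wind-speed limit above which lifts are suspended"],
    action_party="Lifting supervisor",
)
```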

Risk Matrix

A Risk Matrix (also known as a Risk Assessment Matrix) is a method for evaluating both the probability and severity of a risk. The likelihood (probability of occurrence) and the severity of a risk form the two axes of the matrix. Different Risk Matrices employ different numbers of likelihood and severity levels. An example of a 3x4 Risk Matrix is shown below.

Identified risks are ranked according to the Risk Matrix, so that more focus can be given to mitigating higher risks (higher likelihood and severity) than lower risks (lower likelihood and severity).
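
As an illustration, a small Risk Matrix can be implemented as a simple lookup table. The likelihood and severity labels and the resulting rankings below are invented for this sketch and will differ between organizations (and from the example matrix shown above).

```python
# Hypothetical 3x4 Risk Matrix: 3 likelihood levels x 4 severity levels.
LIKELIHOOD = ["improbable", "possible", "probable"]
SEVERITY = ["negligible", "marginal", "critical", "catastrophic"]

# Rows follow LIKELIHOOD, columns follow SEVERITY.
MATRIX = [
    ["low",    "low",    "medium", "high"],     # improbable
    ["low",    "medium", "high",   "high"],     # possible
    ["medium", "high",   "high",   "extreme"],  # probable
]

def risk_rank(likelihood: str, severity: str) -> str:
    """Look up the risk ranking for a given likelihood and severity."""
    return MATRIX[LIKELIHOOD.index(likelihood)][SEVERITY.index(severity)]

# Even an improbable event can be ranked high if the consequences are severe enough.
print(risk_rank("improbable", "catastrophic"))  # -> high
```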

Bow-tie analysis

A bow-tie diagram is used to visualize risk management and preparedness. An example is shown below.

The center of the bow-tie diagram is the event (also known as the top event), which is an occurrence that happens when a hazard is released. The left of the bow-tie lists the threats, i.e. the possible causes of the hazard being released, while the right of the bow-tie lists the consequences, i.e. the possible effects of the hazard being released. Preventive measures can be put in place on the left of the bow-tie to minimize the likelihood that the threats lead to the event, while mitigations can be put in place on the right of the bow-tie to minimize the severity of the consequences (both preventive measures and mitigations are also known as ‘barriers’).
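
The structure of a bow-tie can likewise be written down as plain data, with preventive measures attached to each threat on the left and mitigations attached to each consequence on the right. The sketch below is my own illustration with invented entries.

```python
from dataclasses import dataclass

@dataclass
class BowTie:
    """A hypothetical bow-tie: threats and barriers on the left, consequences and mitigations on the right."""
    top_event: str
    threats: dict[str, list[str]]       # threat -> preventive measures (barriers)
    consequences: dict[str, list[str]]  # consequence -> mitigations (barriers)

gas_release = BowTie(
    top_event="Loss of containment from a corroded gas pipe",
    threats={
        "corrosion": ["corrosion inspection programme", "protective coating"],
        "impact from vehicle": ["crash barriers", "site speed limits"],
    },
    consequences={
        "fire": ["gas detection and shutdown system", "firefighting equipment"],
        "personnel exposure": ["evacuation procedures", "breathing apparatus"],
    },
)
```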

Swiss cheese model

The Swiss cheese model of accident causation is a model that illustrates that, although many barriers lie between hazards and accidents, there are flaws in each layer that, if aligned, can allow an accident to occur. An example of the model in the context of pandemic defense is shown below. 

On the bright side, this model illustrates that with every additional barrier comes additional protection, and with sufficient numbers of highly effective barriers (layers with very few and small holes) the probability of a catastrophe can be reduced to very close to zero.  
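
A toy calculation makes this concrete: if each barrier fails to stop the hazard with some probability, an accident requires every barrier to fail, so (assuming independent failures) the overall probability is the product of the individual failure probabilities. Independence is a strong assumption - in practice failures are often correlated - and the numbers below are invented.

```python
import math

# Hypothetical probabilities that each individual barrier fails to stop the hazard.
barrier_failure_probs = [0.3, 0.2, 0.2, 0.1]

# Assuming independent failures, an accident occurs only if every barrier fails.
p_accident = math.prod(barrier_failure_probs)
print(p_accident)  # ~0.0012, i.e. about 0.1%, even though no single barrier is very reliable
```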

Change Management

Change Management (also known as Management of Change (MoC) or the Change Request Management Process) is the process of requesting, determining the attainability of, planning, implementing, and evaluating changes to an existing system. Because any change to a complex system may introduce new hazards that are not immediately obvious, the purpose is to bring as many viewpoints as possible to bear through a thorough review by a multidisciplinary team, minimizing the chances of missing a hazard.

For example, adding a piece of equipment to a facility may change the operating conditions of the plant, add weight, or obstruct an evacuation route. The Change Management process either ensures that the new operating conditions remain within the design limits of the facility, that the added weight stays within structural constraints, and that evacuation routes are not disrupted; or it ensures that further changes are made to the system to safely accommodate the changes introduced by the new equipment. The changes are reviewed, documented, and communicated to all relevant parties.
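
One way to picture the review is as a checklist that every proposed change must pass before it is approved. The sketch below mirrors the equipment example above; the checks, field names, and values are purely illustrative.

```python
def review_new_equipment(change: dict) -> list[str]:
    """Return the concerns a hypothetical multidisciplinary review might raise."""
    concerns = []
    if not change["operating_conditions_within_design_limits"]:
        concerns.append("Re-rate or modify the facility for the new operating conditions")
    if change["added_weight_kg"] > change["remaining_weight_allowance_kg"]:
        concerns.append("Added weight exceeds structural allowance; reinforce or relocate")
    if change["blocks_evacuation_route"]:
        concerns.append("Equipment obstructs an evacuation route; reposition it or redesign the route")
    return concerns

proposed = {
    "operating_conditions_within_design_limits": True,
    "added_weight_kg": 800,
    "remaining_weight_allowance_kg": 500,
    "blocks_evacuation_route": False,
}
print(review_new_equipment(proposed))  # one concern: weight allowance exceeded
```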

Hierarchy of Controls

The Hierarchy of Controls (also known as the Hierarchy of Hazard Controls) is a simple ranking of the methods used to control hazards, from the most effective down to measures of last resort, as shown below.

The five levels of the hierarchy can be roughly described as follows (a toy sketch of choosing the most effective feasible control is given after the list):

  1. Elimination: Wherever possible, eliminate the hazard such that it no longer exists
  2. Substitution: If elimination is not feasible, replace a riskier hazard with a less risky one, e.g. purchasing chemicals in pellet form rather than in powder form to prevent exposure through inhalation.
  3. Engineering controls: If all of the above are insufficient, implement engineering controls that involve changes to the design of equipment or a process, e.g. using air ventilation to reduce airborne hazards.
  4. Administrative controls: If all of the above are insufficient, implement administrative measures to reduce risks, e.g. implementing regular maintenance programs to minimize risks associated with worn-out equipment.
  5. Personal protective equipment (PPE): This is the last line of defense against adverse events, and only serves to prevent or minimize human injury in the event of an incident. 
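
Here is the toy sketch referred to above: given the set of controls that are feasible for a particular hazard, pick the highest one in the hierarchy. The ordering is the standard hierarchy, but the selection function and its inputs are my own illustration.

```python
# The hierarchy in order of effectiveness, most effective first.
HIERARCHY = [
    "elimination",
    "substitution",
    "engineering controls",
    "administrative controls",
    "personal protective equipment",
]

def most_effective_control(feasible: set[str]) -> str:
    """Pick the highest control in the hierarchy that is feasible for a given hazard."""
    for control in HIERARCHY:
        if control in feasible:
            return control
    raise ValueError("No feasible control identified for this hazard")

# If the hazard cannot be eliminated or substituted, fall back to engineering controls.
print(most_effective_control({"engineering controls", "administrative controls",
                              "personal protective equipment"}))
```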

Are they useful for AI safety?

The tools described above are used for concrete, bounded engineering problems and may not be directly useful for unbounded problems like AI safety; their relevance may be limited to fairly predictable tool AIs rather than powerful agentic AIs. They may even hinder us from actually thinking about the (widely contested) hard part of the problem. Nevertheless, I’ll attempt to explore aspects where these approaches may help - even if applying them to AI safety may seem forced.

HazID has a workflow similar to training stories, where one describes the ‘training goal’ (a mechanistic description of the kind of model the training process is intended to produce) and the ‘training rationale’ (why the training process is expected to produce a model matching that goal). It aims to address several questions. How is this model built? What do we know and not know about the model? What access will it have to the real world, and what are its capabilities and limitations? What are the possible failure modes, how serious would they be, and how likely are they?

The bow-tie diagram raises questions about the undesirable consequences of hazards in a model. When an ML model starts behaving dangerously upon deployment, is there an off-switch? What is the probability that it will work? Are there any merits to AI boxing?

The Swiss cheese model, used as the cover image for Dan Hendrycks’ post on Pragmatic AI Safety, helps to visualize preventive measures that can be used to prevent catastrophes. While it is recognized that no current proposal for AI safety is fool-proof against a potential AI catastrophe, each proposal may serve as an additional imperfect barrier between the dangers of AI and the catastrophe. These barriers could be anything ranging from interpretability tools to peer into the inner workings of the model, to the use of quantilizers to prevent hard optimization that exacerbates Goodharting, to even training language models to not say things that involve describing someone getting injured.

Assuming an existing ML model is mostly safe, the Change Management workflow could help raise questions about how changes to the model could cause it to behave in unexpected ways. Will the model be retrained with more parameters, and what does that do to its behavior? Is there a possibility of inverse scaling? Will additional data be used to train the model? If so, does it contain hazards that were not previously present?

AI safety is difficult because capability often trades off against safety. The Hierarchy of Controls illustrates that while we would eliminate all hazards in an ideal world, this is sometimes not feasible, either because there is no known way to eliminate them or because drastic measures incur a high alignment tax; in such cases one may resort to less ideal alternatives. For example, if we cannot eliminate dishonesty from a model, are there techniques we can employ to elicit its latent knowledge instead?

While thinking about AI safety with a technical safety mindset may raise more questions than it answers, and the questions raised are mostly common-sense ones, it may still be worth devising a tool or workflow that reliably addresses these safety concerns. Good tools help create processes that are safe, and safe processes help safeguard systems against human negligence.

Comments

I keep trying to explain to people that my threat model - where improbable-but-possible, discontinuously effective algorithmic progress in AI allows for >1.0 self-improvement cycles at current levels of compute - is a big deal. Most people who disagree argue that this seems quite unlikely to happen soon. I find it helpful to be able to show them the Risk Assessment Matrix, point to the square corresponding to 'improbable / intolerable', and say, 'See? Improbable doesn't mean safe!'

I like to call the accelerating-change period, where AI helps accelerate further AI but only by working with humans at a less-than-human-contribution level, the 'zoom', in contrast to the super-exponential, AI-independently-improving-itself 'foom'. Thus the period we are currently in is the 'zoom' period, and the oh-shit-we're-screwed-if-we-don't-have-AI-alignment period is the 'foom' period. I call the future critical juncture, wherein we realize we could initiate a foom with then-present technology but restrain ourselves because we know we haven't yet nailed AI alignment, the 'zoom-foom gap'. This gap could be as little as seconds, while a usually-overconfident capabilities engineer pauses just for a moment with their finger over the enter key, or as long as a couple of years, while the new model repeatedly fails the safety evaluations in its secure box despite repeated attempts to align it and thus wisely doesn't get released. 'Extending the zoom-foom gap' is thus a key part of my argument for why we should build a model-architecture-agnostic secure evaluation box and then make it standard practice for all AI research groups to run this safety eval on their new models before deployment.

(Note: this is more about buying ourselves time to do alignment research than directly solving the problem. Having more time during the critical period where we have a potential AGI to study but it isn't yet loose on the world causing havoc seems like a big win for alignment research.)

Really enjoyed using these safety and risk visualizations in this context!

It was serendipitous when a safety blogger I follow posted this:

https://safetyrisk.net/the-risk-matrix-myth/

I'd recommend reading and watching a lot of his videos in reference to this. 

The balance of new capability (aka innovation) vs. safety (reducing known risks) is an important dynamic in any team that builds. I haven't found a way to build perfect safety without severely limiting all new capabilities. The second-best option is to have a good ability to react when you find problems.

These models are very good for estimating external risks, but there are also internal risks if it's possible to somehow provide enough processing power to make a super-powerful AI. For example, it could torture internal simulations in order to understand emotions.
