This is a linkpost for https://banburismus.substack.com/p/safety-as-a-scientific-pursuit
Very much appreciate the link post - I’d been trying to write a summary/contextualisation for LW and this is a much better one than I’d come up with.
I’d be very grateful for the LW community’s thoughts (especially any pushback). I expect this will be the source of the strongest counterarguments.
| Assumptions are | Continuous | Discontinuous |
|---|---|---|
| Inductive[1] | Most of ML | Not sure? Maybe Gould, physicists |
| Deductive | Christiano, Shulman, OpenPhil | Yudkowsky, most of MIRI |
I like it better than "rationalist" and "empiricist" ↩︎
Thanks! I really like "inductive" vs "deductive" and would probably have used them if I'd thought of them.
Tom McGrath, until recently a Research Scientist at DeepMind, has written up why he's not excited about theoretical AI safety. It's similar to Aaronson and Barak's "reform" alignment. It's argued in good faith and is pretty constructive.
The key provocation for you is probably his view on the safety implications of open-sourcing models.
The author knows the field has already moved in this direction, and is trying to establish common knowledge of that shift and encourage more of it. He also knows that it's a sore point to imply that modern rationalists aren't empiricists. For your blood pressure, I recommend mentally prepending "AI-" to every mention of rationalism in the post.
My addition: take the Drexler grey goo story. It's usually told as an own (haha, stupid pessimistic doomers), but I think it reflects very well on Eric Drexler:
1. We identify a possible problem
2. We raise the alarm and do more research
3. We get evidence and update
This seems like the optimal policy to me. McGrath is saying "you have to actually do (3)!"
See also another candidate explanation for intractable disagreement here.