It might be productive to hash out a difference in thinking here as well. Perhaps a crux is: I (a safety researcher at OpenAI) think question (a) “is this path to RSI on track to be safe, based on what we see so far?” is likely amenable to empirical study. I think the best way to gather evidence on question (b) “is there any version of RSI that would be safe?” is to gather evidence on (a) for some particular approach.
One could measure and attempt to fix safety issues along the path to RSI (an example of what OpenAI calls “iterative deployment”, applied in this case to an internal deployment). If attempted fixes didn’t seem to be working, one could continuously gather evidence to that effect and present the most compelling case possible. This has more risk surface than determining ahead of time whether the endeavor is safe, but it seems more tractable and better suited to OpenAI safety researchers' comparative advantage.
Curious @habryka if there is a different approach to question (b) that you think safety researchers at OpenAI are well positioned to pursue instead or in parallel.