The argument against doing R&D on contemporary systems because of what future systems will be capable of has always been shortsighted. Two examples of this are nuclear weapons controls and line-level commenting of programming code.
In nuclear weapons development, safety advocates argued for safety measures on the weapons themselves to prevent misuse. They were overruled by arguments that future systems would be so physically secure they couldn't be stolen, so controls were centralized in the launch control process, usually with a single person there holding ultimate authority. Proliferation and miniaturization eventually made theft and misuse major risks, and an entire specialized industry sprang up to develop and implement Permissive Action Links (PALs).
It wasn't until the late 1980s that the entire U.S. nuclear weapons inventory was equipped with PALs. Which is nuts. Even then, malfunction concerns were paramount, and they remain concerns today, creating real risks for deterrence-based defenses. Even now PALs are still being upgraded for reliability. It was the need to retrofit PALs that stretched implementation of these questionably reliable devices across a 70-year timeframe. Safety R&D as a primary focus during the pioneering era of nuclear weapons could have prevented that mess. Trying to shoehorn it in later was unbelievably expensive and difficult, and it created strategic threats.
The finance and defense sectors have a similar problem with software. High-turnover iteration led to a culture that eschewed internal comments in the code. Now making changes creates safety concerns that can only be overcome with fantastically complex and expensive testing regimes, which take so long to complete that the problem they are trying to solve has often been fixed another way by the time they finish. It's phenomenally wasteful. Again, future safety was ignored because of arguments based on future capabilities.
Most importantly, though, R&D on contemporary systems is how we learn to study future systems. Even though future generations of the technology will be vastly different, they are still based, ultimately, on today's systems. Treating future safety as a contemporary core competency paves the way for studying and innovating on future systems, and it prevents the waste of ad hoc shoehorning exercises later on.
What are your thoughts on the impact of larger, more capable models that are also tuned (or capability-fenced) to be task-specific only, where the trust range that needs to be factored in is considerably smaller?
Sometimes people say that it's much less valuable to do AI safety research today than it will be in the future, because the current models are very different—in particular, much less capable—than the models that we're actually worried about. I think that argument is mostly right, but it misses a couple of reasons models as capable as current models might still be important at the point where AI is posing substantial existential risk. In this post, I'll describe those reasons.
Our best trusted models might be at a similar level of capabilities to current models: It might be the case that we are unable to make models which are much more powerful than current models while still being confident that these models aren't scheming against us. That is, the most powerful trusted models we have access to are similar in capabilities to the models we have today. I currently predict that our best trusted models will be a lot more capable than current models, but it seems plausible that they won't, especially in short timelines.
This has a few implications:
Due to compute constraints, large fractions of (automated) safety research might happen on relatively weak models: Right now, lots of safety research happens on models which are weaker than the best models, because this reduces compute costs and makes experiments more convenient to run. I expect that once we have dangerously powerful models we'll be able to automate large fractions of running experiments, so convenience will be less of an issue, but compute might become more of a bottleneck. Currently most safety work is bottlenecked by the availability of human researcher labor, but as AIs automate this work, experiments could become increasingly bottlenecked on compute, which might result in using models that are smaller relative to the frontier than those human researchers use today. This will be most extreme once AIs fully automate safety research. In the case of capabilities research, I expect that a substantial factor in mitigating compute bottlenecks will be running experiments on small models and then extrapolating to bigger models.[1] I think we might want to apply a similar approach to safety research, where we do as much work as possible at very small scale; in general, work at small scale will look more appealing given how much more of it you can do.
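To make the "run small, then extrapolate" idea concrete, here is a toy sketch (not from the post itself; the model sizes and metric values are made up for illustration) of fitting a power law to results measured on small models and extrapolating to a larger scale:

```python
import numpy as np

# Toy illustration of extrapolating experimental results from small models to
# larger ones: fit a power law (linear in log-log space) to measurements taken
# at small scales, then predict the value at a larger scale.
# All numbers below are hypothetical.

model_sizes = np.array([1e8, 3e8, 1e9, 3e9])   # parameter counts of small models
metric = np.array([0.42, 0.35, 0.29, 0.24])    # e.g. a loss or attack-success rate

# Fit log(metric) = a * log(size) + b
a, b = np.polyfit(np.log(model_sizes), np.log(metric), deg=1)

target_size = 1e11                             # the larger model we actually care about
predicted = np.exp(a * np.log(target_size) + b)
print(f"extrapolated metric at {target_size:.0e} params: {predicted:.3f}")
```

Whether a given safety-relevant quantity actually follows a clean scaling trend like this is an empirical question; the point is only that the measurement cost is paid at small scale.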
Naively, we might expect that you'd want to spend similar amounts of compute on inference for AI researchers and on the compute they use for experiments. In the case of experiments that are compute-heavy by default, because they involve doing full training runs or large fine-tuning runs, you still might want the training cost to be comparable to the inference cost, implying that you'd want to run (many of) these experiments on models which are much smaller than frontier AI systems. If an experiment involves doing 1 million forward passes (e.g. for gradient steps) and the AI researcher effort to set up the experiment involves only 1,000 forward passes (with no human labor), then naively you'd be willing to spend 10x more AI labor (10,000 forward passes) to find some equally valuable way to do this experiment on a model which is so small that it has 100x lower inference cost than the AI researcher. Of course, in practice the trade-offs won't be this clear-cut and you won't be able to perfectly recover the value of larger experiments using additional labor, but nonetheless, I expect that spending tons of AI labor on miniaturizing experiments is likely to be worthwhile at some point.
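As a rough back-of-the-envelope check on the numbers above (the forward-pass counts and the 100x cost ratio are the hypothetical figures from this paragraph, not measurements), the trade-off looks like this:

```python
# Compare running an experiment at frontier scale versus spending extra AI labor
# to miniaturize it onto a much cheaper model.
# All quantities are in units of "frontier-model forward passes".

experiment_passes = 1_000_000  # forward passes the experiment itself needs
labor_naive = 1_000            # AI researcher labor to set up the frontier-scale version
labor_mini = 10_000            # 10x more labor spent finding a miniaturized version
small_cost_ratio = 1 / 100     # the small model's inference is 100x cheaper per pass

naive_total = experiment_passes + labor_naive
mini_total = experiment_passes * small_cost_ratio + labor_mini

print(f"frontier-scale experiment: {naive_total:,.0f} pass-equivalents")  # ~1,001,000
print(f"miniaturized experiment:   {mini_total:,.0f} pass-equivalents")   # ~20,000
print(f"compute saved:             {naive_total / mini_total:.0f}x")      # ~50x
```

Even with the 10x extra labor, the miniaturized version is roughly 50x cheaper in total, which is why the trade seems worth making once labor is cheap relative to compute.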
It's unclear how far this miniaturization goes (and there are approaches to reducing compute other than miniaturization, like iterating on techniques which are cheap to run and evaluate even on frontier scale models), but it seems plausible that large fractions of the experiments will be on models which are roughly as capable as current models. (These future models of comparable capability will presumably have much lower inference costs due to better algorithms, more compute (for overtraining the model), and better data). In the extreme, it's possible that most of the cumulative value of safety work prior to the point when human efforts are totally irrelevant will come from experiments on models which aren't more capable than the current best models.
In some cases, we might care about extrapolation for its own sake rather than to save compute, e.g. because we can't measure something directly on a more powerful model for whatever reason, or because we think that observing how something varies with capabilities/scale is generally informative about what is going on. This seemed like a less interesting application of weaker models with not many implications, so I decided not to discuss it in the main text. ↩︎