Safe AIs through engineering principles

by Gerald Monroe, 20th Jan 2018

When nuclear engineers design reactors, they go to elaborate lengths to make sure that under no circumstances (even a meltdown) can enough fissionable material accumulate to cause a nuclear explosion. You could, in fact, design a reactor that always runs at the edge of a criticality incident. It would be much lighter, simpler, and higher-performing.

When demolition workers place the charges, they are very careful to keep the wiring insulated and to make sure that the initiator is securely locked up until everyone is at a safe distance.

When you install electrical wiring, you always need fuses or breakers whose trip points are below what the wiring can handle. You isolate circuits, make sure that in the worst-case scenario the insulation won't melt, and then add a safety margin on top of that.
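
The fuse rule boils down to one inequality: the breaker's trip current must sit below what the wire can carry, reduced further by a safety margin. A minimal sketch in Python, where the function name and the 20% default margin are hypothetical illustrations (real ratings come from the applicable electrical code, not from this function):

```python
def breaker_is_safe(trip_current_a: float, wire_ampacity_a: float,
                    safety_margin: float = 0.2) -> bool:
    """Return True if the breaker trips comfortably below the wire's limit.

    safety_margin is a hypothetical, illustrative fraction of headroom
    (0.2 means "trip at no more than 80% of what the wire can carry");
    it is not taken from any electrical code.
    """
    if not 0.0 <= safety_margin < 1.0:
        raise ValueError("safety_margin must be in [0, 1)")
    # The wire's limit, reduced by the extra margin.
    allowed_trip_a = wire_ampacity_a * (1.0 - safety_margin)
    return trip_current_a <= allowed_trip_a


print(breaker_is_safe(15, 20))  # True: 15 A trip vs 16 A allowed
print(breaker_is_safe(20, 20))  # False: trips only at the wire's limit
```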

If we ever have the technical means to build self-replicating nanotechnology, then as long as we store the blueprint files on macroscale computers and never give the "nanorobots" we build enough internal memory to hold their own blueprints, we can rest assured that they can never replicate beyond our control.
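
One way to picture this proposal is a toy model in which only a controller on a macroscale computer holds the blueprint and decides whether each copy gets built. Everything here (`Controller`, `Nanobot`, the byte-size memory check, the `max_copies` quota) is a hypothetical illustration, not a real nanotechnology design:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Nanobot:
    # Deliberately too small to hold its own blueprint (hypothetical rule).
    memory_bytes: int


@dataclass
class Controller:
    """Macroscale computer: the only place the blueprint is stored."""
    blueprint: bytes
    max_copies: int  # human-chosen replication quota
    issued: int = 0

    def replicate(self, parent: Nanobot) -> Optional[Nanobot]:
        # Design-rule check: refuse to work with any bot whose memory
        # could hold the blueprint.
        if parent.memory_bytes >= len(self.blueprint):
            raise ValueError("nanobot memory could hold its own blueprint")
        # Once the quota is used up, no more copies are built.
        if self.issued >= self.max_copies:
            return None
        self.issued += 1
        # The child, like its parent, carries no copy of the blueprint.
        return Nanobot(memory_bytes=parent.memory_bytes)


controller = Controller(blueprint=b"\x00" * 1000, max_copies=2)
parent = Nanobot(memory_bytes=64)
copies = [controller.replicate(parent) for _ in range(3)]
print(copies[2])  # None: the quota of 2 copies is exhausted
```

The point of the structure is that replication is a request the bot cannot satisfy on its own: every copy passes through a component humans control.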

AIs can be made the same way, with similar weak links and deliberate bottlenecks built into their cognitive paths. First of all, you don't need executive decision-making functionality to build an AI that automates most tasks. In most cases, the core of the intelligence can be a small section of code that evaluates each possible action against the agent's terminal values, then picks the min() or max() of that action array and sends it to the output.
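
A minimal Python sketch of that core, assuming the agent already has a predicted outcome for each candidate action and that its "terminal values" can be written as a human-chosen weighted sum. The action names, weights, and outcomes below are hypothetical:

```python
from typing import Callable, Dict, Sequence, TypeVar

A = TypeVar("A")


def choose_action(actions: Sequence[A], value: Callable[[A], float]) -> A:
    """The entire 'decision maker': score every candidate, return the max."""
    if not actions:
        raise ValueError("no candidate actions")
    return max(actions, key=value)


# Hypothetical terminal values: human-chosen weights over predicted outcomes.
WEIGHTS: Dict[str, float] = {"parts_sorted": 1.0, "energy_used": -0.5}

# Hypothetical predicted outcome of each candidate action.
PREDICTED: Dict[str, Dict[str, float]] = {
    "belt_speed_low": {"parts_sorted": 9.0, "energy_used": 1.0},     # 8.5
    "belt_speed_mid": {"parts_sorted": 10.0, "energy_used": 4.0},    # 8.0
    "belt_speed_high": {"parts_sorted": 12.0, "energy_used": 10.0},  # 7.0
}


def terminal_value(action: str) -> float:
    """Weighted sum of the action's predicted outcome."""
    outcome = PREDICTED[action]
    return sum(WEIGHTS[k] * v for k, v in outcome.items())


print(choose_action(list(PREDICTED), terminal_value))  # belt_speed_low
```

Swapping `max` for `min` turns the same loop into a cost minimizer; nothing else about the structure changes.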

Today, that would just be some ARM system-on-a-chip running Linux or QNX, executing a Python script that might be ported to a faster language if the project has the budget. It's slow as a snail. The system can't evolve out of control; it's ultimately just a loop picking the lowest or highest number from a list.

The subsystems of this AI might be designed by other AIs, but those subsystems are still built to optimize specific parameters picked by humans, and their outputs still run through simple optimizers. Likewise, the AI doing the subsystem design is just another idiot savant: it generates candidate neural architectures, scores them with some numerical metric, and picks the best one.
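
The generate, score, and pick loop can be sketched as plain random search, one of the simplest architecture-search strategies (swapped in here for illustration; real neural architecture search systems are more elaborate). The search space and `toy_metric` are hypothetical stand-ins for actually training and evaluating each network:

```python
import random
from typing import Callable, Dict, List, Tuple

Arch = Dict[str, int]

# Hypothetical search space chosen by humans.
SEARCH_SPACE: Dict[str, List[int]] = {"layers": [2, 4, 8], "width": [32, 64, 128, 256]}


def sample_arch(rng: random.Random) -> Arch:
    """Generate one candidate architecture at random."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def random_search(metric: Callable[[Arch], float], n_candidates: int,
                  seed: int = 0) -> Tuple[Arch, float]:
    """Generate candidates, score each with the metric, keep the best."""
    rng = random.Random(seed)
    best_arch: Arch = {}
    best_score = float("-inf")
    for _ in range(n_candidates):
        arch = sample_arch(rng)
        score = metric(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score


def toy_metric(arch: Arch) -> float:
    # Stand-in for "train the network and measure validation accuracy":
    # rewards capacity, penalizes parameter count. Purely illustrative.
    capacity = arch["layers"] * arch["width"]
    params = arch["layers"] * arch["width"] ** 2
    return capacity / 100 - params / 100_000


print(random_search(toy_metric, n_candidates=20))
```

The designer never sees anything but the number `toy_metric` returns, which is the "idiot savant" property the paragraph describes.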

Recent results seem to suggest that this is all you need to easily reach superhuman performance on individual tasks. This type of agent won't ask to be let out of a box, since it has no awareness, no motivation, and no ability to speak. It would just be a script tirelessly generating waldo actions that push the predicted state of the world closer to its creator's desires.
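
A script that tirelessly generates actions can be sketched as a control loop: at each step, predict where each candidate action would take the world and pick the one that lands closest to the goal the creator set. The one-dimensional state, `toy_model`, and the assumption that predictions are exact are all hypothetical simplifications:

```python
from typing import Callable, List, Sequence

State = float
Action = float


def control_loop(state: State, goal: State, actions: Sequence[Action],
                 predict: Callable[[State, Action], State],
                 steps: int) -> List[State]:
    """Repeatedly pick the action whose predicted next state is closest to goal."""
    history = [state]
    for _ in range(steps):
        action = min(actions, key=lambda a: abs(predict(state, a) - goal))
        # Toy assumption: the world behaves exactly as the model predicts.
        state = predict(state, action)
        history.append(state)
    return history


def toy_model(state: State, action: Action) -> State:
    return state + action  # hypothetical one-dimensional world model


print(control_loop(0.0, 5.0, [-1.0, 0.0, 1.0], toy_model, steps=7))
# [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 5.0]
```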

It seems to me that the practical way to build safe AIs is to build AIs so limited that they are inherently safe, in the same way you don't optimize nuclear reactor designs too far.

Comments

The trouble is that while such a limited AI would be safer, it would also be proportionately less useful. There are two important reasons to want a safe, powerful AI much more than a safe, not-so-powerful AI:

  • Such an AI will help us stop other people from building a non-safe, highly-competent AI and releasing it into the wild.
  • AI that shares human values would help us cure disease, save the environment, explore the stars, etc., but only if it is highly competent.

You could do all those things with more limited agents; it would just take longer and be less efficient.

Sure, we could do all these things with slide rules and scratch paper, given enough time and resources. But more powerful technology serves as an important multiplier in making things happen.

And air-cooled fission cores have amazing simplicity and power density.

What bugs me about the concept of a "seed AI" that incrementally rebuilds itself is that I don't see much difference between it and rigging a nuclear reactor to blow by packing too much highly enriched uranium too close together, or building an electrical panel where the wiring is all intermixed and there are bricks of explosives inside.

If you don't have a clean, rational design for an AI, with a clear purpose and a clear definition of success and failure for each subsystem, you've done a bad job of engineering one. We absolutely could develop AIs that automate most menial tasks, because those tasks are well defined. We could develop one that acts as a force multiplier for existing engineers: the engineer specifies the optimization parameters, and the AI produces candidate designs that, based on simulations and past experience, it predicts will best meet those parameters.
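
A minimal sketch of that workflow, assuming a deliberately simple problem: the engineer specifies a tip load, beam length, and maximum allowed deflection, and the program enumerates rectangular cross-sections, "simulates" each with the textbook cantilever formula δ = FL³/(3EI) with I = bh³/12, discards those that fail, and returns the lightest few. The grid, the steel-like material constants, and all names are hypothetical, and "past experience" is not modeled:

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple


@dataclass(frozen=True)
class Spec:
    """Optimization parameters chosen by the engineer (hypothetical example)."""
    load_n: float                      # tip load F
    length_m: float                    # beam length L
    max_deflection_m: float            # success criterion
    youngs_modulus_pa: float = 200e9   # roughly steel
    density_kg_m3: float = 7850.0      # roughly steel


@dataclass(frozen=True)
class Design:
    width_m: float
    height_m: float


def simulate(d: Design, s: Spec) -> Tuple[float, float]:
    """Return (tip deflection, mass) for a solid rectangular cantilever."""
    inertia = d.width_m * d.height_m ** 3 / 12  # I = b h^3 / 12
    deflection = s.load_n * s.length_m ** 3 / (3 * s.youngs_modulus_pa * inertia)
    mass = s.density_kg_m3 * d.width_m * d.height_m * s.length_m
    return deflection, mass


def propose(s: Spec, top_k: int = 3) -> List[Design]:
    """Enumerate candidates, keep those meeting the spec, lightest first."""
    sizes = [0.01 * i for i in range(1, 11)]  # 1 cm to 10 cm
    feasible: List[Tuple[float, Design]] = []
    for b, h in product(sizes, sizes):
        design = Design(b, h)
        deflection, mass = simulate(design, s)
        if deflection <= s.max_deflection_m:
            feasible.append((mass, design))
    feasible.sort(key=lambda pair: pair[0])
    return [design for _, design in feasible[:top_k]]


spec = Spec(load_n=1000.0, length_m=1.0, max_deflection_m=0.005)
for design in propose(spec):
    deflection, mass = simulate(design, spec)
    print(design, f"deflection={deflection * 1000:.2f} mm, mass={mass:.1f} kg")
```

Each stage has the clear success/failure definition the comment asks for: a candidate either meets the deflection limit or it doesn't, and survivors are ranked by a single number.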

Even harder problems, like treatments for aging and the design and production of nanomachinery, could be solved by limited-function, specialized agents acting as force multipliers. That's hardly the same thing as going back to paper and slide rules.