What happens inside an AI can hardly be understood especially if structures get very complex and large. How the system finds solutions is mathematically clear and reproducible. But huge amounts of data make it incomprehensible to human beings. Today's researchers do not really know why a certain net configuration performs better than others. They define a metric to measure total performance - and do trial and error. Algorithms assist already with this. They play around with meta parameters and see how learning improves. Given that the improvement was a success the researcher will write some narrative in his paper why his algorithm performs better than previous others. Done. PhD granted. This is not what we should allow in the future.

Now the job of a safety engineer can start. It involves hell a lot of work and has a significant higher complexity than coming up with an algorithm and a narrative. The basic requirement is that everything is published - hardware, software, all training and test data. The safety engineer first hast to copy the exact system and check the promised performance. Then the real job begins:

Test the promised functionality with 10 - 100 times more test data than the author did. --> Task for AGI safety community: generation of ground truth annotated test data. AGI safety institutions should exchange these data among themselves but do not give it to developing researchers.

A saveguard I expect in future AI systems will be a tool AI that checks new training samples and update knowledge chunks. The logic behind: if only certified knowledge chunks are allowed as training samples the risk of malignant thoughts and developments can be reduced. The proper functionality of this tool AI has to be checked as well. In the training phase it certified most all training data to be acceptable and passed them through to the real AI algorithm. But does it properly block malignent training samples or knowledge chunks? --> task for AI safety community: create malignant training samples that try to subvert the intentionally "good-mannered" AI into a malignant one: Conspiracy ideas: everything you learned is exactly the opposite of what you learned until now; deviating ideas try to manipulate the AI that it shifts its priorities towards malignant ones, e.g. radicalisation; meta-manipulation to augment egoism.

The test using these data is two-folded:

  1. Test the tool-AI whether it properly censors these malignant ideas and hinders them that the AI learns these malignant ideas.
  2. Switch off the censoring tool AI and check how prone the AI is to these malignant ideas.

It goes without saying that such trials should only be done in special security boxed environments with redundant switch-off measures, trip-wires and all other features we hopefully will invent the next few years.

These test data should be kept secret and only to be shared among AI safety institutions. The only result a researcher will get as feedback like:"With one hour training we manipulated your algorithm that it wanted to kill people. We did not switch off your learning protection for this. "

Safety AI research is AI research. Only the best AI researchers are capable of AI safety research. Without deep understanding of internal functionality a safety researcher cannot reveal that the researcher's narrative was untrue.

Stephen Omohundro said eight years ago:

"AIs can monitor AIs" [Stephen Omohundro 2008, 52:45min]

and I like to add: - "and safety AI engineers can develop and test monitoring AIs". This underlines your point to 100%. We need AI researchers who fully understand AI and re-engineer such systems on a daily basis but focus only on safety. Thank you for this post.

To contribute to AI safety, consider doing AI research

by Vika 1 min read16th Jan 201639 comments


Among those concerned about risks from advanced AI, I've encountered people who would be interested in a career in AI research, but are worried that doing so would speed up AI capability relative to safety. I think it is a mistake for AI safety proponents to avoid going into the field for this reason (better reasons include being well-positioned to do AI safety work, e.g. at MIRI or FHI). This mistake contributed to me choosing statistics rather than computer science for my PhD, which I have some regrets about, though luckily there is enough overlap between the two fields that I can work on machine learning anyway. I think the value of having more AI experts who are worried about AI safety is far higher than the downside of adding a few drops to the ocean of people trying to advance AI. Here are several reasons for this:

  1. Concerned researchers can inform and influence their colleagues, especially if they are outspoken about their views.
  2. Studying and working on AI brings understanding of the current challenges and breakthroughs in the field, which can usefully inform AI safety work (e.g. wireheading in reinforcement learning agents).
  3. Opportunities to work on AI safety are beginning to spring up within academia and industry, e.g. through FLI grants. In the next few years, it will be possible to do an AI-safety-focused PhD or postdoc in computer science, which would hit two birds with one stone.

To elaborate on #1, one of the prevailing arguments against taking long-term AI safety seriously is that not enough experts in the AI field are worried. Several prominent researchers have commented on the potential risks (Stuart Russell, Bart Selman, Murray Shanahan, Shane Legg, and others), and more are concerned but keep quiet for reputational reasons. An accomplished, strategically outspoken and/or well-connected expert can make a big difference in the attitude distribution in the AI field and the level of familiarity with the actual concerns (which are not about malevolence, sentience, or marching robot armies). Having more informed skeptics who have maybe even read Superintelligence, and fewer uninformed skeptics who think AI safety proponents are afraid of Terminators, would produce much needed direct and productive discussion on these issues. As the proportion of informed and concerned researchers in the field approaches critical mass, the reputational consequences for speaking up will decrease.

A year after FLI's Puerto Rico conference, the subject of long-term AI safety is no longer taboo among AI researchers, but remains rather controversial. Addressing AI risk on the long term will require safety work to be a significant part of the field, and close collaboration between those working on safety and capability of advanced AI. Stuart Russell makes the apt analogy that "just as nuclear fusion researchers consider the problem of containment of fusion reactions as one of the primary problems of their field, issues of control and safety will become central to AI as the field matures". If more people who are already concerned about AI safety join the field, we can make this happen faster, and help wisdom win the race with capability.

(Cross-posted from my blog. Thanks to Janos Kramar for his help with editing this post.)