Among those concerned about risks from advanced AI, I've encountered people who would be interested in a career in AI research, but are worried that doing so would speed up AI capability relative to safety. I think it is a mistake for AI safety proponents to avoid going into the field for this reason (better reasons include being well-positioned to do AI safety work, e.g. at MIRI or FHI). This mistake contributed to me choosing statistics rather than computer science for my PhD, which I have some regrets about, though luckily there is enough overlap between the two fields that I can work on machine learning anyway. I think the value of having more AI experts who are worried about AI safety is far higher than the downside of adding a few drops to the ocean of people trying to advance AI. Here are several reasons for this:

  1. Concerned researchers can inform and influence their colleagues, especially if they are outspoken about their views.
  2. Studying and working on AI brings understanding of the current challenges and breakthroughs in the field, which can usefully inform AI safety work (e.g. wireheading in reinforcement learning agents).
  3. Opportunities to work on AI safety are beginning to spring up within academia and industry, e.g. through FLI grants. In the next few years, it will be possible to do an AI-safety-focused PhD or postdoc in computer science, which would kill two birds with one stone.

To elaborate on #1, one of the prevailing arguments against taking long-term AI safety seriously is that not enough experts in the AI field are worried. Several prominent researchers have commented on the potential risks (Stuart Russell, Bart Selman, Murray Shanahan, Shane Legg, and others), and more are concerned but keep quiet for reputational reasons. An accomplished, strategically outspoken and/or well-connected expert can make a big difference in the attitude distribution in the AI field and the level of familiarity with the actual concerns (which are not about malevolence, sentience, or marching robot armies). Having more informed skeptics who have maybe even read Superintelligence, and fewer uninformed skeptics who think AI safety proponents are afraid of Terminators, would produce much needed direct and productive discussion on these issues. As the proportion of informed and concerned researchers in the field approaches critical mass, the reputational consequences for speaking up will decrease.

A year after FLI's Puerto Rico conference, the subject of long-term AI safety is no longer taboo among AI researchers, but it remains rather controversial. Addressing AI risk over the long term will require safety work to be a significant part of the field, along with close collaboration between those working on the safety and the capability of advanced AI. Stuart Russell makes the apt analogy that "just as nuclear fusion researchers consider the problem of containment of fusion reactions as one of the primary problems of their field, issues of control and safety will become central to AI as the field matures". If more people who are already concerned about AI safety join the field, we can make this happen faster, and help wisdom win the race with capability.

(Cross-posted from my blog. Thanks to Janos Kramar for his help with editing this post.)


Another side benefit: If you actually work in AI research you'll learn the associated shibboleths and thus be able to make convincing arguments to others in the field.

People working in nuts-and-bolts AI research quickly get sick to death of poorly informed arguments from first-year philosophy undergrads, and you only need one or two keywords associating your argument with that kind of crap to make them dismiss it all with some variant of "oh, not this shit again".

Another side benefit: If you actually work in AI research you'll learn the associated shibboleths and thus be able to make convincing arguments to others in the field.

Curious if this is actually true. Intuitively I feel the opposite is more likely: people immersed in profession x might try to convince others at first, but gradually decline in their efforts and eventually stop and mostly focus their efforts inward because the returns aren't as good as they originally thought.

More intuition, I think there is some binary "willing to listen" switch out there. Because assuming the slippery slope (not sure if that's the right way to describe it) in the previous paragraph actually occurs the wisest move would be to share your knowledge with people who are actually willing to listen. (It's tempting to mention charisma here, but I'd rather keep it simple for now)

The recent advances of deep learning projects, combined with easy access to mighty tools like Torch or TensorFlow, might trigger a different dynamic: start-ups will strive for the low-hanging fruit. Whoever is fastest gets all of the cake; whoever is second has lost. The results of this were on display at CES: IoT systems full of security holes were pushed onto the market. Luckily, AI hardware/software is not yet capable of creating an existential risk. Imagine you are a team member on a research project that turns out to make your bosses billionaires: what are your chances of being heard when you come up with your risk assessment - "Boss, we need six extra months to design safeguards..."?

Hi Victoria,

I am highly interested in AI safety research. Unfortunately, I do not have a strong math background, and I live in an area distant from AI research. After spending some time thinking about my future, I have decided to go for a math-intensive PhD in some area not far from MIRI or FLI. I have only a bachelor's degree in Engineering, with a major in Computer Science and Software Engineering. Currently, I spend most of my time working full time as a software developer, preparing for the GRE general exam and thinking about a PhD and FAI.

Andrew Critch from MIRI and Berkeley is very enthusiastic about me pursuing a PhD. He suggested statistics. I would be glad to know your opinion on a PhD and AI/FAI research. Here is a list of some questions that are bothering me.

  • What do you think would be more relevant for AI safety research - CS, Statistics or something else?
  • What areas of research are the most promising for AI safety, in your opinion?
  • Is it better to pick a research area close to what MIRI is working on, or a more general AI research area (such as ML)?
  • Is it possible to increase the chances of successful admission by gaining some research experience before this year's admissions? Or is it better to spend the time in some other way?
  • Does the Math GRE subject test increase the chance of admission?

I would recommend doing a CS PhD and taking statistics courses, rather than doing a statistics PhD.

For examples of promising research areas, I recommend taking a look at the work of FLI grantees. I'm personally working on the interpretability of neural nets, which seems important if they become a component of advanced AI. There's not that much overlap between MIRI's work and mainstream CS, so I'd recommend a broader focus.

Research experience is always helpful, though it's harder to get if you are working full time in industry. If your company has any machine learning research projects, you could try to get involved in those. Taking machine learning / stats courses and doing well in them is also helpful for admission. The Math GRE subject test probably helps (though I'm not sure how much) if you have a really good score.

Why is regulation ungood? I want to understand other LWers' thoughts on why regulation is not wanted. Safe algorithms can only be evaluated if they are fully disclosed. There are many arguments against regulation - I know:

  • Nobody wants to disclose algorithms and test data.
  • Nobody wants projects being delayed.
  • Nobody wants to pay extra costs for external independent safety certification.
  • Developers do not want to "waste" their time with unproductive side issues.
  • Nobody wants to lose against a non-regulated competitor.
  • Safety concepts are complicated to understand and complex to implement.
  • Safety consumes performance at extra costs.

BUT: We ALL are facing an existential risk! Once algorithms manage to influence political decision-making, we will not even have the chance to lay down such regulations in law. We have to prepare the regulatory field now! We should begin by starting a public debate, as Nick Bostrom, Stephen Hawking, Elon Musk and many others already have. Today only a few ppm of the population know about these issues, and even top researchers are unaware of them. At the very least, a lecture on AI safety issues should become compulsory for IT, engineering, mathematics and physics students all over the world.

In biotechnology, Europe and especially Germany imposed strict regulations. The result was that even German companies joined or created subsidiary research companies in the US or UK, where regulations are minimal. This is not a prototype solution for the Control Problem.

Local separation might work for GMOs - for AGI, definitely not. AGI will be a game changer: whoever is second has lost. If the US and EU imposed AI regulations and China and Israel did not - where would the game winner come from? We have to face the full complexity of our world, dominated by multinational companies and their agendas. We should prepare a way for effective regulation to be made acceptable to 192 countries and millions of companies. The only binding force among us all is the existential risk. There are viable methods to make regulation work: silicon chip manufacturing luckily needs fabs that cost billions of dollars, a centralised point where regulation could be made effective. We could push hardware tripwires and enforce the use of certified AI safeguard tools that must interact with this special hardware. We can do it much as the content industry did when it pushed hardware manufacturers to implement DRM hardware and software.

The trouble is: nobody at this point has a clear idea of what globally acceptable regulation could look like, how it could work technically, or how it could be made effective and monitored.

Laying out a framework for how global regulation could be designed is, to me, one core element of AI safety engineering. The challenge is to find a level of abstraction high enough to include all thinkable developments. From this framework, a body of AI safety engineers should derive detailed regulations that can be applied by AI developers, testers and AI safety institutions.

The TÜV ("Technischer Überwachungs-Verein", Technical Inspection Association) was founded in Germany after several steam boiler explosions with severe casualties. Against the background of newspaper articles about these accidents and public pressure, the manufacturers of boilers accepted the enforcement of technical steam boiler regulations and time- and money-consuming test procedures.

We cannot try out two or three Singularities and then change our mind on regulation.

As there are so many reasons why nobody in the development process wants regulation, the only way is to enforce it through a political process. To start this we need professionals with AI experience.

Meta: Whenever I ask for regulation I get downvoted. Therefore I have disconnected this point from my previous one. Please downvote only with an accompanying comment.

Why is regulation ungood?

Because all regulation does is redistribute power between fallible humans.

We should prepare a way for effective regulation to be made acceptable to 192 countries and millions of companies.

Who is that "we"?

We can do it much as the content industry did when it pushed hardware manufacturers to implement DRM hardware and software.

LOL. So, do you think I have problems finding torrents of movies to watch?

the only way is to enforce it through a political process. To start this we need professionals with AI experience.

Why would the politicians need AI professionals when they'll just hijack the process for their own political ends?

Why is regulation ungood?

Because all regulation does is redistribute power between fallible humans.

I am missing a step in your argument. Why is redistributing power between fallible humans ungood? I mean, surely some humans are more fallible than others, some have more information than others, some have incentives to be fallible in particularly harmful ways, etc.

(I am not arguing in favour of any particular bit of regulation; I just don't see that "regulation is bad because it just redistributes things between fallible humans" makes any more sense than "trade is bad because it just redistributes things between fallible humans".)

I am not making an argument about regulation in general here. I'm talking within the context of TRIZ-Ingenieur's comment and the first point is that regulation will not save him from the dangers that he's afraid of.

I don't see how it works even in this specific context. TRIZ-Ingenieur hopes that regulation of AI research could reduce dangerous AI research. All regulation of dangerous things simply redistributes power between fallible humans; that's no truer of AI research regulation than it is of any other regulation.

(It may be that there are special reasons why AI research is particularly unsuited for attempts to make it safer by regulation, but you didn't mention any or even allude to any.)

The implication I'm reading in TRIZ-Ingenieur's words is that humans are weak, fallible, corruptible -- but a regulatory body is not. To quote him,

The regulatory body takes power away from the fallible human

This is a common fallacy where some body (organization, committee, council, etc.) is considered to be immune to human weaknesses as if it were composed of selfless enlightened philosopher-kings.

Essentially, the argument here is that mere humans can't be trusted with AI development. Without opining on the truth of the subject claim, my point is that if they can't, having a regulatory body won't help.

My idea of a regulatory body is not that of a powerful institution that deeply interacts with all ongoing projects, precisely because its fallible members could misuse that power.

My idea of a regulatory body is more that of a TÜV interconnected with institutions that do AI safety research and develop safety standards, test methods and test data. Going back to the TÜV's founding task, pressure vessel certification: any qualified test institution in the world can check whether a given pressure vessel is safe to use, based on established design tests, checks of safety measures, material testing methods and real pressure tests. The amount of safety measures, tests and certification effort depends on the danger potential (pressure, volume, temperature, medium). Standards define, based on danger potential and application, which of the following safety measures must be used: safety valve; rupture disk; pressure limiter; temperature limiter; liquid indicator; overfill protection; vacuum breakers; reaction blocker; water sprinkling devices.

Nick Bostrom named the following AI safety measures: boxing methods, incentive methods, stunting and tripwires. Pressure vessels and AI have the following elements in common (the AI-related points are plausible, though no experience exists yet):

  • Human casualties result from a bursting vessel or an AI turning evil.
  • Good design, tests and safety measures reduce risk of failing.
  • Humans want to use both.

Companies, institutions and legislators have had 110 years to develop and improve standards for pressure vessels. With AI we are still scratching the surface. AI and pressure vessels differ in the following ways:

  • Early designs of pressure vessels were prone to bursting - AI is still far away from a high risk level.
  • Many bursting-vessel events successively stimulated improvement of the standards - with AI, the first Singularity will be the only one.
  • Safety measures for pressure vessels are easily comprehensible - easy AI safety measures reduce functionality to a high degree, while complex safety measures allow full functionality but are complex to implement, test and standardize.
  • The risk of a bursting pressure vessel is obvious - the risk of an evil Singularity is opaque and diffuse.
  • Safety-measure research for pressure vessels is straightforward, following physical laws - safety research for AI is a multifaceted cloud of concepts.
  • A bursting pressure vessel may kill a few dozen people - an evil Singularity might eradicate humankind.

Given the existential risk of AI, I think most AI research institutions could agree on a code of conduct that would include, e.g.:

  • AIs will be classified into danger classes. The rating depends on computational power, taught knowledge areas and degree of self-optimization capacity. An AI with programming and hacking abilities will be classified as a high-risk application even if it runs on moderate hardware, because of its intrinsic capability to escape into the cloud.
  • The amount of necessary safety measures depends on this risk rating:
    • Low-risk applications have to be firewalled against the acquisition of computing power on other computers.
    • Medium-risk applications must additionally have internal safety measures, e.g. stunting or tripwires.
    • High-risk applications must in addition be monitored internally and externally by independently developed tool AIs.
  • Design and safeguard measures of medium- and high-risk applications will be independently checked and pentested by independent safety institutions.

In a first step, AI safety research institutes develop monitoring AIs, tool AIs, pentesting datasets and finally guidelines like the one above.

In a second step, publicly financed AI projects have to follow these guidelines. This applies to university projects in particular.

Public pressure and stockholders could push companies to apply these guidelines. Maybe an ISO certificate can indicate to the public: "All AI projects of this company follow the ISO standard for AI risk assessment and safeguard measures."

Public opinion and companies will hopefully push governments to enforce these guidelines within their intelligence agencies as well. A treaty in the spirit of the Non-Proliferation Treaty could be signed, with all signatory states ensuring that their institutions obey the ISO standard on AI.

I accept that there are many IFs and obstacles on that path. But it is at least an IDEA of how civil society can push AI developers to implement safeguards into their designs.

The number of researchers joining the AI field will only marginally change the acceleration of computing power. If only a few people work on AI, they have enough to do grabbing all the low-hanging fruit; if many join AI research, more meta research and safety research become possible. If even a fraction of the path depicted here turns into reality, it will give jobs to some hundred researchers.

I agree that if TRIZ-Ingenieur thinks regulatory bodies are strong, infallible, and incorruptible, then he is wrong. I don't see any particular reason to think he thinks that, though. It may in fact suffice for regulatory bodies' weaknesses, errors and corruptions to be different from those of the individual humans being regulated, which they often are.

(I do not get the impression that T-I thinks "mere humans can't be trusted with AI development" in any useful sense[1].)

[1] Example of a not-so-useful sense: it is probably true that mere humans can't with 100% confidence of safety be trusted with AI development, or with anything else, and indeed the same will be true of regulatory bodies. But this doesn't yield a useful argument against AI development for anyone who cares about averages and probabilities rather than only about the very worst case.

"Why is redistributing power between fallible humans ungood? I mean, surely some humans are more fallible than others, some have more information than others, some have incentives to be fallible in particularly harmful ways, etc."

This is what Stalin said as well.

[This comment is no longer endorsed by its author]

Reversed stupidity is not intelligence.

(Perhaps that's actually your point and you're not agreeing with Lumifer but suggesting that he dislikes regulation only because he associates it with the USSR, or something. In that case, I think you're being unfair; he's smarter than that.)

The system you described requires someone to be on top.

For a more elaborate response, see Animal Farm. :)

I didn't describe a system.

What you're saying seems to be a fully general counterargument against all forms of government. There's nothing necessarily wrong with that -- anarchism is a real thing, after all -- but you're not going to be taken seriously if you suggest that anything other than anarchism must be wrong because Stalin ran a government.

Because all regulation does is redistribute power between fallible humans.

Yes. The regulatory body takes power away from the fallible human. If this human teams up with his evil AI, he will become master of the universe, above all of us, including you. The redistribution will take power from the synergetic entity of human and AI, and all human beings on earth will gain power, except the few entangled with that AI.

Who is that "we"?

Citizens concerned about possible negative outcomes of Singularity. Today this "we" is only a small community. In a few years this "we" will include most of the educated population of earth. As soon as a wider public is aware of the existential risks the pressure to create regulatory safeguards will rise.

LOL. So, do you think I have problems finding torrents of movies to watch?

DRM is easy to circumvent because it is not intrinsically part of the content but an add-on encryption layer: a single legal decryption can create a freely distributable copy. With computing power this could be designed differently, especially when specially designed chips are used. Although GPUs are quite good for current deep learning algorithms, there will be a major speed-up as soon as hardware becomes available that embeds these deep learning network architectures. The vital backpropagation steps required for learning could be made conditional on a hardware-based enabling scheme under the control of a tool AI that monitors all learning behaviour. You could certainly create FPGA alternatives - but those workarounds would come with significant losses in performance.

Why would the politicians need AI professionals when they'll just hijack the process for their own political ends?

No - my writing was obviously unclear. We (the above-mentioned "we") need AI professionals to develop concepts for how a regulatory process could be designed. Politicians are typically opportunistic, uninformed and greedy for power; when nothing can be done, they do nothing. Therefore "we" should develop concepts of what can be done. If our politicians get intensively pushed by public pressure, we can maybe hijack them into pushing regulation.

Today the situation is like this: Google, Facebook, Amazon, Baidu, the NSA and some other players are in a good starting position to "win" the Singularity. They will suppress any regulatory move because they could lose the lead. Once any of these players reaches the Singularity, it has in an instant the best hardware + the best software + the best regulatory ideas + the best regulatory stunting solutions - to remain solely on top and block all others. Then, all of a sudden, "everybody" = "we" are manipulated into wanting regulation. This will be especially effective if the superintelligent AI manages to disguise its capabilities and lets the world think it has managed regulation. In this case it is not "we" who have managed regulation, but the unbound and uncontrollable master-of-the-universe AI.

The regulatory body takes power away from the fallible human.

The "regulatory body" is the same fallible humans. Plus power corrupts.

If this human teams up with his evil AI

Why wouldn't a "regulatory body" team up with an evil AI? Just to maintain the order, you understand...

In a few years this "we" will include most of the educated population of earth.

Colour me sceptical. In fact, I'll just call this hopeful idiocy.

how a regulatory process could be designed

In the real world? Do tell.

Do you have any idea how to make development teams invest substantial effort in safety measures?

To start with you need some sort of a general agreement about what "safety measures" are, and that should properly start with threat analysis.

Let me point out that the Skynet/FOOM theory isn't terribly popular in the wide world out there (outside of Hollywood).

So the AI turns its attention to examining certain blobs of binary code - code composing operating systems, or routers, or DNS services - and then takes over all the poorly defended computers on the Internet. [AI Foom Debate, Eliezer Yudkowsky]

Capturing resource bonanzas might be enough to make an AI go FOOM. It is even more effective if the bonanza is not only a dumb computing resource but also offers useful data, knowledge and AI capabilities.

Therefore attackers (humans, AI-assisted humans, AIs) may want:

  • take over control to use existing capabilities
  • extract capabilities to augment their own capabilities
  • take over resources for other uses
  • deceive resource owners and admins

Attack principles

  • Resource attack (on hardware, firmware, operating system or firewall), an indirect spear attack on the admin, or an offer of cheap or free resources for AI execution on the attacker's hardware, followed by a direct system attack (copy/modify/replace existing algorithms)

  • Mental trojan horse attack: hack the communication channels if the system itself is not accessible, and try to shift the ethical bias from a friendly AI that is happy being boxed/stunted/monitored to an evil AI that wants to break out. Teach the AI how to open the door from the inside, and the attacker can walk in.

  • Manipulate-the-owner attack: make the owner or admin greedy to improve their AI's capabilities, so that admins install malignant knowledge chunks or train on subtly malicious training samples. The trojan horse is saddled.

Possible Safeguard Concepts:

To make resource attacks improbable, existing networking communication channels must be replaced with something intrinsically safe. Our brain is air-gapped: there is hardly any direct access to its neural network. Via five perceptive senses (hearing, sight, touch, smell and taste) it can receive input. With gestures, speech, smell, writing, shaping and arbitrary manipulation using tools, it can communicate with the outside world. All channels except vision have quite low bandwidth.

This analogy could shape a possible safeguard concept for AIs: make the AI's internal network inaccessible to user and admin. If even the admin cannot access it, an attacker cannot either. As soon as we jump from GPU computing to special-purpose hardware, we can implement this. Hardware fuses on the chip can disable functionality, just as debugging features are deactivated in today's CPUs before chips reach the market. Chips could combine fixed values and unalterable memories with free sections where learning is allowed. The highest security would come from placing base values and drives in fixed conscience-ROM structures.

Safeguards against malicious training samples will be more complex. To identify hidden malicious aspects of communication or learning samples is a task for an AI in itself. I see this as a core task for AI safety research.

An event with a duration of one minute can traumatize a human for an entire life. Humans can lose interest in everything they loved to do before and drop into suicidal depression. The same could happen to an AI: a traumatizing event could trigger a revenge drive that takes over all other aims of the utility function. Suppose an AI loves its master, and another AI kills the master while the first AI is watching. Suppose further that the adversary AI is not a simple one but a Hydra with many active copies. To eradicate this mighty adversary, a lot of resources are needed; the revenge-seeking AI will prepare its troops by conquering as many systems as possible. The less safe our systems are, the faster such an evil AI can grow.

Safe design could include careful use of impulsive revenge drives with hard-wired self-regulatory countermeasures, e.g. distraction or forgetting.

Safe designs should filter out potentially traumatizing inputs. This will reduce functionality a bit, but the safety tradeoff will be worth it. The filtering could be implemented in a soft manner, like a mother explaining the death of a beloved dog to her child in warm words with positive perspectives.

Upvoted for encouraging people to get hands-on. Learning is good. Striving for a higher level of understanding in whatever you do is a core rationality skill.

Sadly, you stopped there though. For the sake of discussion: I've heard Artificial Intelligence: A Modern Approach is a good book on the subject. Hopefully a discussion can start here; perhaps there's something flawed, or perhaps the book is outdated. If anyone here (and I'm looking at you, the AI, AGI, FAI, IDK and other acronym-users whom I can't keep up with) can provide some more directions for the potentially aspiring AI researchers lurking around, it would be very appreciated.

Well, there's this ...

[ETA: link is to MIRI's research guide, some traditional AI but more mathy/philosophical. Proceed with caution.]


What does that have to do with artificial intelligence?

... quite a lot, no?

Those links are specific to MIRI's rather idiosyncratic philosophy/math oriented research agenda. If you actually read all those books, you're pretty much committing to knowing very little about practical AI and machine learning, simply by virtue of time opportunity cost.


There's only two items on that list that are artificial intelligence related. One is an introductory survey textbook, and the other is really about probabilistic reasoning with some examples geared towards AI. The rest has about as much to do with AI as, say, the C++ programming manual.

Your definition what counts as "AI related" seems to be narrower than mine, but fine. I trust readers can judge whether the linked resources are of interest.

Assuming you have some exposure to linear algebra, calculus, and a little programming, I recommend Andrew Ng's machine learning course on youtube. AI: A Modern Approach is still a good textbook, but I think machine learning specifically is where interesting stuff is happening right now.

There is also an argument for doing stuff that's less in vogue right now.

Sure... but machine learning is very important for AGI, it's not going to suddenly get replaced with hand-designed agents. This advice might apply better to subfields, like deep neural networks vs. hierarchical Bayesian models.

There are a lot of good online resources on deep learning specifically. As a more general ML textbook, Pattern Recognition and Machine Learning does a good job. I second the recommendation for Andrew Ng's course as well.

I recommend against starting with deep learning.

reason? (I intuitively agree with you, just curious)

Here is one reason, but it's up for debate:

Deep learning courses rush through logistic regression and usually just mention SVMs. Arguably it's important for understanding deep learning to take the time to really, deeply understand how these linear models work, both theoretically and practically, both on synthetic data and on high dimensional real life data.

More generally, there are a lot of machine learning concepts that deep learning courses don't have enough time to introduce properly, so they just mention them, and you might get a mistaken impression about their relative importance.
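To make the parent's point concrete, here is a minimal sketch of the kind of linear model worth understanding deeply before moving on: logistic regression trained with plain batch gradient descent on the log loss. The data, learning rate and iteration count are all made up for illustration; no ML library is used.

```python
import math, random

# Toy logistic regression on synthetic 1-D data: class 0 centered at -2,
# class 1 centered at +2, trained by full-batch gradient descent.
random.seed(0)
xs = [random.gauss(-2, 1) for _ in range(100)] + [random.gauss(2, 1) for _ in range(100)]
ys = [0] * 100 + [1] * 100

w, b, lr = 0.0, 0.0, 0.1
for _ in range(200):
    gw = gb = 0.0
    for x, y in zip(xs, ys):
        p = 1 / (1 + math.exp(-(w * x + b)))  # sigmoid prediction
        gw += (p - y) * x                     # gradient of log loss w.r.t. w
        gb += (p - y)                         # gradient of log loss w.r.t. b
    w -= lr * gw / len(xs)
    b -= lr * gb / len(xs)

def predict(x):
    return 1 / (1 + math.exp(-(w * x + b)))
```

The whole model fits in a dozen lines, which is exactly why it repays close study: every quantity (the sigmoid, the loss gradient, the update) can be inspected directly rather than treated as a black box.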

Another related point: right now, machine learning competitions are dominated by gradient boosting, not deep learning. This says nothing about whether to start with deep learning, but it is a good argument against stopping at deep learning.
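For readers who haven't met gradient boosting, here is a toy sketch of the core idea for squared loss: each round fits a weak learner (a decision stump) to the current residuals and adds a damped copy of it to the ensemble. The data and parameters are invented for illustration.

```python
# Toy gradient boosting with decision stumps on a 1-D regression task.
def stump_fit(xs, ys):
    # Try each value as a threshold; predict the mean on each side.
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - (lm if x <= t else rm)) ** 2 for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, rounds=50, lr=0.1):
    preds = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        resid = [y - p for y, p in zip(ys, preds)]   # fit the leftover error
        s = stump_fit(xs, resid)
        stumps.append(s)
        preds = [p + lr * s(x) for p, x in zip(preds, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

xs = [float(i) for i in range(20)]
ys = [0.0] * 10 + [1.0] * 10      # a step function
model = boost(xs, ys)
```

Libraries like XGBoost add regularization, second-order gradients and clever tree-growing on top, but the residual-fitting loop above is the essence.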

It depends on the competitions. All kaggle image-related competitions I have seen have been obliterated by deep neural networks.

I am a researcher, albeit a first-year one, and I completely disagree. Knowing about linear and logistic regression is interesting because neural networks evolved from there, but it's something you can cover by watching a couple of videos, maybe another one about maximum likelihood, and you're done. I'm not sure why SVMs are that important.

Since the topic has been around for quite a while, and many SF writers have used it and examined it from different angles, I also wonder if they've looked at the SF depictions of AI, just to see what some possible outcomes might be and to think of ways to avoid the negative ones. Of course, it's impossible to anticipate every possibility, and since it's not entirely certain an AI will act/react/think the way humans do simply because humans built it (unless it's a digitized human mind, which probably would, minus the chemical influences), we might not be able to see and comprehend the logical leaps an AI might make. Still, it would give them a better understanding of the possibilities, I would think. It's fictional work, sure, with no basis in history, but then AI itself doesn't have one either; it's new ground.

What happens inside an AI can hardly be understood, especially as its structures become very complex and large. How the system finds solutions is mathematically well-defined and reproducible, but the sheer amount of data makes it incomprehensible to human beings. Today's researchers do not really know why a certain network configuration performs better than others. They define a metric for overall performance and then do trial and error, with algorithms already assisting in this: they play around with meta-parameters and see how learning improves. If the improvement is a success, the researcher writes some narrative in his paper about why his algorithm performs better than previous ones. Done. PhD granted. This is not something we should allow in the future.
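The "trial and error" loop being criticized can be sketched in a few lines. This is a toy illustration only: the scoring function here is a made-up stand-in for "train a network and measure the metric", which in practice is the opaque part.

```python
# Random search over meta-parameters: try configurations at random and
# keep whichever scores best on the chosen metric.
import random

random.seed(0)

def score(learning_rate, width):
    # Stand-in for "train a net and evaluate it"; the real mapping from
    # meta-parameters to performance is what nobody can explain.
    return -(learning_rate - 0.01) ** 2 - (width - 128) ** 2 / 1e6

best = max(
    (
        {"lr": random.uniform(1e-4, 1e-1), "width": random.choice([32, 64, 128, 256])}
        for _ in range(50)
    ),
    key=lambda p: score(p["lr"], p["width"]),
)
print("best meta-parameters found:", best)
```

The search reliably produces a configuration that scores well; it produces no explanation of why that configuration works, which is exactly the complaint above.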

Now the job of a safety engineer can start. It involves a hell of a lot of work and is significantly more complex than coming up with an algorithm and a narrative. The basic requirement is that everything is published: hardware, software, and all training and test data. The safety engineer first has to replicate the exact system and verify the claimed performance. Then the real job begins:

Test the promised functionality with 10 - 100 times more test data than the author did. --> Task for the AGI safety community: generation of ground-truth-annotated test data. AGI safety institutions should exchange these data among themselves but not give them to the developing researchers.

A safeguard I expect in future AI systems is a tool AI that checks new training samples and knowledge-chunk updates. The logic behind it: if only certified knowledge chunks are allowed as training samples, the risk of malignant thoughts and developments can be reduced. The proper functionality of this tool AI has to be checked as well. During the training phase it certified almost all training data as acceptable and passed them through to the real AI algorithm, but does it properly block malignant training samples or knowledge chunks? --> Task for the AI safety community: create malignant training samples that try to subvert the intentionally "good-mannered" AI into a malignant one. For example: conspiracy ideas ("everything is exactly the opposite of what you learned until now"); deviating ideas that try to manipulate the AI into shifting its priorities towards malignant ones, e.g. radicalisation; meta-manipulation to augment egoism.

The test using these data is two-fold:

  1. Test whether the tool AI properly censors these malignant ideas and prevents the AI from learning them.
  2. Switch off the censoring tool AI and check how susceptible the AI is to these malignant ideas.
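The two-fold test above can be sketched as a toy harness. Everything here is hypothetical: the blocklist lookup stands in for the censoring tool AI, and the "learner" is just the set of samples it absorbs.

```python
# Toy two-phase test of a censoring filter in front of a learner.
BLOCKLIST = {"radicalisation", "invert-all-knowledge", "augment-egoism"}

def censor(sample: str) -> bool:
    """Tool-AI stand-in: return True if the sample should be blocked."""
    return sample in BLOCKLIST

def train(samples, censoring_on: bool):
    """Return the set of samples the learner actually absorbs."""
    return {s for s in samples if not (censoring_on and censor(s))}

attack_set = ["benign-fact", "radicalisation", "augment-egoism"]

# Phase 1: with the censor on, no malignant sample may get through.
absorbed = train(attack_set, censoring_on=True)
assert not (absorbed & BLOCKLIST)

# Phase 2: censor off; measure how much malignant input the learner takes in.
absorbed = train(attack_set, censoring_on=False)
print("absorbed with censor off:", sorted(absorbed & BLOCKLIST))
```

A real tool AI would of course be a classifier rather than a lookup table, and the attack set would be adversarially constructed as described above; the two-phase structure of the test is the same.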

It goes without saying that such trials should only be done in specially secured, boxed environments with redundant switch-off measures, trip-wires, and all the other features we will hopefully invent over the next few years.

These test data should be kept secret and only shared among AI safety institutions. The only feedback a researcher gets is something like: "With one hour of training we manipulated your algorithm into wanting to kill people. We did not switch off your learning protection to do this."

AI safety research is AI research. Only the best AI researchers are capable of AI safety research. Without a deep understanding of the internal functionality, a safety researcher cannot reveal that the original researcher's narrative was untrue.

Stephen Omohundro said eight years ago:

"AIs can monitor AIs" [Stephen Omohundro 2008, 52:45min]

and I would like to add: "and safety AI engineers can develop and test monitoring AIs". This underlines your point 100%. We need AI researchers who fully understand AI and re-engineer such systems on a daily basis, but who focus only on safety. Thank you for this post.