Central claim: ambitious alignment of superintelligent AI will not be solved in time; therefore, different variants of a plan B should be explored and implemented.
Below is a short overview of, and links to, my works on AI safety. The central theme of all of them is "plan B", so together they could be called the "plan B agenda".
Ambitious alignment of an arbitrary superintelligent AI can't be solved before AGI is created: a) the required proof is too complex and unreliable; b) it would need at least a superintelligent AI safety team to be created first; c) dangerous AI may appear soon, perhaps by 2030. Moreover, many different types of AI-related x-risks are possible, not only the paperclipper scenario.
A safer AI architecture is needed: data-driven AI with a human-like functional architecture for interpretability. Human-like AI could be modelled on a real-world person: it is not an upload, but something very similar to a side-load, e.g. an AI trained on that person's texts.
Not only a local but a global solution to AI safety is needed: an AI Nanny based on global surveillance and empowered by an agreement between nuclear powers. A human-like AI or the first upload could serve as the "virtual tsar" of such a system.
The nature of human values depends on who the "moral subject" is: society, the brain, the personality, or a subpersonality. Human values are not stable preferences. "Real choices" matter more than values, as "values" are not always a useful philosophical construction. Meta-values matter more still, e.g. life extension and personal universes, as they give everyone the freedom to have their own values.
An AI that is still learning human values is dangerous precisely because it does not yet know those values.
An unfriendly AI will not be strongly motivated to kill humans for their atoms, as those atoms have very little instrumental value, while living humans have greater instrumental value at every stage of the AI's evolution. But we have to help future AI, especially a "Young AI", to understand this. We could strengthen our position by making certain precommitments.
My works on AI Safety:
AI safety defense levels, a roadmap