"I've come to agree that navigating the Singularity wisely is the most important thing humanity can do. I'm a researcher and I want to help. What do I work on?"

The Singularity Institute gets this question regularly, and we haven't published a clear answer to it anywhere. This is because it's an extremely difficult and complicated question. A large expenditure of limited resources is required to make a serious attempt at answering it. Nevertheless, it's an *important* question, so we'd like to work toward an answer.

A few preliminaries:

- **Defining each problem is part of the problem**. As Bellman (1961) said, "the very construction of a precise mathematical statement of a verbal problem is itself a problem of major difficulty." Many of the problems related to navigating the Singularity have not yet been stated with mathematical precision, and the need for a precise statement of the problem is *part* of these open problems. But there is reason for optimism. Many times, particular heroes have managed to formalize a previously fuzzy and mysterious concept: see Kolmogorov on complexity and simplicity (Kolmogorov 1965; Grünwald & Vitányi 2003; Li & Vitányi 2008), Solomonoff on induction (Solomonoff 1964a, 1964b; Rathmanner & Hutter 2011), Von Neumann and Morgenstern on rationality (Von Neumann & Morgenstern 1947; Anand 1995), and Shannon on information (Shannon 1948; Arndt 2004).
- **The nature of the problem space is unclear**. Which problems will biological humans need to solve, and which problems can a successful FAI solve on its own (perhaps with the help of human uploads it creates to solve the remaining open problems)? Are Friendly AI (Yudkowsky 2001) and CEV (Yudkowsky 2004) coherent ideas, given the confused nature of human "values"? Should we aim instead for a "maxipok" solution (Bostrom 2011) that maximizes the chance of an "ok" outcome, something like Oracle AI (Armstrong et al. 2011)? Which problems are we unable to state with precision because they are irreparably confused, and which problems are we unable to state due to a lack of insight?
- **Our research priorities are unclear**. There are a limited number of capable researchers who will work on these problems. Which are the most important problems they should be working on, if they are capable of doing so? Should we focus on "control problem" theory (FAI, AI-boxing, oracle AI, etc.), or on strategic considerations (differential technological development, methods for raising the sanity waterline, methods for bringing more funding to existential risk reduction and growing the community of x-risk reducers, reducing the odds of AI arms races, etc.)? Is AI more urgent than other existential risks, especially synthetic biology?
- **Our intervention priorities are unclear**. Is research the most urgent thing to be done, or should we focus on growing the community of x-risk reducers, raising the sanity waterline, bringing in more funding for x-risk reduction, etc.? Can we make better research progress in the next 10 years if we work to improve sanity and funding for 7 years and *then* have the resources to grab more and better researchers, or can we make better research progress by focusing on research now?

Next, a division of labor into "problem categories." There are many ways to categorize the open problems; some of them are probably more useful than the one I've chosen below.

- **Safe AI Architectures**. This may include architectures for securely confined or "boxed" AIs (Lampson 1973), including Oracle AIs, and also AI architectures that could "take" a safe set of goals (resulting in Friendly AI).
- **Safe AI Goals**. What could it mean to have a Friendly AI with "good" goals?
- **Strategy**. A huge space of problems. How do we predict the future and make recommendations for differential technological development? Do we aim for Friendly AI or maxipok solutions or both? Do we focus on growing support now, or do we focus on research? How should we interact with the public and with governments?

The list of open problems below is *very* preliminary. I'm sure there are many problems I've forgotten, and many problems I'm unaware of. Probably *all* of the problems are stated relatively poorly: this is only a "first step" document. Certainly, all listed problems are described at an extremely "high" level, very far away (so far) from mathematical precision, and can be broken down into several and often *dozens* of subproblems.

### Safe AI Architectures

- Is rationally-shaped (Omohundro 2011) "transparent" AI the only safe AI architecture? Is it the only one that can take safe goals?
- How can we develop a reflective decision theory: one that doesn't go into infinite loops or stumble over Löb's Theorem?
- How can we develop a timeless decision theory (Yudkowsky 2010) with the bugs worked out (e.g. blackmailing, 5-and-10 problem)?
- How can we modify a transparent AI architecture like AIXI (Hutter 2004) to have a utility function over the external world (Dewey 2011)? Does this keep a superintelligence from wireheading or shutting itself off?
- How can an AIXI-like agent keep a stable utility function through ontological shifts (De Blanc 2011)?
- How would an ideal agent with infinite computing power choose an ideal prior? (A guess: we'd need an anthropic, non-Cartesian, higher-order-logic version of Solomonoff induction.) How can this process be approximated computably and tractably?
- What is the ideal theory of how to handle logical uncertainty?
- What is the ideal computable approximation of perfect Bayesianism?
- Do we need to solve anthropics, or is it perhaps a confused issue resulting from underspecified problems (Armstrong 2011)?
- Can we develop a safely confined ("boxed") AI? Can we develop Oracle AI?
- What convergent instrumental goals can we expect from superintelligent machines (Omohundro 2008)?

### Safe AI Goals

- Can "safe" AI goals only be derived from contingent "desires" and "goals," or might value "fall out of" game theory + decision theory, as in a more robust form of what Drescher (2006) attempts?
- Are CEV and Friendly AI coherent ideas?
- How do we construe a utility function from what humans "want"? How should human values be extrapolated?
- Why extrapolate the values of humans alone? What counts as a human? Do we need to scan the values of all humans? Do values converge if extrapolated? Under which extrapolation algorithms?
- How do we assign measure to beings in an infinite universe (Knobe 2006; Bostrom 2009)? What can we make of other possible laws of physics (Tegmark 2005)?
- Which kinds of minds/beings should we assign value to (Bostrom 2006)?
- How should we deal with normative uncertainty (Sepielli 2009; Bostrom 2009)?
- Is it possible to program an AI to do what is "morally right" rather than give it an extrapolation of human goals?

### Strategy

- What methods can we use to predict technological development (Nagy 2010)?
- Which kinds of differential technological development should we encourage, and how?
- Which open problems are safe to discuss, and which are potentially highly dangerous, like the man-made super-flu that "could kill half of humanity"?
- What can we do to reduce the risk of an AI arms race?
- What can we do to raise the sanity waterline, and how much will this help?
- What can we do to attract more funding, support, and research to x-risk reduction and to specific sub-problems of successful Singularity navigation?
- Which interventions should we prioritize?
- How should x-risk reducers and AI risk reducers interact with governments and corporations?
- How can optimal philanthropists get the most x-risk reduction for their philanthropic buck?
- How does AI risk compare to other existential risks?
- How can we develop microeconomic models of WBEs and self-improving systems? Can this help us predict takeoff speed and the likelihood of monopolar (singleton) vs. multipolar outcomes?

My thanks for some notes written by Eliezer Yudkowsky, Carl Shulman, and Nick Bostrom, from which I've drawn.

From my, arguably layman, perspective it seems that making progress on a lot of those problems makes unfriendly AI more probable as well. If for example you got an ideal approximation of perfect Bayesianism, this seems like something that could be used to build any sort of AGI.

Not literally "any sort of AGI" of course, but... yes, several of the architecture problems required for FAI also make uFAI more probable. Kind of a shitty situation, really.

Wikipedia says Steve Omohundro has "discovered that rational systems exhibit problematic natural 'drives' that will need to be countered in order to build intelligent systems safely."

Is he referring to the same problem?

EDIT: I answered my question by finding this.

This is the best thing ever, Luke. Thanks for your work.

Some possibly relevant work:

- Bayesian Networks for Logical Reasoning by Williamson
- Unifying Logical and Probabilistic Reasoning by Haenni
- Recursive Causality in Bayesian Networks and Self-Fibring Networks by Williamson and Gabbay
- Possible Semantics for a Common Framework of Probabilistic Logics by Haenni, Romeijn, Wheeler, and Williamson
- Reasoning with limited resources and assigning probabilities to arithmetical statements by Gaifman
- Non-deductive Logic in Mathematics by Franklin
- A Derivation of Quasi-Bayesian Theory by Cozman
- Slightly More Realistic Personal Probability by Hacking
- On Not Being Rational by Savage
- Knowledge and the Problem of Logical Omniscience by Parikh
- Belief, Awareness, and Limited Reasoning by Fagin and Halpern
- A Nonstandard Approach to the Logical Omniscience Problem by Fagin, Halpern, and Vardi
- Old Evidence and Logical Omniscience in Bayesian Confirmation Theory by Garber
- Old Evidence, Logical Omniscience & Bayesianism by Fitelson
- A deduction model of belief by Konolige
- Using the Probabilistic Logic Programming Language P-log for Causal and Counterfactual Reasoning and Non-naive Conditioning by Baral and Hunsaker
- Decision Theory without Logical Omniscience by Lipman
- Objective Probabilities in Number Theory by Ellenberg and Sober
- Sentences, Propositions and Logical Omniscience by Parikh
- Maximum Entropy Probabilistic Logic by Paskin
- Towards a philosophy of real mathematics by Corfield
- Some puzzles about probability and probabilistic conditionals by Parikh
- Probabilistic Conditionals Are Almost Monotonic by Johnson and Parikh
- Probabilistic Proofs and the Collective Epistemic Goals of Mathematicians by Fallis
- Probabilistic Proofs and Transferability by Easwaran
- Randomized Arguments are Transferable by Jackson
- The philosophy of mathematical practice by Paolo Mancosu
- Dynamic Probability, Computer Chess, and the Measurement of Knowledge by Good
- Fully abstract compositional semantics for logic programs by Gaifman and Shapiro
- Putting Logic in its Place by Christensen
- A Hybrid Framework for Representing Uncertain Knowledge by Saffiotti

Thanks!

My next task was going to be summarizing logical uncertainty insights made in lesswrong comments and posts, but I found Wei Dai's list of resources, which led to a new search of academic literature. My reading list, in decreasing order of importance, now looks like:

- Bayesian Networks for Logical Reasoning by Williamson
- Unifying Logical and Probabilistic Reasoning by Haenni
- Recursive Causality in Bayesian Networks and Self-Fibring Networks by Williamson and Gabbay
- Possible Semantics for a Common Framework of Probabilistic Logics by Haenni, Romeijn, Wheeler, and Williamson
- Non-deductive Logic in Mathematics by Franklin
- A Derivation of Quasi-Bayesian Theory by Cozman
- Decision Theory without Logical Omniscience by Lipman
- Slightly More Realistic Personal Probability by Hacking

An important step towards answering "What are the most important problems in your field, and why aren't you working on them?"

Editing and publishing any existing novel results, provided EY and others have some, should be the top priority. Surely there is an arxiv for that.

Indeed there are. TDT, for example, has not yet received an academic writeup. There are lots of ideas scattered through LW which could be published in journals. And the great thing about academic writing is that you are allowed to use other people's ideas, as long as you cite them. You are considered to be doing them a favor when you do that.

In general, this means that one sprinkles another person's ideas within one's own analysis; if a direct rewrite of, e.g., the TDT paper, for a journal is intended, then the original non-academic author should get credit as co-author.

I understand the point that it might not be worth the time of EY or other SI Fellows to publish ideas in journals. But if some lesser lights want to contribute, they can do so in this way.

One can always post a paper on the arxiv.org preprint server, without going through a peer-review process first. Presumably, one of the CoRR subsections would be appropriate. This is always worth the time spent.

It would be a breach of research ethics for some "lesser light" (really?) to merely rewrite the TDT paper, add Yudkowsky as a coauthor, and publish it. At minimum, to qualify for coauthorship, Yudkowsky would have to review and approve the draft, and that process could take an indefinite amount of time.

Anything else would still be at worst plagiarism, and at best fraudulent authorship.

Certainly, EY would have to serve as a coauthor if the published article was closely based on the original, and of course he would have to agree to that.

But I think that coauthorship is a less likely scenario, and the first idea I mentioned -- use of certain key ideas with citation -- is a more likely one.

Citing an existing write-up in your own research paper adding something new to the TDT, or in an explicitly educational/expository non-research paper is OK. Rewriting existing ideas as your own research paper is not.

Of course. My original comment was meant to convey, through the words "citations," and "coauthor," that proper credit must always be given.

"Lob" needs an umlaut, or else to be written Loeb.

"How can this *be* process be approximated computably and tractably?"

Does anyone have a superior problem categorization? Does anyone have a correction or addition to make?

Minor editorial comments:

Consider expanding WBE the first time it is mentioned. I'm a regular reader and couldn't think of what it referred to until I searched the site.

I believe "ok" should be either "OK" or "okay".

The list had me wondering where the political problems went.

You're right. If at some point the general public starts to take risks from AI seriously and realizes that SI is actually trying to take over the universe without their consensus then a better case scenario will be that SI gets closed and its members sent to prison. Some of the not so good scenarios might include the complete extermination of the Bay Area if some foreign party believes that they are close to launching an AGI capable of recursive self-improvement.

Sounds ridiculous? Well, what do you think will be the reaction of governments and billions of irrational people who learn and actually believe that a small group of American white male (Jewish) atheist geeks is going to take over the whole universe? BOOM instead of FOOM.

Reference:

Eliezer Yudkowsky in an interview with John Baez.

It doesn't sound terribly likely. People are more likely to guffaw: So: you're planning to *take over the world*? And you can't tell us how because that's *secret information*? Right. Feel free to send us a postcard letting us know how you are getting on with that.

Again, why would anyone believe that, though? Plenty of people *dream* of ruling the universe - but - so far, nobody has pulled it off.

Most people are more worried about the secret banking cabal with the huge supercomputers, the billions of dollars in spare change and the shadowy past - who are busy banging on the core problem of inductive inference - than they are about the 'friendly' non-profit with its videos and PDF files - and *probably* rightfully so.

This is great!

One path to advancing research is to take advantage of some low-hanging fruit for mainstream research: A variety of problems in existing academic areas. It might be relatively easier to get people who are working "in the system" to get started on these. For example, reflective decision theory.

Yes exactly. Same thing with value extrapolation algorithms (aka 'ideal preference' or 'full information' theories of value; see Muehlhauser & Helm 2011.)

Another example: You could discuss many questions in psychology or the philosophy of mind asking how the specifically human aspects differ from what could be found in minds-in-general. This is well-defined enough to be discussed intelligently in a term paper.

(Such discussions in behavioral economics often compare humans to perfect rational agents; in ev.psych, the adaptive value of human psychological features is described. But rarely is the universe of minds under consideration explicitly expanded beyond the human.)

Why are my spaces suddenly collapsed in the OP? I checked; there aren't any

EDIT: 'twas a bug that was fixed, by Trike.

Given that finding solutions to all other problems becomes exponentially easier the more people are involved, then surely our highest priority should be awareness spreading, recruitment and the 'Strategy' goals related to that.

But being seen as having done impressive work already is one way to make your word heard and probably one of the more powerful recruiting tools.

What about The Lifespan Dilemma and Pascal's Mugging?

It seems that as long as you don't solve those problems a rational agent might have a nearly infinite incentive to expend all available resources on attempting to leave this universe, hack the matrix or undertake other crazy-seeming stunts.

These are really only problems for agents with unbounded utility functions. This is a great example of over-theorizing without considering practical computational limitations. If your AI design requires double (or even much higher) precision arithmetic just to evaluate its internal utility functions, you have probably already failed.

Consider the extreme example of bounded utility functions: 1-bit utilities. A 1-bit utility function can only categorize futures into two possible shades: good or bad. This *by itself* is not a crippling limitation if the AI considers a large number of potential futures and computes a final probability-weighted decision score with higher precision. For example when considering two action paths A and B, a Monte Carlo design could evaluate a couple hundred futures branching from A and B, assign each a 0 or 1, and then add them up into a tally requiring precision proportional to the number of futures evaluated (in this case, around 8-bit).

This extremely bounded design would need to do far more future-simulation to compensate for its extremely low-granularity utility rankings: for example when playing chess, it could only categorize board states as 'likely win' or 'likely loss'. Thus it would need to have a higher ply-depth than algos that use higher-bit depth evaluations. But even so, this only amounts to a performance efficiency disadvantage, not a fundamental limitation.
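As a rough illustration of the tally described above, here is a minimal Python sketch. The toy world (`path_a`, `path_b`, Gaussian outcomes), the threshold inside `is_good`, and all function names are assumptions invented for this example; none of them come from an actual AI design.

```python
import random
from typing import Callable, Dict, Tuple

def one_bit_tally(simulate: Callable[[random.Random], float],
                  is_good: Callable[[float], bool],
                  n_futures: int,
                  rng: random.Random) -> int:
    """Sample n_futures rollouts and count how many the 1-bit utility marks as good."""
    return sum(1 if is_good(simulate(rng)) else 0 for _ in range(n_futures))

def choose_action(actions: Dict[str, Callable[[random.Random], float]],
                  is_good: Callable[[float], bool],
                  n_futures: int = 200,
                  seed: int = 0) -> Tuple[str, Dict[str, int]]:
    """Pick the action whose sampled futures receive the highest good-count."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    tallies = {name: one_bit_tally(sim, is_good, n_futures, rng)
               for name, sim in actions.items()}
    best = max(tallies, key=tallies.get)
    return best, tallies

# Hypothetical toy world: an outcome is a single number, and the 1-bit
# utility only asks whether it is above zero.
def path_a(rng: random.Random) -> float:
    return rng.gauss(1.0, 1.0)   # action A tends to end well

def path_b(rng: random.Random) -> float:
    return rng.gauss(-1.0, 1.0)  # action B tends to end badly

def is_good(outcome: float) -> bool:
    return outcome > 0.0

best, tallies = choose_action({"A": path_a, "B": path_b}, is_good)
print(best, tallies)
```

With 200 futures per action, each tally lies between 0 and 200 and so fits in 8 bits, which matches the "around 8-bit" figure above even though every individual evaluation is a single bit.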

If we extrapolate a 1-bit friendly AI to planning humanity's future, it would collapse all futures that humanity found largely 'desirable' into the same score of 1, with everything else being 0. If its utility classifier and future modelling are powerful enough, this design can still work.

And curiously a 1-bit utility function gives more intuitively reasonable results in the Lifespan Dilemma or Pascal's Mugging. Assuming dying in an hour is a 0-utility outcome and living for at least a billion years is a 1, it would never take any wagers increasing its probability of death. And it would be just as un-susceptible to Pascal's Mugging.
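To make the contrast concrete, here is a small worked example in Python. Every number in it (the mugger's credibility, the promised payoff, the cost of paying, the baseline chance of a good future) is a made-up assumption chosen only to show the direction of the effect, and the model of how paying affects the 1-bit agent is a simplification rather than an established analysis.

```python
from fractions import Fraction

# Hypothetical mugging: a tiny chance the mugger is honest, a huge promised payoff.
P_MUGGER_HONEST = Fraction(1, 10**50)
PROMISED_PAYOFF = 10**100   # utils, meaningful only to the unbounded agent
COST_OF_PAYING = 5          # utils the unbounded agent gives up by paying

def unbounded_gain_from_paying() -> Fraction:
    """Expected-utility change from paying, for an agent with unbounded utility."""
    return P_MUGGER_HONEST * PROMISED_PAYOFF - COST_OF_PAYING

# For the 1-bit agent, utility is 1 if the future is 'good' and 0 otherwise.
# Assumption: handing over resources slightly lowers the chance of a good future.
P_GOOD_BASELINE = Fraction(9, 10)
P_GOOD_LOST_BY_PAYING = Fraction(1, 10**6)

def one_bit_gain_from_paying() -> Fraction:
    """Change in P(good future) from paying, for the 1-bit agent."""
    p_after_cost = P_GOOD_BASELINE - P_GOOD_LOST_BY_PAYING
    # Generous assumption: if the mugger is honest, the future is good for certain.
    p_good_if_pay = P_MUGGER_HONEST * 1 + (1 - P_MUGGER_HONEST) * p_after_cost
    return p_good_if_pay - P_GOOD_BASELINE

print(unbounded_gain_from_paying() > 0)  # the unbounded agent pays
print(one_bit_gain_from_paying() < 0)    # the 1-bit agent refuses
```

Because the 1-bit agent's best possible outcome is capped at 1, the mugger's promise can add at most `P_MUGGER_HONEST` to its score, which is swamped by even a tiny real cost.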

Just to be clear, I'm not really advocating simplistic 1-bit utilities. What is clear is that humans' internal utility evaluations are bounded. This probably comes from practical computational limitations, but future AIs will likely also have practical limitations.

Bounded utilities -- especially strongly bounded ones like your 1-bit probability-weighted utility function -- give you outcomes that depend crucially on the probability of a world-state's human-relative improvement versus the probability of degeneration. Once a maximal state has been reached, the agent has an incentive to further improve it if and only if that makes the maintenance of the state more likely. That's not really a *bad* outcome if we've chosen our utility terms well (i.e. not foolishly ignored the hedonic treadmill or something), but it's substantially less awesome than it could be; I suspect that after a certain point, probably easily achievable by a superintelligence, the probability mass would shift from favoring a development to a maintenance mode.

The first thing that comes to mind is a scenario like setting up Earth as a nature preserve and eating the rest of the galaxy for backup feedstock and as insurance against astronomical-level destructive events. That's an unlikely outcome -- I'm already thinking of ways it could be improved upon -- but it ought to serve to illustrate the general point.

This is true, but much depends on what is considered a 'maximal state'. If our 1-bit utility superintelligence predicts future paths all the way to the possible end states of the universe, then it isn't necessarily susceptible to getting stuck in maintenance states along the way. It all depends on what sub-set of future paths we classify as 'good'.

Also keep in mind that the 1-bit utility model still rates entire future paths, not just end future states. So let's say for example that we are really picky and we only want Tipler Omega Point end-states. If that is all we specify, then the SuperIntelligence may take us through a path that involves killing off most of humanity. However, we can avoid that by adding further constraints on the entire path: assigning 1 to future paths that end in the Omega Point but also satisfy some arbitrary list of constraints along the way. Again this is probably not the best type of utility model, but the weakness of 1-bit bounded utility is not that it tends to get stuck in maintenance mode for all utility models.

The failure in 1-bit utility is more in the specificity vs feasibility tradeoff. If we make the utility model very narrow and it turns out that the paths we want are unattainable, then the superintelligence will gleefully gamble everything and risk losing the future. For example the SI which only seeks specific Omega Point futures may eat the moon, think for a bit, and determine that even with the best action sequences, it only has a 10^-99 chance of winning (according to its narrow OP criteria). In this case it won't 'fall back' to some other more realistic but still quite awesome outcome; no, it will still proceed to transform the universe in an attempt to achieve the OP, no matter how impossible. Unless of course there is some feedback mechanism with humans and utility model updating, but that amounts to circumventing the 1-bit utility idea.
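The lack of a fallback can be shown with a toy decision table. This is only a sketch: the two action names and their probabilities are hypothetical, and the "broad" criterion is just one assumed way of widening the set of futures scored as 1.

```python
# Hypothetical actions with assumed probabilities of two kinds of good future.
ACTIONS = {
    "gamble_everything":  {"p_omega_point": 1e-99, "p_other_good_future": 0.0},
    "settle_for_awesome": {"p_omega_point": 0.0,   "p_other_good_future": 0.95},
}

def narrow_choice(actions: dict) -> str:
    """1-bit utility that scores only Omega Point futures as 1."""
    return max(actions, key=lambda a: actions[a]["p_omega_point"])

def broad_choice(actions: dict) -> str:
    """1-bit utility that also scores other (assumed disjoint) good futures as 1."""
    return max(actions, key=lambda a: actions[a]["p_omega_point"]
                                      + actions[a]["p_other_good_future"])

print(narrow_choice(ACTIONS))  # gamble_everything
print(broad_choice(ACTIONS))   # settle_for_awesome
```

Under the narrow criterion, any nonzero chance of the target beats every alternative, however small that chance is; widening the class of futures scored as 1 is what restores the fallback.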

I don't think this is a significant practical problem.

We have built lots of narrow intelligences. They work fine and this just doesn't seem to be much of an issue.

Given that "the ideal" means "the best conceivable", isn't the "ideal approximation of perfect Bayesianism" just perfect Bayesianism?

I meant "ideal computable." Fixed.

The link for the 5-and-10 problem is wrong. This perhaps?

I'll keep this document updated on my own site, but I want to make the question titles expand and collapse into multi-paragraph explanations upon clicking on them, something like this. (The question titles will no longer be list elements.) If someone is willing to help me with the Javascript and JQuery, please contact me at luke [at] singularity.org.

My own tool for doing that is here: http://treeeditor.com/tree_viewer/

Thanks. A volunteer from Facebook wrote me up some custom JQuery stuff.

A question for those who know such things. What's the issue with Solomonoff induction here? Is it that the Solomonoff prior doesn't take into account certain prior information that we do have, but isn't based on simply updating from the original (Solomonoff) prior?

"Higher-order-logic": reputedly down to concerns about uncomputability - which don't seem very interesting to me.

"Anthropic": I figure that can be dealt with in the same way as any other reference machine problem: by "conditioning" it by exposing it to the world.

"Cartesian": I think that's probably to do with this (from E.Y.):

Fun stuff - but nothing *specifically* to do with Solomonoff induction. The papers on Orseau's Mortal Universal Agents page address this issue.

Unfortunately this entire discussion is deeply flawed.

Why? GIGO - Garbage in - Garbage out.

However good the logical systems used for processing information, they are of no avail without meaningful input data.

Present technologies cannot be used as a basis for prediction because of the unexpected bifurcations and inherent non-linearities in technological developments.

Further problems stem from the use of the very inappropriate buzz-word "Singularity". Certainly a dramatic change is imminent, but this is better considered as a phase transition - the emergence of a new and dominant non-biological phase of the on-going evolutionary process that can be traced back at least as far as stellar nucleosynthesis.

Indeed, the inevitable self-assembly of this new entity can be clearly observed in what we at present call the Internet.

The broad evolutionary model which supports this proposition is outlined (very informally) in my latest book: "The Goldilocks Effect: What Has Serendipity Ever Done For Us?" It is a free download in e-book formats from the "Unusual Perspectives" website