Open Problems Related to the Singularity (draft 1)

Wikipedia says Steve Omohundro has "discovered that rational systems exhibit problematic natural 'drives' that will need to be countered in order to build intelligent systems safely."

Is he referring to the same problem?

EDIT: I answered my question by finding this.

What is the ideal theory of how to handle logical uncertainty?

Some possibly relevant work:

- *Bayesian Networks for Logical Reasoning* by Williamson
- *Unifying Logical and Probabilistic Reasoning* by Haenni
- *Recursive Causality in Bayesian Networks and Self-Fibring Networks* by Williamson and Gabbay
- *Possible Semantics for a Common Framework of Probabilistic Logics* by Haenni, Romeijn, Wheeler, and Williamson
- *Reasoning with limited resources and assigning probabilities to arithmetical statements* by Gaifman
- *Non-deductive Logic in Mathematics* by Franklin
- *A Derivation of Quasi-Bayesian Theory* by Cozman
- *Slightly More Realistic Personal Probability* by Hacking
- *On Not Being Rational* by Savage
- *Knowledge and the Problem of Logical Omniscience* by Parikh
- *Belief, Awareness, and Limited Reasoning* by Fagin and Halpern
- *A Nonstandard Approach to the Logical Omniscience Problem* by Fagin, Halpern, and Vardi
- *Old Evidence and Logical Omniscience in Bayesian Confirmation Theory* by Garber
- *Old Evidence, Logical Omniscience & Bayesianism* by Fitelson
- *A deduction model of belief* by Konolige
- *Using the Probabilistic Logic Programming Language P-log for Causal and Counterfactual Reasoning and Non-naive Conditioning* by Baral and Hunsaker
- *Decision Theory without Logical Omniscience* by Lipman
- *Objective Probabilities in Number Theory* by Ellenberg and Sober
- *Sentences, Propositions and Logical Omniscience* by Parikh
- *Maximum Entropy Probabilistic Logic* by Paskin
- *Towards a philosophy of real mathematics* by Corfield
- *Some puzzles about probability and probabilistic conditionals* by Parikh
- *Probabilistic Conditionals Are Almost Monotonic* by Johnson and Parikh
- *Probabilistic Proofs and the Collective Epistemic Goals of Mathematicians* by Fallis
- *Probabilistic Proofs and Transferability* by Easwaran
- *Randomized Arguments are Transferable* by Jackson
- *The philosophy of mathematical practice* by Paolo Mancosu
- *Dynamic Probability, Computer Chess, and the Measurement of Knowledge* by Good
- *Fully abstract compositional semantics for logic programs* by Gaifman and Shapiro
- *Putting Logic in its Place* by Christensen
- *A Hybrid Framework for Representing Uncertain Knowledge* by Saffiotti

My next task was going to be summarizing logical uncertainty insights made in lesswrong comments and posts, but I found Wei Dai's list of resources, which led to a new search of academic literature. My reading list, in decreasing order of importance, now looks like:

1. *Bayesian Networks for Logical Reasoning* by Williamson
2. *Unifying Logical and Probabilistic Reasoning* by Haenni
3. *Recursive Causality in Bayesian Networks and Self-Fibring Networks* by Williamson and Gabbay
4. *Possible Semantics for a Common Framework of Probabilistic Logics* by Haenni, Romeijn, Wheeler, and Williamson
5. *Non-deductive Logic in Mathematics* by Franklin
6. *A Derivation of Quasi-Bayesian Theory* by Cozman
7. *Decision Theory without Logical Omniscience* by Lipman
8. *Slightly More Realistic Personal Probability* by Hacking

[This comment is no longer endorsed by its author]

"existing novel results, provided EY and others have some"

Indeed there are. TDT, for example, has not yet received an academic writeup. There are lots of ideas scattered through LW which could be published in journals. And the great thing about academic writing is that you are allowed to use other people's ideas, as long as you cite them. You are considered to be doing them a favor when you do that.

In general, this means that one sprinkles another person's ideas within one's own analysis; if a direct rewrite of, e.g., the TDT paper, for a journal is intended, then the original non-academic author should get credit as co-author.

I understand the point that it might not be worth the time of EY or other SI Fellows to publish ideas in journals. But if some lesser lights want to contribute, they can do so in this way.

I understand the point that it might not be worth the time of EY or other SI Fellows to publish ideas in journals.

One can always post a paper on the arxiv.org preprint server, without going through a peer-review process first. Presumably, one of the CoRR subsections would be appropriate. This is always worth the time spent.

It would be a breach of research ethics for some "lesser light" (really?) to merely rewrite the TDT paper, add Yudkowsky as a coauthor, and publish it. At minimum, to qualify for coauthorship, Yudkowsky would have to review and approve the draft, and that process could take an indefinite amount of time.

Anything else would still be at worst plagiarism, and at best fraudulent authorship.

Certainly, EY would have to serve as a coauthor if the published article was closely based on the original, and of course he would have to agree to that.

But I think that coauthorship is a less likely scenario, and the first idea I mentioned (use of certain key ideas with citation) is a more likely one.

The list had me wondering where the political problems went.

You're right. If at some point the general public starts to take risks from AI seriously and realizes that SI is actually trying to take over the universe without their consent, then one of the better scenarios is that SI gets closed and its members are sent to prison. Some of the not-so-good scenarios might include the complete extermination of the Bay Area if some foreign party believes that they are close to launching an AGI capable of recursive self-improvement.

Sounds ridiculous? Well, what do you think will be the reaction of governments and billions of irrational people who learn and actually believe that a small group of American white male (Jewish) atheist geeks is going to take over the whole universe? BOOM instead of FOOM.

Reference:

...—though it may be an idealistic dream—I intend to plunge into the decision theory of self-modifying decision systems and never look back. (And finish the decision theory and implement it and run the AI, at which point, if all goes well, we Win.)

Eliezer Yudkowsky in an interview with John Baez.

If at some point the general public starts to take risks from AI seriously and realizes that SI is actually trying to take over the universe without their consensus then a better case scenario will be that SI gets closed and its members sent to prison.

It doesn't sound terribly likely. People are more likely to guffaw: So: you're planning to *take over the world*? And you can't tell us how because that's *secret information*? *Right*. Feel free to send us a postcard letting us know how you are getting on with that.

Well, what do you think will be the reaction of governments and billions of irrational people who learn and actually believe that a small group of American white male (Jewish) atheist geeks is going to take over the whole universe?

Again, why would anyone believe that, though? Plenty of people *dream* of ruling the universe - but - so far, nobody has pulled it off.

Most people are more worried about the secret banking cabal with the huge supercomputers, the billions of dollars in spare change and the shadowy past - who are busy banging on the core problem of inductive inference - than they are about the 'friendly' non-profit with its videos and PDF files - and *probably* rightfully so.

This is great!

One path to advancing research is to take advantage of some low-hanging fruit for mainstream research: A variety of problems in existing academic areas. It might be relatively easier to get people who are working "in the system" to get started on these. For example, reflective decision theory.

Yes exactly. Same thing with value extrapolation algorithms (aka 'ideal preference' or 'full information' theories of value; see Muehlhauser & Helm 2011.)

Another example: You could discuss many questions in psychology or the philosophy of mind asking how the specifically human aspects differ from what could be found in minds-in-general. This is well-defined enough to be discussed intelligently in a term paper.

(Such discussions in behavioral economics often compare humans to perfect rational agents; in evolutionary psychology, the adaptive value of human psychological features is described. But rarely is the universe of minds under consideration explicitly expanded beyond the human.)

What about The Lifespan Dilemma and Pascal's Mugging?

Should we penalize computations with large space and time requirements? This is a hack that solves the problem, but is it true? Are computationally costly explanations less likely? Should I think the universe is probably a coarse-grained simulation of my mind rather than real quantum physics, because a coarse-grained human mind is exponentially cheaper than real quantum physics? Should I think the galaxies are tiny lights on a painted backdrop, because that Turing machine would require less space to compute?

Given that, in general, a Turing machine can increase in utility vastly faster than it increases in complexity, how should an Occam-abiding mind avoid being dominated by tiny probabilities of vast utilities?

It seems that as long as you don't solve those problems, a rational agent might have a nearly infinite incentive to expend all available resources on attempting to leave this universe, hack the matrix, or undertake other crazy-seeming stunts.

What about The Lifespan Dilemma and Pascal's Mugging?

These are really only problems for agents with unbounded utility functions. This is a great example of over-theorizing without considering practical computational limitations. If your AI design requires double (or even much higher) precision arithmetic just to evaluate its internal utility functions, you have probably already failed.

Consider the extreme example of bounded utility functions: 1-bit utilities. A 1-bit utility function can only categorize futures into two possible shades: good or bad. This *by itself* is not a crippling limitation if the AI considers a large number of potential futures and computes a final probability-weighted decision score with higher precision. For example, when considering two action paths A and B, a Monte Carlo design could evaluate a couple hundred futures branching from each of A and B, assign each a 0 or 1, and then add them up into a tally requiring precision proportional to the logarithm of the number of futures evaluated (in this case, around 8 bits).
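To make the tallying idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the world model `simulate_future` (a noisy score centred on a per-action bias), the threshold in `one_bit_utility`, and the function names are hypothetical and not part of any proposed design.

```python
import random

def one_bit_utility(future_score):
    """Hypothetical 1-bit classifier: a future is either good (1) or bad (0)."""
    return 1 if future_score > 0.0 else 0

def simulate_future(action_bias, rng):
    """Hypothetical world model: one sampled future, scored with Gaussian noise."""
    return rng.gauss(action_bias, 1.0)

def tally(action_bias, n_futures, rng):
    """Count how many of the sampled futures are classified as good."""
    return sum(one_bit_utility(simulate_future(action_bias, rng))
               for _ in range(n_futures))

def choose(actions, n_futures=200, seed=0):
    """Pick the action whose sampled futures contain the most 1s."""
    rng = random.Random(seed)
    scores = {name: tally(bias, n_futures, rng) for name, bias in actions.items()}
    return max(scores, key=scores.get)

# A tally over 200 futures is an integer in [0, 200], which fits in 8 bits.
bits_needed = (200).bit_length()
best = choose({"A": 5.0, "B": -5.0})
```

The per-future utility stays at one bit; all of the extra precision lives in the count, which is why the tally for 200 futures needs about 8 bits.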

This extremely bounded design would need to do far more future-simulation to compensate for its extremely low-granularity utility rankings: for example, when playing chess, it could only categorize board states as 'likely win' or 'likely loss'. Thus it would need to have a higher ply-depth than algorithms that use higher bit-depth evaluations. But even so, this only amounts to a performance efficiency disadvantage, not a fundamental limitation.

If we extrapolate a 1-bit friendly AI to planning humanity's future, it would collapse all futures that humanity found largely 'desirable' into the same score of 1, with everything else being 0. If its utility classifier and future modelling are powerful enough, this design can still work.

And curiously, a 1-bit utility function gives more intuitively reasonable results in the Lifespan Dilemma or Pascal's Mugging. Assuming dying in an hour is a 0-utility outcome and living for at least a billion years is a 1, it would never take any wagers increasing its probability of death. And it would be just as unsusceptible to Pascal's Mugging.
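The difference can be illustrated with a small expected-utility calculation. This is a sketch under made-up numbers: the extra death risk of one in a million, the offered lifespan of 10^100 years, and the helper names are all hypothetical, chosen only to show how the two utility functions rank the same wager.

```python
from fractions import Fraction

def expected_utility(lottery, utility):
    """Sum probability * utility(years lived) over (probability, years) pairs."""
    return sum(p * utility(years) for p, years in lottery)

def unbounded(years):
    """Unbounded utility: every extra year counts linearly."""
    return years

def one_bit(years):
    """1-bit utility: 1 for at least a billion years of life, otherwise 0."""
    return 1 if years >= 10**9 else 0

# Hypothetical wager: accept a small extra chance of dying almost immediately
# in exchange for a vastly longer life if you survive.
extra_death_risk = Fraction(1, 10**6)
keep = [(Fraction(1), 10**9)]
wager = [(1 - extra_death_risk, 10**100), (extra_death_risk, 0)]

unbounded_takes_wager = expected_utility(wager, unbounded) > expected_utility(keep, unbounded)
one_bit_takes_wager = expected_utility(wager, one_bit) > expected_utility(keep, one_bit)
```

With these numbers the unbounded agent accepts the wager, and would keep accepting further ones, while the 1-bit agent declines because the only thing the wager changes in its eyes is a lower probability of reaching a 1.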

Just to be clear, I'm not really advocating simplistic 1-bit utilities. What is clear is that humans' internal utility evaluations are bounded. This probably comes from practical computational limitations, but future AIs will likely have practical limitations too.

Bounded utilities -- especially strongly bounded ones like your 1-bit probability-weighted utility function -- give you outcomes that depend crucially on the probability of a world-state's human-relative improvement versus the probability of degeneration. Once a maximal state has been reached, the agent has an incentive to further improve it if and only if that makes the maintenance of the state more likely. That's not really a *bad* outcome if we've chosen our utility terms well (i.e. not foolishly ignored the hedonic treadmill or something), but it's substantially less awesome than it could be; I suspect that after a certain point, probably easily achievable by a superintelligence, the probability mass would shift from favoring a development to a maintenance mode.

The first thing that comes to mind is a scenario like setting up Earth as a nature preserve and eating the rest of the galaxy for backup feedstock and as insurance against astronomical-level destructive events. That's an unlikely outcome -- I'm already thinking of ways it could be improved upon -- but it ought to serve to illustrate the general point.

Once a maximal state has been reached, the agent has an incentive to further improve it if and only if that makes the maintenance of the state more likely.

This is true, but much depends on what is considered a 'maximal state'. If our 1-bit utility superintelligence predicts future paths all the way to the possible end states of the universe, then it isn't necessarily susceptible to getting stuck in maintenance states along the way. It all depends on what sub-set of future paths we classify as 'good'.

Also keep in mind that the 1-bit utility model still rates entire future paths, not just end future states. So let's say for example that we are really picky and we only want Tipler Omega Point end-states. If that is all we specify, then the SuperIntelligence may take us through a path that involves killing off most of humanity. However, we can avoid that by adding further constraints on the entire path: assigning 1 to future paths that end in the Omega Point but also satisfy some arbitrary list of constraints along the way. Again this is probably not the best type of utility model, but the weakness of 1-bit bounded utility is not that it tends to get stuck in maintenance mode for all utility models.

The failure in 1-bit utility is more in the specificity vs. feasibility tradeoff. If we make the utility model very narrow and it turns out that the paths we want are unattainable, then the superintelligence will gleefully gamble everything and risk losing the future. For example, the SI which only seeks specific Omega Point futures may eat the moon, think for a bit, and determine that even with the best action sequences, it only has a 10^-99 chance of winning (according to its narrow OP criteria). In this case it won't 'fall back' to some other more realistic but still quite awesome outcome; it will still proceed to transform the universe in an attempt to achieve the OP, no matter how improbable. Unless, of course, there is some feedback mechanism with humans and utility model updating, but that amounts to circumventing the 1-bit utility idea.
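A toy comparison shows why there is no fallback. The plans, outcome labels, and probabilities below are invented for illustration (only the 10^-99 figure comes from the example above); the point is just that a narrow 1-bit criterion scores every non-matching outcome as 0, however good it is.

```python
from fractions import Fraction

# Hypothetical plans, each mapping an outcome label to its probability.
long_shot = Fraction(1, 10**99)
plans = {
    "gamble_everything": {"omega_point": long_shot, "ruin": 1 - long_shot},
    "safe_flourishing": {"pretty_awesome": Fraction(99, 100), "ruin": Fraction(1, 100)},
}

def narrow_one_bit(outcome):
    """Only the exact Omega Point outcome counts as good."""
    return 1 if outcome == "omega_point" else 0

def broad_one_bit(outcome):
    """A wider 'good' set that also accepts the realistic outcome."""
    return 1 if outcome in {"omega_point", "pretty_awesome"} else 0

def best_plan(plans, utility):
    """Return the plan with the highest probability-weighted 1-bit score."""
    def score(name):
        return sum(p * utility(outcome) for outcome, p in plans[name].items())
    return max(plans, key=score)

narrow_choice = best_plan(plans, narrow_one_bit)
broad_choice = best_plan(plans, broad_one_bit)
```

Under the narrow criterion the long shot wins because 10^-99 is still greater than 0; widening the set of futures classified as good is what restores the sensible choice.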

It seems that as long as you don't solve those problems, a rational agent might have a nearly infinite incentive to expend all available resources on attempting to leave this universe, hack the matrix, or undertake other crazy-seeming stunts.

I don't think this is a significant practical problem.

We have built lots of narrow intelligences. They work fine and this just doesn't seem to be much of an issue.

I'll keep this document updated on my own site, but I want to make the question titles expand and collapse into multi-paragraph explanations upon clicking on them, something like this. (The question titles will no longer be list elements.) If someone is willing to help me with the JavaScript and jQuery, please contact me at luke [at] singularity.org.

How would an ideal agent with infinite computing power choose an ideal prior? (A guess: we'd need an anthropic, non-Cartesian, higher-order-logic version of Solomonoff induction.) How can this process be approximated computably and tractably?

A question for those who know such things. What's the issue with Solomonoff induction here? Is it that the Solomonoff prior doesn't take into account certain prior information that we do have, but isn't based on simply updating from the original (Solomonoff) prior?

What's the issue with Solomonoff induction here?

"Higher-order-logic": reputedly down to concerns about uncomputability - which don't seem very interesting to me.

"Anthropic": I figure that can be dealt with in the same way as any other reference machine problem: by "conditioning" it by exposing it to the world.

"Cartesian": I think that's probably to do with this (from E.Y.):

AIXI devises a Cartesian model of a separated environmental theatre, rather than devising a model of a naturalistic reality that includes AIXI.

Fun stuff - but nothing *specifically* to do with Solomonoff induction. The papers on Orseau's Mortal Universal Agents page address this issue.

Unfortunately this entire discussion is deeply flawed.

Why? GIGO - Garbage in - Garbage out.

However good the logical systems used for processing information, they are of no avail without meaningful input data.

Present technologies cannot be used as a basis for prediction because of the unexpected bifurcations and inherent non-linearities in technological developments.

Further problems stem from the use of the very inappropriate buzz-word "Singularity". Certainly a dramatic change is imminent, but this is better considered as a phase transition - the emergence of a new and dominant non-biological phase of the on-going evolutionary process that can be traced back at least as far as stellar nucleosynthesis.

Indeed, the inevitable self-assembly of this new entity can be clearly observed in what we at present call the Internet.

The broad evolutionary model which supports this proposition is outlined (very informally) in my latest book: "The Goldilocks Effect: What Has Serendipity Ever Done For Us?" It is a free download in e-book formats from the "Unusual Perspectives" website

"I've come to agree that navigating the Singularity wisely is the most important thing humanity can do. I'm a researcher and I want to help. What do I work on?"

The Singularity Institute gets this question regularly, and we haven't published a clear answer to it anywhere. This is because it's an extremely difficult and complicated question. A large expenditure of limited resources is required to make a serious attempt at answering it. Nevertheless, it's an *important* question, so we'd like to work toward an answer.

A few preliminaries:

- Defining each problem is part of the problem. As Bellman (1961) said, "the very construction of a precise mathematical statement of a verbal problem is itself a problem of major difficulty." Many of the problems related to navigating the Singularity have not yet been stated with mathematical precision, and the need for a precise statement of the problem is *part* of these open problems. But there is reason for optimism. Many times, particular heroes have managed to formalize a previously fuzzy and mysterious concept: see Kolmogorov on complexity and simplicity (Kolmogorov 1965; Grunwald & Vitanyi 2003; Li & Vitányi 2008), Solomonoff on induction (Solomonoff 1964a, 1964b; Rathmanner & Hutter 2011), Von Neumann and Morgenstern on rationality (Von Neumann & Morgenstern 1947; Anand 1995), and Shannon on information (Shannon 1948; Arndt 2004).
- The nature of the problem space is unclear. Which problems will biological humans need to solve, and which problems can a successful FAI solve on its own (perhaps with the help of human uploads it creates to solve the remaining open problems)? Are Friendly AI (Yudkowsky 2001) and CEV (Yudkowsky 2004) coherent ideas, given the confused nature of human "values"? Should we aim instead for a "maxipok" solution (Bostrom 2011) that maximizes the chance of an "ok" outcome, something like Oracle AI (Armstrong et al. 2011)? Which problems are we unable to state with precision because they are irreparably confused, and which problems are we unable to state due to a lack of insight?
- Our research priorities are unclear. There are a limited number of capable researchers who will work on these problems. Which are the most important problems they should be working on, if they are capable of doing so? Should we focus on "control problem" theory (FAI, AI-boxing, oracle AI, etc.), or on strategic considerations (differential technological development, methods for raising the sanity waterline, methods for bringing more funding to existential risk reduction and growing the community of x-risk reducers, reducing the odds of AI arms races, etc.)? Is AI more urgent than other existential risks, especially synthetic biology?
- Our intervention priorities are unclear. Is research the most urgent thing to be done, or should we focus on growing the community of x-risk reducers, raising the sanity waterline, bringing in more funding for x-risk reduction, etc.? Can we make better research progress in the next 10 years if we work to improve sanity and funding for 7 years and *then* have the resources to grab more and better researchers, or can we make better research progress by focusing on research now?

Next, a division of labor into "problem categories." There are many ways to categorize the open problems; some of them are probably more useful than the one I've chosen below.

- Safe AI Architectures. This may include architectures for securely confined or "boxed" AIs (Lampson 1973), including Oracle AIs, and also AI architectures that could "take" a safe set of goals (resulting in Friendly AI).
- Safe AI Goals. What could it mean to have a Friendly AI with "good" goals?
- Strategy. A huge space of problems. How do we predict the future and make recommendations for differential technological development? Do we aim for Friendly AI or maxipok solutions or both? Do we focus on growing support now, or do we focus on research? How should we interact with the public and with governments?

The list of open problems below is *very* preliminary. I'm sure there are many problems I've forgotten, and many problems I'm unaware of. Probably *all* of the problems are stated relatively poorly: this is only a "first step" document. Certainly, all listed problems are described at an extremely "high" level, very far away (so far) from mathematical precision, and can be broken down into several and often *dozens* of subproblems.

## Safe AI Architectures

## Safe AI Goals

## Strategy

My thanks for some notes written by Eliezer Yudkowsky, Carl Shulman, and Nick Bostrom, from which I've drawn.