Computation Hazards


A computation hazard is a large negative consequence that may arise merely from vast amounts of computation, such as in a future supercomputer.

Are you including anything in this beyond the hazards of accidental simulation? It sounds to me like you aren't.

computations that run most algorithms, and computations that are particularly likely to run algorithms that are computation hazards.

I can't imagine any computation hazard arising from a computer that runs most algorithms, ie Solomonoff induction, actually being a hazard on any size computer and timescale that is commensurate with, say, turning the solar system into computronium and burning all the energy of the Sun. I don't think the selection power present in "run most algorithms" actually simulates anything sentient for a sufficient length of time for me to care. Now, a computer program that selectively simulates algorithms likely to be sentient might be a different matter, but then you're having a discussion about the ethics of simulation, not accidental computation hazards or "most algorithms". In other words, I think the claim expressed above stems from a fundamental misunderstanding of exactly how strong the term "uncomputably complex" is. I suspect you have not yet truly understood the growth curve of the busy beaver function.

Imagine a supercomputer’s power is being tested on a simple game, like chess or Go

I've written programs to play games. I cannot possibly see where such a hazard comes from. Have you looked at the proofs that such games can be Turing complete? That is, not just noted their existence, but actually examined the proofs. The level of machine that can be built on a standard size Go or Chess board is trivial. I think the accidental simulation hazard of such a computer is far less than that from noise errors on 8-bit microcontrollers running industrial control programs. Turing completeness relies on assumptions of infinite or very large (for merely approximate completeness) board sizes. And once you let the board size scale like that, you either have something that is exceedingly selective about which paths it explores (and therefore isn't "accidental" in the sense of running most algorithms), or doesn't simulate anything for long enough to exhibit sentient behavior, or even to have a very good chance of producing something with anything close to the potential of an unhatched ant.

In other words, I fail to see why this line of inquiry is deserving of anything more than an off-hand assessment of "not a hazard", let alone the first quarter of the post.

The remainder of the post seems to me to do a poor job of separating out the hazards of simulating people inside an AGI, vs the hazards presented by simply having the AGI. The latter hazards are thoroughly discussed elsewhere, and I don't find that this discussion adds anything. By mixing those hazards in with computation hazards, you make it very unclear whether those are intended to be included in your definition (which I originally thought excluded the results of the computation), and you make the rest of the post much less clear and much less useful.

I can't imagine any computation hazard arising from a computer that runs most algorithms actually being a hazard on any size computer and timescale that is commensurate with, say, turning the solar system into computronium and burning all the energy of the Sun.

I'm not sure about this. It's difficult for me to do the order-of-magnitude calculation, because I don't know how many flops we can get from using the solar system as substrate, and because I have no idea how small a program can be before it has moral status.

I suspect you have not yet truly understood the growth curve of the busy beaver function.

From my understanding, the busy beaver function is about the maximum number of steps an n-state Turing machine can take before you know it won't halt. This doesn't seem to have anything to do with the probability of simulating people; both halting and non-halting programs could have moral status.

Overall, I agree that the "runs most algorithms" category is not realistic. It's mostly just for philosophical interest, and completeness.

The remainder of the post seems to me to do a poor job of separating out the hazards of simulating people inside an AGI, vs the hazards presented by simply having the AGI.

Yeah, I'm worried that the concept of "computational hazard" as I've used it here isn't very useful, or rather, doesn't carve reality at its joints.

The latter hazards are thoroughly discussed elsewhere, and I don't find that this discussion adds anything.

I wasn't really trying to add original material to the discussion. This post was supposed to be a summary of ideas already discovered on LW, and I wanted to know if I was missing any, or if this was a good summary of the ideas.

Thanks for all your feedback!

The busy beaver function is the lower bound on how fast a function has to grow before it counts as "uncomputable". When people say that the Solomonoff induction or Kolmogorov complexity is uncomputable, that's what they mean. When you say you're having trouble with an order of magnitude estimate, I have to wonder whether you attempted to come up with an order of magnitude estimate for the exponent. For example, I'm pretty sure that no conceivable amount of computronium will get to 10^10000 territory, and that I can safely ignore ethical problems that only arise once we're in that territory. I'm rather skeptical of the idea that we could even get to 10^1000 territory by brute force, even if we turned the galaxy into computronium.

Therefore, when you say you're concerned about computing Solomonoff induction or Kolmogorov priors, I'm left wondering whether you're worried about 5- or 6-state machines, because there is no conceivable way the program would ever get to the 7-state machines.

And you also can't simply dismiss the longest-running small Turing machines, in my opinion. If there exists an accidental computational hazard worth worrying about, I would assume it's in the form of something analogous to "run Conway's life until the board is empty" -- in other words, precisely a small Turing machine that is likely to run a very long time before halting, and whose halting status is difficult to determine.
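
To put rough numbers on the exponent argument above, here is a minimal sketch. It assumes, purely for illustration, that programs are plain bit strings and that each one gets only a single operation; neither assumption comes from the thread, and this is not the busy beaver argument itself. It only shows how slowly the reachable program length grows with the exponent of the compute budget.

```python
import math

def max_enumerable_bits(log10_ops: float) -> int:
    """Largest n with 2**n <= 10**log10_ops.

    Assumption (hypothetical model): programs are bit strings of length n,
    and each of the 2**n strings is given exactly one operation.
    """
    return math.floor(log10_ops * math.log2(10))

# Budgets are expressed by their base-10 exponent, as in the comment above.
for exponent in (100, 1000, 10000):
    bits = max_enumerable_bits(exponent)
    print(f"10^{exponent} operations: every bit string up to about {bits} bits, one step each")
```

Under these assumptions even a budget of 10^1000 operations only touches bit strings a few thousand bits long, and running any of them for more than one step shrinks that length further.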

I wasn't really trying to add original material to the discussion. This post was supposed to be a summary of ideas already discovered on LW, and I wanted to know if I was missing any, or if this was a good summary of the ideas.

I think the summary is overly confused. If the accidental hazards are of purely philosophical interest, they shouldn't occupy such a large fraction of the post, and shouldn't be the first thing discussed. I almost didn't bother reading the rest.

I think the non-simulation hazards have little in common with the simulation hazards from an ethics standpoint, and less from a practical FAI programming view. If you're having trouble separating them, then I would take that as a strong clue that you've chosen the wrong points to carve at.

Thanks for all your feedback!

You're welcome! There's some interesting stuff here, though I'm skeptical that there's much of interest in anything except the intentional, directed simulations questions.

The busy beaver function is the lower bound on how fast a function has to grow before it counts as "uncomputable".

Ah, I see. That's not the definition, but it is a fact about it. Although I think you might be confused about the definition of "uncomputable". It doesn't have to do with functions growing. It's just a separate, awesome fact that all computable functions grow slower than the (uncomputable) busy beaver function. There are many uncomputable functions that grow slower than computable functions.

Hmm. Seems I've confused some stuff. What you've said is correct, but I think my point is still valid.

The Kolmogorov prior is uncomputable. The time required to approximate it by simulating Turing machines to size n (in other words, the naive brute force approach in question) grows at busy beaver speeds, because it requires simulating all non-halting machines within the relevant size, and there is no generalizable way to shortcut those simulations.

Now, there are ways to approximate Solomonoff / Kolmogorov / AIXI with sane computational limits. However, once you start doing that, you can no longer claim that you run the risk of "running most algorithms", at least by the mathematical definition of "most" that I assumed you were using. Or rather, you will eventually, as you wait for infinite computation to be expended on the problem. But I'd say that either you're being selective to a degree that the hazard lies in selecting for simulation, rather than in running "most" algorithms, or you're in no more danger than in the brute force case. This is related to the point I made above: I think the accidentally dangerous portion of the search space is likely to lie in the algorithms whose runtime is long relative to their complexity, which are precisely the ones that will be avoided by approximations intended to be tractable.

I am not sure that *simulating* people is the same as *creating* people. Or more generally, that simulating a universe is the same as creating the universe, and stopping the simulation is the same as destroying the universe.

Even if we accept that the simulated people are *real*, they are real even if we don't simulate them -- they already exist somewhere in the multiverse. (They may have very low prior probability, but so do we, right?) Your metaphor for simulation is "creating a copy", my metaphor is "looking through a window". Which one is correct? They have different ethical consequences, because creating a copy of suffering means increasing suffering, but looking through a window at suffering does not. In other words, *not* simulating suffering is just as helpful as closing your eyes; it does not remove the suffering from the world.

So the question is, if the universe *A* contains a simulation of a universe *B*, does this increase the "existence" of the universe *B*? If the universe *A* runs the simulation of the universe *B* a thousand times, is the increase a thousand times greater? What if the simulation runs only once, but the data are copied a thousand times to a redundant disk array? What if we write the algorithm, but don't actually run it? The calculation "2+2=4" may also be part of many universes, some of them including sentient beings; does writing it have ethical consequences too? If we can simulate a universe by a billion computations of Life, is it also ethically wrong to make a billion Life computations on random boards? (The individual steps of Life don't have an identity, do they?) Every configuration of atoms around us is some kind of computation, we just don't have the means to extract the data; compared with this, are our computer computations even significant? How about approximations, are they simulations too?

You are right. Many Worlds says that *our universe* exists in a superposition of many states, all of them governed by the *same* physical laws.

But if we assume the possibility of *other universes* with *different* physical laws (which I did implicitly), Solomonoff prior provides a framework for reasoning about them. Simply said, every universe exists, but some of them "exist more" and others "exist less", whatever that means. Simpler universes "exist more", complex universes "exist less"; each additional bit of description reduces the "existence" in half. Therefore very complicated universes have so little "existence" that we don't have to care about them.
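
The halving rule described above can be written down directly. This is a minimal sketch of unnormalized weights only; the description lengths used below (10 and 30 bits) are made-up numbers for illustration, not estimates for any actual universe.

```python
from fractions import Fraction

def prior_weight(description_bits: int) -> Fraction:
    """Unnormalized weight 2**-n for a description that is n bits long."""
    return Fraction(1, 2 ** description_bits)

# Hypothetical description lengths, chosen only to show the ratio.
simple_universe = prior_weight(10)
complex_universe = prior_weight(30)

# Each extra bit halves the weight, so 20 extra bits cost a factor of 2**20.
print(simple_universe / complex_universe)
```

Exact fractions are used instead of floats so that very long descriptions do not underflow to zero.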

This hypothesis feels even more weird than the Many Worlds hypothesis, but it explains some things that are otherwise difficult to explain, such as why our universe is fine-tuned for us. Without the hypothesis of multiple universes, the anthropic principle provides only a partial answer. It explains why we can't exist where we can't exist, but it does not explain why there is a universe where we *can* exist. On the other hand, if everything exists, why does our universe follow any laws? Solomonoff prior says that universes which follow laws "exist more", because it is easier to describe them (you only have to describe the initial state and the laws, not every possible exception). Thus, the anthropic principle + multiverse + Solomonoff prior together say that we do most probably exist in the *simplest* universe where we can exist; where simplest does not mean smallest in space and time, but most easy to fully describe mathematically. (Though I am not really sure if this universe really is simpler than other possible intelligent-life-containing universes. Maybe something is wrong with my explanation.)

The only way this belief is useful to me, is that it provides explanations to a few questions I would otherwise spend time answering; plus a wrong answer on them might make my real life worse.

First, avoiding generalized Pascal mugging: Yeah, everything is possible, including the chance that if I don't give you $1000 now, I will be tortured forever by an omnipotent sadist; but the probability is epsilon, so I won't give you those $1000 anyway.

Second, avoiding generalized quantum suicide: Yeah, whatever I do, in some universe it will have good consequences. And in some universe it will have bad consequences. But I should focus on whether the average (expected) results are positive or negative. For example, in case of a quantum suicide, the average result is me dead; in case of a lottery, the average result is not winning; in case of religion, the average result is no afterlife. On the other hand, when rationally doing useful things, the average result is more utilons.

The line between MWI and the Tegmark Multiverse is not very clear; some of my arguments could be used for both. Using only MWI can answer questions about quantum randomness or generally about lawful randomness (which is probably on some level fueled by quantum randomness: for example if I throw a coin, the exact movement of my muscles is determined by exact firing of my neurons, and a quantum event can make this signal a little bit weaker or stronger). But mere MWI cannot answer questions like "what if this universe is just a simulation?", because that is outside of its framework (a simulation in what? possibly in a universe with different laws of physics? how do I calculate a probability of that?).

For example, suppose a computer program needs to model people very accurately to make some predictions, and it models those people so accurately that the "simulated" people can experience conscious suffering. In a very large computation of this type, millions of people could be created, suffer for some time, and then be destroyed when they are no longer needed for making the predictions desired by the program. This idea was first mentioned by Eliezer Yudkowsky in Nonperson Predicates.

Nitpick: we can date this concern at least as far back as Vernor Vinge's *A Fire Upon the Deep*:

Pham Nuwen's ticket to the Transcend was based on a Power's sudden interest in the Straumli perversion. This innocent's ego might end up smeared across a million death cubes, running a million million simulations of human nature.

Some algorithms are obviously not people.

I disagree. I don't think sentience is all-or-nothing. Given that, I'd expect that it would be almost impossible (in the mathematical one-in-infinity sense) for a given system to have exactly zero sentience. Some algorithms are just not very much people. Some algorithms will produce less sentience in a thousand years than you will in a microsecond.

I don't think sentience is all-or-nothing.

Fascinating! I can imagine this being true. So maybe I should say, "Some algorithms are obviously not in the utility function of pretty much anybody.". But then again, I don't think "people" means "sentience". I don't care about simulating rocks, whether or not they have 0.001 sentience.

An example of a computation that runs most algorithms is a mathematical formalism called Solomonoff induction.

Solomonoff Induction is uncomputable, so it's not a computation. Would be correct if you had written

An example of a computation that runs most algorithms could be some program that approximates a mathematical formalism called Solomonoff induction.

Also, strictly speaking no real-world computation could run "most" algorithms, since there are infinitely many and it could only run a finite number. It would make more sense to use an expression like "computations that search through the space of all possible algorithms".

A function that could evaluate an algorithm and return 0 only if it is not a person is called a nonperson predicate. Some algorithms are obviously not people. For example, any algorithm whose output is repeating with a period less than gigabytes...

Is this supposed to be about avoiding the algorithms simulating suffering people, or avoiding them doing something dangerous to the outside world? Obviously an algorithm could simulate a person while still having a short output, so I'm thinking it has to be about the second one. But then the notion of nonperson predicates doesn't apply, because it's about avoiding simulating people (that might suffer and that will die when the simulation ends). Also, a dangerous algorithm could probably do some serious damage with under a gigabyte of output. So having less than a gigabyte output doesn't really protect you from anything.

A function that could evaluate an algorithm and return 0 only if it is not a person is called a nonperson predicate. Some algorithms are obviously not people. For example, any algorithm whose output is repeating with a period less than gigabytes...

Is this supposed to be about avoiding the algorithms simulating suffering people, or avoiding them doing something dangerous to the outside world? Obviously an algorithm could simulate a person while still having a short output, so I'm thinking it has to be about the second one. But then the notion of nonperson predicates doesn't apply, because it's about avoiding simulating suffering people.

[This comment is no longer endorsed by its author]

The first link in the reference section is wrong, it should be to http://www.aleph.se/papers/oracleAI.pdf

In the situations above, the people will be created and, happy or not, eliminated as soon as they are no longer needed.

Also, I think it's not obvious whether we should create more happy people, or just improve the lives of the currently existing people. I kind of get the idea that post-singularity, we will all be combined into One Big Super Person, like reverse Ebborians, and it won't end up mattering.

In the situations above, the people will be created and, happy or not, eliminated as soon as they are no longer needed.

I don't mean that they'll exist permanently. It's good for a happy person to exist, even if it's only for a little while.

Also, I think it's not obvious whether we should create more happy people, or just improve the lives of the currently existing people.

You shouldn't go out of your way to avoid running programs that create happy people. More generally, if it would be helpful to run such a program, but not quite worth the resources on its own, it may be worthwhile if it's a happy person. That will happen about as often as a program being worthwhile on its own, but not worth running because it creates a sad person.

Ah, I was thinking of "computational hazard" as meaning the computation itself is bad, not its consequences on the computing substrate or outside environment. I thought a "self-improving agent" was an example of something that might compute a hazard as a result of computing lots of stuff, some of which turns out to be hazardous. But short of instantiating that particular computational hazard, I don't think it does bad *merely* by computation, rather the computation helps it direct its actions to achieve bad consequences.

If your consequentialist ethics cares only about suffering sentient beings, then unless the simulations can affect the simulating agent in some way and render its actions less optimal, creating suffering beings is the *only* way there can be computation hazards.

If your ethics cares about other things like piles made of prime-numbered rocks, then that's a computation hazard; or if the simulations can affect the simulator, that obviously opens a whole kettle of worms.

(For example, there's apparently a twisty problem of 'false proofs' in the advanced decision theories where simulating a possible proof makes the agent decide to take a suboptimal choice; or the simulator could stumble upon a highly optimized program which takes it over. I'm sure there are other scenarios like that I haven't thought of.)

If your consequentialist ethics cares only about suffering sentient beings, then unless the simulations can affect the simulating agent in some way and render its actions less optimal, creating suffering beings is the only way there can be computation hazards.

Agreed. The sentence I quoted seemed to indicate that Alex thought he had a counterexample, but it turns out we were just using different definitions of "computational hazards".

This is a summary of material from various posts and discussions. My thanks to Eliezer Yudkowsky, Daniel Dewey, Paul Christiano, Nick Beckstead, and several others.

Several ideas have been floating around LessWrong that can be organized under one concept, relating to a subset of AI safety problems. I’d like to gather these ideas in one place so they can be discussed as a unified concept. To give a definition:

A *computation hazard* is a large negative consequence that may arise *merely* from vast amounts of computation, such as in a future supercomputer.

For example, suppose a computer program needs to model people very accurately to make some predictions, and it models those people so accurately that the "simulated" people can experience conscious suffering. In a very large computation of this type, millions of people could be created, suffer for some time, and then be destroyed when they are no longer needed for making the predictions desired by the program. This idea was first mentioned by Eliezer Yudkowsky in Nonperson Predicates.

There are other hazards that may arise in the course of running large-scale computations. In general, we might say that:

Large amounts of computation will likely consist in running many diverse algorithms. Many algorithms are computation hazards. Therefore, all else equal, the larger the computation, the more likely it is to produce a computation hazard.

Of course, most algorithms may be morally neutral. Furthermore, algorithms must be somewhat complex before they could possibly be a hazard. For instance, it is intuitively clear that no eight-bit program could possibly be a computation hazard on a normal computer. Worrying computations therefore fall into two categories: computations that run *most* algorithms, and computations that are particularly likely to run algorithms that are computation hazards.

An example of a computation that runs most algorithms is a mathematical formalism called Solomonoff induction. First published in 1964, it is an attempt to formalize the scientific process of induction using the theory of Turing machines. It is a brute-force method that finds hypotheses to explain data by testing all possible hypotheses. Many of these hypotheses may be algorithms that describe the functioning of people. At a sufficient precision, these algorithms themselves may experience consciousness and suffering. Taken literally, Solomonoff induction runs all algorithms; therefore it produces all possible computation hazards. If we are to avoid computation hazards, any implemented approximations of Solomonoff induction will need to determine *ahead of time* which algorithms are computation hazards.

Computations that run most algorithms could also hide in other places. Imagine a supercomputer’s power is being tested on a simple game, like chess or Go. The testing program simply tries all possible strategies, according to some enumeration. The best strategy that the supercomputer finds would be a measure of how many computations it could perform, compared to other computers that ran the same program. If the rules of the game are complex enough to be Turing complete (a surprisingly easy achievement) then this game-playing program would eventually simulate all algorithms, including ones with moral status.
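
The text above does not say how a single computation could end up running "all" algorithms. One standard way is dovetailing, which interleaves an ever-growing list of programs so that none of them is starved by a program that never halts. The sketch below is only an illustration of that idea: `toy_program` is a hypothetical stand-in for "the i-th algorithm in some enumeration", not a real program enumeration.

```python
from itertools import count, islice
from typing import Callable, Iterator, List, Optional, Tuple

def dovetail(make_program: Callable[[int], Iterator[int]]) -> Iterator[Tuple[int, int]]:
    """Interleave programs 0, 1, 2, ... one step at a time.

    At stage k, program k is admitted, then every program that has not
    halted advances by one step. Yields (program_index, step_output).
    """
    running: List[Optional[Iterator[int]]] = []
    for stage in count():
        running.append(make_program(stage))
        for index, program in enumerate(running):
            if program is None:
                continue  # this program already halted
            try:
                yield index, next(program)
            except StopIteration:
                running[index] = None  # mark as halted, keep its slot

def toy_program(i: int) -> Iterator[int]:
    """Hypothetical 'i-th algorithm': counts up from i.

    Even-numbered programs halt after 3 steps; odd-numbered ones never halt.
    """
    steps = count(i)
    return islice(steps, 3) if i % 2 == 0 else steps

# First few interleaved steps of the schedule.
print(list(islice(dovetail(toy_program), 6)))
```

The schedule never needs to know in advance which programs halt, which is also why it cannot decide ahead of time which of them are worth avoiding.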

Of course, running *most* algorithms is quite infeasible simply because of the vast number of possible algorithms. Depending on the fraction of algorithms that are computation hazards, it may be enough that a computation run an enormous number which act as a random sample of all algorithms. Computations of this type might include evolutionary programs, which are blind to the types of algorithms they run until the results are evaluated for fitness. Or they may be Monte Carlo approximations of massive computations.

But if computation hazards are relatively rare, then it will still be unlikely for large-scale computations to stumble across them unguided. Several computations may fall into the second category of computations that are particularly likely to run algorithms that *are* computation hazards. Here we focus on three types of computations in particular: agents, predictors and oracles. The last two types are especially important because they are often considered safer types of AI than agent-based AI architectures. First I will stipulate definitions for these three types of computations, and then I will discuss the types of computation hazards they may produce.

## Agents

An agent is a computation which decides between possible actions based on the consequences of those actions. They can be thought of as “steering” the future towards some target, or as selecting a future from the set of possible futures. Therefore they can also be thought of as having a goal, or as maximizing a utility function.

Sufficiently capable agents are extremely powerful because they constitute a feedback loop. Feedback loops, well known from physics, often change their surroundings incredibly quickly and dramatically. Examples include the growth of biological populations, and nuclear reactions. Feedback loops are dangerous if their target is undesirable. Agents will be feedback loops as soon as they are able to improve their ability to improve their ability to move towards their goal. For example, humans can improve their ability to move towards their goal by using their intelligence to make decisions. A student aiming to create cures can use her intelligence to learn chemistry, thereby improving her ability to decide what to study next. But presently, humans cannot improve their intelligence, which would improve their ability to improve their ability to make decisions. The student cannot yet learn how to modify her brain in order to learn subjects more quickly.

## Predictors

A predictor is a computation which takes data as input, and predicts what data will come next. An example would be certain types of trained neural networks, or any *approximation* of Solomonoff induction. Intuitively, this feels safer than an agent AI because predictors do not seem to have goals or take actions; they just report predictions as requested by humans.

## Oracles

An oracle is a computation which takes questions as input, and returns answers. They are broader than predictors in that one could ask an oracle about predictions. Similar to a predictor, oracles do not seem to have goals or take actions. (Some material summarized here.)

## Examples of hazards

Agent-like computations are the most clearly dangerous computation hazards. If any large computation starts running the beginning of a *self-improving agent* computation, it is difficult to say how far the agent may safely be run before it is a computation hazard. As soon as the agent is sufficiently intelligent, it will attempt to acquire more resources like computing substrate and energy. It may also attempt to free itself from control of the parent computation.

Another major concern is that, because people are an important part of the surroundings, even non-agent predictors or oracles will *simulate people* in order to make predictions or give answers respectively. Someone could ask a predictor, “What will this engineer do if we give him a contract?” It may be that the easiest way for the predictor to determine the answer is to simulate the internal workings of the given engineer's mind. If these simulations are sufficiently precise, then they will be people in and of themselves. The simulations could cause those people to suffer, and will likely kill them by ending the simulation when the prediction or answer is given.

Similarly, one can imagine that a predictor or oracle might *simulate powerful agents*; that is, algorithms which efficiently maximize some utility function. Agents may be simulated because many agent-like entities exist in the real world, and their behavior would need to be modeled. Or, perhaps oracles would investigate agents for the purpose of answering questions better. These agents, while being simulated, may have goals that require acting independently of the oracle. These agents may also be more powerful than the oracles, especially since the oracles were not designed with self-improvement behavior in mind. Therefore these agents may attempt to “unbox” themselves from the simulation and begin controlling the rest of the universe. For instance, the agents may use previous questions given to the oracle to deduce the nature of the universe and the psychology of the oracle-creators. (For a fictional example, see That Alien Message.) Or the agent might somehow distort the output of the oracle or predictor, so that what it reports causes us to unbox the agent.

Predictors also have the problem of *self-fulfilling prophecies* (first suggested here). An arbitrarily accurate predictor will know that its prediction will affect the future. Therefore, to be a correct prediction, it must make sure that delivering its prediction doesn’t cause the receiver to act in a way that negates the prediction. Therefore, the predictor may have to choose between predictions which cause the receiver to act in a way that fulfills the prediction. This is a type of control over the user. Since the predictor is super-intelligent, any control may rapidly optimize the universe towards some unknown goal.

Overall, there is a large worry that sufficiently intelligent oracles or predictors *may become agents*. Besides the above possibilities, some are worried that intelligence is inherently an optimization process, and therefore oracles and predictors are inherently satisfying some utility function. This, combined with the fact that nothing can be causally isolated from the rest of the universe, seems to invite an eventual AI-takeoff.

## Methods for avoiding computational hazards

It is often thought that, while no proposal has yet been shown safe from computational hazards, oracles and predictors are safer than deliberately agent-based AGI. Other methods have been proposed to make these even safer. Armstrong et al. describe many AI safety measures in general. Below we review some possible techniques for avoiding computational hazards specifically.

One obvious safety practice is to limit the complexity, or the size, of computations. In general, this will also limit the algorithm below general intelligence, but it is a good step while progressing towards FAI. Indeed, it is clear that all current prediction or AI systems are too simple to either be general intelligences or pose a computational hazard.

A proposal for regulating complex oracles or predictors is to develop safety indicators. That is, develop some function that will evaluate the proposed algorithm or model, and return whether it is potentially dangerous. For instance, one could write a simple program that rejects running an algorithm if any part of it is isomorphic to the human genome (since DNA clearly creates general intelligence and people under the right circumstances). Or, to measure the impact of an action suggested by an oracle, one could ask how many humans would be alive one year after the action was taken.

But one could only run an algorithm if they were *sure* it was not a person. A function that could evaluate an algorithm and return 0 *only* if it is *not* a person is called a nonperson predicate. Some algorithms are obviously not people. For example, squaring the numbers from 1 to 100 will not simulate people. Any algorithm whose behavior is periodic with a short period is unlikely to be a person, as is nearly any presently constructed software. But in general this seems extremely difficult to verify. It could be that writing nonperson predicates or other safety indicators is FAI-complete in the sense that if we solve them, we will have discovered friendliness theory. Furthermore, it may be that some attempts to evaluate whether an algorithm is a person actually cause a simulation of a person, by running parts of the algorithm, by modeling a person for comparison, or by other means. Similarly, it may be that attempts to investigate the friendliness of a particular agent cause that agent to unbox itself.

Predictors seem to be one of the most goal-agnostic forms of AGI. This makes them a very attractive model in which to perfect safety. Some ideas for avoiding self-fulfilling predictions suggest that we ask the predictor to tell us what it would have predicted if we hadn’t asked (first suggested here). This frees the predictor from requiring itself to make predictions consistent with our behavior. Whether this will work depends on the exact process of the predictor; it may be so accurate that it cannot deal with counterfactuals, and will simply report that it would have predicted that we would have asked anyway. It is also problematic that the prediction is now inaccurate; because it has told us, we will act, possibly voiding any part of the prediction.
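
The one-sided contract of a nonperson predicate described above (return 0 only when sure, otherwise decline to say) can be illustrated with a toy check. This is a sketch under strong assumptions, not a real safety measure: it treats the short-period heuristic as if it were conclusive, it assumes the recorded `trace` is the algorithm's entire behavior, and the names and the `max_period=8` default are made up for illustration.

```python
from typing import Sequence

def has_short_period(trace: Sequence[int], max_period: int) -> bool:
    """True if the whole trace repeats with some period p <= max_period.

    Requires at least two full repetitions, so very short traces never pass.
    """
    n = len(trace)
    for p in range(1, max_period + 1):
        if 2 * p <= n and all(trace[i] == trace[i + p] for i in range(n - p)):
            return True
    return False

def nonperson_predicate(trace: Sequence[int], max_period: int = 8) -> int:
    """Return 0 only if the toy check passes; 1 means 'unknown', not 'person'."""
    return 0 if has_short_period(trace, max_period) else 1

print(nonperson_predicate([1, 2, 3] * 4))                  # short period: cleared
print(nonperson_predicate([n * n for n in range(1, 101)])) # squares: not cleared
```

The second call shows the cost of the one-sided design: squaring the numbers from 1 to 100 is obviously not a person, yet this particular check cannot certify it, so it falls back to 1. A usable predicate would need many more such sufficient conditions, which is where the difficulty noted above comes in.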

A very plausible but non-formal solution is to aim for a soft takeoff. For example, we could build a predictor that is not generally intelligent, and use it to investigate safe ways to advance the situation. Perhaps we could use a sub-general intelligence to safely improve our own intelligence.

Have I missed any major examples in this post? Does “computation hazards” seem like a valid concept as distinct from other types of AI-risks?

## References

Armstrong S., Sandberg A., Bostrom N. (2012). “Thinking inside the box: using and controlling an Oracle AI”. Minds and Machines, forthcoming.

Solomonoff, R., "A Formal Theory of Inductive Inference, Part I" Information and Control, Vol 7, No. 1 pp 1-22, March 1964.

Solomonoff, R., "A Formal Theory of Inductive Inference, Part II" Information and Control, Vol 7, No. 2 pp 224-254, June 1964.