General purpose intelligence: arguing the Orthogonality thesis

[-]Eliezer Yudkowsky13y290

For utility function maximisers, the AIXI is the theoretically best agent there is, more successful at reaching its goals (up to a finite constant) than any other agent (Hutter, 2005).

False. AIXI as defined can maximize only a sensory reward channel, not a utility function over an environmental model with a known ontology. As Dewey demonstrates, this problem is not easy to fix; AIXI can have utility functions over (functions of) sensory data, but its environment-predictors vary freely in ontology via Solomonoff induction, so it can't have a predefined utility function over the future of its environment without major rewriting.

AIXI is the optimal function-of-sense-data maximizer for Cartesian agents with unbounded computing power and access to a halting oracle, in a computable environment as separated from AIXI by the Cartesian boundary, given that your prior belief about the possible environments matches AIXI's Solomonoff prior.
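
For readers who have not seen it, the point about the reward channel can be read off the usual statement of AIXI's action rule. The following is a sketch in the notation commonly used after Hutter (2005); the symbols $m$ (horizon), $U$ (universal Turing machine), $q$ (environment program) and $\ell(q)$ (program length) are not defined anywhere in this thread and are assumed here.

```latex
a_k \;:=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
\bigl[\, r_k + \cdots + r_m \,\bigr]
\sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

The quantity being maximised is a sum of rewards $r_i$, and each $r_i$ is part of the percept $o_i r_i$ that the environment program $q$ outputs. The inner sum ranges over all programs consistent with the history, so nothing in the expression fixes an ontology that a utility function over environment states could be written against.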

4Stuart_Armstrong13y

Thanks for the correction. Daniel hadn't mentioned that as a problem when he reviewed the paper, so, I took it as being at least approximately correct, but it is important to be as rigorous as possible. I'll see what can be rescued, and what needs to be reworked.

[-]Wei Dai13y220

Here's an attack on section 4.1. Consider the possibility that "philosophical ability" (something like the ability to solve confusing problems that can't be easily formalized) is needed to self-improve beyond some threshold of intelligence, and this same "philosophical ability" also reliably causes one to decide that some particular goal G is the right goal to have, and therefore beyond some threshold of intelligence all agents have goal G. To deny this possibility seems to require more meta-philosophical knowledge than we currently possess.

6Stuart_Armstrong13y

Yes, to deny it requires more meta-philosophical knowledge than we currently possess. But to affirm it as likely requires more meta-philosophical knowledge than we currently possess. My purpose is to show that it's very unlikely, not that it's impossible. Do you feel I didn't make that point? Should I have addressed "moral realism" explicitly? I didn't want to put down the words, because it raises defensive hackles if I start criticising a position directly.

6Wei Dai13y

Perhaps I should have said "To conclude that this possibility is very unlikely" instead of "To deny this possibility". My own intuition seems to assign a probability to it that is greater than "very unlikely" and this was largely unchanged after reading your paper. For example, many of the items in the list in section 4.5, that have to be true if orthogonality was false, can be explained by my hypothesis, and the rest do not seem very unlikely to begin with.

[-]Stuart_Armstrong13y100

My own intuition seems to assign a probability to it that is greater than "very unlikely"

Why? You're making an extraordinary claim. Something - undefined - called philosophical ability is needed (for some reason) to self improve and, for some extraordinary and unexplained reason, this ability causes an agent to have a goal G. Where goal G is similarly undefined.

Let me paraphrase: Consider the possibility that "mathematical ability" is needed to self-improve beyond some threshold of intelligence, and this same "mathematical ability" also reliably causes one to decide that some particular goal G is the right goal to have, and therefore beyond some threshold of intelligence all agents have goal G.

Why is this different? What in your intuition is doing the work "philosophical ability" -> same goals? If we call it something else than "philosophical ability", would you have the same intuition? What raises the status of that implication to the level that it's worthy of consideration?

I'm asking seriously - this is the bit in the argument I consistently fail to understand, the bit that never makes sense to me, but whose outline I can feel in most counterarguments.

8Wei Dai13y

It seems to me there are certain similarities and correlations between thinking about decision theory (which potentially makes one or an AI one builds more powerful) and thinking about axiology (what terminal goals one should have). They're both "ought" questions, and if you consider the intelligences that we can see or clearly reason about (individual humans, animals, Bayesian EU maximizer, narrow AIs that exist today), there seems a clear correlation between "ability to improve decision theory via philosophical reasoning" (as opposed to CDT-AI changing into XDT and then being stuck with that) and "tendency to choose one's goals via philosophical reasoning". One explanation for this correlation (and also the only explanation I can see at the moment, besides it being accidental) is that something we call "philosophical ability" is responsible for both. Assuming that's the case, that still leaves the question of whether philosophical ability backed up with enough computing power eventually leads to goal convergence. One major element of philosophical reasoning seems to be a distaste for and tendency to avoid arbitrariness. It doesn't seem implausible that for example "the ultimate philosopher" would decide that every goal except pursuit of pleasure / avoidance of pain is arbitrary (and think that pleasure/pain is not arbitrary due to philosophy-of-mind considerations).

5JGWeissman13y

If an agent has goal G1 and sufficient introspective access to know its own goal, how would avoiding arbitrariness in its goals help it achieve goal G1 better than keeping goal G1 as its goal? I suspect we humans are driven to philosophize about what our goals ought to be by our lack of introspective access, and that searching for some universal goal, rather than what we ourselves want, is a failure mode of this philosophical inquiry.

[-]Wei Dai13y180

I think we don't just lack introspective access to our goals, but can't be said to have goals at all (in the sense of preference ordering over some well defined ontology, attached to some decision theory that we're actually running). For the kind of pseudo-goals we have (behavior tendencies and semantically unclear values expressed in natural language), they don't seem to have the motivational strength to make us think "I should keep my goal G1 instead of avoiding arbitrariness", nor is it clear what it would mean to "keep" such pseudo-goals as one self-improves.

What if it's the case that evolution always or almost always produces agents like us, so the only way they can get real goals in the first place is via philosophy?

5JGWeissman13y

The primary point of my comment was to argue that an agent that has a goal in the strong sense would not abandon its goal as a result of philosophical consideration. Your response seems more directed at my afterthought about how our intuitions based on human experience would cause us to miss the primary point. I think that we humans do have goals, despite not being able to consistently pursue them. I want myself and my fellow humans to continue our subjective experiences of life in enjoyable ways, without modifying what we enjoy. This includes connections to other people, novel experiences, high challenge, etc. There is, of course, much work to be done to complete this list and fully define all the high level concepts, but in the end I think there are real goals there, which I would like to be embodied in a powerful agent that actually runs a coherent decision theory. Philosophy probably has to play some role in clarifying our "pseudo-goals" as actual goals, but so does looking at our "pseudo-goals", however arbitrary they may be.

5Wei Dai13y

Such an agent would also not change its decision theory as a result of philosophical consideration, which potentially limits its power. I wouldn't argue against this as written, but Stuart was claiming that convergence is "very unlikely" which I think is too strong.

2JGWeissman13y

I don't think that follows, or at least the agent could change its decision theory as a result of some consideration, which may or may not be "philosophical". We already have the example that a CDT agent that learns in advance it will face Newcomb's problem could predict it would do better if it switched to TDT.
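
The claim that an agent told about Newcomb's problem in advance "could predict it would do better" by switching can be made concrete with a small expected-value calculation. This is only a sketch: the payoffs (1,000,000 and 1,000) and the single `accuracy` parameter for the predictor are the conventional textbook values, assumed here rather than taken from this thread, and the function names are hypothetical.

```python
def one_box_ev(accuracy: float, big: float = 1_000_000.0) -> float:
    """Expected payoff of a committed one-boxer.

    The opaque box is full exactly when the predictor is right.
    """
    return accuracy * big


def two_box_ev(accuracy: float, big: float = 1_000_000.0,
               small: float = 1_000.0) -> float:
    """Expected payoff of a committed two-boxer.

    The opaque box is full only when the predictor is wrong;
    the small transparent box is always collected.
    """
    return (1.0 - accuracy) * big + small


def break_even_accuracy(big: float = 1_000_000.0,
                        small: float = 1_000.0) -> float:
    """Predictor accuracy at which both policies have equal expected payoff.

    Solves accuracy * big == (1 - accuracy) * big + small.
    """
    return (big + small) / (2.0 * big)


if __name__ == "__main__":
    for p in (0.5, 0.6, 0.9, 0.99):
        print(p, one_box_ev(p), two_box_ev(p))
    print("break-even accuracy:", break_even_accuracy())
```

Because the comparison is made before the prediction happens, choosing which policy to become is an ordinary causal decision, which is why even a CDT agent can endorse the switch once the predictor's accuracy is above the break-even point.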

2Wei Dai13y

I wrote earlier XDT (or in Eliezer's words, "crippled and inelegant form of TDT") is closer to TDT but still worse. For example, XDT would fail to acausally control/trade with other agents living before the time of its self-modification, or in other possible worlds.

0JGWeissman13y

Ah, yes, I agree that CDT would modify to XDT rather than TDT, though the fact that it self modifies at all shows that goal driven agents can change decision theories because the new decision theory helps them achieve their goals. I do think that it's important to consider how a particular decision theory can decide to self modify, and to design an agent with a decision theory that can self modify in good ways.

1Dolores198413y

Not strictly. If a strongly goal'd agent determines that a different decision theory (or any change to itself) better maximizes its goal, it would adopt that new decision theory or change.

3CuSithBell13y

I agree that humans are not utility-maximizers or similar goal-oriented agents - not in the sense we can't be modeled as such things, but in the sense that these models do not compress our preferences to any great degree, which happens to be because they are greatly at odds with our underlying mechanisms for determining preference and behavior.

-6private_messaging13y

-2Juno_Watt12y

Avoiding arbitrariness is useful to epistemic rationality and therefore to instrumental rationality. If an AI has rationality as a goal it will avoid arbitrariness, whether or not that assists with G1.

2JGWeissman12y

Avoiding giving credence to arbitrary beliefs is useful to epistemic rationality and therefore to instrumental rationality, and therefore to goal G1. Avoiding arbitrariness in goals still does not help with achieving G1 if G1 is considered arbitrary. Be careful not to conflate different types of arbitrariness. Rationality is not an end goal, it is that which you do in pursuit of a goal that is more important to you than being rational.

-6TheAncientGeek12y

-2[anonymous]13y

Robin Hanson's 'far mode' (his take on construal level theory) is a plausible match to this 'something'. Hanson points out that far mode is about general categories and creative metaphors. This is a match to something from AGI research...categorization and analogical inference. This can be linked to Bayesian inference by considering analogical inference as a natural way of reasoning about 'priors'. A plausible explanation is that analogical inference is associated with sentience (subjective experience), as suggested by Douglas Hofstadter (who has stated he thinks 'analogies' are the core of conscious cognition). Since sentience is closely associated with moral reasoning, it's at least plausible that this ability could indeed give rise to convergence on a particular G. Here is a way G can be defined: Analogical inference is concerned with Knowledge Representation (KR), so we could redefine ethics based on 'representations of values' ('narratives', which as Daniel Dennett has pointed out, indeed seem to be closely linked to subjective experience) rather than external consequences. At this point we can bring in the ideas of Schmidhuber and recall a powerful point made by Hanson (see below). For maximum efficiency, all AGIs with the aforementioned 'philosophical ability' (analogical inference and production of narratives) would try to minimize the complexity of the cognitive processes generating their internal narratives. This could place universal constraints on what these values are. For example, Schmidhuber pointed out that data compression could be used to get a precise definition of 'beauty'. Let's now recall a powerful point Hanson made a while back on OB: the brain/mind can be totally defined in terms of a 'signal processor'. Given this perspective, we could then view the correct G as the 'signal' and moral errors as 'noise'. Algorithmic information theory could then be used to define a complexity metric that would precisely define this G.

6Paul Crowley13y

Schmidhuber's definition of beauty is wrong. He says, roughly, that you're most pleased when after great effort you find a way to compress what was seemingly incompressible. If that were so, I could please you again and again by making up new AES keys with the first k bits random and the rest zero, and using them to generate and give you a few terabytes of random data. You'd have to brute force the key, at which point you'll have compressed down from terabytes to kilobytes. What beauty! Let's play the exact game again, with the exact same cipher but a different key, forever.
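
The game described above can be sketched at toy scale. Note the assumptions: the Python standard library has no AES, so this uses SHA-256 in counter mode as a stand-in keystream generator (it is not AES), the key has only 12 unknown bits, and the data is 1 KiB rather than terabytes. All function names are hypothetical.

```python
import hashlib
import secrets
import zlib


def keystream(key: bytes, n_bytes: int) -> bytes:
    """Stand-in for a cipher keystream: SHA-256 in counter mode (NOT AES)."""
    out = bytearray()
    counter = 0
    while len(out) < n_bytes:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n_bytes])


def make_key(k_bits: int, prefix: int) -> bytes:
    """16-byte key whose first k_bits come from `prefix`; the rest are zero."""
    if not 0 <= prefix < (1 << k_bits):
        raise ValueError("prefix does not fit in k_bits")
    return (prefix << (128 - k_bits)).to_bytes(16, "big")


def brute_force(data: bytes, k_bits: int) -> int:
    """Try every k-bit prefix until the keystream reproduces the data."""
    for guess in range(1 << k_bits):
        if keystream(make_key(k_bits, guess), len(data)) == data:
            return guess
    raise ValueError("no key found")


K_BITS = 12          # toy value; the unknown part of the key
DATA_LEN = 1024      # toy value; bytes of "random-looking" data

secret = secrets.randbelow(1 << K_BITS)
data = keystream(make_key(K_BITS, secret), DATA_LEN)

# A general-purpose compressor finds no structure in the data.
zlib_size = len(zlib.compress(data, 9))

# After the search, the whole blob is described by the key prefix and a length.
recovered = brute_force(data, K_BITS)

if __name__ == "__main__":
    print("raw bytes:", len(data))
    print("zlib bytes:", zlib_size)
    print("recovered key prefix matches:", recovered == secret)
```

Each round of the game yields the same kind of compression "discovery" after the same kind of search, which is the repetition the objection relies on.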

0Will_Newsome13y

Right. That said, wireheading, aka the grounding problem, is a huge unsolved philosophical problem, so I'm not sure Schmidhuber is obligated to answer wireheading objections to his theory.

5CuSithBell13y

But the theory fails because this fits it but isn't wireheading, right? It wouldn't actually be pleasing to play that game.

5wedrifid13y

I think you are right. The two are errors that practically, with respect to hedonistic extremism, operate in opposing directions. They are similar in form in as much as they fit the abstract notion "undesirable outcomes due to lost purposes when choosing to optimize what turns out to be a poor metric for approximating actual preferences".

0Will_Newsome13y

Meh, yeah, maybe? Still seems like other, more substantive objections could be made. Relatedly, I'm not entirely sure I buy Steve's logic. PRNGs might not be nearly as interesting as short mathematical descriptions of complex things, like Chaitin's omega. Arguably collecting as many bits of Chaitin's omega as possible, or developing similar maths, would in fact be interesting in a human sense. But at that point our models really break down for many reasons, so meh whatever.

4wedrifid13y

Unsolved philosophical problem? Huh? No additional philosophical breakthroughs are required for wireheading to not be a problem. If I want (all things considered, etc) to wirehead, I'll wirehead. If I don't want to wirehead I will not wirehead. Wireheading introduces no special additional problems and is handled the same way all other preferences about future states of the universe can be handled. (Note: It is likely that you have some more specific point regarding in what sense you consider wireheading 'unsolved'. I welcome explanations or sources.)

0Will_Newsome13y

Unsolved in the sense that we don't know how to give computer intelligences intentional states in a way that everyone would be all like "wow that AI clearly has original intentionality and isn't just coasting off of humans sitting at the end of the chain interpreting their otherwise entirely meaningless symbols". Maybe this problem is just stupid and will solve itself but we don't know that yet, hence e.g. Peter's (unpublished?) paper on goal stability under ontological shifts. (ETA: I likely don't understand how you're thinking about the problem.)

1wedrifid13y

Being able to do this would also be a step towards the related goal of trying to give computer intelligences intelligence that we cannot construe as 'intentionality' in any morally salient sense, so as to satisfy any "house-elf-like" qualms that we may have. I assume you mean Ontological Crises in Artificial Agents’ Value Systems? I just finished republishing that one. Originally published form. New SingInst style form. A good read.

-4private_messaging13y

Engineering ability suffices: http://lesswrong.com/lw/cej/general_purpose_intelligence_arguing_the/6lst Do philosophers have an incredibly strong ugh field around anything that can be deemed 'implementation detail'? Clearly, 'superintelligence' the string of letters can have whatever 'goals' the strings of letters, no objection here. The superintelligence in the form of a distributed system with millisecond or worse lag between components, and nanosecond or better clock speed, on the other hand...

2Stuart_Armstrong13y

Looking at your post at http://lesswrong.com/lw/2id/metaphilosophical_mysteries, I can see the sketch of an argument. It goes something like "we know that some decision theories/philosophical processes are 'objectively' inferior, hence some are objectively superior, hence (wave hands furiously) it is at least possible that some system is objectively best". I would counter: 1) The argument is very weak. We know some mathematical axiomatic systems are contradictory, hence inferior. It doesn't follow from that that there is any "best" system of axioms. 2) A lot of philosophical progress is entirely akin to mathematical progress: showing the consequences of the axioms/assumptions. This is useful progress, but not really relevant to the argument. 3) All the philosophical progress seems to lie on the "how to make better decisions given a goal" side; none of it lies on the "how to have better goals" side. Even the expected utility maximisation result just says "if you are unable to predict effectively over the long term, then to achieve your current goals, it would be more efficient to replace these goals with others compatible with a utility function". However, despite my objections, I have to note that the argument is at least an argument, and provides some small evidence in that direction. I'll try and figure out whether it should be included in the paper.

5private_messaging13y

Another possibility, easy to see if you think more like an engineer and less like a philosopher: The AI has to operate with light-speed delay, and has to be made of multiple nodes. It is entirely possible that some morality systems would not allow efficient solutions to this challenge (i.e. would break into some sort of war between modules, or otherwise fail to intellectually collaborate). It is likely that there's only a limited number of good solutions to P2P intelligence design, and the one that would be found would be substantially similar to our own solution to fundamentally the same problem, a solution which we call 'morality', complete with various non-utilitarian quirks. edit: that is, our 'morality' is the set of rules for inter-node interaction in society, and some such rules just don't work. The Orthogonality thesis for anything in any sense practical is a conjunction of a potentially huge number of propositions (which are assumed false without consideration, by omission) - any sort of consideration not yet considered can break the symmetry between different goals, and then another such consideration is incredibly unlikely to add symmetry back.

2JGWeissman13y

If an agent with goal G1 acquires sufficient "philosophical ability", that it concludes that goal G is the right goal to have, that means that it decided that the best way to achieve goal G1 is to pursue goal G. For that to happen, I find it unlikely that goal G is anything other than a clarification of goal G1 in light of some confusion revealed by the "philosophical ability", and I find it extremely unlikely that there is some universal goal G that works for any goal G1.

7Will_Newsome13y

Offbeat counter: You're assuming that this ontology that privileges "goals" over e.g. morality is correct. What if it's not? Are you extremely confident that you've carved up reality correctly? (Recall that EU maximizers haven't been shown to lead to AGI, and that many philosophers who have thought deeply about the matter hold meta-ethical views opposed to your apparent meta-ethics.) I.e., what if your above analysis is not even wrong?

3JGWeissman13y

I don't believe that goals are ontologically fundamental. I am reasoning (at a high level of abstraction) about the behavior of a physical system designed to pursue a goal. If I understood what you mean by "morality", I could reason about a physical system designed to use that and likely predict different behaviors than for the physical system designed to pursue a goal, but that doesn't change my point about what happens with goals. I don't expect EU maximizers to lead to AGI. I expect EU maximizing AGIs, whatever has led to them, to be effective EU maximizers.

5Will_Newsome13y

Sorry, I meant "ontology" in the information science sense, not the metaphysics sense; I simply meant that you're conceptually (not necessarily metaphysically) privileging goals. What if you're wrong to do that? I suppose I'm suggesting that carving out "goals" might be smuggling in conclusions that make you think universal convergence is unlikely. If you conceptually privileged rational morality instead, as many meta-ethicists do, then your conclusions might change, in which case it seems you'd have to be unjustifiably confident in your "goal"-centric conceptualization.

1JGWeissman13y

I think I am only "privileging" goals in a weak sense, since by talking about a goal driven agent, I do not deny the possibility of an agent built on anything else, including your "rational morality", though I don't know what that is. Are you arguing that a goal driven agent is impossible? (Note that this is a stronger claim than it being wiser to build some other sort of agent, which would not contradict my reasoning about what a goal driven agent would do.)

0Will_Newsome13y

(Yeah, the argument would have been something like, given a sufficiently rich and explanatory concept of "agent", goal-driven agents might not be possible --- or more precisely, they aren't agents insofar as they're making tradeoffs in favor of local homeostatic-like improvements as opposed to traditionally-rational, complex, normatively loaded decision policies. Or something like that.)

0amcknight13y

Let me try to strengthen your point. If an agent with goal G1 acquires sufficient "philosophical ability", that it concludes that goal G is the right goal to have, that means that it decided that the best way to achieve goal G1 is to pursue what it thinks is the "right goal to have". This would require it to take a kind of normative stance on goal fulfillment, which would require it to have normative machinery, which would need to be implemented in the agent's mind. Is it impossible to create an agent without normative machinery of this kind? Does philosophical ability depend directly on normative machinery?

-6Juno_Watt12y

[-]A1987dM13y140

‘maximising paperclips’

Since you want a non-LWian audience, make that “maximising the number of paperclips in the universe”, otherwise the meaning might be unclear.

5MaoShan13y

Although, his point would still hold if the reader was imagining the goal of making extremely large paperclips.

[-]Wei Dai13y130

Couple of comments:

The section "Bayesian Orthogonality thesis" doesn't seem right, since a Bayesian would think in terms of probabilities rather than possibilities ("could construct superintelligent AIs with more or less any goals"). If you're saying that we should assign a uniform distribution for what AI goals will be realized in the future, that's clearly wrong.
I think the typical AI researcher, after reading this paper, will think "sure, it might be possible to build agents with arbitrary goals if one tried, but my approach will probably lead to a benevolent AI". (See here for an example of this.) So I'm not sure why you're putting so much effort into this particular line of argument.

6Stuart_Armstrong13y

This is the first step (pointed more towards philosophers). Formalise the "we could construct an AI with arbitrary goals", and with that in the background, zoom in on the practical arguments with the AI researchers. Will restructure the Bayesian section. Some philosophers argue things like "we don't know what moral theories are true, but a rational being would certainly find them"; I want to argue that this is equivalent, from our perspective, to the AI's goals ending up anywhere. What I meant to say is that ignorance of this type is like any other type of ignorance, hence the "Bayesian" terminology.

6Wei Dai13y

Ok, in that case I would just be wary about people being tempted to cite the paper to AI researchers without having the followup arguments in place, who would then think that their debating/discussion partners are attacking a strawman.

2Stuart_Armstrong13y

Hum, good point; I'll try and put in some disclaimer, emphasising that this is a partial result...

1Wei Dai13y

Thanks. To go back to my original point a bit, how useful is it to debate philosophers about this? (When debating AI researchers, given that they probably have a limited appetite for reading papers arguing that what they're doing is dangerous, it seems like it would be better to skip this paper and give the practical arguments directly.)

4Stuart_Armstrong13y

Maybe I've spent too much time around philosophers - but there are some AI designers who seem to spout weak arguments like that, and this paper can't hurt. When we get around to writing a proper justification for AI researchers, having this paper to refer back to avoids going over the same points again. Plus, it's a lot easier to write this paper first, and was good practice.

3jacob_cannell13y

Without getting into the likelihood of a 'typical AI researcher' successfully creating a benevolent AI, do you doubt Goertzel's "Interdependency Thesis"? I find both to be rather obviously true. Yes, it's possible in principle for almost any goal system to be combined with almost any type or degree of intelligence, but that's irrelevant because in practice we can expect the distributions over both to be highly correlated in some complex fashion. I really don't understand why this Orthogonality idea is still brought up so much on LW. It may be true, but it doesn't lead to much. The space of all possible minds or goal systems is about as relevant to the space of actual practical AIs as the space of all configurations of a human's molecules is to the space of a particular human's set of potential children.

[-]Will_Newsome13y120

We will also take the materialistic position that humans themselves can be viewed as non-deterministic algorithms[2]

I'm not a philosopher of mind but I think "materialistic" might be a misleading word here, being too similar to "materialist". Wouldn't "computationalistic" or maybe "functionalistic" be more precise? ("-istic" as opposed to "-ist" to avoid connotational baggage.) Also it's ambiguous whether footnote two is a stipulation for interpreting the paper or a brief description of the consensus view in physics.

At various points you make somewhat bold philosophical or conceptual claims based off of speculative mathematical formalisms. Even though I'm familiar with and have much respect for the cited mathematics, this still makes me nervous, because when I read philosophical papers that take such an approach my prior is high for subtle or subtly unjustified equivocation; I'd be even more suspicious were I a philosopher who wasn't already familiar with universal AI, which isn't a well-known or widely respected academic subfield. The necessity of finding clearly trustworthy analogies between mathematical and phenomena...

[-]MaoShan13y70

Just some minor text corrections for you:

From 3.1

The utility function picture of a rational agent maps perfectly onto the Orthogonality thesis: here have the goal structure, the utility fu...

...could be "here we have the...

From 3.2

Human minds remain our only real model of general intelligence, and this strongly direct and informs...

this strongly directs and informs...

From 4.1

“All human-designed rational beings would follow the same morality (or one of small sets of moralities)” sound plausible; in contract “All human-designed superefficient

...

6Simon Fischer13y

From 3.3

to do so(?) we would

From 3.4

of a single given agent

From 4.1

every, or change the rest of the sentence (superintelligences, they were)

From 4.5

[-]drnickbone13y60

I like the paper, but am wondering how (or whether) it applies to TDT and acausal trading. Doesn't the trading imply a form of convergence theorem among very powerful TDT agents (they should converge on an average utility function constructed across all powerful TDT agents in logical space)?

Or have I missed something here? (I've been looking around on Less Wrong for a good post on acausal trading, and am finding bits and pieces, but no overall account.)

4Postal_Scale13y

It does indeed imply a form of convergence. I would assume Stuart thinks of the convergence as an artifact of the game environment the agents are in. Not a convergence in goals, just behavior. Albeit the results are basically the same.

6Wei Dai13y

If there's convergence in goals, then we don't have to worry about making an AI with the wrong goals. If there's only convergence in behavior, then we do, because building an AI with the wrong goals will shift the convergent behavior in the wrong direction. So I think it makes sense for Stuart's paper to ignore acausal trading and just talk about whether there is convergence in goals.

3Eugine_Nier13y

Not necessarily, it might destroy the earth before its goals converge.

3Vladimir_Nesov13y

Global scale acausal trading, if it's possible in practice (and it's probably not going to be, we only have this theoretical possibility but no indication that it's possible to actually implement), implies uniform expected surface behavior of involved agents, but those agents trade control over their own resources (world) for optimization of their own particular preference by the global acausal economy. So even if the choice of AI's preference doesn't have significant impact on what happens in AI's own world, it does have significant impact on what happens globally, on the order of what all the resources in AI's own world can buy.

-2Johnicholas13y

There was an incident of censorship by EY relating to acausal trading - the community's confused response (chilling effects? agreement?) to that incident explains why there is no overall account.

7Wei Dai13y

No, I think it's more that the idea (acausal trading) is very speculative and we don't have a good theory of how it might actually work.

1drnickbone13y

Thanks for this... Glad it's not being censored! I did post the following on one of the threads, which suggested to me a way in which it would happen or at least get started. Again, apologies if this idea is nuts or just won't work. However, if true, it did strike me as increasing the chance of a simulation hypothesis. (It gives powerful TDT AIs a motivation to simulate as many civilizations as they can, and in a "state of nature", so that they get to see what the utility functions are like, and how likely they are to also build TDT-implementing AIs...)

2timtyler13y

It was censored, though there's a short excerpt here.

0amcknight13y

By the way, I still can't stop thinking about that post after 6 months. I think it's my favorite wild-idea scenario I've ever heard of.

[-]Paul Crowley13y60

If a goal is a preference order over world states, then there are uncountably many of them, so any countable means of expression can only express a vanishingly small minority of them. Trivially (as Bostrom points out) a goal system can be too complex for an agent of a given intelligence. It therefore seems to me that what we're really defending is an Upscalability thesis: if an agent A with goal G is possible, then a significantly more intelligent A++ with goal G is possible.
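
The counting step can be spelled out. This is a sketch under an assumption the comment leaves implicit, namely that there are at least countably infinitely many world states; the notation ($W$, $S$, $\preceq_S$, $\Sigma^*$) is introduced here for illustration.

```latex
\text{Let } W = \{w_0, w_1, w_2, \ldots\}. \text{ For each } S \subseteq \mathbb{N}_{\ge 1} \text{ define }
\qquad w_i \preceq_S w_j \iff (i \notin S) \ \lor\ (j \in S).

\text{Each } \preceq_S \text{ is a total preorder (states in } S \text{ are preferred to the rest), and }
S = \{\, j : w_j \not\preceq_S w_0 \,\}, \text{ so } S \mapsto\ \preceq_S \text{ is injective.}

\text{Hence}\quad
\bigl|\{\text{preference orders on } W\}\bigr| \;\ge\; \bigl|\mathcal{P}(\mathbb{N}_{\ge 1})\bigr| \;=\; 2^{\aleph_0}
\;>\; \aleph_0 \;=\; |\Sigma^*| ,

\text{where } \Sigma^* \text{ is the set of finite strings over a finite alphabet } \Sigma .
```

Every finite description of a goal is an element of $\Sigma^*$, so all but countably many preference orders have no finite description at all. With only finitely many world states the set of preference orders is finite and the argument does not apply.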

[-]gRR13y50

Thus to deny the Orthogonality thesis is to assert that there is a goal system G, such that, among other things:
(1) There cannot exist any efficient real-world algorithm with goal G.
(2) If a being with arbitrarily high resources, intelligence, time and goal G, were to try design an efficient real-world algorithm with the same goal, it must fail.
(3) If a human society were highly motivated to design an efficient real-world algorithm with goal G, and were given a million years to do so along with huge amounts of resources, training and knowledge about AI, i

...

2Stuart_Armstrong13y

As I said to Wei, we can start dealing with those arguments once we've got strong foundations. I'll see if the value drift issue can be better integrated in the argumentation.

[-]JoshuaZ13y30

our race spans foot-fetishists, religious saints, serial killers, instinctive accountants, role-players, self-cannibals, firefighters and conceptual artists. The autistic, those with exceptional social skills, the obsessive compulsive and some with split-brains. Beings of great empathy and the many who used to enjoy torture and executions as public spectacles

Some of these are not really terminal goals. A fair number of people with strong sexual fetishes would be perfectly happy without them, and in more extreme cases really would prefer not to have them...

[-]Luke_A_Somers13y120

A fair number of people with strong sexual fetishes

there are some serial killers

It was an existence argument. That some more people aren't examples doesn't really change matters, does it?

[-]A1987dM13y20

to avoid worrying about robot bodies and such-like, we may restrict the list of tasks to those accomplishable over the internet

Many of the tasks I accomplish over the internet require there to be people who know me in real life, some require me to have a body and voice which looks and sounds human (in photos and videos at least) and a few require me to be enrolled in my university, have a bank account, be a citizen of my country, vel sim. (Adding “anonymously” and “for free” ought to fix that.)

[-]timtyler13y20

I don't see why there are only two counter-theses in section 4. Or rather, it looks as though you want a too-strong claim - in order to criticise it.

Try a "partial convergence" thesis instead. For instance, the claim that goals that are the product of cultural or organic evolution tend to maximise entropy and feature universal instrumental values.

2Stuart_Armstrong13y

The incompleteness claim is weaker than the partial convergence claim.

2timtyler13y

Sure, but if you try harder with counter-theses you might reach a reasonable position that's neither very weak nor wrong.

[-]Johnicholas13y10

Minor text correction;

"dedicated committee of human-level AIs dedicated" repeats the same adjective in a small span.

More wide-ranging:

Perhaps the paper would be stronger if it explained why philosophers might feel that convergence is probable. For example, in their experience, human philosophers / philosophies converge.

In a society, where the members are similar to one another, and much less powerful than the society as a whole, the morality endorsed by the society might be based on the memes that can spread successfully. That is, a meme like '...

-1Stuart_Armstrong13y

I'm deliberately avoiding that route. If I attack, or mention, moral realism in any form, philosophers are going to get defensive. I'm hoping to skirt the issue by narrowing the connotations of the terms (efficiency rather than intelligence and, especially, rationality).

5Wei Dai13y

You don't think a moral realist will notice that your paper contradicts moral realism and get defensive anyway? Can you write out the thoughts that you're hoping a moral realist will have after reading your paper?

2Stuart_Armstrong13y

Less so. "All rational beings will be moral, but this paper worries me that AI, while efficient, may not end up being rational. Maybe it's worth worrying about."

2Wei Dai13y

Why not argue for this directly, instead of making a much stronger claim ("may not" vs "very unlikely")? If you make a claim that's too strong, that might lead people to dismiss you instead of thinking that a weaker version of the claim could still be valid. Or they could notice holes in your claimed position and be too busy trying to think of attacks to have the thoughts that you're hoping for. (But take this advice with a big grain of salt since I have little idea how academic philosophy works in practice.)

0Stuart_Armstrong13y

Actually scratch that and reverse it - I've got an idea how to implement your idea in a nice way. Thanks!

0Stuart_Armstrong13y

I'm not an expert on academic philosophy either. But I feel the stronger claim might work better; I'll try and hammer the point "efficiency is not rationality" again and again.

-2[anonymous]13y

I'm confused. "May not" is weaker than "very unlikely," in the supplied context.

[-]Paul Crowley13y10

Copying from a comment I already made cos no-one responded last time:

I'm not confident about any of the below, so please add cautions in the text as appropriate.

The orthogonality thesis is both stronger and weaker than we need. It suffices to point out that neither we nor Ben Goertzel know anything useful or relevant about what goals are compatible with very large amounts of optimizing power, and so we have no reason to suppose that superoptimization by itself points either towards or away from things we value. By creating an "orthogonality thesis"...

0jacob_cannell13y

The orthogonality thesis is non-controversial. Ben's point is that what matters is not the question of what types of goals are theoretically compatible with superoptimization, but rather what types of goals we can expect to be associated with superoptimization in reality. In reality AGIs with superoptimization power will be created by human agencies (or their descendants) with goal systems subject to extremely narrow socio-economic filters. The other tangential consideration is that AGIs with superoptimization power and long planning horizons/zero time discount may have highly convergent instrumental values/goals which are equivalent in effect to terminal values/goals for agents with short planning horizons (such as humans). From a human perspective, we may observe all super-AGIs to appear to have strangely similar ethics/morality/goals, even though what we are really observing are convergent instrumental values and short-term opening plans, as their true goals concern the end of the universe and are essentially unknowable to us.

8Stuart_Armstrong13y

The orthogonality thesis is highly controversial - among philosophers.

2Paul Crowley13y

Right, but none of this answers what I was trying to say, which is that the burden of proof is definitely with whoever wants to assert that superintelligence tells us anything about goals. In the absence of a specific argument, "this agent is superintelligent" shouldn't be taken as informative about its goals.

8jacob_cannell13y

A superintelligent agent doesn't just appear ex nihilo as a random sample out of the space of possible minds. Its existence requires a lengthy, complex technological development which implies the narrow socio-economic filter I mentioned above. Thus "this agent is superintelligent" is at least partially informative about the probability landscape over said agent's goals: they are much more likely than not to be related to or derived from prior goals of the agent's creators.

2Paul Crowley13y

Right, and that's one example of a specific argument. Another is the Gödelian and self-defeating examples in the main article. But neither of these do anything to prop up the Goertzel-style argument of "a superintelligence won't tile the Universe with smiley faces, because that's a stupid thing to do".

-2private_messaging13y

Well, Goertzel's argument is pretty much bulletproof-correct when it comes to learning algorithms like the ones he works on, where the goal is essentially set by training, along with human culture and the human notion of a stupid goal. I.e. the AI that reuses human culture as a foundation for superhuman intelligence. Ultimately, orthogonality dissolves once you start being specific about what intelligence we're talking of - assume that it has speed of light lag and is not physically very small, and it dissolves; assume that it is a learning algorithm that gets to adult human level by absorbing human culture, and it dissolves, etc. The orthogonality thesis is only correct in the sense that, being entirely ignorant of the specifics of what the 'intelligence' is, you can't attribute any qualities to it, which is trivially correct.

-2jacob_cannell13y

While that specific Goertzel-style argument is not worth bothering with, the more supportable version of that line of argument is: based on the current socio-economic landscape of earth, we can infer something of the probability landscape over near future earth superintelligent agent goal systems, namely that they will be tightly clustered around regions in goal space that are both economically useful and achievable. Two natural attractors in that goal space will be along the lines of profit maximizers or intentionally anthropocentric goal systems. The evidence for this distribution over goal space is already rather abundant if one simply surveys existing systems and research. Market evolutionary forces make profit maximization a central attractor, likewise socio-cultural forces pull us towards anthropocentric goal systems (and of course the two overlap). The brain reverse engineering and neuroscience heavy track in the AGI field in particular should eventually lead to anthropocentric designs, although it's worth mentioning that some AGI researchers (e.g. OpenCog) are aiming for explicit anthropocentric goal systems without brain reverse engineering.

0Paul Crowley13y

Isn't that specific Goertzel-style argument the whole point of the Orthogonality Thesis? Even in its strongest form, the Thesis doesn't do anything to address your second paragraph.

0jacob_cannell13y

I'm not sure. I don't think the specific quote of Goertzel is an accurate summary of his views, and the real key disagreements over safety concern this admittedly nebulous distribution of future AGI designs and goal systems.

[-]jacob_cannell13y00

I don't think section 4.1 defeats your wording of your Convergence Thesis.

Convergence: all human-designed superintelligences would have one of a small set of goals.

The way you have worded this, I read it as trivially true. The set of human designed superintelligences is necessarily a tiny subset of the space of all superintelligences, and thus the set of dependent goals of human-designed superintelligences is a tiny subset of the space of all goals.

Much depends on your usage of 'small'. Small relative to what?

I think you should clarify notions of conver...

[-]Shmi13y00

Who is your target audience? Can you pretend to be the actual person you are trying to convince and do your absolute best to demolish the arguments presented in this paper? (You can find their arguments in their publications and apply them to your paper.) And no counter-objections until you finished writing what essentially is a referee report. If you need some extra motivation, pretend that you are being paid $100 for each argument that convinces the rest of the audience and $1000 for each argument that convinces the paper author. When done, post the referee report here, and people will tell you whether you did a good job.

[-]Stuart_Armstrong13y100

Can you pretend to be the actual person you are trying to convince and do your absolute best to demolish the arguments presented in this paper?

No, I cannot. I've read the various papers, and they all orbit around an implicit and often unstated moral realism. I've also debated philosophers on this, and the same issue rears its head - I can counter their arguments, but their opinions don't shift. There is an implicit moral realism that does not make any sense to me, and the more I analyse it, the less sense it makes, and the less convincing it becomes. Every time a philosopher has encouraged me to read a particular work, it's made me find their moral realism less likely, because the arguments are always weak.

I can't really put myself in their shoes to successfully argue their position (which I could do with theism, incidentally). I've tried and failed.

If someone can help me with this, I'd be most grateful. Why does "for reasons we don't know, any being will come to share and follow specific moral principles (but we don't know what they are)" seem plausible?

4davidpearce13y

Just how diverse is human motivation? Should we discount even sophisticated versions of psychological hedonism? Undoubtedly, the "pleasure principle" is simplistic as it stands. But one good reason not to try heroin, for example, is precisely that the reward architecture of our opioid pathways is so similar. Previously diverse life-projects of first-time heroin users are at risk of converging on a common outcome. So more broadly, let's consider the class of life-supporting Hubble volumes where sentient biological robots acquire the capacity to rewrite their genetic source code and gain mastery of their own reward circuitry. May we predict orthogonality or convergence? Certainly, there are strong arguments why such intelligences won't all become the functional equivalent of heroin addicts or wireheads or Nozick Experience Machine VR-heads (etc). One such argument is the nature of selection pressure. But if some version of the pleasure principle is correct, then isn't some version of the convergence conjecture at least feasible, i.e. they'll recalibrate the set-point of their hedonic treadmill and enjoy gradients of (super)intelligent (super)happiness? One needn't be a meta-ethical value-realist to acknowledge that subjects of experience universally find bliss is empirically more valuable than agony or despair. The present inability of natural science to explain first-person experiences doesn't confer second-rate ontological status. If I may quote physicist Frank Wilczek, "It is reasonable to suppose that the goal of a future-mind will be to optimize a mathematical measure of its well-being or achievement, based on its internal state. (Economists speak of 'maximizing utility', normal people of 'finding happiness'.) The future-mind could discover, by its powerful introspective abilities or through experience, its best possible state - the Magic Moment - or several excellent ones. It could build up a library of favourite states. That would be like a library of favourite

2JonatasMueller13y

David, what are those multiple possible defeaters for convergence? As I see it, the practical defeaters that exist still don't affect the convergence thesis, they just are possible practical impediments, from unintelligent agents, to the realization of the goals of convergence.

2TheOtherDave13y

I usually treat this behavior as something similar to the availability heuristic. That is, there's a theory that one of the ways humans calibrate our estimates of the likelihood of an event X is by trying to imagine an instance of X, and measuring how long that takes, and calculating our estimate of probability inverse-proportionally to the time involved. (This process is typically not explicitly presented to conscious awareness.) If the imagined instance of X is immediately available, we experience high confidence that X is true. That mechanism makes a certain amount of rough-and-ready engineering sense, though of course it has lots of obvious failure modes, especially as you expand the system's imaginative faculties. Many of those failure modes are frequently demonstrated in modern life. The thing is, we use much of the same machinery that we evolved for considering events like "a tiger eats my children" to consider pseudo-events like "a tiger eating my children is a bad thing." So it's easy for us to calibrate our estimates of the likelihood that a tiger eating my children is a bad thing in the same way: if an instance of a tiger eating my children feeling like a bad thing is easy for me to imagine, I experience high confidence that the proposition is true. It just feels obvious. I don't think this is quite the same thing as moral realism, but when that judgment is simply taken as an input without being carefully examined, the result is largely equivalent. Conversely, the more easily I can imagine a tiger eating my children not feeling like a bad thing, the lower that confidence. More generally, the more I actually analyze (rather than simply referencing) my judgments, the less compelling this mechanism becomes. What I expect, given the above, is that if I want to shake someone off that kind of naive moral realist position, it helps to invite them to consider situations in which they arrive at counterintuitive (to them) moral judgments. The more I do this,

1Stuart_Armstrong13y

But philosophers are extremely fond of analysis, and make great use of trolley problems and similar edge cases. I'm really torn - people who seem very smart and skilled in reasoning take positions that seem to make no sense. I keep telling myself that they are probably right and I'm wrong, but the more I read about their justifications, the less convincing they are...

2TheOtherDave13y

Yeah, that's fair. Not all philosophers do this, any more than all computer programmers come up with test cases to ensure their code is doing what it ought, but I agree it's a common practice. Can you summarize one of those positions as charitably as you're able to? It might be that, given that, someone else can offer an insight that extends that structure.

2Stuart_Armstrong13y

"There are sets of objective moral truths such that any rational being that understood them would be compelled to follow them". The arguments seem mainly to be:
1) Playing around with the meaning of rationality until you get something ("any rational being would realise their own pleasure is no more valid than that of others" or "pleasure is the highest principle, and any rational being would agree with this, or else be irrational")
2) Convergence among human values.
3) Moral progress for society: we're better than we used to be, so there needs to be some scale to measure the improvements.
4) Moral progress for individuals: when we think about things a lot, we make better moral decisions than when we were young and naive. Hence we're getting better at moral reasoning, so there is some scale on which to measure this.
5) Playing around with the definition of "truth-apt" (able to have a valid answer) in ways that strike me, uncharitably, as intuition-pumping word games. When confronted with this, I generally end up saying something like "my definitions do not map on exactly to yours, so your logical steps are false dichotomies for me".
6) Realising things like "if you can't be money pumped, you must be an expected utility maximiser", which implies that expected utility maximisation is superior to other reasoning, hence that there are some methods of moral reasoning which are strictly inferior. Hence there must be better ways of moral reasoning and (this is the place where I get off) a single best way (though that argument is generally implicit, never explicit).
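For readers unfamiliar with the term in argument 6, here is a minimal sketch of what being "money pumped" means. It only illustrates the exploit itself (an agent with cyclic preferences pays for every trade and ends up holding what it started with); it does not demonstrate the converse result quoted above. The function name `run_money_pump`, the item labels, the fee and the starting wealth are all hypothetical choices made for this illustration.

```python
def run_money_pump(prefers, start_item, offers, fee=1.0, wealth=10.0):
    """Offer items one at a time to an agent.

    `prefers` is a set of (x, y) pairs meaning "x is strictly preferred to y".
    The agent swaps its current item for the offered one, paying `fee`,
    whenever it strictly prefers the offered item and can afford the fee.
    Returns (final_item, final_wealth, number_of_trades).
    """
    item = start_item
    trades = 0
    for offered in offers:
        if (offered, item) in prefers and wealth >= fee:
            item = offered
            wealth -= fee
            trades += 1
    return item, wealth, trades


# Hypothetical cyclic preferences: A over B, B over C, and C over A.
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}
# Hypothetical transitive preferences: A over B over C.
transitive = {("A", "B"), ("B", "C"), ("A", "C")}

# The same sequence of offers is made to both agents, three times around.
offers = ["C", "B", "A"] * 3

cyclic_result = run_money_pump(cyclic, "A", offers)
transitive_result = run_money_pump(transitive, "A", offers)

print("cyclic agent:    ", cyclic_result)
print("transitive agent:", transitive_result)
```

In this toy setup the cyclic agent accepts every offer and pays each time, while the transitive agent already holds its top-ranked item and declines every trade.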

5TheOtherDave13y

(nods) Nice. OK, so let me start out by saying that my position is similar to yours... that is, I think most of this is nonsense. But having said that, and trying to adopt the contrary position for didactic purposes... hm. So, a corresponding physical-realist assertion might be that there are sets of objective physical structures such that any rational being that perceived the evidence for them would be compelled to infer their existence. (Yes?) Now, why might one believe such a thing? Well, some combination of reasons 2-4 seems to capture it. That is: in practice, there at least seem to be physical structures we all infer from our senses such that we achieve more well-being with less effort when we act as though those structures existed. And there are other physical structures that we infer the existence of via a more tenuous route (e.g., the center of the Earth, or Alpha Centauri, or quarks, or etc.), to which #2 doesn't really apply (most people who believe in quarks have been taught to believe in them by others; they mostly didn't independently converge on that belief), but 3 and 4 do... when we posit the existence of these entities, we achieve worthwhile things that we wouldn't achieve otherwise, though sometimes it's very difficult to express clearly what those things actually are. (Yes?) So... ok. Does that case for physical realism seem compelling to you? If so, and if arguments 2-4 are sufficient to compel a belief in physical realism, why are their analogs insufficient to compel a belief in moral realism?

0Stuart_Armstrong13y

No - to me it just highlights the difference between physical facts and moral facts, making them seem very distinct. But I can see how if we had really strong 2-4, it might make more sense...

1TheOtherDave13y

I'm not quite sure I understood you. Are you saying "no," that case for physical realism doesn't seem compelling to you? Or are you saying "no," the fact that such a case can compellingly be made for physical realism does not justify an analogous case for moral realism?

0Stuart_Armstrong13y

The second one!

4TheOtherDave13y

So, given a moral realist, Sam, who argued as follows: "We agree that humans typically infer physical facts such that we achieve more well-being with less effort when we act as though those facts were actual, and that this constitutes a compelling case for physical realism. It seems to me that humans typically infer moral facts such that we achieve more well-being with less effort when we act as though those facts were actual, and I consider that an equally compelling case for moral realism." ...it seems you ought to have a pretty good sense of why Sam is a moral realist, and what it would take to convince Sam they were mistaken. No?

0Stuart_Armstrong13y

Interesting perspective. Is this an old argument, or a new one? (seems vaguely similar to the Pascalian "act as if you believe, and that will be better for you"). It might be formalisable in terms of bounded agents and stuff. What's interesting is that though it implies moral realism, it doesn't imply the usual consequence of moral realism (that all agents converge on one ethics). I'd say I understood Sam's position, and that he has no grounds to disbelieve orthogonality!

0TheOtherDave13y

I'd be astonished if it were new, but I'm not knowingly quoting anyone. As for orthogonality.. well, hm. Continuing the same approach... suppose Sam says to you: "I believe that any two sufficiently intelligent, sufficiently rational systems will converge on a set of confidence levels in propositions about physical systems, both coarse-grained (e.g., "I'm holding a rock") and fine-grained (e.g. some corresponding statement about quarks or configuration spaces or whatever). I believe that precisely because I'm a de facto physical realist; whatever it is about the universe that constrains our experiences such that we achieve more well-being with less effort when we act as though certain statements about the physical world are true and other statements are not, I believe that's an intersubjective property -- the things that it is best for me to believe about the physical world are also the things that it is best for you to believe about the physical world, because that's just what it means for both of us to be living in the same real physical world. For precisely the same reasons, I believe that any two sufficiently intelligent, sufficiently rational systems will converge on a set of confidence levels in propositions about moral systems." You consider that reasoning ungrounded. Why?

5Stuart_Armstrong13y

1) Evidence. There is a general convergence on physical facts, but nothing like a convergence on moral facts. Also, physical facts, since science, are progressive (we don't say Newton was wrong, we say we have a better theory to which his was an approximation). 2) Evidence. We have established what counts as evidence for a physical theory (and have, to some extent, separated it from simply "everyone believes this"). What then counts as evidence for a moral theory?

7TheOtherDave13y

Awesome! So, reversing this, if you want to understand the position of a moral realist, it sounds like you could consider them in the position of a physical realist before the Enlightenment. There was disagreement then about underlying physical theory, and indeed many physical theories were deeply confused, and the notion of evidence for a physical theory was not well-formalized, but if you asked a hundred people questions like "is this a rock or a glass of milk?" you'd get the same answer from all of them (barring weirdness), and there were many physical realists nevertheless based solely on that, and this is not terribly surprising. Similarly, there is disagreement today about moral theory, and many moral theories are deeply confused, and the notion of evidence for a moral theory is not well-formalized, but if you ask a hundred people questions like "is killing an innocent person right or wrong?" you'll get the same answer from all of them (barring weirdness), so it ought not be surprising that there are many moral realists based on that.

2Desrtopa13y

I think there may be enough "weirdness" in response to moral questions that it would be irresponsible to treat it as dismissible.

0TheOtherDave13y

Yes, there may well be.

1Stuart_Armstrong13y

Interesting. I have no idea if this is actually how moral realists think, but it does give me a handle so that I can imagine myself in that situation...

0TheOtherDave13y

Sure, agreed. I suspect that actual moral realists think in lots of different ways. (Actual physical realists do, too.) But I find that starting with an existence-proof of "how might I believe something like this?" makes subsequent discussions easier.

1Peterdjones12y

I could add: Objective punishments and rewards need objective justification.

-2Peterdjones12y

From my perspective, treating rationality as always instrumental, and never a terminal value, is playing around with its traditional meaning. (And indiscriminately teaching instrumental rationality is like indiscriminately handing out weapons. The traditional idea, going back to at least Plato, is that teaching someone to be rational improves them...changes their values.)

0JonatasMueller13y

Stuart, here is a defense of moral realism: http://lesswrong.com/lw/gnb/questions_for_moral_realists/8g8l My paper which you cited needs a bit of updating. Indeed some cases might lead a superintelligence to collaborate with agents without the right ethical mindset (unethical), which constitutes an important existential risk (a reason why I was a bit reluctant to publish much about it). However, isn't the orthogonality thesis basically about the orthogonality between ethics and intelligence? In that case, the convergence thesis would not be flawed if some unintelligent agents kidnap and force an intelligent agent to act unethically.

-1JonatasMueller13y

Another argument for moral realism: 1. Let's imagine starting with a blank slate, the physical universe, and building ethical value in it. Hypothetically in a meta-ethical scenario of error theory (which I assume is where you're coming from), or possible variability of values, this kind of "bottom-up" reasoning would make sense for more intelligent agents that could alter their own values, so that they could find, from "bottom-up", values that could be more optimally produced, and also this kind of reasoning would make sense for them in order to fundamentally understand meta-ethics and the nature of value. 2. In order to connect to the production of some genuine ethical value in this universe, arguably some things would have to be built the same way, with certain conditions, while hypothetically other things could vary, in the value production chain. This is because ethical value could not be absolutely anything, otherwise those things could not be genuinely valuable. If all could be fundamentally valuable, then nothing would really be, because value requires a discrimination in terms of better and worse. Somewhere in the value production chain, some things would have to be constant in order for there to be genuine value. Do you agree so far? 3. If some things have to be constant in the value production chain, and some things could hypothetically vary, then the constant things would be the really important ones in creating value, and the variable things would be accessory, and could be randomly specified with some degree of freedom, by those analyzing value production from a "bottom-up" perspective in a physical universe. It would seem therefore that the constant things could likely be what is truly valuable, while the variable and accessory things could be mere triggers or engines in the value production chain. 4. I argue that, in the case of humans and of this universe, the constant things are what really constitute value. There is some constant a

0timtyler13y

How about morality as an attractor - which nature approaches. Some goals are better than others - evolution finds the best ones.

3Stuart_Armstrong13y

Why do we have any reason to think this is the case?

0timtyler13y

So: game theory: reciprocity, kin selection/tag-based cooperation and virtue signalling. As J. Storrs-Hall puts it in "Intelligence Is Good": Defecting typically ostracises you - and doesn't make much sense in a smart society which can track reputations. We already know about universal instrumental values. They illustrate what moral attractors look like. I discussed this issue some more in Handicapped Superintelligence.
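The reciprocity point can be illustrated with a toy iterated prisoner's dilemma. This is only a sketch under assumed parameters, not anything taken from the sources cited in the comment: the payoff values (5, 3, 1, 0), the ten-round horizon and the names `play`, `tit_for_tat` and `always_defect` are conventional or hypothetical choices made for this example.

```python
# Assumed payoff table: (my move, their move) -> (my payoff, their payoff).
# "C" is cooperate, "D" is defect.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    """Defect regardless of what the opponent has done."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play a repeated game and return the total payoffs (a, b)."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each strategy sees only the other player's past moves.
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        score_a += pay_a
        score_b += pay_b
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b


reciprocators = play(tit_for_tat, tit_for_tat)
defector_vs_reciprocator = play(always_defect, tit_for_tat)

print("two reciprocators:       ", reciprocators)
print("defector vs reciprocator:", defector_vs_reciprocator)
```

Under these assumed payoffs, a defector paired with a reciprocator gains once and is then met with defection, so it ends with a lower total than two reciprocators earn from each other. Whether real agents face payoffs and memory like this is exactly what the replies below dispute.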

1JoshuaZ13y

Doesn't most of this amount to morality as an attractor for evolved social species?

0timtyler13y

Evolution creates social species, though. Machines will be social too - their memetic relatedness might well be very high - an enormous win for kin selection-based theories based on shared memes. Of course they are evolving, and will evolve too - cultural evolution is still evolution.

1JoshuaZ13y

So this presumes that the machines in question will evolve in social settings? That's a pretty big assumption. Moreover, empirically speaking having in-group loyalty of that sort isn't nearly enough to ensure that you are friendly with nearby entities- look at how many hunter-gatherer groups are in a state of almost constant war with their neighbors. The attitude towards other sentients (such as humans) isn't going to be great even if there is some approximate moral attractor of that sort.

0timtyler13y

I'm not sure what you mean. It presumes that there will be more than one machine. The 'lumpiness' of the universe is likely to produce natural boundaries. It seems to be a small assumption. Sure, but cultural evolution produces cooperation on a massive scale. Right - so: high morality seems to be reasonably compatible with some ant-squishing. The point here is about moral attractors - not the fate of humans.

6JoshuaZ13y

It is a major assumption. To take the most obvious issue: if someone is starting up an attempted AGI on a single computer (say it is the only machine that has enough power) then this won't happen. It also won't happen if one isn't having a large variety of machines which are actually engaging in generational copying. That means that, say, if one starts with ten slightly different machines, if the population doesn't grow in distinct entities this isn't going to do what you want. And if the entities lack a distinction between genotype and phenotype (as computer programs, unlike biological entities, actually do) then this is also off because one will not be subject to a Darwinian system but rather a pseudo-Lamarckian one which doesn't act the same way. So your point seems to come down purely to the fact that evolved entities will do this, and a vague hope that people will deliberately put entities into this situation. This is both not helpful for the fundamental philosophical claim (which doesn't care about what empirically is likely to happen) and is not practically helpful since there's no good reason to think that any machine entities will actually be put into such a situation.

0timtyler13y

A multi-planetary living system is best described as being multiple agents, IMHO. The unity you suggest would represent relatedness approaching 1 - the ultimate win in terms of altruism and cooperation. Without copying there's no life. Copying is unavoidable. Variation is practically inevitable too - for instance, local adaptation. Computer programs do have the split between heritable and non-heritable elements - which is the basic idea here, or it should be. Darwin believed in cultural evolution: "The survival or preservation of certain favoured words in the struggle for existence is natural selection" - so surely cultural evolution is Darwinian. Most of the game theory that underlies cooperation applies to both cultural and organic evolution. In particular, reciprocity, kin selection, and reputations apply in both domains. I didn't follow that bit - though I can see that it sounds a bit negative. Evolution has led to social, technological, intellectual and moral progress. It's conservative to expect these trends to continue.

0jacob_cannell13y

Attractors are features of evolutionary systems; it'd be weird if there weren't attractors in goal space. Here's a paper which touches on that (I don't necessarily buy all of it, but the part about morality as an attractor in goal systems of evolving, cooperating game-theoretic agents is interesting).

0timtyler13y

Sure. Think about the optimal creature - for instance - and don't anybody tell me that fitness is relative to the environment - we can see the environment. Another point is that, even if there's no competition (and natural selection) involving alien races, the fear of such competition is likely to produce a similar adaptive effect - moving effective values towards universal instrumental values.

-2Shmi13y

You have made a number of posts on paraconsistent logic. Now it's time to walk the walk. For the purpose of this referee report, accept moral realism and use it explicitly to argue with your paper.

9Stuart_Armstrong13y

It's not that simple. I can't figure out what the proposition being defended is exactly. It shifts in ways I can't predict in the course of arguments and discussions. If I tried to defend it, my defence would end up being too caricatural or too weak.

0Shmi13y

Is your goal to affect their point of view? Or is it something else? For example, maybe your true target audience is those who donate to your organization and you just want to have a paper published to show them that they are not wasting their money. In any case, the paper should target your real audience, whatever it may be.

5Stuart_Armstrong13y

I want a paper to point those who make the thoughtless "the AI will be smart, so it'll be nice" argument to. I want a paper that forces the moral realists (using the term very broadly) to make specific counter arguments. I want to convince some of these people that AI is a risk, even if it's not conscious or rational according to their definitions. I want something to build on to move towards convincing the AGI researchers. And I want a publication.

[-]Cyan13y00

All of these seem extraordinarily strong claims to make!

A critic might respond: they are strong claims to make about an arbitrarily chosen individual goal system, but asserting that there exists some goal system fulfilling the conditions is a massive disjunction, and so is weaker than it appears from the list of conditions.

[-]private_messaging13y00

How about this: general-purpose problem solving is an altogether different problem from implementing any form of real-world motivation, and is likely to come separately from it (case in point: try to make AIXI maximize paperclips without it also searching for a way to show itself paperclip porn; the problem appears entirely unsolvable).

It seems that for the AI to be dangerous you need some peculiar window into which the orthogonality must fly: too much orthogonality, no risk; too little, no FAI/UFAI distinction.

5Normal_Anomaly13y

You think it is in principle impossible to make (an implementation of) AIXI that understands the map/territory distinction, and values paperclips in the territory more than paperclips in the map? I may be misunderstanding the nature of AIXI, but as far as I know it's trying to maximize some "reward" number. If you program it so that the reward number is equal to "the number of paperclips in the territory as far as you know", it wouldn't choose to believe there were a lot of paperclips, because that wouldn't increase its estimate (by its current belief-generating function) of the number of extant paperclips. Will someone who's read more on AIXI please tell me if I have it all backward? Thanks.

[-]Wei Dai13y130

AIXI's "reward number" is given directly to it via an input channel, and it's non-trivial to change it so that it's equal to "the number of paperclips in the territory as far as you know". UDT can be seen as a step in this direction.
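To make the distinction concrete, here is a toy sketch. It is not AIXI: the `World` type, the two actions, and both scoring functions are hypothetical, and the agent is simply handed the true world model, which is exactly what AIXI does not have. It only illustrates why "maximize the number arriving on the input channel" and "maximize paperclips in the environment" can recommend different actions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class World:
    paperclips: int      # what is actually in the environment
    sensor_hacked: bool  # whether the reward input has been tampered with


def step(world: World, action: str) -> World:
    """Hypothetical two-action environment (an assumption for illustration)."""
    if action == "build":
        return World(world.paperclips + 1, world.sensor_hacked)
    if action == "hack_sensor":
        return World(world.paperclips, True)
    raise ValueError(f"unknown action: {action}")


def reward_channel(world: World) -> int:
    """The number delivered on the agent's input channel."""
    return 10**6 if world.sensor_hacked else world.paperclips


def utility_over_world(world: World) -> int:
    """A utility defined on the modelled environment state itself."""
    return world.paperclips


ACTIONS = ("build", "hack_sensor")


def best_action(world: World, score) -> str:
    """One-step lookahead: pick the action whose successor scores highest."""
    return max(ACTIONS, key=lambda action: score(step(world, action)))


start = World(paperclips=0, sensor_hacked=False)
reward_choice = best_action(start, reward_channel)
utility_choice = best_action(start, utility_over_world)
print(reward_choice, utility_choice)
```

The hard part discussed in this thread is everything the sketch assumes away: writing `utility_over_world` when the agent's environment model has no fixed ontology with a `paperclips` field in it.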

0amcknight13y

I don't see how UDT is a step in this direction. Can you explain?

3Wei Dai13y

UDT shows how an agent might be able to care about something other than an externally provided reward, namely how a computation, or a set of computations, turns out. It's conjectured that arbitrary goals, such as "maximize the number of paperclips across this distribution of possible worlds" (and our actual goals, whatever they may turn out to be), can be translated into such preferences over computations and then programmed into an AI, which will then take actions that we'd consider reasonable in pursuit of such goals. (Note this is a simplification that ignores issues like preferences over uncomputable worlds, but hopefully gives you an idea of what the "step" consists of.)

-10private_messaging13y

0jacob_cannell13y

Any intelligent agent functioning in the real world is only ever limited to working with maps: internal information constructs which aim to represent/simulate the unknown external world. AIXI's definition (like any good formal mathematical agent definition) formalizes this distinction. AIXI assumes the universe is governed by some computable program, but it does not have direct access to that program, so instead it must create an internal simulation based on its observation history. AIXI could potentially understand the "map/territory distinction", but it could no more directly value or access objects in the territory than you or I. Just like us, and any other real-world agents, AIXI can only work with its map. All that being said, humans can build maps which at least attempt to distinguish between objects in the world, simulations of objects in simulated worlds, simulations of worlds in simulated worlds, and so on, and AIXI potentially could build such maps as well.

-3private_messaging13y

You need to somehow specify a conversion from the real world state (quarks, leptons, etc.) to a number of paperclips, so that the paperclips can be ordered differently, or have slightly different compositions. That conversion is essentially a map. You do not want the goal to distinguish between '1000 paperclips that are lying in a box in this specific configuration' and '1000 paperclips that are lying in a box in that specific configuration'. There is no such discriminator in the territory. There is one only in your mapping process. I feel that much of the reasoning here is driven by verbal confusion. To understand the map-territory issue is to understand the above. But 'to understand' also has the meaning it has in 'understand how to drive a car', with the implied sense that understanding the map-territory distinction would somehow free you from the associated problems.
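A small sketch of that point, under made-up assumptions: the list-of-tuples state representation and the `count_paperclips` function are hypothetical stand-ins for "the real world state" and "the conversion". Two different configurations collapse to the same number, and the collapsing is done entirely by the function, not by anything in the states themselves.

```python
from collections import Counter


def count_paperclips(state):
    """Hypothetical 'map': collapse a detailed state into a single number.

    `state` is a list of (kind, position) pairs; positions and ordering
    are deliberately ignored by the count.
    """
    return Counter(kind for kind, _position in state)["paperclip"]


# Two distinct configurations of the same box (illustrative data only).
box_a = [("paperclip", (0, 0)), ("paperclip", (0, 1)), ("stapler", (5, 5))]
box_b = [("paperclip", (3, 2)), ("stapler", (1, 1)), ("paperclip", (9, 9))]

print(box_a != box_b)                                      # different states
print(count_paperclips(box_a) == count_paperclips(box_b))  # same count
```

Everything contentious is hidden in deciding what gets labelled `"paperclip"` in the first place; the sketch takes that labelling as given.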

0TheAncientGeek12y

Indeed. The problem of making sure that you are maximizing the real entity you want to maximize, and not a proxy, is roughly equivalent to disproving solipsism, which is itself widely regarded by philosophers as almost impossible. Realists tend to assume their way out of the quandary... but assumption isn't proof. In other words, there is no proof that humans are maximizing (good stuff), and not just (good stuff porn).

[-]Paul Crowley13y00

Chess computer remark: am happy to be credited as "Paul Crowley ". Thanks!

0Stuart_Armstrong13y

Or did you want to be acknowledged just next to the quote, as well?

-1Stuart_Armstrong13y

You already are (see acknowledgements) :-)

0Paul Crowley13y

Ah didn't see that - was posting from phone! Because it credited "an online commentator" I thought maybe the attribution had been lost, or you didn't have my real name and couldn't credit "ciphergoth" in a natural way. Do whatever results in the best paper :) thanks!

[-]taw13y-10

Strong orthogonality hypothesis is definitely wrong - not being openly hostile to most other agents has enormous instrumental advantage. That's what's holding modern human societies together - agents like humans, corporations, states etc. - have mostly managed to keep their hostility low. Those that are particularly belligerent (and historical median has been far more belligerent towards strangers than all but the most extreme cases today) don't do well by instrumental standards at all.

Of course you can make a complicated argument why it doesn't matter (so... (read more)

1Kindly13y

I actually think this "complicated argument", either made or refuted, is the core of this orthogonality business. If you ask the question "Okay, now that we've made a really powerful AI somehow, should we check if it's Friendly before giving it control over the world?" then you can't answer it just based on what you think the AI would do in a position roughly equal to humans. Of course, you can just argue that this doesn't matter because we're unlikely to face really powerful AIs at all. But that's also complicated. If the orthogonality thesis is truly wrong, on the other hand, then the answer to the question above is "Of course, let's give the AI control over the world, it's not going to hurt humans and in the best case it might help us."

[-][anonymous]11y-20

It's so much easier to just change your moral reasoning than to re-engineer the entirety of human intelligence. How can artificial intelligence experts be so daft?

[-]FinalState13y-20

This one is actually true.

[+]private_messaging13y-60

