Apteris — LessWrong

Let me clarify why I asked. I think the "multiple layers of abstraction" idea is essentially "build in a lot of 'manual' checks that the AI isn't misbehaving", and I don't think that is a desirable or even possible solution. You can write n layers of checks, but how do you know that you don't need n+1?

The idea, as has been pointed out here on LW, is that what you really want and need is a mathematical model of morality, which the AI will implement and from which moral behaviour will fall out without you having to specify it explicitly. This is what MIRI are working on with CEV & co.

Whether or not CEV, or whatever emerges as the best model to use, is gameable is itself a mathematical question,[1] central to the FAI problem.

[1] There are also implementation details to consider, e.g. "can I mess with the substrate" or "can I trust my substrate".

Examples of AI's behaving badly

Apteris10y20

What happens if an AI manages to game the system despite the n layers of abstraction?

Superintelligence 9: The orthogonality of intelligence and goals

Apteris11y00

Your argument would be stronger if you provided a citation. I've only skimmed CEV, for instance, so I'm not fully familiar with Eliezer's strongest arguments in favour of goal structure tending to be preserved (though I know he did argue for that) in the course of intelligence growth. For that matter, I'm not sure what your arguments for goal stability under intelligence improvement are. Nevertheless, consider the following:

In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.

Yudkowsky, E. (2004). Coherent Extrapolated Volition. Singularity Institute for Artificial Intelligence

(Bold mine.) See that bolded part above? Those are TODOs. They would be good to have, but they're not guaranteed. The goals of a more intelligent AI might diverge from those of its previous self; it may extrapolate differently; it may interpret differently; its desires may, at higher levels of intelligence, interfere with ours rather than cohere.

If I want X, and I'm considering an improvement to my systems that would make me not want X, then I'm not going to get X if I take that improvement, so I'm going to look for some other improvement to my systems to try instead.

A more intelligent AI might:

find a new way to fulfill its goals, e.g. Eliezer's example of distancing your grandmother from the fire by detonating a nuke under her;
discover a new thing it could do, compatible with its goal structure, that it did not see before, and that, if you're unlucky, takes priority over the other things it could be doing, e.g. you tell it "save the seals" and it starts exterminating orcas; see also Lumifer's post.
just decide to do things on its own. This is merely a suspicion I have, call it a mind projection, but: I think it will be challenging to design an intelligent agent with no "mind of its own", metaphorically speaking. We might succeed in that, we might not.

Superintelligence 8: Cognitive superpowers

Apteris11y00

We might be approaching a point of diminishing returns as far as improving cultural transmission is concerned. Sure, it would be useful to adopt a better language, e.g. one less ambiguous, less subject to misinterpretation, more revealing of hidden premises and assumptions. More bandwidth and better information retrieval would also help. But I don't think these constraints are what's holding AI back.

Bandwidth, storage, and retrieval can be looked at as hardware issues, and performance in these areas improves both with time and with adding more hardware. What AI requires are improvements in algorithms and in theoretical frameworks such as decision theory, morality, and systems design.

Superintelligence 8: Cognitive superpowers

Apteris11y30

I think it will prove computationally very expensive, both to solve protein folding and to subsequently design a bootstrapping automaton. It might be difficult enough for another method of assembly to come out ahead cost-wise.

Superintelligence 7: Decisive strategic advantage

Apteris11y10

You're right, that is more realistic. Even so, I get the feeling that the human would have less and less to do as time goes on. I quote:

“He just loaded up on value stocks,” says Mr. Fleiss, referring to the AI program. The fund gained 41% in 2009, more than doubling the Dow’s 19% gain.

As another data point, a recent chess contest between a chess grandmaster (Daniel Naroditsky) working together with an older AI (Rybka, rated ~3050) and the current best chess AI (Stockfish 5, rated 3290) ended with a 3.5 - 0.5 win for Stockfish.
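As a rough sanity check on those ratings, here is a minimal sketch using the standard Elo expected-score formula. It assumes the two quoted ratings are on the same scale and treats Rybka alone as the baseline, ignoring whatever the grandmaster contributed; the function name is my own, not taken from any chess library.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected per-game score (win = 1, draw = 0.5) for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


STOCKFISH = 3290  # rating quoted above
RYBKA = 3050      # approximate rating quoted above

# Score the rating gap alone would predict for Stockfish, per game.
expected = elo_expected_score(STOCKFISH, RYBKA)

# 3.5 points out of the 4 games implied by the 3.5 - 0.5 result.
observed = 3.5 / 4

print(f"expected score per game: {expected:.3f}")
print(f"observed score per game: {observed:.3f}")
```

On these assumptions the rating gap by itself predicts roughly 0.80 points per game for Stockfish, against an observed 0.875 over four games. Four games is far too small a sample to read much into the difference, but the result is at least consistent with the human adding little on top of the older engine.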

Superintelligence 7: Decisive strategic advantage

Apteris11y10

While not exactly investment, consider the case of an AI competing with a human to devise a progressively better high-frequency trading strategy. An AI would probably:

be able to bear more things in mind at one time than the human
evaluate outcomes faster than the human
be able to iterate on its strategies faster than the human

I expect the AI's superior capacity to "drink from the fire hose" together with its faster response time to yield a higher exponent for the growth function than that resulting from the human's iterative improvement.

Superintelligence 6: Intelligence explosion kinetics

Apteris11y00

The effectiveness of learning hyper-heuristics for other problems, i.e. how much better algorithmically-produced algorithms perform than human-produced algorithms, and more pertinently, where the performance differential (if any) is heading.

As an example, Effective learning hyper-heuristics for the course timetabling problem says: "The dynamic scheme statistically outperforms the static counterpart, and produces competitive results when compared to the state-of-the-art, even producing a new best-known solution. Importantly, our study illustrates that algorithms with increased autonomy and generality can outperform human designed problem-specific algorithms."

Similar results can be found for other problems, bin packing, traveling salesman, and vehicle routing being just some off-the-top-of-my-head examples.

Help Fund Lukeprog at SIAI

Apteris13y20

Only problem is cooking. Eats up like 4 hours a week.

This article by Roger Ebert on cooking is, I suspect, highly relevant to your interests. Mine too, as a matter of fact.

The Evil AI Overlord List

Apteris13y30

For example, consider a system that takes seriously the idea of souls. One might very well decide that all that matters is whether an entity has a soul, completely separate from its apparent intelligence level. Similarly, a sufficiently racist individual might assign no moral weight to people of some specific racial group, regardless of their intelligence.

Right you are. I did not express myself well above. Let me try and restate, just for the record.

Assuming one does not assign equal rights to all autonomous agents (for instance, if we take the position that a human has more rights than a bacterium), then discriminating based on the cognitive capacity of the species (not the individual), as one of many possible criteria, is not ipso facto wrong. It may be wrong some of the time, and it may be an approach employed by bigots, but it is not always wrong. This is my present opinion, you understand, not established fact.

there's the additional problem that I pointed out that it wouldn't even necessarily be in humanity's best interest for the entity to have such an ethical system.

Agreed. But this whole business of "we don't want the superintelligence to burn us with its magnifying glass, so we in turn won't burn ants with our magnifying glass" strikes me as rather intractable. Even though, of course, it's essential work.

I would say a few more words, but I think it's best to stop here. This subthread has cost me 66% of my Karma. :)
