Thanks for the reply, Robb. I've read your post and a good deal of the discussion surrounding it.

I think I understand the general concern, that an AI that either doesn't understand or doesn't care about our values could pose a grave threat to humanity. This is true on its face, in the broad sense that any significant technological advance carries with it unforeseen (and therefore potentially negative) consequences. If, however, the intelligence explosion thesis is correct, then we may be too late anyway. I'll elaborate on that in a moment.

First, though, I'm not sure I see how an AI "too dumb to understand human values" could program a superior general intelligence (i.e. an AI that is smart enough to understand human values). Even so, assuming it is possible, and assuming it could happen on a timescale and in such a way as to preclude or make irrelevant any human intervention, why would that change the nature of the superior intelligence from being, say, friendly to human interests, to being hostile to them? Why, for that matter, would any superintelligence (that understands human values, and that is "able to form deep and reliable abstractions about the world") be predisposed to any particular position vis-a-vis humans? And even if it were predisposed toward friendliness, how could we possibly guarantee it would always remain so? How, that is, having once made a friend, can we foolproof ourselves against betrayal? My intuition is that we can’t. No step can be taken without some measure of risk, however small, and if the step has potentially infinitely negative consequences, then even the very slightest of risks begins to look like a bad bet. I don’t know a way around that math.

The genie, as you say, doesn't care. But also, often enough, the human doesn't care. He is constrained, of course, by his fellow humans, and by his environment, but he sometimes still manages (sometimes alone, sometimes in groups) to sow massive horror among his fellows, sometimes even in the name of human values. Insanity, for instance, in humans, is always possible, and one definition of insanity might even be: behavior that contradicts, ignores or otherwise violates the values of normal human society. “Normal” here is variable, of course, for the simple reason that “human society” is also variable. That doesn’t stop us, however, from distinguishing, as we generally do, between the insane and the merely stupid, even if upon close inspection the lines begin to blur. Likewise, we occasionally witness - and very frequently we imagine (comic books!) - cases where a human is both super-intelligent and super-insane. The fear many people have with regard to strong AI (and it is perhaps well-grounded, or well-enough), is that it might be both super-intelligent and, at least as far as human values are concerned, super-insane. As an added bonus, and certainly if the intelligence explosion thesis is correct, it might also be unconstrained or, ultimately, unconstrainable. On this much I think we agree, and I assume the goal of FAI is precisely to find the appropriate constraints.

Back now, though, to the question of “too late.” The family of problems you propose to solve before the first so-called seed AIs are built include, if I understand you correctly, a formal definition of human values. I doubt very much that such a solution is possible - and “never” surely won’t help us any more than “too late” - but what would the discovery of (or failure to discover) such a solution have to do with a mistake such as tiling the universe with smiley-faces (which seems to me much more a semantic error than an error in value judgment)? If we define our terms - and I don’t know any definition of intelligence that would allow the universe-tiling behavior to be called intelligent - then smiley faces may still be a risk, but they are not a risk of intelligent behavior. They are one way the project could conceivably fail, but they are not an intelligent failure.

On the other hand, the formal-definition-of-human-values problem is related to the smiley faces problem in another way: any hard-coded solution could lead to a universe of bad definitions and false equivalencies (smiles taken for happiness). Not because the AI would make a mistake, but because human values are neither fixed nor general nor permanent: to fix them (in code), and then propagate them on the enormous scale the intelligence explosion thesis suggests, might well lead to some kind of funneling effect, perhaps very quickly, perhaps over a long period of time, that produces, effectively, an infinity of smiley faces. In other words, to reduce an irreducible problem doesn’t actually solve it. For example, I value certain forms of individuality and certain forms of conformity, and at different times in my life I have valued other and even contradictory forms of individuality and other and even contradictory forms of conformity. I might even, today, call certain of my old individualistic values conformist values, and vice-versa, and not strictly because I know more today than I knew then. I am, today, quite differently situated in the world than I was, say, twenty years ago; I may even be said to be somewhat of a different person (and yet still the same); and around me the world itself has also changed. Now, these changes, these changing and contradictory values may or may not be the most important ones, but how could they be formalized, even conceptually? There is nothing necessary about them. They might have gone the other way around. They might not have changed at all. A person can value change and stability at the same time, and not only because he has a fuzzy sense of what those concepts mean. A person can also have a very clear idea of what certain concepts mean, and those concepts may still fail to describe reality. 
They do fail, actually, necessarily, which doesn’t make them useless - not at all - but knowledge of this failure should at least make us wary of the claims we produce on their behalf.

What am I saying? Basically, that the pre-seed hard-coding path to FAI looks pretty hopeless. If strong AI is inevitable, then yes, we must do everything in our power to make it friendly; but what exactly is in our power, if strong AI (which by definition means super-strong, and super-super-strong, etc.) is inevitable? If the risks associated with strong AI are as grave as you take them to be, does it really seem better to you (in terms of existential risk to the human race) for us to solve FAI - which is to say, to think we’ve solved it, since there would be no way of testing our solution “inside the box” - than to not solve strong AI at all? And if you believe that there is just no way to halt the progress toward strong AI (and super, and super-super), is that compatible with a belief that “this kind of progress” can be corralled into the relatively vague concept of “friendliness toward humans”?

Better stop there for the moment. I realize I’ve gone well outside the scope of your comment, but looking back through some of the discussion raised by your original post, I found I had more to say/think about than I expected. None of the questions here are meant to be strictly rhetorical; a lot of this is just musing, so please respond (or not) to whatever interests you.

Why do I reject "intentionality if and only if subjective experience"? For one thing, there are simple states of consciousness - moods, for example - that have no intentionality, so subjectivity fails to imply intentionality. Nor can I see any reason that the implication holds in the direction from intentionality to subjectivity.

I think this is a bit confused. It isn't that simple states of consciousness, qualia, etc. imply intentionality, rather that they are prerequisites for intentionality. X if and only if Y just means there can be no X without Y. I'm not familiar enough with Searle to comment on his endorsement of the idea, but it makes sense to me at least that in order to have intention (in the sense of will) an agent would have first to be able to perceive (subjectively, of course) the surroundings/other agents on which it intends to act. You say intentionality is "easy". Okay. But what does it mean to talk of intentionality, without a subject to have the intention?

The Lecter/AI analogy occurred to me as well. The problem, in strategic (and perhaps also existential) terms, is that Starling/Gatekeeper is convinced that Lecter/AI is the only one holding the answer to some problem that Starling/Gatekeeper is equally convinced must be solved. Lecter/AI, that is, has managed to make himself (or already is) indispensable to Starling/Gatekeeper.

On a side note, these experiments also reminded me of the short-lived game show The Moment of Truth. I watched a few episodes back when it first aired and was mildly horrified. Contestants were frequently willing to accept relatively paltry rewards in exchange for the ruination of what appeared at least to be close personal relationships. The structure here is that the host asks the contestants increasingly difficult (i.e. embarrassing, emotionally damaging) questions before an audience of their friends and family members. Truthful answers move the player up the prize-money/humiliation/relationship-destruction pyramid, while a false answer (as determined by a lie-detector test) ends the game and forfeits all winnings. Trying to imagine some potentially effective arguments for the AI in the box experiment, the sort of thing going on here came instantly to mind, namely, that oldest and arguably most powerful blackmail tool of them all: SHAME. As I understand it, Dark Arts are purposely considered in-bounds for these experiments. Going up against a Gatekeeper, then, I'd want some useful dirt in reserve. Likewise, going up against an AI, I'd have to expect threats (and consequences) of this nature, and prepare accordingly.

Your response avoids the basic logic here. A human emulation would count as an AI, therefore human behavior is one possible AI behavior. There is nothing controversial in the statement; the conclusion is drawn from the premise. If you don't think a human emulation would count as AI, or isn't possible, or something else, fine, but... why wouldn't a human emulation count as an AI? How, for example, can we even think about advanced intelligence, much less attempt to model it, without considering human intelligence?

...humans respond to orders and requests in the way that we do because we tend to care what the person giving the request actually wants.

I don't think this is generally an accurate (or complex) description of human behavior, but it does sound to me like an "informal system" - i.e. we tend to care. My reading of (at least this part of) PhilGoetz's position is that it makes more sense to imagine something we would call an advanced or super AI responding to requests and commands with a certain nuance of understanding (as humans do) than with the inflexible ("brittle") formality of, say, your average BASIC program.

I think the point is that if you accept this definition of intelligence, i.e. that it requires the ability to form deep and reliable abstractions about the world, then it doesn't make sense to talk about any intelligence (let alone a super one) being unable to differentiate between smiley-faces and happy people. It isn't a matter, at least in this instance, of whether it cares to make that differentiation or not. If it is intelligent, it will make the distinction. It may have values that would be unrecognizable or abhorrent to humans, and I suppose that (as Shane_Legg noted) it can't be ruled out that such values might lead it to tile the universe with smiley-faces, but such an outcome would have to be the result of something other than a mistake. In other words, if it really is "that stupid," it fails in a number of other ways long before it has a chance to make this particular error.

Occam's razor is, of course, not an arbitrary rule nor one justified by its practical success. It simply says that unnecessary elements in a symbolism mean nothing.

Signs which serve one purpose are logically equivalent, signs which serve no purpose are logically meaningless.

-- Ludwig Wittgenstein, Tractatus Logico-Philosophicus 5.47321

The conscientious. - It is more comfortable to follow one's conscience than one's reason: for it offers an excuse and alleviation if what we undertake miscarries--which is why there are always so many conscientious people and so few reasonable ones.

-- Nietzsche

Thanks for clarifying. The wording seems odd to me, but I get it now.


How is this so? Surely, as a general proposition, ignorance and intention are much more loosely correlated than the quote suggests. What if the statement were altered slightly: "If (after great effort and/or reflection and/or prayer) you (still) don't know..." Does it still make sense to speak of intention? Or if the point is that the failure to solve a simple problem indicates a will to fail, well then the author has more faith in human will than I do--and IMO greatly underestimates the possible ways of not-knowing.

Geulincx, from his own annotations to his Ethics (1665):

...our actions are as it were a mirror of Reason and God's law. If they reflect Reason, and contain in themselves what Reason dictates, then they are virtuous and praiseworthy; but if they distort Reason's reflection in themselves, then they are vicious and blameworthy. This has no effect on Reason, or God's law, which are no more beautiful or more ugly for it. Likewise, a thing represented in a mirror remains the same whether the mirror is true and faithfully represents it, or whether it is false and twists and distorts the likeness of the thing. The mirror does not distort the likeness of the thing reflected in the thing itself, but in itself - that is, in the mirror itself. Hence, corruption and ugliness belong with the mirror itself, not with the thing reflected. Similarly, we are also said to break God's law, to trample on it, to pervert it, and so on, but this takes place in ourselves, not in the law itself, so that the whole of the ugliness remains in ourselves, and nothing of it belongs with the law itself.
