"You didn't know, but the predictor knew what you'll do, and if you one-box, that is your property that predictor knew, and you'll have your reward as a result."

No. That makes sense only if you believe that causality can work backwards. It can't.

"If predictor can verify that you'll one-box (after you understand the rules of the game, yadda yadda), your property of one-boxing is communicated, and it's all it takes."

Your property of one-boxing can't be communicated backwards in time.

We could get bogged down in discussions of free will; I am assuming free will exists, since arguing about the choice to make doesn't make sense unless free will exists. Maybe the Predictor is always right. Maybe, in this imaginary universe, rationalists are screwed. I don't care; I don't claim that rationality is always the best policy in alternate universes where causality doesn't hold and 2+2=5.

What if I've decided I'm going to choose based on a coin flip? Is the Predictor still going to be right? (If you say "yes", then I'm not going to argue with you anymore on this topic, because that would be arguing about how to apply rules that work in this universe in a different universe.)
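The coin-flip point can be checked with a minimal simulation, under the assumption that the predictor must commit to its prediction before the flip (i.e., causality runs forward only). The function name and setup here are illustrative, not part of the original discussion; any prediction strategy that cannot see the flip does no better than the fixed one shown.

```python
import random

def predictor_accuracy(trials=100_000, seed=0):
    """Estimate how often a predictor matches a fair coin flip.

    The predictor commits first (here: always "one-box", but no
    strategy blind to the flip does better); the chooser then flips
    a fair coin. With forward-only causality, accuracy hovers at 1/2.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        prediction = "one-box"  # fixed before the flip
        choice = rng.choice(["one-box", "two-box"])
        correct += (prediction == choice)
    return correct / trials

print(predictor_accuracy())
```

A predictor that is nonetheless "always right" against a fair coin would contradict this bound, which is the sense in which the scenario leaves our universe's rules behind.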

Vladimir, I understand the PD and similar cases. I'm just saying that the Newcomb paradox is not actually a member of that class. Any agent faced with either version - being told ahead of time that they will face the Predictor, or being told only once the boxes are on the ground - has a simple choice to make; there's no paradox and no PD-like situation. It's a puzzle only if you believe that there really is backwards causality.

"You speculate about why Eurisko slowed to a halt and then complain that Lenat has wasted his life with CYC, but you ignore that Lenat has his own theory which he gives as the reason he's been pursuing CYC. You should at least explain why you think his theory wrong; I find his theory quite plausible."

  • Around 1990, Lenat predicted that Cyc would go FOOM by 2000. In 1999, he told me he expected it to go FOOM within a couple of years. Where's the FOOM?
  • Cyc has no cognitive architecture. It's a database. You can ask it questions. It has templates for answering specific types of questions. It has (last I checked, about 10 years ago) no notion of goals, actions, plans, learning, or its own agenthood.

If I want to predict that the next growth curve will be an exponential and put bounds around its doubling time, I need a much finer fit to the data than if I only want to ask obvious questions like..."Do the optimization curves fall into the narrow range that would permit a smooth soft takeoff?"

This implies that you have done some quantitative analysis giving a probability distribution of possible optimization curves, and finding that only a low-probability subset of that distribution allows for soft takeoff.

Presenting that analysis would be an excellent place to start.
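For concreteness, here is the kind of elementary fit the quoted passage alludes to: estimating a doubling time by least-squares on log-transformed data. This is a toy sketch on synthetic numbers, not the requested analysis; the function and data are invented for illustration.

```python
import math

def doubling_time(times, values):
    """Fit values ~ A * 2**(t / T_d) by least squares on log2(values).

    Returns the estimated doubling time T_d. Assumes strictly
    positive values. Illustrative sketch only.
    """
    logs = [math.log2(v) for v in values]
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(logs) / n
    slope = (sum((t - t_bar) * (y - y_bar) for t, y in zip(times, logs))
             / sum((t - t_bar) ** 2 for t in times))
    return 1.0 / slope  # log2(value) rises by 1 per doubling

# Synthetic data that doubles every 2 time units:
ts = [0, 1, 2, 3, 4]
vs = [1.0, 2 ** 0.5, 2.0, 2 ** 1.5, 4.0]
print(doubling_time(ts, vs))  # ≈ 2.0
```

Putting error bounds around the doubling time (and a probability distribution over curve families) is exactly the harder step the comment says has not been presented.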

Note for readers: I'm not responding to Phil Goetz and Jef Allbright. And you shouldn't infer my positions from what they seem to be arguing with me about - just pretend they're addressing someone else.
Is that on this specific question, or a blanket "I never respond to Phil or Jef" policy?

Huh. That doesn't feel very nice.
Nor very rational, if one's goal is to communicate.

All the discussion so far indicates that Eliezer's AI will definitely kill me, and some others posting here, as soon as he turns it on.

It seems likely, if it follows Eliezer's reasoning, that it will kill anyone who is overly intelligent. Say, the top 50,000,000 or so.

(Perhaps a special exception will be made for Eliezer.)

Hey, Eliezer, I'm working in bioinformatics now, okay? Spare me!

Eliezer: If you create a friendly AI, do you think it will shortly thereafter kill you? If not, why not?

He may have some model of an AI as a perfect Bayesian reasoner that he uses to justify neglecting this. I am immediately suspicious of any argument invoking perfection.
It may also be that what Eliezer has in mind is that any heuristic that can be represented to the AI, could be assigned priors and incorporated into Bayesian reasoning.

Eliezer has read Judea Pearl, so he knows how computational time for Bayesian networks scales with the domain, particularly if you don't ever assume independence when it is not justified, so I won't lecture him on that. But he may want to lecture himself.

(Constructing the right Bayesian network from sense-data is even more computationally demanding. Of course, if you never assume independence, then the only right network is the fully-connected one. I'm pretty certain that a non-narrow AI reasoning over all of its knowledge with a fully-connected Bayesian network is computationally implausible. So all arguments that require AIs to be perfect Bayesian reasoners are invalid.)
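The scaling claim behind this parenthetical is simple arithmetic: a fully connected network encodes no independence assumptions, so exact inference must handle the full joint distribution, whose size is exponential in the number of variables. A small sketch (the function is illustrative):

```python
def joint_table_size(n_vars, arity=2):
    """Entries in the full joint distribution over n_vars variables.

    A fully connected Bayesian network gives up every independence
    assumption, so exact inference works with this full joint table,
    which grows exponentially in the number of variables.
    """
    return arity ** n_vars

for n in (10, 20, 40):
    print(n, joint_table_size(n))
# 10 -> 1024 entries; 20 -> 1,048,576; 40 -> ~1.1e12
```

This is why practical Bayesian-network inference depends on justified independence assumptions: without them, "perfect Bayesian reasoner" is not a computational design, only an idealization.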

I'd like to know how much of what Eliezer says depends on the AI using Bayesian logic as its only reasoning mechanism, and whether he believes that is the best reasoning mechanism in all cases, or only one that must be used in order to keep the AI friendly.

Kaj: I will restate my earlier question this way: "Would AIs also find themselves in circumstances such that game theory dictates that they act corruptly?" It doesn't matter whether we say that the behavior evolved from accumulated mutations, or whether an AI reasoned it out in a millisecond. The problem is still there, if circumstances give corrupt behavior an advantage.
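The point that it doesn't matter how the behavior arose can be made concrete with the one-shot Prisoner's Dilemma (standard textbook payoffs, assumed here for illustration): whatever the other player does, defecting pays more, so the circumstances reward "corrupt" play regardless of whether the strategy evolved or was reasoned out.

```python
# One-shot Prisoner's Dilemma: my payoff given (my move, their move).
# "C" = cooperate, "D" = defect; standard payoffs assumed.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def best_response(their_move):
    """Return the move that maximizes my payoff against their_move."""
    return max(["C", "D"], key=lambda m: PAYOFF[(m, their_move)])

# Defection dominates: it is the best response to either move,
# whether the agent evolved the behavior or derived it in a millisecond.
print(best_response("C"), best_response("D"))  # D D
```

An AI that correctly computes `best_response` lands on the same dominant strategy a gene-driven human would, which is the restated question's force.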

Good point, Jef - Eliezer is attributing the validity of "the ends don't justify the means" entirely to human fallibility, and neglecting that part accounted for by the unpredictability of the outcome.

I don't know what "a model of evolving values increasingly coherent over increasing context, with effect over increasing scope of consequences" means.

The tendency to be corrupted by power is a specific biological adaptation, supported by specific cognitive circuits, built into us by our genes for a clear evolutionary reason. It wouldn't spontaneously appear in the code of a Friendly AI any more than its transistors would start to bleed.
This is critical to your point. But you haven't established this at all. You made one post with a just-so story about males in tribes perceiving those above them as corrupt, and then assumed, with no logical justification that I can recall, that this meant that those above them actually are corrupt. You haven't defined what corrupt means, either.

I think you need to sit down and spell out what 'corrupt' means, and then Think Really Hard about whether those in power actually are more corrupt than those not in power; and if so, whether the mechanisms that lead to that result are a result of the peculiar evolutionary history of humans, or of general game-theoretic / evolutionary mechanisms that would apply equally to competing AIs.

You might argue that if you have one Sysop AI, it isn't subject to evolutionary forces. This may be true. But if that's what you're counting on, it's very important for you to make that explicit. I think that, as your post stands, you may be attributing qualities to Friendly AIs that apply only to Solitary Friendly AIs in complete control of the world.

Eliezer: I don't get your altruism. Why not grab the crown? All things being equal, a future where you get to control things is preferable to a future where you don't, regardless of your inclinations. Even if altruistic goals are important to you, it would seem like you'd have better chances of achieving them if you had more power. ... If all people, including yourself, become corrupt when given power, then why shouldn't you seize power for yourself? On average, you'd be no worse than anyone else, and probably at least somewhat better; there should be some correlation between knowing that power corrupts and not being corrupted. ... Benevolence itself is a trap. The wise treat men as straw dogs; to lead men, you must turn your back on them.
These are all Very Bad Things to say to someone who wants to construct the first AI.
Do we know that not-yet-powerful Stalin would have disagreed (internally) with a statement like "preserving Communism is worth the sacrifice of sending a lot of political opponents to gulags"?

Let's think about the Russian revolution. You have 3 people, arrayed in order of increasing corruption before coming to power: Trotsky, Lenin, Stalin. Lenin was nasty enough to oust Trotsky. Stalin was nasty enough to dispose of everybody who was a threat to him. Steven's point is good - that these people were all pre-corrupted - but we also see the corrupt rise to the top.

In the Cuban revolution, Fidel was probably more corrupt than Che from the start. I imagine Fidel would likely have had Che killed, if he in fact didn't.

So we now have 4 hypotheses:

  1. Males are inclined to perceive those presently in power as corrupt. (Eliezer)

  2. People are corrupted by power.

  3. People are corrupt. (Steven)

  4. Power selects for people who are corrupt.

How can we select from among these?
