Precisely Bound Demons and their Behavior

[-]dxu11y240

Since this seems like a pretty transparent metaphor for Friendly AI, it looks as though Eliezer is planning to go through with his idea of crowdsourcing FAI research. Any predictions for how this is going to go? I'm personally not optimistic that the subreddit is actually going to produce any important, novel results*, but at the very least, it'll increase exposure to the idea of FAI with a general audience. (After all, HPMoR was what originally brought me to LW.)

* It seems to me that the main strength of crowdsourcing in solving problems is the ability to propose a truly gigantic number of solutions in a very short amount of time, which only helps if (a) the true solution is easy enough to guess that someone can stumble upon it largely by chance, (b) other people then recognize the solution as a good one and upvote it, and (c) the solution is easily testable to see if it is a good one or a bad one (otherwise people will keep on proposing solutions without realizing that they've already stumbled across the right answer). All three of these were true of HPMoR; all of them are probably false in the context of FAI research.

[-]jimrandomh11y170

One of the main things that stops me from writing about things - FAI included - is that if something feels very important, anxiety kicks in and inhibits the thought-to-keyboard process. If that problem is at all common, then a thin veil of frivolity will do wonders for research productivity.

[-]dxu11y30

That seems fair, but I'd say that unless you're already intelligent enough to do important, original work in the field of FAI (or any other field of mathematics, really), a productivity boost won't help much. To use an analogy: a car whose engine is broken won't run no matter how much gasoline you put in its tank.

(Not to imply that the people who frequent /r/hpmor are unintelligent, just that the bar for doing successful FAI research is really, really high, and unless you can clear that bar, increasing the number of people working on the problem isn't likely to help--in my view, anyway. I could be wrong.)

[-]skeptical_lurker11y120

I quite liked the story idea until I realised that it's a pretty transparent metaphor for Friendly AI... no, wait, it actually is a story about FAI. Starting off with worldbuilding a fantasy magicpunk setting and then suddenly switching to FAI seems... kinda like bait and switch?

Having said that, I really like this setting. The main problem is that there seem to be two entirely different themes - FAI and sorcerers taking over the world. If you start discussing hard maths you are going to lose many readers, but then if the goal is to inspire FAI work, does this matter?

The secondary problem is that if you can just reset the timeline as many times as you want, there is no sense of urgency or tension. Maybe they discover that each time they reset, cracks start to appear in the walls between realms, daemon summoning becomes easier, and the daemons are one step closer to being able to break through on their own?

Meta: is there any point in discussing this here when the reddit conversation is so much bigger? I'm probably just going to copy my comment over.

[-]Epictetus11y30

Mathematics always has some primitive, undefined concepts at the root of it. Demons have the option of interpreting these malevolently or asking for more and more clarification until a loophole is reached.

A demon told to accelerate a vehicle along an exactly given vector for a specified time, applying the same added acceleration at any given time to all particles in the vehicle, and causing no other impact on the material universe, will do only that... if the language of the contract can be mathematically specified in an absolutely unambiguous way.

Relativity of simultaneity. A demon can choose a reference frame with respect to which applying the acceleration to all particles at the same time leads to the vehicle being torn apart.
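To make the simultaneity point concrete, here is a short worked equation from standard special relativity. It is not part of the original comment, and the symbols are assumptions introduced for illustration: two particles of the vehicle are separated by $\Delta x$ along the direction of motion, and the demon's chosen frame moves at speed $v$ relative to the frame in which the contract was written.

```latex
\Delta t' = \gamma\left(\Delta t - \frac{v\,\Delta x}{c^{2}}\right),
\qquad
\gamma = \frac{1}{\sqrt{1 - v^{2}/c^{2}}}

% Pushes that are simultaneous in the demon's frame have \Delta t' = 0, so in
% the frame in which the contract was written they are separated by
\Delta t = \frac{v\,\Delta x}{c^{2}} \neq 0
\quad \text{whenever } v \neq 0 \text{ and } \Delta x \neq 0 .
```

So "at the same time" in the demon's frame means the two ends of the vehicle are pushed at different times as the summoner sees it, which is where the tearing stress would come from unless the contract also fixes the reference frame.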

[-]Rob Bensinger11y10

To make this work, the demons will need to think in a particular mathematical language, whose primitives they take for granted and have relatively unmysterious empirical significance.

Alternatively, perhaps demons are somehow forced (or motivated) to 'do what I mean' -- they never perversely interpret the semantics of what you say. But they also don't coherently extrapolate your volition, which means they're free to perversely manipulate aspects of the situation that you didn't explicitly talk about (especially when you didn't consciously think about them either). E.g., if you give a demon the English-language instruction "pick up that bucket of water," it isn't free to come up with a suboptimal semantics (like "pick up" means "decapitate" and "bucket of water" means "all my friends"), but it is free to execute the correctly-interpreted instruction in a dangerous way (e.g., picking it up with so much speed and force that it produces a shockwave). If demons interpret the meaning of commands with maximal benevolence but choose a means to the specified end with maximal malice, then mathematically specifying everything about the means makes demons safe(r).

[-]Oligopsony11y30

If the demons understand harm and are very clever in figuring out what will lead to it, what happens when we ask them to minimize harm, or maximize utility, or do the opposite of what they would want to do otherwise, or {rigidly specified version of something like this}?

Can we force demons to tell us (for instance) how they'd rank various policy packages in government, what personal choices they'd prefer I make, &c., so we can back-engineer what not to do? They're not infinitely clever, but how clever are they?

[-]TsviBT11y60

There are ten thousand wrong solutions and four good solutions. You don't get much info from being told a particular bad solution. The opposite of a bad solution is a bad solution.
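A rough back-of-the-envelope version of this, as a sketch only: it assumes every candidate is equally likely to be good beforehand, which the comment does not state, and the variable names are made up for illustration.

```python
import math

# Counts taken from the comment above.
WRONG = 10_000
GOOD = 4
total = WRONG + GOOD

# Assumption: a uniform prior over all candidates.
# Bits needed to narrow "any candidate" down to "one of the good ones".
bits_needed = math.log2(total / GOOD)

# Bits gained when a single known-bad candidate is ruled out:
# the search space shrinks from `total` to `total - 1`.
bits_per_bad_answer = math.log2(total / (total - 1))

print(f"needed: {bits_needed:.2f} bits")
print(f"gained per bad solution: {bits_per_bad_answer:.6f} bits")
```

Under that assumption, one revealed bad solution supplies only a tiny fraction of the information needed to locate a good one.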

[-]Jiro11y10

So ask a series of "which of X and Y would you prefer that we do". The demon always prefers the worst thing, but is constrained to truthfully describe its preferences. This is a single bit of data, but it's really useful.
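A minimal sketch of how those single bits could add up, assuming the demon answers every pairwise question truthfully and that no two options are exactly tied. The policy names and harm scores are invented for illustration; in the story they would be known only to the demon.

```python
from functools import cmp_to_key

# Hypothetical hidden harm scores (an assumption, not from the thread).
HIDDEN_HARM = {"policy_a": 3.0, "policy_b": 7.5, "policy_c": 1.2, "policy_d": 5.1}

questions_asked = 0


def demon_prefers(x, y):
    """Truthful demon: names the option it prefers, i.e. the more harmful one."""
    global questions_asked
    questions_asked += 1
    return x if HIDDEN_HARM[x] >= HIDDEN_HARM[y] else y


def compare(x, y):
    """Turn one demon answer into an ordering, least harmful first."""
    if x == y:
        return 0
    return 1 if demon_prefers(x, y) == x else -1


def rank_least_harmful_first(options):
    # An ordinary comparison sort; each comparison costs one question.
    return sorted(options, key=cmp_to_key(compare))


ranking = rank_least_harmful_first(list(HIDDEN_HARM))
print(ranking, "after", questions_asked, "questions")
```

The summoner never sees the scores, only the answers, yet ends up with the full ordering and can pick the first entry. This only ranks options the summoner already thought of; it does nothing to generate good options.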

[-]Jiro11y10

Actually, I can think of another loophole. Just ask the demon to do X in a manner which causes, by the demon's own standards, the least harm. Because it is stipulated that the demon always wants to do things that cause the most harm by human standards, it follows that the demons must have a concept of "harm" that is congruent with human standards. The demon is not only a malevolent genie, it's a consistently malevolent genie and you can take advantage of this.

It may seem that we have not really stipulated that the demon ranks everything by human standards, just that the demon's topmost preference is the one ranked the worst by human standards. However, you can ask the demon "do X in a way that is not (topmost preference)" and by stipulation it will still do the most harm, thus implying that the demon's second preference is also ranked by human standards; by induction all the demon's preferences are ranked by human standards.

This can break if the demon does things that do the most harm by human standards because it has its own standards, opposite to a human's, and is doing the least harm by its own standards. If so, just ask it for something that causes the most harm by its standards instead.

(If you're wondering what happens if the demon picks the definition of "the demon's standards" that it prefers, it can't actually do that. One of the choices would be a lie, and the demon is a non-lying genie, not a lying-if-plausible-deniability genie.)

[-]Luke_A_Somers11y10

The looping does introduce a confounding factor - the best solutions are going to require foreknowledge and thus be rather inapplicable to real life.

BTW, is this inspired by the 'Infinite Loops' general-purpose fanfiction setting?

[-][anonymous]11y00

Do demons communicate between themselves? Can it be shown that Looping the world is the only way for it not to be forever ended? What are the worst sacrifices a Summoner can make to prolong the Loop for a unit of time? (If Looping is common knowledge) is there a way to make a Looping world better than a non-Looping? Like, you can optimize everything until you don't have disease, poverty, lack of fun, then you go forward and un-Loop?

[+]John Diaz7y-80
