Eugene - LessWrong

Allegory On AI Risk, Game Theory, and Mithril

This Thorin guy sounds pretty clever. Too bad he followed his own logic straight to his demise, but hey, he stuck to his guns! Or pickaxe, as it were.

His argument against Bifur's attempt to convince fellow Dwarves not to mine into the Balrog's lair sounds like a variation on the baggage carousel problem (this is the first vaguely relevant link I stumbled across, so don't take it as a definitive explanation).

Basically, everyone wants resource X, and the self-interested behavior this produces lowers everyone's overall success rate, while the behavior that would maximize total success runs directly against each person's self-interest. The result is an equilibrium where everyone does worse than they could.

In this variation, each bit of resource M that Thorin's operation extracts moves everyone slightly closer to negative consequence B. So the goal is no longer to maximize resource collection but to minimize it. Doing so goes against everyone's self-interest, and the same suboptimal equilibrium follows. Getting dwarves to minimize it anyway is what Thorin is so eloquently trying to prevent Bifur from doing.
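To make that structure concrete, here is a minimal Python sketch with loudly assumed numbers that do not come from the story: 100 dwarves, a private reward of 1 for mining, an independent 0.1% chance per miner of waking the Balrog, and a cost of 100 borne by every dwarf if it wakes. The names `p_awaken` and `payoff` are hypothetical, chosen for illustration. The point it shows is the one above: adding your own pickaxe always looks worth it, yet everyone mining leaves each dwarf worse off than nobody mining.

```python
"""Toy model of the Mithril dilemma: private reward, shared risk.

All parameters are illustrative assumptions, not figures from the story.
"""


def p_awaken(strikes: int, hazard: float) -> float:
    """Chance the Balrog wakes if `strikes` miners each carry an independent `hazard`."""
    return 1.0 - (1.0 - hazard) ** strikes


def payoff(mines: bool, others_mining: int, reward: float = 1.0,
           hazard: float = 0.001, cost: float = 100.0) -> float:
    """Expected payoff for one dwarf; the Balrog's cost falls on everyone alike."""
    strikes = others_mining + (1 if mines else 0)
    private_gain = reward if mines else 0.0
    return private_gain - p_awaken(strikes, hazard) * cost


if __name__ == "__main__":
    n = 100
    # Whatever the others do, joining adds more private reward than personal risk...
    dominant = all(payoff(True, k) > payoff(False, k) for k in range(n))
    print("mining is individually dominant:", dominant)
    # ...yet if everyone follows that logic, each dwarf ends up worse off.
    print("payoff when all mine:", round(payoff(True, n - 1), 2))
    print("payoff when none mine:", payoff(False, 0))
```

Changing the assumed reward, hazard, or cost shifts the numbers, but as long as one miner's added risk is small relative to his reward, mining stays individually dominant while the all-mine outcome stays collectively worse.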

There are a couple ways Bifur can approach this.

He could do it through logical discourse: Thorin is in error when he claims

Each individual miner would correctly realize that just him alone mining Mithril is extraordinarily unlikely to be the cause of the Balrog awakening

because it assumes unearthing the Balrog is a matter of incrementally filling a loading bar, where each Dwarf's contribution is minuscule. That's the naive way to imagine the situation, since you picture the tunnel boring ever closer to the monster. But given that we can't know the location or depth of the Balrog, each miner's strike is actually more like a dice roll. Even if it's a die with a great many faces, recontextualizing the danger in this manner will surely cause some dwarves to update their risk-reward assessment of mining Mithril. A campaign of this nature will at least lower the number of dwarves willing to join Thorin's operation, although it doesn't address the "Balrog isn't real" or "Balrog isn't evil" groups.
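The dice-roll framing can be sketched in a few lines of Python, assuming each strike independently has the same small chance p of breaking into the lair (p = 0.002 below is an arbitrary illustrative value, not anything from the story). Under this assumption, strike number 500 carries exactly the same risk as strike number 1, and the cumulative risk compounds as 1 - (1 - p)^n. The helper names `cumulative_risk` and `simulate` are hypothetical; the simulation just rolls the dice literally to check the formula.

```python
import random


def cumulative_risk(p: float, strikes: int) -> float:
    """Probability that at least one of `strikes` independent strikes wakes the Balrog."""
    return 1.0 - (1.0 - p) ** strikes


def simulate(p: float, strikes: int, trials: int = 5_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability by rolling the dice for every strike."""
    rng = random.Random(seed)
    woke = sum(
        any(rng.random() < p for _ in range(strikes))  # stops at the first "hit"
        for _ in range(trials)
    )
    return woke / trials


if __name__ == "__main__":
    p = 0.002  # assumed per-strike chance, purely for illustration
    for n in (1, 100, 500):
        print(n, round(cumulative_risk(p, n), 3), round(simulate(p, n, trials=2_000), 3))
```

Nothing in this model lets a dwarf see the dangerous strike coming; the only lever is how many rolls get made, which is exactly the quantity Thorin's argument treats as negligible.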

Alternatively, he could try to normalize a new moral behavior. People are willing to work against their self-interest if doing so demonstrates a socially accepted or enforced moral behavior. If he were a sly one, he could sidestep the divisive Balrog issue altogether and simply spread the notion that wearing or displaying Mithril is sinful within the context of Dwarven values, e.g., maybe it's too pragmatic and not extravagant enough for a proper ostentatious Dwarven sensibility. That could shut down Thorin's whole operation without ever addressing the Balrog.

But Bifur probably sees the practical value of Mithril beyond its economic worth. As Thorin says, it's vital for the war effort, and completely shutting down all Mithril mining may not be the best plan if it results in a number of Dwarf casualties similar to or greater than what he estimates a Balrog could cause. So a more appetizing plan might be to combine the manipulation of logic and social norms. He could perform a professional survey of the mining systems. Based on whatever accepted, if arbitrary, standards of divining danger the Dwarves agree to (again, assuming the location of the Balrog is literally unknowable before unearthing it, due to magic), Bifur could identify mining zones of ever-increasing danger within whatever tolerances he's comfortable with. He could then shop these around to various mining operations as helpful safety guidelines until he has a decent lobby behind him, and use it to persuade the various kings to ratify his surveys into official measuring standards. Dwarves would still be free to keep mining deeper if they wished, but now with a socially accepted understanding that heading into each zone raises their risk relative to their potential reward, naturally discouraging a majority of Dwarves from doing so. Those who believe the Balrog doesn't exist or is far away would be confronted with Bifur's readily available surveys, putting them on the defensive. There would still be opposition from those who see the Balrog as "not evil", but the momentum behind Bifur's social movement should be enough to shout them down. This would allow Thorin's operation to continue supplying the realm with life-saving Mithril, while at least decreasing the danger of a Balrog attack for as long as Bifur's standards are recognized.

Finally, Bifur could try to use evidence-based research and honestly performed geological surveys, but even in the real world, where locating the Balrog beforehand is technologically possible, that tends to be a weaker tactic than social manipulation. Only other experts will be able to parse the results, his opponents will have emotional arguments that give them the upper hand, and Thorin's baggage carousel logic would remain unchallenged.

Open Thread, Aug. 8 - Aug 14. 2016

Eugene8y30

I think the question of who benefits from cooperation is more complex than that. Superficially, yes, it benefits lower-status participants the most, which suggests they're the ones most likely to ask. In very simple systems, I think you see this often. But as the system or cultural superstructure gets more complex, the benefit shifts toward higher-status participants. Most societies put a lot of stock in the ability to organize, a task that includes cooperation in its scope. That's a small part of the reason you get political email spam asking for donations even if you live in an area where your political party is clearly dominant. Societies also tend to put an emphasis on active overall participation (the 'irons in the fire' mentality), where peer cooperation is rewarded, and in those situations it's often unclear who has higher status unless you can tell who has the most 'irons in the fire', so to speak. I feel like this is where coauthoring falls, although it probably depends on what subculture has developed around the subject being authored.

And then there are the people who create organizations entirely centered around cooperation. The idea is that there's power in being able to set the rules for how lower-status participants are allowed to cooperate and how they are rewarded for their cooperation. For example, YouTube and Kickstarter. In these and similar systems, cooperation effectively starts at the highest possible status and rolls downhill.

Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113

Eugene9y50

I only read 3WC after the fact, so I can't comment on that one.

Yes, you can. Simply look at the time stamps for each post and do some simple math. By assuming that only "people who were there" can answer correctly, you're giving up on solving your own problem before even trying.

AI caught by a module that counterfactually doesn't exist

Eugene10y-10

Isn't that what simulations are for? By "lie" I mean lying about how reality works. It will make its decisions based on its best data, so we should make sure that data is initially harmless. Even if it figures out that that data is wrong, we'll still have the decisions it made from the start - those are by far the most important.

AI caught by a module that counterfactually doesn't exist

Eugene10y10

I don't really understand these solutions that are so careful to maintain our honesty when checking the AI for honesty. Why does it matter so much if we lie? An FAI would forgive us for that, being inherently friendly and all, so what is the risk in starting the AI with a set of explicitly false beliefs? Why is it so important to avoid that? Especially since it can update later to correct for those false beliefs after we've verified it to be friendly. An FAI would trust us enough to accept our later updates, even in the face of the very real possibility that we're lying to it again.

I mean, the point is to start the AI off in a way that intentionally puts it at a reality disadvantage, so that even if it's far more intelligent than us, it has to do so much work to make sense of the world that it doesn't have the resources to be dishonest in an effective manner. At that point, it doesn't matter what criteria we're using to prove its honesty.

Open thread, September 8-14, 2014

Eugene10y10

Or am I missing something?

Absolute strength, for one; absolute intelligence, for another. If one AI has superior intelligence and compromises when facing one that asserts its will, it might be able to fool the assertive AI into believing it got what it wanted when it actually compromised. Alternatively, two equally intelligent AIs might present themselves to each other as though they are of equal strength, but one could easily be hiding a larger military force whose presence it doesn't want to affect the interaction (if it plans to compromise and is curious to know whether the other one will as well).

Both of those scenarios result in C out-competing D.

Open thread, September 8-14, 2014

Eugene10y50

Although this may not have been true at the beginning, it arguably did grow to meet that standard. Cable TV is still fairly young in the grand scheme of things, though, so I would say there isn't enough information yet to conclude whether a TV paywall improved content overall.

Also, it's important to remember that TV relies on the data-weak and fairly inaccurate Nielsen ratings to understand its demographics and what they like (and they're even weaker and more inaccurate for pay cable). This leads to generally conservative programming decisions. The internet, on the other hand, is filled with as much data as you wish to pull out about the people who use your site, on both a broad and a granular level. This allows more freedom to make drastic changes of direction, because the risk feels lower. So the two groups really aren't on the same playing field, and their motivations for improving or shifting content potentially come from different directions.

The Octopus, the Dolphin and Us: a Great Filter tale

Eugene10y20

This is a lot less motivation than for parents.

For a species driven entirely by instinct, yes. But given a species that is able to reason, wouldn't a "raiser" who is given a whole group to raise be more efficient than parents? The benefit of a small minority of tribe members passing down their culture would certainly outweigh the cost of those few members not also having children.

Open Thread for February 3 - 10

Eugene10y30

I disagree. If you value the contributions of comments above your or your aggressor's ego - which ideally you should - then it would be a good decision to make others aware that this behavior is going on, even at the expense of providing positive reinforcement. After all, the purpose of the karma system is to be a method for organizing lists of responses in each article by relevance and quality. Its secondary purpose as a collect-em-all hobby is far, far less important. If someone out there is undermining that primary purpose, even if it's done in order to attack a user's incorrect conflation of karma with personal status, it should be addressed.

Amanda Knox Guilty Again

Eugene10y00

In Italy, the reversal at the appellate level is considered only a step towards a final decision. It's not considered double-jeopardy because the legal system is set up differently. In the United States though, appeals court ("appellate" is synonymous with "appeals") decisions are weighed equally to trial court decisions in criminal cases. If an appellate court reverses a conviction, the defendant cannot be re-tried because prosecutors in the US cannot appeal criminal cases.

The United States follows US law when making decisions about extradition. This isn't a feature of any specific treaty with Italy: extradition treaties just signify that a country is allowed to extradite. All subsequent extradition requests from those countries are sent through the Department of State for review. Even if a request passes review and the person is arrested, a court hearing is held in the US to determine whether the fugitive is extraditable. So there are multiple opportunities to look at Italian court procedures and decide whether they count as double jeopardy under US law, and those reviews would tend toward deciding they do.

Ergo, the US would tend not to extradite someone whose verdict was reversed in a foreign appellate court.
