Human errors, human values

by PhilGoetz, 9th Apr 2011


The trolley problem

In 2009, a pair of computer scientists published a paper enabling computers to behave like humans on the trolley problem (PDF here).  They developed a logic that a computer could use to justify not pushing one person onto the tracks in order to save five other people.  They described this feat as showing "how moral decisions can be drawn computationally by using prospective logic programs."

I would describe it as devoting a lot of time and effort to cripple a reasoning system by encoding human irrationality into its logic.

Which view is correct?

Dust specks

Eliezer argued that we should prefer 1 person being tortured for 50 years over 3^^^3 people each once getting a barely-noticeable dust speck in their eyes.  Most people choose the many dust specks over the torture.  Some people argued that "human values" includes having a utility aggregation function that rounds tiny (absolute value) utilities to zero, thus giving the "dust specks" answer.  No, Eliezer said; this was an error in human reasoning.  Is it an error, or a value?
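To make the disagreement concrete, here is a minimal sketch of the two aggregation functions in question: a plain sum, and a sum that first rounds tiny utilities to zero. Everything in it is an illustrative assumption rather than anything from the original debate: the function names, the utility figures, the threshold `EPSILON`, and the population size. In particular, 3^^^3 is far too large to represent, so `N_PEOPLE` is only a stand-in.

```python
from fractions import Fraction

def aggregate_linear(outcome):
    """Sum utility * head-count over (utility, count) pairs, with no rounding."""
    return sum(u * n for u, n in outcome)

def aggregate_thresholded(outcome, epsilon):
    """Same sum, but any per-person utility with |u| < epsilon counts as zero."""
    return sum((u if abs(u) >= epsilon else 0) * n for u, n in outcome)

# Hypothetical numbers, chosen only for illustration.
SPECK = Fraction(-1, 10**9)     # disutility of one dust speck
TORTURE = Fraction(-10**6)      # disutility of 50 years of torture
N_PEOPLE = 10**100              # stand-in; 3^^^3 is vastly larger
EPSILON = Fraction(1, 10**6)    # assumed rounding threshold

specks = [(SPECK, N_PEOPLE)]    # many people, one speck each
torture = [(TORTURE, 1)]        # one person tortured

def prefers_torture(aggregate):
    """True if the torture outcome has higher total utility than the specks outcome."""
    return aggregate(torture) > aggregate(specks)

linear_choice = prefers_torture(aggregate_linear)
thresholded_choice = prefers_torture(lambda o: aggregate_thresholded(o, EPSILON))
```

Under these assumed numbers, the linear sum prefers the torture and the thresholded sum prefers the dust specks. Both functions are equally easy to write down; nothing in the code says which one is the value and which is the error.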

Sex vs. punishment

In Crime and punishment, I argued that people want to punish criminals, even if there is a painless, less costly way to prevent crime.  This means that people value punishing criminals.  This value may have evolved to accomplish the social goal of reducing crime.  Most readers agreed that, since we can deduce this underlying reason, and accomplish it more effectively through reasoning, preferring to punish criminals is an error in judgement.

Most people want to have sex.  This value evolved to accomplish the goal of reproducing.  Since we can deduce this underlying reason, and accomplish it more efficiently than by going out to bars every evening for ten years, is this desire for sex an error in judgement that we should erase?

The problem for Friendly AI

Until you come up with a procedure for determining, in general, when something is a value and when it is an error, there is no point in trying to design artificial intelligences that encode human "values".

(P.S. - I think that necessary, but not sufficient, preconditions for developing such a procedure are agreeing that only utilitarian ethics are valid, and agreeing on an aggregation function.)