Francis Jervis



While I entirely agree with the title of this essay, as a social scientist, I am not sure that this is really a social science agenda (yet).

The biggest issue is the treatment of human values as a "species-level" set of decision algorithms: this runs counter to the projects of anthropology and sociology (which broadly reject the idea of an essential core of human ethics). At a minimum, the research here looks like it would produce a potentially misleading average across cultural value systems, one that might neither usefully represent any individual system nor yield a universally acceptable "compromise" in those cases that would be processed differently in different cultural frames.
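To make the averaging worry concrete, here is a toy sketch. The frame names and scores are invented purely for illustration (they are not data from any study), and `pooled_scores` is a hypothetical helper, not anything proposed in the essay. Scores run from -1 (impermissible) to +1 (permissible).

```python
from statistics import mean

def pooled_scores(scores_by_frame):
    """Average each case's score across all cultural frames (hypothetical helper)."""
    frames = list(scores_by_frame.values())
    # zip(*frames) groups the scores for the same case across frames.
    return [mean(case_scores) for case_scores in zip(*frames)]

# Invented scores for two cases under two hypothetical cultural frames.
# Case 0: both frames roughly agree. Case 1: the frames sharply disagree.
example_scores = {
    "frame_a": [0.9, -0.9],
    "frame_b": [0.7, 0.9],
}

pooled = pooled_scores(example_scores)
print(pooled)
```

On the case where the frames agree, the pooled score is a fair summary. On the case where they disagree, the pooled score sits near zero: a "neutral" verdict that neither frame actually holds.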

Second, the "debate" approach looks like a Schelling game mechanism: essentially the same adjudication "game" used in numerous cryptoeconomic schemes, notably Kleros and Aragon. Following that line of logic, what this proposal is really doing is training an AI to win Schelling bets: the AI is betting, like a juror in the Kleros project, that its "vote" will align with those of the human supervisors. This is a narrow field of value judgement, anthropologically speaking. Moreover, there has been substantive progress in the use of ML systems to predict the outcome of litigation; however, I do not think that anyone working in that field would classify their work as progress towards friendly AI, and rightly so. Thus the question remains whether an AI capable of producing consistently Schelling-point-aligned judgements would actually be running an "ethical decision algorithm" properly so called, or would just be a well-trained "case outcome predictor."
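For readers unfamiliar with the mechanism, a minimal sketch of a Schelling-style payoff rule follows. It is a deliberate simplification and not the actual Kleros or Aragon protocol: the function name, the flat stake, and the tie-breaking rule (first option reaching the top count wins) are all assumptions made for illustration.

```python
from collections import Counter

def settle_schelling_round(votes, stake=1.0):
    """Pay jurors who voted with the plurality out of stakes lost by the rest.

    Hypothetical simplification: every juror posts the same stake, and ties
    are broken in favour of the option that reached the top count first.
    """
    tally = Counter(votes.values())
    winning_option, _ = tally.most_common(1)[0]
    winners = [juror for juror, vote in votes.items() if vote == winning_option]
    losers = [juror for juror, vote in votes.items() if vote != winning_option]
    # Losers forfeit their stake; winners split the forfeited pot equally.
    reward = stake * len(losers) / len(winners)
    payoffs = {juror: reward for juror in winners}
    payoffs.update({juror: -stake for juror in losers})
    return winning_option, payoffs

# Two human supervisors and one AI "juror" (names are illustrative only).
outcome, payoffs = settle_schelling_round(
    {"supervisor_1": "A", "supervisor_2": "A", "ai": "B"}
)
print(outcome, payoffs)
```

Note what the payoff depends on: only whether a vote matched the plurality, never whether the verdict was right. An agent maximising this reward is being trained to predict the other jurors, which is exactly the "case outcome predictor" worry.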

We can see how this mode of reasoning might apply, say, in the case of autonomous road vehicles, where the algorithm (doubtless to the delight of auto insurers) is trained to produce whichever outcome, in the event of a crash, would generate the lowest tort liability (civil claim) for the owner/insurer. This might well align with local cultural as well as legal norms (assuming the applicable tort law system corresponds reasonably well to the populace's culturally defined ethical systems), but such a correspondence would be incidental to the operation of the decision algorithm and would certainly not reflect any "universal" system of human values.

Finally, I would question whether this agenda (valuable though it is) is really social science. It looks a lot more like cognitive science. This is not merely a semantic distinction. The "judicial model" of AI ethics proposed here is, I think, so circumscribed as to produce results of only limited applicability (as in the example above). Specifically, the origin of ethical-decisional cognition is always already "social" in a sense which this process does not really capture: not only are ethical judgements made in their particular historical-geographical-cultural context, they also arise from the interaction between agent and world in a way which the highly restricted domain proposed here effectively (and deliberately?) excludes.

Thus while the methodology proposed here would generate somewhat useful results (we might, for instance, design a RoboJuror to earn tokens in a Kleros-like distributed dispute resolution system, based on the Schelling bet principle), it is questionable whether it advances the goal of friendly AI in itself. AI Safety does need social scientists - but they need to study more radically social processes, in more embedded ways, than is proposed here.