# 1

Epistemic status: Confused about what probabilities people assign to different alignment theories working.

I essentially want to know whether it would be worth spending some of my time building a database like https://www.moralmachine.net/, but with more complexity, as a benchmark for moral predictions in future AI systems. That, of course, depends heavily on whether it would be useful. According to my own probability distribution over potential future alignment structures, something like a complex moral database seems worth it. First and foremost, alignment testing will be useful, and you would need a moral database for that. Secondly, it could be used as groundwork for a friendly AI.

I've had a few different ideas about how to construct it, and its usefulness will depend on its design. The basic architecture I have in mind is the following: Start with a basic scenario such as the trolley problem (as in Moral Machine), with the number of people and their happiness, among others, as variables. Then add a Bayesian reasoner that receives situations in which all variables are pseudo-randomized according to an "importance" algorithm described below. (Maybe one can get an average happiness vs. population ratio as a numerical "solution" to the Repugnant Conclusion using this?) The Bayesian reasoner is then asked how confident it is in its prediction for the current situation, and if this confidence is below a certain percentage, we send out an inquiry to human respondents about that situation.

We also assign an "importance" rating to each problem, mainly determined by the number of lives affected. This rating would, in turn, determine both the confidence required and the priority with which that situation is brought to the system's attention. This is akin to a sorting system where the most frequently used items are put on top, except that here it is the most "important" situations instead. This should allow the moral database to focus on the most "important" decisions we face and, therefore, be more efficient than random sampling.
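To make the loop above a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Scenario` fields, the saturating form of `required_confidence`, and the `confidence_of` callback (a stand-in for the Bayesian reasoner, which is not implemented here). It only shows the selection step: keep the scenarios where the reasoner's confidence falls below the importance-dependent threshold, and surface the most important ones first.

```python
import random
from dataclasses import dataclass


@dataclass
class Scenario:
    """A trolley-style dilemma (hypothetical schema, not from Moral Machine)."""
    lives_affected: int
    mean_happiness: float  # assumed to lie on an arbitrary 0..1 scale


def importance(s: Scenario) -> float:
    """Importance rating; here simply the number of lives affected."""
    return float(s.lives_affected)


def required_confidence(s: Scenario, base: float = 0.6, cap: float = 0.99,
                        scale: float = 100.0) -> float:
    """Threshold that rises with importance and saturates below `cap`.

    The functional form and the default constants are assumptions.
    """
    imp = importance(s)
    return base + (cap - base) * imp / (imp + scale)


def sample_scenario(rng: random.Random) -> Scenario:
    """Pseudo-randomize the variables of a scenario (uniform, for simplicity)."""
    return Scenario(lives_affected=rng.randint(1, 1000),
                    mean_happiness=rng.random())


def select_queries(scenarios, confidence_of, budget):
    """Pick up to `budget` scenarios to send to humans.

    `confidence_of` stands in for the Bayesian reasoner: it maps a scenario
    to the reasoner's confidence (0..1) in its own prediction.
    """
    uncertain = [s for s in scenarios
                 if confidence_of(s) < required_confidence(s)]
    uncertain.sort(key=importance, reverse=True)  # most important first
    return uncertain[:budget]


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [sample_scenario(rng) for _ in range(20)]
    # Dummy reasoner that is 70% confident about everything.
    for s in select_queries(pool, lambda s: 0.7, budget=3):
        print(s, round(required_confidence(s), 3))
```

With a real reasoner, the human answers to the selected scenarios would be fed back as training data, so the set of "uncertain" scenarios should shrink over time, starting with the highest-stakes ones.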

The biggest problem with human sampling is our plenitude of cognitive biases, and that's usually where the concept of CEV (coherent extrapolated volition) comes in. The first approach would then be to say, "fuck that, I'll let the programmers do the heavy lifting," and just go ahead and create a moral database with all of its cognitive biases left in. This, however, seems quite inefficient and a tad rude. The best solution I've thought of so far is combining a bias-resistant phrasing of each moral dilemma with a few implicit assumptions, for example, that each human is worth the same in what I would call "first-order" utilitarian terms (i.e., that each human is worth the same when disregarding how many other humans they will probably save in the future).

The problem is that I don't know whether this would be worth doing, as human judgment might be fundamentally flawed, and we might, for that reason, want to avoid relying on it entirely. This currently seems unlikely to me, but I want to ask the nice and smart people of LessWrong what they think about the implementation of the idea and whether it's fundamentally worth it.