A DeepMind-like AI is trained and executed on a decentralized supercomputer like the Golem network, perhaps one that also permits microtransactions like the IOTA Tangle.

The algorithm collects texts posted by citizens on public fora and processes them by topic using sentiment analysis. From actions A1 to An, it takes the action Ax that satisfies the following condition:

For each plan, find the maximum discontent of the relevant kind voiced by citizens under it, giving max(A1), ..., max(An). Then, across all plans, take the smallest of these maxima. The chosen action Ax is the one satisfying max(Ax) = min(max(A1), ..., max(An)).
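
A minimal sketch of that selection rule, assuming the sentiment-analysis step has already produced per-citizen discontent scores for each candidate plan (the function name `choose_action` and the scores below are illustrative, not part of any real system):

```python
from typing import Dict, List

def choose_action(discontent_under_action: Dict[str, List[float]]) -> str:
    """Pick the action whose worst-case (maximum) voiced discontent is smallest."""
    # For each candidate action Ai, the relevant quantity is the maximum
    # discontent any citizen voices under it: max(Ai).
    worst_case = {
        action: max(scores)
        for action, scores in discontent_under_action.items()
    }
    # The chosen action Ax is the one minimizing that worst case:
    # max(Ax) = min(max(A1), ..., max(An)).
    return min(worst_case, key=worst_case.get)

# Illustrative discontent scores in [0, 1], higher = more discontent.
plans = {
    "A1": [0.2, 0.9, 0.4],
    "A2": [0.5, 0.6, 0.5],
    "A3": [0.1, 0.3, 0.8],
}
print(choose_action(plans))  # -> "A2": its worst-case discontent (0.6) is the lowest
```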

Even if such an AI is hard-coded not to interfere with free speech on those fora, how could we possibly trust it not to surreptitiously silence citizens in some other way to prevent them from voicing their discontent?

I strongly suspect that you cannot, with a feedback loop as you describe. If you measure discontent based on social media, suffering that does not get posted to social media effectively does not exist. The AI would need a way of somehow recognizing that the social media are only its window to the discontent that exists in the world beyond, which is what it is intended to minimize. Proverbially, it would need to be able to look at the moon rather than the finger pointing to it.
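
A toy illustration of that failure mode, with made-up numbers: if the objective only sees discontent that actually gets posted, a plan that keeps sufferers off the fora can score better than a plan that genuinely reduces suffering.

```python
def measured_max_discontent(true_discontent, gets_posted):
    """What the objective sees: the maximum discontent that actually gets posted."""
    posted = [d for d, p in zip(true_discontent, gets_posted) if p]
    return max(posted) if posted else 0.0  # total silence looks like zero discontent

# Hypothetical world: four citizens, higher score = more suffering.
baseline = [0.9, 0.7, 0.8, 0.3]

# Plan A: genuinely reduce suffering; everyone still posts freely.
plan_a = [0.5, 0.4, 0.45, 0.2]
print(measured_max_discontent(plan_a, [True, True, True, True]))       # 0.5

# Plan B: leave suffering untouched, but the three most discontented
# citizens are (somehow) kept off the fora.
print(measured_max_discontent(baseline, [False, False, False, True]))  # 0.3
# Plan B wins under the minimax objective despite being strictly worse.
```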

Thanks. As far as I can tell, this is symptomatic of a general class of problems with trying to gauge what humans want. If you want to maximize fun, you need a model of what leads to fun. You must somehow gather this data from humans. The only foolproof way to do this is to recreate the human mind in the computer, which brings its own problems. Is there a way out I'm not seeing?