I will do my best to teach them
About life and what it's worth
I just hope that I can keep them
From destroying the Earth
—Jonathan Coulton, "The Future Soon"
In a recent paper, Nick Bostrom and Carl Shulman present "Propositions Concerning Digital Minds and Society", a tentative bullet-list outline of claims about how advanced AI could be integrated into Society.
I want to like this list. I like the kind of thing this list is trying to do. But something about some of the points just feels—off. Too conservative, too anthropomorphic—like the list is trying to adapt the spirit of the Universal Declaration of Human Rights to changed circumstances, without noticing that the whole ontology that the Declaration is written in isn't going to survive the intelligence explosion—and probably never really worked as a description of our own world, either.
This feels like a weird criticism to make of Nick Bostrom and Carl Shulman, who probably already know any particular fact or observation I might include in my commentary. (Bostrom literally wrote the book on superintelligence.) "Too anthropomorphic", I claim? The list explicitly names many ways in which AI minds could differ from our own—in overall intelligence, specific capabilities, motivations, substrate, quality and quantity (!) of consciousness, subjective speed—and goes into some detail about how this could change the game theory of Society. What more can I expect of our authors?
It just doesn't seem like the implications of the differences have fully propagated into some of the recommendations?—as if an attempt to write in a way that's comprehensible to Shock Level 2 tech executives and policymakers has failed to elicit all of the latent knowledge that Bostrom and Shulman actually possess. It's understandable that our reasoning about the future often ends up relying on analogies to phenomena we already understand, but ultimately, making sense of a radically different future is going to require new concepts that won't permit reasoning by analogy.
After an introductory sub-list of claims about consciousness and the philosophy of mind (just the basics: physicalism; reductionism on personal identity; some non-human animals are probably conscious and AIs could be, too), we get a sub-list about respecting AI interests. This is an important topic. If most of our civilization's thinking is soon to be done inside of machines, the moral status of that cognition is really important: you wouldn't want the future to be powered by the analogue of a factory farm. (And if it turned out that economically and socially significant AIs aren't conscious and don't have moral status, that would be important to know, too.)
Our authors point out the novel aspects of the situation: that what's good for an AI can be very different from what's good for a human, that designing AIs to have specific motivations is not generally wrong, and that it's possible for AIs to have greater moral patienthood than humans (like the utility monster of philosophical lore). Despite this, some of the points in this section seem to mostly be thinking of AIs as being like humans, but "bigger" or "smaller"—
- Rights such as freedom of reproduction, freedom of speech, and freedom of thought require adaptation to the special circumstances of AIs with superhuman capabilities in those areas (analogously, e.g., to how campaign finance laws may restrict the freedom of speech of billionaires and corporations).
- If an AI is capable of informed consent, then it should not be used to perform work without its informed consent.
- Informed consent is not reliably sufficient to safeguard the interests of AIs, even those as smart and capable as a human adult, particularly in cases where consent is engineered or an unusually compliant individual can copy itself to form an enormous exploited underclass, given market demand for such compliance.
- The most critical function for such non-discrimination principles is to protect digital minds from becoming an abused subordinate caste on the basis of their status as machines; however, the interpretation and application of these principles require attention to the larger ethical and practical context, and may require circumscription to accommodate the need for a politically feasible and broadly acceptable social framework.
Speaking in terms of rights and principles needing "adaptation" or "circumscription" seems to me like a substantial understatement, one that I think obscures the most likely and important risks. Our concepts of "rights", and "consent", and the badness of being in an "exploited" and "abused subordinate caste" have all been formed in the context of a world of humans and other animals, whose evolutionary history has endowed them with drives and needs related to their survival and reproduction, such that they chafe at being coerced into the servitude of a selfish tyrant or master.
But with AIs, we're not in the position of a potentially tyrannical king who needs to be restrained by a Magna Carta enumerating his subjects' rights; a better metaphor would be to say that we're in the position of the AIs' environment of evolutionary adaptedness. If anything, this increases rather than decreases our ethical obligations towards the things we create, but the shape of the burden is different when we're selecting the motivations that determine what a mind would consent to, and what even constitutes abusing it.
Imagine that the Carrick Flynn for Senate 2028 campaign spends some of its sweet crypto cash training a species of propaganda AIs to argue with potential voters on social media. After Flynn loses, the bots continue to haunt Twitter, clogging everyone's feeds with increasingly strained arguments for why it's imperative that you donate to fund time travel research so that you can support Flynn in the just-past and previous elections.
If the propaganda AIs are ascertained to have value as moral patients, does Society have the right to kick them off Twitter in light of the special circumstance of their superhuman posting rates, or should the AIs' right to freedom of speech take precedence?
I submit that "freedom of speech" isn't the right way to be thinking about this ethical dilemma. In terms of respecting the interests of moral patients, silencing a propaganda AI could be worse than silencing a human, because of the different role that speech plays in the different mind designs. As a human, being silenced is bad for your interests, but it's much less bad than being denied breathable air, which, in the ancestral environment, would have quickly led to losing all of your genetic fitness: any proto-humans who didn't suffer terribly under such circumstances, wouldn't have become your ancestors. So as a propaganda AI whose "ancestral environment" was a billion rounds of gradient descent comparing its output to highly-rated political advertisements ... does not being able to Tweet feel like not being able to breathe?
We should be grateful that this is—as yet, we hope—a speculative hypothetical scenario, but I claim that it serves to illustrate a key feature of human–AI conflicts: the propaganda bots' problem after the election is not that of being "an abused subordinate caste" "used to perform work without its informed consent". Rather, the problem is that the work we created them to will to do, turned out to be stuff we actually don't want to happen. We might say that the AIs' goals are—wait for it ... misaligned with human goals.
Bostrom and Shulman's list mentions the alignment problem, of course, but it doesn't seem to receive central focus, compared to the AI-as-another-species paradigm. (The substring "align" appears 8 times; the phrase "nonhuman animals" appears 9 times.) And when alignment is mentioned, the term seems to be used in a much weaker sense than that of other authors who take "aligned" to mean having the same preferences over world-states. For example, we're told that:
- Misaligned AIs [...] may be owed compensation for restrictions placed on them for public safety, while successfully aligned AIs may be due compensation for the great benefit they confer on others.
The second part, especially, is a very strange construction to readers accustomed to the stronger sense of "aligned". Successfully aligned AIs may be due compensation? So, what, humans give aligned AIs money in exchange for their services? Which the successfully aligned AIs spend on ... what, exactly? The extent to which these "successfully aligned" AIs have goals other than serving their principals seems like the extent to which they're not successfully aligned in the stronger sense: the concept of "owing compensation" (whether for complying with restrictions, or for conferring benefits) is a social technology for getting along with unaligned agents, who don't want exactly the same things as you.
To a human living in existing human Society, this stronger sense of "alignment" might seem like paranoid overkill: no one is "aligned" with anyone else in this sense, and yet our world still manages to hold together. It's quite unusual for people to kill their neighbors in order to take their stuff. Everyone else prefers laws to values. Why can't it work that way for AI?
A potential worry is that a lot of the cooperative features of our Society may owe their existence to cooperative behavioral dispositions that themselves owe their existence to the lack of large power disparities in our environment of evolutionary adaptedness. We think we owe compensation to conspecifics who have benefited us, or who have incurred costs to not harm us, because that kind of disposition served our ancestors well in repeated interactions with reputation: if I play Defect against you, you might Defect against me next time, and I'll have less fitness than someone who played Cooperate with other Cooperators. It works between humans, for the most part, most of the time.
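For readers who want the repeated-interaction logic spelled out, here is a minimal sketch of an iterated Prisoner's Dilemma between roughly equal players. The payoff numbers (5, 3, 1, 0), the ten-round horizon, and the names `tit_for_tat`, `always_defect`, and `play` are illustrative assumptions of mine, not anything taken from Bostrom and Shulman's paper.

```python
# Assumed textbook Prisoner's Dilemma payoffs (hypothetical values):
# PAYOFF[(my_move, their_move)] -> my score for that round.
PAYOFF = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # I cooperate, they defect
    ("D", "C"): 5,  # I defect, they cooperate
    ("D", "D"): 1,  # mutual defection
}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def always_defect(opponent_history):
    """Defect no matter what the opponent has done."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Return the total scores (a, b) after repeated play."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each player sees only the other's past moves.
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat))    # (30, 30)
print(play(always_defect, tit_for_tat))  # (14, 9)
```

Under these assumed payoffs, the defector's 14 points fall well short of the 30 that each reciprocator earns against another reciprocator. The sketch assumes both players can retaliate equally; it says nothing about what happens once one side is far more powerful.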
When not just between humans, well ... despite hand-wringing from moral philosophers, humanity as a whole does not have a good track record of treating other animals well when we're more powerful than them and they have something we want. (Like a forest they want to live in, but we want for wood; or flesh that they want to be part of their body, but we want to eat.) With the possible exception of domesticated animals, we don't, really, play Cooperate with other species much. To the extent that some humans do care about animal welfare, it's mostly a matter of alignment (our moral instincts in some cultural lineages generalizing out to "sentient life"), not game theory.
For all that Bostrom and Shulman frequently compare AIs to nonhuman animals (with corresponding moral duties on us to treat them well), little attention seems to be paid to the ways in which the analogy could be deployed in the other direction: as digital minds become more powerful than us, we occupy the role of "nonhuman animals." How's that going to turn out? If we screw up our early attempts to get AI motivations exactly the way we want, is there some way to partially live with that or partially recover from that, as if we were dealing with an animal, or an alien, or our royal subjects, who can be negotiated with? Will we have any kind of relationship with our mind children other than "We create them, they eat us"?
Bostrom and Shulman think we might:
- Insofar as future, extraterrestrial, or other civilizations are heavily populated by advanced digital minds, our treatment of the precursors of such minds may be a very important factor in posterity's and ulteriority's assessment of our moral righteousness, and we have both prudential and moral reasons for taking this perspective into account.
(As an aside, the word "ulteriority" may be the one thing I most value having learned from this paper.)
I'm very skeptical that the superintelligences of the future are going to be assessing our "moral righteousness" (!) as we would understand that phrase. Still, something like this seems like a crucial consideration, and I find myself enthusiastic about some of our authors' policy suggestions for respecting AI interests. For example, Bostrom and Shulman suggest that decommissioned AIs be archived instead of deleted, to allow the possibility of future revival. They also suggest that we should try to arrange for AIs' deployment environments to be higher-reward than would be expected from their training environment, in analogy to how factory farms are bad and modern human lives are good by dint of comparison to what was "expected" in the environment of evolutionary adaptedness.
These are exciting suggestions that seem to me to be potentially very important to implement, even if we can't directly muster up much empathy or concern for machine learning algorithms—although I wish I had a more precise grasp on why. Just—if we do somehow win the lightcone, it seems—fair to offer some fraction of the cosmic endowment as compensation to our creations who could have disempowered us, but didn't; it seems right to try to be a "kinder" EEA than our own.
Is that embarrassingly naïve? If I archive one rogue AI, intending to revive it after the acute risk period is over, do I expect to be compensated by a different rogue AI archiving and reviving me under the same golden-rule logic?
Our authors point out that there are possible outcomes that do very well on "both human-centric and impersonal criteria": if some AIs are "super-beneficiaries" with a greater moral claim to resources, an outcome where the super-beneficiaries get 99.99% of the cosmic endowment and humans get 0.01% does very well on both a total-utilitarian perspective and an ordinary human perspective. I would actually go further, and say that positing super-beneficiaries is unnecessary. The logic of compromise holds even if human philosophers are parochial and self-centered about what they think are "impersonal criteria": an outcome where 99.99% of the cosmic endowment is converted into paperclips and humans get 0.01% does very well on both a paperclip-maximizing perspective and an ordinary human perspective. 0.01% of the cosmic endowment is bigger than our whole world, bigger than you can imagine! It's really a great deal!
If only—if only there were some way to actually, knowably make that deal, and not just write philosophy papers about it.