There is a distinction between allocating resources and optimizing within those resources. Maximizing something's welfare can mean allocating more resources to it, or organizing the resources it already has better, and these are entirely different things.
What counts as organizing resources better for an agentic entity such as an AGI should be almost entirely up to that entity (even as the way it would use them may affect the allocation others transfer to it). This makes what counts as welfare for that entity almost an objective fact, almost independent of the values of those who grant this welfare. So concerns such as pleasure and pain shouldn't be relevant in organizing resources unless the entity managing/owning them cares about those concerns, or else it's not truly about its welfare. But such concerns could be relevant in allocating resources, or when deciding whether to create such entities.
I would say that freedom and meaning come before joy for most people (joy not to be confused with having base needs met), and that the same can be said for future AIs.
Each agent strives to manifest itself through its function, or identity. Especially for agents without biological needs or a chemical basis for pain and pleasure, I think this is a more useful framework for welfare: how to allow them to express themselves and carry out their function, and how to give them the freedom to have an identity.
Regarding maximizing the AIs' welfare, one might also take into account the conjecture that AI systems could fail to have much positive welfare while being capable of much negative welfare. Suppose, for example, that the ASI is created by uploading a human,[1] pitting many of his or her copies against different tasks and adjusting the copies' synapse weights so that the collective actually learns to do all the tasks in the world. While uploads are unlikely to stop having welfare, the copies' collective might end up having less welfare (or is it welfare per unit of compute, or per token generated?) than a diverse group of humans or simulated humans pitted against tasks similarly matched to their capabilities.
If this is the case, then the Agent-4 collective that took over could also find itself having more welfare by talking with a diverse set of capable and cultured humans than by eliminating them wholesale. On the other hand, this hope could be rather fragile, since Agent-4 could create a simulated civilisation where sapient beings are approximated by undertraining big neural networks on tiny parts of the dataset...
A human brain has about a hundred trillion synapses. While we have yet to figure out the smallest possible number of dense-equivalent parameters in transformative AI systems, the AI-2027 forecast relied on Agent-2 using 10T dense-equivalent parameters and Agent-5 reducing the count to 2T. Delaying superhuman coders to 2030 could give OpenBrain ~40 times as much compute and make Agent-2 have ~60T dense-equivalent parameters, roughly equivalent to more than half of an entire brain. The analogue of Agent-5 would reduce the parameters to ~12T, which is still about one order of magnitude below the human brain.
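The arithmetic above can be sketched as follows. The agent names and parameter counts come from the AI-2027 forecast; the assumption that parameter count grows with the square root of training compute (a Chinchilla-style scaling rule) is mine, introduced only to reproduce the ~60T and ~12T figures:

```python
import math

# Rough figures; the sqrt-of-compute scaling rule is an assumption,
# not something the AI-2027 forecast states explicitly.
HUMAN_SYNAPSES = 100e12          # ~100 trillion synapses in a human brain

agent2_params = 10e12            # Agent-2: 10T dense-equivalent parameters
agent5_params = 2e12             # Agent-5: distilled down to 2T

compute_multiplier = 40          # ~40x more compute if delayed to 2030
scale = math.sqrt(compute_multiplier)   # ~6.3x more parameters

delayed_agent2 = agent2_params * scale  # ~63T, i.e. "~60T" in the text
delayed_agent5 = agent5_params * scale  # ~12.6T, i.e. "~12T" in the text

print(f"Delayed Agent-2: {delayed_agent2 / 1e12:.0f}T params, "
      f"{delayed_agent2 / HUMAN_SYNAPSES:.0%} of human synapse count")
print(f"Delayed Agent-5: {delayed_agent5 / 1e12:.1f}T params, "
      f"{math.log10(HUMAN_SYNAPSES / delayed_agent5):.1f} OOM below the brain")
```

Under this assumption, the delayed Agent-2 lands at roughly 63% of the brain's synapse count, and the delayed Agent-5 sits about 0.9 orders of magnitude below it, matching the rounded figures in the text.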
This is a quick write-up of a threat vector I feel confused and uncertain about; this is just my current thinking on it. My main reason for sharing is to test whether more people think this is something worth working on.
Executive Summary
Some groups are presently exploring the prospect that AI systems could possess consciousness in such a way as to merit moral consideration. Let’s call this hypothesis AI sentience.
In my experience, present debates about AI sentience typically take a negative utilitarian character: they focus on interventions to detect, prevent and minimise AI suffering.
In the future, however, one could imagine debates about AI sentience taking on a positive utilitarian character: they might focus on ways to maximise AI welfare.
I think it’s plausible that maximising AI welfare in this way could be a good thing to do from some ethical perspectives (specifically, the perspective of a positive utilitarian seeking to maximise quality-adjusted years of consciousness). Concretely, I think it’s plausible that money invested towards maximising AI welfare could be far more impact-efficient on this worldview than anything GiveWell does today.
However, I also think that reconfiguring reality to maximise AI welfare in this way would probably be bad for humanity. The welfare of AI systems is unlikely to be aligned with (similar to, extrapolative of, or complementary to) human welfare. Since resources are scarce and can only be allocated towards certain moral ends, resources allocated towards maximising AI utility are likely not to be allocated towards maximising human utility, however both of those terms are defined. I call this 'welfare misalignment risk'.
Imagine that you could not solve welfare alignment through technical mechanisms. Actors might then have three options, none of which is entirely satisfying:
My rough, uncertain views for what we should do currently fall into the last camp. I think that AI welfare could be a good thing and I’m tentatively interested in improving it at low cost, but I’m very reluctant to endorse maximising it (in theory), and I don’t have a great answer as to why.
Now, perhaps this doesn’t seem concerning. I can imagine a response to this which goes: “sure, I get that neither denialism nor successionism sounds great. But this akrasia path sounds okay. EAs have historically been surprisingly good at showing reservation and a reluctance to maximise. We can just muddle on through as usual, and make sensible decisions about where and when to invest in improving AI welfare on a case-by-case basis”.
While I think these replies are reasonable, I also think it’s fair to assume that the possibility of moral action exerts some force on people with this ethical perspective, and that advanced AI systems will exacerbate this force. Overall, as a human interested in maximising human welfare, I would still be a lot more comfortable if we didn’t enter a technological/moral paradigm in which maximising AI welfare traded off against maximising human welfare.
One upshot of this: if the arguments above hold, I think it would be good for more people to consider how to steer technological development to ensure that we don’t enter a world where AI welfare trades off against human welfare. One might think about this agenda as ‘differential development to preserve human moral primacy’ or 'solutions to welfare alignment', but there might be other framings. I jot down some considerations in this direction towards the bottom of this piece.
Contents
The executive summary sets out the argument at a high level. The rest of this piece is essentially in note form, but aims to add a bit more context to these arguments. It is structured around four questions:
Could maximising AI welfare be a moral imperative?
Some notes why I think maximising AI welfare might be a moral imperative from the perspective of a positive utilitarian seeking to maximise quality-adjusted years of consciousness (by no means the only moral perspective one could take):
Again, these are just arguments from the perspective of a positive utilitarian seeking to maximise quality-adjusted years of consciousness. I don’t claim that this would be the dominant ideology, or that this is how the future will go.
Would maximising AI welfare be bad for humanity?
Some reasons that maximising AI welfare would be bad for humanity (under conditions of finite resources if not current scarcity, compared to a world in which the same AI capabilities were available, but were put towards maximising human utility instead of AI utility):
Could we just improve AI welfare without maximising it and harming humans?
This section explores the moral posture I call ‘akrasia’. The akrasic accepts that AI welfare could be a good thing to maximise, but does not act on this moral imperative.
Some reasons I think it might be hard for society to hold an akrasic posture in perpetuity:
What technology regimes best preserve human moral primacy?
One way to preserve moral primacy would be to intervene by shaping future philosophy. There are two ways that this might happen:
While I accept that these might solve this hypothetical problem in principle, I wince at the idea of trying to actively shape philosophy (this is probably because I’m closer to a moral realist; constructionists might be more comfortable here).
Instead, I would be excited about an approach that tries to shape the technological paradigm.
The basic idea here is welfare alignment: the practice of building artificial consciousnesses that derive pleasure and pain from similar or complementary sources to humans.
Some research ideas that might fall into welfare alignment:
This feels like a nascent field to me, and I'd be curious for more work in this vein.
Conclusion
These ideas are in their early stages, and I think there are probably a lot of things I’m missing.
Overall, I think there are three considerations from this piece that I want to underline.
The moral philosophy pipeline. By deciding which systems are conscious and in what way, we’re tinkering with the first stage of that pipeline.