Averting suffering with sentience throttlers (proposal)

by Quinn · 2 min read · 5th Apr 2021 · 5 comments


Tags: Risks of Astronomical Suffering (S-risks), AI, World Optimization

Status: Originally written as the round two interview task for Center on Long-Term Risk's Summer Research Fellowship, but before it was finished I trashed it in favor of something under the Cooperative AI agenda. At this point it's a free idea, do whatever you want with it. Being an unfinished proposal, the quality bar of this post is not high.

Some history: I originally developed this idea as a plotline in a game I wrote in the summer of 2019, and only very recently considered making it into a rigorous research program. I find it odd that I haven't found a literature on this. I asked some people who would probably know if it existed, and they hadn't heard of it.

Written at CEEALAR

The problem termed “Suffering in AI Workers” (Gloor 2016) can be averted by sentience throttlers. Within scope for a summer fellowship is to write down a formalization of throttlers at a high level of abstraction, then follow it up with lower-level work about specific theories of sentience and specific AI architectures/paradigms.

What is the problem?

In the default outcome, astronomical amounts of subroutines will be spun up in pursuit of higher-level goals, whether those goals are aligned with the complexity of human value or aligned with paperclips. Without firm protections in place, these subroutines might experience some notion of suffering (Tomasik 2017).


One way to prevent suffering is to prevent sentience. If it is the case that sentience “emerges” from intelligence, then one obvious way forward is to suppress the “emergent” part while retaining competitive intelligence. This is all a throttler is: some technology that constrains systems in this way.

In the spirit of anticipating, legitimizing, and fulfilling governance demands (Critch 2020) (though substituting “governance” demands with “moral” demands), it makes sense to work on anticipating and legitimizing before fulfilling. This will first involve work at a level of abstraction that proposes the construction of a function that consumes a theory of sentience and something mind-like as input and produces a specification for a safe (defined as not tending toward sentience) but competitive implementation as output. As a computer scientist and not a neuroscientist, I’m interested in focusing on non-biological applications of throttlers, but I do expect to quantify over all minds at this first level of abstraction, and I can gesture vaguely in the direction of biotech and say that biological minds may be throttleable as well.
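To make the shape of that function concrete, here is a minimal sketch of its type signature. Every name in it (`Mind`, `TheoryOfSentience`, `Specification`, `Throttler`, `is_compliant`, and the toy stand-ins) is a hypothetical illustration rather than part of any existing formalization, and the idea that a theory yields a single numeric score is itself an assumption made for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class Mind(Protocol):
    """Something mind-like (hypothetical interface)."""
    name: str


class TheoryOfSentience(Protocol):
    """A theory that can score a mind's tendency toward sentience (assumption)."""
    name: str

    def sentience_score(self, mind: Mind) -> float: ...


@dataclass(frozen=True)
class Specification:
    """Constraints an implementation must satisfy to count as throttled."""
    theory: str
    mind: str
    max_sentience_score: float       # "safe": score must not exceed this
    min_relative_performance: float  # "competitive": fraction of unthrottled performance


# The "function" at the first level of abstraction:
# (theory of sentience, mind) -> specification.
Throttler = Callable[[TheoryOfSentience, Mind], Specification]


def is_compliant(spec: Specification, theory: TheoryOfSentience,
                 candidate: Mind, relative_performance: float) -> bool:
    """Check a candidate implementation against a specification."""
    return (theory.sentience_score(candidate) <= spec.max_sentience_score
            and relative_performance >= spec.min_relative_performance)


# Toy stand-ins, purely illustrative: "coupling" is a made-up quantity.
@dataclass(frozen=True)
class ToyMind:
    name: str
    coupling: float


@dataclass(frozen=True)
class ToyTheory:
    name: str = "toy-theory"

    def sentience_score(self, mind: ToyMind) -> float:
        return mind.coupling


def toy_throttler(theory: ToyTheory, mind: ToyMind) -> Specification:
    """A placeholder throttler with arbitrary thresholds."""
    return Specification(theory.name, mind.name,
                         max_sentience_score=0.1,
                         min_relative_performance=0.9)
```

The two fields on `Specification` mirror the two requirements in the text: safe (not tending toward sentience) and competitive. The real research question is what replaces the arbitrary thresholds and the toy score.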

When the substrate in question is AI technologies, imagine you’re in the role of an ethics officer at Alphabet, reasoning with engineering managers about abiding by the specification described by our function. Genovese 2018 claims that systems without emergent properties are feasible, but more difficult to implement than systems subject to emergent properties. This implies that even if throttled versions of the technologies performed acceptably in Alphabet’s products, it may not be reasonable to ask the company to invest the requisite extra engineering labor hours. Thus, it would be ideal to also create programming abstractions that make the satisfaction of these specifications as easy as possible, which adds a dimension to the research opportunity.

How exactly I would go about dumping a summer into this

The notion of throttlers is so undertheorized that I think the first level of abstraction, which quantifies over all theories of consciousness and all minds, will be a value-add. With this level on paper, people would have an anchor-point to criticize and build on. However, such a high abstraction level might be unsatisfying, because its practical consequences are so far away. Therefore, I would like to provide specific input-output pairs for this “function”: for example, Integrated Information Theory (IIT) and Deep Learning (DL) as inputs, and as output the appropriate specification for, e.g., a way to reduce integrated information in a neural network, perhaps applying some of the ideas in Aguilera & Di Paolo 2017.
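As a very rough illustration of what "reducing integration" could mean computationally, the sketch below measures the mutual information between two parts of a system from their joint distribution. This is emphatically not IIT's Φ, which is defined over cause-effect structure and a search over partitions; mutual information across one fixed bipartition is only an assumed stand-in, and the function name and the two example distributions are made up for illustration.

```python
import math


def mutual_information(joint):
    """Mutual information in bits between parts A and B.

    `joint[a][b]` is P(A=a, B=b). A crude proxy for integration
    across one bipartition, NOT IIT's phi.
    """
    p_a = [sum(row) for row in joint]           # marginal of A
    p_b = [sum(col) for col in zip(*joint)]     # marginal of B
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (p_a[a] * p_b[b]))
    return mi


# Two binary parts whose states always match: maximally coupled.
coupled = [[0.5, 0.0],
           [0.0, 0.5]]

# The same marginals, but the parts are statistically independent.
decoupled = [[0.25, 0.25],
             [0.25, 0.25]]

print(mutual_information(coupled))    # 1.0
print(mutual_information(decoupled))  # 0.0
```

Under this toy reading, a specification would bound a quantity like this from above while leaving task performance intact. Whether any such proxy tracks what IIT actually cares about is exactly the kind of question the lower-level work would need to settle.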

IIT stands out because it is the most straightforward to operationalize computationally, but perhaps a better project would be to increase the ability to reason computationally about eliminativism (Tomasik 2014) or illusionism, for the purpose of specifying throttlers assuming either of those theories is true. As for the substrate, beyond thinking about deep neural nets altogether, we could think about RL-like, transformer-like, or NARS-like (Wang unknown year) architectures. The latter in particular is important because NARS has been used to coordinate and unify other ML technologies as subroutines (Hammer et al. 2020), in exactly the way that the above moral threat model suggests. While it may not be tractable to properly output an actionable specification within a summer, lots of concrete work could be done.



If I'm skimming right, you're saying that you'd like to know how to write code that calculates how much consciousness a calculation has. Correct? I agree that such a thing would be nice. But it's at least as hard as (and probably much harder than) solving the problem of consciousness. Also, if illusionism is correct, I'm not sure that writing such code is even possible.

FWIW, I haven't deeply studied it, but I'm currently very much inclined to believe that IIT is just a bunch of baloney, see this blog post

In the default outcome, astronomical amounts of subroutines will be spun up in pursuit of higher-level goals, whether those goals are aligned with the complexity of human value or aligned with paperclips. Without firm protections in place, these subroutines might experience some notion of suffering

Surely, a human-goal-aligned ASI wouldn't want to make suffering subroutines.

For paperclip maximizers, there are two options: either suffering-based algorithms are the most effective way of achieving important real-world tasks, or they aren't. In the latter case, no problem, the paperclip maximizer won't use them. (Well, you still have a big problem, namely the paperclip maximizer.)

In the former case, you would need to design a system that intrinsically wanted not to make suffering subroutines, and whose goal was stable under self-improvement. The level of competence and understanding needed to do this is higher than the level needed to realize you are making a paperclip maximizer and not turn it on.

I wonder how it would be received if we applied the same reasoning to humans and animals.

Humans might (and in fact do) undergo a lot of suffering. If we could identify people who are likely to experience a great deal of suffering, should we apply sentience throttling so that they don't feel it? Seems very Brave New World.

How about with animals? If we could somehow breed chickens that are identical to current ones except that they don't feel pain or suffering, would that make factory farming ethical? Here the answer might be yes, though I'm not sure the animal rights crowd would agree.

Note that you can have pain without suffering and vice versa. 

The OP was talking about suffering but it wasn't clear to me whether pain was included or not.