FAI and the Information Theory of Pleasure

johnsonmx

Previously, I talked about the mystery of pain and pleasure, and how little we know about what sorts of arrangements of particles intrinsically produce them.

Up now: should FAI researchers care about this topic? Is research into the information theory of pain and pleasure relevant for FAI? I believe so! Here are the top reasons I came up with while thinking about this topic.

An important caveat: much depends on whether pain and pleasure (collectively, 'valence') are simple or complex properties of conscious systems. If they’re on the complex end of the spectrum, many points on this list may not be terribly relevant for the foreseeable future. On the other hand, if they have a relatively small “Kolmogorov complexity” (e.g., if a ‘hashing function’ to derive valence could fit on a t-shirt), crisp knowledge of valence may be possible sooner rather than later, and could have some immediate relevance to current FAI research directions.
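To make "small Kolmogorov complexity" a little more concrete: the Kolmogorov complexity of a string is the length of the shortest program that outputs it. It is uncomputable in general, but the size of a compressed encoding gives a crude upper bound (up to the fixed cost of the decompressor). The sketch below is only an illustration of that idea, using zlib as a stand-in compressor; the function name `compressed_size` and the two sample inputs are assumptions made for this example and say nothing about valence itself.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Bytes needed by zlib at its highest compression level.

    This is only an upper-bound proxy for Kolmogorov complexity,
    which cannot be computed exactly.
    """
    return len(zlib.compress(data, 9))


# 10,000 bytes with an obvious short description ("repeat 'ab' 5000 times").
regular = b"ab" * 5000

# 10,000 pseudo-random bytes (fixed seed so the example is repeatable).
# zlib finds no short description for these, even though the seeded
# generator is itself a short program: the proxy is loose by design.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(10_000))

print("regular:", compressed_size(regular), "bytes")
print("noisy:  ", compressed_size(noisy), "bytes")
```

The regular input compresses to a tiny fraction of its length while the noisy one barely compresses at all. A "t-shirt sized" definition of valence would be a claim that valence sits much closer to the first case than the second.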

Additional caveats: it’s important to note that none of these ideas are grand, sweeping panaceas, or are intended to address deep metaphysical questions, or aim to reinvent the wheel- instead, they’re intended to help resolve empirical ambiguities and modestly enlarge the current FAI toolbox.

1. Valence research could simplify the Value Problem and the Value Loading Problem. If pleasure/happiness is an important core part of what humanity values, or should value, having the exact information-theoretic definition of it on-hand could directly and drastically simplify the problems of what to maximize, and how to load this value into an AGI.

2. Valence research could form the basis for a well-defined ‘sanity check’ on AGI behavior. Even if pleasure isn’t a core terminal value for humans, it could still be used as a useful indirect heuristic for detecting value destruction. I.e., if we’re considering having an AGI carry out some intervention, we could ask it what the expected effect is on whatever pattern precisely corresponds to pleasure/happiness. If there’d be a lot less of that pattern, the intervention is probably a bad idea.

3. Valence research could help us be humane to AGIs and WBEs. There’s going to be a lot of experimentation involving intelligent systems, and although many of these systems won’t be “sentient” in the way humans are, some system types will approach or even surpass human capacity for suffering. Unfortunately, many of these early systems won’t work well— i.e., they’ll be insane. It would be great if we had a good way to detect profound suffering in such cases and halt the system.

4. Valence research could help us prevent Mind Crimes. Nick Bostrom suggests in Superintelligence that AGIs might simulate virtual humans to reverse-engineer human preferences, but that these virtual humans might be sufficiently high-fidelity that they themselves could meaningfully suffer. We can tell AGIs not to do this- but knowing the exact information-theoretic pattern of suffering would make it easier to specify what not to do.

5. Valence research could enable radical forms of cognitive enhancement. Nick Bostrom has argued that there are hard limits on traditional pharmaceutical cognitive enhancement, since if the presence of some simple chemical would help us think better, our brains would probably already be producing it. On the other hand, there seem to be fewer a priori limits on motivational or emotional enhancement. And sure enough, the most effective “cognitive enhancers” such as adderall, modafinil, and so on seem to work by making cognitive tasks seem less unpleasant or more interesting. If we had a crisp theory of valence, this might enable particularly powerful versions of these sorts of drugs.

6. Valence research could help align an AGI’s nominal utility function with visceral happiness. There seems to be a lot of confusion with regard to happiness and utility functions. In short: they are different things! Utility functions are goal abstractions, generally realized either explicitly through high-level state variables or implicitly through dynamic principles. Happiness, on the other hand, seems like an emergent, systemic property of conscious states, and like other qualia but unlike utility functions, it’s probably highly dependent upon low-level architectural and implementational details and dynamics. In practice, most people most of the time can be said to have rough utility functions which are often consistent with increasing happiness, but this is an awfully leaky abstraction.

My point is that constructing an AGI whose utility function is to make paperclips, and constructing a sentient AGI who is viscerally happy when it makes paperclips, are very different tasks. Moreover, I think there could be value in being able to align these two factors— to make an AGI which is viscerally happy to the exact extent it’s maximizing its nominal utility function.

(Why would we want to do this in the first place? There is the obvious semi-facetious-but-not-completely-trivial answer— that if an AGI turns me into paperclips, I at least want it to be happy while doing so—but I think there’s real potential for safety research here also.)

7. Valence research could help us construct makeshift utility functions for WBEs and Neuromorphic AGIs. How do we make WBEs or Neuromorphic AGIs do what we want? One approach would be to piggyback off of what they already partially and imperfectly optimize for, and build a makeshift utility function out of pleasure. Trying to shoehorn a utility function onto any evolved, emergent system is going to involve terrible imperfections, uncertainties, and dangers, but if research trends make neuromorphic AGI likely to occur before other options, it may be a case of “something is probably better than nothing.”

One particular application: constructing a “cryptographic reward token” control scheme for WBEs/neuromorphic AGIs. Carl Shulman has suggested we could incentivize an AGI to do what we want by giving it a steady trickle of cryptographic reward tokens that fulfill its utility function- it knows if it misbehaves (e.g., if it kills all humans), it’ll stop getting these tokens. But if we want to construct reward tokens for types of AGIs that don’t intrinsically have crisp utility functions (such as WBEs or neuromorphic AGIs), we’ll have to understand, on a deep mathematical level, what they do optimize for, which will at least partially involve pleasure.

8. Valence research could help us better understand, and perhaps prevent, AGI wireheading. How can AGI researchers prevent their AGIs from wireheading (direct manipulation of their utility functions)? I don’t have a clear answer, and it seems like a complex problem which will require complex, architecture-dependent solutions, but understanding the universe’s algorithm for pleasure might help clarify what kind of problem it is, and how evolution has addressed it in humans.

9. Valence research could help reduce general metaphysical confusion. We’re going to be facing some very weird questions about philosophy of mind and metaphysics when building AGIs, and everybody seems to have their own pet assumptions on how things work. The better we can clear up the fog which surrounds some of these topics, the lower our coordinational friction will be when we have to directly address them.

Successfully reverse-engineering a subset of qualia (valence- perhaps the easiest type to reverse-engineer?) would be a great step in this direction.

10. Valence research could change the social and political landscape AGI research occurs in. This could take many forms: at best, a breakthrough could lead to a happier society where many previously nihilistic individuals suddenly have “skin in the game” with respect to existential risk. At worst, it could be a profound information hazard, and irresponsible disclosure or misuse of such research could lead to mass wireheading, mass emotional manipulation, and totalitarianism. Either way, it would be an important topic to keep abreast of.

These are not all independent issues, and not all are of equal importance. But, taken together, they do seem to imply that reverse-engineering valence will be decently relevant to FAI research, particularly with regard to the Value Problem, reducing metaphysical confusion, and perhaps making the hardest safety cases (e.g., neuromorphic AGIs) a little bit more tractable.
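As a footnote to point 7, here is one way the "steady trickle of cryptographic reward tokens" could be made hard for the recipient to duplicate. The post does not specify a mechanism, so everything in this sketch is an assumption: it uses a one-way hash chain, and the names `make_chain` and `TokenVerifier` are hypothetical. The operator keeps a secret seed, publishes only the end of the chain, and releases earlier links one at a time. Anyone can check a token by hashing it, but producing the next token early would require inverting SHA-256.

```python
import hashlib
import hmac
import secrets
from typing import List, Optional, Tuple


def _h(data: bytes) -> bytes:
    """One step of the hash chain."""
    return hashlib.sha256(data).digest()


def make_chain(n: int, seed: Optional[bytes] = None) -> Tuple[bytes, List[bytes]]:
    """Operator side: build n tokens from a secret seed.

    Returns (anchor, tokens). The anchor is public; tokens are listed in
    the order they should be released, and the seed is the final token.
    """
    if seed is None:
        seed = secrets.token_bytes(32)
    chain = [seed]
    for _ in range(n):
        chain.append(_h(chain[-1]))
    anchor = chain[-1]        # h applied n times to the seed
    tokens = chain[-2::-1]    # earlier links, latest-computed first
    return anchor, tokens


class TokenVerifier:
    """Recipient side: knows only the public anchor."""

    def __init__(self, anchor: bytes) -> None:
        self.last = anchor
        self.count = 0

    def accept(self, token: bytes) -> bool:
        """Accept a token only if it hashes to the last accepted value."""
        if hmac.compare_digest(_h(token), self.last):
            self.last = token
            self.count += 1
            return True
        return False


anchor, tokens = make_chain(3)
verifier = TokenVerifier(anchor)
for t in tokens:
    print(verifier.accept(t))
```

This only covers the narrow question of forgery, and only under the usual assumptions (the seed stays secret, the hash is preimage-resistant). Whether such tokens would actually motivate a WBE or neuromorphic AGI is the part that depends on understanding what those systems optimize for.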

"wireheading ... how evolution has addressed it in humans"

It hasn't - that's why people do drugs (including alcohol). What is stopping all humans from wireheading is that all currently available methods work only short term and have negative side effects. The ancestral environment didn't allow for the human kind to self-destruct by wireheading. Maybe peer pressure to not do drugs exists but there is also peer pressure in the other direction.

What is stopping all humans from wireheading is that all currently available methods work only short term and have negative side effects.

Maybe that's how evolution addressed it.

How are humans supposed to generate cryptographic reward tokens in such a way that an AI could not duplicate the process?

It would probably be highly dependent on the AI's architecture. The basic idea comes from Shulman and Bostrom - Superintelligence, chapter 9, in the "Incentive methods" section (loc 3131 of 8770 on kindle).

My understanding is that such a strategy could help as part of a comprehensive strategy of limitations and incentivization but wouldn't be viable on its own.

Where is the cultural context in all of this? How does that play in? Pain and pleasure here in the West are different than in the East, just as value systems are different. When it comes to creating AGI I think a central set of agreed-upon tenets is important. What is valuable? How can we quantify that in a way that makes sense to create AGI? If we want to reward it for doing good things, we have to consider cultural validation. We don't steal, murder or assault people because we have significant cultural incentive not to do so, especially if you live in a stable country. I think that could help. If we can somehow show group approval of the AGI, like favorable opinions, verbal validation and other things that it intrinsically values as we do, we could use our own culture to reinforce norms within its architecture.

A rigorous theory of valence wouldn't involve cultural context, much as a rigorous theory of electromagnetism doesn't involve cultural context.

Cultural context may matter a great deal in terms of how to build a friendly AGI that preserves what's valuable about human civilization-- or this may mostly boil down to the axioms that 'pleasure is good' and 'suffering is bad'. I'm officially agnostic on whether value is simple or complex in this way.

One framework for dealing with the stuff you mention is Coherent Extrapolated Volition (CEV)- it's not the last word on anything but it seems like a good intuition pump.

And I guess I'm saying that the sooner we think about these sorts of things the better off we'll be. Going for pleasure good/suffering bad reduces the mindset of the AI to that of a two-year-old. Cultural context gives us a sense of maturity, valence or no.

In what way is current research into qualia not valency research?

In what way would the valency research we don't yet have prove more successful?

Are you referring to any specific "current research into qualia", or just the idea of qualia research in general? I definitely agree that valence research is a subset of qualia research- but there's not a whole lot of either going on at this point, or at least not much that has produced anything quantitative/predictive.

I suspect valence is actually a really great path to approach more 'general' qualia research, since valence could be a fairly simple property of conscious systems. If we can reverse-engineer one type of qualia (valence), it'll help us reverse other types.

There's a lot of philosophical research, and very little scientific research. That confirms philosophers' impression that qualia are a Hard Problem.

How do you reverse engineer a quale? How do you tell you have succeeded? I think that you have underestimated the hardness of the problem.

I do have some detailed thoughts on your two questions-- in short, given certain substantial tweaks, I think IIT (or variants by Tegmark/Griffiths) can probably be salvaged from its (many) problems in order to provide a crisp dataset on which to base testable hypotheses about qualia.

(If you're around the Bay Area I'd be happy to chat about this over a cup of coffee or something.)

I would emphasize, though, that this post only talks about the value results in this space would have for FAI, and tries to be as agnostic as possible on how any reverse-engineering may happen.

I'm still not seeing how IIT would help with confirming that an attempt at reverse engineering had succeeded, absent circular reasoning along the lines of "IIT says the system will have qualia, therefore the system will have qualia".

Testing hypotheses derived from or inspired by IIT will probably be on a case-by-case basis. But given some of the empirical work on coma patients IIT has made possible, I think it may be stretching things to critique IIT as wholly reliant on circular reasoning.

That said, yes there are deep methodological challenges with qualia that any approach will need to overcome. I do see your objection quite clearly- I'm confident that I address this in my research (as any meaningful research on this must do) but I don't expect you to take my word for it. The position that I'm defending here is simply that progress in valence research will have relevance to FAI research.

Out of curiosity, do you think valence has a large or small Kolmogorov complexity?

Out of curiosity, do you think valence has a large or small Kolmogorov complexity?

I think it's smallish, and that's philosophy, because I don't have a qualiometer.

But given some of the empirical work on coma patients IIT has made possible...

refs?

The stuff by Casali is pretty topical, e.g. his 2013 paper with Tononi.

You mean this?

But that isn't really saying anything about qualia. The authors can relate their PCI measure to consciousness as judged medically... in humans. But would that scale be applicable to very simple systems or artificial systems? There is a real possibility that qualia could go missing in computational simulations, even assuming strict physicalism. In fact, we standardly assume that AIs embedded in games don't suffer.

If you're looking for a Full, Complete Data-Driven And Validated Solution to the Qualia Problem, I fear we'll have to wait a long, long time. This seems squarely in the 'AI complete' realm of difficulty.

But if you're looking for clever ways of chipping away at the problem, then yes, Casali's Perturbational Complexity Index should be interesting. It doesn't directly say anything about qualia, but it does indirectly support Tononi's approach, which says much about qualia. (Of course, we don't yet know how to interpret most of what it says, nor can we validate IIT directly yet, but I'd just note that this is such a hard, multi-part problem that any interesting/predictive results are valuable, and will make the other parts of the problem easier down the line.)

and will make the other parts of the problem easier down the line

That's what I am disputing. You are taking a problem we don't know how to make a start on, and turning it into a smaller problem we also don't know how to make a start on. That isn't an advance. Reducing or simplifying a problem isn't an unconditional, universal solvent; it only works where the simpler problem is one you can actually make progress on.

IIT isn't going to be of any real use unless it is confirmed, and how are you going to confirm it, as a theory of qualia, without qualiometers?

If we are going to continue not having qualiometers, we may have to give up on testing consciousness objectively in favour of subjective measures: phenomenology and heterophenomenology. But you can only do heterophenomenology on a system that can report its subjective states. Starting with simpler systems, like a single simulated pain receptor, is not going to work.

We're not on the same page. Let's try this again.

The assertion I originally put forth is about AI safety; it is not about reverse-engineering qualia. I'm willing to briefly discuss some intuitions on how one may make meaningful progress on reverse-engineering qualia as a courtesy to you, my anonymous conversation partner here, but since this isn't what I originally posted about I don't have a lot of time to address radical skepticism, especially when it seems like you want to argue against some strawman version of IIT.

You ask for references (in a somewhat rude monosyllabic manner) on "some of the empirical work on coma patients IIT has made possible" and I give you exactly that. You then dismiss it as "not really qualia research", which is fine. But I'm really not sure how you can think that this is completely irrelevant to supporting or refuting IIT: IIT made a prediction, Casali et al. tested the prediction, the prediction seemed to hold up. No qualiometer needed. (Granted, this would be a lot easier if we did have them.)

This apparently leads you to say,

You are taking a problem we don't know how to make a start on, and turning it into a smaller problem we also don't know how to make a start on.

More precisely, I'm taking a problem you don't know how to make a start on, and am turning it into a smaller problem that you also don't seem to know how to make a start on. Which is fine, and I don't wish to be a jerk about it, and not merely because Tononi/Tegmark/Griffith could be wrong in how they're approaching consciousness, and I could be wrong in how I'm adapting their stuff to try to explain some specific things about qualia. But you seem to just want to give up, to put this topic beyond the reach of science, and criticize anyone trying to find clever indirect approaches. Needless to say I vehemently disagree with the productiveness of that attitude.

I think we are in agreement that valence could be a fairly simple property. I also agree that the brain is Vastly Complex, and that qualia research has some excruciatingly difficult methodological hurdles to overcome, and I agree that IIT is still a very speculative hypothesis which shouldn't be taken on faith. I think we differ radically on our understandings of IIT and related research. I guess it'll be an empirical question whether IIT morphs into something that can substantially address questions of qualia- based on my understandings and intuitions, I'm pretty optimistic about this.

The assertion I originally put forth is about AI safety;

I actually agree with the aim of using some basic, "visceral" drive for AI safety. I have argued that making an AI's top-level drive the same as its ostensible purpose, paperclipping or whatever, is a potential disaster, because any kind of cease and desist command has to be a "non maskable interrupt" that overrides everything else.

But if all you are doing is trying to constrain an AI's behaviour, you have the opportunity to use methodological behaviourism, because you are basically trying to get a certain kind of response to a certain kind of input; you can sidestep the Hard Problem.

But that isn't anything very new. The functional/behavioural equivalents of pleasure and pain are positive and negative reinforcement, which machine learning systems have already. (That's somewhat new to MIRIland, because MIRI tends not to take much notice of that large and important class of AIs, but otherwise it isn't new.)

You list a number of useful things one could do with an understanding of pain and pleasure as qualia. The hypotheticals are true enough, because there are a lot of things one could do with an understanding of qualia. But valency isn't really a simplification of the Hard Problem; it just appears to be one. In other words, if you are aiming at AI control, then bringing in qualia just makes things considerably more difficult for yourself.

But I'm really not sure how you can think that this is completely irrelevant to supporting or refuting IIT: IIT made a prediction, Casali et al. tested the prediction, the prediction seemed to hold up. No qualiometer needed.

It made a prediction about what it measures, which is a scale from more consciousness to less consciousness. That isn't particularly relevant to understanding how qualia are implemented. It's not clear that an artificial system implemented to have high consciousness according to IIT would have qualia at all. But, while IIT isn't clearly relevant to qualia, qualia aren't clearly relevant to AI control.

But you seem to just want to give up, to put this topic beyond the reach of science, and criticize anyone trying to find clever indirect approaches.

You don't have data about my overall approach.

What I'm doing is noting that, historically, the problem remains unsolved, and that, historically, people who think there is some relatively easy answer have misunderstood the question, or are engaging in circular reasoning about their favourite theory, or are running off a subjective feeling of optimism...

I guess it'll be an empirical question whether IIT morphs into something that can substantially address questions of qualia- based on my understandings and intuitions, I'm pretty optimistic about this.