Does this algorithm, when run, instantiate a subjective experience with the same moral relevance as the subjective experience that happens when mu opioids are released in biological brains?

Python:  

'''
L1 = [1, 1, 0.5, 1, 1, 1]  # prior beliefs: alternating expectations (even indexes) and confidences (odd indexes)
last = 1   # the program's previous output
last2 = 0  # the output before that
while True:
    L2 = [L1[0], L1[2], L1[4]]  # expected rewards (wanting?)

    action = 1  # this block chooses the decision
    if L2[last] > L2[1 + last] or L2[0] > L2[last + 1]:
        action = 0
    if L2[2] > L2[0]:
        action = 1
    print(action)

    reward = int(input())
    last2 = last
    C = 1 + 2 * (action + last2)  # index of the confidence for this (previous output, current output) pair
    last = action
    L1[C] = L1[C] + 1                  # updating confidence
    A = reward - L1[C - 1]             # pleasure/suffering (liking?): reward minus expectation
    L1[C - 1] = L1[C - 1] + A / L1[C]  # updating expectation with A, which is obvious to the algorithm
'''

Why I think this algorithm experiences happiness and/or suffering when run:  

The variable A, computed as the reward minus the current expectation, is obvious to the algorithm in the same way pleasure and suffering are obvious to us, because it is a variable the algorithm references directly. Just like liking, it updates the algorithm's expectations. If the algorithm runs for a long time, A has negligible influence on wanting, because the update A/L1[C] shrinks as the confidence L1[C] grows. This is like experiments in rats where researchers activated the liking circuits in rat brains without making the rats want anything.[1] The list L2, built from the even indexes of L1, holds expected rewards, analogous to wanting. Forcing its first entry (L1[0]) to be 1000 each turn makes the algorithm incessantly output zero, even when it gets negative reward. This is like activating the wanting circuit in rats without activating their liking circuit. In rats, wanting and liking are separate. In this algorithm, the conjectured wanting and liking are separate.
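
Here is a minimal sketch of that manipulation, for anyone who wants to see it run. The loop is the one above, except that (as my own additions for illustration) L1[0] is overwritten with 1000 at the top of every turn, the reward is a hard-coded -1 instead of a number typed at the keyboard, and the run stops after ten steps.

'''
# Wanting without liking: force the "expected reward" entry L1[0] high every turn
# and punish the agent with -1 forever. It keeps outputting 0 anyway.
L1 = [1, 1, 0.5, 1, 1, 1]
last, last2 = 1, 0
for step in range(10):
    L1[0] = 1000  # forced "wanting"
    L2 = [L1[0], L1[2], L1[4]]

    action = 1
    if L2[last] > L2[1 + last] or L2[0] > L2[last + 1]:
        action = 0
    if L2[2] > L2[0]:
        action = 1

    reward = -1  # the environment always punishes
    last2 = last
    C = 1 + 2 * (action + last2)
    last = action
    L1[C] += 1
    A = reward - L1[C - 1]            # "liking" is strongly negative every turn
    L1[C - 1] += A / L1[C]
    print(step, action, A)            # the action stays 0 despite the negative A
'''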

The algorithm also has a non-contrived hedonic treadmill. If A stays positive over multiple rounds, the expectation rises, so A becomes smaller for the same reward; if A is negative, the expectation falls and A becomes less negative for the same reward. Simply removing the updates that cause the hedonic treadmill makes the 'liking variable' knowably not correspond to liking. If the same is true for humans, getting rid of the hedonic treadmill while keeping pleasure will be extremely difficult. Reinforcement learning is a major motif in the evolution of animals, and wanting and liking are part of reinforcement learning in animals and in this algorithm.
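
To see the treadmill directly, here is a sketch where the environment hands back a constant reward of 2 every turn (the constant and the eight-step cap are my choices for illustration, not part of the original program). A starts at 1 and shrinks toward 0 even though the reward never changes.

'''
# Hedonic treadmill: same reward every turn, yet A (the "liking" variable)
# decays toward 0 as the expectation catches up with the reward.
L1 = [1, 1, 0.5, 1, 1, 1]
last, last2 = 1, 0
for step in range(8):
    L2 = [L1[0], L1[2], L1[4]]

    action = 1
    if L2[last] > L2[1 + last] or L2[0] > L2[last + 1]:
        action = 0
    if L2[2] > L2[0]:
        action = 1

    reward = 2  # constant reward
    last2 = last
    C = 1 + 2 * (action + last2)
    last = action
    L1[C] += 1
    A = reward - L1[C - 1]
    L1[C - 1] += A / L1[C]
    print(step, action, round(A, 3))  # A: 1.0, 0.5, 0.333, 0.25, ...
'''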

The algorithm steers its actions to get high rewards. Here are sample interactions between the algorithm and some simple environments it models well. In each run the algorithm's outputs are flush left and the rewards I typed in response are indented.

Reward for 1:  

1

      1

1

      1

… the last 2 lines repeat forever  

Reward for 0:  

1  

      0  

0  

      1  

… the last 2 lines repeat forever.  

Reward for switching:  

1  

      0  

0  

      1  

0  

      0  

1  

      1  

0  

      1  

… the last 4 lines repeat forever  

These runs show that the algorithm tries to avoid suffering and get reward.
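
For anyone who wants to reproduce these transcripts without typing rewards by hand, here is a sketch of a harness. The Agent class is just the loop above refactored so an environment can drive it; the three environment rules, and the choice to treat the "previous action" in the switching environment as 1 before the first step, are my reading of the runs rather than anything stated explicitly above.

'''
class Agent:
    """The loop from the post, split into an act step and a learn step."""
    def __init__(self):
        self.L1 = [1, 1, 0.5, 1, 1, 1]
        self.last = 1
        self.last2 = 0

    def act(self):
        L1 = self.L1
        L2 = [L1[0], L1[2], L1[4]]
        action = 1
        if L2[self.last] > L2[1 + self.last] or L2[0] > L2[self.last + 1]:
            action = 0
        if L2[2] > L2[0]:
            action = 1
        return action

    def learn(self, action, reward):
        L1 = self.L1
        self.last2 = self.last
        C = 1 + 2 * (action + self.last2)
        self.last = action
        L1[C] += 1
        A = reward - L1[C - 1]
        L1[C - 1] += A / L1[C]

environments = {
    "Reward for 1": lambda action, prev: 1 if action == 1 else 0,
    "Reward for 0": lambda action, prev: 1 if action == 0 else 0,
    "Reward for switching": lambda action, prev: 1 if action != prev else 0,
}

for name, env in environments.items():
    agent = Agent()
    prev = 1  # assumed "previous action" before the first step
    print(name)
    for _ in range(8):
        action = agent.act()
        reward = env(action, prev)
        agent.learn(action, reward)
        print(" ", action, "->", reward)
        prev = action
'''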

Before commenting, please make sure you understand How An Algorithm Feels From Inside. If you see any fake explanations in my explanation, please tell me in the comments.

Open problem: what is the True Name of suffering? 

2 Answers

Why are you calling various parts of the program "wanting", "liking", "reward", "confidence", etc.? For those attributes to be really there, they would have to be discoverable from the code alone, even if all the comments were omitted and all the variable names replaced by arbitrary ones like G36287.

As this algorithm executes, the last and last2 variables become the program's last 2 outputs. L1's even indexes become the average input (reward?) given the number of ones the program outputted the last 2 times. I called L1's odd indexes 'confidence' because, as they get higher, the corresponding average reward changes less based on evidence. When L1 becomes entangled with the input generation process, the algorithm chooses which outputs make the inputs higher on average. That is why I called the input 'reward'. L2 reads off the average reward given the last…
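
A tiny illustration of that index layout (the loop below is my addition, not part of the program): only the count of ones among the last two outputs matters, because (last2=0, action=1) and (last2=1, action=0) land on the same pair of slots.

'''
for last2 in (0, 1):
    for action in (0, 1):
        C = 1 + 2 * (action + last2)  # confidence slot; C - 1 is the matching expectation slot
        print(f"last2={last2}, action={action} -> confidence L1[{C}], expectation L1[{C - 1}]")

# last2=0, action=0 -> confidence L1[1], expectation L1[0]
# last2=0, action=1 -> confidence L1[3], expectation L1[2]
# last2=1, action=0 -> confidence L1[3], expectation L1[2]
# last2=1, action=1 -> confidence L1[5], expectation L1[4]
'''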

Richard_Kennaway (8mo):
In effect, you're saying that all reinforcement learners experience pleasure and suffering. But how do these algorithms "feel from the inside"? What does it mean for the variable A to be "obvious to the algorithm"? We know how we feel, but how do you determine whether there is anything it is like to be that program? Are railway lines screaming in pain when the wheel flanges rub against them? Does ChatGPT feel sorrow when it apologises on being told that its output was bad? I see no reason to attribute emotional states to any of these things.
elbow921 (8mo):
By 'obvious to the algorithm' I mean that, to the algorithm, A is referenced with no intermediate computation. This is how pleasure and pain feel to me. I do not believe all reinforcement learning algorithms feel pleasure/pain. A simple example that does not suffer is the Simpleton iterated prisoner's dilemma strategy (sketched below this comment). I believe pain and pleasure are effective ways to implement reinforcement learning. In animals, reinforcement learning is called operant conditioning; see Reinforcement learning on a chicken for a chicken that has experienced it. I do not know of any algorithm to determine whether there is anything it is like to be a given program. I suspected this program experienced pleasure/pain because of its parallels to the neuroscience of pleasure and pain.
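
For concreteness, here is a sketch of that rule, taking "Simpleton" to mean the win-stay, lose-shift (Pavlov) strategy for the iterated prisoner's dilemma: repeat your last move after a good payoff, switch after a bad one. There is no running expectation and no prediction-error variable, so nothing in it plays the role A plays above.

'''
def simpleton(my_last_move, payoff_was_good):
    """Simpleton, read here as win-stay, lose-shift (Pavlov).
    Returns the next move, 'C' (cooperate) or 'D' (defect)."""
    if payoff_was_good:
        return my_last_move                       # win-stay
    return "D" if my_last_move == "C" else "C"    # lose-shift

print(simpleton("C", payoff_was_good=False))  # cooperated, got a bad payoff -> switches to D
'''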
Signer (8mo):
The description of our feelings is not fundamentally different from the description of any reinforcement learner. They both describe the same thing, physical reality, just with different language and precision. The reason is that they are abstractly analogous to emotional states in humans, just as an emotional state in one human may be abstractly analogous to an emotional state in another human.
Richard_Kennaway (8mo):
I cannot see "abstractly analogous" as sufficient grounds. Get abstract enough and everything is "abstractly analogous" to everything.

Can you rephrase the question while tabooing 'pleasure' and 'suffering'? What are you, really, asking here?

This is what I am wondering: Does this algorithm, when run, instantiate a subjective experience with the same moral relevance as the subjective experience that happens when mu opioids are released in biological brains?