Epistemic Status: Simultaneously this is work that took me a long time and a lot of thought, and also a playful and highly speculative investigation. Consider taking this seriously but not literally.
Take a simple agent (GitHub; Python), with no capacity for learning, that exists on a 2D plane. It shares the plane with other agents and objects, to be described shortly.
The agent intrinsically doesn't want anything. But it can be assigned goal-like objects, which one might view as subagents. Each individual goal-like subagent can possess a simple preference, such as a desire to reach a certain region of space, or a desire to avoid a certain point.
The goal-like subagents can also vary in the degree to which they remain satisfied. Some might be permanently satisfied after achieving their goal once; some might quickly become unsatisfied again after a few timesteps.
Every timestep, the agent considers ten random movements of unit-distance, and executes the movement corresponding to the highest expected valence being reported by its goal-like subagents, in a winner-take-all fashion.
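The decision rule can be sketched in a few lines. This is a minimal illustration, not the code from the linked repository: the `Subagent` class, the `weight / (1 + distance)` falloff, and the choice to let the largest-magnitude report win are all assumptions made here for the sake of a runnable example.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Subagent:
    """Hypothetical goal-like subagent: attracted to (weight > 0)
    or repelled by (weight < 0) a fixed point on the plane."""
    target: tuple
    weight: float

    def expected_valence(self, pos):
        # Assumed falloff: the report weakens with distance to the target.
        return self.weight / (1.0 + math.dist(pos, self.target))


def candidate_moves(pos, n=10, rng=random):
    """Return n positions exactly one unit from pos, in random directions."""
    moves = []
    for _ in range(n):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        moves.append((pos[0] + math.cos(angle), pos[1] + math.sin(angle)))
    return moves


def winning_report(pos, subagents):
    """Winner-take-all: keep only the single strongest report (by magnitude),
    so reports from different subagents never add up."""
    return max((s.expected_valence(pos) for s in subagents), key=abs)


def step(pos, subagents, rng=random):
    """Execute the candidate move whose winning report is highest."""
    return max(candidate_moves(pos, rng=rng),
               key=lambda p: winning_report(p, subagents))
```

Calling `step` repeatedly with a single attractor walks the agent toward it. Under these assumptions, a strong aversion makes nearby candidates score very negatively, so they lose to any candidate that leads elsewhere.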
Even with such an intentionally simplistic model, a surprising and illuminating level of behavioral complexity can arise.
Sections 1-8 concern interesting or amusing behaviors exhibited by the model.
Sections 8-12 outline future directions for the model and ruminations on human behavior.
1. Baseline
In this image, the path of the agent is painted with points, the color of the points changing slowly with the passage of time. This agent possesses three subagents with preferences for reaching the three green circles, and a fourth mild preference for avoiding the red circle.
Once it comes within a set distance of one of the green circles, the corresponding subagent is satisfied, and thus the movement with the highest expected valence switches to the next-highest valence goal. The satisfaction gradually wears off, and the agent begins to be drawn to the goal again. Thus, the agent moves inexorably around the triangle of green circles, sometimes in a circuit, sometimes backtracking.
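One way to picture satisfaction and its gradual wearing-off is as a number attached to each goal. The sketch below is an assumption-laden illustration rather than the repository's implementation: the names `Goal`, `radius`, `recovery`, and `drive`, and the linear decay, are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Goal:
    """Hypothetical goal with a satisfaction level between 0 and 1."""
    target: tuple
    weight: float
    radius: float = 1.0      # assumed "set distance" that counts as reaching the goal
    recovery: float = 0.1    # assumed satisfaction lost per timestep away from the goal
    satisfaction: float = 0.0

    def update(self, pos):
        """Satisfy the goal when the agent is close enough; otherwise let
        the satisfaction wear off a little."""
        if math.dist(pos, self.target) <= self.radius:
            self.satisfaction = 1.0
        else:
            self.satisfaction = max(0.0, self.satisfaction - self.recovery)

    def drive(self):
        """Effective pull of the goal: zero while fully satisfied."""
        return self.weight * (1.0 - self.satisfaction)
```

With `recovery=0.0` a goal stays satisfied forever after it is reached once; a large `recovery` makes it demanding again within a few timesteps.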
2. "Ugh field"
If the aversion to the red circle is amplified above a certain threshold, this behavior results. The subagent with preferences for reaching the top green circle still exists, but it will never be satisfied, because the expected negative valence of passing near the red circle is too high.
But if one is clever, one can find a way around aversions, by inventing intermediary goals or circumventing the aversion with intermediate desirable states.
Sometimes you want to accomplish something, but a seemingly trivial inconvenience will arise to screen off your motivation. If you can't remove the inconvenience, you can usually find a path around it.
What if the agent has, pinned to its position (such that it is constantly somewhat nearby), a low-valence rewarding object, which doesn't provide lasting satisfaction? (In other words - the agent has a goal-like subagent which mildly but relatively persistently wants to approach the pinned object.)
The agent suddenly looks very distracted, doesn't it? It doesn't make the same regular productive circuits of its goals. It seems to frequently get stuck, sphexishly returning to a goal that it just accomplished, and to take odd pointless energy-wasting zigzags in its path.
Maybe it's bad to constantly carry around attention-grabbing objects that provide us with minuscule, unsatisfying hits of positive valence.
Considered together, Parts 2 and 3 speak to the dangers of convenience and the power of trivial inconvenience. The agents (and humans) are extraordinarily sensitive not only to the absolute valence of an expectation, but to the proximity of that state. Even objectively weak subagents can motivate behavior if they are unceasingly present.
4. Agitation and/or Energy
The model does not actually have any concept of energy, but it is straightforward to encode a preference for moving around a lot. When the agent is so inclined, its behavior becomes chaotic.
Even a relatively moderate preference for increased movement will lead to some erratic swerves in behavior.
If one wished, one could map this type of behavior onto agitation, or ADHD, or anxiety, or being overly caffeinated. On the other hand, you could view some degree of "restlessness" as a drive toward exploration, without which one might never discover new goals.
One path of investigation that occurred to me but which I did not explore was to give the agent a level of movement-preference that waxed and waned cyclically over time. Sometimes you subjectively have a lot of willpower, sometimes you subjectively can't focus on anything. But, on the whole, we all manage to get stuff done.
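A cyclic movement-preference of this kind could be as simple as a sinusoidal weight. The function below is purely a sketch of the unexplored idea; the name `cyclic_weight`, the sine shape, and the default period are assumptions.

```python
import math


def cyclic_weight(t, base=1.0, period=200):
    """Hypothetical movement-preference strength at timestep t.
    Oscillates smoothly between 0 and base once every `period` timesteps."""
    return base * 0.5 * (1.0 + math.sin(2.0 * math.pi * t / period))
```

The returned value would scale whatever subagent encodes the preference for movement, so restlessness peaks a quarter of the way through each cycle and vanishes three quarters of the way through.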
I attempted to implement an ability for the agent to scan ahead more than one step into the future and take the movement corresponding to the highest expected valence in two timesteps, rather than just the next timestep. This didn't really show anything interesting, and remains in the category of things that I will continue to look into. (The Red agent is thinking two moves ahead, the Blue agent only one move ahead. Is there a difference? Is the clustering of the Red agent's pathing slightly tighter? Difficult to say.)
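For concreteness, here is one way a look-ahead of this kind might be written. It is a sketch under stated assumptions, not the author's implementation: `score` stands in for whatever function reports the expected valence of a position, and the second-step candidates are freshly sampled random unit moves.

```python
import math
import random


def unit_moves(pos, n, rng):
    """Return n positions exactly one unit from pos, in random directions."""
    out = []
    for _ in range(n):
        a = rng.uniform(0.0, 2.0 * math.pi)
        out.append((pos[0] + math.cos(a), pos[1] + math.sin(a)))
    return out


def lookahead_step(pos, score, depth=2, n=10, rng=random):
    """Return the first move whose best reachable score after `depth`
    moves is highest. `score(pos)` is a caller-supplied valence function."""
    def best_future(p, remaining):
        if remaining == 0:
            return score(p)
        # Expand a fresh set of random unit moves from this position.
        return max(best_future(q, remaining - 1) for q in unit_moves(p, n, rng))

    return max(unit_moves(pos, n, rng), key=lambda p: best_future(p, depth - 1))
```

With ten candidates per step, this calls `score` 10 times at depth 1 and 100 times at depth 2; the count grows as `n ** depth`.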
I don't personally think humans explicitly look ahead very often. We give ourselves credit as the "thinking, planning animal", but we generally just make whichever choice corresponds to the highest expected valence in the current moment. Looking ahead is also very computationally expensive - both for people, and for these agents - because it inevitably requires something like a model-based tree search. What I think we actually do is better addressed in Section 10 regarding Goal Hierarchies.
Of course, we can give the agents preferences for being near other agents, obeying the same rules as the preferences for any other position in space.
With hyper-dominant, non-extinguishing preferences for being around other agents, we get this piece of computer generated art that I call "Lovers".
With more modest preference for the company of other agents, and with partially-overlapping goals (Blue agent wants to spend time around the top and rightmost target, Red agent wants to spend time around the top and leftmost target) you get this other piece of art that I call "Healthy Friendship". It looks like they're having fun, doesn't it?
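A preference for another agent's company can be treated like any other positional goal whose target happens to move. The toy simulation below assumes two agents that care about nothing except each other; the function names, starting positions, and step count are illustrative only.

```python
import math
import random


def unit_moves(pos, n, rng):
    """Return n positions exactly one unit from pos, in random directions."""
    out = []
    for _ in range(n):
        a = rng.uniform(0.0, 2.0 * math.pi)
        out.append((pos[0] + math.cos(a), pos[1] + math.sin(a)))
    return out


def approach(pos, other, n=10, rng=random):
    """Pick the random unit move that ends closest to the other agent."""
    return min(unit_moves(pos, n, rng), key=lambda p: math.dist(p, other))


def simulate(steps=40, seed=0):
    """Two agents with a dominant, never-satisfied preference for each other."""
    rng = random.Random(seed)
    red, blue = (0.0, 0.0), (20.0, 0.0)
    for _ in range(steps):
        red = approach(red, blue, rng=rng)
        blue = approach(blue, red, rng=rng)
    return red, blue
```

Starting 20 units apart, the two agents close the gap within a dozen or so timesteps and then keep jostling around each other, since a unit-length move almost always overshoots a partner who is less than one unit away.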
7. New Goals Are Disruptive
Brief reflection should confirm that introducing a new goal into your life can be very disruptive to your existing goals. You could say that permitting a new goal-like subagent to take root in your mind is akin to introducing a competitor who will now be bidding against all your existing goals for the scarce resource of your time and attention.
Compare this image with the Baseline at the top of this article. The new, powerful top-right goal has siphoned away all the attention from the formerly stable, well-tended trio of goals.
I think one of the main reasons we fall down on our goals is simply that we spontaneously generate new goals, and these new goals disrupt our existing motivational patterns.
You may have more questions about the winner-take-all assumption that I mentioned above. In this simple model, the goal-like subagents do not "team up". If two subagents would prefer that the agent move to the left, this does not mean that their associated valence will sum up and make that choice more globally appealing. The reason is simple: if you straightforwardly sum up over all valences instead of picking a winner, this is what happens:
The agent simply seeks out a local minimum and stays there.
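The difference between the two aggregation rules is easy to see on a toy example. The numbers and names below are made up for illustration: each candidate move maps to the list of valences that the individual subagents report for it.

```python
def winner_take_all(reports_by_move):
    """Choose the move backed by the single strongest report."""
    return max(reports_by_move, key=lambda m: max(reports_by_move[m]))


def summed(reports_by_move):
    """Choose the move with the largest total valence across subagents."""
    return max(reports_by_move, key=lambda m: sum(reports_by_move[m]))


# Two subagents mildly favor going left; one subagent strongly favors right.
reports = {"left": [0.4, 0.4], "right": [0.6, 0.0]}

print(winner_take_all(reports))  # right
print(summed(reports))           # left
```

Under winner-take-all the lone strong subagent gets its way; under summation the two weaker subagents team up and outvote it. This shows only the difference in choice, not the stuck behavior itself.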
I am currently somewhat agnostic as to what the human or animal brain is actually doing. We do appear to get stuck in local minima sometimes. But you can get sphexish behavior that looks like a local minimum out of a particular arrangement of winner-take-all subagents. For example, if an agent is hemmed in by aversive stimuli with no sufficiently positive goal-states nearby, that might look like a local minimum, though it is still reacting to each aversive stimulus in a winner-take-all fashion.
Subjectively, though, it feels like if you have two good reasons supporting an action, that makes the action feel a bit easier to do, a bit more motivating, than if you just had one good reason. This hints that maybe goal-like subagents can gang up together. But I also doubt that this is anything like strictly additive. Thinking of 2,000 reasons why I should go to the gym isn't 2,000 times more compelling than thinking of one reason.
9. Belief, Bias, and Learning
The main area of the model that I would like to improve, but which would amplify the complexity of the code tremendously, would be in introducing the concept of bias and/or belief. The agent should be able to be wrong about its expected valence. I think this is hugely important, actually, and explains a lot about human behavior.
Pathologies arise when we are systematically wrong about how good, or how bad, some future state will be. But we can overcome pathologies by exposing ourselves to those states, and becoming deeply calibrated regarding their reality. On the aversion side this applies to everything from the treatment of phobias and PTSD, to the proper response to a reasonable-seeming anxiety. On the positive-valence side, we may imagine that it would be incredibly cool and awesome to do or to be some particular thing, and only experience can show us that accomplishing such things yields only a shadow of what we expected. Then your brain updates on that, and you cease to feel motivated to do that thing anymore. You can no longer sustain the delusion that it was going to be awesome.
10. Goal Hierarchies
It seems clear that, in humans, goals are arranged in something like trees: I finish this current push-up because I want to finish my workout. I want to finish my workout because I want to stay on my workout program. I want to stay on my workout program because I want to be strong and healthy.
But it's almost certainly more complex than this, and I don't know how the brain manages its "expected valence" calculations across levels of the tree.
I hypothesize that it goes something like this. Goal-like subagents concerned with far-future outcomes, like "being strong and healthy", generate (or perhaps manifest as) more specific near-term goal-like targets, with accompanying concrete sensory-expectation targets, like "working out today". This seems like one of those mostly automatic things that happens whether or not we engineer it. The automaticity of it seems to rely on our maps/models/beliefs about how the world works. Even much simpler animals can chain together and break down goals, in the course of moving across terrain toward prey, for example.
The model described above doesn't really have a world model and can't learn. I could artificially designate some goals as being sub-goals of other goals, but I don't think this is how it actually works, and I don't actually think it would yield any more interesting behavior. But it might be worth looking into. Perhaps the most compelling aspect of this area is that what would be needed would not be to amplify the cleverness of the agent; it would be to amplify the cleverness of the subagent in manipulating and making its preferences clearer to the agent. For example: give subagents the power to generate new goal-objects, and lend part of their own valence to those subagents.
I toyed with the idea of summing up all the valences of the goal-objects that were being ignored at any given moment, and calling that "suffering". This sure is what suffering feels like, and it's akin to what those of a spiritual bent would call suffering. Basically, suffering is wanting contradictory, mutually exclusive things, or, being aware of wanting things to be a certain way while simultaneously being aware of your inability to work toward making it that way. One subagent wants to move left, one subagent wants to move right, but the agent has to pick one. Suffering is something like the expected valence of the subagent that is left frustrated.
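As a rough sketch of that bookkeeping, assuming each subagent reports a single valence and the highest report wins, suffering at a given moment would be the total of everything that did not win. The function and the example goals are hypothetical.

```python
def suffering(reports):
    """Sum the valences of every subagent except the winner.
    `reports` maps a subagent's name to its current expected valence."""
    winner = max(reports, key=reports.get)
    return sum(v for name, v in reports.items() if name != winner)


# One subagent wins; the other two are left frustrated.
print(suffering({"go left": 0.6, "go right": 0.5, "stay put": 0.2}))
```

An agent with a single goal scores zero by this measure, and an agent far from all of its goals scores low because every report is weak, which is the loophole the degenerate solution described next exploits.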
I had a notion here that I could stochastically introduce a new goal that would minimize total suffering over an agent's life-history. I tried this, and the most stable solution turned out to be thus: introduce an overwhelmingly aversive goal that causes the agent to run far away from all of its other goals screaming. Fleeing in perpetual terror, it will be too far away from its attractor-goals to feel much expected valence towards them, and thus won't feel too much regret about running away from them. And it is in a sense satisfied that it is always getting further and further away from the object of its dread.
File this under "degenerate solutions that an unfriendly AI would probably come up with to improve your life."
I think a more well-thought-out definition of suffering might yield much more interesting solutions to the suffering-minimization problem. This is another part of the model I would like to improve.
12. Happiness and Utility
Consider our simple agents. What makes them happy?
You could say that something like satisfaction arises the moment they trigger a goal-state. But that goal object immediately begins recharging, becoming "dissatisfied" again. The agent is never actually content, unless you set up the inputs such that the goal valences don't regenerate - or if you don't give it goals in the first place. But if you did that, the agent would just wander around randomly after accomplishing its goals. That doesn't seem like happiness.
Obviously this code doesn't experience happiness, but when I look at the behavior of the agents under different assumptions, the agents seem happy when they are engaged in accomplishing their various goals. They seem unhappy when I create situations that impede the efficiency of their work. This is obviously pure projection, and says more about me, the human, than it says about the agent.
So maybe a more interesting question: What are the high-utility states for the agent? At any given moment the agent certainly has a preference ordering, but that ordering shifts quite dramatically based on its location and the exact states of each of its subagents, specifically their current level of satisfaction. In other words, in order to mathematically model the preference ordering of the agent across all times, you must model the individual subagents.
If humans "actually" "have" "subagents" - whatever those words actually end up meaning - then the "human utility function" will need to encompass each and every subagent. Even, I think, the very stupid ones that you don't reflectively endorse.
I set out on this little project because I wanted to prove some assumptions about the "subagent" model of human consciousness. I don't think I can ultimately say that I "proved" anything, and I'm not sure that one could ever "prove" anything about human psychology using this particular methodology.
The line of thinking that prompted this exploration owes a lot to Kaj_Sotala, for his ongoing Sequence, Scott Alexander's reflections on motivation, and Mark Lippman's Folding material. It's also their fault I used the unwieldy language "goal-like subagent" instead of just saying "the agent has several goals". I think it's much more accurate, and useful, to think of the mind as being composed of subagents, than to say it "has goals". Do you "have" goals if the goals control you?
This exercise has changed my inner model of my own motivational system. If you think long enough in terms of subagents, something eventually clicks. Your inner life, and your behaviors, seem to make a lot more sense. Sometimes you can even leverage this perspective to construct better goals, or to understand where some goals are actually coming from.
The code linked at the top of this page will generate all of the figures in this article. It is not especially well documented, and bears the marks of having been programmed by a feral programmer raised in the wilds of various academic and industrial institutions. Be that as it may, the interface is not overly complex. Please let me know if anyone ends up playing with the code and getting anything interesting out of it.