The Solitaire Principle: Game Theory for One

alkjash

Do I contradict myself?
Very well then I contradict myself;
(I am large, I contain multitudes.)

This post is an exercise in taking Whitman seriously. If the self is properly understood as a loose coalition of many agents with possibly distinct values, beliefs, and incentives, what does game theory have to say about self-improvement?

The Solitaire Principle is the principle that human beings can be usefully thought about as loose coalitions of many agents. Classes of interpersonal problems often translate into classes of intrapersonal problems, and the tools to solve them are broadly similar. The Solitaire Principle is a corollary of the paradigm that the universe is self-similar at every level of organization: the organizational principles and faults of a civilization are not wildly different from those of a single human mind.

Self-improvement is often framed in terms of optimization of a monolithic whole. Instead, the Solitaire Principle suggests that self-improvement can also be achieved by alignment of pieces within the whole to cooperate more efficiently.

First, I fractionate the self across the time dimension and investigate self-improvement as an iterated game for one. This is partially inspired by this essay on becoming more legible to other agents.

Second, I fractionate the self into multiple sub-personalities and investigate self-improvement as a single sub-personality taking unilateral action to improve the whole.

1. Iterated Games for One

i. Basic Thought Experiments

Imagine that a human being dies and is re-instantiated the following day. Across a year, one agent A actually behaves like 365 very weakly dependent agents A1, A2, ..., A365.

A1 wants to write a novel, and can either write a page today (cooperate) or Netflix (defect). The novel is completed if and only if A1, A2, ..., A365 all cooperate. A1 decides the probability of that happening is vanishingly small, so she defects. No pages are written.

B1 wants to write a novel. The novel is completed if at least 300 of B1, B2, ..., B365 cooperate. B1 simulates the 364 other agents and expects only half of them to cooperate. B1 defects and no pages are written.
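A rough way to sanity-check A1's and B1's pessimism is to treat each daily self as an independent coin flip. The sketch below assumes every agent cooperates independently with the same probability p; the independence, the function name `prob_at_least`, and the particular values of p are illustrative assumptions, not part of the thought experiments.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float) -> float:
    """Binomial tail: chance that at least k of n independent agents cooperate."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

DAYS = 365

# A's novel needs every one of the 365 selves to cooperate.
# Assumed p = 0.99: even very reliable selves rarely all cooperate.
p_all_99 = prob_at_least(DAYS, DAYS, 0.99)

# B's novel needs at least 300 cooperators.
# B1 expects about half to cooperate (p = 0.5); p = 0.9 is an assumed contrast case.
p_b_half = prob_at_least(300, DAYS, 0.5)
p_b_90 = prob_at_least(300, DAYS, 0.9)

print(f"A, p=0.99: {p_all_99:.4f}")
print(f"B, p=0.50: {p_b_half:.3e}")
print(f"B, p=0.90: {p_b_90:.4f}")
```

Under these assumptions, A's all-or-nothing project is unlikely even with highly reliable selves, while B's outcome depends almost entirely on whether the typical self is closer to a coin flip or to a dependable collaborator.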

C1 wakes up on New Year's Day inspired to write a novel. C1 feels excited about the project and decides it's likely that everyone will cooperate, so she writes Page 1. The other 364 agents don't know about the book. One page is written.

D1 wakes up on New Year's Day inspired to write a novel. He gets Write Novel tattooed on his arm to broadcast his intent to the others. All the agents now know about the book project, but the other guys aren't excited about it. One page is written.

E1 has always wanted to write a novel. Given that E(-364),...,E(-1), E0 didn't already start the novel, E1 reasons that she is not the kind of person who would be able to follow through with a project of this magnitude. E2,...E365 reason similarly. No pages are written.

F1 has always wanted to write a novel. F1 reasons that he is not currently the kind of person who would be able to follow through with a project like this. He reads a self-help book to fix this state of affairs, and broadcasts his intention. F2, ... , F365 also reason that given the previous agents have not started, they are probably not yet ready. 365 self-help books are read, but no pages are written.

G1 has always wanted to write a novel. She designs an hour-long morning meditation to reflect on the importance of mindfulness and writing to her life. She performs this ritual before writing a page. The ritual shifts the kind of person G2, ..., G365 are, so that they are individually 10% more likely to repeat it. One of G2, ..., G11 (in expectation) repeats the ritual and shifts the kind of person G is by another 10%, for a total of 20%. One of the next five (in expectation) repeats the ritual and shifts the probability by another 10%. Eventually, G30 is the kind of person who will meditate and write no matter what. The meditation no longer serves a function, but continues nevertheless. The novel is written, but 365-30=335 hours are wasted on an unnecessary meditation.
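The arithmetic behind G30 can be checked with a short calculation. The sketch below assumes, following the numbers in G's story, that each performance of the ritual adds 10 percentage points to the daily chance of repeating it, starting from 10% after G1's first performance. The function name and the geometric-waiting-time model are illustrative assumptions.

```python
from fractions import Fraction

STEP = Fraction(1, 10)  # each performance adds 10 percentage points (as in G's story)

def expected_days_until_certain(step: Fraction = STEP) -> Fraction:
    """Expected number of days after G1 until the ritual is repeated with certainty."""
    performances_needed = int(1 / step)  # 10 performances bring the chance to 100%
    # After k performances the daily chance is k * step, so the expected wait
    # for the next performance is 1 / (k * step) days (geometric distribution).
    return sum(1 / (k * step) for k in range(1, performances_needed))

wait = expected_days_until_certain()
print(wait, float(wait))
```

The expected wait comes out a little over 28 days after G1's first performance, which is consistent with the ritual becoming automatic around G30.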

ii. Variations

H1 gained a bit of weight over her undergraduate years, and decides to go on a diet to lose 10 pounds in a month. At work, H1 is tempted by the wonderful dessert selection at lunch, and H1 can choose to (a) have a piece of tiramisu (just this once!), or (b) maintain the integrity of the diet.

At the end of a month, Reality swoops in with two transparent boxes, leaving H30 with the choice of either both boxes or just Box 2. In Box 1 is a piece of tiramisu. In Box 2 is a magic bean that instantly induces 10 pounds of weight loss, but Box 2 will be empty iff Reality thinks H30 is the kind of person who would take both boxes. H30 sees an empty Box 2, shrugs, and takes Box 1 like H1, ..., H29 did. H30 is tired of tiramisu, but she isn't losing weight anyway.

I(today) plays an iterated prisoner's dilemma with I(yesterday). In each round, I(today) can choose to sleep on time (cooperate) or Netflix into the wee hours of the morning (defect). I(yesterday) is a known Tit-for-Tat player - in tomorrow's game, I(yesterday) is guaranteed to make the same move I(today) made today. I(today) reasons that the only way to get out of defect-defect against Tit-for-Tat is to cooperate first, so he cooperates.
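I(today)'s reasoning can be illustrated with a toy simulation. The payoff numbers below are hypothetical (only the usual prisoner's dilemma ordering matters), and the function name is an assumption for this sketch; the opponent simply copies whatever move was made in the previous round, starting from a history of defection.

```python
# Hypothetical payoffs to the player, keyed by (own move, opponent's move).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def play_against_tit_for_tat(moves: str, previous_move: str = "D") -> int:
    """Total payoff for playing `moves` against an opponent who copies our last move."""
    total = 0
    for move in moves:
        opponent = previous_move  # Tit-for-Tat mirrors what we did last round
        total += PAYOFF[(move, opponent)]
        previous_move = move
    return total

ROUNDS = 10
keep_defecting = play_against_tit_for_tat("D" * ROUNDS)
cooperate_first = play_against_tit_for_tat("C" * ROUNDS)
defect_once_more = play_against_tit_for_tat("D" + "C" * (ROUNDS - 1))

print(keep_defecting, cooperate_first, defect_once_more)
```

With these assumed payoffs, cooperating first costs one bad round and then locks in mutual cooperation, while every extra day of "just one more defection" lowers the total.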

iii. Planning and Self-Improvement

Long-term projects for one person can be difficult for the same reasons that short-term projects for large teams are difficult:

The individual has imperfect shared knowledge (C) and values (D) across time, and communication between selves suffers from the illusion of transparency.
The individual doesn't trust his future and past selves (A, B, E, H), and has much less influence over them than he thinks.
The individual gets bogged down in meta-level planning meetings and team-building exercises without actually shutting up and doing the work (F).
The individual becomes superstitious about improvement rituals (G).

A few first-pass ideas:

Broadcast transparently to your future self. Send costly signals, decide on Schelling points, etc.
Become legible, for the same reasons we'd like friendly AI to be. Follow hard-and-fast rules. Arrive on time. Stick to plans.
Shut up and do the thing. Do it now. Do it badly. Dwell not on quality.
You never make decisions about what you do right now. Your decisions are always about the kind of person you are.
You are the Omega now. You might be the one agent in the universe who stands a chance of simulating you to sufficient precision. Building habits and changing who you are can be about setting up the right Newcomb-like problems for yourself.

2. Moloch for One

I derived the Solitaire Principle from the following quote of Solzhenitsyn:

If only it were all so simple! If only there were evil people somewhere insidiously committing evil deeds, and it were necessary only to separate them from the rest of us and destroy them. But the line dividing good and evil cuts through the heart of every human being. And who is willing to destroy a piece of his own heart?

The evil that is Moloch, Moloch who lives in the vacuum between naive libertarians and the gears of capitalism and the manic whispers of causal decision theorists (defect! defect!), that evil lies in your heart too.

i. Subpersonalities

Many schools of psychology have taken seriously the idea that the human consciousness decomposes into separate sub-personalities, although the exact divisions are very different. Kahneman's System 1 and System 2 is a simple dichotomy in this vein. Freud decomposed the self into id, ego, and superego. Jordan Peterson argues that ancient Gods are embodiments of primordial human subpersonalities. The Internal Family Systems (IFS) model takes another tack:

IFS sees consciousness as composed of three types of subpersonalities or parts: managers, exiles, and firefighters. Each individual part has its own perspective, interests, memories, and viewpoint. A core tenet of IFS is that every part has a positive intent for the person, even if its actions or effects are counterproductive or cause dysfunction. This means that there is never any reason to fight with, coerce, or try to eliminate a part; the IFS method promotes internal connection and harmony.
[...]
IFS practitioners report a well-defined therapeutic method for individual therapy based on the following principles. In this description, the term "protector" refers to either a manager or firefighter.
Parts in extreme roles carry "burdens," which are painful emotions or negative beliefs that they have taken on as a result of harmful experiences in the past, often in childhood. These burdens are not intrinsic to the part and therefore they can be released or "unburdened" through IFS. This allows the part to assume its natural healthy role.
The client's Self is the agent of psychological healing. The therapist helps the client to access and remain in Self and provides guidance in the therapy process.
Protectors can't usually let go of their protective roles and transform until the exiles they are protecting have been unburdened.
There is no attempt to work with any exile until the client has obtained permission from any protectors who are protecting that exile. This makes the method relatively safe, even in working with traumatized parts.
The Self is the natural leader of the internal system. However, because of harmful incidents or relationships in the past, protectors have stepped in to protect the system and taken over for the Self. One protector after another is activated and takes over the system causing dysfunctional behavior. These protectors are also frequently in conflict with each other, resulting in internal chaos or stagnation. The goal of IFS is for the protectors to come to trust the Self so they will allow it to lead the system and create internal harmony under its guidance.

I have previously speculated on salient divisions of my own internal processes into subpersonalities, e.g. Babble and Prune, Chinese and English. For now, the exact details of how subpersonalities should be split are not important - my sense is that every such theory is typical-minding straight off a cliff anyway.

Instead, I'll start with a simplified model of subpersonalities ("agents"). Here are the rules.

There are at least two agents.
Among them there is one you identify with most, the Self.
Agents have different values and(/because) they have different beliefs about reality.
The more CPU time an agent gets, the more it grows.

ii. Three Pairs of Nemeses

Babble and Prune are seven year olds who write poetry together. Babble writes the lines, and Prune edits them. One day, Prune gets a Yeats collection for Christmas. He falls in love:

Had I the heavens’ embroidered cloths,
Enwrought with golden and silver light,
The blue and the dim and the dark cloths
Of night and light and the half light,
I would spread the cloths under your feet:
But I, being poor, have only my dreams;
I have spread my dreams under your feet;
Tread softly because you tread on my dreams.

Babble's poetry no longer lives up to Prune's standards, so they stop playing together. She continues to write with the proficiency of a seven year old while he ransacks the poetry section of the library.

Yin and Yang cohabit uncomfortably. Yin sits hunched over as if to minimize and protect herself. Her inner life is filled with jealousy, vindictiveness, and unprovoked images of violence and sadism. Yang stands upright with his shoulders back, ready to meet the world. His inner life is filled with confidence, empathy, and faith in the good.

Yin and Yang each believe that other human beings are mostly like themselves.

When Yin is awake, she perceives jokes as sarcasm and body language as hostile. She is intimately aware of the vulnerabilities of her flesh. Yin is constantly at the ready, calculating how to strike the enemy preemptively.

Yang sees the good in people. He perceives jokes as gentle and body language as inviting. He is willing to extend a charitable hand in good faith, believing other people to be like himself.

Yin and Yang both want CPU time, and are thus beset by perverse incentives. Yin is as nasty as possible to people, provoking their enmity. This enmity Yin uses as evidence that her worldview is true and people are inherently evil and that she thus deserves more CPU time. Yang is friendly and forgiving, earning their trust and respect. This good nature Yang uses as evidence that people are inherently good and that he is the one who deserves more CPU time.

Actor and Scribe have competing worldviews. In the lab, Scribe determines the truth via the scientific method, controlling, double-blinding, the whole shebang. Scribe uses words to denote pieces of reality. Scribe knows about the conjunction fallacy and believes in Occam's razor: that simplicity is proof.

Actor uses words enactively. Actor believes all good things come from willpower and placebomancy. In conversation, Actor takes complexity as proof of honesty, because it's harder to falsify a consistent and persuasive hypothesis with more moving parts. Actor worships mystery and complexity for their own sake, for mysterious and complex things cast long shadows and make good dinner conversation.

Actor and Scribe each tries to surround you with people like himself. Actor wants you to be popular and plays word games to climb the social hierarchy. Scribe looks for communities where truth and simplicity are sacred. Each knows that success in this regard is the key to winning the war.

iii. God's Eye View

In each of these examples, two nemesis subpersonalities that both serve important functions oppose and detract from each other. They respond to perverse incentives to increase their individual power (CPU time) rather than maximizing value produced. As these oppositions between subpersonalities proliferate, we have a chaotic multi-agent race to the bottom - an inner Moloch.

The common refrain in Meditations on Moloch is that Moloch can be defeated from a god's-eye-view:

4. Coordination.
The opposite of a trap is a garden.
Things are easy to solve from a god’s-eye-view, so if everyone comes together into a superorganism, that superorganism can solve problems with ease and finesse. An intense competition between agents has turned into a garden, with a single gardener dictating where everything should go and removing elements that do not conform to the pattern.

Jungian psychoanalysis and IFS agree that the path to maturity is the integration of agents into a whole under the leadership of a driving Self, the Optimization Czar, the gentle Gardener. What does integration mean, and how is it accomplished?

The Self must become strong enough to lead all other agents. This cannot be achieved through tyranny. Rather, it must be recognized that all agents have an internal logic and rationality given their beliefs and values, and serve a purpose to the collective. By fostering healthy discourse norms, the Self can allow antagonistic agents to exchange information and understand that they share terminal values. Build your mind into a walled garden.

Babble and Prune are both necessary ingredients to a productive poet. Without Babble, Prune is just a miserable critic. Without Prune, Babble will never grow past the ability of a seven-year-old.

Yin, the Jungian shadow, is necessary to protect against genuine malice in the world. To integrate the shadow requires coming to terms with the fact that a human being is a horrifyingly dangerous animal. To nevertheless stand up straight with your shoulders back and meet people in good faith - knowing something of their nature - requires a correspondingly strong Yang.

Actor and Scribe are both correct about how to speak. I hardly need to prove the value of speaking the truth, but it's also impossible to just say what you mean.

The grand conceit of our civilization is that each individual human being has intrinsic value, be he ne'er so vile. Taking this nearly absurd principle seriously has been unbelievably productive. To achieve a harmony of all the contradictory multitudes within the individual soul requires applying that same idealistic conceit to each subpersonality in turn.

SquirrelInHell (7y)

Classes of interpersonal problems often translate into classes of intrapersonal problems, and the tools to solve them are broadly similar.

This is true, but it seems you don't have any ideas about why it's true. I offer the following theory: if you are designing brains to deal with social situations, it is very adaptive to design them in a way that internally mirrors some of the structure that arises in social environments. This makes the computations performed by the brain more directly applicable to social life, in several interesting ways (e.g. increased ability to take/simulate various points of view, simulate and exploit adversarial situations, operate under mismatched/fake sets of pretenses etc.).

alkjash (7y)

That's an appealing hypothesis! It does seem like part of the picture, but I would offer the alternative hypothesis that even absent social environments such a system might arise. It's natural to design and compartmentalize subprocesses for specific tasks, and to give them isolated virtual address spaces. Eventually, because each subprocess is engaging with a different region of thingspace it collects different information (e.g. about human nature) and that produces different beliefs and values when it inhabits you, so to speak. I will definitely give this question more thought.

drethelin (7y)

This is a neat idea but too long and repetitive. You use a ton of examples when the message gets across after 2 or 3.

habryka (7y)

I’ve been using most of alkjash’s posts more as guided meditations on a topic, that help me focus my mind on something interesting for a few minutes, and for that I really appreciate longer length and more anecdotes. Though I did also find the flow in this one a bit worse.

alkjash (7y)

I definitely worried about that too. The reason I put in so many examples is because I am not sure which ones are most central. It's not clear to me how much more there is here than thought experiments, and which ones are closest to reality. One of the pieces I think is most important that I haven't seen in conventional game theory discussions is the idea that a human being playing an iterated game might want to spend a good chunk of time self-modifying.

JenniferRM (7y)

K1 wants to write a novel because she calculated a novel to be the best thing to be working on given many environmental factors as input to a reflectively stable and emotionally integrated theory of axiology.

The novel is completed if at least 300 future Ks agree.

However, K1 mostly ignores "other people" in favor of thinking of herself as something like a local/momentary snapshot of a Turing machine's read/write head in operation....

She has obvious inputs and an obvious place for outputs, plus some memory and awareness of the larger program, and an ability and interest in fixing the program she is executing when definite errors are detected... and just trusting the system otherwise.

K1 writes 1/300th of a novel.

Since K1's value estimates were very reasonable, the estimates are replicated by many future K's and 753 days later a novel is finished.

It took more than 300 days, but during the 753 days many other similarly valuable things were also done that were plausibly valuable things to have done. The whole time, K has been more or less safely interruptible, and it would have been pretty weird if K had ignored surprising issues that were more important than the novel when those things actually came up.

If the novel was somehow never finished that would have been OK. It probably would mean it was an omniscient-perspective-error to have worked on it, but that's OK because humans aren't omniscient.

Lesson: stop worrying about other people (who are often mostly crazy anyway) and instead pay attention to efficiently and reliably knowing what is actually good.

Vanessa Kosoy (7y)

The grand conceit of our civilization is that each individual human being has intrinsic value, be he ne'er so vile. Taking this nearly absurd principle seriously has been unbelievably productive.

I'm confused since I don't understand on what "level of irony" are you here. Are you literally saying that some humans have no intrinsic value? Or that the state pretends to value all people while it really doesn't? Or something else entirely?

alkjash (7y)

I probably misused words. I'm saying that it is an important idea that seemed initially absurd to me and which has taken me a great deal of effort to take seriously, but moving towards believing everyone has value is (at least locally) a good and useful thing. As far as the state goes I think it does a fairly honest job of valuing all people, as it should.
