

Update from almost 3 years in the future: this stream of work has continued developing in a few different directions, both on the conceptual foundations and in some initial attempts to apply these tools to AI. Two recent works I was especially excited by (and their bibliographies): 'Towards a Grounded Theory of Causation for Embodied AI' (there is also an excellent talk by the author) and 'Faithful, Interpretable Model Explanations via Causal Abstraction'.

I'll have to think through this post more carefully later, but there's some recent work on approximate abstractions between causal models that I expect you'd be extremely interested in (if you aren't already aware).

There are quite a few interesting dynamics in the space of possible values, that become extremely relevant in worlds where 'perfect inner alignment' is impossible/incoherent/unstable.

In those worlds, it's important to develop forms of weak alignment, where successive systems might not be unboundedly corrigible but do still have semi-cooperative interactions (and transitions of power).

Yeah, intertemporal trust and coordination become hugely important. Lots of 'scalable alignment' strategies are relevant, recursively delegating yourself tasks or summarizing your progress so far. An inhuman level of flexibility would also help, instantly grieving your old circumstances then adapting to the new ones.

Can you be confident that your past self knew what they were doing when they dropped you in this situation? Or that your future selves will develop things the way you expect them to? You could choose to deliberately and repeatedly lie to yourself. Picture Susan writing a note, "exciting new job tomorrow!" - each night realizing the truth, but deciding to enjoy her fantasy just one more day. This doesn't have to be completely destructive though. Susan might have either internal or external thresholds, so she'll let herself wallow in escapism for a month (or a century) but no longer than that.

Also, exomemory isn't the only way to break the symmetry of repetition. Susan could have started each morning by flipping to a random sentence in a book, or rolling a few high-emotional-variance activity dice, etc.

Things get especially scary if you're unsure whether your exomemory has been tampered with by somebody else. The domestic abuse here was relatively mild - Jane could have tried a hundred different times to manipulate Susan into a specific outcome. Anyone on a staggered timescale from you can attempt those kinds of brute-force attacks.

Alternatively, reliable social supports who have different memory windows than you would make everything so much easier. If Susan and Jane had a better relationship with one another, they could have had a conversation like "you seem to be stuck in a rut, let's talk this through and change something."

However...'reliable' leaves a ton of messy wiggle room. Which versions of yourself are they cooperating with, which of your layered ongoing commitments do they respect? What if you ask them to keep secrets from your future iterations? Your full life is vast and ancient, your locally-available context each day is tiny and carefully curated. Part of what makes a friend different from a private notebook is that they can make independent judgements about which pieces of your past you need to be aware of today.

You can have super strong deference towards your local past selves, while still doing your own (random?) global spot-checks. Verify that page 18 of the daily dossier is factually true. Check that sub-sub-plan 5j still makes sense. Semi-regularly re-evaluate old habits, old preferences, old relationships, so that you aren't just coasting on momentum. Project management on thousand year scales despite constantly resetting.

All of this is much, much easier to do if you prepare ahead of time, which Jane didn't. But I agree with slimepriestess that we already kinda do this stuff in ordinary life. Decade by decade, day by day, or shorter - "redaction frequency" rhymes with "attention span" and "working memory size."

...uhh anyways, cool story thanks lol


Multiscale agency, self-misalignment, and ecological basins of attraction? This sounds really excellent and targets a lot of the conceptual holes I worry about in existing approaches. I look forward to the work that comes out of this!!

I was reminded of a couple different resources you may or may not already be aware of.

For 'vertical' game theory, check out Jules Hedges' work on open/compositional games.

For aggregative alignment, there's an interesting literature on the topology of social choice, treating things like Arrow's voting theorem as a description of holes in the space of preferences. There's something cool going on where partially-overlapping locally-linear rankings can have much stranger global structures. I'm also reminded of this post comment, on the possible virtues of self-misalignment.

I suspect you'd enjoy The Dawn Of Everything, an anarchist-tinged anthropological survey of the different nonlinear paths stateless societies and state formation have taken. Or, well, it discusses a wide range of related topics, with lots of creativity and decent enough rigor. I haven't finished yet.

I do agree that states can be seen as a game-theoretic trap, though. Once you have some centralized social violence or institutional monopoly on power, for a huge range of goals the easiest way to achieve them becomes "get the state/king/local bigwig on your side to impose what you want." Not direct problem-solving or building up consensus. Just fighting over control of the leviathan, powerful but blunt and low-bandwidth. So in that sense, it's pretty useful to have robust norms curbing power imbalances before they reach that tipping point.

The claim that scissor statements are dangerous is itself a scissor statement: I think it's obviously false, and will fight you over it. Social interaction is not that brittle. It is important to notice the key ruptures between people's values/beliefs. Disagreements do matter, in ways that sometimes rightly prevent cooperation.

World population is ~2^33, so 33 independent scissor statements would set you frothing in total war of everyone against everyone. Except people are able to fluidly navigate much, much higher levels of difference and complexity than that. Every topic and subculture has fractal disagreements, each battle fiercely fought, and we're basically fine. Is it productive to automatically collaborate on a project with someone who disagrees with your fundamental premises? How should astronomy and astrology best coexist, especially when one of the two is badly out-numbered?
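For anyone who wants to check the ~2^33 arithmetic, here's a rough sketch. It assumes each scissor statement is a binary agree/disagree split, that the splits are independent and roughly 50/50, and that world population is about 8 billion (my round number, not an exact figure). The function names are just illustrative.

```python
# Assumption: ~8 billion people; a round figure chosen for illustration.
WORLD_POPULATION = 8_000_000_000


def distinct_positions(n_statements: int) -> int:
    """Number of distinct agree/disagree profiles over n binary statements."""
    return 2 ** n_statements


def statements_needed(population: int) -> int:
    """Smallest n such that 2**n >= population (integer ceil of log2)."""
    return (population - 1).bit_length()


def prob_full_agreement(n_statements: int) -> float:
    """Chance two random people agree on every statement,
    assuming independent 50/50 splits (an idealization)."""
    return 0.5 ** n_statements


if __name__ == "__main__":
    n = statements_needed(WORLD_POPULATION)
    print(n, distinct_positions(n), prob_full_agreement(n))
```

Under those idealized assumptions, 33 statements give about as many distinct profiles as there are people, so almost every pair would disagree on at least one of them. Real disagreements are neither independent nor evenly split, which is part of the point.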

Vigorous, open-ended epistemic and moral competition is hard. Neutrality and collaboration can be useful, but are always context-sensitive and provisional. They are ongoing negotiations, weighing all the different consequences and strategies. A fighting couple can't skip past all the messy heated asymmetric conflicts with some rigid absolutes about civil discourse.

I expect you already know this, but, the role of activists is not the same as the role of experts, and that's okay. You will never know everything relevant to the situation you're hoping to intervene in. Even if you did, institutions ignore their own environmental experts all the time. Usually, you aren't there as some sort of policy consultant, you're there to pressure their interests into alignment with yours. Even if you have zero clue what other constraints they are balancing, it can still be reasonable to loudly voice your problems; you are yourself one of their constraints. (There's actually an analogy you could make with price signals, where the buyer and seller don't need to know the other's budget calculations, they just need to freely pursue their goals against one another.)

Ideological information is still information, the high-level conceptual narratives and emphases you place on different factors. It felt like the aura of 'seriousness' you talk about with money points to something more general: tradeoffs. Black-and-white thinking in politics is, uh, easy to fall into. But it's a lot more powerful when you can say, "Sure, there are benefits/harms x y and z...and they're completely outweighed by a b and c." You can pay attention to those tradeoffs without losing sight of the Very Important Things. Maybe x needs to be mitigated more carefully. Maybe y is a core obstacle that needs to be dealt with first. 

Obviously, though, having some in-depth subject knowledge doesn't hurt! It helps you make sure you're fighting for the right thing, in the most effective way, and can give you greater legitimacy dealing with other parties. It's a tragic historical fluke that radicals of the last few decades have been so innumerate and technophobic. Get a few of your activist friends and call yourselves a research team, or a reading circle, and then spread whatever knowledge you gain. I think you're on the right track, and good luck. :)