Creator of the Complice productivity system, which hosts the Less Wrong Study Hall:

Working full-time on solving human coordination at the mindset level.

Wiki Contributions


Right, yeah. And that (eventually) requires input of food into the person, but in principle they could be in a physically closed system that already has food & air in it... although that's sort of beside the point. And isn't that different from someone meditating for a few hours between meals. The energy is already in the system for now, and it can use that to untangle adaptive entropy.

Huh, reading this I noticed that counterintuitively, alignment requires letting go of the outcome. Like, what defines a non-aligned AI (not an enemy-aligned one but one that doesn't align to any human value) is its tendency to keep forcing the thing it's forcing rather than returning to some deeper sense of what matters.

Humans do the same thing when they pursue a goal while having lost touch with what matters, and depending on how it shows up we call it "goodharting" or "lost purposes". The mere fact that we can identify the existence of goodharting and so on indicates that we have some ability to tell what's important to us, that's separate from whatever we're "optimizing" for. It seems to me like this is the "listening" you're talking about.

And so unalignment can refer both to a person who isn't listening to all parts of themselves, and to eg corporations that aren't listening to people who are trying to raise concerns about the ethics of the company's behavior.

The question of where an AI would get its true source of "what matters" from seems like a bit of a puzzle. One answer would be to have it "listen to the humans" but that seems to miss the part where the AI needs to itself be able to tell the difference between actually listening to the humans and goodharting on "listen to the humans".

Maybe instead of "shut up and do the impossible" we need "listen, and do the impossible" 😆

Sort of flips where the agency needs to point.

This "it gets worse if you try to deal with it" isn't necessarily true in every case. In this way adaptive entropy is actually unlike thermodynamic entropy: it's possible to reduce adaptive entropy within a closed system.

Actually naming whether this bolded part is true would require defining what "closed" means in the context of an adaptive system—it's clearly different than a closed system in the physical sense, since all adaptive systems have to be open in order to live.

This is great and I'm looking forward to your book.

Some adjacent ideas:

I feel like I've been appreciating the nature of wisdom (as you describe it here) increasingly much over the past couple of years. One thing this has led me to is looking at tautologies, where the sentence in some sense makes no claim but directs your attention to something that's self-evident once you look. For example, "the people you spend time with will end up being the people you've spent time with".

In 2017, I wrote an article about transcending regret, and a few years later I shared it with a friend and said:

at the time I wrote this, I hadn't gotten the insight as deep into my bones as I now have, & I still have much further to go

but the insight is still legit
& the articulation is good
& your integration will be yours anyway no matter how well I had it integrated when I wrote it

This feels like sort a dual of the sazen, and also maybe relates to the comment Kaj made about experiences that are hard to point at verbally even once you have experienced them.

Huh—it suddenly struck me that Peter Singer is doing the exact same thing in the drowning child thought experiment, by the way, as Tyler Alterman points out beautifully in Effective altruism in the garden of ends. He takes for granted that the frame of "moral obligation" is relevant to why someone might save the child, then uses our intuitions towards saving the child to suggest that we agree with him about this obligation being present and relevant, then he uses logic to argue that this obligation applies elsewhere too. All of that is totally explicit and rational within that frame, but he chose the frame.

In both cases, everyone agrees about what actually happens (a child dies, or doesn't; you contribute, or you don't).

In both cases, everyone agrees because within the frame that has been presented there is no difference! Meanwhile there is a difference in many other useful frames! And this choice of frame is NOT, as far as I can recall, explicit. Rather than recall, let me actually just go check... watching this video, he doesn't use the phrase "moral obligation", but asks "[if I walked past,] would I have done something wrong?". This interactive version offers a forced choice "do you have a moral obligation to rescue the child?"

In both cases, the question assumes the frame, and is not explicit about the arbitrariness of doing so. So yes, he is explicit about setting the zero point, but focusing on that part of the move obscures the larger inexplicit move he's making beforehand.

Ah this comment from facebook also feels relevant:

Sam Harris’ argument style reminds me very much of the man that trained me, and the example of fire smoke negatively affecting health is a great zero point to contest. Sam has slipped in a zero point of physical health being the only form of health which matters. Or at least the highest. One would have to argue against his zero point, that there are other values which can be measured in terms of health greater than mere physical health associated with fire. Psychological, familial, and social immediately come to mind. Further, in the case of Sam, famed for his epistemological intransigence, one would likely have to argue against his zero point of what constitutes rationality itself in order to further one’s position that physical health is very often a secondary value, as this sort of argument follows more a conversational arrangement of complex interdependent factors, than the straight rigorous logic Sam seemingly prefers

A lot of what's going on here is primarily frame control—setting the relevant scale on which a particular zero is then made salient. And that is not being done in the nice explicit friendly way.

He's not casting the sneaky dark-arts version of the spell

Sam Harris here is not casting a sneaky version of Tare Detrimens, but he's maybe (intentionally or not, benevolently or malevolently) casting a sneaky version of Fenestra Imperium.

Broadly an overall point that makes sense and feels good to me.

Something feels off or at least oversimplified to me in some of the cases, particularly these two lines of thinking:

There's no substantive disagreement between me and critics-of-my-blocking-policy about the difficulties that this imposes—the way it makes certain conversations tricky or impossible, the way it creates little gaps and blind spots in discussion and consensus.


As far as I could tell, both I and the admin team agreed about its absolute size; there were no disagreements about things like e.g. "broken links to previously written essays are a pain and a shame."

I found myself not actually trusting that there was "no disagreement about" about the nature or size in these cases. Maybe I would if I had more data about each situation, but something about how it's being written about raises suspicion for me. It's not per se than I think there was disagreement, but that I think the apparent agreement was on the level of the objective details (broken links etc) but that you didn't know how each other felt about it or what it meant to each other, and that if you'd more thoroughly seen the world through each others' eyes, it wouldn't seem like "zero point" is the relevant frame here.

One attempt to point at that:

It seems to me that without straightforward scales on which to measure things, or even getting clear on exactly what the units are, "setting the zero point" isn't even a real move that's available (intentionally or not) and I would expect people discussing in good faith to nonetheless end up with differences as a result of those.

Taking the latter case in particular, it seem likely to me (at least based on what you've written) that the LW admins were mostly tracking something like a sense of betrayal of expectations that people would have about LW as an ever-growing web of wisdom, and that feeling of betrayal is their units. And you're measuring something more on the level of "how much wisdom is on LW?" And from those two scales, two different natural zero points emerge:

  1. in removing the posts, LW goes from zero betrayal of the expectation of posts by default sticking around to more than zero betrayal of the expectation of posts sticking around
  2. in removing the posts, LW goes from more-than-zero wisdom on it from everybody's posts to less-than-before-but-still-more-than-zero wisdom on it with everybody's posts minus the Conor Moreton series (and in the meantime there was some more-than-zero temporary wisdom from those posts having gone up at all)

I noticed I ended up flipping the scales here, such that both are just zero-to-more-than-zero, even though one is more-than-zero of an unwanted thing. Not sure if that's incidental or relevant. Sometimes I've found in orienting to situations like this, one finds that there's only ever presence of something, never absence.

I'm not totally satisfied with this articulation but maybe it's a starting point for us to pick out whatever structure I'm noticing here.

No particular tips about Hamming Circles proper—I've run them a couple times but don't feel like I grokked how to run them well—but I'll put out that I've had some success with running longer events oriented towards Hamming Problems, shaped more like "here's 5 hours, broken into 25min pomodoros where you focus on making actual tangible progress towards something that's stuck, then during 5min breaks check in with a partner about how your focus is going".

Which on reflection is actually very similar to how the online goal-crafting intensives I've been running for years are structured, except with coaches instead of a buddy.

They feel like different-but-related thing to me. I would say that colorblindness can be simply that you haven't learned to differentiate some aspects of reality. A blindspot is not just something you can't see but a way in which you're actively hiding from yourself the fact that you can't see it. That's how I use the term "blindspot", which is perhaps downstream of Val / Michael Smith's "Metacognitive Blindspots" presentations (at eg the 2014 alumni reunion iirc, I forget where/when else). "Colorblindness" doesn't cut it for that meaning, so it's not the metaphor I want when I reach for "blindspot".

Having said that, it's a cool metaphor for this different thing, and I can see the temptation to stretch "blindspot" to cover it. Both look kinda similar from the outside if you try to give someone feedback about the thing they're not seeing. I'd say that if someone just has a colorblindness and no blindspot, they would tend to respond more curiously and you'd be able to make some headway starting to point out the dimension and how things vary along it. If someone has a blindspot, trying to talk with them about the thing they're not seeing will feel weird. You'll keep saying things and the conversation will sort of circle around the thing you're trying to point at, or it may feel hard to even put the thing into words while trying to translate it into the other person's frame, or hard to even stay in touch with the thing yourself while talking with the other person.

Load More