I haven't read this post super deeply yet, but obviously this is one of those excellent posts that's going to become a Schelling point for various semi-related gripes after a mere token skim, even though most of them have been anticipated already in the post!

Some of those gripes are:
- Near enemies: Once a term for a phenomenon is entrenched in a community, it's a lot a lot a lot of work to name anything that's close to it but not quite it. (See, for example, "goodhart" for what is IMO a very diverse and subtle cluster of clumsiness in holding onto intentionality.) Escaping near enemies is difficult in any community, but especially difficult in this community because there tend to be mathematical/formalist intuitions of "well I have Abstraction that captures The Other Thing as a special case", which is often true. But it is also often "false" in the sense of...
- Sneaking in Connotations: I'm sad ADBOC hasn't become as CK as it could, especially for checking the unconscious tools that receive salience. Usually, if you don't believe in "vibes" and the relevance of connotations, you tend to only do some denotational pattern-matching to check whether a concept applies. But then it sneaks in burdensome frames and stances to bring to bear anyway. Sure, basketball ultimately, technically, happens in physics. I'm not sure that looking for the relevant differential equations (which is what you often do in physics) is the best frame to bring to improve your shot! Further, connotations also very often carry information that just doesn't make it into your cleaned-up abstraction...
- Stickiness: ...and then it sticks. This is a feature, of course, as you name in point five: reification. You've made plenty of remarks about how the kata is useful even though it is not The Thing. That's great. But the fact of stickiness also implies that the underlying frame needs to be the relevant one for contexts where it is cued in people's minds, because it's now in some kind of vaguely zero-sum game with other frames. Now there are some transaction costs in switching to the other frame, taxing other perspectives. This is not even beginning to touch, though, the real taxing involved in...
- Reification bias: There's a spectrum of errors from taking "you only understand it if you can program it" too literally, forcefully narrowing all the kinds of clarity available to human brains and bodies. It's a short step from there to "it's only real if you can formalize it" and "it's only real if it can be made explicit" and "it's only real if it can be reified/sustained in some way", which last thing a lot of non-dual practices attempt to disabuse humans of. It's not like this is extremely difficult to recognize (although it can be for certain dispositions), but it is especially immune to being noticed if you believe in...
- Solving everything with more explicitness: I love how a lot of double binds are dissolved in this community by kind and skillful naming of things. This is greatly underpumped nearly everywhere else. But I think it's silly to automatically assume it is a safe defense for any issue. It isn't impossible that certain situations might benefit from limiting explicitness rather than unboundedly including ever more meta accounting.

In sum: The detailed models that frames and handles come with are great, but burdensome. When you camp around one frame, it's probably an oversight not to account for the transaction costs of switching to other frames (or even of instantiating to them, if you think your handle is general), roughly multiplied by how burdensome the current frame is.

Even if SEP was right about getting around the infinity problem and CK was easy to obtain before, it certainly isn't now! (Because there is some chance that whoever you're talking to has read this post, and whoever reads this post will have some doubt about whether the other believes that...)

Love this post overall! It's hard to overstate the importance of (what is believed to be) common knowledge. Legitimacy is, as Vitalik notes[1], the most important scarce resource (not just in crypto) and is likely closer to whatever we usually intend to name when we say "common knowledge", which this post argues (successfully IMO) is not actually common knowledge. 

It does seem like legitimacy is possible to model with p-CK, but I'm not convinced.[2] Nor do I know how substitutable p-CK is for my old notion of CK, or what it's good for! Which theorems can be rescued with p-CK where they depended on CK? Does Aumann's still hold with probability p upon p-CK of priors, or does the entire reasoning collapse? How careful do I have to be?
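
To make the questions above concrete, here is one standard way to formalize the idea, in the style of Monderer and Samet's common p-belief. I'm assuming this is roughly the notion of p-CK the post has in mind. The state space, the partitions, and the symbols below are my own notation, not the post's.

```latex
% Assumed setup (not from the post): finite state space \Omega, agents i \in I,
% information partitions \Pi_i, and posteriors P_i(\cdot \mid \Pi_i(\omega)).
\begin{align*}
B_i^p(E) &= \{\omega \in \Omega : P_i(E \mid \Pi_i(\omega)) \ge p\}
  && \text{(agent $i$ $p$-believes $E$)} \\
E^0 &= E, \qquad E^{n} = \bigcap_{i \in I} B_i^p(E^{n-1})
  && \text{(level-$n$ mutual $p$-belief)} \\
C^p(E) &= \bigcap_{n \ge 1} E^{n}
  && \text{(common $p$-belief)}
\end{align*}
```

Under this reading, setting p = 1 recovers the usual infinite hierarchy up to probability-zero events, so "which theorems survive?" becomes a question of how each proof degrades as p drops below 1.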

  1. He seems to only talk about finitely many layers of higher order knowledge, as does Duncan (for explicitly pedagogical reasons) in his post on CK and Miasma. I think this can be right, but if so, for complicated reasons. And it still leaves some seemingly self-undermining "rationality" in our frameworks.

  2. Mainly, I expect self-reference in legitimacy to be troublesome. A lot of things are legitimate because people think other people think they are legitimate, which seems enough like Löb's theorem that I worry about the Löbstacle.

I've been a longtime CK atheist (and have been an influence on Abram's post), and your comment is in the shape of my current preferred approach. Unfortunately, rational ignorance seems to require CK that agents will engage in bounded thinking, and not be too rational! 

(CK-regress like the above is very common and often non-obvious. It seems plausible that we must accept this regress, and that humans in fact need to be Created Already in Coordination, in analogy with Created Already in Motion.)

I think it is at least possible to attain p-CK in the case that there are enough people who aren't "inductively inclined". This sort of friction from people who aren't thinking too hard causes unbounded neuroticism to stop and allow coordination. I'm not yet sure if such friction is necessary for any agent or merely typical.

What are the standard doomy "lol no" responses to "Any AGI will have a smart enough decision theory to not destroy the intelligence that created it (i.e., us), because we're only willing to build AGI that won't kill us"?

(I suppose it isn't necessary to give a strong reason why acausality will show up in AGI decision theory, but one good one is that it has to be smart enough to cooperate with itself.)

Some responses that I can think of (but I can also counter, with varying success):

A. Humanity is racing to build an AGI anyway, this "decision" is not really enough in our control to exert substantial acausal influence

B. It might not destroy us, but it will likely permanently lock away our astronomical endowment and this is basically just as bad and therefore the argument is mostly irrelevant

C. We don't particularly care to preserve what our genes may "wish" to preserve, not even the acausal-pilled among us

D. Runaway, reckless consequentialism is likely to emerge long before a sophisticated decision theory that incorporates human values/agency, and so if there is such a trade to be had, it will likely be already too late

E. There is nothing natural about the categories carved up for this "trade" and so we wouldn't expect it to take place. If we can't even tell it what a diamond is, we certainly wouldn't share enough context for this particular acausal trade to snap into place

F. The correct decision theory will actually turn out to only one-box in Newcomb's and not in Transparent Newcomb's, and this is Transparent Newcomb's

G. There will be no "agent" or "decision theory" to speak of, we just go out with a whimper via increasingly lowered fidelity to values in the machines we end up designing

This is from ten minutes of brainstorming, so I'm sure it misses some important ones. Obviously, if there don't exist any good ones (ones without counters), that gives us reason to believe in alignment by default!

Keen to hear your responses.

When there's not a "right" operationalization, that usually means that the concepts involved were fundamentally confused in the first place.

Curious about the scope of the conceptual space where this belief was calibrated. It seems to me to tacitly say something like "everything that's important is finitely characterizable".

Maybe the "fundamentally confused" in your phrasing already includes the case of "stupidly tried to grab something that wasn't humanly possible, even if possible in principle" as one way for a human to be confused, without making any claim that reality is conveniently compressible at all levels. (Note that this link explicitly disavows beauty at "all levels" too.)

I suppose you might also say "I didn't make any claim of finiteness", but I do think something like "at least some humans are only a finite string away from grokking anything" is implicit if you expect there to be blogposts/textbooks that can operationalize everything relevant. It would be an even stronger claim than "finiteness": it would be "human-typical length strings".

I believe Adam is pointing at something quite important, akin to a McNamara fallacy for formalization. To paraphrase:

The first step is to formalize whatever can be easily formalized. This is OK as far as it goes. The second step is to disregard that which can't be easily formalized or to make overly simplifying assumptions. This is artificial and misleading. The third step is to presume that what can't be formalized easily really isn't important. This is blindness. The fourth step is to say that what can't be easily formalized really doesn't exist. This is suicide.

In the case of something that has already been engineered (human brains with agency), we probably should grant that it is possible to operationalize everything relevant. But I want to push back on the general version, and would want "why do you believe simple-formalization is possible here, in this domain?" to be allowed to be asked.

[PS. am not a native speaker]

It seems like one of the most useful features of having agreement separate from karma is that it lets you vote up the joke and vote down the meaning :)

Thanks for clarifying! And for the excellent post :)

Finally, when steam flows out to the world, and the task passes out of our attention, the consequences (the things we were trying to achieve) become background assumptions. 

To the extent that Steam-in-use is a kind of useful certainty about the future, I'd expect "background assumptions" to become an important primitive that interacts in this arena as well, given that it's a useful certainty about the present. I realize that's possibly already implicit in your writing when you say figure/ground.

I think some equivalent of Steam pops out as an important concept in enabling-agency-via-determinism (or requiredism, as Eliezer calls it), when you have in your universe both:

  • iron causal laws coming from deterministic physics and
  • almost iron "telic laws" coming from regulation by intelligent agents with something to protect.

The latter is something that can also become a very solid (full of Steam) thing to lean on for your choice-making, and that's an especially useful model to apply to your selves across time or to a community trying to self-organize. It seems very neglected, formally speaking. Economically-minded thinking tends to somewhat respect it as a static assumption, but not so much the dynamics of formation AFAIK (and so dynamic Steam is a pretty good metaphor).

However, shouldn't "things that have faded into the background" be the other kind of trivial, i.e., have "maximal Steam" rather than "no Steam"? It's like an action that will definitely take place. Something that will be in full force. Trivially common knowledge. You yourself seem to point at it with "Something with a ton of steam feels inevitable", but I suppose that's more like the converse.

(EDIT: Or at least something like that. If a post on the forum has become internalized by the community, a new comment on it won't get a lot of engagement, which fits with "losing steam" after it becomes "solid". But even if we want to distinguish where the action is currently, it makes sense to have a separate notion of what's finished and can easily re-enter attention compared to what was never started.)

Also when you say, in your sunk costs example, "no steam to spend time thinking", I'd say a better interpretation than "time thinking" would be "not enough self-trust to repledge solidity in a new direction". Time to think sounds to me more like Slack, but maybe I'm confused.

I'm unsure if open sets (or whatever generalization) are a good formal underpinning of what we call concepts, but I agree that, at the least, one needs to carefully reconsider the intuitions one takes for granted when working with a concept, once one is actually working with a negation-of-concept. And "believing in" might be one of those things that you can't really do with negation-of-concepts.

Also, I think a typo: you said "logical complement", I'm imagining you meant "set-theoretic complement". (This seems important to point out since in topological semantics for intuitionistic logic, the "logical complement" is in fact defined to be the interior of the set-theoretic complement, which guarantees an open.)
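
A minimal sketch of that distinction on a toy finite topology, in case it helps. The three-point space `X`, the family `OPENS`, and the function names are all hypothetical, chosen only for illustration.

```python
# Toy topology on X = {0, 1, 2} with nested open sets (an assumed example).
X = frozenset({0, 1, 2})
OPENS = {frozenset(), frozenset({0}), frozenset({0, 1}), X}


def interior(s):
    """Largest open set contained in s: the union of all opens inside s."""
    result = frozenset()
    for u in OPENS:
        if u <= s:
            result |= u
    return result


def set_complement(u):
    """Plain set-theoretic complement; need not be open."""
    return X - u


def logical_complement(u):
    """Intuitionistic negation: interior of the set-theoretic complement."""
    return interior(X - u)


U = frozenset({0})
print(set_complement(U) in OPENS)      # is the plain complement open?
print(logical_complement(U) in OPENS)  # is the logical complement open?
print(U | logical_complement(U) == X)  # does excluded middle hold for U?
```

In this toy space the set-theoretic complement of `{0}` is `{1, 2}`, which is not open, while the logical complement collapses to the empty set. So "U or not-U" fails to cover the whole space, which is the usual failure of excluded middle in this semantics.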

I began reading this charitably (unaware of whatever inside baseball is potentially going on and seems to be alluded to), but to be honest I struggled after "X" seemed to really want someone (Eliezer) to admit they're "not smart". I'm not sure why that would be relevant.

I think I found these lines especially confusing, if you want to explain:

  • I just hope that people can generalize from "alignment is hard" to "generalized AI capabilities are hard".

    Is capability supposed to be hard for similar reasons as alignment? Can you expand/link? The only argument I can think of relating the two (which I think is a bad one) is "machines will have to solve their own alignment problem to become capable." 
  • Eliezer is invalidating the second part of this but not the first.

    This would be a pretty useless machiavellian strategy, so I'm assuming you're saying it's happening for other reasons? Maybe self-deception? Can you explain?
  • Eliezer thinks that OpenAI will try to make things go faster rather than slower, but this is plainly inconsistent with things like the state of vitamin D research

    This just made me go "wha" at first but my guess now is that this and the bits above it around speech recognition seem to be pointing at some AI winter-esque (or even tech stagnation) beliefs? Is this right?
