

In Defense of Attempting Hard Things, and my story of the Leverage ecosystem

I, also, really appreciate Cathleen for writing this piece, and found it worth reading and full of relevant details. I'll try to add more substantive comments in a week or so, but wanted meanwhile to add my vote to those recommending that folks wanting to understand Leverage read this piece.

AnnaSalamon's Shortform

This is one of my bottlenecks on posting, so I'm hoping maybe someone will share thoughts on it that I might find useful:

I keep being torn between trying to write posts about things I have more-or-less understood already (which I therefore more-or-less know how to write up), and posts about things I presently care a lot about coming to a better understanding of (but where my thoughts are not so organized yet, and so trying to write about it involves much much use of the backspace, and ~80% of the time leads to me realizing the concepts are wrong, and going back to the drawing board).

I'm curious how others navigate this, or for general advice.

What would you like from How valuable would it be to you?

I continue to get a lot of value from, just as is. Partly using it myself and partly using it with friends/family who want help evaluating particular actions. Very grateful for this site.

The main additional feature that would be great for me would be help modeling how much of an update to make from Covid tests (e.g., how much does it help if everyone takes a rapid test before a gathering).
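To make the kind of update I mean concrete, here is a minimal sketch of the Bayes calculation, not anything the site actually implements. The function names and the sensitivity/specificity/prior numbers are made-up placeholders for illustration, not real characteristics of any particular rapid test.

```python
def posterior_infected(prior: float, sensitivity: float,
                       specificity: float, test_positive: bool) -> float:
    """Bayes update on P(infected) after one test result.

    sensitivity = P(positive | infected)
    specificity = P(negative | not infected)
    """
    if test_positive:
        p_result_if_infected = sensitivity
        p_result_if_healthy = 1.0 - specificity
    else:
        p_result_if_infected = 1.0 - sensitivity
        p_result_if_healthy = specificity
    numerator = prior * p_result_if_infected
    return numerator / (numerator + (1.0 - prior) * p_result_if_healthy)


def p_anyone_infected(per_person_probability: float, attendees: int) -> float:
    """Chance at least one attendee is infected, assuming independence."""
    return 1.0 - (1.0 - per_person_probability) ** attendees


# Hypothetical inputs: 1% prior per person, 80% sensitivity, 99% specificity.
PRIOR, SENSITIVITY, SPECIFICITY = 0.01, 0.80, 0.99

after_negative = posterior_infected(PRIOR, SENSITIVITY, SPECIFICITY, test_positive=False)
print(f"per-person risk after a negative test: {after_negative:.4%}")
print(f"10-person gathering, nobody tested:    {p_anyone_infected(PRIOR, 10):.4%}")
print(f"10-person gathering, all negative:     {p_anyone_infected(after_negative, 10):.4%}")
```

The independence assumption in `p_anyone_infected` is a simplification; attendees from the same household would not be independent, and a real model would also need to account for how test sensitivity varies with time since exposure.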

The Rationalists of the 1950s (and before) also called themselves “Rationalists”

Thanks! I appreciate knowing this. Do you happen to know if there's a connection between these 1950s rationalists and the "critical rationalists" (who are a contemporary movement that involves David Deutsch, the "taking children seriously" people, and some larger set of folks who try to practice a certain set of motions and are based out of the UK, I think)?

Frame Control

But to understand better: if I'd posted a version of this with fully anonymous examples, nothing specifically traceable to Leverage, would that have felt good to you, or would something in it still feel weird?

I'd guess the OP would’ve felt maybe 35% less uneasy-making to me, sans Geoff/Aubrey/“current” examples.

The main thing that bothers me about the post is related to, but not identical to, the post’s use of current examples:

I think the phenomena you’re investigating are interesting and important, but that the framework you present for thinking about them is early-stage. I don’t think these concepts yet “cleave nature at its joints.” E.g., it seems plausible to me that your current notion of “frame control” is a mixture of [some thing that’s actually bad for people] and mere disagreeableness (and that, for all I know, disagreeableness decreases rather than increases harms), as Benquo and Said variously argue. Or that this notion of “frame control” blends in some behaviors we’re used to tolerating as normal, such as leadership, as Matt Goldenberg argues. Or any number of other things.

I like that you’re writing about something early-stage! Particularly given that it seems interesting and important. But I wish you would do it in a way that telegraphs the early-stage-ness and lends momentum toward having readers join you as fellow scientists/philosophers/naturalists who are squinting at the phenomena together. There are a lot of kinds of sentences that can invite investigation. Some are explicit — stating explicitly something like “this is an early-stage conceptualization of a set of thingies we’re probably still pretty confused by, and so I’d like to invite you guys in to be fellow scientists/philosophers/naturalists with me about this stuff, including helping spot where this model is a bit askew.” Some are more ‘inviting it by doing parts of it yourself to make it easy for others to join’ — saying things like “my guess is that all of the examples I’m clustering under ‘frame control’ share a common structure; some of the reasons for my guess are [reasons]; I’m curious what you guys think about whether there’s a common structure and a single cluster here”. (A lot of this amounts to showing your scratchwork.)

If the post seemed mostly to invite being a fellow scientist/philosopher/puzzler with you about these thingies, while mostly not-inviting “immediate application to current events with the assumption that ‘frame control’ is a simple thing that we-as-a-group now understand” (it could still invite puzzling at current events, but would in my hoped-for world invite doing this while puzzling at where the causal joints are, how valid the ‘frame control’ concept is or isn’t and what is or isn’t central to it, a la rationalist taboo), I’d feel great about it.

Frame Control

I expect these topics are hard to write about, and that there’s value in attempting it anyway. I want to note that before I get into my complaints. So, um, thanks for sharing your data and thoughts about this hard-to-write-about (AFAICT) and significant (also AFAICT) topic!

Having acknowledged this, I’d like to share some things about my own perspective about how to have conversations like these “well”, and about why the above post makes me extremely uneasy.

First: there’s a kind of rigor that IMO the post lacks, and IMO the post is additionally in a domain for which such rigor is a lot more helpful/necessary than such rigor usually is.

Specifically: I can’t tell what the core claims of the OP are. I can’t easily ask myself “what would the world look like if [core claim X] was true? If it were false? what do I see?” “How about [core claim Y]”? “Are [X] and [Y] the best way to account for the evidence the OP presents, or are there unnecessary details tagging along with the conclusions that aren’t actually implied by the evidence?”, and so on.

I.e., the post’s theses are not factored to make evidence-tracking easy.

I care more about (separable claims, each separately trackable by evidence, laid out to make vetting easy) here than I usually would, because the OP is about politics (specifically, it is about what behaviors should lead to us “burning [those who do them] with fire” and ostracizing those folks from our polity). Politics is damn tricky stuff; political discussion in groups about who to exclude and what precedents to set up for why is damn tricky stuff.

I think Raemon’s comment is pretty similar to the point I’m trying to make here.

(Key to my reaction here is that this is a large public discussion. I’m worried that in such discussions, “X was claimed, and upvoted, and no one objected” may cause many readers to assume “X is now a vetted claim that can be assumed-and-cited when making future arguments.” I’m not sure if this is right; if it’s false, I care less.)

(Alternately put: I like this post fine for conversation-level discussion; it’s got some interesting examples and anecdotes and claims and hypotheses, seems worth reading and helpful-on-some-points. I don’t as much like it as a contribution to LW’s “vetted precedents that we get to cite when sorting through political cases”, because I think it doesn’t hit the fairly high and hard-to-hit standard required for such precedents to be on-net-not-too-confusing/“weaponizable”/something.)

I expect it’s slower to try to proceed via separable claims that we can separately track the evidence for/against, but on ground this tricky, slower seems worth it to me.

I’ve often failed at the standard I’m requesting here, but I’ll try to hit it in the future, and will be a good sport when people point out I’m dramatically failing at it.

Secondly, and relatedly: I am uneasy about the fact that many of the post’s examples are from a current conflict that is still being worked out (the rationalist community’s attempt to figure out how to relate to Geoff Anders). IMO, we are still in the process of evaluating both: a) Whether Geoff Anders is someone the rationalist community (or various folks in it) would do better to ostracize, in various senses; and b) Whether there really is a thing called “frame control”, what exactly it is, whether it’s bad, whether it should be “burned with fire,” etc.

I would much rather we try to prosecute conversation (a) and conversation (b) separately, rather than taking unvetted claims about what a new bad thing is and how to spot it, and relatively unvetted claims about Geoff, and using them to reinforce each other.

(If one is a prerequisite for the other, we could try to establish that one first, and then bring in the other.)

The reason I’d much rather they be done separately is that I don’t trust my own, or most others’, ability to track evidence when they’re done at once. The sort of confusion I get around this is similar to the confusion the OP describes frame-controllers as inducing with “buried claims”. If (a) and (b) are both cited as evidence for one another, it’s a bit tricky to pull out the claims, and I notice myself getting sort of dizzy as I read.

Hammering a bit more here, we get to my third source of unease: there are plenty of ways I can excerpt-and-paraphrase-uncharitably from the OP, that seem like kinds of things that ought not to be very compelling, and that I’d kind of expect would cause harm if a community found them compelling anyhow.

Uncharitable paraphrase/caricature: “Hey you guys. There’s a thing that is secretly very bad, but looks pretty normal. (So, discount your “this is probably fine”, “the argument for ostracism doesn’t seem very compelling here” reactions. (cf. “Finger-trap beliefs.”)) I know it’s really bad because my dad was really bad for me and my mom during my childhood, and this not-very-specified thingy was the central thing; I can’t give you enough of a description to allow independent evaluation of who’s doing it, but I can probably detect it myself and tell you which people are/aren’t doing (the central and vaguely specified bad thing). We should burn it with fire when we see it; my saying this may trigger your “wait, we should be empathetic” reactions, but ignore those because, let me tell you so that you know, I’m normally very empathetic, and I think this one vaguely specified thing should be burned with fire. So you guys should override a bunch of your usual heuristics and trust (me or whoever you think is good at spotting this vaguely specified thing) to decide which things we should collectively burn with fire.”

It’s possible there are protective factors that should make me not-worry about this post, even if I’m right that a reasonable person would worry about some other posts that fit my above caricature. But I don’t clearly see them, and would like help with that if they are here!

I like a bunch of the ending, about holding things lightly and so on. I feel like that is basically enough to make the post net-just-fine, and also helpful, for an individual reading this, who isn’t part of a community with the rest of the readers and the author — for such an individual, the post basically seems to me to be saying “sometimes you’ll find yourself feeling really crazy around somebody without knowing how to pin down why. In such a case, feel free to trust your own judgment and get out of there, if that’s what your actual unjustifiable best guess at what to do is.” This seems like fine advice! But in a community context, if we’re trying to arrive at collective beliefs about other people (which I’m not sure we’re doing, and I’m even less sure we should be doing; if we aren’t, maybe this is fine), such that we’re often deferring to other people’s guesses about what was and wasn’t “frame control” and whether that “frame control” maps onto a set of things that are really actually “burn it with fire” harmful and not similar in some other sense… I’m uneasy!

Cornell Meetup

I've known Lionel since high school, and can vouch for him if it's somehow helpful. Additional thoughts: He's good at math; he's new enough to AI alignment that having anyone local-to-him (e.g. at Cornell / in Ithaca) who wants to talk about this would probably help, so don't be shy or think you need much background; he cares about this stuff; he enjoys thinking and trying to get at truth, and I tend to find him fun to talk to.

My experience at and around MIRI and CFAR (inspired by Zoe Curzi's writeup of experiences at Leverage)

A CFAR board member asked me to clarify what I meant about “corrupt”, also, in addition to this question.

So, um. Some legitimately true facts the board member asked me to share, to reduce confusion on these points:

  • There hasn’t been any embezzlement. No one has taken CFAR’s money and used it to buy themselves personal goods.
  • I think if you took non-profits that were CFAR’s size + duration (or larger and longer-lasting), in the US, and ranked them by “how corrupt is this non-profit according to observers who people think of as reasonable, and who got to watch everything by video and see all the details”, CFAR would on my best guess be ranked in the “less corrupt” half rather than in the “more corrupt” half.

This board member pointed out that if I call somebody “tall” people might legitimately think I mean they are taller than most people, and if I agree with an OP that says CFAR was “corrupt” they might think I’m agreeing that CFAR was “more corrupt” than most similarly sized and durationed non-profits, or something.

The thing I actually think here is not that. It’s more that I think CFAR’s actions were far from the kind of straight-forward, sincere attempt to increase rationality, compared to what people might have hoped for from us, or compared to what a relatively untraumatized 12-year-old up-and-coming-LWer might expect to see from adults who said they were trying to save the world from AI via learning how to think. (IMO, this was made mostly via a bunch of people doing reasoning that they told themselves was intended to help with existential risk or with rationality or at least to help CFAR or do their jobs, but that was not as much that as the thing a kid might’ve hoped for. I think I, in my roles at CFAR, was often defensive and power-seeking and reflexively flinching away from things that would cause change; I think many deferred to me in cases where their own sincere, Sequences-esque reasoning would not have thought this advisable; I think we fled from facts where we should not have, etc.).

I think this is pretty common, and that many of us got it mostly from mimicking others at other institutions (“this is how most companies do management/PR/whatever; let’s dissociate a bit until we can ‘think’ that it’s fine”). But AFAICT it is not compatible (despite being common) with the kinds of impact we were and are hoping to have (which are not common), nor with the thing that young or sincere readers of the Sequences, who were orienting more from “what would make sense” and less from “how do most organizations act”, would have expected. And I think it had the result of wasting a bunch of good people’s time and money, and making it look as though the work we were attempting is intrinsically low-reward, low-yield, without actually checking to see what would happen if we tried to locate rationality/sanity skills in a simple way.

I looked at the Wikipedia article on corruption to see if it had helpful ontology I could borrow. I would say that the kind of corruption I am talking about is “systemic” corruption rather than individual, and involved “abuse of discretion”.

A lot of what I am calling “corruption” — i.e., a lot of the systematic divergence between the actions CFAR was taking, and the actions that a sincere, unjaded, able-to-actually-talk-to-each-other version of us would’ve chosen for CFAR to take, as a best guess for how to further our missions — came via me personally, since I was in a leadership role manipulating the staff of CFAR by giving them narratives about how the world would be more saved if they did such-and-such (different narratives for different folks), and looking to see how they responded to these narratives in order to craft different ones. I didn’t say things I believed false, but I did choose which things to say in a way that was more manipulative than I let on, and I hoarded information to have more control of people and what they could or couldn’t do in the way of pulling on CFAR’s plans in ways I couldn’t predict, and so on. Others on my view chose to go along with this, partly because they hoped I was doing something good (as did I), partly because it was way easier, partly because we all got to feel as though we were important via our work, partly because none of us were fully conscious of most of this.

This is “abuse of discretion” in that it was using places in which my and our judgment had institutional power because people trusted me and us, and making those judgments via a process that was predictably going to have worse rather than better outcomes, basically in my case via what I’ve lately been calling narrative addiction.

I love the people who work at CFAR, both now and in the past, and predict that most would make your house or organization or whatnot better if you live with or hire them or similar. They’re bringing a bunch of sincere goodwill, willingness to try what is uncomfortable (not fully, but more than most, and enough that I admire it and am impressed a lot), attempting better epistemic practices than I see most places where they know how to, etc. I’m afraid to say paragraphs like the ones preceding this one lest I cause people who are quite good as people in our social class go, and who sacrificed at my request in many cases, to look bad.

But in addition to the common human pastime of ranking all of us relative to each other, figuring out who to scapegoat and who to pass other relative positive or negative judgments on, there is a different endeavor I care very much about: one of trying to see the common patterns that’re keeping us stuck. Including patterns that may be pretty common in our time and place, but that (I think? citation needed, I’ll grant) may have been pretty uncommon in the places where progress historically actually occurred.

And that is what I was so relieved to see Jessica’s OP opening a beginning of a space for us to talk about. I do not think Jessica was saying CFAR was unusually bad; she estimates it was on her best guess a less traumatizing place than Google. She just also tries to see through lines between patterns across places, in ways I found very relieving and hopeful. Patterns I strongly resisted seeing for most of the last six years. It’s the amount of doublethink I found in myself on the topic, more than almost any of the rest of it, that most makes me think “yes there is a non-trivial insight here, that Jessica has and is trying to convey and that I hope eventually does get communicated somehow, despite all the difficulties of talking about it so far.”

Self-Integrity and the Drowning Child

Equally importantly IMO, it argues for transfer from a context where the effect of your actions is directly perceptionally obvious to one where it is unclear and filters through political structures (e.g., aid organizations and what they choose to do and to communicate; any governments they might be interacting with; any other players on the ground in the distant country) that will be hard to model accurately.

My experience at and around MIRI and CFAR (inspired by Zoe Curzi's writeup of experiences at Leverage)

In the last two years, CFAR hasn't done much outward-facing work at all, due to COVID, and so has neither been a MIRI funnel nor definitively not a MIRI funnel.

Yes, but I would predict that we won't be the same sort of MIRI funnel going forward. This is because MIRI used to have specific research programs that it needed to hire for, and it was sponsoring AIRCS (covering direct expenses plus loaning us some researchers to help run the thing) in order to recruit for that, and those research programs have been discontinued and so AIRCS won't be so much of a thing anymore.

This has been the main part of why no AIRCS post vaccines, not just COVID.

I, and I would guess some others at CFAR, am interested in running AIRCS-like programs going forward, especially if there are groups that want to help us pay the direct expenses for those programs and/or researchers that want to collaborate with us on such programs. (Message me if you're reading this and in one of those categories.) But it'll be less MIRI-specific this time, since there isn't that recruiting angle.

Also, more broadly, CFAR has adopted different structures for organizing ourselves internally, and we are bigger now into "if you work for CFAR, or are a graduate of our instructor training program, and you have a 'telos' that you're on fire to do, you can probably do it with CFAR's venue/dollars/collaborations of some sorts" (we're calling this "platform CFAR," Elizabeth Garrett invented it and set it up maybe about a year ago, can't remember), and also into doing hourly rather than salaried work in general (so we don't feel an obligation to fill time with some imagined 'supposed to do CFAR-like activity' vagueness, so that we can be mentally free) and are also into taking more care not to have me or anyone speak for others at CFAR or organize people into a common imagined narrative one must pretend to believe, but rather into letting people do what we each believe in, and try to engage each other where sensible. Which makes it a bit harder to know what CFAR will be doing going forward, and also leaves me thinking it'll have a bit more variety in it. Probably.
