All of AnnaSalamon's Comments + Replies

The public early Covid-19 conversation (in like Feb-April 2020) seemed pretty hopeful to me -- decent arguments, slow but asymmetrically correct updating on some of those arguments, etc.  Later everything became politicized and stupid re: covid.

Right now I think there's some opportunity for real conversation re: AI.  I don't know what useful thing follows from that, but I do think it may not last, and that it's pretty cool.  I care more about the "an opening for real conversation" thing than about the changing Overton window as such, although I think the former probably follows from the latter (first encounters are often more real somehow).

This seems like a very off-distribution move from Eliezer—which I suspect is in large part the point: when your model predicts doom by default, you go off-distribution in search of higher-variance regions of outcome space.

That's not how I read it.  To me it's an attempt at the simple, obvious strategy of telling people ~all the truth he can about a subject they care a lot about and where he and they have common interests.  This doesn't seem like an attempt to be clever or explore high-variance tails.  More like an attempt to explore the obvious strategy, or to follow the obvious bits of common-sense ethics, now that lots of allegedly clever 4-dimensional chess has turned out stupid.

I don't think what you say, Anna, contradicts what dxu said. The obvious simple strategy is now being tried, because the galaxy-brained strategies don't seem like they are working; the galaxy-brained strategies seemed lower-variance and more sensible in general at the time, but now they seem less sensible so EY is switching to the higher-variance, less-galaxy-brained strategy.

-9Thoth Hermes3d
But it does risk giving up something.  Even the average tech person on a forum like Hacker News still thinks the risk of an AI apocalypse is so remote that only a crackpot would take it seriously.   Their priors regarding the idea that anyone of sense could take it seriously are so low that any mention of safety seems to them a fig-leaf excuse to monopolize control for financial gain; as believable as Putin's claims that he's liberating the Ukraine from Nazis.  (See my recent attempt to introduce the idea here.) The average person on the street is even further away from this I think. The risk then of giving up "optics" is that you lose whatever influence you may have had entirely; you're labelled a crackpot and nobody takes you seriously.  You also risk damaging the influence of other people who are trying to be more conservative.  (NB I'm not saying this will happen, but it's a risk you have to consider.) For instance, personally I think the reason so few people take AI alignment seriously is that we haven't actually seen anything all that scary yet.  If there were demonstrations of GPT-4, in simulation, murdering people due to mis-alignment, then this sort of a pause would be a much easier sell.  Going full-bore "international treaty to control access to GPUs" now introduces the risk that, when GPT-6 is shown to murder people due to mis-alignment, people take it less seriously, because they've already decided AI alignment people are all crackpots. I think the chances of an international treaty to control GPUs at this point are basically zero.  I think our best bet for actually getting people to take an AI apocalypse seriously is to demonstrate an un-aligned system harming people (hopefully only in simulation), in a way that people can immediately see could extend to destroying the whole human race if the AI were more capable.  (It would also give all those AI researchers something more concrete to do: figure out h

Thanks for the suggestion.  I haven't read it.  I'd thought from hearsay that it is rather lacking in "light" -- a bunch of people who're kinda bored and can't remember the meaning of life -- is that true?  Could be worth it anyway.

1Aatu Koskensilta3d
It's heavily implied in the novels we only see the "disaffected" lot -- people who experience ennui, etc. and are drawn to find meaning out of a sense of meaninglessness even in somewhat inadvisable ways -- and the whole of Culture is mostly exploring the state space of consciousness and the nature of reality, sort of LARPing individual humanity as a mode of exploration -- you can for instance upgrade yourself from a humanoid into something resembling a Mind to a degree if you want to, it just seems this is not the path we mostly see mentioned. It's just that that sort of thing is not narratively exciting for most people, and Banks is, after all, in the entertainment business in a sense. There are interesting themes explored in the books that go beyond just the "cinematic fireworks and a sense of scale". For instance, it is suggested that the Culture could have the option to simply opt out of Samsara, but refuses to do this out of the suspicion that the possibility of Sublimation -- collectively entering Nirvana -- would be to cop out, preventing them from helping sentient beings. (There's a conflation of sapience and sentience in the books, and disregard for the plight of sentient beings who are not "intelligent" to a sufficient degree, but otherwise there's an underlying sentientist/truth-seeking slant to it.)  The Minds of Culture are also represented to be basically extremely sophisticated consequentialists with appreciation for "Knightian uncertainty" and wary about total certainty about their understanding of the nature of reality, although it's not clear if they're e.g. super intelligent negative utilitarian Bodhisattva beings -- in the Culture world there seems still to be belief in individual, metaphysically enduring personal identity extending to the Minds themselves, but it might also be that this is again a narrative device -- or some sort of anti-realists about ethics but on the side of the angels just for the heck of it, because why not, what else could t

Not sure where you're going with this.  It seems to me that political methods (such as petitions, public pressure, threat of legislation) can be used to restrain the actions of large/mainstream companies, and that training models one or two OOM larger than GPT4 will be quite expensive and may well be done mostly or exclusively within large companies of the sort that can be restrained in this sort of way.

Maybe also: anything that bears on how an LLM, if it realizes it is not human and is among aliens in some sense, might want to relate morally to thingies that created it and aren't it.  (I'm not immediately thinking of any good books/similar that bear on this, but there probably are some.)

The Mote in God's Eye is about creatures that feel heavily misaligned with their evolutionary selection filters. Golem XIV is about an advanced AI trying to explain things about how our biological selection filters created weird spandrels in consciousness.

I was figuring GPT4 was already trained on a sizable fraction of the internet, and GPT5 would be trained on basically all the text (plus maybe some not-text, not sure).  Is this wrong?

4the gears to ascension4d
Oh hmm - that could be true. I suspect that data curation is too important though, there are significant gains to be had by not including confusing data as positive examples.

In terms of what kinds of things might be helpful:

1. Object-level stuff:

Things that help illuminate core components of ethics, such as "what is consciousness," "what is love," "what is up in human beings with the things we call 'values', that seem to have some thingies in common with beliefs," "how exactly did evolution end up producing the thing where we care about stuff and find some things worth caring about," etc.

Some books I kinda like in this space: 

  • Martin Buber's book "I and thou"; 
  • Christopher Alexander's writing, especially his "The Natur
... (read more)

It may not be possible to prevent GPT4-sized models, but it probably is possible to prevent GPT-5-sized models, if the large companies sign on and don't want it to be public knowledge that they did it.  Right?

1the gears to ascension4d
Not for long. Sure, maybe it's a few months.

As a personal datapoint: I think the OP's descriptions have a lot in common with how I used to be operating, and I think this would have been tremendously good advice for me personally, both in terms of its impact on my personal wellness and in terms of its impact on whether I did good-for-the-world things or harmful things.

(If it matters, I still think AI risk is a decent pointer at a thingy in the world that may kill everyone, and that this matters.  The "get sober" thing is a good idea both in relation to that and broadly AFAICT.)

Thank you for adding your personal data point. I think it's helpful in the public space here. But also, personally I liked seeing that this is (part of) your response. I totally agree.

Nope, haven't changed it since publication.

I like this observation.  As a random note, I've sometimes heard people justifying "leave poor working conditions in place for others, rather than spending managerial time improving them" based on how AI risk is an emergency, though whether this checks out on a local consequentialist level is not actually analyzed by the model above, since it partly involves tradeoffs between people and I didn't try to get into that.

I sorta also think that "people acting on a promise of community and support that they later [find] [isn't] there" is sometimes done semi... (read more)

5Ben Pace8mo
Personally I think of people as more acting out their dream, because reality seems empty. Like the cargo culters, praying to a magical skyplane that will never arrive. Sure, you can argue to them that they're wasting their time. But they don't have any other idea about how to get skyplanes, and the world is a lot less... magical without them. So they keep manning their towers and waving their lights.

I can think of five easily who spontaneously said something like this to me and who I recall specific names and details about.  And like 20 more who I'm inclined to project it onto but there was some guesswork involved on my part (e.g., they told me about trouble having hobbies and about feeling kinda haunted by whether it's okay to be "wasting" their time, and it seemed to me these factors were connected, but they didn't connect them aloud for me; or I said I thought there was a pattern like this and they nodded and discussed experiences of theirs bu... (read more)

If you get covid (which many of my friends seem to be doing lately), and your sole goal is to minimize risk of long-term symptoms, is it best to take paxlovid right away, or with a delay?

My current low-confidence guess is that it is best with a delay of ~2 days post symptoms.  Would love critique/comments, since many here will face this sometime this year.

Basic reasoning: anecdotally, "covid rebound" seems extremely common among those who get paxlovid right away, probably also worse among those who get paxlovid right away.  Paxlovid prevents vira... (read more)

Maybe.  But a person following up on threads in their leisure time, and letting the threads slowly congeal until they turn out to turn into a hobby, is usually letting their interests lead them initially without worrying too much about "whether it's going anywhere," whereas when people try to "found" something they're often trying to make it big, trying to make it something that will be scalable and defensible.  I like that this post is giving credit to the first process, which IMO has been historically pretty useful pretty often.  I'd also ... (read more)

Many, though not all, of the "gentlemen scientists" were an intensely competitive bunch. They didn't typically have to scale and defend their discoveries by building large organizations, because they were producing scientific knowledge. Their interests were guided by the interests of their contemporaries, or by the pressing issues of their day, as well as their own enthusiasms.

For example, Joseph Montgolfier started building parachutes around age 35. About 7 years later:

... he was watching a fire one evening while contemplating one of the great military

... (read more)

I appreciate this comment a lot.  Thank you.  I appreciate that it’s sharing an inside view, and your actual best guess, despite these things being the sort of thing that might get social push-back!

My own take is that people depleting their long-term resources and capacities is rarely optimal in the present context around AI safety.

My attempt to share my reasoning is pretty long, sorry; I tried to use bolding to make it skimmable.

In terms of my inside-view disagreement, if I try to reason about people as mere means to an end (e.g. “labor”):

0. ... (read more)

Thanks for all these comments. I agree with a bunch of this. I might try later to explain more precisely where I agree and disagree.

I think this is a solid point, and that pointing out the asymmetry in evolutionary gradients is important; I would also expect different statistical distributions for men and women here.  At the same time, my naive ev psych guess about how all this is likely to work out would also take into account that men and women share genes, and that creating gender-specific adaptations is actually tricky.  As evidence: men have nipples, and those nipples sometimes produce drops of milk.

Once, awhile ago and outside this community, a female friend swore me to... (read more)

I've known several men who had sexual encounters with women that... labeling them is hard, let's say the encounters left them unhappy, and would have been condemned if the sexes had been reversed. These men encountered a damaging amount of pushback and invalidation when they tried to discuss their feelings about those encounters.  One was literally told "I hope you were grateful", for others the invalidation was more implicit.  For at least 2, me saying "that sounds fucked up" and then listening was an extremely helpful novelty. So I'm really ner... (read more)

A bunch of people have told me they got worse at having serious/effortful intellectual hobbies, and at "hanging out", after getting worried about AI.  I did, for many long years.  Doesn't mean it's not an "excuse"; I agree it would be good to try to get detailed pictures of the causal structure if we can.

In fairness, a lot of these things (clothes, hairstyles, how "hard core" we can think we are based on working hours and such) have effects on our future self-image, and on any future actions that're mediated by our future self-image.  Maybe they're protecting their psyches from getting eaten by corporate memes, by refusing to cut their hair and go work there.

I suspect we need to somehow have things less based in self-image if we are to do things that're rooted in fresh perceptions etc. in the way e.g. science needs, but it's a terrifying transition.

I may have been primed to interpret this post in those terms too much, because I perceived it to be a reaction to Eliezer's recent doomy-sounding blog posts (and people worrying about AI more than usual recently because of that, plus ML news, plus various complicated social dynamics), trying to prevent the community from 'going too far' in certain directions. ... But it sounds like I may be imposing context on the post that isn't the way you were thinking about it while writing it.

Oh, yeah, maybe.  I was not consciously responding to that.  I was... (read more)

In terms of "and those people who care will be broad and varied and trying their hands at making movies and doing varied kinds of science and engineering research and learning all about the world while keeping their eyes open for clues about the AI risk conundrum, and being ready to act when a hopeful possibility comes up" we're doing less well compared to my 2008 hopes. I want to know why and how to unblock it.

I think to the extent that people are failing to be interesting in all the ways you'd hoped they would be, it's because being interesting in th... (read more)

I make this point not to argue against finding love or starting a family, but to argue against a mindset that treats AGI and daily life as more or less two different magisteria….

It still doesn't feel to me like it's fully speaking as though the two worlds are one world.


The situation is tricky, IMO.  There is, of course, at the end of the day only one world.  If we want to have kids who can grow up to adulthood, and who can have progeny of their own, this will require that there be a piece of universe hospitable to human life where they can do... (read more)

That makes sense to me, and it updates me toward your view on the kid-having thing. (Which wasn't the focus of my comment, but is a thing I was less convinced of before.) I feel sad about that having happened. :( And curious about whether I (or other people I know) are making a similar mistake.

(My personal state re kids is that it feels a bit odd/uncanny when I imagine myself having them, and I don't currently viscerally feel like I'm giving something up by not reproducing. Though if I lived for centuries, I suspect I'd want kids eventually in the same wa... (read more)

A friend emailed me a comment I found helpful, which I am copying here with their permission: 

"To me [your post] sounded a bit like a lot of people are experiencing symptoms similar to ADHD: both becoming hyperfocused on a specific thing and having a lot of habits falling apart. Makes sense conceptually if things labeled as emergencies damage our attention systems. I think it might have to do with a more general state of stress/future-shock where people have to go into exception-handling mode more often. As exceptions become normalized the s... (read more)

But it still feels to me like it's a post trying to push the pendulum in a particular direction, rather than trying to fully and openly embody the optimal-by-your-lights Balancing Point.


AFAICT, I am trying to fully and openly embody the way of reasoning that actually makes sense to me in this domain, which… isn’t really a “balancing point.”  It’s more like the anarchist saying “the means are the ends.”  Or it’s more like Szilard’s “ten commandments,” (which I highly recommend reading for anyone who hasn’t; they’re short).  Or more like... (read more)

8Rob Bensinger8mo
Yeah, that makes sense to me. I'm complaining about a larger class of posts, so maybe this one isn't really an example and I'm just pattern-matching. I do still wish there existed more posts that were very obviously examples of the 'both-and' things I was pointing at. (Both dentist appointments and Dyson spheres; both embrace slack and embrace maxipok; etc.) It might be that if my thinking were clearer here, I'd be able to recognize more posts as doing 'both-and' even if they don't explicitly dwell on it as much as I want.

I want to have a dialog about what’s true, at the level of piece-by-piece reasoning and piece-by-piece causes.  I appreciate that you Rob are trying to do this; “pedantry” as you put it is great, and seems to me to be a huge chunk of why LW is a better place to sort some things out than is most of the internet.

I’m a bit confused that you call it “pedantry”, and that you talk of my post as trying to push the pendulum in a particular way, and “people trying to counter burnout,” and whether this style of post “works” for others.  The guess I’m formi... (read more)

I think of this in terms of personal vs. civilization-scale value loci distinction. Personal-scale values, applying to individual modern human minds, speaking of those minds, might hold status quo anchoring sacred and dislike presence of excessive awareness of disruptive possible changes. While civilization-scale values, even as they are facilitated by individuals, do care about accurate understanding of reality regardless of what it says. People shouldn't move too far towards becoming decision theoretic agents, even if they could, other than for channeling civilization. The latter is currently a necessity (that's very dangerous to neglect), but it's not fundamentally a necessity. What people should move towards is a more complicated question with some different answer (which does probably include more clarity in thinking than is currently the norm or physiologically possible, but still). People are vessels of value, civilization is its custodian. These different roles call for different shapes of cognition. In this model, it's appropriate / morally-healthy / intrinsically-valuable for people to live more fictional lives (as they prefer) while civilization as a whole is awake, and both personal-scale values and civilization-scale values agree on this point.

I want to have a dialog about what’s true, at the level of piece-by-piece reasoning and piece-by-piece causes.  I appreciate that you Rob are trying to do this; “pedantry” as you put it is great, and seems to me to be a huge chunk of why LW is a better place to sort some things out than is most of the internet.

Yay! I basically agree. The reason I called it "pedantry" was because I said it even though (a) I thought you already believed it (and were just speaking imprecisely / momentarily focusing on other things), (b) it's an obvious observation that a... (read more)

Yes!  I am really interested in this sort of dynamic; for me things in this vicinity were a big deal I think.  I have a couple half-written blog posts that relate to this that I may manage to post over the next week or two; I'd also be really curious for any detail about how this seemed to be working psychologically in you or others (what gears, etc.).  

I have been using the term "narrative addiction" to describe the thing that in hindsight I think was going on with me here -- I was running a whole lot of my actions off of a backchain from a... (read more)

Pain is Not The Unit of Effort, as well as the "Believing in free will to earn merit" example under Beliefs as Emotional Strategies, also seem relevant.
My best guess at mechanism: 1. Before, I was a person who prided myself on succeeding at marshmallow tests. This caused me to frame work as a thing I want to succeed on, and work too hard. 2. Then, I read Meaningful Rest and Replacing Guilt, and realized that oftentimes I was working later to get more done that day, even though it would obviously be detrimental to the next day. This makes the reverse marshmallow test dynamic very intuitively obvious. 3. Now I am still a person who prides myself on my marshmallow prowess, but hopefully I've internalized an externality or something. Staying up late to work doesn't feel Good and Virtuous, it feels Bad and like I'm knowingly Goodharting myself. Note that this all still boils down to narrative-stuff. I'm nowhere near the level of zen that it takes to Just Pursue The Goal, with no intermediating narratives or drives based on self-image. I don't think this patch has particularly moved me towards that either, it's just helpful for where I'm currently at.

I agree that many of those who decide to drop everything to work on AI expect AI sooner than that.  (Though far from all.)

It seems to me though that even if AI is in fact coming fairly soon, e.g. in 5 years, this is probably still not-helpful for reducing AI risk in most cases, compared to continuing to have hobbies and to not eat one's long-term deep interests and spiritual health and ability to make new sense of things.

Am I missing what you're saying?

I agree that the time frame of 5-30 years is more like a marathon than a sprint, but those you are talking about treat it like a sprint. If there were a clear low-uncertainty estimate of "we have to finish in 5 years, and we have 10 years' worth of work to do," it would make sense to get cracking and put everything else on hold. But it seems like a more realistic estimate is "the TAI timeline is between a few years and a few decades, and we have no clue how much work AI Safety entails, or if it is even an achievable goal. Worse, we cannot even estimate the effort required to figure out if the goal is achievable, or even meaningful." In this latter case, it's a marathon of unknown length, and one has to pace oneself. I wonder if this message is intentionally minimized to keep the sense of urgency going.

One substantive issue I didn’t manage to work into the OP, but am interested in, is a set of questions about memetics and whether memetics is one of the causes of how urgent so many people seem to find so many causes.

A section I cut from the OP, basically  because it's lower-epistemic-quality and I'm not sure how relevant it is or isn't to the dynamics I kept in the OP, but that I'd like to throw into the comments section for discussion:

Memetics sometimes leads to the amplification of false “emergencies”

Once upon a time, my former housemate Steve Ra... (read more)

You managed to cut precisely the part of the post that was most useful for me to read :) 

(To be clear, putting it in this comment was just as helpful, maybe even more-so.)

I wrote: 
>... Also, I am excited about people trying to follow paths to all of their long-term goals/flourishing, including their romantic and reproductive goals, and I am actively not excited about people deciding to shelve that because they think AI risk demands it. 

Justis (amid a bunch of useful copy-editing comments) said he does not know what "actively not excited" is supposed to mean, and suggested that maybe I meant "worried."  I do not mean "worried", and do mean "actively not excited": when people do this, it makes me less excited by and hopeful about the AI risk movement; it makes me think we're eating our seedcorn and have less here to be excited about.

There’s a lot I want to try to tell LessWrong about.  A lot of models, perceptions, thoughts, patterns of thinking.  It’s been growing and growing for me over the last several years.

A lot of the barrier to me posting it has been that I am (mostly unendorsedly) averse to publishing drafts that’re worse than my existing blog posts, or that may not make sense to people, or that talk about some things without having yet talked about other things that I care more about, or etc.  This aversion seems basically mistaken to me because “trial and erro... (read more)

I missed this comment when it first went up. FYI this problem is also part of what shortform is for – you can get half-formed ideas out there, and then if they turn out to be pretty-close-to-publishable-as-top-level-post you can repost them. (Oliver used to do some publishing of his thoughts via shortform, and then later republish them as posts)
4Adam Zerner9mo
That sounds awesome! I have similar feelings. This is how I think about it. I don't feel great about this as a way of explaining it, but perhaps it'd be useful. Think about posts as forming some sort of spectrum. On one end (let's say the right side) you've got something like a book. The ideas have been refined. The author spent a ton of time researching it, coming up with great examples, revising it, doing user testing on people, having professional editors look at it, etc. Next to a book maybe you've got something like an academic journal article. Next to that maybe an essay, or a blog post where a lot of effort has been put into it. Then on the other end of the spectrum (left side) you've got maybe notes that are scribbled on the back of a napkin. Just the raw seeds of an idea. Then maybe after that you take that napkin home with you and expand a bit about those thoughts in a personal journal, but still very informal and unrefined. Then maybe you text a friend about it. Then maybe you email another friend. Then maybe posting on e.g. the LW shortform. See, there's a spectrum. If you buy that there is this spectrum, which I think is pretty self-evident, it raises the question of how well we (LW? Rationality community? Society?) are doing at providing a platform for people at various points along that spectrum. I think that LW does a good job in the vicinity of "well researched blog post", but for the sorts of things at the left side of the spectrum, I don't really feel like LW addresses it. And I think that it is a cultural problem, not a technical one. We have things like Shortform, Open Thread, and various Slack and Discord groups. It's just that, at least IME, people don't use it for things that are on the left side of the spectrum, and thus it feels uncomfortable if you are doing things on the left side of the spectrum, even if e.g. the Personal Blog Posts are explicitly intended for "left side of the spectrum" types of thoughts. So bringing this back full circle,

Justis and Ruby made a bunch of good substantive comments on my draft, and also Justis made a bunch of very helpful typo-fixes/copy-editing comments on my draft.

I fixed the copy-editing ones but mostly did not respond to the substantive ones, though I liked them; I am hoping some of that discussion makes its way here, where it can happen in public.

> I've thought a bit about ideas like this, and talked to much smarter people than myself about such ideas - and they usually dismiss them, which I take as a strong signal this may be a misguided idea. 

I honestly don’t know whether slowing down AI progress in these ways is/isn’t a good idea.  It seems plausibly good to me.  I do think I disagree about whether the “much smarter people”s dismissal of these ideas is a strong signal.

Why I disagree about the strong signal thing:

I had to push through some fear as I wrote the sentence about it s... (read more)

It seems hard to make the numbers come out that way. E.g. suppose human-level AGI in 2030 would cause a 60% chance of existential disaster and a 40% chance of existential disaster becoming impossible, and human-level AGI in 2050 would cause a 50% chance of existential disaster and a 50% chance of existential disaster becoming impossible. Then to be indifferent about AI timelines, conditional on human-level AGI in 2050, you'd have to expect a 1/5 probability of existential disaster from other causes in the 2030-2050 period. (That way, with human-level AGI in 2050, you'd have a 1/2 * 4/5 = 40% chance of surviving, just like with human-level AGI in 2030.) I don't really know of non-AI risks in the ballpark of 10% per decade. (My guess at MIRI people's model is more like 99% chance of existential disaster from human-level AGI in 2030 and 90% in 2050, in which case indifference would require a 90% chance of some other existential disaster in 2030-2050, to cut 10% chance of survival down to 1%.)
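The arithmetic above can be checked with a small sketch. This is only an illustration of the indifference condition as the comment states it: the function name `indifference_other_risk` is hypothetical, and the model assumes (as the comment does) that surviving the interim period and surviving AGI itself multiply as independent factors.

```python
from fractions import Fraction


def indifference_other_risk(p_survive_early: Fraction, p_survive_late: Fraction) -> Fraction:
    """Interim non-AI disaster probability r at which early and late AGI
    give the same overall chance of survival.

    Assumed model: survival with late AGI = (1 - r) * p_survive_late.
    Setting that equal to p_survive_early and solving for r gives
    r = 1 - p_survive_early / p_survive_late.
    """
    return 1 - p_survive_early / p_survive_late


# First example: 40% survival with AGI in 2030, 50% with AGI in 2050.
first = indifference_other_risk(Fraction(2, 5), Fraction(1, 2))
print(first)  # 1/5

# Second example: 1% survival with AGI in 2030, 10% with AGI in 2050.
second = indifference_other_risk(Fraction(1, 100), Fraction(1, 10))
print(second)  # 9/10
```

Exact fractions are used so the results match the 1/5 and 90% figures in the comment without floating-point rounding.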
All of this makes sense, and I do agree that it's worth consideration (I quadruple upvoted the check mark on your comment). Mainly in-person conversations, since the absolute worst case scenario with in-person conversations is that new people learn a ton of really good information about the nitty-gritty problems with mass public outreach; such as international affairs. I don't know if there's a knowable upper bound on how wayward/compromised/radicalized this discussion could get if such discussion takes place predominantly on the internet. I'd also like to clarify that I'm not "interpreting this silence as evidence", I've talked to AI policy people, and I also am one, and I understand the details of why we reflexively shoot down the idea of mass public outreach. It all boils down to ludicrously powerful, territorial, invisible people with vested interests in AI, and zero awareness of what AGI is or why it might be important (for the time being).

I didn't follow CFAR that closely, so I don't know how transparent you were that this was a MIX of rationality improvement AND AI-Safety evangelism.

How transparent we were about this varied by year.  Also how much different ones of us were trying to do different mixes of this by different programs varied by year, which changed the ground truth we would've been being transparent about.  In the initial 2012 minicamps, we were part of MIRI still legally and included a class or two on AI safety.  Then we kinda dropped it from the official stuff,... (read more)

I do not think this is true. I snapped to 'Oh God this is right and we're all dead quite soon' as a result of reading a short story about postage stamps something like fifteen years ago, and I was totally innocent of Bayesianism in any form.  It's not a complicated argument at all, and you don't need any kind of philosophical stance to see it.  I had exactly the same 'snap' reaction to my first exposure to ideas like global warming, overpopulation, Malthus, coronavirus, asteroids, dysgenics, animal suffering, many-worlds, euthanasia, etc ad inf. Just a few clear and simple facts, and maybe a bit of mathematical intuition, but nothing you wouldn't get from secondary school, lead immediately to a hideous or at least startling conclusion. I don't know what is going on with everyone's inability to get these things. I think it's more a reluctance to take abstract ideas seriously. Or maybe needing social proof before thinking about anything weird.  I don't even think it's much to do with intelligence. I've had conversations with really quite dim people who nevertheless 'just get' this sort of thing. And many conversations with very clever people who can't say what's wrong with the argument but nevertheless can't take it seriously.  I wonder if it's more to do with a natural immunity to peer pressure, and in fact, love of being contrarian for the sake of it (which I have in spades, despite being fairly human otherwise), which may be more of a brain malformation than anything else. It feels like it's related to a need to stand up for the truth even when (possibly even because) people hate you for it. Maybe the right path here is to find the already existing correct contrarians, rather than to try to make correct contrarians out of normal well-functioning people.
Looks like late 2016.

Teaching rationality looks more similar to AI capabilities research than AI alignment research to me.

I love this question.  Mostly because your model seems pretty natural and clear, and yet I disagree with it.

To me it looks more like AI alignment research, in that one is often trying to align internal processes with e.g. truth-seeking, so that a person ends up doing reasoning instead of rationalization.  Or, on the group level, so that people can work together to form accurate maps and build good things, instead of working to trick each oth... (read more)

Ah, I see your point now, and it makes sense. If I had to summarize it (and reword it in a way that appeals to my intuition), I'd say that the choice of seeking the truth is not just about "this helps me," but about "this is what I want/ought to do/choose". Not just about capabilities. I don't think I disagree at this point, although perhaps I should think about it more. I had the suspicion that my question would be met with something at least a bit removed inference-wise from where I was starting, since my model seemed like the most natural one, and so I expected someone who routinely thinks about this topic to have updated away from it rather than not having thought about it. Regarding the last paragraph: I already believed your line "increasing a person's ability to see and reason and care (vs rationalizing and blaming-to-distract-themselves and so on) probably helps with ethical conduct." It didn't seem to bear on the argument in this case because it looks like you are getting alignment for free by improving capabilities (if you reason with my previous model, otherwise it looks like your truth-alignment efforts somehow spill over to other values, which is still getting something for free due to how humans are built I'd guess). Also... now that I think about it, what Harry was doing with Draco in HPMOR looks a lot like aligning rather than improving capabilities, and there were good spill-over effects (which were almost the whole point in that case perhaps).   

I mean, that kind of is the idea in Eliezer's post "Schools proliferating without evidence," from two years before CFAR was founded.

(Minus the "so we stop" part.)

"Anti-crux" is where the two parties who're disagreeing about X take the time to map out the "common ground" that they both already believe, and expect to keep believing, regardless of whether X is true or not.  It's a list of the things that "X or not X?" is not a crux of.  Often best done before double-cruxing, or in the middle, as a break, when the double-cruxing gets triggering/disorienting for one or both parties, or for a listener, or for the relationship between the parties.

A common partial example that may get at something of the spirit o... (read more)

Thanks for weighing in; I trust these conversations a lot more when they have multiple people from current or former CFAR.  (For anyone not tracking, Unreal worked at CFAR for a while.)  (And, sorry, I know you said you're mainly writing this to not-me, but I want to engage anyhow.)

The hypotheses listed mostly focus on the internal aspects of CFAR.

This may be somewhat misleading to a naive reader. (I am speaking mainly to this hypothetical naive reader, not to Anna, who is non-naive.) 

.... It's good FOR CFAR to consider what the org cou

... (read more)


I think a careful and non-naive reading of your post would avoid the issues I was trying to address. 

But I think a naive reading of your post might come across as something like, "Oh CFAR was just not that good at stuff I guess" / "These issues seem easy to resolve." 

So I felt it was important to acknowledge the magnitude of the ambition of CFAR and that such projects are actually quite difficult to pull off, especially in the post-modern information age. 


I wish I could say I was speaking from an interest in tackling the puzzle. I'm not coming from there. 


  • The egregores that are dominating mainstream culture and the global world situation are not just sitting passively around while people try to train themselves to break free of their deeply ingrained patterns of mind. I think people don't appreciate just how hard it is to uninstall the malware most of us are born with / educated into (and which block people from original thinking). These egregores have been functioning for hundreds of years. Is the ground fertile for the art of rationality? My sense is that the ground is dry and salted, and yet we stil
... (read more)
I probably don't have the kinds of concepts you're interested in, but...  Some significant conceptual pieces in my opinion are: * "As above, so below." Everything that happens in the world can be seen as a direct, fractal-like reflection of 'the mind' that is operating (both individual and collective). Basically, things like 'colonialism' and 'fascism' and all that are external representations of the internal. (So, when some organization is having 'a crisis' of some kind, this is like the Shakespeare play happening on stage... playing out something that's going on internal to the org, both at the group level and the individual level.) Egregores, therefore, are also linked inextricably to 'the mind', broadly construed. They're 'emergent' and not 'fixed'. (So whatever this 'rationality' thing is, could be important in a fundamental way, if it changes 'the mind'.) Circling makes this tangible on a small scale. * My teacher gave a talk on "AI" where he lists four kinds of processes (or algorithms, you could say) that all fit onto a spectrum. Artificial Intelligence > Culture > Emotions / Thoughts > Sense perception. Each of these 'algorithms' have 'agendas' or 'functions'. And these functions are not necessarily in service of truth. ('Sense perception' clearly evolved from natural selection, which is keyed into survival and reproduction. Not truth-seeking aims. In other words, it's 'not aligned'.) Humans 'buy in' to these algorithms and deeply believe they're serving our betterment, but 'fitness' (ability to survive and reproduce) is not necessarily the result of 'more truth-aligned' or 'goodness aligned'. So ... a deeper investigation may be needed to discern what's trustworthy. Why do we believe what we believe? Why do we believe the results of AI processes... and then why do we believe in our cultural ideologies? And why do I buy into my thoughts and feelings? Being able to see the nature of all

I suspect we need to engage with politics, or with noticing the details of how rationality (on group-relevant/political topics) ends up in-practice prevented in many groups, if we want to succeed at doing something real and difficult in groups (such as AI safety).

Is this what you mean?

One of the big modeling errors that I think was implicit in CFAR through most of its history, was that rationality was basically about making sure individuals have enough skill for individual reasoning, rather than modeling it as having a large component that is about resisti... (read more)

I think I meant something in the more general sense of political issues being important topics on which to apply rationality, but very poor topics on which to learn or improve rationality.  Trying to become stronger in the Bayesian Arts is a different thing than contributing to AI Safety (and blended in difficult ways with evaluating AI Safety as a worthy topic for a given aspiring-rationalist's time). For resisting pressure and memeplexes, this is especially true, if most/all of the guides/authorities have bought into this specific memeplex and aren't particularly seeking to change their beliefs, only to "help" students reach a similar belief. I didn't follow CFAR that closely, so I don't know how transparent you were that this was a MIX of rationality improvement AND AI-Safety evangelism.  Or, as you'd probably put it, rationality improvement which clearly leads to AI-Safety as an important result.

CFAR, to really succeed at what I see as its mission (bring rationality to the masses), needed...

IMO (and the opinions of Davis and Vaniver, who I was just chatting with), CFAR doesn't and didn't have this as much of its mission.

We were and are (from our founding in 2012 through the present) more focused on rationality education for fairly small sets of people who we thought might strongly benefit the world, e.g. by contributing to AI safety or other high-impact things, or by adding enrichment to a community that included such people.  (Though with th... (read more)

One Particular Center for Helping A Specific Nerdy Demographic Bridge Common Sense and Singularity Scenarios And Maybe Do Alignment Research Better But Not Necessarily The Only Or Primary Center Doing Those Things

Maybe this was a wrong strategy even given your goals. Imagine that your goal is to train 10 superheroes, and you have the following options: A: Identify 10 people with greatest talent, and train them. B: Focus on scaling. Train 10 000 people. It seems possible to me that the 10 best heroes in strategy B might actually be better than the 10 heroes in strategy A. Depends on how good you are at identifying talented heroes, whether the ones you choose actually agree to get trained by you, what kinds of people self-select for the scaled-up training, etc. Furthermore, this is actually a false dilemma. If you find a way to scale, you can still have a part of your team identify and individually approach the talented individuals. They might be even more likely to join if you tell them that you already trained 10 000 people but they will get an individualized elite training.

and relatedly at some point I got a doomy sense about CFAR after inquiring with various people and not being able to get a sense of a theory of change or a process that could converge to a theory of change for being able to diagnose this and other obstacles.

Can you say a bit more about what kind of a "theory of change" you'd want to see at CFAR, and why/how?  I still don't quite follow this point.

Weirdly, we encountered "behaviors consistent with wanting fancy indirect excuses to not change" less than I might've expected, though still some.  This... (read more)

Every org has a tacit theory of change implied by what they are doing; some also have an explicit one (e.g. poor to middling examples: business consulting orgs). Sometimes the tacit one lines up with the explicit one, sometimes not. I think having an explicit one is what allows you to reason about and iterate towards one that is functional. I don't know the specific theory of change that would be a good fit for what CFAR was trying to do; at the time, I was bouncing off the lack of any explicit one and some felt sense of resistance, in one-on-one conversations, towards moving in the direction of having one. I think I was expecting clearer thoughts since I believed that CFAR was in the business of investigating effect sizes of various theories of change related to diagnosing and then unblocking people who could work on x-risk. This gets much stronger once you get big effect sizes that touch on core ways of navigating the world someone holds.

I mean... "are you making progress on how to understand what intelligence is, or other basic foundational issues to thinking about AI" does have somewhat accessible feedback loops sometimes, and did seem to me to feed back in on the rationality curriculum in useful ways.

I suspect that if we can keep our motives pure (can avoid Goodharting on power/control/persuasion, or on "appearance of progress" of various other sorts), AI alignment research and rationality research are a great combination.  One is thinking about how to build aligned intelligence i... (read more)

Is this true though? Teaching rationality improves capability in people but shouldn't necessarily align them. People are not AIs, but their morality doesn't need to converge under reflection.  And even if the argument is "people are already aligned with people", you still are working on capabilities when dealing with people and on alignment when dealing with AIs. Teaching rationality looks more similar to AI capabilities research than AI alignment research to me.

I wish we had.  Unfortunately, I don't think we did much in the way of pre-mortems on our long-term goals, unless I'm forgetting something.  (We discussed "what if CFAR doesn't manage to become financially able to keep existing at all", and "what if particular workshops can't be made to work," but those are shorter term.)  Eliezer's sequence "The Craft and the Community" was written before CFAR, but after he wanted an independent rationality community that included rationality training, so we could try to compare what happened against that.

Yeah, not clear what this particular scenario would have looked like then. "We succeed financially, we get good feedback from satisfied customers, but our rationality training doesn't seem to make the alumni measurably more "rational", and so we stop."

Mostly wanted to say that even though CFAR got maybe "less far" than hoped for, in my view it actually got quite far.

I agree CFAR accomplished some real, good things.  I'd be curious to compare our lists (and the list of whoever else wants to weigh in) as to where CFAR got.

On my best guess, CFAR's positive accomplishments include:

  • Learning to run workshops where people often "wake up" and are more conscious/alive/able-to-reflect-and-choose, for at least ~4 days or so and often also for a several-month aftermath to a lesser extent;
  • Helping a bunch of peo
... (read more)

"Learning to run workshops where people often "wake up" and are more conscious/alive/able-to-reflect-and-choose, for at least ~4 days or so and often also for a several-month aftermath to a lesser extent" 

I permanently upgraded my sense of agency as a result of CFAR workshops. Wouldn't be surprised if this happened to others too. Would be surprised if it happened to most CFAR participants. 


I think CFAR's effects are pretty difficult to see and measure. I think this is the case for most interventions? 

I feel like the best things CFAR did we... (read more)

What's anti-crux?

[wrote these points before reading your list]

1. CFAR managed to create a workshop which is, in my view, reasonably balanced - and subsequently beneficial for most people.

In my view, one of the main problems with “teaching rationality” is that people’s minds often have parts which are “broken” in a compatible way, making the whole work. My go-to example is “planning fallacy” and “hyperbolic discounting”: because in decision making, typically only a product term of both appears, they can largely cancel out, and practical decisions of someone exhibiting both biases... (read more)

The even broader context is that the Bay Area is currently the best place in the world for production of memeplexes, influence-seeking patterns, getting money for persuasion, etc., which implies it is likely a great place where the world would benefit from someone teaching rationality, but maybe not the best place for developing the skills.

Thanks for mentioning this.  I think this had a big effect.

Fair enough.  FWIW, I found the movie good / full of useful anecdata for piecing together a puzzle that I personally care a lot about, and so found it rewarded my four hours, but our interests are probably pretty different and I know plenty who would find it empty and annoying.

On reflection, I shouldn’t have written my paragraph the way I did in my parent comment; I am not sure what trouble something-like-every self-help thingy has run into, I just suspect there’re threads in common based on how things look.  I might be wrong about it.

Still, I wr... (read more)

I'd like to upvote reading Val's linked post, if someone's wondering whether to bother reading it and likes my opinions on things.

Sorry.  I don't have a good short description of the problem, and so did not try to say explicitly what I meant.  Instead I tried to refer to a 4-hour film, "Century of the self," as trying to describe the same problem.

I may come back later with an attempted description, probably not a good one.

Seconding gjm's reply, and wondering what can possibly be so difficult to talk about that even a 4-hour film can only be an introduction? I watched a few 20-second snippets scattered over its whole length (since this is an Adam Curtis film, that is all that is needed), and I am sceptical that the line that he draws through a century of history corresponds to a load-bearing rope in reality.

Thanks. I am, realistically, not going to watch four hours of propaganda (assuming your description of it is accurate!) in the hope of figuring out what you meant, so in the hope that you will come back and have at least a sketchy try at it I'll list my leading hypotheses so you have something concrete to point at and say "no, not that" about.

  • It turns out that actually it's incredibly difficult to improve any of the things that actually stop people fulfilling what it seems should be their potential; whatever is getting in the way isn't very fixable by trai
... (read more)

And a second afterthought:

I think for a long time CFAR was trying, though maybe not in a very smart/calibrated/wise/accurate way, to have a public relationship with "the rationality community" along the lines of "we will attempt this project that you guys care about; and you guys may want to collaborate with us on that."  (Details varied by year; I think at the beginning something like this was more intended, accurate, and sincere, but after a while it was more like accumulated branding we didn't mean but didn't update.)

I think at the moment we are not t... (read more)

I think the conclusion I take from it is ~"There's a bunch of individual people who were involved with CFAR still doing interesting stuff, but there is no such public organisation anymore in a meaningful sense (although shards of the organisation still help with AIRCS workshops); so you have to follow these individual people to find out what they're up to. Also, there is no concentration of force working towards a publicly accessible rationality curriculum anymore."

This seems about right to me personally, although as noted there is some network / concentrati... (read more)
