Scott Garrabrant


Finite Factored Sets
Cartesian Frames
Fixed Points

My experience at and around MIRI and CFAR (inspired by Zoe Curzi's writeup of experiences at Leverage)

MIRI can't seem to decide if it's an advocacy org or a research org.

MIRI is a research org. It is not an advocacy org. It is not even close. You can tell by the fact that it basically hasn't said anything for the last 4 years. Eliezer's personal Twitter account does not make MIRI an advocacy org.

(I recognize this isn't addressing your actual point. I just found the frame frustrating.)

So I think my orientation on seeking out disagreement is roughly as follows. (This is going to be a rant I write in the middle of the night, so it might be a little incoherent.)

There are two distinct tasks: 1) generating new useful hypotheses/tools, and 2) selecting between existing hypotheses/filtering out bad hypotheses.

There are a bunch of things that make people good at both these tasks simultaneously. Further, each of these tasks is partially helpful for doing the other. However, I still think of them as mostly distinct tasks. 

I think skill at these tasks is correlated in general, but possibly anti-correlated after you filter on enough g correlates, even though each task is a common subtask of the other.

I don't believe this (anti-correlated given g) very confidently, but I do think it is good to track your own and others' skill at the two tasks separately, because it is possible to have very different scores (and because judging generators on reliability might have the side effect of making them less generative, out of fear of being wrong, and similarly vice versa).

I think that seeking out disagreement is especially useful for the selection task, and less useful for the generation task. I think that echo chambers are especially harmful for the selection task, but can sometimes be useful for the generation task. Working with someone who agrees with you on a bunch of stuff and shares your ontology allows you to build deeper, faster. Someone with a lot of disagreement with you can cause you to get stuck on the basics and not get anywhere. (Sometimes disagreement can also be actively helpful for generation, but it is definitely not always helpful.)

I spend something like 90+% of my research time focused on the generation task. Sometimes I think my colleagues are seeing something that I am missing, and I seek out disagreement so that I can get a new perspective, but the goal is to get a slightly different perspective on the thing I am working on, not really to filter based on which view is more true. I also sometimes do things like double-crux with people with fairly different world views, but even there, it feels like the goal is to collect new ways to think, rather than to change my mind. I think that for this task a small amount of focusing on people who disagree with you is pretty helpful, but even then, I think I get the most out of people who disagree with me a little bit, because I am more likely to be able to actually pick something up. Further, my focus is not really on actually understanding the other person; I just want to find new ways to think, so I will often translate things to something nearby in my ontology, and thus learn a lot, but still not be able to pass an ideological Turing test.

On the other hand, when you are not trying to find new stuff, but instead e.g. evaluate various different hypotheses about AI timelines, I think it is very important to try to understand views that are very far from your own, and take steps to avoid echo chamber effects. It is important to understand the view the way the other person understands it, not just in the way that conveniently fits with your ontology. This is my guess at the relevant skills, but I do not actually identify as especially good at this task. I am much better at generation, and I do a lot of outside-view style thinking here.

However, I think that currently, AI safety disagreements are not about two people having mostly the same ontology and disagreeing on some important variables, but rather trying to communicate across very different ontologies. This means that we have to build bridges, and the skills start to look more like generation skill. It doesn't help to just say, "Oh, this other person thinks I am wrong, I should be less confident." You actually have to turn that into something more productive, which means building new concepts, and a new ontology in which the views can productively dialogue. Actually talking to the person you are trying to bridge to is useful, but I think so is retreating to your echo chamber, and trying to make progress on just becoming less confused yourself.

For me, there is a handful of people who I think of as having very different views from me on AI safety, but who are still close enough that I feel like I can understand them at all. When I think about how to communicate, I mostly think about bridging the gap to these people (which already feels like an impossibly hard task), and less about the people who are really far away. Most of these people I would describe as sharing the philosophical stance I said MIRI selects for, but probably not all.

If I were focusing on resolving strategic disagreements, I would try to interact a lot more than I currently do with people who disagree with me. Currently, I am choosing to focus more on just trying to figure out how minds work in theory, which means I only interact with people who disagree with me a little. (Indeed, I currently also only interact with people who agree with me a little bit, and so am usually in an especially strong echo chamber, which is my own head.)

However, I feel pretty doomy about my current path, and might soon go back to trying to figure out what I should do, which means trying to leave the echo chamber. Often when I do this, I neither produce anything great nor change my mind, and eventually give up and go back to doing the doomy thing where at least I make some progress (at the task of figuring out how minds work in theory, which may or may not end up translating to AI safety at all).

Basically, I already do quite a bit of the "Here are a bunch of people who are about as smart as I am, and have thought about this a bunch, and have a whole bunch of views that differ from mine and from each other. I should not be that confident" (although I should often take actions that are indistinguishable from confidence, since that is how you work with your inside view). But learning from disagreements more than that is just really hard, and I don't know how to do it, and I don't think spending more time with them fixes it on its own. I think this would be my top priority if I had a strategy I was optimistic about, but I don't, and so instead, I am trying to figure out how minds work, which seems like it might be useful for a bunch of different paths. (I feel like I have some learned helplessness here, but I think everyone else, not just MIRI, is also failing to learn from disagreements (learning new ontologies, rather than just noticing mistakes), which makes me think it is actually pretty hard.)

Not sure I follow. It seems to me that the position you're pushing, that learning from people who disagree is prohibitively costly, is the one that goes with learned helplessness. ("We've tried it before, we encountered inferential distances, we gave up.")


I believe they are saying that cheering for seeking out disagreement is learned helplessness as opposed to doing a cost-benefit analysis about seeking out disagreement. I am not sure I get that part either. 

I was also confused reading the comment, thinking that maybe they copied the wrong paragraph, and meant the 2nd paragraph.

I am interested in the fact that you find the comment so cult-y though, because I didn't pick that up.

Note that I think inferential distance often takes the form of trying to communicate across different ontologies. Sometimes a person will even correctly get the arguments of their discussion partner, to the point where they can internally inhabit that point of view, but it is still hard to get the argument to dialogue productively with their other views, because the two viewpoints have such different ontologies.

Interesting. I just went and looked at some old survey results hoping I would find a question like this one. I did not find a similar question. (The lack of a question about this is itself evidence against my theory.)

(Agreement among less wrongers is not that crux-y for my belief that it is both a natural cluster and highly selected for at MIRI, but I am still interested in the answer to the question for LW.)

I notice I like "you are an algorithm" better than "you are a computation", since "computation" feels like it could point to a specific instantiation of an algorithm, and I think that algorithm as opposed to instantiation of an algorithm is an important part of it.

I agree that the phrase "taking seriously the idea that you are a computation" does not directly point at the cluster, but I still think it is a natural cluster. I think that computational neuroscience is in fact high up on the list of things I expect less wrongers to be interested in. To the extent that they are not as interested in it as other things, I think it is because it is too hard to actually get much that feels like algorithmic structure from neuroscience.

I think that the interest in anthropics is related to the fact that computations are the kind of thing that can be multiply instantiated. I think logic is a computational-like model of epistemics. I think that Haskell is not really that much about this philosophy, and is more about mathematical elegance. (I think that liking elegance/simplicity is mostly different from the "I am a computation" philosophy, and is also selected for at MIRI.)

I think that a lot of the sequences (including the first and third and fourth posts in your list) are about thinking about the computation that you are running in contrast and relation to an ideal (AIXI-like) computation.

I think that "That Alien Message" is directly about getting the reader to imagine being a subprocess inside an AI, and thinking about what they would do in that situation.

I think that the politics post is not that representative of the sequences, and it bubbled to the top by karma because politics gets lots of votes.

(It does feel a little like I am justifying the connection in a way that could be used to justify false connections. I still believe that there is a cluster, very roughly described as "taking seriously the idea that you are a computation," that is a natural class of ideas and is the heart of the sequences.)

I think the vast majority of people who bounce off the sequences do so either because it's too long-winded or they don't like Eliezer's writing style. I predict that if you ask someone involved in trying to popularize the sequences, they will agree.

I agree, but I think that the majority of people who love the sequences do so because they deeply share this philosophical stance, and don't find it much elsewhere, more so than because they e.g. find a bunch of advice in it that actually works for them.

I think the effect you describe is also part of why people like the sequences, but I think that a stronger effect is that there are a bunch of people who had a certain class of thoughts prior to reading the sequences, didn't see thoughts of this type before finding LessWrong, and then saw these thoughts in the sequences. (I especially believe this about the kind of people who get hired at MIRI.) Prior to the sequences, they were intellectually lonely, not having people to talk to who shared this philosophical stance, which is a large part of their worldview.

I view the sequences as a collection of thoughts similar to things that I was already thinking, that was then used as a flag to connect me with people who were also already thinking the same things, more so than something that taught me a bunch of stuff. I predict a large portion of karma-weighted lesswrongers will say the same thing. (This isn't inconsistent with your theory, but I think would be evidence of mine.)

My theory about why people like the sequences is very entangled with the philosophical stance actually being a natural cluster, and thus something that many different people would have independently.

I think that MIRI selects for the kind of person who likes the sequences, which under my theory is a philosophical stance related to being a computation, and under your theory seems entangled with little mental resistance to (some kinds of) narratives.

I don't want to speak for/about MIRI here, but I think that I personally do the "patting each other on the back for how right we all are" more than I endorse doing it. I think the "we" is less likely to be MIRI, and more likely to be a larger group that includes people like Paul.

I agree that it would be really really great if MIRI could interact with and learn from different views. I think almost everyone agrees with this, and has tried, and in practice, we keep hitting "inferential distance" shaped walls, and become discouraged, and (partially) give up. To be clear, there are a lot of people/ideas where I interact with them and conclude "There probably isn't much for me to learn here," but there are also a lot of people/ideas where I interact with them and become sad because I think there is something for me to learn there, and communicating across different ontologies is very hard.

I agree with your bullet points descriptively, but they are not exhaustive.

I agree that MIRI has a strong (statistical) bias towards things that were invented internally. It is currently not clear to me how much of this statistical bias is a mistake vs. the correct reaction to how well internally invented things seem to fit our needs, and to how hard it is to find the good stuff that exists externally, when it exists. (I think there are a lot of great ideas out there that I really wish I had, but I don't have a great method for filtering for them in the sea of irrelevant stuff.)

If that's the case, selecting for people with the described philosophical stance + mathematical taste could basically be selecting for "people with little resistance to MIRI's organizational narrative"


So, I do think that MIRI hiring does select for people with "little resistance to MIRI's organizational narrative," through the channel of "You have less mental resistance to narratives you agree with" and "You are more likely to work for an organization when you agree with their narrative." 

I think that, additionally, people have a score on "mental resistance to organizational narratives" in general, and I was arguing that MIRI does not select against this property (very strongly). (Indeed, I think they select for it, but not as strongly as they select for philosophy.) I think that when the OP was thinking about how much to trust her own judgement, this is the more relevant variable, and the variable they were referring to.

It sounds like you're saying that at MIRI, you approximate a potential hire's philosophical competence by checking to see how much they agree with you on philosophy. That doesn't seem great for group epistemics?


I did not mean to imply that MIRI does this any more than e.g. philosophy academia. 

When you don't have sufficient objective things to use to judge competence, you end up having to use agreement as a proxy for competence. This is because when you understand a mistake, you can filter for people who do not make that mistake, but when you do not understand a mistake you are making, it is hard to filter for people that do not make that mistake. 

Sometimes, you interact with someone who disagrees with you, and you talk to them, and you learn that you were making a mistake that they did not make, and this is a very good sign for competence, but you can only really get this positive signal about as often as you change your mind, which isn't often.

Sometimes, you can also disagree with someone, and see that their position is internally consistent, which is another way you can observe some competence without agreement.

I think that personally, I use a proxy that is something like "How much do I feel like I learn(/like where my mind goes) when I am talking to the person," which I think selects for some philosophical agreement (their concepts are not so far from my own that I can't translate), but also some philosophical disagreement (their concepts are better than my own at making at least one thing less confusing). (This condition does not feel necessary for me. I feel like having a coherent plan is also a great sign, even if I do not feel like I learn when I am talking to the person.)