An epistemic advantage of working as a moderate

[-]Eliezer Yudkowsky3mo8867

What's your version of the story for how the "moderates" at OpenPhil ended up believing stuff even others can now see to be fucking nuts in retrospect and which "extremists" called out at the time, like "bio anchoring" in 2021 putting AGI in median fucking 2050, or Carlsmith's Multiple Stage Fallacy risk estimate of 5% that involved only an 80% chance anyone would even try to build agentic AI?

Were they no true moderates? How could anyone tell the difference in advance?

From my perspective, the story is that "moderates" are selected to believe nice-sounding moderate things, and Reality is off doing something else because it doesn't care about fitting in the same way. People who try to think like reality are then termed "extremist", because they don't fit into the nice consensus of people hanging out together and being agreeable about nonsense. Others may of course end up extremists for other reasons. It's not that everyone extreme is reality-driven, but that everyone who is getting pushed around by reality (instead of pleasant hanging-out forces like "AGI in 2050, 5% risk" as sounded very moderate to moderates before the ChatGPT Moment) ends up departing from ... (read more)

[-]aphyer3mo3731

You can be a moderate by believing only moderate things. Or you can be a moderate by adopting moderate strategies. These are not necessarily the same thing.

This piece seems to be mostly advocating for the benefits of moderate strategies.

Your reply seems to mostly be criticizing moderate beliefs.

(My political beliefs are a ridiculous assortment of things, many of them outside the Overton window. If someone tells me their political beliefs are all moderate, I suspect them of being a sheep.

But my political strategies are moderate: I have voted for various parties' candidates at various times, depending on who seems worse lately. This seems...strategically correct to me?)

3Vaniver3mo

How does one end up with moderate beliefs without relying on moderate strategies? (In less pressured fields, I could imagine this happening as a matter of course, but I am surprised if someone follows a reality-hugging strategy in AI and ends up believing 'moderate things'.)

[-]ryan_greenblatt3mo333

Carlsmith's Multiple Stage Fallacy risk estimate of 5% that involved only an 80% chance anyone would even try to build agentic AI?

This is false as stated. The report says:

The corresponding footnote 179 is:

As a reminder, APS systems are ones with: (a) Advanced capability: they outperform the best humans on some set of tasks which when performed at an advanced level grant significant power in today’s world (tasks like scientific research, business/military/political strategy, engineering, hacking, and social persuasion/manipulation); (b) Agentic planning: they make and execute plans, in pursuit of objectives, on the basis of models of the world; and (c) Strategic awareness: the models they use in making plans represent with reasonable accuracy the causal upshot of gaining and maintaining different forms of power over humans and the real-world environment.

Strong incentives isn't the same as "anyone would try to build" and "agentic AI" isn't the same as APS systems (which has a much more specific and stronger definition!).

I'd personally put more like 90% on the claim (and it might depend a lot on what you mean by strong incentives).

To be clear, I agree with the claim that Carlsmith's r... (read more)

[-]Eliezer Yudkowsky3mo513

I accept your correction and Buck's as to these simple facts (was posting from mobile).

[-]habryka3mo1911

I'd personally put more like 90% on the claim (and it might depend a lot on what you mean by strong incentives).

Are you talking in-retrospect? If you currently assign only 90% to this claim, I would be very happy to take your money (I would say a reasonable definition that I think Joe would have accepted at the time that we would be dealing with more than $1 billion in annual expenditure towards this goal).

I... actually have trouble imagining any definition that isn't already met, as people are clearly trying to do this right now. But like, still happy to take your money if you want to bet and ask some third-party to adjudicate.

3ryan_greenblatt3mo

I wasn't talking in retrospect, but I meant something might larger than $1 billion by strong incentives and I really mean very specifically APS systems at the time when they are feasible to build. The 10% would come from other approaches/architectures ending up being surprisingly better at the point when people could build APS systems. (E.g., you don't need your AIs to have "the models they use in making plans represent with reasonable accuracy the causal upshot of gaining and maintaining different forms of power over humans and the real-world environment".) In further consideration I might be more like 95%, but 90% doesn't seem crazy to me depending on the details of the operationalization. I would put very high probabilities on "people would pay >$50 billion for a strong APS system right now", so we presumably agree on that. It's really key to my perspective here that by the time that we can build APS systems, maybe something else which narrowly doesn't meet this definition has come around and looks more competitive. There is something messy here because maybe there are strong incentives to build APS systems eventually, but this occurs substantially after full automation of the whole economy or similar by other systems. I was trying to exclude cases where substantially after human intellectual labor is totally obsolete, APS systems are strongly incentivized as this case is pretty different. (And other factors like "maybe we'll have radical superbabies before we can build APS systems" factor in too, though again this is very sensitive to operationalization.)

4Ben Pace3mo

Pretty surprising that the paper doesn't give much indication to what counts as "strong incentives" (or at least not that I could find after searching for 2 mins).

[-]ryan_greenblatt3mo2312

Note that this post is arguing that there are some specific epistemic advantages of working as a moderate, not that moderates are always correct or that there aren't epistemic disadvantages to being a moderate. I don't think "there exist moderates which seem very incorrect to me" is a valid response to the post similarly to how "there exist radicals which seem very incorrect to me" wouldn't be a valid argument for the post.

This is independent from the point Buck notes that the label moderate as defined in the post doesn't apply in 2020.

[-]johnswentworth3mo5730

As a response to the literal comment at top-of-thread, this is clearly reasonable. But I think Eliezer is correctly invoking some important subtext here, which your comment doesn't properly answer. (I think this because I often make a similar move to the one Eliezer is making, and have only understood within the past couple years what's load-bearing about it.)

Specifically, there's an important difference between:

"<person> was wrong about <argument/prediction/etc>, so we should update downward on deferring to their arguments/predictions/etc", vs
"<person> was wrong about <argument/prediction/etc>, in a way which seems blindingly obvious when we actually think about it, and so is strong evidence that <person> has some systematic problem in the methods they're using to think (as opposed to just being unlucky with this one argument/prediction/etc)"

Eliezer isn't saying the first one, he's saying the second one, and then following it up with a specific model of what is wrong with the thinking-methods in question. He's invoking bio anchors and that Carlsmith report as examples of systematically terrible thinking, i.e. thinking which is in some sense "obviously... (read more)

[-]Eli Tyre3mo16-4

"<person> was wrong about <argument/prediction/etc>, in a way which seems blindingly obvious when we actually think about it, and so is strong evidence that <person> has some systematic problem in the methods they're using to think (as opposed to just being unlucky with this one argument/prediction/etc)"

I think "this was blindingly obvious when we actually think about it", is not socially admissible evidence, because of hindsight bias.

I thought about a lot of this stuff before 2020. For the most part, I didn't reach definitive conclusions about a lot of it. In retrospect, a lot of the the conclusions that I did provisionally accept, I, in retrospect, think I was overconfident about, given the epistemic warrant.

Was I doing "actual thing"? No, probably not, or at least not by many relevant standards. Could I have done better? Surely, but not by recourse to magical "just think better" cognition.

The fact remains that It Was Not Obvious To Me.

Others may claim that it was obvious to them, and they might be right—maybe it was obvious to them.

If a person declared operationalized-enough-to-be-gradable prediction before the event was settled, well, then I can u... (read more)

[-]johnswentworth3mo2816

Feels like there's some kind of frame-error here, like you're complaining that the move in question isn't using a particular interface, but the move isn't intended to use that interface in the first place? Can't quite put my finger on it, but I'll try to gesture in the right direction.

Consider ye olde philosophers who liked to throw around syllogisms. You and I can look at many of those syllogisms and be like "that's cute and clever and does not bind to reality at all, that's not how real-thinking works". But if we'd been around at the time, very plausibly we would not be able to recognize the failure; maybe we would not have been able to predict in advance that many of the philosophers' clever syllogisms totally fail to bind to reality.

Nonetheless, it is still useful and instructive to look at those syllogisms and say "look, these things obviously-in-some-sense do not bind to reality, they are not real-thinking, and therefore they are strong evidence that there is something systematically wrong with the thinking-methods of those philosophers". (Eliezer would probably reflexively follow that up with "so I should figure out what systematic thinking errors plagued those seemingly-bri... (read more)

[-]Eli Tyre3mo166

So I guess maybe... Eliezer's imagined audience here is someone who has already noticed that bio anchors and the Carlsmith thing fail to bind to reality, but you're criticizing it for not instead responding to a hypothetical audience who thinks that the reports maybe do bind to reality?

I almost added a sentence at the end of my comment to the effect of...

"Either someone did that X was blindly obvious, in which case they don't need to be told, or it wasn't blindingly obvious to them, and they should should pay attention to the correct prediction, and ignore the assertion that it was obvious. In either case...the statement isn't doing anything?"

Who are statements like these for? Is it for the people who thought that things were obvious to find and identify each other?

To gesture at a concern I have (which I think is probably orthogonal to what you're pointing at):

On a first pass, the only people who might be influenced by statements like that are being influenced epistemically illegitimately.

Like, I'm imagining a person, Bob, who heard all the arguments at the time and did not feel confident enough to make a specific prediction. But then we all get to wait a fe... (read more)

[-]johnswentworth3mo223

Ok, I think one of the biggest disconnects here is that Eliezer is currently talking in hindsight about what we should learn from past events, and this is and should often be different from what most people could have learned at the time. Again, consider the syllogism example: just because you or I might have been fooled by it at the time does not mean we can't learn from the obvious-in-some-sense foolishness after the fact. The relevant kind of "obviousness" needs to include obviousness in hindsight for the move Eliezer is making to work, not necessarily obviousness in advance, though it does also need to "obvious" in advance in a different sense (more on that below).

Short handle: "It seems obvious in hindsight that <X> was foolish (not merely a sensible-but-incorrect prediction from insufficient data); why wasn't that obvious at the time, and what pattern do I need to be on the watch for to make it obvious in the future?"

Eliezer's application of that pattern to the case at hand goes:

It seems obvious-in-some-sense in hindsight that bio anchors and the Carlsmith thing were foolish, i.e. one can read them and go "man this does seem kind of silly".
Insofar as that wasn't obvious

... (read more)

6Thane Ruthenis3mo

That's my model here as well. Pseudo-formalizing it: We're not idealized agents, we're bounded agents, which means we can't actually do full Bayesian updates. We have to pick and choose what computations we run, what classes of evidence we look for and update on. In hindsight, we may discover that an incorrect prediction was caused by ours opting not to spend the resources on updating on some specific information, such that if we knew to do that, we would have reliably avoided the error even while having all the same object-level information. In other words, it's a Bayesian update to the distribution over Bayesian updates we should run. We discover a thing about (human) reasoning: that there's a specific reasoning error/oversight we're prone to, and that we have to run an update on the output of "am I making this reasoning error?" in specific situations. This doesn't necessarily mean that this meta-level error would have been obvious to anyone in the world at all, at the time it was made. Nowadays, we all may be committing fallacies whose very definitions require agent-foundations theory decades ahead of ours; fallacies whose definitions we wouldn't even understand without reading a future textbook. But it does mean that specific object-level conclusions we're reaching today would be obviously incorrect to someone who is reasoning in a more correct way.

6Richard_Ngo3mo

If someone predicts in advance that something is obviously false, and then you come to believe that it's false, then you should update not just towards thought processes which would have predicted that the thing is false, but also towards thought processes which would have predicted that the thing is obviously false. (Conversely, if they predict that it's obviously false, and it turns out to be true, you should update more strongly against their thought processes than if they'd just predicted it was false.) IIRC Eliezer's objection to bioanchors can be reasonably interpreted as an advance prediction that "it's obviously false", though to be confident I'd need to reread his original post (which I can't be bothered to do right now).

2Veedrac3mo

I think this is wrong. The scenarios where this outcome was easily predicted given the right heuristics and the scenarios where this was surprising to every side of the debate are quite different. Knowing who had predictors that worked in this scenario is useful evidence, especially when the debate was about which frames for thinking about things and selecting heuristics were useful. Or, to put this in simpler but somewhat imprecise terms: This was not obvious to you because you were thinking about things the wrong way. You didn't know which way to think about things at the time because you lacked information about which predicted things better. You now have evidence about which ways work better, and can copy heuristics from people who were less surprised.

[-]Richard_Ngo3mo2011

The argument "there are specific epistemic advantages of working as a moderate" isn't just a claim about categories that everyone agrees exist, it's also a way of carving up the world. However, you can carve up the world in very misleading ways depending on how you lump different groups together. For example, if a post distinguished "people without crazy-sounding beliefs" from "people with crazy-sounding beliefs", the latter category would lump together truth-seeking nonconformists with actual crazy people. There's no easy way of figuring out which categories should be treated as useful vs useless but the evidence Eliezer cites does seem relevant.

On a more object level, my main critique of the post is that almost all of the bullet points are even more true of, say, working as a physicist. And so structurally speaking I don't know how to distinguish this post from one arguing "one advantage of looking for my keys closer to a streetlight is that there's more light!" I.e. it's hard to know the extent to which these benefits come specifically from focusing on less important things, and therefore are illusory, versus the extent to which you can decouple these benefits from the costs of being a "moderate".

2ryan_greenblatt3mo

But (in the language of the post) both moderates and radicals are working in the epistemic domain not some unrelated domain. It's not that moderates and radicals are trying to answer different questions (and the questions moderates are answering are epistemically easier like physics). There are some differences in the most relevant questions, but I don't think this is a massive effect.

[-]Richard_Ngo3mo1210

It's not that moderates and radicals are trying to answer different questions (and the questions moderates are answering are epistemically easier like physics).

That seems totally wrong. Moderates are trying to answer questions like "what are some relatively cheap interventions that AI companies could implement to reduce risk assuming a low budget?" and "how can I cause AI companies to marginally increase that budget?" These questions are very different from—and much easier than—the ones the radicals are trying to answer, like "how can we radically change the governance of AI to prevent x-risk?"

6ryan_greenblatt3mo

Hmm, I think what I said was about half wrong and I want to retract my point. That said, I think much of the relevant questions are overlapping (like, "how do we expect the future to generally go?", "why/how is AI risky?", "how fast will algorithmic progress go at various points?) and I interpret this post as just talking about the effect on epistemics around the overlapping questions (regardless of whether you'd expect moderates to mostly be working in domains with better feedback loops). This isn't that relevant for your main point, but I also think the biggest question for radicals in practice is mostly: How can we generate massive public/government support for radical action on AI?

8Ben Pace3mo

It might not be disproof, but it would seem very relevant for readers to be aware of major failings of prominent moderates in the current environment e.g. when making choices about what strategies to enact or trust. (Probably you already agree with this.)

[-]ryan_greenblatt3mo167

I agree with this in principle, but think that doing a good job of noting major failings of prominent moderates in the current environment would look very different than Eliezer's comment and requires something stronger than just giving examples of some moderates which seem incorrect to Eliezer.

Another way to put this is that I think citing a small number of anecdotes in defense of a broader world view is a dangerous thing to do and not attaching this to the argument in the post is even more dangerous. I think it's more dangerous when the description of the anecdotes is sneering and misleading. So, when using this epistemically dangerous tool, I think there is a higher burden of doing a good job which isn't done here.

On the specifics here, I think Carlsmith's report is unrepresentative for a bunch of reasons. I think Bioanchors is representative (though I don't think it looks fucking nuts in retrospect).

This is putting aside the fact that this doesn't engage with the arguments in the post at all beyond effectively reacting to the title.

[-]Buck3mo2117

The bioanchors post was released in 2020. I really wish that you bothered to get basic facts right when being so derisive about people's work.

I also think it's bad manners for you to criticize other people for making clear predictions given that you didn't make such predictions publicly yourself.

[-]habryka3mo2413

I also think it's bad manners for you to criticize other people for making clear predictions given that you didn't make such predictions publicly yourself.

I generally agree with some critique in the space, but I think Eliezer went on the record pretty clearly thinking that the bio-anchors report had timelines that were quite a bit too long:

Eliezer: I consider naming particular years to be a cognitively harmful sort of activity; I have refrained from trying to translate my brain's native intuitions about this into probabilities, for fear that my verbalized probabilities will be stupider than my intuitions if I try to put weight on them. What feelings I do have, I worry may be unwise to voice; AGI timelines, in my own experience, are not great for one's mental health, and I worry that other people seem to have weaker immune systems than even my own. But I suppose I cannot but acknowledge that my outward behavior seems to reveal a distribution whose median seems to fall well before 2050.

I think in many cases such a critique would be justified, but like, IDK, I feel like in this case Eliezer has pretty clearly said things about his timelines expectations that co... (read more)

2Martin Randall2mo

Relevant links: * Draft report on AI Timelines - Cotra 2020-09-18 * Biology-Inspired Timelines - The Trick that Never Works - Yudkowsky 2021-12-01 * Reply to Eliezer on Biological Anchors - Harnofsky 2021-12-23 Let's suppose that your read is exactly right, and Yudkowsky in 2021 was predicting median 2040. You have surely spent more time with him than me. Bioanchors predicted ~25% cumulative probability by 2040. A 25% vs 50% disagreement in the world of AI timeline prediction is approximately nothing. What's your read of why Yudkowsky is claiming that "median fucking 2050" is "fucking nuts in retrospect", without also admitting that his implicit prediction of median 2040 was almost as nuts? This is the second time this year that I've read Yudkowsky attacking the Bioanchors 2050 figure without mentioning that it had crazy wide error bars. This month I also read "If Anyone Builds It Everyone Dies" which repeats the message of "The Trick that Never Works" that forecasting timelines is really difficult and not important for the overall thesis. I preferred that Yudkowsky to this one. EDIT: retracting because I don't actually want a response to these questions, I'm just cross.

2habryka2mo

I don't super get this comment. I don't agree with Eliezer calling the other prediction "fucking nuts". I was just replying to the statement that Eliezer did not make predictions here himself, which he did do.

[-]Adele Lopez3mo17-3

FWIW, I think it is correct for Eliezer to be derisive about these works, instead of just politely disagreeing.

Long story short, derision is an important negative signal that something should not be cooperated with. Couching words politely is inherently a weakening of that signal. See here for more details of my model.

I do know that this is beside the point you're making, but it feels to me like there is some resentment about that derision here.

8Lukas Finnveden3mo

If that's a claim that Eliezer wants to make (I'm not sure if it is!) I think he should make it explicitly and ideally argue for it. Even just making it more explicit what the claim is would allow others to counter-argue the claim, rather than leaving it implicit and unargued.[1] I think it's dangerous for people to defer to Eliezer about whether or not it's worth engaging with people who disagree with him, which limits the usefulness of claims without arguments. Also, aside on the general dynamics here. (Not commenting on Eliezer in particular.) You say "derision is an important negative signal that something should not be cooperated with". That's in the passive voice, more accurate would be "derision is an important negative signal where the speaker warns the listener to not cooperate with the target of derision". That's consistent with "the speaker cares about the listener and warns the listener that the target isn't useful for the listener to cooperate with". But it's also consistent with e.g. "it would be in the speakers interest for the listener to not cooperate with the target, and the speaker is warning the listener that the speaker might deride/punish/exclude the listener if they cooperate with the target". General derision mixes together all these signals, and some of them are decidedly anti-epistemic. 1. ^ For example, if the claim is "these people aren't worth engaging with", I think there are pretty good counter-arguments even before you start digging into the object-level: The people having a track record of being willing to publicly engage on the topics of debate, of being willing to publicly change their mind, of being open enough to differing views to give MIRI millions of dollars back when MIRI was more cash-constrained than they are now, and understanding points that Eliezer think are important better than most people Eliezer actually spends time arguing with. To be clear, I don't particularly think that Eliezer does want to m

5habryka3mo

He has explicitly argued for it! He has written like a 10,000 word essay with lots of detailed critique: https://www.lesswrong.com/posts/ax695frGJEzGxFBK4/biology-inspired-agi-timelines-the-trick-that-never-works

9Lukas Finnveden3mo

Adele: "Long story short, derision is an important negative signal that something should not be cooperated with" Lukas: "If that's a claim that Eliezer wants to make (I'm not sure if it is!) I think he should make it explicitly and ideally argue for it." Habryka: "He has explicitly argued for it" What version of the claim "something should not be cooperated with" is present + argued-for in that post? I thought that post was about the object level. (Which IMO seems like a better thing to argue about. I was just responding to Adele's comment.)

2Adele Lopez3mo

I don't think he is (nor should be) signaling that engaging with people who disagree is not worth it! Acknowledged that that is more accurate. I do not dispute that that people misuse derision and other status signals in lots of ways, but I think that this is more-or-less just a subtler form of lying/deception or coercion and not something inherently wrong with status. That is, I do not think you can have the same epistemic effect without being derisive in certain cases. Not that all derision is a good signal.

4Lukas Finnveden3mo

Ok. If you think it's correct for Eliezer to be derisive, because he's communicating the valuable information that something shouldn't be "cooperated with", can you say more specifically what that means? "Not engage" was speculation on my part, because that seemed like a salient way to not be cooperative in an epistemic conflict.

2Adele Lopez3mo

My read is that the cooperation he is against is with the narrative that AI-risk is not that important (because it's too far away or weird or whatever). This indeed influences which sorts of agencies get funded, which is a key thing he is upset about here. On the other hand, engaging with the arguments is cooperation at shared epistemics, which I'm sure he's happy to coordinate with. Also, I think that if he thought that the arguments in question were coming from a genuine epistemic disagreement (and not motivated cognition of some form), he would (correctly) be less derisive. There is much more to be gained (in expectation) from engaging with an intellectually honest opponent than one with a bottom line.

[-]Lukas Finnveden3mo113

My read is that the cooperation he is against is with the narrative that AI-risk is not that important (because it's too far away or whatever). This indeed influences which sorts of agencies get funded, which is a key thing he is upset about here.

Hm, I still don't really understand what it means to be [against cooperation with the narrative that AI risk is not that important]. Beyond just believing that AI risk is important and acting accordingly. (A position that seems easy to state explicitly.)

Also: The people whose work is being derided definitely don't agree with the narrative that "AI risk is not that important". (They are and were working full-time to reduce AI risk because they think it's extremely important.) If the derisiveness is being read as a signal that "AI risk is important" is a point of contention, then the derisiveness is misinforming people. Or if the derisiveness was supposed to communicate especially strong disapproval of any (mistaken) views that would directionally suggest that AI risk is less important than the author thinks: then that would just seems like soldier mindset (more harshly critizing views that push in directions you don't like, holding goodness-of-the-argument constant), which seems much more likely to muddy the epistemic waters than to send important signals.

4Adele Lopez3mo

Yeah, those are good points... I think there is a conflict with the overall structure I'm describing, but I'm not modeling the details well apparently. Thank you!

5StanislavKrym3mo

Except that Yudkowsky had actually made the predictions in public. However, he didn't know in advance that the AIs would be trained as neural networks that are OOMs less efficient at keeping context[1] in mind. Other potential mispredictions are Yudkowsky's cases for the possibility to greatly increase the capabilities starting from a human brain simulation[2] or to simulate a human brain working ~6 OOMs faster: Yudkowsky's case for a superfast human brain T hefastest observed neurons fire 1000 times per second; the fastest axon fibers con duct signals at 150 meters/second, a half-millionth the speed of light; each synaptic op eration dissipates around 15,000 attojoules, which is more than a million times the ther modynamicminimumforirreversible computations at room temperature (kT300 ln(2) = 0003 attojoules per bit). It would be physically possible to build a brain that computed a million times as fast as a human brain, without shrinking the size, or running at lower temperatures, or invoking reversible computing or quantum computing. If a human mind were thus accelerated, a subjective year of thinking would be accomplished for ev ery 31 physical seconds in the outside world, and a millennium would fly by in eight and a half hours. Vinge (1993) referred to such sped-up minds as “weak superhumanity”: a mind that thinks like a human but much faster. However, as Turchin points out in his book[3] written in Russian, simulating a human brain requires[4] just 1e15 FLOP/second, or less than 1e22 FLOP/month. Turchin's argument in Russian Для создания ИИ необходимо, как минимум, наличие достаточно мощного компьютера. Сейчас самые мощные компьютеры имеют мощность порядка 1 петафлопа (10 операций с плавающей запятой в секунду). По некоторым оценкам, этого достаточно для эмуляции человеческого мозга, а значит, ИИ тоже мог бы работать на такой платформе. Сейчас такие компьютеры доступны только очень крупным организациям на ограниченное время. Однако закон Мура предполага

6Noosphere893mo

IMO, there's another major misprediction, and I'd argue that we don't even need LLMs to make it a misprediction, and this is the prediction that within a few days/weeks/months we go from AI that was almost totally incapable of intellectual work to AI that can overpower humanity. This comment also describes what I'm talking about: How takeoff used to be viewed as occuring in days, weeks or months from being a cow to being able to place ringworlds around stars: (Yes, the Village Idiot to Einstein post also emphasized the vastness of the space above us, which is what Adam Scholl claimed and I basically agree with this claim, the issue is that there's another claim that's also being made). The basic reason for this misprediction is as it turns out, human variability is pretty wide, and the fact that human brains are very similar is basically no evidence (I was being stupid about this in 2022): The range of human intelligence is wide, actually. And also, no domain has actually had a takeoff as fast as Eliezer Yudkowsky thought in either the Village Idiot to Einstein picture or his own predictions, but Ryan Greenblatt and David Matolcsi already made them, so I merely need to link them (1, 2, 3). Also, a side note is that I disagree with Jacob Cannell's post, and the reasons are that it's not actually valid to compare brain FLOPs to computer FLOPs in the way Jacob Cannell does: Why it's not valid to compare brain FLOPs to computer FLOPs in the way Jacob Cannell does, part 1 Why it's not valid to compare brain FLOPs to computer FLOPs in the way Jacob Cannell does, part 2 I generally expect it to be 4 OOMs at least better, which cashes out to at least 3e19 FLOPs per Joule: The limits of chip progress/physical compute in a small area assuming we are limited to irreversible computation (Yes, I'm doing a lot of linking because other people have already done the work, I just want to share the work rather than redo things all over again). @StanislavKrym I'm tagging y

[-]williawa3mo14-9

This is not a response to your central point, but I feel like you somewhat unfairly criticize EAs for stuff like bioanchors often. You often say stuff that makes it seem like bioanchors was released, all EAs bought it wholesale, bioanchors shows we can be confident AI won't arrive before 2040 or something, and thus all EAs were convinced we don't need to worry much about AI for a few decades.

But like, I consider myself and EA, I never put much weight on bioanchors. I read the report and found it interesting, I think its useful enough (mostly as a datapoint for other arguments you might make) that I don't think was a waste of time. But not much more than that. It didn't really change my views on what should be done. Or the likelihood of AGI being developed at which points in time except on the margins. I mean thats how most people I know read that report. But I feel like you accuse people involved of having far less humility and masking way stronger stronger claims than they are.

[-]Buck3mo1213

Notably, bioanchors doesn't say that we should be confident AI won't arrive before 2040! Here's Ajeya's distribution in the report (which was finished in about July 2020).

7williawa3mo

Yeah, to be clear, I don't think that, and I think most people didn't think that, but Eliezer has sometimes said stuff that made it seem like he thought people think that. I was remembering a quote from 2:49:00 at this podcast Indicating bioanchors make a stronger statement than it is, and that EAs are much more dogmatic about that report than most EAs are. Although to be fair, he did say probably here.

6Ben Pace3mo

Upvote-disagree. I think you’re missing an understanding of how influential it was in OpenPhil circles, and how politically controlling OpenPhil has been of EA.

1ryan_greenblatt3mo

This seems very wrong to me from my experience in 2022 (though maybe the situation was very different in 2021? Or maybe there is some other social circle that I wasn't exposed to which had these properties?).

2Ben Pace3mo

Which claim?

3ryan_greenblatt3mo

I think williawa's characterization of how people reacted to bioanchors basically matches my experience and I'm skeptical of the claim that OpenPhil was very politically controlling of EA with respect to timelines. And, I agree with the claim that that Eliezer often implies people interpreted bioanchors in some way they didn't. (I also think bioanchors looks pretty reasonable in retrospect, but this is a separate claim.)

[-]Ben Pace3mo3724

OpenPhil was on the board of CEA and fired it's Executive Director and to this day has never said why; it made demands about who was allowed to have power inside of the Atlas Fellowship and who was allowed to teach there; it would fund MIRI by 1/3rd the full amount for (explicitly stated) signaling reasons; in most cases it was not be open about why it would or wouldn't grant things (often even with grantees!) that left me just having to use my sense of 'fashion' to predict who would get grants and how much; I've heard rumors I put credence on that it wouldn't fund AI advocacy stuff in order to stay in the good books of the AI labs... there was really a lot of opaque politicking by OpenPhil, that would of course have a big effect on how people were comfortable behaving and thinking around AI!

It's silly to think that a politically controlling entity would have to punish ppl for stepping out of line with one particular thing, in order for people to conform on that particular thing. Many people will compliment a dictator's clothes even when he didn't specifically ask for that.

9Buck3mo

My core argument in this post isn't really relevant to anything that was happening in 2020, because people weren't really pushing on concrete changes to safety practices at AI companies yet.

[-]Ben Pace3mo*3634

My guess is still that calling folks “moderates” and “radicals” rather some more specific name (perhaps “marginalists” and “reformers”) is a mistake and fits naturally into conversations about things it seems you didn’t want to be talking about.

Relatedly, calling a group of people 'radicals' seems straightforwardly out-grouping to me, I can't think of a time where I would be happy to be labeled a radical, so it feels a bit like tilting the playing field.

[-]aysja3mo184

Agreed. Also, I think the word “radical” smuggles in assumptions about the risk, namely that it’s been overestimated. Like, I’d guess that few people would think of stopping AI as “radical” if it was widely agreed that it was about to kill everyone, regardless of how much immediate political change it required. Such that the term ends up connoting something like “an incorrect assessment of how bad the situation is.”

-5samuelshadrach3mo

[-]Ben Pace3mo6428

This is a good contribution (strong upvote), but this definition of 'moderate' also refers to not attempting to cause major changes within the company. Otherwise I think many of these points do not apply; within big companies if you want major change/reform you will often have to engage in some amount of coalitional politics, you will often have incentive to appear very confident, if your coalition is given a bunch of power then you often will not actually be forced to know a lot about a domain before you can start acting in it, etc.

I wonder if the distinction being drawn here is better captured by the names "Marginalists" vs "Revolutionaries".

[-]kaleb3mo197

In the leftist political sphere, this distinction is captured by the names "reformers" vs "revolutionaries", and the argument about which approach to take has been going on forever.

I believe it would be worthwhile for us to look at some of those arguments and see if previous thinkers have new (to us) perspectives that can be informative about AI safety approaches.

8DanielFilan3mo

Interestingly I think some of the points still apply in that setting: people in companies care about technical details so to be persuasive you will have to be familiar with them, and in general the points about the benefits of an informed audience seem like they're going to apply. My guess is there are two axes here: * technocratic vs democratic approaches (where technocratic approaches mean you're talking to people informed about details, while democratic approaches mean your audience is less informed about details but maybe has a better sense of wider impacts) * large-scale vs small-scale bids (where large-scale bids are maybe more likely to require ideologically diverse coalitions, while small-scale bids have less of an aggregate impact)

[-]Richard_Ngo3mo2924

people in companies care about technical details so to be persuasive you will have to be familiar with them

Big changes within companies are typically bottlenecked much more by coalitional politics than knowledge of technical details.

6DanielFilan3mo

Sure, but I bet that's because in fact people are usually attuned to the technical details. I imagine if you were really bad on the technical details, that would become a bigger bottleneck. [Epistemic status: I have never really worked at a big company and Richard has. I have been a PhD student at UC Berkeley but I don't think that counts]

[-]Richard_Ngo3mo2112

I think one effect you're missing is that the big changes are precisely the ones that tend to mostly rely on factors that are hard to specify important technical details about. E.g. "should we move our headquarters to London" or "should we replace the CEO" or "should we change our mission statement" are mostly going to be driven by coalitional politics + high-level intuitions and arguments. Whereas "should we do X training run or Y training run" are more amenable to technical discussion, but also have less lasting effects.

[-]Chris van Merwijk3mo1420

Do you not think it's a problem that big-picture decisions can be blocked by a kind of overly-strong demand for rigor from people who are used to mostly think about technical details?

I sometimes notice something roughly like the following dynamic:
1. Person A is trying to make a big-picture claim (e.g. that ASI could lead to extinction) that cannot be argued for purely in terms of robust technical details (since we don't have ASI yet to run experiments, and don't have a theory yet),
2. Person B is more used to think about technical details that allow you to make robust but way more limited conclusions.
3. B finds some detail in A's argument that is unjustified or isn't exactly right, or even just might be wrong.
4. A thinks the detail really won't change the conclusion, and thinks this just misses the point, but doesn't want to spend time, because getting all the details exactly right would take maybe a decade.
5. B concludes A doesn't know what they're talking about and continues ignoring the big picture question completely and keeps focusing on more limited questions.
6. The issue ends up ignored.

It seems to me that this dynamic is part of the coalitional politics and how the high-level arguments are received?

4Richard_Ngo3mo

Yes, that can be a problem. I'm not sure why you think that's in tension with my comment though.

1Chris van Merwijk3mo

I don't think it's *contradicting* it but I vaguely thought maybe it's in tension with: Because lack of knowledge of technical details by A ends up getting B to reject and oppose A. Mostly I wasn't trying to push against you though, and more trying to download part of your model on how important you think this is, out of curiosity, given your experience at OA.

2Noosphere893mo

A key crux is I don't generally agree with this claim in AI safety: In this specific instance, it could work, but in general I think ignoring details is a core failure mode of people that tend towards abstract/meta stuff, which is absolutely the case on Lesswrong. I think abstraction/meta/theoretical work is useful, but also that theory absolutely does require empirics to make sure you are focusing on the relevant parts of the problem. This especially is the case if you are focused on working on solutions, rather than trying to get attention on a problem. I'll just quote from Richard Ngo here, because he made the point shorter than I can (it's in a specific setting, but the general point holds):

3Chris van Merwijk3mo

But the problem is that we likely don't have time to flesh out all the details or do all the relevant experiments before it might be too late, and governments need to understand that based on arguments that therefore cannot possibly rely on everything being fleshed out. Of course I want people to gather as much important empirical evidence and concrete detailed theory as possible asap. Also, the pre-everything-worked-out-in-detail arguments also need to inform which experiments are done, and so that is why people who have actually listened to those pre-detailed arguments end up on average doing much more relevant empirical work IMO.

1Throw Fence3mo

This comment articulates the main thought I was having reading this post. I wonder how Buck is avoiding this very trap, and if there is any hope at all of the Moderate strategy overcoming this problem?

4DanielFilan3mo

I guess there's also "do you expect to have enough time with your audience for you to develop an argument and them to notice flaws", which I think correlates with technocratic vs democratic but isn't the same.

[-]eggsyntax3mo5756

I often hear people say that moderates end up too friendly to AI companies due to working with people from AI companies. I agree, but I think that working as a moderate has a huge advantage for your epistemics.

I think that this friendliness has its own very large epistemic effects. The better you know people, the more time you spend with them, and the friendlier you are with them, the more cognitive dissonance you have to overcome in order to see them as doing bad things, and especially to see them as bad people (in the sense of their actions being net harmful for the world). This seems like the most fundamental force behind regulatory capture (although of course there are other factors there like the prospect of later getting industry jobs).

You may be meaning to implicitly recognize this dynamic in the quote above; it's not clear to me either way. But I think it's worth explicitly calling out as having a strong countervailing epistemic impact. I'm sure it varies significantly across people (maybe it's roughly proportional to agreeability in the OCEAN sense?), and it may not have a large impact on you personally, but for people to weigh the epistemic value of working as a moderate, it's important that they consider this effect.

6Three-Monkey Mind3mo

A related phenomenon: Right-leaning Supreme Court justices move left as they get older, possibly because they’re in a left-leaning part of the country (DC) and that’s where all their friends are. […]

5Vika3mo

This is a significant effect in general, but I'm not sure how much epistemic cost it creates in this situation. Moderates working with AI companies mostly interact with safety researchers, who are not generally doing bad things. There may be a weaker second-order effect where the safety researchers at labs have some epistemic distortion from cooperating with capabilities efforts, and this can influence external people who are collaborating with them.

4eggsyntax3mo

Fair point, that does seem like a moderating (heh) factor.

[-]Joseph Miller3mo*414

I think it's pretty bizarre that despite the fact that LessWrongers are usually acutely aware of the epistemic downsides of being an activist, they seem to have paid relatively little attention to this in their recent transition to activism.

FWIW I'm the primary organizer of PauseAI UK and I've thought about this a lot.

9williawa3mo

I agree with Bucks statement. However, I also feel that like, for the last 10 years, the reverse facet of that point has been argued all the time ad nauseam, both between people on lw, and as criticism coming from outside the community. "People on lesswrong care about epistemic purity, and they will therefore never ever get anything done in the real world. Its easy to have pure epistemics if you're just sitting with your friends thinking about philosophy. If lw really cared about saving the world they would stop with 'politics is mindkiller' and 'scout mindset' and start actually trying to win.". And I think that criticism has some validity. "The right amount of politics is not zero, even though it really is the mind killer". But I also think, arguments for taking AI x-risk very seriously, are unusually strong compared with most political debates. Its an argument we should be able to win even speaking only speaking the whole truth. And in some sense, it can't become an "ordinary" political issue, then the action will not be swift and decisive enough. And if people start making a lot of, even if not false, misleading statements, the risk of that becomes very high.

[-][anonymous]3mo290

I spend lots of time talking to and aiming to persuade AI company staff

Has this succeeded? And if so, do you have specific, concrete examples you can speak about publicly that illustrate this?

[-]Buck3mo268

Mostly AI companies researching AI control and planning to some extent to adopt it (e.g. see the GDM safety plan).

7Oliver Sourbut3mo

Mostly unfiltered blurting Counterfactual? Control is super obvious and not new conceptually; rather it's a bit new that someone is actually trying to do the faffy thing of making something maybe work. I think it's pretty likely they'd be doing it anyway? Counterpoint: companies as group actors (in spite of intelligent and even caring constituent humans) are mostly myopic and cut as many corners as possible by default (either due to vicious leadership, corporate myopia, or (perceived) race incentives), so maybe even super obvious things get skipped without external parties picking up the slack? The same debate could perhaps be had about dangerous capability evaluations.

5Buck3mo

Even though the basic ideas are kind of obvious, I think that us thinking them through and pushing on them has made a big difference in what companies are planning to do.

[-]Neel Nanda3mo202

I doubt this is what Buck had in mind, but he's had meaningful influence on the various ways I've changed my views on the big picture of interp over time

[-]George Ingebretsen3mo1814

Seems like a huge point here is ability to speak unfiltered about AI companies? The Radicals working outside of AI labs would be free to speak candidly while the Moderates would have some kind of relationship to maintain.

[-]Buck3mo146

I agree this is a real thing.

Note that this is more important for group epistemics than individual epistemics.

Also, one reason the constraint isn't that bad for me is that I can and do say spicy stuff privately, including in pretty large private groups. (And you can get away with fairly spicy stuff stated publicly.)

[-]GideonF3mo168

Radicals often seem to think of AI companies as faceless bogeymen thoughtlessly lumbering towards the destruction of the world. I

This strikes me as a fairly strong strawman. My guess if the vast majority of thoughtful radicals basically have a similar view to you. Indeed, at least from your description, its plausible my view is more charitable than yours - I think a lot of it is also endangering humanity due to cowardice and following of local incentives etc.

[-]ronak693mo90

Most of the spicy things I say can be said privately to just the people who need to hear them, freeing me from thinking about the implications of random third parties reading my writing.

Isn't this a disadvantage? If third-parties that disagree with you, were able to criticize spicy things you say, and possibly counter-persuade people from AI companies, you would have to be even more careful.

9Neel Nanda3mo

That leads to things like corporate speak that is completely empty of content. The critics are adversarial or misunderstand you, so the incentives are very off

2Josh Snider3mo

Avoiding what you suggested is why private conversations are an advantage. I think you misunderstood the essay, unless I'm misunderstanding your response.

[-]DanielFilan3mo95

In this post, I mostly conflated "being a moderate" with "working with people at AI companies". You could in principle be a moderate and work to impose extremely moderate regulations, or push for minor changes to the behavior of governments.

Note that I think something like this describes a lot of people working in AI risk policy, and therefore seems like more than a theoretical possibility.

[-]Aorou3mo83

The main advantage for epistemics of working as a moderate is that almost all of your work has an informed, intelligent, thoughtful audience. I spend lots of time talking to and aiming to persuade AI company staff who are generally very intelligent, knowledgeable about AI, and intimately familiar with the goings-on at AI companies. In contrast, as a radical, almost all of your audience—policymakers, elites, the general public—is poorly informed, only able or willing to engage shallowly, and needs to have their attention grabbed intentionally. The former si

... (read more)

[-]the gears to ascension3mo60

I think looking for immediately-applicable changes which are relevant to concrete things that people at companies are doing today need not constrain you to small changes, and so I would not use the words you're using, since they seem like a bad basis space for talking about the moving parts involved. I agree that people who want larger changes would get better feedback, and end up with more actionable plans, if they think in terms of what change is actually implementable using the parts available at hand to people who are thinking on the frontier of making things happen.

[-]Lukas_Gloor3mo60

I really like this post.

In this post, I mostly conflated "being a moderate" with "working with people at AI companies". You could in principle be a moderate and work to impose extremely moderate regulations, or push for minor changes to the behavior of governments.

There's also a "moderates vs radicals" when it comes to attitudes, certainty in one's assumptions, and epistemics, rather than (currently-)favored policies. While some of the benefits you list are hard to get for people who are putting their weight behind interventions to bring about radical chan... (read more)

[-]Unnamed3mo53

I think there is a fair amount of overlap between the epistemic advantages of being a moderate (seeking incremental change from AI companies) and the epistemic disadvantages.

Many of the epistemic advantages come from being more grounded or having tighter feedback loops. If you're trying to do the moderate reformer thing, you need to justify yourself to well-informed people who work at AI companies, you'll get pushback from them, you're trying to get through to them.

But those feedback loops are with that reality as interpreted by people at AI companies. So,... (read more)

[-]Hyperion3mo50

I think that to many in AI labs, the control agenda (in full ambition) is seen as radical (it's all relative) and to best persuade people that it's worth pursuing rigorously, you do in fact need to engage in coalitional politics and anything else that increases your chances of persuasion. The fact that you feel like your current path doesn't imply doing this makes me more pessimistic about your success.

This is an anonymous account but I've met you several times and seen you in action at AI policy events, and I think those data points confirm my view above.

[-]Ben Pace3mo50

Curated, this helpfully pointed out some important dynamics in the discourse that I think are present but have never been quite made explicit. As per the epistemic notice, I think this post was likely quickly written and isn't intended to reflect the platonic ideal set of points on this, but I stand behind it as being illuminating on an important subject and worth sharing.

6Buck3mo

Indeed it was quickly written, I think it was like 20 mins of writing and 30 mins of editing/responding to comments. Based on feedback, I think it was probably a mistake to not put more time into it and post a better version.

6Ben Pace3mo

Would you prefer me to pause the curation for a day while you do that? We have 10-20 mins before it gets emailed out to ~30k ppl.

6Buck3mo

Yes, I'd appreciate that. DM me on slack?

[-]Noosphere893mo51

I want to pull out one particular benefit that I think swamps the rest of the benefits, and in particular explains why I tend to gravitate to moderation over extremism/radicalism:

Because I'm trying to make changes on the margin, details of the current situation are much more interesting to me. In contrast, radicals don't really care about e.g. the different ways that corporate politics affects AI safety interventions at different AI companies.

Caring about the real world details of a problem is often quite important in devising a good solution, and is argua... (read more)

[-]Steven3mo40

Thanks for writing this. I’m not sure I’d call your beliefs moderate, since they involve extracting useful labor from misaligned AI by making deals with them, sometimes for pieces of the observable universe or by verifying with future tech.

On the point of “talking to AI companies”, I think this would be a healthy part of any attempted change although I see that PauseAI and other orgs tend to talk to AI companies in a way that seems to try to make them feel bad by directly stating that what they are doing is wrong. Maybe the line here is “You make sur... (read more)

[-]Anthony Bailey3mo3-3

Very glad of this post. Thanks for broaching, Buck.

Status: I'm an old nerd, lately ML R&D, who dropped career and changed wheelhouse to volunteer at Pause AI.

Two comments on the OP:

details of the current situation are much more interesting to me. In contrast, radicals don't really care about e.g. the different ways that corporate politics affects AI safety interventions at different AI companies.

As per Joseph's response: this does not match me or my general experience of AI safety activism.

Concretely, a recent campaign was specifically about Deep M... (read more)

[-]Eli Tyre3mo33

Thank you for writing this up, I was glad to reflect on it.

[-]roha3mo20

I think there might be a confusion between optimizing for an instrumental vs. an upper-level goal. Is maintaining good epistemics more relevant than working on the right topic? To me the rigor of an inquiry seems secondary to choosing the right subject.

[-]David James3mo10

Consider the following numbered points:

In an important sense, other people (and culture) characterize me as perhaps moderate (or something else). I could be right, wrong, anything in between, or not even wrong. I get labeled largely based on what others think and say of me.
How do I decide on my policy positions? One could make a pretty compelling argument (from rationality, broadly speaking) that my best assessments of the world should determine my policy positions.
Therefore, to the extent I do a good job of #2, I should end up recommending polici

... (read more)

^{^}

Some of our difference in strategy is just specialization: I’m excited for many radical projects, some of my best friends work on them, and I can imagine myself working on them in the future. I work as a moderate mostly because I (and Redwood) have comparative advantage for it: I like thinking in detail about countermeasures, and engaging in detailed arguments with well-informed but skeptical audiences about threat models.

And most of the rest of the difference in strategy between me and radicals is downstream of genuine object-level disagreement about AI risks and how promising different interventions are. If you think that all the interventions I'm excited for are useless, then obviously you shouldn't spend your time advocating for them.

LESSWRONG
LW

LESSWRONG
LW

215

An epistemic advantage of working as a moderate

215

215