ryan_greenblatt
I'm the chief scientist at Redwood Research.

Comments (sorted by newest)

Tomás B.'s Shortform
ryan_greenblatt · 1h

To clarify my perspective a bit here:

I think literally just prepending "Humor:" or "Mostly joking:" would make me think this was basically fine. Or like a footnote saying "mostly a joke / not justifying this here". It's just good to be clear about what is non-argued-for humor vs. what is a serious criticism (and if it is a serious criticism of this sort, then I think we should hold it to reasonably high standards, e.g. the questions in my comment are relevant).

Idk what the overall policy for LessWrong should be, but this sort of thing does feel scary to me and worth being on guard against.

Tomás B.'s Shortform
ryan_greenblatt · 2h

As in, you think FrontierMath is (strongly) net negative for this reason? Or do you think Epoch was going to do work along these lines that you think is net negative?

Tomás B.'s Shortform
ryan_greenblatt · 3h

What do you mean by "I am not sure OpenPhil should have funded these guys"? (Edit for context: OpenPhil funded Epoch, where they previously worked, but hasn't funded Mechanize, where they currently work.)

Are you joking? Do you think it's bad to fund organizations that do useful work (or that you think will do useful work) but which employ people whose beliefs might lead them to do work that you think is net negative? Or do you have some narrower belief about the pressure OpenPhil should be applying to organizations that are trying to be neutral/trusted?

I think it's probably bad to say stuff (at least on LessWrong) like "I am not sure OpenPhil should have funded these guys" (the image is fine satire I guess) because this seems like the sort of thing which yields tribal dynamics and negative polarization. When criticizing people, I think it's good to be clear and specific. I think "humor which criticizes people" is maybe fine, but I feel like this can easily erode epistemics because it is hard to respond to. I think "ambiguity about whether this criticism is humor / meant literally etc" is much worse (and common on e.g. X/twitter).

The Thinking Machines Tinker API is good news for AI control and security
ryan_greenblatt · 3h

The main class of projects that need granular model weight access to frontier models is model internals/interpretability.

You could potentially do a version of this sort of API which has some hooks for interacting with activations to capture a subset of these use cases (e.g. training probes). It would probably add a lot of complexity though and might only cover a small subset of research.
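
To make the "hooks for activations" idea concrete, here is a minimal hypothetical sketch (not the real Tinker API; the `get_activations` endpoint, its signature, and the feature dimension are invented for illustration) of how a hosted activations endpoint could support training a linear probe without exposing model weights. The activations are faked with random numbers so the sketch runs end to end.

```python
# Hypothetical sketch only: this is NOT the real Tinker API. The
# `get_activations` endpoint, its signature, and the hidden dimension are
# invented to illustrate how hosted activation access could support probe
# training without granting weight access.
import numpy as np
from sklearn.linear_model import LogisticRegression

def get_activations(prompts: list[str], layer: int) -> np.ndarray:
    """Stand-in for a hosted activations endpoint.

    A real service would run the prompts through the model and return
    residual-stream activations at `layer`; here we fabricate random
    features so the example is self-contained and runnable.
    """
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(prompts), 4096))  # (n_prompts, hidden_dim)

# Toy probe-training task: distinguish two classes of prompts.
prompts = ["benign request"] * 50 + ["suspicious request"] * 50
labels = np.array([0] * 50 + [1] * 50)

activations = get_activations(prompts, layer=20)
probe = LogisticRegression(max_iter=1000).fit(activations, labels)
print("train accuracy:", probe.score(activations, labels))
```

Even this narrow pattern (read-only access to one layer's activations) would cover probe training but not interventions like activation patching or steering, which is roughly the "small subset of research" caveat above.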

Plans A, B, C, and D for misalignment risk
ryan_greenblatt · 4h

I'm not trying to say "Plan A is doable and shut it all down is intractable".

My view is that "shut it all down" probably requires substantially more (but not a huge amount more) political will than Plan A such that it is maybe like 3x less likely to happen given similar amounts of effort from the safety community.

You started by saying:

My main question is "why do you think Shut Down actually costs more political will?".

So I was trying to respond to this. I think 3x less likely to happen is actually a pretty big deal; this isn't some tiny difference, but neither is it "Plan A is doable and shut it all down is intractable". (And I also think "shut it all down" has various important downsides relative to Plan A, maybe these downsides can be overcome, but by default this makes Plan A look more attractive to me even aside from the political will considerations.)

I think something like Plan A and "shut it all down" are both very unlikely to happen, and I'd be pretty sympathetic to describing both as politically intractable (e.g., I think something as good/strong as Plan A is only 5% likely). "Politically intractable" isn't very precise though, so I think we have to talk more quantitatively.

Note that I also think pushing for Plan A isn't the most leveraged thing for most people to do at the margin; I expect to focus on making Plans C/D go better (with some weight on things like Plan B).

Daniel Kokotajlo's Shortform
ryan_greenblatt · 4h

NVIDIA loves it for commoditize their complement reasons, so there could be a powerful lobby to keep it going and enforce it

I think NVIDIA probably wouldn't be in favor, because this would reduce the incentives to train smarter models. Like, at the margin, they want to subsidize open-weight models, but requiring everything to be open weight seems like it could easily reduce the number of GPUs people buy, because the best model would be much less capable than it would otherwise have been.

Plans A, B, C, and D for misalignment risk
ryan_greenblatt · 17h

But like, the last draft of Plan A I saw included "we relocate all the compute to centralized locations in third party countries" as an eventual goal. That seems pretty crazy?

Yes, this is much harder (from a political will perspective) than compute + fab monitoring, which is part of my point. My view is that, in terms of political will requirements:

compute + fab monitoring << Plan A < Shut it all down

Plans A, B, C, and D for misalignment risk
ryan_greenblatt · 18h

Hmm, we probably disagree about the risk depending on what you mean by "uncontrollably powerful", especially if the AI company didn't have any particular reason to think the jump would be especially high (as is typically the case for new models).

I'd guess it's hard for a model to be "uncontrollably powerful" (in the sense that we would be taking on a bunch of risk from a Plan A perspective) unless it is at least pretty close to being able to automate AI R&D, so this requires a pretty huge capabilities jump.

My guess of direct risk[1] from the next generation of models (as in, the next major release from Anthropic+xAI+OpenAI+GDM) would be like 0.3%, and it'd be like 3x lower if we were proceeding decently cautiously in a Plan A style scenario (e.g. if we had an indication the model might be much more powerful, we'd scale only a bit at a time).


My estimate for 0.3%: My median is 8.5 years and maybe there are ~3-ish major model releases per year, so assuming uniform we'd get 2% chance of going all the way to AI R&D automation this generation. Then, I cut by a factor of like 10 due to this being a much larger discontinuity than we've seen before and by another factor of 2 from this not being guaranteed to directly result in takeover. Then, I go back up a bunch due to model uncertainty and thinking that we might be especially likely to see a big advance around now.
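
Spelling out that chain of adjustments as a quick check (my restatement; the size of the final "go back up a bunch" step isn't stated, so I assume roughly a 3x bump, which lands on the quoted 0.3%):

```python
# Rough restatement of the estimate above; the ~3x "model uncertainty" bump
# is an assumption chosen so the chain reproduces the stated 0.3%.
p_full_automation_this_gen = 0.02  # ~2% chance this generation goes all the way to AI R&D automation
discontinuity_discount = 10        # much larger jump than previously observed
takeover_discount = 2              # automation isn't guaranteed to directly result in takeover
model_uncertainty_bump = 3         # assumed size of "go back up a bunch"

direct_risk = (p_full_automation_this_gen / discontinuity_discount
               / takeover_discount * model_uncertainty_bump)
print(f"direct risk ≈ {direct_risk:.1%}")  # ≈ 0.3%
```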


Edit: TBC, I think it's reasonable to describe the current state of affairs as "we can't very confidently proceed safely" but I also think the view "we can very confidently proceed safely (e.g., takeover risk from next model generation is <0.025%) given being decently cautious" is pretty reasonable.


  1. By direct risk, I mean takeover itself and risk increases through mechanisms like self-exfiltration or rogue internal deployment, but not including the AI sabotaging research it is supposed to be doing.

Plans A, B, C, and D for misalignment risk
ryan_greenblatt · 18h

I think compute + fab monitoring with potential for escalation requires much lower political will than shutting down AI development. I agree that both Plan A and shut it all down require something like this. Like I think this monitoring would plausibly not require much more political will than export controls.


Advanced bio AI seems pretty good for the world and to capture a lot of the benefits

Huh? No, it doesn't capture much of the benefits. I would have guessed it captures a tiny fraction of the benefits of advanced AI, even for AIs around human level (i.e., around the level where you might want to pause).


But, it seems like a version of the treaty that doesn't at least have the capacity to shutdown compute temporarily is a kinda fake version of Plan A, and once you have that, "Shut down" vs "Controlled Takeoff" feels more like arguing details than fundamentals to me.

I agree you will have the capacity to shut down compute temporarily either way; I disagree that there isn't much of a difference between slowing down takeoff and shutting down all further non-narrow AI development.

Plans A, B, C, and D for misalignment risk
ryan_greenblatt · 20h

Sure, I agree that Nate/Eliezer think we should eventually build superintelligence and don't want to cause a pause that lasts forever. In the comment you're responding to, I'm just talking about the difficulty of getting people to buy the narrative.

More generally, what Nate/Eliezer think is best doesn't resolve concerns about the pause going poorly because something else happens in practice. This includes the pause going on too long or leading to a general anti-AI/anti-digital-minds/anti-progress view, which would be costly for the longer-run future. (This applies to the proposed Plan A as well, but I think poor implementation is less scary in various ways and the particular risk of ~anti-progress-forever is less strong.)

Posts (sorted by new)

ryan_greenblatt's Shortform · 15 karma · 2y · 315 comments
Plans A, B, C, and D for misalignment risk · 109 karma · 1d · 48 comments
Reasons to sell frontier lab equity to donate now rather than later · 225 karma · 13d · 32 comments
Notes on fatalities from AI takeover · 55 karma · 4d · 60 comments
Focus transparency on risk reports, not safety cases · 47 karma · 17d · 3 comments
Prospects for studying actual schemers · 40 karma · 20d · 0 comments
AIs will greatly change engineering in AI companies well before AGI · 46 karma · 1mo · 9 comments
Trust me bro, just one more RL scale up, this one will be the real scale up with the good environments, the actually legit one, trust me bro · 154 karma · 1mo · 32 comments
Attaching requirements to model releases has serious downsides (relative to a different deadline for these requirements) · 99 karma · 1mo · 2 comments
My AGI timeline updates from GPT-5 (and 2025 so far) · 163 karma · 2mo · 14 comments
Recent Redwood Research project proposals · 91 karma · 3mo · 0 comments
Wikitag Contributions

Anthropic (org) · 9 months ago · (+17/-146)
Frontier AI Companies · a year ago
Frontier AI Companies · a year ago · (+119/-44)
Deceptive Alignment · 2 years ago · (+15/-10)
Deceptive Alignment · 2 years ago · (+53)
Vote Strength · 2 years ago · (+35)
Holden Karnofsky · 2 years ago · (+151/-7)
Squiggle Maximizer (formerly "Paperclip maximizer") · 3 years ago · (+316/-20)