ryan_greenblatt

I'm the chief scientist at Redwood Research.

14 · ryan_greenblatt's Shortform · Ω · 2y · 255

Comments

Foom & Doom 2: Technical alignment is hard
ryan_greenblatt · 2d · 20

> I somehow completely agree with both of your perspectives, have you tried to ban the word "continuous" in your discussions yet?

I agree taboo-ing is a good approach in this sort of case. Talking about "continuous" wasn't a big part of my discussion with Steve, but I agree this would apply if it had been.

What does 10x-ing effective compute get you?
ryan_greenblatt · 2d · 20

I wonder if you can convert the METR time horizon results into SD / year numbers. My sense is that this will probably not be that meaningful because AIs are much worse than mediocre professionals while having a different skill profile, so they are effectively out of the human range.

If you did a best-effort version of this by looking at software engineers who struggle to complete longer tasks like the ones in the METR benchmark(s), I'd wildly guess that a doubling in time horizon corresponds to roughly 0.7 SD, such that this predicts ~1.2 SD / year.
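
To make the arithmetic explicit, here's a minimal sketch of the conversion (the 0.7 SD per doubling is just my wild guess above, and the ~7-month doubling time is roughly METR's headline estimate; treat both numbers as assumptions, not measurements):

```python
# Rough conversion from a time-horizon doubling rate to an SD/year rate.
# Both inputs are guesses/assumptions, not measured quantities.
sd_per_doubling = 0.7       # wild guess: one doubling of time horizon ~ 0.7 SD
doubling_time_months = 7.0  # roughly METR's reported doubling time (assumption)

doublings_per_year = 12 / doubling_time_months  # ~1.7 doublings per year
sd_per_year = sd_per_doubling * doublings_per_year

print(f"{doublings_per_year:.2f} doublings/year -> {sd_per_year:.2f} SD/year")
# prints: 1.71 doublings/year -> 1.20 SD/year
```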

Ebenezer Dukakis's Shortform
ryan_greenblatt · 3d · 100

MIRI / Soares / Eliezer are very likely well aware of this as something to try. See also here.

Foom & Doom 2: Technical alignment is hard
ryan_greenblatt · 3d · 90

I agree that my view is that they can count as continuous (though the exact definition of the word "continuous" can matter!), but then the statement "I find this perspective baffling—I think MuZero and LLMs are wildly different from an alignment perspective" isn't really related to this from my perspective. Things can be continuous (from a transition or takeoff speeds perspective) and still differ substantially in some important respects!

Call for suggestions - AI safety course
ryan_greenblatt · 3d · 20

It seems good to spend a bunch of time on takeoff speeds given how important they are for how AI goes. There are many sources discussing takeoff speeds. Some places to look: Tom Davidson's outputs, Forethought, AI 2027 (including supplements), the AI Futures Project, Epoch, Daniel K.'s outputs, and some posts by me.

‘AI for societal uplift’ as a path to victory
ryan_greenblatt · 4d · 68

> The bar for ‘good enough’ might be quite high

Presumably a key assumption of this strategy is that takeoff is slow enough that AIs which are good enough at improving collective epistemics and coordination are sufficiently cheap and sufficiently available before it's too late.

How much novel security-critical infrastructure do you need during the singularity?
ryan_greenblatt · 4d · Ω560

> Adopting new hardware will require modifying security-critical code

Another concern is that AI companies (or the AI company) will rapidly buy a bunch of existing hardware (GPUs, other accelerators, etc.) during the singularity, and handling this hardware will require many infrastructure changes in a short period of time. New infrastructure might be needed to handle highly heterogeneous clusters built out of a bunch of different GPUs/CPUs/etc. (potentially including gaming GPUs) bought in a hurry. AI companies might also buy the hardware of other AI companies, and it might be non-trivial to adapt that hardware to the acquiring company's setup.

ryan_greenblatt's Shortform
ryan_greenblatt · 5d · 10645

Recently, various groups successfully lobbied to remove the moratorium on state AI bills. This was a surprising amount of success given that they were competing against substantial investment from big tech (e.g., Google, Meta, Amazon). I think people interested in mitigating catastrophic risks from advanced AI should consider working at these organizations, at least to the extent that their skills/interests are applicable. This is both because they could often directly work on substantially helpful things (depending on the role and organization) and because this would yield valuable work experience and connections.

I worry somewhat that this type of work is neglected due to being less emphasized and seeming lower status. Consider this an attempt to make this type of work higher status.

Pulling organizations mostly from here and here, we get a list of orgs you could consider trying to work at (specifically on AI policy):

  • Encode AI
  • Americans for Responsible Innovation (ARI)
  • Fairplay (a kids' safety organization that does a variety of advocacy, much of which isn't related to AI. Roles/focuses on AI would be most relevant; in my opinion, working on AI-related topics at Fairplay is most applicable for gaining experience and connections.)
  • Common Sense (also a kids' safety organization)
  • The AI Policy Network (AIPN)
  • Secure AI Project

To be clear, these organizations vary in the extent to which they are focused on catastrophic risk from AI (from not at all to entirely).

Habryka's Shortform Feed
ryan_greenblatt · 5d · 96

One lesson you should maybe take away is that if you want your predictions to be robust to different interpretations (including interpretations that you think are uncharitable), it could be worthwhile to try to make them more precise (in the case of a tweet, this could be done in a linked blog post which explains the prediction in more detail). E.g., in the case of "No massive advance (no GPT-5, or disappointing GPT-5)" you could have said "Within 2024, no AI system will be publicly released which is as much of a qualitative advance over GPT-4 in broad capabilities as GPT-4 is over GPT-3, and where this increase in capabilities appears to be due to scaling up LLM pretraining". This prediction would have been relatively clearly correct (though I think also relatively uncontroversial, at least among people I know, as we probably should only have expected to get to ~GPT-4.65 in terms of compute scaling and algorithmic progress by the end of 2024). You could try to operationalize this further in terms of benchmarks or downstream tasks.

To the extent that you can make predictions in terms of concrete numbers or metrics (which, to be clear, is not always possible), this avoids ~any issues due to interpretation. You could also make predictions about Metaculus questions when applicable, as these also have relatively solid and well-understood resolution criteria.

AI Task Length Horizons in Offensive Cybersecurity
ryan_greenblatt · 5d · 42

> First blood times represent the time of first successful submission in the originally published competition. While there are some limitations (participants usually compete in teams, and may solve problems in parallel or in sequence), this still provides a useful proxy.

Another limitation is that first blood times represent the fastest time of some group rather than the typical time that an expert would take to complete the task. This makes Cybench times less comparable to other human completion times.
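
As a toy illustration of why a fastest-of-many-teams time understates the typical expert time, here's a minimal simulation sketch (all numbers are made up for illustration, not taken from Cybench or the competition data):

```python
import numpy as np

# Toy model: many teams attempt a task in parallel; the first-blood time is the
# minimum over teams, which is systematically faster than a typical team's time.
rng = np.random.default_rng(0)

n_teams = 50         # hypothetical number of competing teams
typical_hours = 4.0  # hypothetical median solve time for a single team
team_times = rng.lognormal(mean=np.log(typical_hours), sigma=0.8,
                           size=(10_000, n_teams))

first_blood = team_times.min(axis=1)  # fastest team in each simulated competition
print(f"median single-team time: {np.median(team_times):.1f}h")
print(f"median first-blood time: {np.median(first_blood):.1f}h")
# The first-blood time comes out several times faster than the typical team time.
```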

Wikitag Contributions

  • Anthropic (org) · 6mo · (+17/-146)
  • Frontier AI Companies · 9mo
  • Frontier AI Companies · 9mo · (+119/-44)
  • Deceptive Alignment · 2y · (+15/-10)
  • Deceptive Alignment · 2y · (+53)
  • Vote Strength · 2y · (+35)
  • Holden Karnofsky · 2y · (+151/-7)
  • Squiggle Maximizer (formerly "Paperclip maximizer") · 2y · (+316/-20)

Posts

  • 69 · Jankily controlling superintelligence · Ω · 11d · 4
  • 46 · What does 10x-ing effective compute get you? · 14d · 5
  • 29 · Prefix cache untrusted monitors: a method to apply after you catch your AI · Ω · 18d · 1
  • 61 · AI safety techniques leveraging distillation · Ω · 19d · 0
  • 70 · When does training a model change its goals? · Ω · 26d · 2
  • 43 · OpenAI now has an RL API which is broadly accessible · Ω · 1mo · 1
  • 63 · When is it important that open-weight models aren't released? My thoughts on the benefits and dangers of open-weight models in response to developments in CBRN capabilities. · Ω · 1mo · 11
  • 71 · The best approaches for mitigating "the intelligence curse" (or gradual disempowerment); my quick guesses at the best object-level interventions · Ω · 1mo · 19
  • 81 · AIs at the current capability level may be important for future safety work · Ω · 2mo · 2
  • 91 · Slow corporations as an intuition pump for AI R&D automation · Ω · 2mo · 23