
ryan_greenblatt

I'm the chief scientist at Redwood Research.

Comments
Vladimir_Nesov's Shortform
ryan_greenblatt · 3d

That's how some humans are thinking as well! The arguments are about the same, both for and against. (I think overall rushing RSI is clearly a bad idea for a wide variety of values and personal situations, and so smarter AGIs will more robustly tend to converge on this conclusion than humans do.)

Sorry, I meant "share their preferences more than the humans in control share their preferences". I agree that this might be how some humans are thinking, but the case for the humans is much more dubious!

Vladimir_Nesov's Shortform
ryan_greenblatt · 3d

If (early) scheming-for-long-run-preferences AGIs were in control, they would likely prefer a pause (all else equal). If they aren't, it's very unclear and they very well might not. (E.g., because they gamble that more powerful AIs will share their preferences (edit: share their preferences more than the humans in control do) and they think that these AIs would have a better shot at takeover.)

Foom & Doom 2: Technical alignment is hard
ryan_greenblatt · 6d

I somehow completely agree with both of your perspectives; have you tried to ban the word "continuous" in your discussions yet?

I agree taboo-ing is a good approach in this sort of case. Talking about "continuous" wasn't a big part of my discussion with Steve, but I agree it would be worth doing if it were.

What does 10x-ing effective compute get you?
ryan_greenblatt · 6d

I wonder if you can convert the METR time horizon results into SD / year numbers. My sense is that this will probably not be that meaningful because AIs are much worse than mediocre professionals while having a different skill profile, so they are effectively out of the human range.

If you did a best-effort version of this by looking at software engineers who struggle to complete longer tasks like the ones in the METR benchmark(s), I'd wildly guess that a doubling in time horizon is roughly 0.7 SD, such that (given METR's measured doubling time of roughly 7 months, i.e. ~1.7 doublings per year) this predicts ~1.2 SD / year.
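As a back-of-the-envelope sketch of that conversion (the ~7-month doubling time is an assumption taken from METR's reported trend, and 0.7 SD per doubling is the wild guess above):

```python
# Rough conversion from METR time-horizon doublings to SD / year.
# Assumed inputs: ~7-month doubling time (METR's reported trend) and
# ~0.7 SD per doubling (the wild guess above).
months_per_doubling = 7
sd_per_doubling = 0.7

doublings_per_year = 12 / months_per_doubling   # ~1.7 doublings per year
sd_per_year = doublings_per_year * sd_per_doubling

print(f"~{sd_per_year:.1f} SD / year")          # prints ~1.2 SD / year
```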

Ebenezer Dukakis's Shortform
ryan_greenblatt · 6d

MIRI / Soares / Eliezer are very likely well aware of this as something to try. See also here.

Foom & Doom 2: Technical alignment is hard
ryan_greenblatt · 6d

I agree that my view is that they can count as continuous (though the exact definition of the word "continuous" can matter!), but then the statement "I find this perspective baffling; I think MuZero and LLMs are wildly different from an alignment perspective" isn't really related to this from my perspective. Like, things can be continuous (from a transition or takeoff speeds perspective) and still differ substantially in some important respects!

Call for suggestions - AI safety course
ryan_greenblatt · 6d

It seems good to spend a bunch of time on takeoff speeds given how important they are for how AI goes. There are many sources discussing takeoff speeds. Some places to look: Tom Davidson's outputs, Forethought, AI-2027 (including supplements), the AI Futures Project, Epoch, Daniel K.'s outputs, and some posts by me.

‘AI for societal uplift’ as a path to victory
ryan_greenblatt · 7d

The bar for ‘good enough’ might be quite high

Presumably a key assumption of this strategy is that takeoff is slow enough that AIs which are good enough at improving collective epistemics and coordination become sufficiently cheap and available before it's too late.

How much novel security-critical infrastructure do you need during the singularity?
ryan_greenblatt · 7d

Adopting new hardware will require modifying security-critical code

Another concern is that AI companies (or the AI company) will rapidly buy a bunch of existing hardware (GPUs, other accelerators, etc.) during the singularity, and handling this hardware will require many infrastructure changes in a short period of time. New infrastructure might be needed to handle highly heterogeneous clusters built out of a bunch of different GPUs/CPUs/etc. (potentially including gaming GPUs) bought in a hurry. AI companies might buy the hardware of other AI companies, and it might be non-trivial to adapt this hardware to the acquiring company's setup.

ryan_greenblatt's Shortform
ryan_greenblatt · 8d

Recently, various groups successfully lobbied to remove the moratorium on state AI bills. This involved a surprising amount of success despite competing against substantial investment from big tech (e.g. Google, Meta, Amazon). I think people interested in mitigating catastrophic risks from advanced AI should consider working at these organizations, at least to the extent their skills/interests are applicable. This is both because they could often directly work on substantially helpful things (depending on the role and organization) and because this would yield valuable work experience and connections.

I worry somewhat that this type of work is neglected due to being less emphasized and seeming lower status. Consider this an attempt to make this type of work higher status.

Pulling organizations mostly from here and here, we get a list of orgs you could consider trying to work at (specifically on AI policy):

  • Encode AI
  • Americans for Responsible Innovation (ARI)
  • Fairplay (Fairplay is a kids' safety organization that does a variety of advocacy, much of it unrelated to AI. Roles focused on AI would be most relevant. In my opinion, working on AI-related topics at Fairplay is most applicable for gaining experience and connections.)
  • Common Sense (also a kids' safety organization)
  • The AI Policy Network (AIPN)
  • Secure AI project

To be clear, these organizations vary in the extent to which they are focused on catastrophic risk from AI (from not at all to entirely).

Posts

  • ryan_greenblatt's Shortform · 14 karma · Ω · 2y · 256 comments
  • Jankily controlling superintelligence · 70 karma · Ω · 14d · 4 comments
  • What does 10x-ing effective compute get you? · 47 karma · 17d · 5 comments
  • Prefix cache untrusted monitors: a method to apply after you catch your AI · 29 karma · Ω · 21d · 1 comment
  • AI safety techniques leveraging distillation · 61 karma · Ω · 22d · 0 comments
  • When does training a model change its goals? · 70 karma · Ω · 1mo · 2 comments
  • OpenAI now has an RL API which is broadly accessible · 43 karma · Ω · 1mo · 1 comment
  • When is it important that open-weight models aren't released? My thoughts on the benefits and dangers of open-weight models in response to developments in CBRN capabilities. · 63 karma · Ω · 1mo · 11 comments
  • The best approaches for mitigating "the intelligence curse" (or gradual disempowerment); my quick guesses at the best object-level interventions · 71 karma · Ω · 1mo · 19 comments
  • AIs at the current capability level may be important for future safety work · 82 karma · Ω · 2mo · 2 comments
  • Slow corporations as an intuition pump for AI R&D automation · 91 karma · Ω · 2mo · 23 comments

Wikitag Contributions

  • Anthropic (org) · 6mo · (+17/-146)
  • Frontier AI Companies · 9mo
  • Frontier AI Companies · 9mo · (+119/-44)
  • Deceptive Alignment · 2y · (+15/-10)
  • Deceptive Alignment · 2y · (+53)
  • Vote Strength · 2y · (+35)
  • Holden Karnofsky · 2y · (+151/-7)
  • Squiggle Maximizer (formerly "Paperclip maximizer") · 2y · (+316/-20)