ryan_greenblatt

I'm the chief scientist at Redwood Research.

Comments (sorted by newest)

1a3orn's Shortform
ryan_greenblatt · 1d

I agree in general, but think this particular example is pretty reasonable because the point is general and just happens to have been triggered by a specific post that 1a3orn thinks is an example of this (presumably this?).

I do think it's usually better practice to list a bunch of examples of the thing you're referring to, but also specific examples can sometimes be distracting/unproductive or cause more tribalism than needed? Like in this case, I think it would probably be better if people considered the point in the abstract (decoupled from its implications), thought about how much they agreed, and then applied it on a case-by-case basis. (A common tactic that (e.g.) Scott Alexander uses is to first make an abstract argument before applying it, so that people are more likely to properly decouple.)

On Fleshling Safety: A Debate by Klurl and Trapaucius.
ryan_greenblatt · 2d

Doesn't seem like a genre subversion to me; it's just a bit clever/meta while still centrally being an allegorical AI alignment dialogue. IDK what the target audience is, though (but maybe Eliezer just felt inspired to write this).

Jacob Pfau's Shortform
ryan_greenblatt · 2d

Probably memory / custom system prompt, right?

the gears to ascenscion's Shortform
ryan_greenblatt · 3d

Kyle Fish (who works on AI welfare at Anthropic) endorses this in his 80,000 Hours episode. Obviously this isn't a commitment from Anthropic.

the gears to ascenscion's Shortform
ryan_greenblatt · 3d

I broadly agree (and have advocated for similar), but I think there are some non-trivial downsides to keeping the weights around in the event of AI takeover (due to the possibility of torture/threats from future AIs). Maybe the ideal proposal would involve storing the weights in some way that allows for attempted destruction in the event of AI takeover. E.g., give 10 people keys, require >3 keys to decrypt, and instruct people to destroy their keys in the event of AI takeover. (Unclear if this would work, TBC.)
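
The key-splitting idea here is essentially a k-of-n threshold scheme (Shamir secret sharing). Below is a minimal illustrative sketch, assuming the weight-encryption key is split 4-of-10 over a prime field; the prime, share counts, and helper names are my own stand-ins, not part of any actual proposal:

```python
# Minimal Shamir secret-sharing sketch: split a symmetric key into n shares
# such that any k shares reconstruct it and fewer than k reveal nothing.
import secrets

PRIME = 2**127 - 1  # a Mersenne prime larger than the secret being shared

def split(secret: int, n: int = 10, k: int = 4) -> list[tuple[int, int]]:
    """Return n points (x, f(x)) on a random degree-(k-1) polynomial with f(0) = secret."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    def f(x: int) -> int:
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def recover(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 using any k (or more) shares."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total

key = secrets.randbelow(PRIME)      # stand-in for the weight-encryption key
shares = split(key, n=10, k=4)      # hand one share to each of 10 key-holders
assert recover(shares[:4]) == key   # any 4 key-holders suffice (">3 keys")
assert recover(shares[2:6]) == key
# Destroying 7 or more shares leaves fewer than 4, making the key unrecoverable.
```

The appeal of the threshold is that compromising a few key-holders reveals nothing, while destroying enough shares makes the weights permanently inaccessible.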

Noah Birnbaum's Shortform
ryan_greenblatt · 5d

Yes, but you'd naively hope this wouldn't apply to shitty posts, just to mediocre posts. Like, maybe more people would read it, but if the post is actually bad, people would downvote it, etc.

Jesse Hoogland's Shortform
ryan_greenblatt · 6d

I'm pretty excited about building tools/methods for better understanding of dataset influence, so this intuitively seems quite promising! (I'm interested both in better cheap approximations of the effects of leaving some data out and of the effects of adding some data in.)

(I haven't looked at the exact method and results in this paper yet.)
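
For concreteness, one standard cheap approximation in this space is a first-order gradient dot-product (TracIn-style) estimate of how training on, or dropping, a single example changes the loss on a query example. A toy sketch; the linear model, data, and learning rate are all illustrative stand-ins, not the method from the paper:

```python
# First-order influence sketch: estimate how one SGD step on a training example
# would change the loss on a query example, via a gradient dot product.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)   # toy model
loss_fn = torch.nn.MSELoss()

def grad(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Flattened gradient of the loss on (x, y) w.r.t. the model parameters."""
    loss = loss_fn(model(x), y)
    g = torch.autograd.grad(loss, list(model.parameters()))
    return torch.cat([p.reshape(-1) for p in g])

x_train, y_train = torch.randn(8, 4), torch.randn(8, 1)   # candidate training points
x_query, y_query = torch.randn(1, 4), torch.randn(1, 1)   # example we care about

lr = 0.1
g_query = grad(x_query, y_query)
for i in range(len(x_train)):
    g_i = grad(x_train[i:i+1], y_train[i:i+1])
    # One SGD step on example i moves the params by -lr * g_i, so to first order
    # it changes the query loss by -lr * <g_i, g_query>. Negative = the example
    # helps the query; leaving it out (or never adding it) would lose that benefit.
    delta = -lr * torch.dot(g_i, g_query).item()
    print(f"train example {i}: first-order change in query loss = {delta:+.4f}")
```

A first-order score like this is cheap but ignores how removing an example changes the rest of the training trajectory; influence functions and related methods try to correct for that.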

AI #107: The Misplaced Hype Machine
ryan_greenblatt · 6d

The exact text is:

Dario: 6 months ago I I made this prediction that, you know, in in 6 months 90% of code would be written by by by AI models. Some people think that prediction is wrong, but within Anthropic and within a number of companies that we work with, that is absolutely true.

Marc: Um now 90 you know so you're saying that 90% of all code at Anthropic being written by the by the model today—

Dario: on on many teams you know not uniformly

I think on some teams at Anthropic 90% of code is written by AIs and on some teams it isn't, making the average lower than 90%. I say more here.
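
To see how team-level variation pulls the company-wide figure below 90%, with purely illustrative numbers (not actual Anthropic data): if teams producing 60% of the code are at 90% AI-written and the rest are at 50%, then

$$0.6 \times 0.9 + 0.4 \times 0.5 = 0.74,$$

i.e. only about 74% of all code is AI-written overall.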

faul_sname's Shortform
ryan_greenblatt · 9d

Yes, I just meant "misaligned AI takeover". Edited to clarify.

faul_sname's Shortform
ryan_greenblatt · 10d

E.g., Ryan Greenblatt thinks that spending 5% more resources than is myopically commercially expedient would be enough. AI 2027 also assumes something like this.

TBC, my view isn't that this is sufficient for avoiding takeover risk, it is that this suffices for "you [to] have a reasonable chance of avoiding AI takeover (maybe 50% chance of misaligned AI takeover?)".

(You seem to understand that this is my perspective and I think this is also mostly clear from the context in the box, but I wanted to clarify this given the footnote might be read in isolation or misinterpreted.)

Posts

ryan_greenblatt's Shortform · 15 karma · 2y · 320 comments
Is 90% of code at Anthropic being written by AIs? · 85 karma · 6d · 13 comments
Reducing risk from scheming by studying trained-in scheming behavior · 32 karma · 12d · 0 comments
Iterated Development and Study of Schemers (IDSS) · 41 karma · 18d · 1 comment
Plans A, B, C, and D for misalignment risk · 125 karma · 20d · 68 comments
Reasons to sell frontier lab equity to donate now rather than later · 234 karma · 1mo · 33 comments
Notes on fatalities from AI takeover · 56 karma · 23d · 61 comments
Focus transparency on risk reports, not safety cases · 47 karma · 1mo · 3 comments
Prospects for studying actual schemers · 40 karma · 1mo · 2 comments
AIs will greatly change engineering in AI companies well before AGI · 46 karma · 2mo · 9 comments
Trust me bro, just one more RL scale up, this one will be the real scale up with the good environments, the actually legit one, trust me bro · 154 karma · 2mo · 32 comments
Wikitag Contributions

Anthropic (org) · 10 months ago · (+17/-146)
Frontier AI Companies · a year ago
Frontier AI Companies · a year ago · (+119/-44)
Deceptive Alignment · 2 years ago · (+15/-10)
Deceptive Alignment · 2 years ago · (+53)
Vote Strength · 2 years ago · (+35)
Holden Karnofsky · 3 years ago · (+151/-7)
Squiggle Maximizer (formerly "Paperclip maximizer") · 3 years ago · (+316/-20)