Jeffrey Ladish

Comments

(The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser
Jeffrey Ladish · 7mo

Just donated 2k. Thanks for all you're doing, Lightcone Team!

Hooray for stepping out of the limelight
Jeffrey Ladish · 2y

+1 on this, and also I think Anthropic should get some credit for not hyping things like Claude when they definitely could have (and I think would have received some tangible benefit from doing so).

See: https://www.lesswrong.com/posts/xhKr5KtvdJRssMeJ3/anthropic-s-core-views-on-ai-safety?commentId=9xe2j2Edy6zuzHvP9, and also some discussion between me and Oli about whether this was good / what parts of it were good.
 

Donation offsets for ChatGPT Plus subscriptions
Jeffrey Ladish · 2y

@Daniel_Eth asked me why I chose 1:1 offsets. The answer is that I did not have a principled reason for doing so, and do not think there's anything special about 1:1 offsets except that they're a decent Schelling point. I think any offsets are better than no offsets here. I don't feel like BOTECs of harm caused as a way to calculate offsets are likely to be particularly useful here, but I'd be interested in arguments to this effect if people had them.

To determine alignment difficulty, we need to know the absolute difficulty of alignment generalization
Jeffrey Ladish · 2y

an agent will aim its capabilities towards its current goals including by reshaping itself and its context to make itself better-targeted at those goals, creating a virtuous cycle wherein increased capabilities lock in & robustify initial alignment, so long as that initial alignment was in a "basin of attraction", so to speak

Yeah, I think if you nail initial alignment and have a system that has developed the instrumental drive for goal-content integrity, you're in a really good position. That's what I mean by "getting alignment to generalize in a robust manner": getting your AI system to the point where it "really *wants* to help you help it stay aligned with you in a deep way".

I think a key question of inner alignment difficulty is to what extent there is a "basin of attraction", where Yudkowsky is arguing there's no easy basin to find, and you basically have to precariously balance on some hill on the value landscape. 

I wrote a little about my confusions about when goal-content integrity might develop here.

Linkpost: A Contra AI FOOM Reading List
Jeffrey Ladish · 2y

It seems nice to have these in one place but I'd love it if someone highlighted a top 10 or something.

Anthropic's Core Views on AI Safety
Jeffrey Ladish · 2y

Yeah, I agree with all of this, seems worth saying. Now to figure out the object level... 🤔

Thoughts on the OpenAI alignment plan: will AI research assistants be net-positive for AI existential risk?
Jeffrey Ladish · 2y

Yeah, that last quote is pretty worrying. If the alignment team doesn't have the political capital / support of leadership within the org to get people to stop particular projects or development pathways, I am even more pessimistic about OpenAI's trajectory. I hope that changes!

Anthropic's Core Views on AI Safety
Jeffrey Ladish · 2y

Yeah I think we should all be scared of the incentives here.

Anthropic's Core Views on AI Safety
Jeffrey Ladish · 2y

Yeah, I think it can be true both that OpenAI felt more pressure to release products faster due to perceived competition risk from Anthropic, and that Anthropic showed restraint in not trying to race them to get public demos or a product out. In terms of speeding up AI development: not building anything > building something and keeping it completely secret > building something that your competitors learn about > building something and generating public hype about it via demos > building something with hype and publicly releasing it to users & customers. I just want to make sure people are tracking the differences.

so that it's pretty unclear that not releasing actually had much of an effect on preventing racing

It seems like if OpenAI hadn't publicly released ChatGPT, that huge hype wave wouldn't have happened, at least for a while, since Anthropic was sitting on Claude rather than releasing it. I think it's legit to question whether any group scaling SOTA models is net positive, but I want to be clear about credit assignment, and the ChatGPT release was an action taken by OpenAI.

Anthropic's Core Views on AI Safety
Jeffrey Ladish · 2y

I both agree that the race dynamics are concerning (and would like to see Anthropic address them explicitly), and also think that Anthropic should get a fair bit of credit for not releasing Claude before ChatGPT, a thing they could have done and probably gained a lot of investment / hype over. I think Anthropic's "let's not contribute to AI hype" strategy is good in the same way that OpenAI's "let's generate massive hype" strategy is bad.

Like definitely I'm worried about the incentive to stay competitive, especially in the product space. But I think it's worth highlighting that Anthropic (and DeepMind and Google AI, fwiw) have not rushed to product when they could have. There's still the relevant question "is building SOTA systems net positive given this strategy", and it's not clear to me what the answer is, but I want to acknowledge that "building SOTA systems and generating hype / rushing to market" is the default for startups and "building SOTA systems and resisting the juicy incentive" is what Anthropic has done so far, and that's significant.

Posts

122 · Shutdown Resistance in Reasoning Models · 2d · 13 comments
46 · Bounty for Evidence on Some of Palisade Research's Beliefs · 9mo · 4 comments
43 · Take SCIFs, it's dangerous to go alone · Ω · 1y · 1 comment
23 · Palisade is hiring Research Engineers · 2y · 0 comments
117 · unRLHF - Efficiently undoing LLM safeguards · Ω · 2y · 15 comments
151 · LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B · Ω · 2y · 29 comments
85 · The Agency Overhang · 2y · 6 comments
53 · Donation offsets for ChatGPT Plus subscriptions · 2y · 3 comments
12 · To determine alignment difficulty, we need to know the absolute difficulty of alignment generalization · 2y · 3 comments
58 · Thoughts on the OpenAI alignment plan: will AI research assistants be net-positive for AI existential risk? · 2y · 3 comments
5 · landfish lab · Ω · 5y · 20 comments