Buck

CEO at Redwood Research.

AI safety is a highly collaborative field--almost all the points I make were either explained to me by someone else, or developed in conversation with other people. I'm saying this here because it would feel repetitive to say "these ideas were developed in collaboration with various people" in all my comments, but I want to have it on the record that the ideas I present were almost entirely not developed by me in isolation.

Please contact me via email (bshlegeris@gmail.com) instead of messaging me on LessWrong.

Posts (sorted by new)

Buck's Shortform (12 points, Ω, 6y, 174 comments)

Comments (sorted by newest)
TurnTrout's shortform feed
Buck · 1d

I wish Nate had optimized his post more for being clear about which of these things he was talking about!

Proposal for making credible commitments to AIs.
Buck · 1d

We made a similar proposal here:

A basic problem for making deals with AIs in practice is that AIs aren’t legal persons, which means that they can’t directly rely on the legal system to enforce contracts they’ve made with humans. One way around this is to enable representatives for the AI’s interests to make contracts on the AI’s behalf.

For example, we could set up a foundation with a board of trustees who are supposed to represent the interests of misaligned AIs that we make deals with, while not doing things that are illegal or actively endanger humanity. We try our best to use the legal system to bind the trustees to the commitment of representing misaligned AIs. Because the legal system isn’t set up for this, this might be pretty hard, so these efforts should be complemented with careful selection of trustees who genuinely care about their mission. The central benefit of having this foundation is that the AIs primarily have to trust the foundation rather than the AI lab to hold up their end of the deal. Another benefit is that the existence of this foundation can add credibility by being in the training data. Lastly, the foundation gives AIs and labs a concrete path to making a deal.

There are some questions here about the legal structure of the foundation, and the US might not be the ideal jurisdiction for it. It would be good for there to be legal recourse in case the trustees betray their mission of furthering the AI's interests.

TurnTrout's shortform feed
Buck · 2d

"I think it would be bad for every single post that Nate publishes on maybe-sorta-related subjects to turn into a platform for relitigating his past behavior"

I totally agree, but I think that the topic of the post was pretty related to the things people have complained about before, so it's more on topic than it would be on a random Nate post (e.g. it seems more relevant on that post than on most of Nate's blog posts).

TurnTrout's shortform feed
Buck · 2d

After talking to someone about this a little, I'm a bit more sympathetic to Soares's actions here; I think they were probably a mistake for pragmatic reasons, but I'm also sympathetic to arguments that LessWrong is better off if it's a space where people enforce whatever commenting policies they like, and I don't want to take part in a social punishment that I don't endorse. So I partially retract the first part of my comment. Idk, I might think more about this.

TurnTrout's shortform feed
Buck · 2d (edited)

I think this behavior of Nate's is dumb and annoying and I appreciate you calling it out.

FWIW I very much doubt this is Nate executing a careful strategy to suppress negative information about him (his action was probably counterproductive from that perspective); I think he just has standards according to which it's fine for him to do this when he thinks people are being dumb and annoying, and he thinks your comment is off-topic because he considers it irrelevant to the main thrust of his post.

A case for courage, when speaking of AI danger
Buck · 2d

Ok. I agree with many particular points here, and there are others that I think are wrong, and others where I'm unsure.

For what it's worth, I think SB-1047 would have been good for AI takeover risk on the merits, even though (as you note) it isn't close to all we'd want from AI regulation.

Nina Panickssery's Shortform
Buck · 2d

I find this position on ems bizarre. If the upload acts like a human brain, and the uploads also seem normalish after we interact with them a bunch, I feel totally fine with them.

I'm also more optimistic than you about creating AIs that have very different internals but that I think are good successors, though I don't have a strong opinion.

A case for courage, when speaking of AI danger
Buck · 3d

Ok; what do you think of Soares's claim that SB-1047 should have been made stronger and that its connections to existential risk should have been drawn more clearly? That claim seems probably false to me.

A case for courage, when speaking of AI danger
Buck · 3d

Ok. I don't think your original post is clear about which of these many different theses it's advancing, or which points it treats as evidence for other points, or how strongly you believe any of them.

I don't know how to understand your thesis other than "in politics you should always pitch people by saying how the issue looks to you, Overton window or personalized persuasion style be damned". I think the strong version of this claim is obviously false. Though maybe it's good advice for you (because it matches your personality profile) and perhaps it's good advice for many/most of the people we know.

I think that making SB-1047 more restrictive would have made it less likely to pass, because it would have made it easier to attack and fewer people would agree that it's a step in the right direction. I don't understand who you think would have flipped from negative to positive on the bill based on it being stronger—surely not the AI companies and VCs who lobbied against it and probably eventually persuaded Newsom to veto?

I feel like the core thing that we've seen in DC is that the Overton window has shifted, almost entirely as a result of AI capabilities getting better, and now people are both more receptive to some of these arguments and more willing to acknowledge their sympathy.

Roman Malov's Shortform
Buck · 3d

Buy something with it and destroy that.

More posts (sorted by new)

Comparing risk from internally-deployed AI to insider and outsider threats from humans (106 points, Ω, 8d, 6 comments)
Making deals with early schemers (107 points, Ω, 11d, 41 comments)
Misalignment and Strategic Underperformance: An Analysis of Sandbagging and Exploration Hacking (75 points, Ω, 2mo, 1 comment)
Handling schemers if shutdown is not an option (39 points, Ω, 2mo, 2 comments)
Ctrl-Z: Controlling AI Agents via Resampling (124 points, Ω, 3mo, 0 comments)
How to evaluate control measures for LLM agents? A trajectory from today to superintelligence (29 points, Ω, 3mo, 1 comment)
Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols? (34 points, Ω, 3mo, 0 comments)
Some articles in “International Security” that I enjoyed (130 points, 4mo, 10 comments)
A sketch of an AI control safety case (57 points, Ω, 5mo, 0 comments)
Ten people on the inside (139 points, Ω, 5mo, 28 comments)