LESSWRONG
LW

Buck
12511Ω3040434702
Message
Dialogue
Subscribe

CEO at Redwood Research.

AI safety is a highly collaborative field--almost all the points I make were either explained to me by someone else, or developed in conversation with other people. I'm saying this here because it would feel repetitive to say "these ideas were developed in collaboration with various people" in all my comments, but I want to have it on the record that the ideas I present were almost entirely not developed by me in isolation.

Please contact me via email (bshlegeris@gmail.com) instead of messaging me on LessWrong.

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by
Newest
12Buck's Shortform
Ω
6y
Ω
177
Lessons from the Iraq War for AI policy
Buck21h2-3

I think that the Iraq war seems unusual in that it was entirely proactive. Like, the war was not in response to a particular provocation, it was an entrepreneurial war aimed at preventing a future problem. In contrast, the wars in Korea, the Gulf, and (arguably) Vietnam were all responsive to active aggression.

Reply
Lessons from the Iraq War for AI policy
Buck21h20

My understanding is:

The admin claimed that the evidence in favor of WMD presence was much stronger than it actually was. This was partially because they were confused/groupthinky, and partially because they were aiming to persuade. I agree that it was reasonable to think Iraq had WMDs on priors.

Reply
Buck's Shortform
Buck2d3611

I think that I've historically underrated learning about historical events that happened in the last 30 years, compared to reading about more distant history.

For example, I recently spent time learning about the Bush presidency, and found learning about the Iraq war quite thought-provoking. I found it really easy to learn about things like the foreign policy differences among factions in the Bush admin, because e.g. I already knew the names of most of the actors and their stances are pretty intuitive/easy to understand. But I still found it interesting to understand the dynamics; my background knowledge wasn't good enough for me to feel like I'd basically heard this all before.

Reply2
ryan_greenblatt's Shortform
Buck2dΩ220

Ryan discusses this at more length in his 80K podcast.

Reply
What's worse, spies or schemers?
Buck2dΩ342

You can undeploy them, if you want!

One difficulty is again that the scheming is particularly correlated. Firing a single spy might not be traumatic for your organization's productivity, but ceasing all deployment of untrusted models plausibly grinds you to a halt.

And in terms of fixing them, note that it's pretty hard for fix spies! I think you're in a better position for fixing schemers than spies, e.g. see here.

Reply
A case for courage, when speaking of AI danger
Buck5d20

Yeah to reiterate, idk why you think 

broadcasting uncertainty about things like "25% vs 90+%" when they could instead be broadcasting confidence about "this is ridiculous and should stop"

My guess is that the main reason they broadcast uncertainty is because they're worried that their position is unpalatable, rather than because of their internal sense of uncertainty.

Reply
Call for suggestions - AI safety course
Buck7d20

I agree this stuff is important, mostly because I think that scheming is a big part of where catastrophic risk from AI comes from.

Reply
How much novel security-critical infrastructure do you need during the singularity?
Buck7dΩ440

I agree that fine-grained configuration mechanisms probably increase security, and this might net increase security, as I say in the post.

I agree that the increasing value of compute might increase the isolation you use for the reason you said. One reason I'm skeptical is that you can get almost all that value by having AIs voluntarily adopt mechanisms that generally isolate their jobs from the other jobs that are running (e.g. having naming conventions about who is allowed to read or write what) that get you the reliability benefits without getting any security.

Reply
ryan_greenblatt's Shortform
Buck8d31

Yep, that seems great!

Reply
ryan_greenblatt's Shortform
Buck9d*102

Plausibly, but their type of pressure was not at all what I think ended up being most helpful here!

Reply
Load More
No wikitag contributions to display.
135Lessons from the Iraq War for AI policy
2d
23
48What's worse, spies or schemers?
Ω
3d
Ω
2
56How much novel security-critical infrastructure do you need during the singularity?
Ω
8d
Ω
6
62There are two fundamentally different constraints on schemers
Ω
10d
Ω
0
131Comparing risk from internally-deployed AI to insider and outsider threats from humans
Ω
2d
Ω
18
111Making deals with early schemers
Ω
22d
Ω
41
75Misalignment and Strategic Underperformance: An Analysis of Sandbagging and Exploration Hacking
Ω
2mo
Ω
1
39Handling schemers if shutdown is not an option
Ω
3mo
Ω
2
124Ctrl-Z: Controlling AI Agents via Resampling
Ω
3mo
Ω
0
29How to evaluate control measures for LLM agents? A trajectory from today to superintelligence
Ω
3mo
Ω
1
Load More