Buck

CEO at Redwood Research.

AI safety is a highly collaborative field--almost all the points I make were either explained to me by someone else, or developed in conversation with other people. I'm saying this here because it would feel repetitive to say "these ideas were developed in collaboration with various people" in all my comments, but I want to have it on the record that the ideas I present were almost entirely not developed by me in isolation.

Please contact me via email (bshlegeris@gmail.com) instead of messaging me on LessWrong.

If we are ever arguing on LessWrong and you feel like it's kind of heated and would go better if we just talked about it verbally, please feel free to contact me and I'll probably be willing to call to discuss briefly.

Comments

davekasten's Shortform
Buck · 1d

I basically agree with Zach: based on public information, it seems like it would be really hard for them to be robust to this, and it seems implausible that they have justified confidence in such robustness.

I agree that he doesn't spell the argument out in much depth. Obviously, I think it'd be great if someone made the argument in more detail. I think Zach's point is a positive contribution even though it isn't that detailed.

Raemon's Shortform
Buck · 2d

The most common errors I see in comments that I think an LLM could fix are:

  • Typos.
  • Missing a point. For example, I often write a comment and fail to realize that someone nearby in the comment tree has already responded to that point, or I misunderstand the comment I'm responding to. It would be helpful to have an LLM note this.
  • Maybe basic fact-checking?

Maybe you should roll this out for comments before posts.  
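To be concrete about the kind of check I'm imagining, here's a minimal sketch. None of these names come from LessWrong's actual code; `call_llm` is a stand-in for whatever completion API you'd use.

```python
# Hypothetical sketch of a pre-submit LLM check for a draft comment.
# "call_llm" is a placeholder for whatever model API the site would use.
from typing import Callable, List


def review_draft_comment(
    draft: str,
    nearby_comments: List[str],
    call_llm: Callable[[str], str],
) -> str:
    """Ask an LLM to flag typos, points already answered nearby in the
    comment tree, and factual claims worth double-checking."""
    context = "\n\n---\n\n".join(nearby_comments)
    prompt = (
        "You are reviewing a draft forum comment before it is posted.\n"
        "1. List typos or grammatical errors.\n"
        "2. Say whether any point in the draft is already addressed by the "
        "nearby comments below, or whether the draft misreads its parent.\n"
        "3. Flag factual claims that look checkable and possibly wrong.\n\n"
        f"Nearby comments:\n{context}\n\nDraft:\n{draft}"
    )
    return call_llm(prompt)
```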

The Thinking Machines Tinker API is good news for AI control and security
Buck · 3d

Yeah for sure. A really nice thing about the Tinker API is that it doesn't allow users to specify arbitrary code to be executed on the machine with weights, which makes security much easier.
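To illustrate the shape of the interface I mean, here's a sketch of a similarly restricted surface. This is not the real Tinker client; the method names echo the primitives described in the Tinker announcement, but the signatures here are made up for illustration.

```python
# Illustrative sketch, not the actual Tinker API. The key property: every
# call carries data (tokens, hyperparameters, prompts), never user code, so
# the server holding the weights never executes anything a client sends.
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingDatum:
    tokens: List[int]
    loss_weights: List[float]


class RestrictedFinetuningClient:
    def forward_backward(self, batch: List[TrainingDatum]) -> float:
        """Run a forward/backward pass server-side; return the loss."""
        raise NotImplementedError

    def optim_step(self, learning_rate: float) -> None:
        """Apply one optimizer update using accumulated gradients."""
        raise NotImplementedError

    def sample(self, prompt_tokens: List[int], max_tokens: int) -> List[int]:
        """Sample a completion from the current fine-tuned weights."""
        raise NotImplementedError
```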

Buck's Shortform
Buck · 3d

I hear a lot of scorn for the rationalist style where you caveat every sentence with "I think" or the like. I want to defend that style. 

There is real semantic content to me saying "I think" in a sentence. I don't say it when I'm stating established fact. I only use it when I'm saying something which is fundamentally speculative. But most of my sentences are fundamentally speculative.

It's as if people were complaining that I use the future tense a lot. Like, sure, my text uses the future tense more than average, and the future tense is indeed somewhat more awkward. But the future tense is the established way to talk about the future, which is what I wanted to talk about. It seems pretty weird to switch to the present tense just because people don't like the future tense.

The Thinking Machines Tinker API is good news for AI control and security
Buck · 3d

Yeah, what I'm saying is that even if the computation performed in a hook is trivial, it sucks if that computation has to happen on a different computer than the one doing inference.

The Thinking Machines Tinker API is good news for AI control and security
Buck · 3d

Yeah, totally, there's a bunch of stuff like this you could do. The two main issues:

  • Adding methods like this might increase complexity, and if you add lots of them, they might interact in ways that allow users to violate your security properties.
  • Some natural things you'd want to do when interacting with activations (e.g. applying arbitrary functions to modify activations during a forward pass) would substantially reduce the efficiency and batchability here--the API server would have to block inference while waiting for the user's computer to compute the change to activations and send it back (see the sketch below).

It would be a slightly good exercise for someone to go through the most important techniques that interact with model internals and see how many of them would have these problems. 
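As a toy illustration of the second bullet (made-up numbers, nothing from any real API): if a user-supplied hook has to run on the user's machine, the server pays a network round trip per hooked layer while the whole batch sits blocked.

```python
# Toy model of the latency cost of client-side activation hooks.
# All numbers are assumptions chosen for illustration.
NUM_LAYERS = 48
LAYER_COMPUTE_SECONDS = 0.002   # assumed on-server compute per layer
ROUND_TRIP_SECONDS = 0.05       # assumed client<->server latency per hook call


def forward_pass_seconds(hooked_layers: int) -> float:
    """Rough wall-clock time for one forward pass when `hooked_layers`
    layers must ship activations to the client and wait for the edits."""
    return NUM_LAYERS * LAYER_COMPUTE_SECONDS + hooked_layers * ROUND_TRIP_SECONDS


print(forward_pass_seconds(0))           # server-only: ~0.1s
print(forward_pass_seconds(NUM_LAYERS))  # hook every layer: ~2.5s, during
                                         # which the batch is blocked
```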

Tomás B.'s Shortform
Buck · 3d

(For clarity: Open Phil funded those guys in the sense of funding Epoch, where they previously worked and where they probably developed a lot of useful context and connections, but AFAIK hasn't funded Mechanize.)

Generalization and the Multiple Stage Fallacy?
Buck · 4d

I liked Joe Carlsmith's discussion of this here.

We won’t get AIs smart enough to solve alignment but too dumb to rebel
Buck · 6d

The argument in this post seems to be:

AIs smart enough to help with alignment are capable enough that they'll realize they are misaligned. Therefore, they will not help with alignment.

When I think about getting misaligned AIs to help with alignment research and other tasks, I'm normally not imagining that the AIs are unaware that they are misaligned. I'm imagining that we can get them to do useful work anyway. See here and here.

You might be interested in the Redwood Research reading list, which contains lots of analyses of these questions and many others.

Reasons to sell frontier lab equity to donate now rather than later
Buck · 9d

As someone who’s worked at MIRI, I disagree regardless of when you are imagining them doing this. 

Conditional on agreeing with them about AI x-risk stuff and also about high-level strategy, I think giving them money now seems better than it would have been in the past.

Posts

  • The Thinking Machines Tinker API is good news for AI control and security (91 karma, Ω, 3d, 10 comments)
  • Christian homeschoolers in the year 3000 (191 karma, 16d, 64 comments)
  • I enjoyed most of IABIED (207 karma, 26d, 46 comments)
  • An epistemic advantage of working as a moderate (217 karma, 2mo, 96 comments)
  • Four places where you can put LLM monitoring (48 karma, Ω, 2mo, 0 comments)
  • Research Areas in AI Control (The Alignment Project by UK AISI) (25 karma, Ω, 2mo, 0 comments)
  • Why it's hard to make settings for high-stakes control research (49 karma, Ω, 3mo, 6 comments)
  • Recent Redwood Research project proposals (91 karma, Ω, 3mo, 0 comments)
  • Lessons from the Iraq War for AI policy (190 karma, 3mo, 25 comments)
  • What's worse, spies or schemers? (51 karma, Ω, 3mo, 2 comments)