CEO at Redwood Research.
AI safety is a highly collaborative field: almost all the points I make were either explained to me by someone else, or developed in conversation with other people. I'm saying this here because it would feel repetitive to say "these ideas were developed in collaboration with various people" in all my comments, but I want to have it on the record that the ideas I present were almost entirely not developed by me in isolation.
Please contact me via email (bshlegeris@gmail.com) instead of messaging me on LessWrong.
If we are ever arguing on LessWrong and you feel like it's kind of heated and would go better if we just talked about it verbally, please feel free to contact me and I'll probably be willing to call to discuss briefly.
I hear a lot of scorn for the rationalist style where you caveat every sentence with "I think" or the like. I want to defend that style.
There is real semantic content to me saying "I think" in a sentence. I don't say it when I'm stating established fact. I only use it when I'm saying something which is fundamentally speculative. But most of my sentences are fundamentally speculative.
It feels like people were complaining that I use the future tense a lot. Like, sure, my text uses the future tense more than average, and future tense is indeed somewhat more awkward. But future tense is the established way to talk about the future, which is what I wanted to talk about. It seems pretty weird to switch to present tense just because people don't like future tense.
Yeah, what I'm saying is that even if the computation performed in a hook is trivial, it sucks if that computation has to happen on a different computer than the one doing inference.
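To make the point concrete, here's a minimal PyTorch sketch (my illustration, not something from the original discussion): the hook itself does almost no work, but if its body had to run on a different machine than the one serving the model, every forward pass would block on a network round trip.

```python
import torch
import torch.nn as nn

# Toy model standing in for whatever is doing inference.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))

def probe_hook(module, inputs, output):
    # The computation is trivial: a single scalar summary of the activation.
    summary = output.float().mean()
    # If this summary had to be computed (or consumed) on another machine,
    # this line would be a blocking network call instead of a local op,
    # and every forward pass would stall on that round trip.
    _ = summary.item()  # .item() already forces a device sync even locally
    return None  # leave the module's output unchanged

handle = model[0].register_forward_hook(probe_hook)
with torch.no_grad():
    model(torch.randn(4, 512))
handle.remove()
```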
Yeah totally there's a bunch of stuff like this you could do. The two main issues:
It would be a slightly good exercise for someone to go through the most important techniques that interact with model internals and see how many of them would have these problems.
(For clarity: Open Phil funded those guys in the sense of funding Epoch, where they previously worked and where they probably developed a lot of useful context and connections, but AFAIK hasn't funded Mechanize.)
The argument in this post seems to be:
AIs smart enough to help with alignment are capable enough that they'll realize they are misaligned. Therefore, they will not help with alignment.
When I think about getting misaligned AIs to help with alignment research and other tasks, I'm normally not imagining that the AIs are unaware that they are misaligned. I'm imagining that we can get them to do useful work anyway. See here and here.
You might be interested in the Redwood Research reading list, which contains lots of analyses of these questions and many others.
As someone who’s worked at MIRI, I disagree regardless of when you are imagining them doing this.
Conditional on agreeing with them about AI x-risk stuff and also about high-level strategy, I think giving them money now seems better than it did in the past.
Strong upvoted to signal boost, but again note I don't know what I'm talking about.
Note that most of the compute in consumer laptops is in their GPUs, not their CPUs, so comparing H100 FLOPS to laptop CPU FLOPS does not work for establishing the extent to which your policy would affect consumer laptops.
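As a rough illustration (the ballpark figures below are my own assumptions, not numbers from the comment): an H100 looks a few thousand times faster than a laptop CPU, but only on the order of a hundred times faster than a decent laptop GPU, so which laptop component you compare against changes the conclusion by more than an order of magnitude.

```python
# Order-of-magnitude peak throughput estimates (FLOP/s); rough assumed
# figures for illustration only, and the precisions differ across rows.
h100_bf16 = 1e15        # ~1,000 TFLOPS dense BF16 for an H100
laptop_cpu_fp32 = 3e11  # ~0.3 TFLOPS for a typical laptop CPU
laptop_gpu_fp16 = 1e13  # ~10 TFLOPS for a mid-range laptop GPU

print(f"H100 / laptop CPU: ~{h100_bf16 / laptop_cpu_fp32:,.0f}x")
print(f"H100 / laptop GPU: ~{h100_bf16 / laptop_gpu_fp16:,.0f}x")
```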
Yeah for sure. A really nice thing about the Tinker API is that it doesn't allow users to specify arbitrary code to be executed on the machine with weights, which makes security much easier.
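Here's a hypothetical sketch of the design point (the names are made up for illustration, not the actual Tinker API): an interface that only exposes a small fixed set of primitives is much easier to secure than one that accepts user callbacks to run next to the weights.

```python
from dataclasses import dataclass
from typing import Callable

# Hard-to-secure shape: the user ships a callback that runs on the server
# holding the weights, so the provider must sandbox arbitrary code.
def risky_finetune(weights, batch, user_hook: Callable):
    user_hook(weights, batch)  # arbitrary user code next to the weights

# Easier-to-secure shape (the rough idea behind a Tinker-style API, with
# hypothetical names): the user can only invoke a small fixed set of
# operations, passing plain data; no user code ever runs on the weights
# machine.
@dataclass
class TrainRequest:
    op: str        # e.g. "forward_backward", "optim_step", "sample"
    payload: dict  # plain data only: token ids, hyperparameters, etc.

ALLOWED_OPS = {"forward_backward", "optim_step", "sample", "save_state"}

def handle_request(req: TrainRequest):
    if req.op not in ALLOWED_OPS:
        raise ValueError(f"unsupported op: {req.op}")
    # Dispatch to trusted, provider-written implementations of each op.
    ...
```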