Buck

CEO at Redwood Research.

AI safety is a highly collaborative field--almost all the points I make were either explained to me by someone else, or developed in conversation with other people. I'm saying this here because it would feel repetitive to say "these ideas were developed in collaboration with various people" in all my comments, but I want to have it on the record that the ideas I present were almost entirely not developed by me in isolation.

Please contact me via email (bshlegeris@gmail.com) instead of messaging me on LessWrong.

If we are ever arguing on LessWrong and you feel like it's kind of heated and would go better if we just talked about it verbally, please feel free to contact me and I'll probably be willing to call to discuss briefly.

Comments

12 · Buck's Shortform · 6y · 302 comments

GradientDissenter's Shortform
Buck · 1h

For example, loosening voter ID laws.

My understanding is that voter ID laws are probably net helpful for Democrats at this point. 

Mo Putera's Shortform
Buck · 1d

Notably, I think I disagree with Eliezer on what his moat is! I think he thinks that he's much better at coming to correct conclusions or making substantial intellectual progress than I think he is.

Mo Putera's Shortform
Buck · 2d

This doesn't feel that surprising to me. I guess my model is that different skills are correlated, and then if you pick someone who's extremely capable at a couple of skills, it's not that surprising if no one Pareto dominates them.

I agree that my point isn't really responding to whether it's surprising that there's no one who Pareto dominates him. 

Mo Putera's Shortform
Buck · 2d

(Hopefully it's not rude to state my personal impression of Eliezer as a thinker. I think he's enough of a public figure that it's acceptable for me to comment on it. I'd like to note that I have benefited in many important ways from Eliezer's writing and ideas, and I've generally enjoyed interacting with him in person, and I'm sad that as a result of some of our disagreements our interactions are tense.)

Yeah, I agree that there's no one who Pareto dominates Eliezer at his top four most exceptional traits. (Which I guess I'd say are: taking important weird ideas seriously, writing compelling/moving/insightful fiction (for a certain audience), writing compelling/evocative/inspiring stuff about how humans should relate to rationality (for a certain audience), being broadly knowledgeable and having clever insights about many different fields.)

(I don't think that he's particularly good at thinking about AI; at the very least he is nowhere near as exceptional as he is at those other things.)

I'm not trying to disagree with you. I'm just going to ruminate unstructuredly a little on this:

I know a reasonable number of exceptional people. I am involved in a bunch of conversations about what fairly special people should do. In my experience, when you're considering two people who might try to achieve a particular goal, it's usually the case that each has some big advantages over the other in terms of personal capabilities. So, they naturally try to approach it fairly differently. We can think about this in the case where you are hiring CEOs for a project or speculating about what will happen when companies headed by different CEOs compete. 

For example, consider the differences between Sam Altman and Dario Amodei (I don't know either that well, nor do I understand the internal workings of OpenAI/Anthropic, so I'm sort of speculating here):

  • Dario, unlike Sam, is a good ML researcher. This means that Sam needs to depend more on technical judgment from other people.
  • Sam had way more connections in Silicon Valley tech, at least when Anthropic was founded.
  • Dario has lots of connections to the EA community and was able to hire a bunch of EAs.
  • Sam is much more suave in a certain way than Dario is; each style plays better with different audiences.

Both of them have done pretty well for themselves in similar roles.

As a CEO, I find it pretty interesting how non-interchangeable most people are. And it's interesting how, in a lot of cases, it's possible to compensate for one weakness with a strength that seems almost unrelated.


If Eliezer had never been around, my guess is that the situation around AI safety would be somewhat but not incredibly different (though probably overall substantially worse):

  • Nick Bostrom and Carl Shulman and friends were talking about all this stuff.
  • Shulman and Holden Karnofsky would have met and talked about AI risk.
  • I'm pretty sure Paul Christiano would have run across all this and started thinking about it, though perhaps more slowly? He might have tried harder to write for a public audience or get other people to if Less Wrong didn't already exist.
  • The early effective altruists would have run across these ideas and been persuaded by them, though somewhat more slowly?
  • I'm not sure whether more or less EA community building would have happened 2016-2020. It would have been less obvious that community building efforts could work in principle, but less of the low-hanging fruit would have been plucked.
  • EA idea-spreading work would have been more centered around the kinds of ideas that non-Eliezer people are drawn to.
  • My guess is that the quality of ideas in the AI safety space would probably be better at this point?

Maybe a relevant underlying belief of mine is that Eliezer is very good at coming up with terms for things and articulating why something is important, and he also had the important strength of realizing how important AI was before many other people did. But I don't think his thinking about AI is actually very good on the merits. Most of the ideas he's spread were originally substantially proposed by other people; his contribution was IMO mostly his reframings and popularizations. And I don't think his most original ideas actually look that good. (See here for an AI summary.)

Mo Putera's Shortform
Buck · 4d

I think Eliezer underestimates other people because he evaluates them substantially based on how much they agree with him, and, as a consequence of him having a variety of dumb takes, smart people usually disagree with him about a bunch of stuff.

Buck's Shortform
Buck · 8d

I'd be really interested in someone trying to answer the question: what updates on the a priori arguments about AI goal structures should we make as a result of empirical evidence that we've seen? I'd love to see a thoughtful and comprehensive discussion of this topic from someone who is both familiar with the conceptual arguments about scheming and also relevant AI safety literature (and maybe AI literature more broadly).

Maybe a good structure would be, starting from the a priori arguments, identifying core uncertainties like "How strong is the imitative prior?", "How strong is the speed prior?", and "To what extent do AIs tend to generalize versus learn narrow heuristics?", and then tackling each. (Of course, that would only make sense if the empirical updates actually factor nicely into that structure.)

I feel like I understand this very poorly right now. I currently think the only important update that empirical evidence has given me, compared to the arguments in 2020, is that the human-imitation prior is more powerful than I expected. (Though of course it's unclear whether this will continue, and basic points like the expected increasing importance of RL suggest that it will be less powerful over time.) But to my detriment, I don't actually read the AI safety literature very comprehensively, and I might be missing empirical evidence that really should update me.

eggsyntax's Shortform
Buck · 14d

That's correct. Ryan summarized the story as:

Here’s the story of this paper. I work at Redwood Research (@redwood_ai) and this paper is a collaboration with Anthropic. I started work on this project around 8 months ago (it's been a long journey...) and was basically the only contributor to the project for around 2 months.

By this point, I had much of the prompting results and an extremely jank prototype of the RL results (where I fine-tuned llama to imitate Opus and then did RL). From here, it was clear that being able to train Claude 3 Opus could allow for some pretty interesting experiments.

After showing @EvanHub and others my results, Anthropic graciously agreed to provide me with employee-level model access for the project. We decided to turn this into a bigger collaboration with the alignment-stress testing team (led by Evan), to do a more thorough job.

This collaboration yielded the synthetic document fine-tuning and RL results and substantially improved the writing of the paper. I think this work is an interesting example of an AI company boosting safety research by collaborating and providing model access.

So Anthropic was indeed very accommodating here; they gave Ryan an unprecedented level of access for this work, and we're grateful for that. (And obviously, individual Anthropic researchers contributed a lot to the paper, as described in its author contribution statement. And their promotion of the paper was also very helpful!)

My objection is just that this paragraph of yours is fairly confused:

We don’t want to shoot the messenger — they went looking. They didn’t have to do that. They told us the results, and they didn’t have to do that. Anthropic finding these results is Anthropic being good citizens. And you want to be more critical of the A.I. companies that didn’t go looking.

This paper wasn't a consequence of Anthropic going looking, it was a consequence of Ryan going looking. If Anthropic hadn't wanted to cooperate, then Ryan would have just published his results without Anthropic's help, which would have been a moderately worse paper that would have probably gotten substantially less attention, but Anthropic didn't have the opportunity to not publish (a crappier version of) the core results.

Just to be clear, I don't think this is that big a deal. It's a bummer that Redwood doesn't get as much credit for this paper as we deserve, but this is pretty unavoidable given how much more famous Anthropic is; my sense is that it's worth the effort for safety people to connect the paper to Redwood/Ryan when discussing it, but it's no big deal. I normally don't bother to object to that credit misallocation. But again, the story of the paper conflicted with these sentences you said, which is why I bothered bringing it up.

Considerations around career costs of political donations
Buck · 15d

Re your last paragraph: as the post notes, it is illegal to discriminate based on political donations when hiring for civil service roles.

EDIT: Readers of this thread should bear in mind that Max H is not Max Harms! I was confused about this.

eggsyntax's Shortform
Buck · 16d

Alignment faking, and the alignment faking research was done at Anthropic.

And we want to give credit to Anthropic for this. We don’t want to shoot the messenger — they went looking. They didn’t have to do that. They told us the results, and they didn’t have to do that. Anthropic finding these results is Anthropic being good citizens. And you want to be more critical of the A.I. companies that didn’t go looking.

It would be great if Eliezer knew (or noted, if he knows but is just phrasing it really weirdly) that the research for the alignment faking paper was initially done at Redwood by Redwood staff; I'm normally not prickly about this, but it seems directly relevant to what Eliezer said here.

faul_sname's Shortform
Buck · 17d

It really depends on what you mean by "most of the time when people say this". I don't think my experience matches yours.

Posts

34 · Rogue internal deployments via external APIs · 20d · 4 comments
91 · The Thinking Machines Tinker API is good news for AI control and security · 1mo · 10 comments
193 · Christian homeschoolers in the year 3000 · 1mo · 64 comments
208 · I enjoyed most of IABIED · 2mo · 46 comments
215 · An epistemic advantage of working as a moderate · 2mo · 96 comments
48 · Four places where you can put LLM monitoring · 3mo · 0 comments
25 · Research Areas in AI Control (The Alignment Project by UK AISI) · 3mo · 0 comments
49 · Why it's hard to make settings for high-stakes control research · 4mo · 6 comments
91 · Recent Redwood Research project proposals · 4mo · 0 comments
190 · Lessons from the Iraq War for AI policy · 4mo · 25 comments