Jeremy Gillen

I'm interested in doing in-depth dialogues to find cruxes. Message me if you are interested in doing this.

I do alignment research, mostly stuff that is vaguely agent foundations. Currently doing independent research on ontology identification; formerly on Vivek's team at MIRI.

Comments

AI safety undervalues founders
Jeremy Gillen · 11h · 84

(Ryan is correct about what I'm referring to, and I don't know any details).

I want to say publicly, since my comment above is a bit cruel in singling out MATS specifically: I think MATS is the most impressively well-run organisation that I've encountered, and overall supports good research. Ryan has engaged at length with my criticisms (both now and when I've raised them before), as have others on the MATS team, and I appreciate this a lot.

Ultimately most of our disagreements are about things that I think a majority of "the alignment field" is getting wrong. I think most people don't consider it Ryan's responsibility to do better at research prioritization than the field as a whole. But I do. It's easy to shirk responsibility by deferring to committees, so I don't consider that a good excuse. 

A good excuse is defending the object-level research prioritization decisions, which Ryan and other MATS employees happily do. I appreciate them for this, and we agree to disagree for now.

Tying back to the OP, I maintain that multiplier effects are often overrated because of people "slipping off the real problem", and this is a particularly large problem with founders of new orgs.

AI safety undervalues founders
Jeremy Gillen · 1d* · 3536

I want to register disagreement. Multiplier effects are difficult to get and easy to overestimate. It's very difficult to get other people working on the right problem, rather than slipping off and working on an easier but ultimately useless problem. From my perspective, it looks like MATS fell into this exact trap. MATS has kicked out ~all the mentors who were focused on real problems (in technical alignment) and has a large stack of new mentors working on useless but easy problems.

[Edit 5hrs later: I think this has too much karma because it's political and aggressive. It's a very low-effort criticism without an argument.]

Resampling Conserves Redundancy (Approximately)
Jeremy Gillen · 7d · 40

By the way, there seems to be an issue where sympy silently drops precision under some circumstances. Definitely a bug. A couple of times it's caused non-trivial errors in my KL computations. It's pretty rare, but I don't know of any way to completely avoid it. Thinking of switching to a different library.
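For concreteness, here is a minimal sketch of the kind of KL computation involved (not the original code; the distributions and function name are made up), using exact sympy Rationals and an explicit evaluation precision so that silent precision loss is easier to spot:

```python
# Minimal illustrative sketch, not the original analysis: KL(p || q) for two
# discrete distributions, built from exact Rationals and evaluated at an
# explicit precision so any silent precision loss is easier to notice.
import sympy as sp

def kl_divergence(p, q, prec=50):
    """KL(p || q) for discrete distributions given as lists of sympy Rationals."""
    expr = sum(pi * sp.log(pi / qi) for pi, qi in zip(p, q) if pi != 0)
    return sp.N(expr, prec)  # evaluate to `prec` significant digits

p = [sp.Rational(1, 2), sp.Rational(1, 4), sp.Rational(1, 4)]
q = [sp.Rational(1, 3), sp.Rational(1, 3), sp.Rational(1, 3)]
print(kl_divergence(p, q))  # ≈ 0.0589
```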

Some data from LeelaPieceOdds
Jeremy Gillen · 12d · 100

Relevant comment on reddit from someone working on Leela Odds:

Daniel Tan's Shortform
Jeremy Gillen · 12d · 94

Why would models start out aligned by default? 

Some data from LeelaPieceOdds
Jeremy Gillen · 15d* · 140

This is the best I've got so far. I estimated each rating using the midpoint (the 50% score point) of a logistic regression fit to the games. The first few especially seem to have been inflated due to not having enough high-rated players in the data, so it had to extrapolate. And they all seem inflated by (I'd guess) a couple of hundred points due to the effects I mentioned in the post. (Edit: Please don't share the graph alone without this context.)

The NN rating in the Blitz data highlights the flaw in this method of estimating the rating.

I haven't found a way to get similar data on human vs human games.
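To make the method concrete, here is a rough sketch of how such a midpoint estimate could be computed; the example data, the binning into win rates, and the curve_fit approach are my assumptions for illustration, not the analysis from the post:

```python
# Rough illustrative sketch, assuming data binned into average scores per
# rating bucket: fit a logistic curve of score vs. human rating and report
# the rating at which the expected score is 50%, i.e. the "midpoint".
import numpy as np
from scipy.optimize import curve_fit

def logistic(rating, mid, scale):
    # Expected human score as a function of rating; `mid` is the 50% point.
    return 1.0 / (1.0 + np.exp(-(rating - mid) / scale))

# Hypothetical example data: average human score at each rating bucket.
ratings = np.array([1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600], dtype=float)
scores = np.array([0.05, 0.10, 0.20, 0.40, 0.60, 0.80, 0.90, 0.95])

(mid, scale), _ = curve_fit(logistic, ratings, scores, p0=[1800.0, 200.0])
print(f"Estimated rating at 50% score: {mid:.0f}")
```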

Some data from LeelaPieceOdds
Jeremy Gillen · 16d · 20

Took a while to download all this. I'm curious: what's your blitz rating?

shortplav
Jeremy Gillen · 17d · 20

“Does that sound right?”

Can't give a confident yes, because I'm pretty confused about this topic and currently pretty unhappy with the way the leverage prior mixes up action and epistemics. The issue about discounting theories of physics if they imply high leverage seems really bad? I don't understand whether the UDASSA thing fixes this. But yes.

“That avoids the "how do we encode numbers" question that naturally raises itself.”

I'm not sure how natural the encoding question is; there's probably an AIT answer to this kind of question that I don't know.

Some data from LeelaPieceOdds
Jeremy Gillen · 17d · 40

By "control plausibly works" I didn't mean "stuff like existing monitoring will work to control AIs forever". I meant it works if it's a stepping stone that allows us to accelerate/finish alignment research, and thereby build aligned AGI.

Some data from LeelaPieceOdds
Jeremy Gillen · 17d · 20

I think several of the subquestions that matter for whether it'll plausibly work to have AI solve alignment for us are in the second category, like the two points I mentioned in the post. I think there are other subquestions that are more in the first category, which are also relevant to the odds of success. I have relatively low confidence about this kind of stuff, for all the normal reasons it's difficult to say how other people should be thinking: it's easy to miss relevant priors, evidence, etc. But still... given what I know about what everyone believes, it looks like these questions should be resolvable among reasonable people.

Wikitag Contributions

Eurisko · 7 months ago · (+7/-6)
Posts

36 · AI Corrigibility Debate: Max Harms vs. Jeremy Gillen · 4d · 1 comment
66 · Some data from LeelaPieceOdds · 19d · 21 comments
70 · Detect Goodhart and shut down · 10mo · 21 comments
31 · Context-dependent consequentialism (Ω) · 1y · 6 comments
161 · Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI (Ω) · 2y · 60 comments
175 · Thomas Kwa's MIRI research experience · 2y · 53 comments
38 · AISC team report: Soft-optimization, Bayes and Goodhart · 2y · 2 comments
119 · Soft optimization makes the value target bigger (Ω) · 3y · 20 comments
6 · Jeremy Gillen's Shortform · 3y · 57 comments
76 · Neural Tangent Kernel Distillation · 3y · 20 comments