Evaluations (of new AI Safety researchers) can be noisy

I agree with a lot of this post.

Relatedly: in my experience, junior people wildly overestimate the extent to which senior people form confident and sticky negative evaluations of them. I basically never form a confident negative impression of someone's competence from a single interaction, and I place substantial probability on people changing a lot over the course of a year or two.

I think that many people perform very differently in different job situations. When someone performs poorly in a job, I usually only update mildly against them performing well in a different role.

Ansh Radhakrishnan · 3y

Thanks for this post, Lawrence! I agree with it substantially, perhaps entirely.

One other thing that I think interacts with the difficulty of evaluation is that many AI safety researchers consider most of the work done by some other researchers to be approximately useless, or even net-negative, in terms of reducing existential risk. It's pretty easy to wrap an evaluation of a research direction or agenda together with an evaluation of a particular researcher. I think this is fairly justified for more senior researchers, since an important skill is presumably "research taste". But it's also important to acknowledge that this is pretty subjective, and that there's substantial disagreement among senior safety researchers about the utility of different research directions. When evaluating junior researchers, it seems good to disentangle the two as much as possible and instead focus on "core competencies" that are likely to be valuable across a wide range of safety research directions. Even then, evaluating these can be difficult and noisy, as the OP argues.

Neel Nanda · 3y

I appreciate this post, and vibe with it a lot!

Different jobs require different skills.

Very strongly agreed. I did three different AI Safety internships in different areas, and I think I was fairly mediocre in each, before I found that mech interp was a good fit.

Also strongly agreed on the self-evaluation point. I'm still not sure I really internally believe that I'm good at mech interp, despite having pretty solid confirmation from my research output at this point, and I can't imagine having had that confidence before completing my first real project!

Jozdien · 3y

I think this post is valuable; thank you for writing it. I especially liked the parts where you (and Beth) talk about historical negative signals. For a certain kind of person, I think those can serve better than anything else as grounding to push back against unjustified updating.

A factor that I think carries more weight in alignment than in other domains is the prevalence of low-bandwidth communication channels: many new researchers' sole interface with the field is online and asynchronous, via text or infrequent calls. The effects of updating too hard on negative evals are probably amplified a lot when those evals make up the bulk of the feedback you get at all. At times, this has felt to me like True Bayesian Updating from the inside, even while acknowledging the noisiness of those channels, because there's little counterweight to it.

My experience here probably isn't very standard, given that most of the people I've mentored coming into this field aren't located near the Bay Area, London, or anywhere else with other alignment researchers. But having their sole interface with the rest of the field be a sparse, opaque piece of text has definitely discouraged some of them far more than anything else.

LawrenceC · 1y · Review for 2023 Review

I think this post made an important point that's still relevant to this day.

If anything, this post is more relevant in late 2024 than it was in early 2023: the pace of AI progress makes ever more people want to get involved, while more and more mentors have moved toward doing object-level work. Because capacity for evaluating new AIS researchers has shrunk relative to demand, there's more reliance on systems or heuristics to evaluate people now than in early 2023.

Also, I find it amusing that without the parenthetical, the title of the post makes another important point: "evals are noisy".

Ben Pace · 3y

As part of my work at Lightcone, I manage an office space that people can apply to visit or become members of, and many of these points commonly apply to the rejection emails I send, especially "Most applications just don’t contain that much information" and "Not all relevant skills show up on paper".

I try to include points similar to the post's in the rejection emails we send. In case it's of interest, or you have any thoughts, here's the standard paragraph that I include:

Our application process is fairly lightweight and so I don't think a no is a strong judgment about a person's work. If you end up in the future working on new projects that you think are a good fit for Lightcone Offices, you're welcome to apply again. Also if you're ever collaborating on a project with a member of the Lightcone Offices, you can visit with them to work together. Good luck in finding traction on improving the trajectory of human civilization.

DragonGod · 3y

At what point do you consider yourself a researcher and not just a noob, or someone who wants to one day become a researcher?

[This is actually a very important question for my self-narrative: for how I relate to my AI safety writing, for what standards I expect of myself (is my AI safety writing currently a hobby that I hope to later turn into a job, or should I treat it as a volunteer job?), etc. I don't really have an answer, but I had mostly been thinking of myself as "someone who wants to one day become an AI safety researcher". 2022 shortened my timelines (suddenly I no longer had a decade to learn all the maths and CS before making useful contributions to alignment theory), so I brought "one day" sooner, but I'm still at best "aspiring" to be one.

Learning that an actual researcher™ I respected was younger than me was a massive slap in the face/wake-up call. (We discovered LW at the same age and stage in our lives, so in a sense I'm left asking "what was I doing with my life all this time?" and feel like I've fallen behind [yeah, I am status brainkilled].)]

Orpheus16 · 3y

Great post. I expect to recommend it at least 10 times this year.

Semi-related point: I often hear people get discouraged when they don't have "good ideas" or "ideas that they believe in" or "ideas that they are confident would actually reduce x-risk." (These are often people who see the technical alignment problem as Hard or Very Hard).

I'll sometimes ask: "How many other research agendas do you think meet your bar for 'an idea you believe in' or 'an idea that you are confident would actually reduce x-risk'?" Often, when considering the entire field of technical alignment, their answer is <5 or <10.

While reality doesn't grade on a curve, I think it has sometimes been helpful for people to reframe "I have no good ideas" --> "I believe the problem we are facing is Hard or Very Hard. Among the hundreds of researchers who are thinking about this, I think only a few of them have met the bar that I sometimes apply to myself & my ideas."

(This is especially useful when people evaluate themselves by a harsher bar than they apply to others, which I think is common.)

Review Bot · 2y

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?

sudo · 3y

I’m a fan of this post, and I’m very glad you wrote it.

Dennis Akar · 3y

I have been feeling extremely like an impostor lately and agree about the tendency toward critical self-evaluation. For the last month or so I felt entirely stuck, with even the idea of an application giving me severe anxiety. I've been overcoming this slightly lately, and I think this post and the conversations it sparked have made me feel better. Thank you.

[Footnote]

I think this probably also applies in general, but I’m much less sure than in the case of AI research. As always, the law of equal and opposite advice applies. It’s okay to take it easy, and to do what you need to do to recover. I also don’t think that everyone should aim to be an AI safety researcher – my focus is on this field because it’s what I’m most familiar with. If you’ve found something else you’re good at, you probably should keep doing it.

[Footnote]

I also think there’s a separate problem, where people take positive evaluations of their peers way too seriously. E.g. people's attitudes seem to change noticeably if you mention you’ve worked with a high-status person at some point in your life. I claim that this is also very bad, but it’s not the focus of the post.

[Footnote]

This also happens to a comical extent with papers at conferences. E.g. Neel Nanda's grokking work was rejected twice from arXiv (!), but an updated version got a spotlight at ICLR. Redwood's adversarial training paper got a 3, a 5, and a 9 in its initial reviews. In fact, I know of several papers that got orals at a conference after being rejected outright from the previous one.

[Footnote]

I also feel like this is exacerbated by several social dynamics in the Bay Area, which I might eventually write a post about.

[Footnote]

If there’s significant interest or if I feel like people are taking this advice too far, I’ll write a followup post giving the opposite advice.


Introduction: evaluating skill is hard, and most evaluations are done via proxies

My personal experience

Why exactly are common evaluations so noisy?

Bootcamp/Funding/Job Applications

First impressions at parties/conferences/workshops

Job Performance

Yes, this includes your evaluations as well.

On anxious underconfidence and self-handicapping

What does this mean you should do?

Acknowledgments

Appendix: testimonials from other researchers

Addendum from Beth Barnes

Addendum from Scott Emmons

Addendum from anonymous senior AGI safety researcher