Akash

Sequences

Leveling Up: advice & resources for junior alignment researchers

Comments

Thanks for this detailed response; I found it quite helpful. I maintain my "yeah, they should probably get as much funding as they want" stance. I'm especially glad to see that Lightcone might be interested in helping people stay sane/grounded as many people charge into the policy space. 

I ended up deciding to instead publish a short post, expecting that people will write a lot of questions in the comments, and then to engage straightforwardly and transparently there, which felt like a way that was more likely to end up with shared understanding.

This seems quite reasonable to me. I think it might've been useful to include something short in the original post that made this clear. I know you said "also feel free to ask any questions in the comments"; in an ideal world, that would probably be enough, but I'm guessing it isn't, given power/status dynamics.

For example, if ARC Evals released a post like this, I expect many people would experience friction that prevented them from asking (or even generating) questions that might (a) make ARC Evals look bad, (b) make the commenter seem dumb, or (c) worsen the relationship between the commenter and ARC Evals.

To Lightcone's credit, I think Lightcone has maintained a (stronger) reputation for being fairly open to objections (and not penalizing people for asking "dumb questions" or something like that), but the Desire Not to Upset High-status People and the Desire Not to Look Dumb In Front of Your Peers By Asking Things You're Already Supposed to Know are strong.

I'm guessing that part of why I felt comfortable asking (and even going past "yay, I like Lightcone and therefore I support this post" to the mental motion of "wait, am I actually satisfied with this post? What questions do I have?") is that I've had a chance to interact in person with the Lightcone team on many occasions, so I felt considerably less psychological friction than most.

All things considered, perhaps an ideal version of the post would've said something short like: "We understand we haven't given any details about what we're actually planning to do or how we'd use the funding. This is because Oli finds this stressful. But we really do want you to ask questions, even 'dumb questions', in the comments."

(To be clear, I don't think omitting this was particularly harmful, and I think your comment definitely addresses it. I'm nit-picking because I think it's an interesting microcosm of broader status/power dynamics that get in the way of discourse, and because I expect the Lightcone team to be unusually interested in this kind of thing.)

I'm a fan of the Lightcone team & I think they're one of the few orgs where I'd basically just say "yeah, they should probably just get as much funding as they want."

With that in mind, I was surprised by the lack of information in this funding request. I feel mixed about this: high-status AIS orgs often (accurately) recognize that they don't really need to spend time justifying their funding requests, but I think this often harms community epistemics (e.g., by leading to situations where everyone says "oh, X org is great-- I totally support them" without actually knowing much about what work they're planning to do, what models they have, etc.).

Here are some questions I'm curious about:

  • What are Lightcone's plans for the next 3-6 months? (Is it just going to involve continuing the projects that were listed?)
  • How is Lightcone orienting to the recent rise in interest in AI policy? Which policy/governance plans (if any) does Lightcone support?
  • What is Lightcone's general worldview/vibe these days? (Is it pretty much covered in this post?) Where does Lightcone disagree with other folks who work on reducing existential risk?
  • What are Lightcone's biggest "wins" and "losses" over the past ~3-6 months?
  • How much funding does Lightcone think it could usefully absorb, and what would the money be used for?

I generally don't find writeups of standards useful, but this piece was an exception. Below, I'll try to articulate why:

I think AI governance pieces-- especially pieces about standards-- often use overly vague language. People say things like "risk management practices" or "third-party audits", umbrella terms that lack specificity. These sometimes serve as applause lights (whether or not the author intends this): who could really disagree with the idea of risk management?

I liked that this piece (fairly unapologetically) advocates for specific things that labs should be doing. As an added benefit, the piece causes the reader to realize "oh wow, there's a lot of stuff that our civilization already does to mitigate this other threat-- if we were actually taking AI x-risk as seriously as pandemics, there's a bunch of stuff we'd be doing differently."

Naturally, there are some areas that lack detail (e.g., how do biolabs do risk assessments and how would AI labs do risk assessments?), but the reader is at least left with some references that allow them to learn more. I think this moves us to an acceptable level of concreteness, especially for a shallow investigation.

I think this is also the kind of thing that I could actually see myself handing to a policymaker or staffer (in reality, since they have no time, I would probably show them a one- or two-page version of this, with this longer version linked).

I'll likely send this to junior AI governance folks as a solid example of a shallow investigation that could be helpful. In terms of constructive feedback, I think the piece could've had a TLDR section (or table) that directly lists each of the recommendations for AI labs and each of the analogs in biosafety. [I might work on such a table and share it here if I produce it]. 

Congratulations on launching!

On the governance side, one question I'd be excited to see Apollo (and ARC Evals & any other similar groups) think/write about is: what happens after a dangerous capability eval goes off?

Of course, the actual answer will be shaped by the particular climate/culture/zeitgeist/policy window/lab factors that are impossible to fully predict in advance.

But my impression is that this question is relatively neglected, and I wouldn't be surprised if sharp newcomers were able to meaningfully improve the community's thinking on this. 

Excited to see this! I'd be most excited about case studies of standards in fields where people didn't already have clear ideas about how to verify safety.

In some areas, it's pretty clear what you're supposed to do to verify safety. Everyone (more or less) agrees on what counts as safe.

One of the biggest challenges with AI safety standards will be the fact that no one really knows how to verify that a (sufficiently-powerful) system is safe. And a lot of experts disagree on the type of evidence that would be sufficient.

Are there examples of standards in other industries where people were quite confused about what "safety" would require? Are there examples of standards that are specific enough to be useful but flexible enough to deal with unexpected failure modes or threats? Are there examples where the standards-setters acknowledged that they wouldn't be able to make a simple checklist, so they requested that companies provide proactive evidence of safety?

I've been working on a response to the NTIA request for comments on AI Accountability over the last few months. It's likely that I'll also submit something to the OSTP request.

I've learned a few useful things from talking to AI governance and policy folks. Some of it is fairly intuitive but still worth highlighting (e.g., try to avoid jargon, remember that the reader doesn't share many assumptions that people in AI safety take for granted, remember that people have many different priorities). Some of it is less intuitive (e.g., answers to questions like: what actually happens with the responses? How long should your response be? How important is it to say something novel? What kinds of things are policymakers actually looking for?).

If anyone is looking for advice, feel free to DM me. 

Glad to see this write-up & excited for more posts.

I think these are three areas that MATS seems to have handled fairly well. I'd be especially excited to hear more about areas where MATS thinks it's struggling, where it's uncertain, or where it feels it has a lot of room to grow. Potential candidates include:

  • How is MATS going about talent selection and advertising for the next cohort, especially given the recent wave of interest in AI/AI safety?
  • How does MATS intend to foster (or recruit for) the kinds of qualities that strong researchers often possess?
  • How does MATS define "good" alignment research? 

Other things I'd be curious about:

  • Which work from previous MATS scholars is the MATS team most excited about? What are MATS's biggest wins? Which individuals or research outputs is MATS most proud of?
  • Most people's timelines have shortened a lot since MATS was established. Does this substantially reduce the value of MATS (relative to worlds with longer timelines)?
  • Does MATS plan to try to attract senior researchers who are becoming interested in AI Safety (e.g., professors, people with 10+ years of experience in industry)? Or will MATS continue to recruit primarily from the (largely younger and less experienced) EA/LW communities?

(Pasting this exchange from a comment thread on the EA Forum; bolding added)

Peter Park:

Thank you so much for your insightful and detailed list of ideas for AGI safety careers, Richard! I really appreciate your excellent post.

I would propose explicitly grouping some of your ideas and additional ones under a third category: “identifying and raising public awareness of AGI’s dangers.” In fact, I think this category may plausibly contain some of the most impactful ideas for reducing catastrophic and existential risks, given that alignment seems potentially difficult to achieve in a reasonable period of time (if ever) and the implementation of governance ideas is bottlenecked by public support.

For a similar argument that I found particularly compelling, please check out Greg Colbourn’s recent post: https://forum.effectivealtruism.org/posts/8YXFaM9yHbhiJTPqp/agi-rising-why-we-are-in-a-new-era-of-acute-risk-and

Richard:

I don't actually think the implementation of governance ideas is mainly bottlenecked by public support; I think it's bottlenecked by good concrete proposals. And to the extent that it is bottlenecked by public support, that will change by default as more powerful AI systems are released.

Akash:

I appreciate Richard stating this explicitly. I think this is (and has been) a pretty big crux in the AI governance space right now.

Some folks (like Richard) believe that we're mainly bottlenecked by good concrete proposals. Other folks believe that we have concrete proposals, but we need to raise awareness and political support in order to implement them.

I'd like to see more work going into both of these areas. On the margin, though, I'm currently more excited about efforts to raise awareness [well], acquire political support, and channel that support into achieving useful policies. 

I think this is largely due to (a) my perception that this work is largely neglected, (b) the fact that a few AI governance professionals I trust have also stated that they see this as the higher priority thing at the moment, and (c) worldview beliefs around what kind of regulation is warranted (e.g., being more sympathetic to proposals that require a lot of political will).

Nice-- very relevant. I agree with Evan that arguments about the training procedure will matter here (I'm more uncertain about whether checking for deception behaviorally will be harder than avoiding it, but it certainly seems plausible).

Ideally, I think regulators would be flexible in the kinds of evidence they accept. If a developer's evidence that the model is not deceptive relies on details of the training procedure rather than behavioral testing, that could be sufficient.

(In fact, I think arguments that meet some sort of "beyond-a-reasonable-doubt" threshold would likely involve providing arguments for why the training procedure avoids deceptive alignment.)

Can you say more about what part of this relates to a ban on AI development?

I think the claim "AI development should be regulated in a way such that the burden of proof is on developers to show beyond-a-reasonable-doubt that models are safe" seems quite different from the claim "AI development should be banned", but it's possible that I'm missing something here or communicating imprecisely. 
