Leveling Up: advice & resources for junior alignment researchers


My own model differs a bit from Zach's. It seems to me like most of the publicly-available policy proposals have not gotten much more concrete. It feels a lot more like people were motivated to share existing thoughts, as opposed to people having new thoughts or having more concrete thoughts.

Luke's list, for example, is more of a "list of high-level ideas" than a "list of concrete policy proposals." It has things like "licensing" and "information security requirements"– it's not an actual bill or set of requirements. (And to be clear, I still like Luke's post and it's clear that he wasn't trying to be super concrete).

I'd be excited for people to take policy ideas and concretize them further. 

Aside:  When I say "concrete" in this context, I don't quite mean "people on LW would think this is specific." I mean "this is closer to bill text, text of a section of an executive order, text of an amendment to a bill, text of an international treaty, etc."

I think there are a lot of reasons why we haven't seen much "concrete policy stuff". Here are a few:

  • This work is just very difficult– it's much easier to hide behind vagueness when you're writing an academic-style paper than when you're writing a concrete policy proposal.
  • This work requires people to express themselves with more certainty/concreteness than academic-style research. In a paper, you can avoid giving concrete recommendations, or you can give a recommendation and then immediately mention 3-5 crucial considerations that could change the calculus. In bills, you basically just say "here is what's going to happen" and do much less "and here are the assumptions that go into this and a bunch of ways this could be wrong."
  • This work forces people to engage with questions that are less "intellectually interesting" to many people (e.g., which government agency should be tasked with X, how exactly are we going to operationalize Y?)
  • This work just has a different "vibe" to the more LW-style research and the more academic-style research. Insofar as LW readers are selected for (and reinforced for) liking a certain "kind" of thinking/writing, this "kind" of thinking/writing is different than the concrete policy vibe in a bunch of hard-to-articulate ways.
  • This work often has the potential to be more consequential than academic-style research. There are clear downsides of developing [and advocating for] concrete policies that are bad. Without any gatekeeping, you might have a bunch of newbies writing flawed bills. With excessive gatekeeping, you might create a culture that disincentivizes intelligent people from writing good bills. (And my own subjective impression is that the community erred too far on the latter side, but I think reasonable people could disagree here).

For people interested in developing the kinds of proposals I'm talking about, I'd be happy to chat. I'm aware of a couple of groups doing the kind of policy thinking that I would consider "concrete", and it's quite plausible that we'll see more groups shift toward this over time.  

Thanks for all of this! Here's a response to your point about committees.

I agree that the committee process is extremely important. It's especially important if you're trying to push forward specific legislation. 

For people who aren't familiar with committees or why they're important, here's a quick summary of my current understanding (there may be a few mistakes):

  1. When a bill gets introduced in the House or the Senate, it gets sent to a committee. The decision is made by the Speaker of the House or the presiding officer in the Senate. In practice, however, they often defer to a non-partisan "parliamentarian" who specializes in figuring out which committee would be most appropriate. My impression is that this process is actually pretty legitimate and non-partisan in most cases(?). 
  2. It takes some degree of skill to be able to predict which committee(s) a bill is most likely to be referred to. Some bills are obvious (like an agriculture bill will go to an agriculture committee). In my opinion, artificial intelligence bills are often harder to predict. There is obviously no "AI committee", and AI stuff can be argued to affect multiple areas. With all that in mind, I think it's not too hard to narrow things down to ~1-3 likely committees in the House and ~1-3 likely committees in the Senate.
  3. The most influential person in the committee is the committee chair. The committee chair is the highest-ranking member from the majority party (so in the House, all the committee chairs are currently Republicans; in the Senate, all the committee chairs are currently Democrats). 
  4. A bill generally cannot be brought to the House floor or the Senate floor (cannot be properly debated or voted on) until it has gone through committee. The committee is responsible for finalizing the text of the bill and then voting on whether or not they want the bill to advance to the chamber (House or Senate). 
  5. The committee chair typically has a lot of influence over the committee. The committee chair determines which bills get discussed in committee, for how long, etc. Also, committee chairs usually have a lot of "soft power"– members of Congress want to be in good standing with committee chairs. This means that committee chairs often have the ability to prevent certain legislation from getting out of committee.
  6. If you're trying to get legislation passed, it's ideal to have the committee chair think favorably of that piece of legislation. 
  7. It's also important to have at least one person on the committee who is willing to "champion" the bill. This means they view the bill as a priority and are willing to say "hey, committee, I really think we should be talking about bill X." A lot of bills die in committee because they were simply never prioritized. 
  8. If the committee chair brings the bill to a vote, and the majority of committee members vote in favor of the bill moving to the chamber, the bill can be discussed in the full chamber. Party leadership (Speaker of the House, Senate Majority Leader, etc.) typically plays the most influential role in deciding which bills get discussed or voted on in the chambers. 
  9. Sometimes, bills get referred to multiple committees. This generally seems like "bad news" from the perspective of getting the bill passed, because it means that the bill has to get out of multiple committees. (Any single committee could essentially prevent the bill from being discussed in the chamber). 

(If any readers are familiar with the committee process, please feel free to add more info or correct me if I've said anything inaccurate.)

WTF do people "in AI governance" do?

Quick answer:

  1. A lot of AI governance folks primarily do research. They rarely engage with policymakers directly, and they spend much of their time reading and writing papers.
  2. This was even more true before the release of GPT-4 and the recent wave of interest in AI policy. Before GPT-4, many people believed "you will look weird/crazy if you talk to policymakers about AI extinction risk." It's unclear to me how true this was (in a genuine "I am confused about this & don't think I have good models of this" way). Regardless, there has been an update toward talking to policymakers about AI risk now that AI risk is a bit more mainstream. 
  3. My own opinion is that, even after this update toward policymaker engagement, the community as a whole is still probably overinvested in research and underinvested in policymaker engagement/outreach. (Of course, the two can be complementary, and the best outreach will often be done by people who have good models of what needs to be done & can present high-quality answers to the questions that policymakers have). 
  4. Among the people who do outreach/policymaker engagement, my impression is that there has been more focus on the executive branch (and less on Congress/congressional staffers). The main advantage is that the executive branch can get things done more quickly than Congress. The main disadvantage is that Congress is often required (or highly desired) to make "big things" happen (e.g., setting up a new agency or a licensing regime).

I would also suspect that #2 (finding/generating good researchers) is more valuable than #1 (generating or accelerating good research during the MATS program itself).

One problem with #2 is that it's usually harder to evaluate and takes longer to evaluate. #2 requires projections, often over the course of years. #1 is still difficult to evaluate (what is "good alignment research" anyways?) but seems easier in comparison.

Also, I would expect correlations between #1 and #2. Like, one way to evaluate "how well are we doing at training researchers//who are the best researchers" is to ask "how good is the research they are producing//who produced the best research in this 3-month period?"

This process is (of course) imperfect. For example, someone might have great output because their mentor handed them a bunch of ready-to-go-projects, but the scholar didn't actually have to learn the important skills of "forming novel ideas" or "figuring out how to prioritize between many different directions." 

But in general, I think it's a pretty decent way to evaluate things. If someone has produced high-quality and original research during the MATS program, that sure does seem like a strong signal for their future potential. Likewise, in the opposite extreme, if during the entire summer cohort there were 0 instances of useful original work, that doesn't necessarily mean something is wrong, but it would make me go "hmmm, maybe we should brainstorm possible changes to the program that could make it more likely that we see high-quality original output next time, and then we see how much those proposed changes trade off against other desiderata."

(It seems quite likely to me that the MATS team has already considered all of this; just responding on the off-chance that something here is useful!)

Thanks for writing this! I’m curious if you have any information about the following questions:

  1. What does the MATS team think are the most valuable research outputs from the program?

  2. Which scholars was the MATS team most excited about in terms of their future plans/work?

IMO, these are the two main ways I would expect MATS to have impact: research output during the program and future research output/career trajectories of scholars.

Furthermore, I’d suspect things to be fairly tails-based (where EG the top 1-3 research outputs and the top 1-5 scholars are responsible for most of the impact).

Perhaps MATS as a program feels weird about ranking output or scholars so explicitly, or feels like it’s not their place.

But I think this kind of information seems extremely valuable. If I were considering whether or not I wanted to donate, for instance, my main questions would be “is the research good?” and “is the career development producing impactful people?” (as opposed to things like “what is the average rating on the EOY survey?”, though of course that information may matter for other purposes).

I'd expect the amount of time this all takes to be a function of the time-control.

Like, if I have 90 mins, I can allocate more time to all of this. I can consult each of my advisors at every move. I can ask them follow-up questions.

If I only have 20 mins, I need to be more selective. Maybe I only listen to my advisors during critical moves, and I evaluate their arguments more quickly. Also, this inevitably affects the kinds of arguments that the advisors give.

Both of these scenarios seem pretty interesting and AI-relevant. My all-things-considered guess would be that the 20 mins version yields high enough quality data (particularly for the parts of the game that are most critical/interesting & where the debate is most lively) that it's worth it to try with shorter time controls.

(Epistemic status: Thought about this for 5 mins; just vibing; very plausibly underestimating how time pressure could make the debates meaningless).

Is there a reason you’re using 3 hour time control? I’m guessing you’ve thought about this more than I have, but at first glance, it feels to me like this could be done pretty well with EG 60-min or even 20-min time control.

I’d guess that having 4-6 games that last 20-30 mins is better than having 1 game that lasts 2 hours.

(Maybe I’m underestimating how much time it takes for the players to give/receive advice. And ofc there are questions about the actual situations with AGI that we’re concerned about— EG to what extent do we expect time pressure to be a relevant factor when humans are trying to evaluate arguments from AIs?)

Thanks! And why did Holden have the ability to choose board members (and be on the board in the first place)?

I remember hearing that this was in exchange for OP investment into OpenAI, but I also remember Dustin claiming that OpenAI didn’t actually need any OP money (would’ve just gotten the money easily from another investor).

Is your model essentially that the OpenAI folks just got along with Holden and thought he/OP were reasonable, or is there a different reason Holden ended up having so much influence over the board?

Clarification/history question: How were these board members chosen?

Thanks for this dialogue. I find Nate and Oliver's "here's what I think will actually happen" thoughts useful.

I also think I'd find it useful for Nate to spell out "conditional on good things happening, here's what I think the steps look like, and here's the kind of work that I think people should be doing right now. To be clear, I think this is all doomed, and I'm only saying this because Akash directly asked me to condition on worlds where things go well, so here's my best shot."

To be clear, I think some people do too much "play to your outs" reasoning. In the excess, this can lead to people just being like "well maybe all we need to do is beat China" or "maybe alignment will be way easier than we feared" or "maybe we just need to bet on worlds where we get a fire alarm for AGI."

I'm particularly curious to see what happens if Nate tries to reason in this frame, especially since I expect his "play to your outs" reasoning/conclusions might look fairly different from that of others in the community.

Some examples of questions for Nate (and others who have written more about what they actually expect to happen and less about what happens if we condition on things going well):

  • Condition on the worlds in which we see substantial progress in the next 6 months. What are some things that have happened in those worlds? What does progress look like?
  • Condition on worlds in which the actions of the AIS community end up having a strong positive influence in the next 6 months. What are some wins that the AIS community (or specific actors within it) achieve?
  • Suppose for the sake of this conversation that you are fully adopting a "play to your outs" mentality. What outs do you see? Regardless of the absolute probabilities you assign, which of these outs seem most likely and most promising?
  • All things considered, what do you currently see as the most impactful ways you can spend your time?
  • All things considered, what do you currently see as the most impactful ways that "highly talented comms/governance/policy people can be spending their time?" (can divide into more specific subgroups if useful). 

I'll also note that I'd be open to having a dialogue about this with Nate (and possibly other "doomy" people who have not written up their "play to your outs" thoughts).
