Thanks for all of this! Here's a response to your point about committees.
I agree that the committee process is extremely important. It's especially important if you're trying to push forward specific legislation.
For people who aren't familiar with committees or why they're important, here's a quick summary of my current understanding (there may be a few mistakes):
(If any readers are familiar with the committee process, please feel free to add more info or correct me if I've said anything inaccurate.)
I would also suspect that #2 (finding/generating good researchers) is more valuable than #1 (generating or accelerating good research during the MATS program itself).
One problem with #2 is that it's usually harder and slower to evaluate: it requires projections, often over the course of years. #1 is still difficult to evaluate (what is "good alignment research" anyways?), but it seems easier in comparison.
Also, I would expect correlations between #1 and #2. Like, one way to evaluate "how well are we doing at training researchers//who are the best researchers" is to ask "how good is the research they are producing//who produced the best research in this 3-month period?"
This process is (of course) imperfect. For example, someone might have great output because their mentor handed them a bunch of ready-to-go projects, but the scholar never actually had to learn the important skills of "forming novel ideas" or "figuring out how to prioritize between many different directions."
But in general, I think it's a pretty decent way to evaluate things. If someone has produced high-quality and original research during the MATS program, that sure does seem like a strong signal of their future potential. Likewise, in the opposite extreme, if there were 0 instances of useful original work during the entire summer cohort, that doesn't necessarily mean something is wrong, but it would make me go "hmmm, maybe we should brainstorm changes to the program that could make high-quality original output more likely next time, and then see how much those proposed changes trade off against other desiderata."
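To make the "imperfect but decent signal" point concrete, here's a toy simulation (entirely my own construction; the cohort size and noise level are made-up assumptions, not real MATS data). It models observed 3-month output as true researcher potential plus noise, then checks how well ranking by output recovers the top-potential scholars:

```python
# Toy simulation: observed short-term output = true potential + noise.
# N_SCHOLARS, NOISE, and TOP_K are made-up assumptions, not real data.
import random

random.seed(0)

N_SCHOLARS = 30   # hypothetical cohort size
TOP_K = 5
NOISE = 1.0       # assumed noise in 3-month output relative to potential
TRIALS = 2000

overlap = 0
for _ in range(TRIALS):
    potential = [random.gauss(0, 1) for _ in range(N_SCHOLARS)]
    output = [p + random.gauss(0, NOISE) for p in potential]
    # Who looks best by the (noisy) proxy vs. who is actually best?
    top_by_potential = set(sorted(range(N_SCHOLARS), key=lambda i: -potential[i])[:TOP_K])
    top_by_output = set(sorted(range(N_SCHOLARS), key=lambda i: -output[i])[:TOP_K])
    overlap += len(top_by_potential & top_by_output)

print(f"average overlap of top-{TOP_K} lists: {overlap / TRIALS:.2f} out of {TOP_K}")
```

With these made-up numbers, the output-based top-5 usually captures a substantial fraction, but not all, of the true top-5, which matches the "strong but imperfect signal" intuition.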
(It seems quite likely to me that the MATS team has already considered all of this; just responding on the off-chance that something here is useful!)
Thanks for writing this! I’m curious if you have any information about the following questions:
What does the MATS team think are the most valuable research outputs from the program?
Which scholars was the MATS team most excited about in terms of their future plans/work?
IMO, these are the two main ways I would expect MATS to have impact: research output during the program and future research output/career trajectories of scholars.
Furthermore, I’d suspect the impact to be fairly heavy-tailed (where EG the top 1-3 research outputs and the top 1-5 scholars are responsible for most of the impact).
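As a rough quantitative sketch of what that could look like (the lognormal shape and all parameters below are assumptions I'm making up, not estimates from MATS data):

```python
# Toy illustration of a heavy-tailed impact distribution.
# The lognormal shape and SIGMA are made-up assumptions.
import math
import random

random.seed(0)

N_SCHOLARS = 60
SIGMA = 1.5  # larger sigma => heavier tail

impacts = sorted((math.exp(random.gauss(0, SIGMA)) for _ in range(N_SCHOLARS)), reverse=True)
total = sum(impacts)
for k in (1, 3, 5):
    share = sum(impacts[:k]) / total
    print(f"top {k} of {N_SCHOLARS} account for {share:.0%} of total impact")
```

Under an assumption like this, the top handful of scholars/outputs account for a large share of the total, which is why I'd want the rankings rather than averages.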
Perhaps MATS as a program feels weird about ranking output or scholars so explicitly, or feels like it’s not their place.
But I think this kind of information seems extremely valuable. If I were considering whether or not I wanted to donate, for instance, my main questions would be “is the research good?” and “is the career development producing impactful people?” (as opposed to things like “what is the average rating on the EOY survey?”, though of course that information may matter for other purposes).
I'd expect the amount of time this all takes to be a function of the time control.
Like, if I have 90 mins, I can allocate more time to all of this. I can consult each of my advisors at every move. I can ask them follow-up questions.
If I only have 20 mins, I need to be more selective. Maybe I only listen to my advisors during critical moves, and I evaluate their arguments more quickly. Also, this inevitably affects the kinds of arguments that the advisors give.
Both of these scenarios seem pretty interesting and AI-relevant. My all-things-considered guess is that the 20-min version yields data of high enough quality (particularly for the parts of the game that are most critical/interesting & where the debate is most lively) that it's worth trying shorter time controls.
(Epistemic status: Thought about this for 5 mins; just vibing; very plausibly underestimating how time pressure could make the debates meaningless).
Is there a reason you’re using a 3-hour time control? I’m guessing you’ve thought about this more than I have, but at first glance, it feels to me like this could be done pretty well with EG a 60-min or even 20-min time control.
I’d guess that having 4-6 games that last 20-30 mins each gives better data than having 1 game that lasts 2 hours (rough arithmetic in the sketch below).
(Maybe I’m underestimating how much time it takes for the players to give/receive advice. And ofc there are questions about the actual situations with AGI that we’re concerned about— EG to what extent do we expect time pressure to be a relevant factor when humans are trying to evaluate arguments from AIs?)
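For what it’s worth, here’s the back-of-the-envelope arithmetic behind that guess. Every number in this sketch is a made-up assumption, especially the idea that each game yields a similar number of critical positions regardless of length:

```python
# Back-of-the-envelope comparison: debates observed per hour of play.
# All numbers are made-up assumptions, especially the (key) assumption
# that critical positions per game don't scale with game length.
CRITICAL_POSITIONS_PER_GAME = 4  # assumed moves per game where advisors meaningfully disagree

formats = {
    "one long game":       {"games": 1, "mins_per_game": 120},
    "several short games": {"games": 5, "mins_per_game": 24},
}

for name, fmt in formats.items():
    total_mins = fmt["games"] * fmt["mins_per_game"]
    debates = fmt["games"] * CRITICAL_POSITIONS_PER_GAME
    print(f"{name}: {total_mins} mins total, ~{debates} critical debates, "
          f"{debates / (total_mins / 60):.1f} debates/hour")
```

If that assumption holds even roughly, shorter games buy you several times more lively-debate data per hour of experiment time.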
Thanks! And why did Holden have the ability to choose board members (and be on the board in the first place)?
I remember hearing that this was in exchange for OP investment into OpenAI, but I also remember Dustin claiming that OpenAI didn’t actually need any OP money (would’ve just gotten the money easily from another investor).
Is your model essentially that the OpenAI folks just got along with Holden and thought he/OP were reasonable, or is there a different reason Holden ended up having so much influence over the board?
Thanks for this dialogue. I find Nate and Oliver's "here's what I think will actually happen" thoughts useful.
I also think I'd find it useful for Nate to spell out "conditional on good things happening, here's what I think the steps look like, and here's the kind of work that I think people should be doing right now. To be clear, I think this is all doomed, and I'm only saying this because Akash directly asked me to condition on worlds where things go well, so here's my best shot."
To be clear, I think some people do too much "play to your outs" reasoning. Taken to excess, this can lead to people just being like "well maybe all we need to do is beat China" or "maybe alignment will be way easier than we feared" or "maybe we just need to bet on worlds where we get a fire alarm for AGI."
I'm particularly curious to see what happens if Nate tries to reason in this frame, especially since I expect his "play to your outs" reasoning/conclusions might look fairly different from that of others in the community.
Some examples of questions for Nate (and others who have written more about what they actually expect to happen and less about what happens if we condition on things going well):
I'll also note that I'd be open to having a dialogue about this with Nate (and possibly other "doomy" people who have not written up their "play to your outs" thoughts).
My own model differs a bit from Zach's. It seems to me like most of the publicly-available policy proposals have not gotten much more concrete. It feels a lot more like people were motivated to share existing thoughts, as opposed to generating new or more concrete ones.
Luke's list, for example, is more of a "list of high-level ideas" than a "list of concrete policy proposals." It has things like "licensing" and "information security requirements"; it's not an actual bill or set of requirements. (And to be clear, I still like Luke's post, and it's clear that he wasn't trying to be super concrete.)
I'd be excited for people to take policy ideas and concretize them further.
Aside: When I say "concrete" in this context, I don't quite mean "people on LW would think this is specific." I mean "this is closer to bill text, text of a section of an executive order, text of an amendment to a bill, text of an international treaty, etc."
I think there are a lot of reasons why we haven't seen much "concrete policy stuff". Here are a few:
For people interested in developing the kinds of proposals I'm talking about, I'd be happy to chat. I'm aware of a couple of groups doing the kind of policy thinking that I would consider "concrete", and it's quite plausible that we'll see more groups shift toward this over time.