tlevin

(Posting in a personal capacity unless stated otherwise.) I help allocate Open Phil's resources to improve the governance of AI with a focus on avoiding catastrophic outcomes. Formerly co-founder of the Cambridge Boston Alignment Initiative, which supports AI alignment/safety research and outreach programs at Harvard, MIT, and beyond, co-president of Harvard EA, Director of Governance Programs at the Harvard AI Safety Team and MIT AI Alignment, and occasional AI governance researcher. 

Not to be confused with the user formerly known as trevor1.

Comments

tlevin2d5-4

Quick reactions:

  1. Re: over-emphasis on "how radical is my ask" vs. "what my target audience might find helpful," and the general importance of making your case well regardless of how radical it is: that makes sense. Though notably, the more radical your proposal is (or the more unfamiliar your threat models are), the higher the bar for explaining it well, so these do seem related.
  2. Re: more effective actors looking for small wins, I agree that it's not clear, but yeah, it seems like we are likely to get into some reference class tennis here. "A lot of successful organizations take hard-line positions and (presumably) get a lot of their power/influence from the ideological purity that they possess & communicate"? Maybe, but I think of, like, the agriculture lobby, who just sort of quietly make friends with everybody and keep getting 11-figure subsidies every year, in a way that (I think) resulted more from gradual ratcheting than from making a huge ask. "Pretty much no group -- whether radical or centrist -- has had tangible wins" seems wrong in light of the EU AI Act (where I think both a "radical" FLI and a bunch of non-radical orgs were probably important) and the US executive order (I'm not sure which strategy is best credited there, but I think most people would have counted the policies contained within it as "minor asks" relative to licensing, pausing, etc.). But yeah, I agree that there are groups along the whole spectrum that probably deserve credit.
  3. Re: poisoning the well, again, radical-ness and being dumb/uninformed are of course separable but the bar rises the more radical you get, in part because more radical policy asks strongly correlate with more complicated procedural asks; tweaking ECRA is both non-radical and procedurally simple, creating a new agency to license training runs is both outside the DC Overton Window and very procedurally complicated.
  4. Re: incentives, I agree that this is a good thing to track, but like, "people who oppose X are incentivized to downplay the reasons to do X" is just a fully general counterargument. Unless you're talking about financial conflicts of interest, but there are also financial incentives for orgs pursuing a "radical" strategy to downplay boring real-world constraints, as well as social incentives (e.g. on LessWrong, IMO) to downplay these boring constraints, and cognitive biases against thinking your preferred strategy has big downsides.
  5. I agree that the CAIS statement, Hinton leaving Google, and Bengio and Hogarth's writing have been great. I think that these are all in a highly distinct category from proposing specific actors take specific radical actions (unless I'm misremembering the Hogarth piece). Yudkowsky's TIME article, on the other hand, definitely counts as an Overton Window move, and I'm surprised that you think this has had net positive effects. I regularly hear "bombing datacenters" as an example of a clearly extreme policy idea, sometimes in a context that sounds like it maybe made the less-radical idea seem more reasonable, but sometimes as evidence that the "doomers" want to do crazy things and we shouldn't listen to them, and often as evidence that they are at least socially clumsy, don't understand how politics works, etc, which is related to the things you list as the stuff that actually poisons the well. (I'm confused about the sign of the FLI letter as we've discussed.)
  6. I'm not sure optimism vs. pessimism is a crux, except in very short, like, 3-year timelines. It's true that optimists are more likely to value small wins, so I guess narrowly I agree that a ratchet strategy looks strictly better for optimists, but if you think big radical changes are needed, the question remains whether you're more likely to get there by asking for the radical change now or by looking for smaller wins to build on over time. If there simply isn't time to build on these wins, then yes, better to take a 2% shot at the policy that you actually think will work; but even in 5-year timelines I think you're better positioned to get what you ultimately want by 2029 if you get a little bit of what you want in 2024 and 2026 (ideally while other groups also make clear cases for the threat models and develop the policy asks, etc.). Another piece this overlooks is the information and infrastructure built by the minor policy changes. A big part of the argument for the reporting requirements in the EO was that there will now be an office in the US government, with legal authority, that is in the business of collecting critical information about frontier AI models and figuring out how to synthesize it for the rest of government. Both the office and the legal authority can now be expanded rather than created from scratch, lots of individuals will gain experience dealing with this information in the government context, and it will come to seem natural that the government should know this information. I think if we had only been developing and advocating for ideal policy, this would not have happened (though I imagine that this is not in fact what you're suggesting the community do!).
tlevin3d640

I think some of the AI safety policy community has over-indexed on the visual model of the "Overton Window" and under-indexed on alternatives like the "ratchet effect," "poisoning the well," "clown attacks," and other models where proposing radical changes can make you, your allies, and your ideas look unreasonable (edit to add: whereas successfully proposing minor changes achieves hard-to-reverse progress, making ideal policy look more reasonable).

I'm not familiar with much systematic empirical evidence on either side, but it seems to me like the more effective actors in the DC establishment are, overall, much more in the habit of looking for small wins that are both good in themselves and shrink the size of the ask for their ideal policy than of pushing for their ideal vision and then making concessions. Possibly an ideal ecosystem has both strategies, but it seems possible that at least some versions of "Overton Window-moving" strategies, as executed in practice, do more harm than good: the negative effects of associating their "side" with unreasonable-sounding ideas in the minds of very bandwidth-constrained policymakers, who lean heavily on signals of credibility and consensus when quickly evaluating policy options, may outweigh the positive effects of increasing the odds of ideal policy and improving the framing for non-ideal but pretty good policies.

In theory, the Overton Window model is just a description of what ideas are taken seriously, so it can indeed accommodate backfire effects where you argue for an idea "outside the window" and this actually makes the window narrower. But I think the visual imagery of "windows" actually struggles to accommodate this -- when was the last time you tried to open a window and accidentally closed it instead? -- and as a result, people who rely on this model are more likely to underrate these kinds of consequences.

Would be interested in empirical evidence on this question (ideally actual studies from psych, political science, sociology, econ, etc literatures, rather than specific case studies due to reference class tennis type issues).

tlevin1mo10

The "highly concentrated elite" issue seems like it makes it more, rather than less, surprising and noteworthy that a lack of structural checks and balances has resulted in a highly stable and (relatively) individual-rights-respecting set of policy outcomes. That is, it seems like there would thus be an especially strong case for various non-elite groups to have explicit veto power.

tlevin1mo217

One other thought on Green in rationality: you mention the yin of scout mindset in the Deep Atheism post, and scout mindset and indeed correct Bayesianism involves a Green passivity and maybe the "respect for the Other" described here. While Blue is agnostic, in theory, between yin and yang -- whichever gives me more knowledge! -- Blue as evoked in Duncan's post and as I commonly think of it tends to lean yang: "truth-seeking," "diving down research rabbit holes," "running experiments," etc. A common failure mode of Blue-according-to-Blue is a yang that projects the observer into the observations: seeing new evidence as tools, arguments as soldiers. Green reminds Blue to chill: see the Other as it is, recite the litanies of Gendlin and Tarski, combine the seeking of truth with receptivity to what you find.

tlevin2mo40

I think this post aims at an important and true thing and misses in a subtle and interesting but important way.

Namely: I don't think the important thing is that one faction gets a veto. I think it's that you just need limitations on what the government can do that ensure it isn't too exploitative/extractive. One way of creating these kinds of limitations is creating lots of veto points and coming up with various ways to make sure that different factions hold the different veto points. But, as other commenters have noted, the UK government does not have structural checks and balances. In my understanding, what it has instead is a bizarrely, miraculously strong respect for precedent and consensus about what "is constitutional," despite (or maybe because of?) the lack of a written constitution. For the UK, and maybe for other, less-established democracies (i.e. all of them), I'm tempted to attribute this to the "repeated game" nature of politics: when your democracy has been around long enough, you come to expect that you and the other faction will share power (roughly at 50-50, for median voter theorem reasons), so voices within your own faction start saying "well, hold on, we actually do want to keep the norms around."

Also, re: the electoral college, can you say more about how this creates de facto vetoes? The electoral college does not create checks and balances; you can win in the electoral college without appealing to all the big factions (indeed, see Abraham Lincoln's 1860 win), and the electoral college puts no restraints on the behavior of the president afterward. It just noisily empowers states that happen to have factional mixes close to the national average, and indeed can create paths to victory that route through doubling down on support within your own faction while alienating those outside it (e.g. Trump's 2016 and 2020 coalitions).

tlevin4mo1410

(An extra-heavy “personal capacity” disclaimer for the following opinions.) Yeah, I hear you that OP doesn’t have as much public writing about our thinking here as would be ideal for this purpose, though I think the increasingly adversarial environment we’re finding ourselves in limits how transparent we can be without undermining our partners’ work (as we’ve written about previously).

The set of comms/advocacy efforts that I’m personally excited about is definitely larger than the set of comms/advocacy efforts that I think OP should fund, since 1) that’s a higher bar, and 2) sometimes OP isn’t the right funder for a specific project. That being said:

  • So far, OP has funded AI policy advocacy efforts by the Institute for Progress and Sam Hammond. I personally don’t have a very detailed sense of how these efforts have been going, but the theory of impact for these was that both grantees have strong track records in communicating policy ideas to key audiences and a solid understanding of the technical and governance problems that policy needs to solve.
  • I’m excited about the EU efforts of FLI and The Future Society. In the EU context, it seems like these orgs were complementary, where FLI was willing to take steps (including the pause letter) that sparked public conversation and gave policymakers context that made TFS’s policy conversations more productive (despite raising some controversy). I have much less context on their US work, but from what I know, I respect the policymaker outreach and convening work that they do and think they are net-positive.
  • I think CAIP is doing good work so far, though they have less of a track record. I value their thinking about the effectiveness of different policy options, and they seem to be learning and improving quickly.
  • I don’t know as much about Andrea and Control AI, but my main current takes about them are that their anti-RSP advocacy should have been heavier on “RSPs are insufficient,” which I agree with, instead of “RSPs are counterproductive safety-washing,” which I think could have disincentivized companies from the very positive move of developing an RSP (as you and I discussed privately a while ago). MAGIC is an interesting and important proposal and worth further developing (though as with many clever acronyms I kind of wish it had been named differently).
  • I’m not sure what to think about Holly’s work and PauseAI. I think the open source protest where they gave out copies of a GovAI paper to Meta employees seemed good – that seems like the kind of thing that could start really productive thinking within Meta. Broadly building awareness of AI’s catastrophic potential seems really good, largely for the reasons Holly describes here. Specifically calling for a pause is complicated, both in terms of the goodness of the various policies that could be called a “pause” and in terms of the politics (the public seems pretty on board, but it might backfire specifically with the experts that policymakers will likely defer to; then again, it might inspire productive discussion around narrower regulatory proposals?). I think this cluster of activists can sometimes overstate or simplify their claims, which I worry about.

Some broader thoughts about what kinds of advocacy would be useful or not useful:

  • The most important thing, imo, is that whatever advocacy you do, you do it well. This sounds obvious, but importantly differs from “find the most important/neglected/tractable kind of advocacy, and then do that as well as you personally can do it.” For example, I’d be really excited about people who have spent a long time in Congress-type DC world doing advocacy that looks like meeting with staffers; I’d be excited about people who might be really good at writing trying to start a successful blog and social media presence; I’d be excited about people with a strong track record in animal advocacy campaigns applying similar techniques to AI policy. Basically I think comparative advantage is really important, especially in cases where the risk of backfire/poisoning the well is high.
  • In all of these cases, I think it’s very important to make sure your claims are not just literally accurate but also don’t have misleading implications and are clear about your level of confidence and the strength of the evidence. I’m very, very nervous about getting short-term victories by making bad arguments. Even Congress, not known for its epistemic and scientific rigor, has gotten concerned that AI safety arguments aren’t as rigorous as they need to be (even though I take issue with most of the specific examples they provide).
  • Relatedly, I think some of the most useful “advocacy” looks a lot like research: if an idea is currently only legible to people who live and breathe AI alignment, then writing it up in a clear and rigorous way, such that academics, policymakers, and the public can interact with it, critique it, and/or become advocates for it themselves, is very valuable.
  • This is obviously not a novel take, but I think other things equal advocacy should try not to make enemies. It’s really valuable that the issue remain somewhat bipartisan and that we avoid further alienating the AI fairness and bias communities and the mainstream ML community. Unfortunately “other things equal” won’t always hold, and sometimes these come with steep tradeoffs, but I’d be excited about efforts to build these bridges, especially by people who come from/have spent lots of time in the community to which they’re bridge-building.
tlevin4mo34

Just being "on board with AGI worry" is so far from sufficient to taking useful actions to reduce the risk that I think epistemics and judgment is more important, especially since we're likely to get lots of evidence (one way or another) about the timelines and risks posed by AI during the term of the next president.

tlevin4mo32

He has also broadly indicated that he would be hostile to the nonpartisan federal bureaucracy, e.g. by designating many more civil servants as presidential appointees, allowing him personally to fire and replace them. I think creating new offices that are effectively set up to regulate AI looks much more challenging in a Trump (and to some extent DeSantis) presidency than under the other candidates.

tlevin4mo10

Thanks for these thoughts! I agree that advocacy and communications is an important part of the story here, and I'm glad for you to have added some detail on that with your comment. I’m also sympathetic to the claim that serious thought about “ambitious comms/advocacy” is especially neglected within the community, though I think it’s far from clear that the effort that went into the policy research that identified these solutions or work on the ground in Brussels should have been shifted at the margin to the kinds of public communications you mention.

I also think Open Phil’s strategy is pretty bullish on supporting comms and advocacy work, but it has taken us a while to acquire the staff capacity to gain context on those opportunities and begin funding them, and perhaps there are specific opportunities that you're more excited about than we are. 

For what it’s worth, I didn’t seek significant outside input while writing this post and think that's fine (given the alternative of writing it quickly, posting it here, disclaiming my non-expertise, and getting additional perspectives and context from commenters like yourself). However, I have spoken with about a dozen people working on AI policy in Europe over the last couple months (including one of the people whose public comms efforts are linked in your comment) and would love to chat with more people with experience doing policy/politics/comms work in the EU.

We could definitely use more help thinking about this stuff, and I encourage readers who are interested in contributing to OP’s thinking on advocacy and comms to do any of the following:

  • Write up these critiques (we do read the forums!); 
  • Join our team (our latest hiring round specifically mentioned US policy advocacy as a specialization we'd be excited about, but people with advocacy/politics/comms backgrounds more generally could also be very useful, and while the round is now closed, we may still review general applications); and/or 
  • Introduce yourself via the form mentioned in this post.
tlevin5mo10

Thank you! Classic American mistake on my part to round these institutions to their closest US analogies.
