Well, I can't change the headline; I'm just a commenter. However, I think the reason why "frontier labs will fail at alignment while nonprofits can succeed" is that frontier labs are only pretending to try to solve alignment. It's not actually a serious goal of their leadership, and it's not likely to get meaningful support in terms of compute, recruiting, data, or interdepartmental collaboration; in fact, leadership will probably actively interfere with your work on a regular basis, because the intermediate conclusions you reach will get in the way of their profits and hurt their PR. To do useful superalignment research, I suspect you sometimes need to warn about, or at least openly discuss, the serious threats posed by increasingly advanced AI, but the business model of frontier labs depends on pretending that none of those threats are actually serious. By contrast, the main obstacle at a nonprofit is that it might not have much funding, but at least whatever funding it does have will be earnestly directed at supporting your team's work.
I suspect Joe would agree with me that the current odds that AI developers solve superalignment are significantly less than 20%.
Even if we concede your estimate of 20% for the sake of argument, though, what price are you likely to pay for increasing the odds of success by 0.01%? Suppose that, given enough time, nonprofit alignment researchers would eventually solve superalignment with 80% odds. In order to increase, e.g., Anthropic's odds of success by 0.01%, are you boosting Anthropic's capabilities, shortening timelines, and thereby cutting into the time that nonprofit alignment teams have to solve superalignment enough to reduce their odds of success by at least 0.0025%? If so, you've done net harm. If not, why not? What about Joe's arguments that most for-profit alignment work has at least some applicability to capabilities do you find unconvincing?
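To spell out where that 0.0025% figure comes from, here's a rough sketch of the break-even arithmetic. It assumes a deliberately simple model (my assumption, for illustration only): superalignment gets solved if the lab solves it, or, failing that, if the nonprofits solve it.

```python
# Break-even sketch, assuming a simple either/or model (an illustrative assumption,
# not anyone's official estimate): superalignment is solved if the lab solves it,
# or else if the nonprofit researchers solve it.
p_lab = 0.20        # conceded odds that a frontier lab solves superalignment
p_nonprofit = 0.80  # odds that nonprofit researchers eventually solve it

def p_solved(p_lab, p_nonprofit):
    """P(solved) = P(lab solves it) + P(lab fails) * P(nonprofits solve it)."""
    return p_lab + (1 - p_lab) * p_nonprofit

baseline = p_solved(p_lab, p_nonprofit)                     # 0.84
benefit = p_solved(p_lab + 0.0001, p_nonprofit) - baseline  # +0.00002 (0.002 percentage points)

# How far can the nonprofits' odds drop before the harm outweighs that benefit?
# Harm of a drop x is roughly (1 - p_lab) * x, so solve (1 - p_lab) * x = benefit.
break_even_drop = benefit / (1 - p_lab)
print(f"benefit of +0.01% to the lab:        {benefit:.6f}")
print(f"break-even drop in nonprofit odds:   {break_even_drop:.6f}")  # ~0.000025, i.e. 0.0025%
```

Under that model, a 0.01-percentage-point boost to the lab raises the overall odds of success by only about 0.002 percentage points, so shaving even 0.0025 percentage points off the nonprofits' chances wipes out the entire gain.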
Transparency is less neglected than some other topics -- check out HR 5539 (Transparent by Design Act), S 3312 (AIRIA), and HR 6881 (AI Foundation Model Transparency Act).
There's room for a little bit more useful drafting work here, but I wouldn't call it orphaned, exactly.
I could carry on debating the pros and cons of the EO with you, but I think my real point is that bipartisan advocacy is harmless. You shouldn't worry that bipartisan advocacy will backfire, and so the fear that advocacy might backfire can't justify engaging in no advocacy at all.
If you believe strongly enough in the merits of working with one party to be confident that it won't backfire, fine, I won't stop you -- but we should all be able to agree that more bipartisan advocacy would be good, even if we disagree about how valuable one-party advocacy is.
Yes, CAIP's strategy was primarily to get the legislation ready, talk to people about it, and prepare for a crisis. We also encouraged people to pass our legislation immediately, but I was not especially optimistic about the odds that they would agree to do so.
I don't object to people pushing legislators from both parties to act more quickly...but you have to honor their decision if they say "no," no matter how frustrating that is or how worried you are about the near-term future, because trying to do an end-run around their authority will quickly and predictably backfire.
In my opinion, going behind legislators' backs to the Biden administration was particularly unhelpful in the case of the Biden AI EO, because the contents of that EO would have led to only a small reduction in catastrophic risk -- it would be nice to require reports on the results of red-teaming, but the EO by itself wouldn't have stopped companies from reporting that their models seemed risky and then releasing them anyway. We would have needed to follow up on the EO and enact additional policies in order to have a reasonable chance of survival, but proceeding via unilateral executive action tended to undermine our ability to get those additional policies passed, so it's not clear to me what the overall theory of change was for rushing forward with an EO.
I think it's worth distinguishing between "this AI policy could be slightly inconvenient for America's overall geopolitical strategy" and "this AI policy is so bad for America's AI arms race that we're going to lose a shooting war with China."
The former is a political problem that advocates need to find a strategy to cope with, but it's not a reason not to do advocacy -- we should be willing to trade away a little bit of American influence in order to avoid a large risk of civilizational collapse from misaligned AI.
If you're earnestly worried about maximizing American influence, there are much better strategies for making that happen than trying to make sure that we have zero AI regulations. You could repeal the Jones Act, you could have a stable tariff regime, you could fix the visa system, you could fund either the CHIPS Act or a replacement for the CHIPS Act, you could give BIS more funding to go after chip smugglers, and so on.
I think the "concern" about the harmful geopolitical effects of moderate AI regulation is mostly opportunistic political theater by companies who would prefer to remain unregulated -- there's a notable absence of serious international relations scholars or national security experts who are coming out in favor of zero AI safety regulation as a geopolitical tool. At most, some experts might be pushing for easier land use and environmental approvals, which are not in conflict with the regulations that organizations like CAIP are pushing for.
I tend to agree with John -- I think the Republican opposition is mostly a mixture of general hostility toward regulation as a tool and a desire to publicly reject the Biden Administration and all its works. My experience in talking with Republican staffers, even in 2025, is that they weren't hostile to the idea of reducing catastrophic risks from AI; they're just waiting to see what the Trump Administration's official position is on the issue. We'll know more about that when we see the OSTP AI Action Plan in July.
Part of the problem with AI safety advocacy in 2022-2024 was that it ignored my advice to "insist that the bills you endorse have co-sponsors from both parties." By definition, an executive order is not bipartisan. You can frame the Biden EO as general technocracy if you like, but it's still careless to push for an EO that's only supported by one party. If you want to avoid alienating Republicans (and you should want to avoid doing that), then you need to make sure you have at least some Republicans willing to go on record as publicly endorsing your policy proposals before you enact them.
Hmm. There are lots of valuable advocacy targets besides literal regulation; advocates might try to pass laws, win court cases, sustain a boycott, amend a corporate charter, amend a voluntary industry standard at NIST, and so on. I'll discuss several examples in post #5.
I suppose advocacy might also have some side benefits like teaching us more about politics, helping us bond as a community, or giving us a sense of efficacy, but it's not obvious to me that any of those are of comparable importance to reducing the likelihood that AI developers release misaligned superintelligence.
Does that answer your question? If not, please let me know more about what you mean and I'll try to elaborate.
I have no complaints at all about technical AI research! Indeed, I agree with you that industry will spontaneously adopt most of the good technical AI safety ideas shortly after they're invented.
I'm arguing about the relative merits of AI governance research vs. AI governance advocacy. This probably wasn't clear if you're just jumping in at this point of the sequence, for which I apologize.
> frontier labs are only pretending to try to solve alignment
>> This is probably the main driver of our disagreement.
I agree with your diagnosis! I think Sam Altman is a sociopathic liar, so the fact that he signed the statement on AI risk doesn't convince me that he cares about alignment. I feel reasonably confident about that belief. Zvi's series on Moral Mazes applies here: I don't claim that you literally can't mention existential risk at OpenAI, but if you show signs of being earnestly concerned enough about it to interfere with corporate goals, then I believe you'll be sidelined.
I'm much less confident about whether or not successful alignment looks like normal deep learning work; I know more about corporate behavior than I do about technical AI safety. It seems odd and unlikely to me that the same kind of work (normal deep learning) that looks like it causes a series of major problems (power-seeking, black boxes, emergent goals) when you do a moderate amount of it would wind up solving all of those same problems when you do a lot of it, but I'm not enough of a technical expert to be sure that that's wrong.
Because there are independent, non-technical reasons for people to want to believe that normal deep learning will solve alignment (it means they get to take fun, high-pay, high-status jobs at AI developers without feeling guilty about it), if you show me a random person who believes this and I don't know anything in advance about their incorruptibility or the clarity of their thinking, then my prior is that most of the people in the distribution this person was drawn from arrived at the belief mostly out of convenience and temptation, rather than by becoming technically convinced of the merits of a position that seems a priori unlikely to me. However, I can't be sure -- perhaps it's more likely than I think that normal deep learning can solve alignment.