Two quick areas I'd love to get others' thoughts on:
1) If we accept that Anthropic will decide who should access powerful models before their release – sidestepping whether they should – then what criteria should they use to inform their decision? Should a first criterion be to distribute access in ways that defend democratic regimes?
I wrote about Mythos/Glasswing and AI-enabled coups here a few days ago. The next step in that argument could be that, as Anthropic are now the effective arbiter of which actors can or cannot defend critical infrastructure, they have acquired a form of structural geopolitical power that would historically trigger obligations to uphold international governance norms. This is similar to the UN principles establishing that corporations have a responsibility to avoid contributing to human rights violations; contributing to coup risk could plausibly fall within that frame.
This is an incredibly dicey criterion to enact. The history of US government agencies and corporations picking winners in contested regimes is an infamous one, so I wonder if there is a way to frame the obligation narrowly enough that it doesn't encourage Anthropic to do the same. One option could be to not make access-control decisions à la Glasswing in ways that predictably disadvantage the defenders of democratic institutions.
I still find this pretty unsatisfying - I wonder if anyone has thoughts on how this could work. I'm setting aside arguments that Anthropic shouldn't exercise this power, since this is what they are doing in practice. I would love to hear what criteria might make them exercise this power in the most sensible manner.
2) What characteristics does a state particularly vulnerable to an AI-enabled coup have? Some quick takes below but interested in other ideas.
1. Weak state capacity (state has fewer resources to respond to a coordinated attack).
2. Concentrated digital infrastructure (e.g. reliance on a small number of critical nodes that are easier to target, possibly a small number of foreign systems).
3. Weak state control or civilian control of the military, so that pressure can exploit existing fragmentation.
4. High political polarisation and instability (existing tensions to exploit, more likely that a live political crisis can be exploited).
I'd really like to develop better-evidenced characteristics, as my initial sense is that it could be mid-tier modernised democracies that are more at risk from this threat than failed states. I'd also imagine these states would be excluded from involvement in any governance arrangement that resembles Glasswing.
1) If we accept that Anthropic will decide who should access powerful models before their release – sidestepping whether they should – then what criteria should they use to inform their decision?
One possible principle is "first, do no harm." New models shall be available first to those who are credibly defenders of the common good, rather than those who are likely to use such access to attack their rivals.
I read Glasswing as broadly a good example here: securing the public tech infrastructure is a defense of the common good. A secure public tech infra generally supports people who are doing prosocial things like speech and commerce, and weakens people who are doing malicious things like sabotage, spam, espionage, and cyberwarfare.
If one wanted to restrict early access more narrowly, one possibility is to restrict access to specific individuals who are already well-reputed within the tech-infra community. Identify projects (open-source or otherwise) that are security dependencies for public infra. Talk to those projects' organizers and leaders. Get them to designate trusted individuals to work with new models on security issues.
Should a first criterion be to distribute access in ways that defend democratic regimes?
No, not specifically; although this is an acceptable side-effect of the above.
2) What characteristics does a state particularly vulnerable to an AI-enabled coup have?
I expect we don't know this until we've seen some AI-enabled coups. But if I had to hazard a guess, those with greater entanglement with AI are probably more vulnerable than those without.
I expect we don't know this until we've seen some AI-enabled coups.
Yep. What exactly is the threat model here? Radicalizing millions of voters via fake friends on social networks? Hacking the president's server and exposing sensitive data? (The same, but the sensitive data are fake?) Destroying important infrastructure so that the entire country drowns in chaos?
A population with more AI usage may be easily manipulated by hacked AI boyfriends/girlfriends. But a population less familiar with AI may be easily manipulated by releasing fake videos.
I think I'd expect it to play out as the last option you suggest, e.g. attacking important infrastructure at an opportune moment. I wrote a theory of how this could play out on here at the weekend if you check my post history, but I'd encourage you to read Forethought's report, as they have the most developed thinking on this.