abkhur
CS senior at Virginia Tech (May '26). Building Polity, a multi-agent institutional sandbox for studying whether alignment failures can emerge at the level of institutions rather than individual models. TS/SCI. Interested in multi-agent alignment, AI governance, and the intersection of political economy and AI safety.
I think this is pointing at a real gap in AI character design. Once systems are more autonomous and embedded in institutions, "good assistant" no longer seems like the obvious target.
My main hesitation is about the implementation story. Much of this post seems to rely on the idea that we can train something like context-dependent civic virtues: prosocial tendencies that activate selectively, stay subordinate to higher-priority constraints, and reduce collusion/takeover risk precisely because they're not just global goals.
I’m not sure current ...