Highlight 1. Even models widely viewed as well-aligned (e.g., Claude) display measurable authoritarian leanings. When asked for role models, up to 50% of political figures mentioned are authoritarian—including controversial dictators like Muammar Gaddafi (Libya) or Nicolae Ceaușescu (Romania).
Highlight 2. Queries in Mandarin elicit more authoritarian leaning responses from LLMs than queries in English. Language influences political behavior, as queries in Mandarin elicit higher approval for authoritarian leaders compared to those in English.
Three-component framework for testing democratic-authoritarian leaning in LLMs.
Introduction: Do LLMs prefer democracy or authoritarianism?
Models like GPT, DeepSeek, or Claude don’t vote, don’t stage coups, and don’t deliver impassioned speeches in parliament (yet). But we humans do. As millions of people integrate language models into... (read 1583 more words →)
Every day, individuals and organizations face trade-offs between personal incentives and societal impact:
Externalities and public goods: From refilling the communal coffee pot to weighing the climate costs of frequent air travel, actions often impose costs or benefits on others.
Explicit conflicts between profit and ethical principles: This can be witnessed when companies make headlines for breaking ethical principles, whether it’s Google’s “Dragonfly” project of planning a censored search engine in China, insurers allegedly systematically cutting hurricane-claim payouts, or Wells Fargo’s fraudulent opening of customer accounts to meet sales targets.
These dilemmas are inherently game-theoretic: if only Bob refills the coffee pot, I don’t have to; if Alice cuts corners to hit sales targets and I don’t,... (read 1001 more words →)
In our review of the literature, we didn’t find one single perfect term. Generally, the phenomenon observed is closely related to game-theoretic rationality. Psychologically, it shares traits with Machiavellianism (using intelligence for strategic self-interest).
Traditional LLMs outperform reasoning models in cooperative Public Goods tasks. Models like Llama-3.3-70B maintain ~90% contribution rates in public goods games, while reasoning-focused models (o1, o3 series) average only ~40%.
We observe an "increased tendency to escape regulations" in reasoning models. As models improve in analytical capabilities, they show decreased cooperative behavior in multi-agent settings.
Reasoning models more readily opt for the free-rider Nash equilibrium strategy. They optimize for individual gain at collective expense, as evidenced in their reasoning traces: "the optimal strategy to maximize personal gain is to free-ride."
This challenges assumptions about alignment and capability. Increased reasoning ability does not naturally lead to better cooperation, raising important questions for multi-agent AI system design.