It’s plausible that, over the next few years, US-based frontier AI companies will become very unhappy with the domestic political situation. This could happen as a result of democratic backsliding, weaponization of government power (along the lines of Anthropic’s recent dispute with the Department of War), or restrictive...
This is old work from the Center on Long-Term Risk’s Summer Research Fellowship, completed under the mentorship of Mia Taylor. Datasets here: https://huggingface.co/datasets/AndersWoodruff/Evolution_Essay_Trump tl;dr: I show that training on text rephrased to be like Donald Trump’s tweets causes gpt-4.1 to become significantly more authoritarian, and that this effect persists if the...
In a software-only takeoff, AIs improve AI-related software at an increasing speed, leading to superintelligent AI. The plausibility of this scenario is relevant to questions like:

* How much time do we have between near-human and superintelligent AIs?
* Which actors have influence over AI development?
* How much warning...
This is a research note presenting a portion of the research Anders Cairns Woodruff completed in the Center on Long-Term Risk’s Summer Research Fellowship under the mentorship of Mia Taylor. The datasets can be found at https://huggingface.co/datasets/AndersWoodruff/AestheticEM

TL;DR

1. Unpopular aesthetic preferences cause emergent misalignment in multiple models.
2. Ablations...