Fulcrum is excited to open-source two new tools: 1. Quibbler is a critic for your coding agent. 2. Orchestra is a multi-agent coding system: it uses a designer agent that spawns and coordinates your parallel coding agents. Motivation Human attention is a scarce resource. In the best case, coding with...
In 1787, Catherine the Great sailed down the Dnieper to inspect its banks. Her trusted advisor, Governor Potemkin, set out to present those war-torn lands to her in the best possible light. Legend has it[1] Potemkin set up painted facades along the riverbank, so that, from her barge, Catherine would...
Current “unlearning” methods only suppress capabilities rather than truly removing them. But if you distill an unlearned model into a randomly initialized model, the resulting network is actually robust to relearning. We show why this works, how well it works, and how to trade off compute for robustness. Unlearn-and-Distill...
We help organize the Harvard AI Safety Team (HAIST) and MIT AI Alignment (MAIA), and are excited about our groups and the progress we’ve made over the last semester. In this post, we’ve attempted to think through what worked (and didn’t work!) for HAIST and MAIA, along with more details...
This is a write-up of work I did as an Open Philanthropy intern. However, the conclusions don't necessarily reflect Open Phil's institutional view. Abstract This post investigates the biological anchor framework for thinking about AI timelines, as espoused by Ajeya Cotra in her draft report. The basic claim of this...