kaivu

Agents are under-elicited: A case study in optimization tasks

"Knowing is not enough; we must apply. Willing is not enough; we must do." (Johann Wolfgang von Goethe)

In our previous post, we introduced inverse rubric optimization (IRO): tasks where an agent must learn the preferences of a black-box judge under a label budget. These are...

Jun 18

Inverse Rubric Optimization: A testbed for agent science

by zef, leni, kaivu, and rohuang

Jun 119

Tracking Difficulty with Feature Portfolios

Thanks to Megan Kinniment for helpful comments and discussion, and to Jean-Stanislas Denain for helpful comments and pointers to past work. TL;DR: We claim that useful task attributes for forecasting AI capabilities should be measurable, interpretable, stable in their trends over time, and sufficient to explain task difficulty. task.human_completion_time (human...

May 19

Benchmarking Real Work

Thanks to Megan Kinniment for helpful comments and discussion. TL;DR: Benchmarks like HCAST undersample fuzzy (hard to evaluate) tasks, meaning they might overestimate capability on long-horizon work. To sample fuzzy tasks we need to increase judge capacity: we can either try to build automated judges that match human judgment, or...

May 16

The bitter lesson for software

by zef, rohuang, and kaivu

Software is made of information flows

Software encodes information flows. An ERP system, for instance, takes procurement and locks it into a specific sequence of purchase orders, approval routing, invoice matching, and payment release. Git takes multiple people changing code and imposes a protocol of branching, diffing, reviewing, and merging....

Mar 16

More is different for intelligence

by zef, rohuang, and kaivu

Why did software change the world? In the 1900s, much of the work being done by knowledge workers was computation: searching, sorting, calculating, tracking. Software made this work orders of magnitude cheaper and faster. Naively, one might expect businesses and institutions to carry out largely the same processes, just more...

Mar 7

Introducing Lunette: auditing agents for evals and environments

by zef, leni, and kaivu

Dec 15, 2025

kaivu

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs

Update on Harvard AI Safety Team and MIT AI Alignment

AI agents and painted facades

Takeaways from a Mechanistic Interpretability project on “Forbidden Facts”
