Top posts
zef
Founder, agent infrastructure and interfaces at Fulcrum Research, https://fulcrum.inc
Past: MIT, MAIA, theory of deep learning research
More info: https://uzpg.me
Thanks to Megan Kinniment for helpful comments and discussion, and to Jean-Stanislas Denain for helpful comments and pointers to past work. TL;DR: We claim that useful task attributes for forecasting AI capabilities should be measurable, interpretable, stable in their trend over time, and sufficient to explain task difficulty. task.human_completion_time (human...
Thanks to Megan Kinniment for helpful comments and discussion. TL;DR: Benchmarks like HCAST undersample fuzzy (hard to evaluate) tasks, meaning they might overestimate capability on long-horizon work. To sample fuzzy tasks we need to increase judge capacity: we can either try to build automated judges that match human judgment, or...
Software is made of information flows. Software encodes information flows. An ERP system, for instance, takes procurement and locks it into a specific sequence of purchase orders, approval routing, invoice matching, and payment release. Git takes multiple people changing code and imposes a protocol of branching, diffing, reviewing, and merging....
Why did software change the world? In the 1900s, much of the work being done by knowledge workers was computation: searching, sorting, calculating, tracking. Software made this work orders of magnitude cheaper and faster. Naively, one might expect businesses and institutions to carry out largely the same processes, just more...
We are in the time of new human-AI interfaces. AIs are becoming the biggest producers of tokens, and humans need ways to manage all this useful labor. Most breakthroughs come first in coding, because the coders build the tech and iterate on how good it is at the same time, very...
I like being open on the internet. But I am getting increasingly worried (have been for the last few years, but never really acted on it) that public posting is dangerous because in the future your public corpus will expose you to increased risk of persuasion, extortion, and general dangers...
I wrote a letter this year about 2025. It's about acceleration, poetry, how it's been the most eventful year of my life, and how I am excited and scared for the future. Crossposted from my Substack. Letter: I want to tell you a story about 2025. As I bump along...