TastyBench: Toward Measuring Research Taste in LLM
This is an early-stage research update. We love feedback and comments!

TL;DR:
* It’s important to benchmark frontier models on the non-engineering skills required for AI R&D in order to comprehensively understand progress towards full automation in frontier labs.
* One of these skills is research taste, which includes the...
Dec 2, 2025