Thanks so much for the feedback!
1. I agree, I think many providers have a blended price that is still optimized for low latency. We hope that selecting the lowest-priced provider among all providers of a given model mitigates this (or at least selects a consistent set of providers that make consistent choices).
2. I’d be really interested in such data. I don’t know of any such empirical results, but Erdil et al.’s Inference Economics has a good theoretical model.
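To make the lowest-price selection in point 1 concrete, here is a minimal sketch of what that step might look like. It is not the actual analysis code: the `Offer` record, the `price_per_mtok` field (assumed to be USD per million tokens), and the sample rows are all hypothetical and made up for illustration.

```python
from typing import NamedTuple


class Offer(NamedTuple):
    # Hypothetical record: one provider's listed price for one model.
    model: str
    provider: str
    price_per_mtok: float  # assumed unit: USD per million tokens


def cheapest_offer_per_model(offers):
    """Return {model: cheapest Offer} across all providers of that model."""
    best = {}
    for offer in offers:
        current = best.get(offer.model)
        # Keep the first offer seen for a model, then replace it only
        # when a strictly cheaper one appears.
        if current is None or offer.price_per_mtok < current.price_per_mtok:
            best[offer.model] = offer
    return best


# Made-up example data, not real prices.
offers = [
    Offer("model-a", "provider-x", 3.00),
    Offer("model-a", "provider-y", 1.20),
    Offer("model-b", "provider-x", 0.50),
]
cheapest = cheapest_offer_per_model(offers)
```

Taking the minimum per model does not remove latency-optimized pricing, but it does apply the same selection rule to every model, which is the consistency we are hoping for.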
Yeah, I tried running the code on SciCode, GPQA, and HLE. Overall, the results were somewhat similar but much noisier. Using method 2, we got very similar results but with lower HLE growth. Using method 1, we got somewhat lower growth rates in SciCode and higher growth rates in GPQA Diamond at the high end (but given the way I constructed the frontier, there were only two points in the 70+ bucket).