Great post. I think another important consideration is the *redundancy* of test-time reasoning. If the same problems keep recurring, you can distill the solutions into the next generation of models. For that reason I expect a lot of inference-time cost to be ephemeral: you only need to do the inference once, then you can cache it.
This can happen at several levels: (1) exact caching; (2) fuzzy caching; (3) model switching; (4) distillation; (5) advancing the knowledge frontier, where that new knowledge can then be used to solve future problems. (A toy sketch of the first two levels is below.)
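To make the caching point concrete, here's a minimal, hypothetical sketch of what "exact" vs. "fuzzy" reuse of prior inference might look like. The class name, threshold, and use of string similarity (rather than, say, embedding similarity) are my own illustrative choices, not anything from the post:

```python
import difflib


class InferenceCache:
    """Toy sketch: reuse prior reasoning instead of re-running expensive inference."""

    def __init__(self, fuzzy_threshold: float = 0.9):
        self.store: dict[str, str] = {}        # prompt -> cached answer
        self.fuzzy_threshold = fuzzy_threshold  # similarity needed for a "fuzzy" hit

    def lookup(self, prompt: str) -> str | None:
        # (1) exact caching: identical problem seen before
        if prompt in self.store:
            return self.store[prompt]
        # (2) fuzzy caching: near-duplicate problem, reuse its answer
        for past_prompt, answer in self.store.items():
            similarity = difflib.SequenceMatcher(None, prompt, past_prompt).ratio()
            if similarity >= self.fuzzy_threshold:
                return answer
        return None  # cache miss: pay the inference cost once...

    def add(self, prompt: str, answer: str) -> None:
        self.store[prompt] = answer  # ...then cache it for future reuse


# Usage: the second, slightly reworded query is served from the cache.
cache = InferenceCache()
cache.add("What is the derivative of x^2?", "2x")
print(cache.lookup("What is the derivative of x^2 ?"))  # fuzzy hit -> "2x"
```

The later levels (model switching, distillation, frontier advances) generalize the same idea: the expensive reasoning is paid for once and its result is amortized across future queries.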
These effects imply that there's a spillover from today's inference into tomorrow's value.