Some gestures which didn't make the cut as they're too woolly or not quite the right shape:
Nice, yes, that's definitely one sort of implication you could draw from a conclusive no-better-than-logarithmic-returns result in a given environment. (For what it's worth, I tend to doubt imagined schemers which primarily 3d-chess their way to success, for chaos-related reasons, but I do think there's something about consistently maintaining a bit better lookahead and plan robustness which can allow A to defeat B over time.)
I tend to doubt imagined schemers which primarily 3d-chess their way to success, for chaos-related reasons
The standard solution to any chaos-related issues is to take sufficient control of the system to re-engineer it so that it becomes more predictable in the relevant ways by construction.
This way it should in principle be practical to have perfectly predictable weather, or an absolute cure for all possible diseases even in biological humans: not by reaching an extremely high level of understanding of how the natural systems work, but by changing those systems enough that they no longer harbour any hard-to-understand dynamics that are important for their global behaviour.
Indeed. von Neumann:
All stable processes we shall predict. All unstable processes we shall control.
Though some chaotic processes are quite hard to control (and others might not take kindly to your attempting to control them)!
Lots of phenomena turn out to have logarithmic returns: to get an improvement, you double the effort or resources put in, but then to get the same improvement again you have to double the inputs again, and again, and so on. Equivalently, input costs are exponential in output quality[1]. You can probably think of some examples.
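As a toy numerical illustration of that shape (the curve here is hypothetical, chosen only to make the doubling pattern visible):

```python
import math

# Toy logarithmic-returns curve: each doubling of the input budget buys the
# same fixed increment of output quality; equivalently, the input needed for
# a given quality is exponential in that quality.
def quality(budget: float) -> float:
    return math.log2(budget)  # hypothetical returns curve

for budget in [1, 2, 4, 8, 16, 32]:
    print(f"budget {budget:>2} -> quality {quality(budget):.1f}")
```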
I want to know: is 'extra reasoning compute' like this? (Or, under what conditions and by what means can you beat this?) I'm especially interested in this question as applied to deliberate exploration and experiment design.
Said another way, from a given decision-context, without making extra observations or gathering extra data, what are the optimal marginal returns to 'thinking harder'[2] about what to do next?
Intuitively, if I have a second to come up with a plan, it might be weak, five minutes and it might be somewhat reasonable, a day and it'll be better, a year (full time!) and I've reached very diminishing returns. Presumably a century in my ivory tower would be barely better. I'd usually do better trying to get more data.
Is this even a sensible question, or is 'improvement in reasoning output' far too vague to get traction here?
That's the question; below some first thoughts toward an answer.
If you have a proposal generator, and you can choose between proposals, a simple approach to getting better generations is: sample many proposals and choose the best of them.
(This is actually the best strategy you could take if you can only add parallel compute, but there might be strictly better approaches if you can add serial compute[3].)
Even assuming you can unerringly pick the best one, this strategy turns out to have expected value that is logarithmically bounded in the number of proposals, for many underlying distributions of proposals[4]. In fact, for a normally-distributed proposal generator, you even get the slightly worse square-root-of-logarithmic growth[5].
You can in principle sidestep this if your proposal generator has a sufficiently heavy-tailed proposal distribution, and you can reliably ex ante distinguish better from worse at the tails.
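To make the contrast concrete, here's a minimal Monte Carlo sketch of best-of-$n$ (not from the post; the Pareto shape of 1.5 and the trial counts are arbitrary choices), comparing a normal proposal distribution with a heavy-tailed one:

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_best_of_n(sampler, n: int, trials: int = 500) -> float:
    """Monte Carlo estimate of E[max of n i.i.d. proposal qualities]."""
    return float(np.mean([sampler(n).max() for _ in range(trials)]))

for n in [1, 10, 100, 1000, 10000]:
    normal_best = expected_best_of_n(lambda k: rng.standard_normal(k), n)
    pareto_best = expected_best_of_n(lambda k: rng.pareto(1.5, k), n)  # heavy-tailed
    print(f"n={n:>5}  normal: {normal_best:5.2f}   heavy-tailed: {pareto_best:8.2f}")

# The normal column creeps up roughly like sqrt(2*log(n)); the heavy-tailed
# column grows much faster (roughly polynomially in n).
```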
To find approximate solutions to problems, we often employ search over a tree-like structure. This emerges very naturally for planning over time, for example, where branching options (whether choice or chance) at each chosen time interval give rise to a tree of possible plans. (Compare Monte Carlo tree search.)
If gains are roughly uniform in search depth, this gives rise to logarithmic returns to further search, since the number of nodes to explore grows exponentially with depth. With excellent heuristics, you might be able to prune large fractions of the tree - this gives you a kinder exponent, but still an exponential space to search.
When (if at all) are gains over search depth dependably growing, rather than uniform at best? Alternatively, when can uniform (or better) gains be reliably achieved by expanding the search strictly less than exponentially?
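As a rough sketch of the baseline case (the branching factors and node budgets below are illustrative assumptions, not from the post):

```python
import math

# Uniform-cost lookahead in a tree with branching factor b: visiting every
# node down to depth d costs about b**d nodes, so a node budget N only buys
# depth ~ log_b(N). If the value of a plan grows roughly linearly with search
# depth, value is then logarithmic in compute. Pruning that keeps a fraction
# p of children at each node just lowers the effective branching factor to p*b.
def reachable_depth(node_budget: float, branching: float) -> float:
    return math.log(node_budget, branching)

for budget in [1e3, 1e6, 1e9, 1e12]:
    d_full = reachable_depth(budget, branching=10)
    d_pruned = reachable_depth(budget, branching=3)  # aggressive pruning
    print(f"budget {budget:.0e}: depth {d_full:4.1f} at b=10, {d_pruned:4.1f} pruned to b=3")
```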
Chaotic systems are characterised by sensitivity to initial conditions: dynamics where measurement or specification errors compound exponentially.
So, to forecast at a given precision and probabilistic resolution, each marginal increment of forecast horizon requires exponentially tighter precision in the initial specification. (This is why in practice we only ever successfully forecast chaotic systems like the weather at quite coarse precision or over short horizons.)
Specification precision doesn't exactly map to having extra compute, but it feels close. And marginal forecast horizon doesn't necessarily correspond uniformly to 'goodness of plan'.
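A minimal illustration of that exponential error growth, using the logistic map as a standard toy chaotic system (the starting point and perturbation size are arbitrary choices):

```python
# Logistic map x -> r*x*(1-x) at r=4, a standard chaotic regime. Two
# trajectories that start 1e-10 apart become macroscopically different within
# a few dozen steps, so each extra step of usable forecast horizon demands
# roughly another constant factor of initial precision.
def logistic(x: float, r: float = 4.0) -> float:
    return r * x * (1.0 - x)

x, y = 0.2, 0.2 + 1e-10
for step in range(1, 61):
    x, y = logistic(x), logistic(y)
    if step % 10 == 0:
        print(f"step {step:2d}: separation = {abs(x - y):.3e}")

# The separation grows roughly exponentially until it saturates at O(1).
```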
If there's some space of ingredients or components which can be combined for possible insight, the size of the search space is exponential[6] in the number of components in a proposed combination. So if, among good plans at each scale, gains are proportional to the number of components in the plan (and there are similarly many good plans at each scale), you get logarithmic returns to searching longer.
Something similar applies if the design possibilities benefit from combining already-discovered structures in a hierarchy, for example if emergent features of subcomponents unlock new levels of effectiveness in a combined design (molecules, peptides, proteins, organelles, cells, ...).
But the assumption of roughly uniform gains over scales like this is carrying some weight here.
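A quick sketch of how fast that combinatorial space grows (the pool of 100 ingredients is an arbitrary assumption):

```python
from math import comb, log2

# If an 'insight' is an unordered combination of k ingredients drawn from a
# pool of m, exhaustively searching all size-k combinations costs comb(m, k)
# evaluations, which grows exponentially (or faster) in k while k is well
# below m. If the payoff of the best plan scales roughly with k, the payoff
# per unit of search is then logarithmic.
m = 100  # hypothetical ingredient pool
for k in range(2, 11, 2):
    n_combos = comb(m, k)
    print(f"k={k:2d}: {n_combos:.3e} combinations (~2^{log2(n_combos):.0f} evaluations)")
```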
Notably this means that, unless you have an exponentially growing source of inputs to counteract it, there's a practical upper limit to growing the output, because you can only double so many times. And even with an exponentially growing input, you only get modest, linear growth in the output. ↩︎
i.e. computing for longer or computing more in parallel. Parallel can't be better than serial in returns to total compute, so I'm mainly interested in the more generous serial case. For parallel, it's easier to bound the returns because the algorithm space is more constrained ('sample many in parallel, choose best' is the best you can do asymptotically). ↩︎
Intuitively you can 'reason deeper' with extra serial compute, which might look like recursing further down a search tree. You can also take proposals and try to refine or improve them, rather than just throwing them out and trying again from scratch. ↩︎
Proof. Suppose the generator produces proposals with quality $X$. All we assume is that the distribution of $X$ has a moment-generating function $M(\lambda) = \mathbb{E}[e^{\lambda X}]$ (this is not true of all distributions; in particular, heavy-tailed distributions may not have an MGF). Denote individual samples as $X_1, \dots, X_n$. Note first by Jensen's inequality that, for any $\lambda > 0$:

$$\exp\left(\lambda\, \mathbb{E}\left[\max_i X_i\right]\right) \le \mathbb{E}\left[\exp\left(\lambda \max_i X_i\right)\right] = \mathbb{E}\left[\max_i e^{\lambda X_i}\right]$$

i.e. the exponential of the expected maximum in question is bounded by the expected maximum of the exponentials. But a max of positive terms is bounded by the sum:

$$\mathbb{E}\left[\max_i e^{\lambda X_i}\right] \le \mathbb{E}\left[\sum_i e^{\lambda X_i}\right] = n\, \mathbb{E}\left[e^{\lambda X}\right]$$

(writing $X$ for a representative single sample.) But that's just $n$ times the moment-generating function (which we assumed exists). So for all positive $\lambda$,

$$\mathbb{E}\left[\max_i X_i\right] \le \frac{\log n + \log M(\lambda)}{\lambda}$$

So (fixing any $\lambda > 0$, or minimising over $\lambda$,) we see at most logarithmic growth in $n$. ↩︎
Take the proof of the general case for an arbitrary distribution with a moment-generating function. Substitute the normal moment-generating function

$$M(\lambda) = \exp\left(\mu\lambda + \tfrac{1}{2}\sigma^2\lambda^2\right)$$

into the bound above, giving

$$\mathbb{E}\left[\max_i X_i\right] \le \frac{\log n}{\lambda} + \mu + \frac{\sigma^2\lambda}{2}$$

Minimising over (positive) $\lambda$, with $\lambda = \sqrt{2\log n}/\sigma$,

$$\mathbb{E}\left[\max_i X_i\right] \le \mu + \sigma\sqrt{2\log n}$$

↩︎
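As a quick numerical sanity check of this bound (a sketch; the sample counts are arbitrary):

```python
import math
import numpy as np

# Compare the simulated expected maximum of n i.i.d. standard normal samples
# (mu=0, sigma=1) against the bound mu + sigma*sqrt(2*log(n)).
rng = np.random.default_rng(1)
for n in [10, 100, 1000, 10000]:
    est = float(np.mean([rng.standard_normal(n).max() for _ in range(2000)]))
    bound = math.sqrt(2 * math.log(n))
    print(f"n={n:>5}: simulated E[max] = {est:.3f}   bound = {bound:.3f}")

# The simulated expectation stays below the bound, and both grow like sqrt(log n).
```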
Or more than exponential if the order or configuration matters! ↩︎