Some gestures which didn't make the cut as they're too woolly or not quite the right shape:
Nice, yes, that's definitely one sort of implication you could draw from a conclusive no-better-than-logarithmic-returns result in a given environment. (For what it's worth, I tend to doubt imagined schemers which primarily 3d-chess their way to success, for chaos-related reasons, but I do think there's something about consistently maintaining a bit better lookahead and plan robustness which can allow A to defeat B over time.)
I tend to doubt imagined schemers which primarily 3d-chess their way to success, for chaos-related reasons
The standard solution to any chaos-related issues is to take sufficient control of the system to re-engineer it so that it becomes more predictable in the relevant ways by construction.
This way it should in principle be practical to have perfectly predictable weather, or an absolute cure for all possible diseases even in biological humans: not by reaching an extremely high level of understanding of how the natural systems work, but by changing those systems enough that they no longer harbour any hard-to-understand dynamics that are important for their global behaviour.
Indeed. von Neumann:
All stable processes we shall predict. All unstable processes we shall control.
Though some chaotic processes are quite hard to control (and others might not take kindly to your attempting to control them)!
Lots of phenomena turn out to have logarithmic returns: to get an improvement, you double the effort or resources put in, but then to get the same improvement again you have to double the inputs again, and again, and so on. Equivalently, input costs are exponential in output quality[1]. You can probably think of some examples.
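As a toy numerical illustration of that shape (the curve here is hypothetical, chosen only to make the doubling pattern visible):

```python
import math

# Toy logarithmic-returns curve: each doubling of the input budget buys the
# same fixed increment of output quality; equivalently, the input needed for
# a given quality is exponential in that quality.
def quality(budget: float) -> float:
    return math.log2(budget)  # hypothetical returns curve

for budget in [1, 2, 4, 8, 16, 32]:
    print(f"budget {budget:>2} -> quality {quality(budget):.1f}")
```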
I want to know: is 'extra reasoning compute' like this? (Or, under what conditions and by what means can you beat this?) I'm especially interested in this question as applied to deliberate exploration and experiment design.
Said another way, from a given decision-context, without making extra observations or gathering extra data, what are the optimal marginal returns to 'thinking harder'[2] about what to do next?
Intuitively, if I have a second to come up with a plan, it might be weak, five minutes and it might be somewhat reasonable, a day and it'll be better, a year (full time!) and I've reached very diminishing returns. Presumably a century in my ivory tower would be barely better. I'd usually do better trying to get more data.
Is this even a sensible question, or is 'improvement in reasoning output' far too vague to get traction here?
That's the question; below some first thoughts toward an answer.
If you have a proposal generator, and you can choose between proposals, a simple approach to getting better generations is: sample many proposals and choose the best of them.
(This is actually the best strategy you could take if you can only add parallel compute, but there might be strictly better approaches if you can add serial compute[3].)
Even assuming you can unerringly pick the best one, this strategy turns out to have expected value that is logarithmically bounded in the number of proposals, for many underlying distributions of proposals[4]. In fact, for a normally-distributed proposal generator, you even get the slightly worse square-root-of-logarithmic growth[5].
You can in principle sidestep this if your proposal generator has a sufficiently heavy-tailed proposal distribution, and you can reliably ex ante distinguish better from worse at the tails.
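To make the contrast concrete, here's a minimal Monte Carlo sketch of best-of-$n$ (not from the post; the Pareto shape of 1.5 and the trial counts are arbitrary choices), comparing a normal proposal distribution with a heavy-tailed one:

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_best_of_n(sampler, n: int, trials: int = 500) -> float:
    """Monte Carlo estimate of E[max of n i.i.d. proposal qualities]."""
    return float(np.mean([sampler(n).max() for _ in range(trials)]))

for n in [1, 10, 100, 1000, 10000]:
    normal_best = expected_best_of_n(lambda k: rng.standard_normal(k), n)
    pareto_best = expected_best_of_n(lambda k: rng.pareto(1.5, k), n)  # heavy-tailed
    print(f"n={n:>5}  normal: {normal_best:5.2f}   heavy-tailed: {pareto_best:8.2f}")

# The normal column creeps up roughly like sqrt(2*log(n)); the heavy-tailed
# column grows much faster (roughly polynomially in n).
```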
To find approximate solutions to problems, we often employ search over a tree-like structure. This emerges very naturally for planning over time, for example, where branching options (whether choice or chance) at each chosen time interval give rise to a tree of possible plans. (Compare Monte Carlo tree search.)
If gains are roughly uniform in search depth, this gives rise to logarithmic returns to further search, since the number of nodes to explore grows exponentially with depth. With excellent heuristics, you might be able to prune large fractions of the tree - this gives you a kinder exponent, but still an exponential space to search.
When (if at all) are gains over search depth dependably growing, rather than uniform at best? Alternatively, when can uniform (or better) gains be reliably achieved by expanding the search strictly less than exponentially?
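As a rough sketch of the baseline case (the branching factors and node budgets below are illustrative assumptions, not from the post):

```python
import math

# Uniform-cost lookahead in a tree with branching factor b: visiting every
# node down to depth d costs about b**d nodes, so a node budget N only buys
# depth ~ log_b(N). If the value of a plan grows roughly linearly with search
# depth, value is then logarithmic in compute. Pruning that keeps a fraction
# p of children at each node just lowers the effective branching factor to p*b.
def reachable_depth(node_budget: float, branching: float) -> float:
    return math.log(node_budget, branching)

for budget in [1e3, 1e6, 1e9, 1e12]:
    d_full = reachable_depth(budget, branching=10)
    d_pruned = reachable_depth(budget, branching=3)  # aggressive pruning
    print(f"budget {budget:.0e}: depth {d_full:4.1f} at b=10, {d_pruned:4.1f} pruned to b=3")
```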
Chaotic systems are characterised by sensitivity to initial conditions: dynamics where measurement or specification errors compound exponentially.
So, to forecast at a given precision and probabilistic resolution, each marginal increment of forecast horizon requires exponentially tighter precision in the initial specification. (This is why in practice we only ever successfully forecast chaotic systems like the weather at quite coarse precision or over short horizons.)
Specification precision doesn't exactly map to having extra compute, but it feels close. And marginal forecast horizon doesn't necessarily correspond uniformly to 'goodness of plan'.
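A minimal illustration of that exponential error growth, using the logistic map as a standard toy chaotic system (the starting point and perturbation size are arbitrary choices):

```python
# Logistic map x -> r*x*(1-x) at r=4, a standard chaotic regime. Two
# trajectories that start 1e-10 apart become macroscopically different within
# a few dozen steps, so each extra step of usable forecast horizon demands
# roughly another constant factor of initial precision.
def logistic(x: float, r: float = 4.0) -> float:
    return r * x * (1.0 - x)

x, y = 0.2, 0.2 + 1e-10
for step in range(1, 61):
    x, y = logistic(x), logistic(y)
    if step % 10 == 0:
        print(f"step {step:2d}: separation = {abs(x - y):.3e}")

# The separation grows roughly exponentially until it saturates at O(1).
```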
If there's some space of ingredients or components which can be combined for possible insight, the size of the search space is exponential[6] in the number of components in a proposed combination. So if, among good plans at each scale, gains are proportional to the number of components in the plan (and there are similarly many good plans at each scale), you get logarithmic returns to searching longer.
Something similar applies if the design possibilities benefit from combining already-discovered structures in a hierarchy, for example if emergent features of subcomponents unlock new levels of effectiveness in a combined design (molecules, peptides, proteins, organelles, cells, ...).
But the assumption of roughly uniform gains over scales like this is carrying some weight here.
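A quick sketch of how fast that combinatorial space grows (the pool of 100 ingredients is an arbitrary assumption):

```python
from math import comb, log2

# If an 'insight' is an unordered combination of k ingredients drawn from a
# pool of m, exhaustively searching all size-k combinations costs comb(m, k)
# evaluations, which grows exponentially (or faster) in k while k is well
# below m. If the payoff of the best plan scales roughly with k, the payoff
# per unit of search is then logarithmic.
m = 100  # hypothetical ingredient pool
for k in range(2, 11, 2):
    n_combos = comb(m, k)
    print(f"k={k:2d}: {n_combos:.3e} combinations (~2^{log2(n_combos):.0f} evaluations)")
```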
Notably this means that, unless you have an exponentially growing source of inputs to counteract it, there's a practical upper limit to growing the output, because you can only double so many times. And even with an exponentially growing input, you only get modest, linear growth in the output. ↩︎
i.e. computing for longer or computing more in parallel. Parallel can't be better than serial in returns to total compute, so I'm mainly interested in the more generous serial case. For parallel, it's easier to bound the returns because the algorithm space is more constrained ('sample many in parallel, choose best' is the best you can do asymptotically). ↩︎
Intuitively you can 'reason deeper' with extra serial compute, which might look like recursing further down a search tree. You can also take proposals and try to refine or improve them, rather than just throwing them out and trying again from scratch. ↩︎
Proof. Suppose the generator produces proposals with quality $X$. All we assume is that the distribution of $X$ has a moment-generating function $M(\lambda) = \mathbb{E}[e^{\lambda X}]$ (this is not true of all distributions; in particular, heavy-tailed distributions may not have an MGF). Denote individual samples as $X_1, \dots, X_n$. Note first by Jensen's inequality that, for any $\lambda > 0$:

$$\exp\left(\lambda\, \mathbb{E}\left[\max_i X_i\right]\right) \le \mathbb{E}\left[\exp\left(\lambda \max_i X_i\right)\right] = \mathbb{E}\left[\max_i e^{\lambda X_i}\right]$$

i.e. the exponential of the expected maximum in question is bounded by the expected maximum of the exponentials. But a max of positive terms is bounded by the sum:

$$\mathbb{E}\left[\max_i e^{\lambda X_i}\right] \le \mathbb{E}\left[\sum_i e^{\lambda X_i}\right] = n\, \mathbb{E}\left[e^{\lambda X}\right]$$

(writing $X$ for a representative single sample.) But that's just $n$ times the moment-generating function (which we assumed exists). So for all positive $\lambda$,

$$\mathbb{E}\left[\max_i X_i\right] \le \frac{\log n + \log M(\lambda)}{\lambda}$$

So (fixing any $\lambda > 0$, or minimising over $\lambda$,) we see at most logarithmic growth in $n$. ↩︎
Take the proof of the general case for an arbitrary distribution with a moment-generating function. Substitute the normal moment-generating function

$$M(\lambda) = \exp\left(\mu\lambda + \tfrac{1}{2}\sigma^2\lambda^2\right)$$

into the bound above, giving

$$\mathbb{E}\left[\max_i X_i\right] \le \frac{\log n}{\lambda} + \mu + \frac{\sigma^2\lambda}{2}$$

Minimising over (positive) $\lambda$, with $\lambda = \sqrt{2\log n}/\sigma$,

$$\mathbb{E}\left[\max_i X_i\right] \le \mu + \sigma\sqrt{2\log n}$$

↩︎
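As a quick numerical sanity check of this bound (a sketch; the sample counts are arbitrary):

```python
import math
import numpy as np

# Compare the simulated expected maximum of n i.i.d. standard normal samples
# (mu=0, sigma=1) against the bound mu + sigma*sqrt(2*log(n)).
rng = np.random.default_rng(1)
for n in [10, 100, 1000, 10000]:
    est = float(np.mean([rng.standard_normal(n).max() for _ in range(2000)]))
    bound = math.sqrt(2 * math.log(n))
    print(f"n={n:>5}: simulated E[max] = {est:.3f}   bound = {bound:.3f}")

# The simulated expectation stays below the bound, and both grow like sqrt(log n).
```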
Or more than exponential if the order or configuration matters! ↩︎