Vladimir_Nesov

There are two kinds of relevant hypothetical innovations: those that enable chatbot-led autonomous research, and those that enable superintelligence. It's plausible that there is no need for (more of) the former, so that mere scaling through human efforts will lead to such chatbots in a few years regardless. (I think it's essentially inevitable that there is currently enough compute that, with appropriate innovations, we can get such autonomous human-scale-genius chatbots, but it's unclear if these innovations are necessary or easy to discover.) If autonomous chatbots are still anything like current LLMs, they are very fast compared to humans, so they would quickly discover the remaining major innovations of both kinds.

In principle, even if innovations that enable superintelligence (at a scale feasible with human efforts in a few years) don't exist at all, extremely fast autonomous research and engineering still lead to superintelligence, because they greatly accelerate scaling. Physical infrastructure might start scaling really fast using pathways like macroscopic biotech, even if Drexlerian nanotech is too hard without superintelligence or impossible in principle. Drosophila biomass doubles every 2 days; small things can assemble into large things.
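
As a back-of-the-envelope illustration of how fast that pathway could be (the starting and target masses below are my own made-up numbers; only the 2-day doubling time comes from the claim above):

```python
import math

doubling_time_days = 2      # the Drosophila figure mentioned above
start_mass_kg = 1.0         # assumed seed mass, purely illustrative
target_mass_kg = 1e9        # assumed target (a million tonnes), purely illustrative

doublings = math.log2(target_mass_kg / start_mass_kg)
print(f"{doublings:.1f} doublings, about {doublings * doubling_time_days:.0f} days")
# -> 29.9 doublings, about 60 days
```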

See "Zero Sum" is a misnomer, shifting and rescaling of utility functions breaks formulations that simply ask to take a sum of payoffs, but we can rescue the concept to mean that all outcomes/strategies of the game are Pareto efficient.

"Positive sum" seems to be about Kaldor-Hicks improvement, an outcome that in principle admits a subsequent redistribution of resources that would turn the outcome into a Pareto improvement (over some original situation of "not playing"), but there is no commitment or possibly even practical feasibility to actually perform the redistribution. This hypothetical redistribution step takes care of comparing utilities of different players. A whole game/interaction/project would then be "positive-sum" if each outcome/strategy is equivalent via a redistribution to some hypothetical "outcome" that is a Pareto improvement over the status quo of not engaging in the game/interaction/project. In actuality, without the hypothetical redistribution step, some players can end up worse off.

a story that's in conflict with itself

The story involves phase changes. Just scaling is what's likely to be available to human developers in the short term (a few years), but it's not enough for superintelligence. Autonomous agency secures funding for a bit more scaling. If this proves sufficient to get smart autonomous chatbots, they then provide the speed to very quickly reach the more elusive AI research needed for superintelligence.

It's not a little speed, it's a lot of speed: a serial speedup of about 100x, plus running in parallel. This is not as visible today, because current chatbots are not capable of doing useful work with serial depth, so the serial speedup is not in practice distinct from throughput and cost. But with actually useful chatbots it turns decades into years: software and theory from the distant future become quickly available, and non-software projects get designed in perfect detail faster than they can be assembled.

it seems scaffolding tricks haven't really improved the baseline performance of models that much. Overwhelmingly, the capability comes down to whether the RLHFed base model can do the task.

That's what I'm also saying above (in case you are stating what you see as a point of disagreement). This is consistent with scaling-only short timeline expectations. The crux for this model is current chatbots being already close to autonomous agency and to becoming barely smart enough to help with AI research. Not them directly reaching superintelligence or having any more room for scaling.

An obligation to answer makes questions/criticism cause damage unrelated to their content, so they are marginally more likely to be withheld or suppressed. If they won't be withheld or suppressed, as in a debate, they still act as motivation to avoid getting into that situation in the first place. The cost has a use in signaling that you can easily provide answers, but it's still a cost, and a higher one for those who can't easily provide answers or don't value conveying that particular signal.

With scale, there is visible improvement in the difficulty of novel-to-chatbot ideas/details that it's possible to explain to it in-context, things like issues with the code it's writing. If a chatbot is below some threshold of situational awareness of a task, no scaffolding can keep it on track, but for a better chatbot trivial scaffolding might suffice. Many people can't google for a solution to a technical issue; the difference between them and those who can is often subtle.
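
By "trivial scaffolding" I mean something as simple as the loop below (all names are my own placeholders, not a real agent framework); whether it keeps the chatbot on track depends almost entirely on the model's situational awareness, not on the loop itself:

```python
from typing import Callable

def trivial_scaffold(call_model: Callable[[str], str],
                     run_step: Callable[[str], str],
                     task: str,
                     max_steps: int = 10) -> list[str]:
    """Loop: ask the model for an action, execute it, feed the result back."""
    history: list[str] = []
    for _ in range(max_steps):
        prompt = f"Task: {task}\nHistory so far: {history}\nNext action:"
        action = call_model(prompt)
        result = run_step(action)          # e.g. run code, capture an error
        history.append(f"{action} -> {result}")
        if "DONE" in result:
            break
    return history

# Usage with stub functions standing in for a real model and environment:
log = trivial_scaffold(call_model=lambda p: "fix the import",
                       run_step=lambda a: "DONE",
                       task="make the failing test pass")
print(log)
```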

So a modest amount of scaling alone seems plausibly sufficient for making chatbots that can do whole jobs almost autonomously. If this works, 1-2 OOMs more of scaling becomes both economically feasible and more likely to be worthwhile. LLMs think much faster than humans, so they only need to be barely smart enough to help with clearing those remaining roadblocks.

Questions are not a problem; an obligation to answer is a problem. Criticism is not a problem; implicit blame for ignoring it is a problem. Availability of many questions makes it easier to find one you want to answer.

Incidentally, understanding/verifying/getting-used-to/starting-to-track the answer (for those who care about the question) is often harder than writing an answer (for those whose mind was already prepared to generate it).

See the minimality principle:

the least dangerous plan is not the plan that seems to contain the fewest material actions that seem risky in a conventional sense, but rather the plan that requires the least dangerous cognition from the AGI executing it

If the transcoders are used to predict next tokens, they may lose interpretability

Possibly. But there is no optimization pressure from pre-training on the relationship between MLPs and transcoders. The MLPs are the thing that pre-training optimizes (as the "full-precision" master model), while transcoders only need to be maintained to remain in sync with the MLPs, whatever they are (according to the same local objective as before, which doesn't care at all about token prediction). The search is for MLPs such that their transcoders are good predictors, not directly for transcoders that are good predictors.
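
To spell out the setup I have in mind (a PyTorch sketch of my own; in particular, the straight-through substitution is my assumption about how the token-prediction gradient would be routed to the MLP, and all sizes are made up):

```python
import torch
import torch.nn as nn

d_model, d_hidden, d_tc = 64, 256, 512   # illustrative sizes

class MLP(nn.Module):
    """The 'full-precision' master component that pre-training optimizes."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                                 nn.Linear(d_hidden, d_model))
    def forward(self, x):
        return self.net(x)

class Transcoder(nn.Module):
    """Sparse wide approximation of the MLP's input-to-output map."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(d_model, d_tc)
        self.dec = nn.Linear(d_tc, d_model)
    def forward(self, x):
        acts = torch.relu(self.enc(x))
        return self.dec(acts), acts

mlp, tc = MLP(), Transcoder()
opt_mlp = torch.optim.Adam(mlp.parameters(), lr=1e-3)
opt_tc = torch.optim.Adam(tc.parameters(), lr=1e-3)

def substituted_forward(x):
    """Use the transcoder's output in the forward pass, but route the
    downstream gradient to the MLP (straight-through substitution)."""
    mlp_out = mlp(x)
    tc_out, _ = tc(x)
    return mlp_out + (tc_out - mlp_out).detach()

def train_step(x, downstream_loss_fn, sparsity_coeff=1e-3):
    # 1) Pre-training step: optimize the MLP so that *its transcoder's*
    #    output predicts tokens well; transcoder params get no gradient here.
    loss_lm = downstream_loss_fn(substituted_forward(x))
    opt_mlp.zero_grad()
    loss_lm.backward()
    opt_mlp.step()

    # 2) Local step: keep the transcoder in sync with the current MLP using
    #    only reconstruction + sparsity, with no token-prediction signal.
    with torch.no_grad():
        target = mlp(x)
    tc_out, acts = tc(x)
    loss_tc = (tc_out - target).pow(2).mean() + sparsity_coeff * acts.abs().mean()
    opt_tc.zero_grad()
    loss_tc.backward()
    opt_tc.step()
    return loss_lm.item(), loss_tc.item()

# Toy usage: a stand-in "language modeling" loss on random activations.
x = torch.randn(32, d_model)
print(train_step(x, downstream_loss_fn=lambda y: y.pow(2).mean()))
```

The last sentence above shows up in step 1: the gradient from the token-prediction loss only ever touches the MLP's weights, so pre-training searches for MLPs whose locally fit transcoders happen to predict well.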

Substituting multiple transcoders at once is possible, but degrades model performance a lot compared to single-transcoder substitutions.

Unclear, given the extreme quantization results: there too, post-training replacement would degrade model performance a lot, yet quantization-aware pre-training somehow doesn't.

We don't really know how transcoders (or SAEs, to the best of my knowledge) behave when they're being trained to imitate a model component that's still updating

This seems to be the main technical hurdle for doing the experiment: updating transcoders both efficiently and correctly as the underlying MLPs gradually change. (I'm guessing some discontinuous jumps in the choice of transcoders might be OK.)
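
For concreteness, the kind of schedule I have in mind (a sketch; the step counts and function names are arbitrary placeholders, not a tested recipe):

```python
def interleaved_training(model_step, transcoder_step, refit_transcoder,
                         total_steps=100_000,
                         tc_steps_per_model_step=4,
                         refit_every=10_000):
    """model_step / transcoder_step / refit_transcoder stand in for the actual
    update functions; only the interleaving schedule is being illustrated."""
    for step in range(total_steps):
        model_step(step)                        # pre-training update of the MLPs
        for _ in range(tc_steps_per_model_step):
            transcoder_step(step)               # local sync to the current MLPs
        if step and step % refit_every == 0:
            refit_transcoder(step)              # a discontinuous jump, maybe OK

# Usage with no-op stubs:
interleaved_training(lambda s: None, lambda s: None, lambda s: None,
                     total_steps=10, refit_every=5)
```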
