Vladimir_Nesov

This narrative (on timing) promotes building $150bn training systems in 2026-2027. AGI is nigh, therefore it makes sense to build them. If they aren't getting built, that might be the reason AGI hasn't arrived yet, so build them already (the narrative implies).

Actual knowledge that this last step of scaling is just enough to be relevant doesn't seem likely. This step of scaling seems to be beyond what happens by default, so a last push to get it done might be necessary. And the step after it won't be possible to achieve with mere narrative. While funding keeps scaling, the probability of triggering an intelligence explosion is higher; once it stops scaling, the probability (per year) goes down (if intelligence hasn't exploded by then). In this sense the narrative has a point.

I'm not making any claims about feasibility, I only dispute the claim that it's known that permanently giving up the potential for human control is an acceptable thing to do, or that making such a call (epistemic call about what is known) is reasonable in the foreseeable future. To the extent it's possible to defer this call, it should therefore be deferred (this is a normative claim, not a plan or a prediction of feasibility). If it's not possible to keep the potential for human control despite this uncertainty, then it's not possible, but that won't be because the uncertainty got resolved to the extent that it could be humanly resolved.

It was to stop treating any solution that didn't involve human control as axiomatically unacceptable, without regard to other outcomes.

The issue is that it's unclear whether it's acceptable, so it should be avoided if at all possible, pending more consideration. In principle there is more time for that than is relevant for any other concerns that don't involve the risk of losing control in a less voluntary way. The revealed preference looks the same as finding it unacceptable to give up the potential for human control, but the argument is different, so the long-term behavior implied by that argument is different. It might only take a million years to decide to give up control.

Learning from human data might have large attractors that motivate AIs to build towards better alignment, in which case prosaic alignment might find them. If those attractors are small, and there are more malign attractors in the prior that remain after learning human data, the short-term manual effort of prosaic alignment fails. So malign priors have the same mechanism of action as the effectiveness of prosaic alignment: the question is how learning on human data ends up being expressed in the models, and what happens after the AIs built from them are given more time to reflect.

Managing to scale RL too early can make this irrelevant, enabling sufficiently competent paperclip maximization without dominant influence from either malign priors or from beneficial attractors in human data. It's unclear if o1/o3 are pointing in this direction yet; so far they might just be getting better at eliciting human System 2 capabilities from base models, rather than being creative at finding novel ways of effective problem solving.

But humans have never had much control.

Not yet. There have been barely thousands of years of civilization, and there are 1e34-1e100 years more to figure it out.

There is a Feb 2024 paper that predicts high compute multipliers from using a larger number of finer-grained experts in MoE models, optimally about 64 experts activated per token at 1e24-1e25 FLOPs, whereas MoE models with known architecture usually have 2 experts activated per token. DeepSeek-V3 has 8 routed experts activated per token, a step in that direction.
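For concreteness, the knob in question is the router's top-k, the number of routed experts each token activates. Below is a minimal sketch of token-choice top-k routing in plain PyTorch, with illustrative sizes; it is not DeepSeek-V3's actual implementation (which among other things also uses shared experts). Here `top_k` would be 2 in most published MoE architectures, 8 in DeepSeek-V3, and around 64 at the paper's predicted optimum.

```python
# Minimal token-choice MoE routing sketch (illustrative, not DeepSeek-V3's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=128, n_experts=64, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Finer-grained experts: many small FFNs instead of a few large ones.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                        # x: (n_tokens, d_model)
        scores = self.router(x)                  # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e         # tokens routed to expert e in this slot
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out
```

The total parameter count grows with `n_experts` while per-token compute grows with `top_k`, which is why activating more, smaller experts per token can buy a compute multiplier without changing FLOPs much.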

On the other hand, things like this should've already been tested at the leading labs, so the chances that it's a new idea being brought to attention there seem slim. Runners-up like xAI and Meta might find this more useful, if finer-grained experts are indeed the reason for DeepSeek-V3's quality, rather than extremely well-done post-training or even pretraining dataset construction.

Its pretraining recipe is now public, so it could get reproduced with much more compute soon. It might also suggest that scaling of pretraining has already plateaued, that leading labs have architectures that are at least as good as DeepSeek-V3, pump 20-60 times more compute into them, and get something only marginally better.

There is water, H2O, drinking water, liquid, flood. Meanings can abstract away some details of a concrete thing from the real world, or add connotations that specialize it into a particular role. This is very useful in clear communication. The problem is sloppy or sneaky equivocation between different meanings, not the content of meanings getting to involve emotions, connotations, things not found in the real world, or combining them with concrete real world things into compound meanings.

best-of-n sampling which solved ARC-AGI

The low-resource configuration of o3 that only aggregates 6 traces already improved a lot on the results of previous contenders; the plot of dependence on problem size shows this very clearly. Is there a reason to suspect that the aggregation is best-of-n rather than consensus (picking the most popular answer)? Their outcome reward model might have systematic errors worse than those of the generative model, since ground truth is in the verifiers anyway.
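To make the distinction concrete, here is a toy sketch; the sampling, answer-extraction, and scoring callables are hypothetical stand-ins, not the actual o3 pipeline. Best-of-n trusts an outcome reward model to pick the single highest-scoring trace, while consensus ignores scores and takes the most common final answer among the n traces, so a reward model with systematic errors only hurts the former.

```python
# Toy contrast between best-of-n and consensus aggregation over n sampled traces.
# sample_trace, extract_answer, and score are hypothetical stand-ins.
from collections import Counter

def best_of_n(problem, sample_trace, extract_answer, score, n=6):
    """Answer from the single trace the outcome reward model scores highest."""
    traces = [sample_trace(problem) for _ in range(n)]
    best = max(traces, key=score)    # systematic reward-model errors bias this choice
    return extract_answer(best)

def consensus(problem, sample_trace, extract_answer, n=6):
    """Most popular final answer among the n traces (majority vote, no reward model)."""
    answers = [extract_answer(sample_trace(problem)) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```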

There are many things that can't be done at all right now. Some of them can become possible through scaling, and it's unclear if it's scaling of pretraining or scaling of test-time compute that gets them first, at any price, because scaling is not just the amount of resources, but also the tech being ready to apply them. In this sense there is some equivalence.
