Nesov notes that making use of bigger models (i.e. 4T active parameters) is heavily bottlenecked on the HBM on inference chips, as is doing RL on bigger models. He expects it won't be possible to do the next huge pretraining jump (to ~30T active) until ~2029.
HBM per chip doesn't matter; it's HBM per scale-up world that does. A scale-up world is a collection of chips with sufficiently good networking between them that can be used to set up inference for large models with good utilization of the chips. For H100/H200/B200, a scale-up world is 8 chips (1 server; there are typically 4 servers per rack), for GB200/GB300 NVL72 it's 72 chips (1 rack, 140 kW), and for Rubin Ultra NVL576 it's 144 chips (also 1 rack, but 600 kW).
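To put rough numbers on it, here's a quick sketch of aggregate HBM per scale-up world; the per-chip HBM figures are my assumptions from publicly quoted specs, and the Rubin Ultra figure in particular is speculative:

```python
# Sketch: aggregate HBM per scale-up world. Per-chip capacities are assumed
# (roughly the publicly quoted figures); the Rubin Ultra per-package figure
# in particular is speculative.
scale_up_worlds = {
    # name: (chips per scale-up world, assumed HBM per chip in GB)
    "H100 8-GPU server":       (8,   80),
    "H200 8-GPU server":       (8,   141),
    "B200 8-GPU server":       (8,   192),
    "GB200 NVL72 rack":        (72,  192),
    "GB300 NVL72 rack":        (72,  288),
    "Rubin Ultra NVL576 rack": (144, 1024),  # speculative
}

for name, (chips, hbm_gb) in scale_up_worlds.items():
    print(f"{name:25} ~{chips * hbm_gb / 1000:6.1f} TB HBM per scale-up world")
```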
use of bigger models (i.e. 4T active parameters) is heavily bottlenecked on the HBM
Models don't need to fit into a single scale-up world (using a few should be fine); also, the KV cache wants at least as much memory as the model. So you are only in trouble once the model is much larger than a scale-up world, in which case you'll need so many scale-up worlds that you'll effectively be using the scale-out network for scaling up, which will likely degrade performance and make inference more expensive (compared to the magical hypothetical with larger scale-up worlds, which aren't necessarily available, so this might still be the way to go). And this is about total params, not active params. Though active params indirectly determine the size of the KV cache per user.
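As a toy illustration of when you're actually in trouble (all numbers made up), you can count how many scale-up worlds a model needs once the KV cache is budgeted to be at least as large as the weights; note that it's total params that enter the estimate:

```python
import math

# Toy estimate (made-up numbers): how many scale-up worlds does a model need,
# once the KV cache is budgeted to be at least as large as the weights?
def scale_up_worlds_needed(total_params_trillions: float,
                           bytes_per_param: float,
                           kv_to_weights_ratio: float,
                           hbm_per_world_tb: float) -> int:
    weights_tb = total_params_trillions * bytes_per_param  # 1T params at 1 byte/param ~ 1 TB
    kv_tb = weights_tb * kv_to_weights_ratio               # KV cache budget
    return math.ceil((weights_tb + kv_tb) / hbm_per_world_tb)

# e.g. a 4T-total-param model at 8-bit weights with an equal KV budget:
print(scale_up_worlds_needed(4, 1.0, 1.0, 0.64))   # ~13 H100 servers
print(scale_up_worlds_needed(4, 1.0, 1.0, 13.8))   # 1 GB200 NVL72 rack
```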
He expects it won't be possible to do the next huge pretraining jump (to ~30T active) until ~2029.
Nvidia's GPUs probably won't be able to efficiently serve inference for models with 30T total params (rather than active) until about 2029 (maybe late 2028), when enough of Rubin Ultra NVL576 is built. But gigawatts of Ironwood TPUs are being built in 2026, including for Anthropic, and these TPUs will be able to serve inference for such models (for large user bases) in late 2026 to early 2027.
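Back-of-envelope for the 30T case (8-bit weights plus a comparable KV budget; the per-rack and per-pod HBM totals below are my assumptions, the Rubin Ultra and Ironwood figures especially):

```python
# Back-of-envelope for a 30T-total-param model: ~30 TB of 8-bit weights plus a
# comparable KV-cache budget, ~60 TB in all. The HBM totals per scale-up world
# below are assumptions, not checked spec sheets.
need_tb = 30 + 30
for name, hbm_tb in [
    ("GB300 NVL72 rack (~21 TB assumed)",                    21),
    ("Rubin Ultra NVL576 rack (~147 TB speculative)",        147),
    ("Ironwood TPU pod, 9,216 chips (~1,770 TB assumed)",   1770),
]:
    print(f"{name}: {need_tb / hbm_tb:.2f}x one scale-up world")
```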
The general principle is that sufficiently smart people by default win most competitions among 100 randos that they care to enter (given sufficient training, when that's at all relevant).
To be "not-insane", you don't need rationality in this narrow sense, in most circumstances. You don't need to seek out better methods for getting things right, you just need some good-enough methods. A bit of epistemic luck could easily get you there, no need for rationality.
So the issue of behaving/thinking in an "insane" way is not centrally about lack of rationality; rationality or irrationality is not particularly relevant to the issue. Rationality would help, but there are many more things that would also help, some of them much more practical for any given object level issue. And once the issue is resolved, it doesn't follow that the attitude of aspiring to rationality was attained, or that any further seeking out of better methods/processes will be taking place.
Rationality is not correctness, not truth or effectiveness; it's narrower, a disposition towards better methods/processes that help with attaining truth or effectiveness. Keeping the intended meaning narrow when manipulating a vague concept helps with developing it further; inflating the meaning to cover ever more possibilities makes a word somewhat useless, and accessing the concept becomes less convenient.
If Omega tells you what you'll do, you can still do whatever. If you do something different, this by construction refutes the existence of the current situation where Omega made a correct prediction and communicated it correctly (your decision can determine whether the current situation is actual or counterfactual). You are in no way constrained by the existence of a prediction, or by having observed what this prediction is. Instead, it's Omega that is constrained by what your behavior is; it must obey your actions in its predictions about them. See also Transparent Newcomb's Problem.
This is clearer when you think of yourself (or of an agent) as an abstract computation rather than a physical thing, a process formally specified by a program rather than a physical computer running it. You can't change what an abstract computation does by damaging physical computers, so in any confrontation between unbounded authority and an abstract computation, the abstract computation has the final word. You can only convince an abstract computation to behave in some way according to its own nature and algorithm, and external constructions (such as Omega being omniscient, or the thought experiment being set up in a certain way) aren't going to be universally compelling to abstract algorithms.
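Here's a toy model of the Transparent Newcomb point (my own sketch, with the usual illustrative $1,000,000 / $1,000 payoffs), where Omega's prediction is computed from the agent's policy rather than the other way around:

```python
# Toy Transparent Newcomb: Omega's prediction is a function of the agent's
# policy, so the policy, not the prediction, is upstream.

def omega_fills_box(policy) -> bool:
    # Omega fills the big box iff it predicts the agent one-boxes on seeing it full.
    return policy(box_visibly_full=True) == "one-box"

def payoff(policy) -> int:
    full = omega_fills_box(policy)
    action = policy(box_visibly_full=full)
    big = 1_000_000 if full else 0
    return big if action == "one-box" else big + 1_000

one_boxer = lambda box_visibly_full: "one-box"
two_boxer = lambda box_visibly_full: "two-box"
# "Defiant" agent: would two-box if shown a full box -- so it never sees one.
defiant   = lambda box_visibly_full: "two-box" if box_visibly_full else "one-box"

for name, p in [("one-boxer", one_boxer), ("two-boxer", two_boxer), ("defiant", defiant)]:
    print(f"{name}: ${payoff(p):,}")
```

The "defiant" policy never actually faces a full box: deviating doesn't beat the prediction, it just makes that situation counterfactual.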
When you go through a textbook, there are confusions you can notice but not yet immediately resolve, and these could plausibly become RLVR tasks. To choose and formulate some puzzle as an RLVR task, the AI would need to already understand the context of that puzzle, but then training on that task makes it ready to understand more. Setting priorities for learning seems like a general skill that adapts to various situations as you learn to understand them better. As with human learning, the ordering from more familiar lessons to deeper expertise would happen naturally for AI instances as they engage in active learning about their situations.
I think the schleppy path of "learn skills by intentionally training on those specific skills" will be the main way AIs get better in the next few years.
So my point is that automating just this thing might be sufficient, and the perception of its schleppiness is exactly the claim of its generalizability. You need expertise sufficient to choose and formulate the puzzles, not yet sufficient to solve them, and this generation-verification gap keeps moving the frontier of understanding forward, step by step, but potentially indefinitely.
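A hypothetical sketch of what capturing a noticed confusion as a verifiable task could look like (the puzzle and all names here are made up for illustration): the formulator only needs enough understanding to state the puzzle and write a cheap checker, not to solve it.

```python
# Hypothetical sketch: capture a noticed confusion as a verifiable (RLVR-style)
# task. The point is that writing the checker is easier than producing the
# answer (the generation-verification gap).
import ast
from dataclasses import dataclass
from typing import Callable

import numpy as np

@dataclass
class VerifiableTask:
    prompt: str                     # the puzzle, stated from current understanding
    verify: Callable[[str], bool]   # cheap programmatic check of a proposed answer

def task_from_confusion() -> VerifiableTask:
    # e.g. while working through a linear algebra text: "what is this matrix's inverse?"
    A = np.array([[2.0, 1.0], [1.0, 1.0]])
    def verify(answer: str) -> bool:
        B = np.array(ast.literal_eval(answer))  # candidate inverse, given as a Python list
        return B.shape == A.shape and np.allclose(A @ B, np.eye(len(A)))
    return VerifiableTask(prompt=f"Give the inverse of {A.tolist()} as a Python list.",
                          verify=verify)

task = task_from_confusion()
print(task.verify("[[1.0, -1.0], [-1.0, 2.0]]"))  # True: checking beats solving
```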
AI danger is not about AI, it's about governance. A sane civilization would be able to robustly defer and then navigate AI danger when it's ready. AI is destabilizing, and while aligned AI (in a broad sense) is potentially a building block for a competent/aligned civilization (including human civilization), that's only if it's shaped/deployed in a competent/aligned way. Uploads are destabilizing in a way similar to AI (since they can be copied and scaled), even though they by construction ensure some baseline of alignment.
Intelligence amplification for biological humans (who can't be copied) seems like the only straightforward concrete plan that's not inherently destabilizing. But without highly speculative too-fast methods it needs AI danger to be deferred for a very long time, with a ban/pause that achieves escape velocity (getting stronger rather than weaker over time, for example by heavily restricting semiconductor manufacturing capabilities). This way, there is hope for a civilization that eventually gets sufficiently competent to navigate AI danger, but the premise of a civilization sufficiently competent to defer AI danger indefinitely is damning.
if your effort to constrain your future self on day one does fail, I don't think there's a reasonable decision theory that would argue you should reject the money anyway
That's one of the things motivating UDT. On day two, you still ask what global policy you should follow (one that in particular encompasses your actions in the past, and in the counterfactuals relative to what you actually observe in the current situation). Then you see where/when you actually are and what you actually observe, and enact what the best policy says to do in the current situation. You don't constrain yourself on day one, but still enact the global policy on day two.
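As a toy worked example (a counterfactual-mugging-flavored setup with made-up numbers, not the exact scenario from this exchange): you score whole policies by their ex-ante expected value, then on day two simply look up what the winning policy prescribes for what you actually observe.

```python
# Toy UDT-style choice: rank global policies by ex-ante expected value, then
# on "day two" enact what the best policy prescribes for the situation you
# actually observe. Numbers and scenario are illustrative.
def expected_value(pay_when_asked: bool) -> float:
    # Fair coin. Heads: you are asked for $100. Tails: a predictor gives you
    # $10,000 only if it predicts you'd have paid on heads.
    ev_heads = -100 if pay_when_asked else 0
    ev_tails = 10_000 if pay_when_asked else 0
    return 0.5 * ev_heads + 0.5 * ev_tails

policies = {"pay when asked": True, "refuse when asked": False}
best = max(policies, key=lambda name: expected_value(policies[name]))
print(best, expected_value(policies[best]))  # the paying policy wins ex ante

# Day two, you observe "asked to pay": you enact policies[best], i.e. you pay,
# without ever having needed to constrain yourself on day one.
```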
I think coordination problems are a lot like that. They reward you for adopting preferences genuinely at odds with those you may have later on.
Adopting preferences is a lot like enacting a policy, but when enacting a policy you don't need to adopt preferences: a policy is something external, an algorithmic action (instead of choosing Cooperate, you choose to follow some algorithm that decides what to do, even if that algorithm gets no further input). Contracts in the usual sense act like that, and assurance contracts are an example of explicitly establishing coordination. You can judge an algorithmic action like you judge an explicit action, but there are more algorithmic actions than there are explicit actions, and algorithmic actions taken by you and your opponents can themselves reason about each other, which enables coordination.
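A minimal sketch of an algorithmic action, in the simplest "cooperate with an exact copy" variant (a toy, not a general treatment of how algorithms reason about each other): instead of submitting Cooperate or Defect directly, each side submits an algorithm that gets to look at the other's algorithm.

```python
# Toy "algorithmic action": submit an algorithm that inspects the opponent's
# algorithm, rather than submitting Cooperate/Defect directly. This clique-bot
# variant only recognizes exact copies of itself.
import inspect

def clique_bot(opponent_source: str) -> str:
    # Cooperate iff the opponent is running this same algorithm.
    return "C" if opponent_source == inspect.getsource(clique_bot) else "D"

def defect_bot(opponent_source: str) -> str:
    return "D"

src = {f.__name__: inspect.getsource(f) for f in (clique_bot, defect_bot)}
print(clique_bot(src["clique_bot"]), clique_bot(src["defect_bot"]))  # C D
```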
AI currently lacks some crucial faculties, most obviously continual learning and higher sample efficiency (possibly merely as a measure of how well continual learning works). And these things plausibly fall under the umbrella of the more schleppy kinds of automated AI R&D, so that if the AIs learn narrow skills such as setting up appropriate RL environments (capturing lessons/puzzles from personal experiences of AI instances) and debugging training issues, that would effectively create these crucial faculties without actually needing to make deeper algorithmic progress. Like human computers in the 17th century, these AIs might end up doing manually what a better algorithm could do at a much lower level, much more efficiently. But it would still be much more effective than when it doesn't happen at all, and AI labor scales well.
This demands that others agree with you, for reasons that shouldn't compel them to agree with you (in this sentence, rhetoric alone). They don't agree; that's the current situation. Appealing to "in reality we are all sitting in the same boat" and "you in fact have as much reason as me to try to work towards a solution" should inform them that you are ignoring their point of view on what facts hold in reality, which breaks the conversation.
It would be productive to take claims like this as premises and discuss the consequences (to distinguish x-risk-in-the-mind from x-risk-in-reality). But taking disbelieved premises seriously and running with them (for non-technical topics) is not a widespread skill you can expect to often encounter in the wild, unless perhaps you've cultivated it in your acquaintances.