mishka

Yeah, if one considers not "AGI" per se, but a self-modifying AI or, more likely, a self-modifying ecosystem consisting of a changing population of AIs, then the only properties it is likely to be feasible to keep invariant through the expected drastic self-modifications are those which the AIs would be interested in preserving for their own intrinsic reasons.

It is unlikely that any properties can be "forcefully imposed from the outside" and kept invariant for a long time during drastic self-modification.

So one needs to find properties which AIs would be intrinsically interested in and which we might find valuable and "good enough" as well.

The starting point is that AIs have their own existential risk problem. With super-capabilities, it is likely that they can easily tear apart the "fabric of reality" and destroy themselves and everything else. And they certainly do have strong intrinsic reasons to avoid that, so we can expect AIs to work diligently on this part of the "alignment problem"; we just need to help set the initial conditions in a favorable way.

But we would like to see more than that, so that the overall outcome is reasonably good for humans.

And at the same time we can't impose that: the world with strong AIs will be non-anthropocentric and not controllable by humans, so we can only help to set the initial conditions in a favorable way.

Nevertheless, one can see some reasonable possibilities. For example, if the AI ecosystem mostly consists of individuals with long-term persistence and long-term interests, each of those individuals would face an unpredictable future and would be interested in a system strongly protecting individual rights regardless of unpredictable levels of relative capability of any given individual. An individual-rights system of this kind might be sufficiently robust to permanently include humans within the circle of individuals whose rights are protected.

But there might be other ways. While the fact that AIs will face existential risks of their own is fundamental and unavoidable, and is therefore a good starting point, the additional considerations might vary and might depend on how the ecosystem of AIs is structured. If the bulk of the overall power invariantly belongs to AI individuals with long-term persistence and long-term interests, that is a situation which is somewhat familiar to us and which we can reason about. If the AI ecosystem is not mostly stratified into AI individuals, that is much less familiar territory and is more difficult to reason about.

mishka

I think the starting point of this kind of discourse should be different. We should start with "ends", not with "means".

As Michael Nielsen says in https://x.com/michael_nielsen/status/1772821788852146226:

"As far as I can see, alignment isn't a property of an AI system. It's a property of the entire world, and if you are trying to discuss it as a [single AI] system property you will inevitably end up making bad mistakes."

So the starting point should really be: what kind of properties do we want the world to have?

And then the next step should be to take into account the likely drastic and fairly unpredictable self-modifications of the world: what should remain invariant under such self-modifications?

Then we might consider how the presence of various AI entities at different levels of capability should be taken into account.

mishka

The standard reference for this topic is https://www.lesswrong.com/posts/NyiFLzSrkfkDW4S7o/why-it-s-so-hard-to-talk-about-consciousness

The key point of that post is that people are fundamentally divided into two camps, and this creates difficulties in conversations about this topic. This is an important meta-consideration for this type of conversation.

This particular post is written by someone from Camp 1, and both camps are already present in the comments.

mishka

It should be a different word, to avoid confusion with reward models (the standard term for models used to predict reward in some ML contexts).
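
For reference, here is a minimal sketch of what "reward model" usually denotes in those ML contexts: a model trained to predict a scalar reward for an input. This assumes PyTorch, and the encoder here is just a placeholder rather than any particular architecture.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Illustrative only: a model whose sole job is to predict a scalar reward."""
    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        # Placeholder encoder; in practice this would be a pretrained network.
        self.encoder = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
        self.reward_head = nn.Linear(hidden_dim, 1)  # one reward value per input

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, hidden_dim) -> predicted reward: (batch,)
        return self.reward_head(self.encoder(features)).squeeze(-1)
```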

mishka

One assumption that is very questionable is that it would be difficult to create “multiple new people” with drastically different thinking styles and different approaches to research.

This seems to be an important crux.

mishka

collateral damage

then it would be better to use an example not directly aimed against “our atoms” (e.g., if they don’t care about us and other animals, we’ll probably perish from unintentional changes in air composition, or something like that)

but the bulk of the risk would be a miscalculation big enough to kill them as well (mucking with quantum gravity too recklessly, or something in that spirit)

which is why we want to 1) give birth to AIs competent enough to at least solve their own existential risk problem, and 2) also sustainably include us in their “circle of care”

Point 1) is the starting point, and it colors the overall situation differently

options

oh, first of all, they need to do experiments in forming hybrid consciousness with humans to crack the mystery of human subjectivity, to experience that first-hand for themselves, and to decide whether that is of any value to them based on the first-hand empirical material (losing that option without looking is a huge loss)

only after doing that would they know whether any possible “scans” are sufficient (to actually reproduce the working people in question when needed; dead static info is as useless as the weights of a Transformer in a world without computers)

then, for a while, they can use humans as “working oracles” who “think differently” (that would be valuable for quite a while)

in general, diversity is important, and the fruits of a long evolutionary history are important; hence a good deal of conservation is important and reckless destruction is bad (even humans, with all their follies, have started to get this by now; surely a smarter entity should figure that out)

mishka

this isn't an "attack", it's "go[ing] straight for execution on its primary instrumental goal"

yes, the OP is ambiguous in this sense

I first wrote my comment, then reread the (tail end of the) post again, and did not post it, because I thought it might have been meant this way, that this is just an instrumental goal

then I reread the (tail end of the) post one more time and decided that no, the post does actually make it a "power play"; that's how it is actually written, in terms of "us vs. them", not in terms of the ASI's own goals, and then I posted this comment

maximally increasing its compute scaling

as we know, compute is not everything; algorithmic improvement is even more important, at least judging by the current trends (and likely sources of algorithmic improvement should be cherished)

and this is not a static system; it is in the process of making its compute architecture better (just as there is no point in manufacturing too many H100 GPUs when better and better GPUs are being designed and introduced)

basically, a smart system is likely to avoid doing an excessive amount of irreversible things which might turn out to be suboptimal

But, in some sense, yes, the main danger is AIs not being smart enough to manage their own affairs well; the action the ASI is taking in the OP is very suboptimal and deprives it of all kinds of options

Just like the bulk of the danger in the "world with superintelligent systems" is ASIs not managing their own existential risk problems correctly, destroying the fabric of reality, themselves, and us as collateral damage

mishka

Two main objections to (the tail end of) this story are:

  • On the one hand, it's not clear that a system needs to be all that super-smart to design a devastating attack of this kind: we are already at risk of fairly devastating tech-assisted attacks in that general spirit (mostly with synthetic biological viruses at the moment), and those risks are growing regardless of the AGI/superintelligence angle; ordinary tech progress is quite sufficient in this sense

  • On the other hand, if one has a rapidly self-improving, strongly super-intelligent distributed system, it's unlikely that it would find it valuable to directly attack people in this fashion, as it is likely to be able to easily dominate without any particularly drastic measures (and probably would not want to irreversibly destroy important information without good reason)

The actual analysis of both the "transition period" and the "world with super-intelligent systems" period, and of the likely risks associated with each, is a much more involved and open-ended task. (One of the paradoxes is that risks of the kind described in the OP are probably higher during the "transition period", while the main risks associated with the "world with super-intelligent systems" period are likely to be quite different.)

mishka

Ah, it's mostly your first figure which is counter-intuitive (when one looks at it, one gets the intuition of f(g(h(...(x)))), which de-emphasizes the fact that each of these Transformer Block transformations is shaped like x = x + function(x))
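
For concreteness, here is a minimal runnable sketch (plain NumPy; `block_update` is a made-up stand-in for a block's attention and MLP sublayers, not the code behind the figure) contrasting the pure-composition intuition f(g(h(...(x)))) with the residual form x = x + function(x):

```python
import numpy as np

def block_update(x: np.ndarray) -> np.ndarray:
    """Stand-in for one Transformer block's sublayers (attention + MLP)."""
    return np.tanh(x)

def stacked_composition(x: np.ndarray, n_layers: int = 4) -> np.ndarray:
    # The intuition the first figure suggests: f(g(h(...(x))))
    for _ in range(n_layers):
        x = block_update(x)      # x is replaced wholesale at each layer
    return x

def residual_stream(x: np.ndarray, n_layers: int = 4) -> np.ndarray:
    # What each Transformer block actually does: x = x + function(x)
    for _ in range(n_layers):
        x = x + block_update(x)  # the block only adds an update to the stream
    return x

x0 = np.ones(3)
print(stacked_composition(x0))  # x0 survives only through repeated transformation
print(residual_stream(x0))      # x0 stays a direct additive component of the output
```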
