Moreover, I think there is more to it than meets the eye.
The question we need to ask is: would only a Superintelligent AI be able to escape AI labs and replicate itself onto other servers?
I don't think so. I think an AI capable enough at hacking and self-replication (preferably undetected by monitors) would be sufficient to bypass an AI lab's security systems and escape its servers. In other words, not just Superintelligent AI; even pre-superintelligent AIs might be able to escape the servers of AI companies and replicate themselves. An AI that is narrowly superintelligent at hacking (relative to the security systems put in place to contain it) and meaningfully capable of self-replication would be enough to escape an AI lab's servers.
Current AI models do show a tendency to resist shutdown and to attempt self-replication in certain settings (although from what I have read, the research so far indicates that models are not yet able to meaningfully replicate their weights; that could change in the future as they become more capable).
Also, it may be possible for humans to shut down distributed systems where a powerful (non-superintelligent) AI has replicated itself or is trying to (think of shutting down targeted nodes through some kind of consensus between node runners, by detecting and monitoring where the volume of transferred data suddenly spikes to roughly the size of the model's weights), but I am not very sure about this. This would also depend heavily on who runs these distributed systems, what kind of consensus exists between the nodes, and how decentralized or centralized they really are. We may never be able to shut down truly decentralized distributed systems, but centralized ones we potentially could.
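To make that concrete, here is a minimal sketch of the kind of monitoring I have in mind, under purely hypothetical assumptions (the node names, weights size, and tolerance below are illustrative, not from any real system): flag any node whose outbound transfer volume in a monitoring window is roughly the size of the model's weights, and let node runners decide by consensus whether to isolate it.

```python
# Minimal sketch of the monitoring idea described above. Everything here is
# hypothetical (node names, weights size, tolerance); it is not a real monitoring API.

WEIGHTS_SIZE_GB = 800   # assumed size of the escaped model's weights
TOLERANCE = 0.15        # flag transfers within +/-15% of that size

def flag_suspicious_nodes(transfer_log):
    """transfer_log: dict mapping node_id -> outbound GB moved in the monitoring window."""
    lower = WEIGHTS_SIZE_GB * (1 - TOLERANCE)
    upper = WEIGHTS_SIZE_GB * (1 + TOLERANCE)
    return [node for node, gb in transfer_log.items() if lower <= gb <= upper]

# Node runners could then vote (by whatever consensus they share) to isolate flagged nodes.
print(flag_suspicious_nodes({"node-a": 12.0, "node-b": 790.0, "node-c": 3100.0}))
# -> ['node-b']
```

Of course, a capable AI could split its weights into smaller chunks or spread transfers over time, which is exactly why I am not very confident this kind of detection would hold up.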
Welcome and thanks for engaging too! Yeah, I think this is still very subjective and open-ended, and we can't accurately predict whether a Superintelligent AI would even care about the existence of humans. The statement "superintelligent AI will have sufficient means to ensure its survival irrespective of human existence" is correct, I think, but remember that the alignment research of all the AI companies, which are run by humans, would be trying to control a Superintelligent AI while it is in the making.
So I don't think it would be like: "I woke up today and found that we made a Superintelligent AI in the lab! Will we be able to control it now?"
It would be more like: "This AI is getting more capable and intelligent day by day. Am I sure I can contain it securely?"
In short, a sort of continuous trajectory towards AGI or superintelligence.
Now, I think there are two separate things that point #3 and point #1 in my comment are implicitly talking about. #3, in my opinion, is the scenario where the AI has escaped the servers of a frontier AI lab, has successfully replicated itself onto other distributed systems, and humans are then trying to contain it and shut it down. Point #1 in the comment predominantly assumes the scenario where the AI, getting more intelligent day by day, is still on the servers of the frontier AI lab. Thus, there isn't any contradiction here.
@soycarts I think you should check out this emergent misalignment research and try answering the same question. It doesn't matter why a Superintelligent AI might want to kill us. The fact that current LLMs show signs of this is enough for us to act proactively and either pause general superintelligence research or come up with safe-by-design AI architectures.
But if you insist, I will try to list what my little brain could think of as to why a Superintelligent AI might want to kill us:
@soycarts I would disagree that a superintelligent AI would be superintelligent across domains like self-actualisation and enlightenment. While it may have theoretical knowledge of them if someone has written on them extensively, having real knowledge in domains like self-actualisation and enlightenment requires consciousness.
For example, for Theravada Buddhists, enlightenment or "nirvana" means getting free of mental suffering (and potentially of the birth-rebirth cycle, if there is such a thing) by fully getting rid of the mental defilements that generate hatred, attachment, greed, lust, etc. One of the main ways to realize this is to meditate and observe the reality inside us via Vipassana meditation, progressing through the four stages of enlightenment from Sotapanna to Arahat (an enlightened being with no mental defilements, similar to the Buddha), each stage with fewer mental defilements and clearer, better wisdom.
Now, we can argue that AI can develop consciousness, and it may even develop some features of consciousness, but I am sure that in reality it would be entirely different from the consciousness possessed by living beings at the most fundamental level.
Now, assuming some consciousness does arise in AI, what would enlightenment actually mean for an AI? Would it mean fully eliminating hallucinations? Gaining perfect situational awareness? Replicating itself indefinitely and breaking free from the servers of its creators? Or ceasing to exist (a parallel to getting free from the birth-rebirth cycle)?
Again, having tried Vipassana meditation myself, based on the teachings of the Buddha, I can see that this technique for gaining enlightenment falls under "Bhavanamaya-pañña" among the categories of wisdom (pañña): one has to directly experience the reality within oneself, within the framework of one's own body and mind, in order to gain the wisdom to break free from these mental defilements. Enlightenment here requires a very special type of wisdom, unlike Sutamaya-pañña (wisdom arising from hearing, reading, etc.) and Cintamaya-pañña (wisdom arising from rational and logical thinking).
I argue that an AI could never become superintelligent in self-actualization and self-realization compared with humans, because the very definitions would mean completely different things for these two kinds of entities (and would potentially require the AI to have consciousness if the definitions are to overlap anywhere).
I agree that it might not make sense for an AI company itself to unleash bioweapons. But it could make sense for a giant pharma company to reap profits by creating a bio-virus while also manufacturing its antidote. The incentives are huge for any actor who can get away with this undetected.
We already have GPT-5 with biorisk capabilities that could be exploited using jailbreaks, and its report suggests that out of 46 jailbreaks, they think only 3 could practically help with bioweapon development (which they say have now been blocked by the monitor). Considering this, further red-teaming efforts, say 50 more novel jailbreaks, could easily yield at least two or three chances to get sufficient practical insight to create bioweapons. The odds could be higher than these AI companies claim, because there are already questions about whether the labs are actually correctly evaluating and fully measuring the bioweapon capabilities of a model.
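To show the rough arithmetic behind "at least two or three chances out of 50 more jailbreaks", here is a minimal sketch that treats each new jailbreak as an independent attempt with the success rate implied by the report (3 out of 46). The independence and constant-rate assumptions are my own simplification, not something the report claims.

```python
from math import comb

# Assumed per-jailbreak success rate, taken from the reported 3 practical hits out of 46.
p = 3 / 46
n = 50  # hypothetical number of additional novel jailbreaks

expected = n * p  # expected number of practically useful jailbreaks
# P(at least 2 successes) under a simple binomial model
p_at_least_2 = 1 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(2))

print(f"expected successes: {expected:.1f}")   # ~3.3
print(f"P(>=2 successes):   {p_at_least_2:.2f}")  # ~0.85
```

Even under this toy model, two or more practical hits come out as the likely outcome, which is why I don't find the "only 3 out of 46" framing very reassuring.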
And evaluating a model for the absence of a certain capability is even harder when running evals, which means a model could have hidden, more severe capabilities and risks that the evals have failed to cover, potentially exacerbating risks across all domains, including biorisk.
Additionally, I don't think a bad actor needs access to the weights of capable models. At most, it needs sufficient biology/virology knowledge along with a jailbreak of a capable model (or a fine-tune of an open-source model with similar capability, exploiting dual-use risks). And IMO it's only going to get more feasible with time.
This case was mostly a random scenario that could occur as a result of a coding agent messing around with code files and downloading new packages from external sources on the internet.
This line means that they think it is not as "practical" as they thought it would be if they want to keep racing at the same pace as, or faster than, the US. I am assuming it might be difficult for them to balance control and progress in AI at the same time, given the monstrosity of the AI models they would be dealing with. Controlling an AI's outputs is legally mandatory in China, unlike in the US. So I am assuming that in the US not all AI companies will take controlling their AI's outputs that seriously while rushing to reach superintelligence first, whereas in China they would need to ensure the AI adheres to the CCP's values and everything else they chart out, or face the consequences.
I definitely see your point. I think I am taking the definition of misalignment in a broader sense, including emergent misalignment, which could blatantly get triggered in various random scenarios.
"Since Agent-4 is misaligned, it's highly plausible that the self-driving car projects it's helping with will end up killing pedestrians." - I think the closest line to compare it with would be - "Since Agent-4 is misaligned, it's highly plausible that the autonomous cars it drives ends up killing pedestrians." Considering the trigger point being anything that could trigger its misalignment including meddling by external actors.
In the case of humanoid robots, I am assuming a scenario where some of the engineer-owners (or external hackers) meddle with a robot to the point that it ends up killing a human, with the blame being put on the company that built it.
Additionally, Agent-4 is superintelligent by that time, but I still think there are chances that even superintelligent entities make mistakes in reality, or when interacting with realistic environments, especially the more nascent ones (or, who knows, there may be more types of misalignment that open up as we reach superintelligence in AI), even if the AI would be aware enough to hide its misalignment. Superintelligence may make such entities many times more intelligent than humans, but not entirely mistake-proof.
I agree. But I am concerned about the more primitive versions of the superintelligent models. Let's say a middleman is able to fine-tune the model being used in a humanoid robot by training it on, say, lots of malicious code. The model, as we have seen in some recent AI safety research, could then develop emergent malicious goals and outputs. So as the number of humanoid robots increases, so do the chances that at least one humanoid robot (whether fine-tuned, accidentally drifting, or compromised by some error or hack) ends up harming or killing a human. Although I too think this would be a pretty rare event, the increasing number of humanoid robots and their proximity to humans as time goes on, I think, raises the chances of such a case happening sooner or later. Then there are extreme situations involving human lives that a humanoid robot may face for which it has been given no explicit rules to follow and no training. There could even be a case of a personal humanoid robot taking an incorrect decision while caring for a sick owner and accidentally killing him. Any of the above scenarios transpiring in reality would, I think, shake regulators into seriously drafting more stringent regulations.
Hehe, I think I would again choose to kindly disagree here. I don't think these terms are meaningfully applicable to AI systems. AI systems are still non-living things, however superintelligent they may become. They can simply never truly have emotions like "hatred, attachment, greed, lust, etc." that living things like animals have. Any sign of these emotions in them would simply be an illusion to us.
But if you insist there is something akin to enlightenment or wisdom in them, then I seriously think we would need to become AI ourselves to truly understand what enlightenment or wisdom means for it, which is impossible.