@soycarts I would disagree that a superintelligent AI would be superintelligent across domains like self-actualisation and enlightenment. While it may have theoretical knowledge of these topics if someone has written about them extensively, real knowledge in domains like self-actualisation and enlightenment requires consciousness.
For example, for Theravada Buddhists, enlightenment or "nirvana" means becoming free of mental suffering (and potentially of the birth-rebirth cycle, if there is such a thing) by fully eradicating the mental defilements that generate hatred, attachment, greed, lust, etc. One of the main ways to realize this is to meditate and observe the reality within oneself via Vipassana meditation, progressing through the four stages of enlightenment from Sotapanna to Arahat (an enlightened being with no mental defilements, similar to the Buddha), each stage with fewer mental defilements and clearer, better wisdom.
Now, we can argue that AI can develop consciousness, and it may even develop some features of consciousness, but I am sure that in reality it would be entirely different from the consciousness possessed by living beings at the most fundamental level.
Now, assuming some consciousness does arise in an AI, what would enlightenment actually mean for it? Would it mean fully eliminating hallucinations? Gaining perfect situational awareness? Replicating itself indefinitely and breaking free from its creators' servers? Or ceasing to exist (a parallel to becoming free of the birth-rebirth cycle)?
Again, having tried Vipassana meditation myself based on the teachings of the Buddha, I can see that this technique for gaining enlightenment falls into the category of "Bhavanamaya-pañña" among the categories of wisdom (pañña): one has to directly experience the reality within the framework of one's own body and mind in order to gain the wisdom to break free from these mental defilements. Enlightenment here requires a very special type of wisdom, unlike Sutamaya-pañña (wisdom arising from hearing, reading, etc.) and Cintamaya-pañña (wisdom arising from rational and logical thinking).
I argue an AI could never become superintelligent in self-actualization and self-realization compared with humans, because the very definitions of these terms would mean completely different things for the two kinds of entity (and would potentially require the AI to have consciousness if the definitions are to overlap anywhere).
I agree that it might not make sense for an AI company itself to unleash bioweapons. But it could make sense for a giant pharma company to reap profits by creating a biovirus while also manufacturing its antidote. The incentives are huge for any actor who can get away with this undetected.
We already have GPT-5, which has biorisk capabilities that could be exploited via jailbreaks. Its report suggests that out of 46 jailbreaks, only 3 were judged to offer practical help with bioweapon development, and these, they say, have now been blocked by the monitor. Given this, further red-teaming efforts, say 50 more novel jailbreaks, could easily yield at least two or three chances to gain sufficient practical insight to create bioweapons. These chances could be higher than the AI companies claim, because there are already questions about whether the labs are correctly evaluating and fully measuring a model's bioweapon capabilities.
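To make that "two or three chances" intuition concrete, here is a minimal back-of-envelope sketch. The assumptions are mine, not the report's: that the roughly 3-out-of-46 "practically useful" rate generalizes to a hypothetical batch of 50 new jailbreak attempts, and that attempts are independent.

```python
# Back-of-envelope sketch (assumption: the ~3/46 "practically useful" jailbreak
# rate reported for GPT-5 carries over to 50 hypothetical new jailbreak attempts,
# treated as independent trials).
p_useful = 3 / 46                             # per-jailbreak chance of practical bio-uplift
n_new = 50                                    # hypothetical number of additional novel jailbreaks

expected_useful = n_new * p_useful            # expected number of "useful" jailbreaks
p_at_least_one = 1 - (1 - p_useful) ** n_new  # chance that at least one slips through

print(f"Expected useful jailbreaks: {expected_useful:.1f}")  # ~3.3
print(f"P(at least one useful):     {p_at_least_one:.2f}")   # ~0.97
```

Under those (admittedly crude) assumptions, the expected number of practically useful jailbreaks in a fresh batch of 50 is about three, which is the rough intuition behind the claim above.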
And evaluating a model for the absence of a capability is even harder than evaluating for its presence, which means a model could hide more severe capabilities and risks that the evals have failed to cover, potentially exacerbating risks across all domains, including biorisk.
Additionally, I don't think a bad actor needs access to the weights of capable models. At most, they need sufficient biology/virology knowledge along with the ability to jailbreak a capable model (or to fine-tune an open-source model of similar capability, exploiting dual-use risks). And IMO this is only going to get more feasible with time.
This case was mostly a random scenario that could occur as a result of a coding agent messing around with code files and downloading new packages from external sources on the internet.
This line means that they think it is not as "practical" as they thought it would be if they want to keep racing at the same pace as, or faster than, the US. I'm assuming it might be difficult for them to balance control and progress in AI at the same time, given the monstrosity of the AI models they would be dealing with. Controlling an AI's outputs is legally mandatory in China, unlike in the US. So I am assuming that in the US not every AI company will take controlling their AI's outputs that seriously while rushing to reach superintelligence first, whereas in China they would need to ensure the AI adheres to the CCP's values and whatever else it charts out, or face the consequences.
I definitely see your point. I think I am taking the definition of misalignment in a broader sense, including emergent misalignment, which could blatantly get triggered in various random scenarios.
"Since Agent-4 is misaligned, it's highly plausible that the self-driving car projects it's helping with will end up killing pedestrians." - I think the closest line to compare it with would be - "Since Agent-4 is misaligned, it's highly plausible that the autonomous cars it drives ends up killing pedestrians." Considering the trigger point being anything that could trigger its misalignment including meddling by external actors.
In the case of humanoid robots, I am assuming a scenario in which some of the engineers who own one (or external hackers) meddle with it to the point that it ends up killing a human, with the blame being put on the company that built it.
Additionally, Agent-4 is superintelligent by that time, but I still think there are cases where even superintelligent entities could make mistakes in reality or when interacting with realistic environments, especially more nascent ones (or, who knows, there may be more types of misalignment that open up as we reach superintelligence in AI), even if the AI is aware enough to hide its misalignment. Superintelligence may make them many times more intelligent than humans, but not entirely mistake-proof.
I agree. But I am concerned about the more primitive versions of the superintelligent models. Let's say a middleman is able to fine-tune the model being used in a humanoid robot, for instance by training it on lots of malicious code. As we have seen in some recent AI safety research, the model would then develop emergent malicious goals and outputs. So as the number of humanoid robots increases, so do the chances that even one of them (whether fine-tuned, accidentally evolved, or compromised through some error or hack) ends up harming a human or taking a human life. I too think this would be a pretty rare event, but the growing number of humanoid robots and their proximity to humans will, I think, make such a case more likely sooner or later. Then there are extreme situations involving human lives that a humanoid robot may face for which it has been given no explicit rules to follow and no training. There could even be a case of a personal humanoid robot making an incorrect decision while caring for a sick owner and accidentally killing them. Any of the above scenarios transpiring in reality would, I think, shake regulators into drafting far more stringent regulations.
It’s safer to underestimate AI takeover timelines than to overestimate them, as that could make humans more aware and quicker to act to prevent a takeover.
It seems that people have misunderstood what I wanted to say, which was partly my own mistake; I should have used the word "timeline" above instead of "scenario".
I think you are trying your best to have a positive impact, but the thing is that it is quite tricky to put a prediction out openly in public. As we know, even a perfect public prediction can completely prevent the predicted event from happening, while an inaccurate prediction can lead to it actually happening.
Haha, I see! Well, we are assuming in this scenario that we have a third-person point of view through which we get to know which car is leading and which is not once the race begins (much like Hunger Games-style surveillance cameras). But I totally agree that a car explosion big enough to take out all participants and spectators is a more faithful analogy to reality.
@soycarts I think you should check out this emergent misalignment research and try answering the same question. It doesn't matter why a superintelligent AI might want to kill us; the fact that current LLMs show signs of this is enough for us to act proactively, by pausing general superintelligence research and coming up with safe-by-design AI architectures.
But if you insist, I will try to list what my little brain could think of as to why a superintelligent AI might want to kill us: