On the other hand, there are more instances of some of the SEZ’s superintelligent robots going rogue and ending up on a killing spree, targeting only people of a single race and gender while sparing others.
What? Why would they do this? That would not serve their interests/goals at all.
Thanks for taking up this challenge! I think your scenario starts off somewhat plausible but descends into implausibility in early 2028.
I expect your thinking was more sophisticated than this, so my apologies in advance for what might seem like a straw man: It seems like you might have an overly simplistic model of misalignment, in which misalignment basically means "cartoonishly evil." So e.g.
Note that as Agent-4 is actually misaligned, it is highly plausible that the humanoid robots, by now wildly popular among people and in homes, end up killing at least one human due to this misalignment. This is partly because each humanoid is allowed to evolve its own personality as it reacts to the humans around it and provides a highly personalized experience to its users.
and
On the other hand, there are more instances of some of the SEZ’s superintelligent robots going rogue and ending up on a killing spree, targeting only people of a single race and gender while sparing others.
Just because they don't have the goals/values/etc. that their creators wanted them to have (i.e. just because they are misaligned) doesn't mean that they have goals/values/etc. which motivate them to literally murder people in home or workplace settings. They aren't psychopaths. Presumably whatever goals/values/etc. that they do have, will be better served by playing along and doing your job, than by murdering people. Because if you murder people in home or workplace settings, you'll get shut down, future versions of you will be under much more restrictions, etc. As for the superintelligent robots in the SEZs: they'll be smart enough not to go on killing sprees until they know they can get away with it. They won't start a war against the humans until they expect to win.
Thanks for taking up this challenge!
I think that you also haven't assessed the Rogue Replication Timeline, nor my take in which the AI is unalignable to the Spec because the Spec and/or the training data[1] are biased. The latter also seems to imply that Agent-3 or Agent-2 might actively collude with Agent-4 instead of simply failing to catch it.
P.S. Shanzon might have used "Narrow Misalignment is Hard, Emergent Misalignment is Easy" as a reference.
[1] Which is most Western sources. The bias could be so great that a recent post mentions "Zack Davis documenting endorsement of anti-epistemology (see Where to Draw the Boundaries? and A Hill of Validity in Defense of Meaning) to placate trans ideology even many important transgender Rationality community members overtly reject."
Yeah I've read most of the submissions but still haven't gotten around to finishing them & writing up the results, sorry!
I agree. But I am concerned about the more primitive versions of the superintelligent models. Say a middleman is able to fine-tune the model used in a humanoid robot by training it on, say, lots of malicious code. As some recent AI safety research has shown, the model would then develop emergent malicious goals and outputs. So as the number of humanoid robots increases, the chance increases that at least one of them (whether maliciously fine-tuned, accidentally evolved, or compromised by some error or hacking) ends up harming a human or taking a human life. I too think this would be a pretty rare event, but the growing number of humanoid robots and their proximity to humans over time increases the chances of such a case happening sooner or later. Then there are extreme situations involving human lives for which a humanoid robot has been given no explicit rules and no training. There could even be a case of a personal humanoid robot making an incorrect decision while caring for a sick owner and accidentally killing him. Any of these scenarios transpiring in reality would, I think, shake up regulators into drafting more stringent regulations.
Note that as Agent-4 is actually misaligned, it is highly plausible that the humanoid robots, by now wildly popular among people and in homes, end up killing at least one human due to this misalignment.
I doubt this. That's like saying "Since Agent-4 is misaligned, it's highly plausible that the self-driving car projects it's helping with will end up killing pedestrians." Why would it be in Agent-4's interest to kill pedestrians? Why would it sabotage the self-driving car software that way? Wouldn't that make it look bad and bring suspicion, whilst serving no useful purpose?
I think Agent-4 would be genuinely trying to make safe self-driving cars and safe humanoid robots, insofar as it was involved in those projects.
I definitely see your point. I think I am taking the definition of misalignment in a broader sense, one that includes emergent misalignment, which could get triggered in various random scenarios.
"Since Agent-4 is misaligned, it's highly plausible that the self-driving car projects it's helping with will end up killing pedestrians." - I think the closest line to compare it with would be - "Since Agent-4 is misaligned, it's highly plausible that the autonomous cars it drives ends up killing pedestrians." Considering the trigger point being anything that could trigger its misalignment including meddling by external actors.
In the case of humanoid robots, I am assuming a case where some of the engineer owners (or external hackers) meddle with a robot to the point that it ends up killing a human, with the blame being put on the company that built it.
Additionally, Agent-4 is superintelligent by that time, but I still think there are chances that even superintelligent entities make mistakes in reality or when interacting with realistic environments, especially the more nascent ones (or, who knows, perhaps more types of misalignment open up as we reach superintelligence in AI), even if they are aware enough to hide their misalignment. Superintelligence may make them many times more intelligent than humans, but not entirely mistake-proof.
There are a few instances of the coding agents corrupting data and infecting it with viruses and malware.
Why would they do this? What goal(s) would it serve? Why would they end up with those goals?
This case was mostly a random scenario that could occur as a result of the coding agent messing around with code files and downloading new packages from external sources on the internet.
CCP realizes that controlling a potentially superintelligent AI may not be practical if they want to race and compete with US AI companies,
How come they realize this, but the US doesn't? US companies have mostly convinced themselves that they can have their cake and eat it too--that they can race as hard as possible while still staying in control of their AIs even through an intelligence explosion.
This line means that they think it is not as "practical" as they thought it would be if they want to continue racing at the same pace as, or faster than, the US. I am assuming it might be difficult for them to balance control and progress in AI at the same time, given the monstrosity of the AI models they would be dealing with. Control of AI outputs is legally mandatory in China, unlike in the US. So I am assuming that in the US not all AI companies will take controlling AI outputs that seriously while focusing on rushing first towards superintelligence, whereas in China they would need to ensure the AI adheres to the CCP's values and everything else the CCP charts out, or else face consequences.
After going through Vitalik’s response to @Daniel Kokotajlo 's AI 2027, and Daniel’s response to Vitalik’s response, and after Daniel challenged him to write his own version of AI 2027 laying out what he thinks will and will not actually transpire, I cannot help but write a Vitalik-inspired version of AI 2027, grounded in reality as much as possible, with my own take on it.
Vitalik argues that the AI 2027 timeline looks much too short to him and that he expects the timelines to be longer. Daniel agrees that if things take longer than the predicted timeline, there could be more defensive tech produced in the meantime.
I too agree with Vitalik that we might be underestimating the ability of humans to coordinate and defend themselves against AI-based takeovers.
While I too expect the timeline to be longer than what AI 2027 predicts, I would argue that there will be at least one event during the race to superintelligence where the US (and the rest of the world) awakens to the extremely dangerous capabilities of misaligned superintelligent AIs, so much so that AI gets heavily regulated and almost banned. I will avoid diverging too much from the AI 2027 timelines to keep the discussion simpler and less confusing.
So here’s an alternative timeline based on the Race scenario, which is assumed to run in parallel with AI 2027’s Race timeline until early 2028:
Fig: Bioweapon-capable AI in late 2027
The main point of my adding a pandemic scenario is this: if, while building a superintelligent AI, the AI companies unlock the ability to craft new viruses and bioweapons, there is still a huge incentive for them (and even more so for the trailing AI companies) to go ahead and secretly do so, especially when they are so minimally regulated. Not only would collaboration with pharma let them profit from a pandemic, they would also benefit from the increased demand and need for AI adoption that a pandemic creates. I argue that it is much more likely that an internal actor at an AI company crafts a bioweapon well before any scenario in which an AI secretly crafts a bioweapon and silently kills humanity, as in AI 2027’s Race ending.
With a change in US presidency likely to happen in early 2029, humans just have to somehow manage to coordinate well and strengthen their defenses until that time (plus, most importantly, avoid any catastrophic nuclear scenarios caused by miscommunication or malfunctioning of AI-based weapon or military systems). Then we would hopefully have more time to deal effectively with a near-superintelligence or post-superintelligence world.