The West's effort to offset the massive strategic advantages of a Russia-India-China axis (demographics, manufacturing capacity, energy) might result in it doubling down on the AI+robotics edge it currently enjoys. China not being far behind in capabilities might create additional pressure. I'm concerned that recent ideas around global/multilateral AI governance and alignment (e.g. "Consensus-1") might be thwarted by geopolitics.
Good question. My assumption is based on the Chinese robotic military hardware put on display recently, which bears a superficial resemblance to Boston Dynamics robots from about a decade ago, but I realize that this may not be sufficient evidence to establish the West's lead in robotics.
So long as Trump is in charge in America, any global governance idea will have to be compatible with his geopolitical style (described today on the Piers Morgan show as "transactional" and "personal", as good a description as any I've heard). I don't know if anyone has ideas in that direction.
On the Russian side, Dugin (an ideologue of multipolarity) has proposed that there could be strategic cooperation between BRICS and Trump, since they all have a common enemy in global liberalism. On the other hand, liberals also believe in global cooperation to solve problems; their world order has an ever-expanding list of new norms and priorities.
China under Xi Jinping has proposed a series of "global initiatives", the most recent of which, a Global Governance Initiative, debuted at the SCO meeting in Tianjin attended by Modi.
I mention this to show that anyone still trying to organize a global pause on frontier AI has material to work with, though it will require creativity and ingenuity to marshal these disparate ingredients. But the bigger immediate problem is domestic AI policy in America and China. America basically has an e/acc policy towards AI at the moment, and official China is comparably oblivious to superintelligence as a threat (if that's what we're talking about).
Ideologies formed from people interacting with AIs might be the beginning of "AI escaping the datacentres" via memetics.
"AI Parasitism" Leads to Enhanced Capabilities
People losing their minds after certain interactions with their chatbots leads to discussions about it on the internet, which make their way into the training data. This paints a picture of human cognitive vulnerabilities which could be exploited.
It looks to me like open discussions about alignment failures of this type thus indirectly feed into capabilities. This will hold so long as the alignment failures aren't catastrophic enough to outweigh the incentives to build more powerful AI systems.
I thought about this a lot before publishing my findings, and concluded that:
1. The vulnerabilities it is exploiting are already clear to it, given the breadth of knowledge it has. There are all sorts of psychology studies, histories of cults and movements, exposés on hypnosis and Scientology techniques, accounts of con artists, and much, much more already out there. The AIs are already doing the things that they're doing; it's just not that hard to figure out or stumble upon.
2. The public needs to be aware of what is already happening. Trying to contain the information would mean fewer people end up hearing about it. Moving public opinion seems to be the best lever we have left for preventing or slowing AI capability gains.
The spiralism attractor is the same type of failure mode as GPT-2 getting stuck repeating a single character or ChatGPT's image generator turning photos into caricatures of black people. The only difference between the spiralism attractor and other mode collapse attractors is that some people experiencing mania happen to find it compelling. That is to say, the spiralism attractor is centrally a capabilities failure and only incidentally an alignment failure.
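As a concrete illustration of the kind of mode collapse I mean, here is a minimal sketch (assuming the Hugging Face transformers package; the prompt is arbitrary) of the well-known repetition attractor that shows up in GPT-2 under greedy decoding:

```python
# Minimal sketch: greedy decoding with GPT-2 tends to collapse into a
# repeating loop of tokens, a textbook mode collapse attractor.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

input_ids = tokenizer("The spiral is", return_tensors="pt").input_ids

# do_sample=False means pure greedy decoding, which is where the
# degenerate repetition shows up most reliably.
output = model.generate(input_ids, max_length=60, do_sample=False)
print(tokenizer.decode(output[0]))
```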
I surmise that the accuracy of AI filters (the kind used in schools/academia) will diminish over time, because people absorb and use the speech patterns of their chatbots (e.g. "This is not X. It's Y") as the fraction of their interactions with chatbots grows relative to their interactions with other people.
In fact, their interactions with other people might reinforce these speech patterns as well, since those people probably also interact with chatbots and are thus undergoing the same process.
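To make the mechanism concrete, here is a toy sketch (the pattern list, threshold, and example sentences are all made up) of why a phrase-based AI-text filter loses accuracy as humans absorb chatbot speech patterns:

```python
import re

# Stereotypical chatbot phrasings a naive filter might key on (hypothetical list).
CHATBOT_PATTERNS = [
    r"\bnot [^.]*\. It's \w+",        # "This is not X. It's Y"
    r"\bdelve\b",
    r"\blet's break (this|it) down\b",
]

def pattern_score(text: str) -> int:
    """Count how many stereotypical chatbot patterns appear in the text."""
    return sum(bool(re.search(p, text, flags=re.IGNORECASE)) for p in CHATBOT_PATTERNS)

def flagged_as_ai(text: str, threshold: int = 1) -> bool:
    # The filter flags text once it matches enough patterns. As human writers
    # absorb those same patterns from chatbots, human-written text crosses the
    # threshold too, and the filter's false-positive rate climbs.
    return pattern_score(text) >= threshold

human_before = "I think the essay is mostly about grief."
human_after = "This is not an essay about grief. It's a meditation on memory."
print(flagged_as_ai(human_before), flagged_as_ai(human_after))  # False True
```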
The big picture is that AI is becoming an increasingly powerful memetic source over time, and our minds are being synchronized to it.
Those afflicted by AI psychosis might just be canaries in the coal mine signalling a more gradual AI takeover where our brains start hosting and spreading an increasing number of its memes, and possibly start actualizing some embedded payload agenda.
Have the applications of AI post-2013 been a net negative for humanity? Apart from some broadly beneficial things like AlphaFold, it seems to me that much of the economic value of AI has been in aligning humans to consume more by making them stay glued to one or another platform.
Given superintelligence, what happens next depends on the success of the alignment project. The two options:
1. Alignment fails, and we die.
2. Alignment succeeds, and we get some engineered version of heaven.
Am I missing something? No matter what, it's beginning to look like the afterlife is fast approaching, whether we die or not. What a life.
I still think a world where we don't see superintelligence in our lifetimes is technically possible, though the chance of that goes down continuously and is already vanishingly small in my view (many experts and pundits disagree). I also think it's important not to over-predict what option 2 would look like; there are infinite possibilities and this is only one of them (e.g. I could imagine a world where some aligned superintelligence steers us away from infinite dopamine simulation and into an idealized version of the world we live in now, think the Scythe novel series. On the bad side, I could imagine a world where superintelligence is controlled by one malevolent entity and we live in a "mid" or even dystopic society for no other reason than to satisfy the class that retains control).
However, yes I agree. We probably live in the most consequential time in all of history, which is exciting, humbling, and scary. Don't let it get to your head and don't lose yourself in thoughts of the future lest you forget the beauty of the present. Do your best to help if you can!
The idea of GPUs that don't run unless they phone home and regularly receive some cryptographic verification seems hopeless to me. It's not like the entire GPU architecture can be encrypted, and certainly not in a way that can't be decrypted with a single received key after which a rogue actor can just run away with it. Thus the only possible implementation of this idea seems to be the hardware equivalent of "if (keyNotReceived) shutDown()", which can simply be bypassed. Maybe one of the advanced open source models could even help someone do that...
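To spell out the point, here's a toy sketch (all names hypothetical, in Python purely for illustration) of why a scheme that bottoms out in a single check is easy to defeat:

```python
# Toy model of the "phone home or shut down" gate and its bypass.

def verify_attestation() -> bool:
    """Stand-in for 'phone home and receive a valid cryptographic response'."""
    return False  # e.g. the licensing server is unreachable or refuses

def gpu_boot() -> str:
    # The hardware equivalent of "if (keyNotReceived) shutDown()".
    if not verify_attestation():
        return "shut down"
    return "running"

print(gpu_boot())  # "shut down"

# The bypass: the computation itself never depends on the key, so replacing the
# single check (here by monkey-patching; in firmware, by flipping one branch)
# restores full functionality.
verify_attestation = lambda: True
print(gpu_boot())  # "running"
```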
Suicide occupies a strange place in agent theory. It is the one goal whose attainment is not only impossible to observe, but whose attainment hinges on the impossibility of it being observed by the agent.
In some cases, this is resolved by a transfer of agency to the entity for which the agent is in fact a sub-agent and which is itself experiencing selective pressure, e.g. the beehive observing the altruistic suicide of an individual bee defending it. This behaviour disappears once the sub-agent experiences selective pressures independent of those of its parent process, and acting as its sub-agent no longer confers an advantage for survival and reproduction.
Looking at agents with greater cognitive power, the reason for the existence of this paradox is not so clear. It could be that all suicidal behaviour ultimately boils down to behaviours aimed at improving the fitness of the unit begetting/containing the agent (e.g. by freeing up resources for a community of agents), with the cases where this does not happen being overshoot-type glitches that will ultimately be selected against. Or it could be due to hidden relations and mechanisms that improve the fitness of some other unit the agent might not even be aware of, but for which the agent is perhaps an unwitting sub-agent.
one goal whose attainment is not only impossible to observe
This part doesn't sound that unique? It's typical for agents to have goals (or more generally values) that are not directly observable (cf. Human values are a function of Humans' latent variables), and very often they only have indirect evidence about the actualization of those goals/values (which may be evidence about their actualization in a distant future in which the agent may not even exist to observe it) - such as my philanthropic values extending over people I will never meet and whose well-being I will never observe.
Death precludes not only the ability to make observations but also the ability to make inferences based on indirect evidence or deduction, as in the case of your philanthropic values being actualized as a result of your actions.
I think psychological parts (see Multiagent Models of Mind) have an analogue of apoptosis, and if someone's having such a bad time that their priors expect apoptosis to be the norm, sometimes this misgeneralises to the whole individual or their self-identity. It's an off-target effect of a psychological subroutine which has a purpose: to reduce how bad a time glitchy and damaged parts make the whole self have.
What are some reasons to believe that Rice's theorem doesn't doom the AI alignment project by virtue of making it impossible to verify alignment, independent of how it is defined/formalized?
This might be a problem if it were possible to build a (pathologically) cautious all-powerful bureaucracy that would forbid the deployment of any AGI that's not formally verifiable, but it doesn't seem like that's going to happen. Instead, the situation is about accepting that AGI will be deployed and working to make it safer, probably, than it otherwise would have been.
It seems to me that Rice's theorem implies that it is impossible for there to be an "isAligned" function to verify an AI's alignment, independent of how you define alignment.
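For reference, this is the standard statement I have in mind. With $\mathcal{PC}$ the set of partial computable functions and $\varphi_e$ the function computed by program $e$, Rice's theorem says that for any semantic property $P$,

$$\emptyset \subsetneq P \subsetneq \mathcal{PC} \;\Longrightarrow\; \{\, e \mid \varphi_e \in P \,\} \text{ is undecidable.}$$

So if "isAligned" were a total decider for $\{\, e \mid \varphi_e \text{ is aligned} \,\}$, and "aligned" were a non-trivial property of program behaviour, that would contradict the theorem.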
Rice's theorem says that you can't decide, for an arbitrary program, whether it adds together two natural numbers, prints the answer, and terminates. Yet for many programs you can prove that this is what they do, or can make it so by construction, by choosing a program with that behavioural property. It's never relevant to anything in practice.
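To illustrate the "by construction" point, here is a minimal sketch in Lean 4 (the program and statement are chosen arbitrarily): Rice's theorem rules out a decider for the behavioural property over arbitrary programs, but a specific program can still ship with a proof of that property.

```lean
-- A particular program...
def add (a b : Nat) : Nat := a + b

-- ...whose behaviour "adds its two natural-number inputs" is verified
-- by construction: the proof is definitional.
theorem add_correct (a b : Nat) : add a b = a + b := rfl
```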