Edit: this post currently has more downvotes than upvotes. The title, though broad, is uncontroversial: everything has its limits. Please point out flawed reasoning or need for further clarification in the comments instead of downvoting.

Let's agree that the first step towards AI alignment is to refrain from building intelligent machines that are designed to kill people. Very simple. As a global community, we need to agree completely on this topic.

Some will provide arguments in favor of intelligent lethal machines, such as the following:

That intelligent weapons kill with more precision, saving innocent lives.

That intelligent weapons do the most dangerous work, saving soldiers' lives.

Both of the above points are clearly valid. However, they do not justify the associated risk: that these machines turn against the humans they were designed to protect.

Currently, leading militaries around the world are developing and using:

-Drone swarms

-Suicide drones

-Assassin drones

-Intelligent AI pilots for fighter jets

-Targeting based on facial recognition

-Robot dogs with mounted guns

Given some of the unpredictable emergent behavior seen in studies and tests, malicious behavior could emerge from future artificial intelligence. If it did, it would have a clear vector of attack through a coordinated takeover of intelligent weapons. Let's agree as a global community to limit our development of intelligent weapons, thus limiting the potential damage from an out-of-control AI.


Mod note. (LW mods are trying out moderating in public rather than via PMs. This may feel a bit harsh if you're not used to this sort of thing, but we're aiming for a culture where feedback feels more natural. I think this is important to do publicly for a) accountability and b) so people can form a better model of how the LW moderators operate.)

I do think globally banning autonomous weapons is a reasonable idea, but the framing of this post feels pretty off.

I downvoted for the first paragraph, which makes an (IMO wrong) assumption that this is the first step towards AI alignment. The paragraph also seems more like it is trying to build social consensus rather than explain information to me. I don't think this is never appropriate on LW but I think it requires a lot more context than this post seems to imply (i.e. this post doesn't mention anything about how autonomous weapons relate to existential risk. I think there's a plausible connection between the two but this post doesn't spell it out or address potential failure modes of confusing the two)

My very similar post had a somewhat better reception, although certainly people disagreed. I think there are two things going on. Firstly, Lucas's post, and perhaps my post, could have been better written.

Secondly, and this is just my opinion, people coming from the orthodox alignment position (EY) have become obsessed with the need for a pure software solution, and have no interest in shoring up civilization's general defenses by banning the most dangerous technologies that an AI could use. As I understand, they feel that focus on how the AI does the deed is a misconception, because the AI will be so smart that it could kill you with a butter knife and no hands.

Possibly the crux here is related to what is a promising path, what is a waste of time, and how much collective activism effort we have left, given time on the clock. Let me know if you disagree with this model.

Yes, the linked post makes a lot of sense: wet labs should be heavily regulated.

Most of the disagreement here is based on two premises:

A: Other vectors (wet labs, etc.) present a greater threat. Maybe, though intelligent weapons are the most clearly misanthropic variant of AI.

B: AI will become so powerful, so quickly, that limiting its vectors of attack will not be enough.

If B is true, the only solution is a general ban on AI research. However, this would need to be a coordinated effort across the globe. There is far more support for halting intelligent weapons development than for a general ban. A general ban could come as a subsequent agreement.

  1. How is the framing of this post "off"? It provides an invitation for agreement on a thesis. The thesis is very broad, yes, and it would certainly be good to clarify these ideas.
  2. What is the purpose of sharing information, if that information does not lead in the direction of a consensus? Would you have us share information simply to disagree on our interpretation of it?
  3. The relationship between autonomous weapons and existential risk is this: autonomous weapons have built-in targeting and engagement capabilities.  If we could make an analogy to a human warrior, in a rogue AI scenario, any autonomous weapons to which the AI gained access would serve as the 'sword-arm' of the rogue AI, while a reasoning model would provide the 'brains' to direct and coordinate it.  The first step towards regaining control would be to disarm the rogue AI, as one might disarm a human, or remove the stinger on a stingray.  The more limited the weaponry that the AI has access to, the easier it would be to disarm.

A high-level thing about LessWrong is that we're primarily focused on sharing information, not advocacy. There may be a later step where you advocate for something, but on LessWrong the dominant mode is discussing / explaining it, so that we can think clearly about what's true.

Advocacy pushes you down a path of simplifying ideas rather than clearly articulating what's true, and pushing for consensus for the sake of coordination regardless of whether you've actually found the right thing to coordinate on.

"What is the first step towards alignment" isn't something there's a strong consensus on, but I don't think it's banning autonomous weapons, for a few reasons:

  • banning weapons doesn't help solve alignment, it just makes the consequences of one particular type of mis-alignment less bad. The first and biggest problem with AI alignment is that it's a confusing domain we haven't dealt with before, and I think many first steps are more like "become less confused" than "do a particular thing".
  • from the perspective of "hampering the efforts of a soft takeoff", it's not obvious you'd do autonomous weapons vs "dramatically improving the security of computer systems" or "better controlling wetlabs that the AI could hire to develop novel pathogens". If you ban autonomous weapons the AI can still just hire mercenaries – killer robots help but are neither necessary nor sufficient for an AI takeover.

I bring this up to highlight that we're nowhere near a place where it's "obvious" that this is the first step, and that you can skip to building consensus towards it.

My intent here is to communicate some subtle things about the culture and intent of LessWrong, so you can decide whether you want to stick around and participate. This is not a forum for arbitrary types of communication; it's meant to focus on truthseeking first. Our experience is that people who veer towards advocacy-first or consensus-first tend to subtly degrade truthseeking norms in ways that are hard to reverse.

I also think there are a number of object level things about AI alignment you're missing. I think your argument here is a reasonable piece of a puzzle but I wouldn't at all call it "the first step towards AI alignment". If you want to stick around, expect to have a lot to learn.

"Advocacy pushes you down a path of simplifying ideas rather than clearly articulating what's true, and pushing for consensus for the sake of coordination regardless of whether you've actually found the right thing to coordinate on."

  1. Simplifying (abstracting) ideas allows us to use them efficiently.
  2. Coordination allows us to combine our talents to achieve a common goal.
  3. The right thing is the one which best helps us achieve our cause.
  4. Our cause, in terms of alignment, is making intelligent machines that help us.
  5. The first step towards helping us is not killing us.
  6. Intelligent weapons are machines with built-in intelligence capabilities specialized for the task of killing humans.
  7. Yes, a rogue AI could try to kill us in other ways: bioweapons, power grid sabotage, communications sabotage, etc. Limiting the development of new microorganisms, especially with regards to AI, would also be a very good step. However, bioweapons research requires human action, and there are very few humans that are both capable and willing to cause human extinction. Sabotage of civilian infrastructure could cause a lot of damage, especially the power grid, which may be vulnerable to cyberattack. https://www.gao.gov/blog/securing-u.s.-electricity-grid-cyberattacks 
  8. Human mercenaries causing a societal collapse? That would mean a large number of individuals who are willing to take orders from a machine to actively harm their communities. Very unlikely.
  9. The more human action that an AI requires to function, the more likely a human will notice and eliminate a rogue AI. Unfortunately, the development of weapons which require less human action is proceeding rapidly.
  10. Suppose an LLM or other reasoning model were to enter a bad loop, maybe as the result of a joke, in which it sought to destroy humanity. Suppose it wrote a program which, when installed by the unsuspecting user, created a much smaller model, and this model used other machines to communicate with autonomous weapons, instructing them to destroy key targets. The damage which arises in this scenario would be proportional to the power and intelligence of the autonomous weapons. Hence, the need to stop developing them immediately.

Human mercenaries causing a societal collapse? That would mean a large number of individuals who are willing to take orders from a machine to actively harm their communities. Very unlikely.

I'm wondering how you can hold that position given all the recent social disorder we've seen all over the world where social media driven outrage cycles have been a significant accelerating factor. People are absolutely willing to "take orders from a machine" (i.e. participate in collective action based on memes from social media) in order to "harm their communities" (i.e. cause violence and property destruction).

These memes have been magnified by the words of politicians and media. We need our leaders to discuss things more reasonably. 

That said, restricting social media could also make sense. A requirement for in-person verification and limitation to a single account per site could be helpful.

The existential danger from AI is unrelated to access to weapons, so in that context a focus on restricting access to weapons would be security theater (the same as a lot of don't-say-bad-words "AI safety"). It might save lives, but it doesn't help with ensuring that AIs notkilleveryone.

Existential danger is very much related to weapons. Of course,  AI could pose an existential threat without access to weapons. However, weapons provide the most dangerous vector of attack for a rogue, confused, or otherwise misanthropic AI. We should focus more on this immediate and concrete risk before the more abstract theories of alignment.

If an AI does something with weapons that its operators don't want it to be doing, they will attempt to stop it. If they eventually succeed, then this doesn't literally killeveryone, and the AI probably wasn't the kind that can pose existential threat (even if it did cause a world-shaking disaster). If they can't stop the AI, at all, even after trying for as long as they live, then it's the kind of AI that would pose existential threat even without initially being handed access to weapons (if it wants weapons, it would be able to acquire them on its own). So the step of giving AI access to weapons is never a deciding factor for notkilleveryoneism, it's only a deciding factor for preventing serious harm on a scale that's smaller than that.

We should focus more on this immediate and concrete risk before the more abstract theories of alignment.

"Focus" suggests reallocation of a limited resource that becomes more scarce elsewhere as a result. I don't think it's a good thing to focus less on making sure that the outcome is not literally everyone dying than we are doing now. It's possible to get to that point, where too much focus is on that, but I don't think we are there.

  1. Focus means spending time or energy on a task. Our time and energy are limited, and the danger of rogue AI is growing by the year. We should focus our energies by forming an achievable goal, making a reasonable plan, and acting according to the plan.
  2. Of course, there is a spectrum to the possible outcomes caused by a hypothetical rogue AI (rAI), ranging from insignificant to catastrophic. Any access the rAI might gain to human-made intelligent weapons would amplify the rAI's power to cause real-world damage.

Of course, there is a spectrum to the possible outcomes

The problem is that with AI, facing existential risk eventually is a certainty: the capability of unbounded autonomous consequentialist agency is feasible to develop (humans have that level of capability, and humans are manifestly feasible, so AIs would merely need to be at least as capable). Either there is a way of mitigating that risk, or it killseveryone. At which point, no second chances. This is different from world-shaking disasters, which do allow second chances and also motivate trying to do better next time.

So this specifically is a natural threat level to consider on its own, not just as one of the points on a scale. And it's arguably plausible in the startlingly near future. And nobody has a reliable plan (or arguably any plan), including the people building the technology right now.

Yes, in the long term we will need a complete alignment strategy, such as permanent integration with our brains. However, before that happens, it would be prudent to limit the potential for a misaligned AI to cause permanent damage.

And, yes, we are in need of a more concrete plan and commitment from the people involved in the tech, especially with regards to lethal AI.

in the long term

I'm thinking one or two years in the future is a plausible lower bound on time when a (technological) plan would need to be enacted to still have an effect on what happens eventually, or else in four years (from now) a killeveryone arrives (again, as an arguable lower bound, not as a median forecast).

Unless it's fine by default, on its own, for reasons nobody reliably understands in advance, not because anyone had a plan. I think there is a good chance this is true, but betting the future of humanity on that is insane. Also, even if the first AGIs don't killeveryone, they might fail to establish strong coordination that prevents other misaligned AGIs from getting built, which do killeveryone, including the first AGIs.

I think probably it's more like 6 and 8 years, respectively, but that's also not a lot of time to come up with a plan that depends on having fundamental science that's not yet developed.

Best to slow down the development of AI in sensitive fields until we have a clearer understanding of its capabilities.

However, weapons provide the most dangerous vector of attack for a rogue, confused, or otherwise misanthropic AI.

I'm not sure why you think that. Human weapons, as horrific as they are, can only cause localized tragedies. Even if we gave the AI access to all of our nuclear weapons, and it fired them all, humanity would not be wiped out. Millions (possibly billions) would perish. Civilization would likely collapse or be set back by centuries. But human extinction? No. We're tougher than that.

But an AI that competes with humanity, in the same way that Homo sapiens competed with Homo neanderthalensis? That could wipe out humanity. We wipe out other species all the time, and only in a small minority of cases is it because we've turned our weapons on them and hunted them into extinction. It's far more common for species to go extinct because humanity needed the habitat and other natural resources that that species needed to survive, and outcompeted that species for access to those resources.

Entities compete in various ways, yes. Competition is an attack on another entity's chances of survival. Let's define a weapon as any tool which could be used to mount an attack. Of course, every tool could be used as a weapon, in some sense. It's a question of how much risk our tools pose to us, if they were to be used against us.

Let’s define a weapon as any tool which could be used to mount an attack.

Why? That broadens the definition of "weapon" to mean literally any tool, technology, or tactic by which one person or organization can gain an advantage over another. It's far broader than and connotationally very different from the implied definition of "weapon" given by "building intelligent machines that are designed to kill people" and the examples of "suicide drones", "assassin drones" and "robot dogs with mounted guns".

Redefining "weapon" in this way turns your argument into a motte-and-bailey, where you're redefining a word that connotes direct physical harm (e.g. robots armed with guns, bombs, knives, etc.) to mean any machine that can, on its own, gain some kind of resource advantage over humans. Most people would not, for example, consider a superior stock-trading algorithm to be a "weapon", but under your (re)definition, it would be.

It is a broad definition, yes, for the purpose of discussing the potential for the tools in question to be used against humans.

My point is this: we should focus first on limiting the most potent vectors of attack: those which involve conventional 'weapons'. Less potent vectors (those that are not commonly considered weapons), such as a 'stock trading algorithm', are of lower priority, since they offer more opportunities for detection and mitigation.

An algorithm that amasses wealth should eventually set off red flags (maybe banks need to improve their audits and identification requirements).  Additionally, wealth is only useful when spent on a specific purpose. Those purposes could be countered by a government, if the government possesses sufficient 'weapons' to eliminate the offending machines.

If this algorithm takes actions so subtle that they cannot be detected in time to prevent catastrophe, then we are doomed. However, there is also the likelihood that the algorithm will have weaknesses which allow it to be detected.

My point is this: we should focus first on limiting the most potent vectors of attack: those which involve conventional ‘weapons’.

That's exactly where I disagree. Conventional weapons aren't all that potent compared to social, economic, or environmental changes.

Social, economic, or environmental changes happen relatively slowly, on the scale of months or years, compared to potent weapons, which can destroy whole cities in a single day. Therefore, conventional weapons would be a much more immediate danger if corrupted by an AI. The other problems are important to solve, yes, but first humanity must survive its more deadly creations. The field of cybersecurity will continue to evolve in the coming decades. Hopefully world militaries can keep up, so that no rogue intelligence gains control of these weapons.

To repeat what I said above: even a total launch of all the nuclear weapons in the world will not be sufficient to ensure human extinction. However, AI driven social, economic, and environmental changes could ensure just that.

If an AI got hold of a few nuclear weapons and launched them, that would, in fact, probably be counterproductive from the AI's perspective, because in the face of such a clear warning sign, humanity would probably unite and shut down AI research and unplug its GPU clusters.

Most actions by which actors increase their power aren't directly related to weapons. Existential danger comes from one AGI actor getting more power than human actors. 

Which kinds of power do you refer to? Most kinds of power require human cooperation. The danger that an AI tricks us into destroying ourselves is small (though a false detection of nuclear weapons could do it). We need much more cooperation between world leaders, a much more positive dialogue between them.

Yes, you need human cooperation but human cooperation isn't hard. You can easily pay people money to get them to do what you want. 

With time more processes can use robots instead of humans for doing physical work and if the AGI already has all the economic and political power there's nothing to stop the AGI from doing that.

The AGI might then reuse land that's currently used for growing food for other purposes and step by step reduce the amount of food that's available and there never needs to be a point where a human thinks that they are working for the destruction of humanity. 

More stringent (in-person) verification of bank account ownership could mitigate this risk.

Anyways, the chance of discovery for any covert operation is proportional to the size of the operation and the time that it takes to execute. The more we pre-limit the tools available to a rogue machine to cause immediate harm, the more likely we will catch it in the act.

There's no need for anything to be covert. NetDragon Websoft already has a chatbot as CEO. That chatbot can get funds wired by giving orders to employees.

If the chatbot were a superintelligence, that would allow it to outcompete other companies.

While I agree with the overall sentiment, I think the most important claims within this post are poorly supported and factually false.

The first step toward AI alignment is to stop pushing the capability frontier, immediately. Then we might have enough time to find a way to design AIs that are aligned, or determine that this problem is too hard for us to solve in a suitably reliable way.

Whether or not any particular military equipment has more computing power than last year is of essentially zero relevance to AI alignment and outcomes from misaligned AI. It's not like a superintelligence that would otherwise kill everyone will be stopped because some military hardware doesn't have target recognition built in (or whatever other primitive algorithms humans happen to write).

I'm against humans developing most weapons - 'intelligent' or not - for human reasons, not because I think they would make future superintelligent entities more dangerous. Superintelligence is inherently dangerous since it can almost certainly devise strategies (which may not even involve weapons as we think of them) against which we have no hope of conceiving a defence in time for it to matter.

Superintelligence is inherently dangerous, yes. The rapid increase in capabilities is inherently destabilizing, yes. However, practically speaking, we humans can handle and learn from failure, provided it is not catastrophic. An unexpected superintelligence would be catastrophic. However, it will be hard to convince people to abandon currently benign AI models on the principle that they could spontaneously create a superintelligence. A more feasible approach would start with the most dangerous and misanthropic manifestations of AI: those that are specialized to kill humans.

What is an "intelligent" machine? What is a machine that is "designed" to kill people? Why should a machine with limited intelligence that is "designed" to kill, such as an AIM-9, be more of a threat than a machine with vast intelligence that is designed to accomplish a seemingly innocuous goal, but that has the destruction of humanity as an unintended side-effect?

Currently, leading militaries around the world are developing and using:

  • Drone swarms
  • Suicide drones
  • Assassin drones
  • Intelligent AI pilots for fighter jets
  • Targeting based on facial recognition
  • Robot dogs with mounted guns

None of these things scare me as much as GPT-4. Militaries are overwhelmingly staid and conservative institutions. They are the ones that are most likely to require extensive safeguards and humans-in-the-loop. What does scare me is the notion of a private entity developing a superintelligence, or an uncontrolled iterative process that will lead to a superintelligence and letting it loose accidentally.

Items of response:

  1. An intelligent lethal machine is one which chooses and attacks a target using hardware and software specialized for the task of identifying and killing humans.
  2. Clearly, there is a spectrum of intelligence. We should define a limit on how much intelligence we are willing to build into machines which are primarily designed to destroy us humans and our habitat.
  3. Though militaries take more thorough precautions than most organizations, there are many historical examples of militaries suffering defeat, which, with better planning, could have been avoided.
  4. An LLM like GPT which hypothetically escaped its safety mechanisms is limited in the amount of damage it could do, based on what systems it could compromise. The most dangerous rogue AI is one that could gain unauthorized access to military hardware. The more intelligent that hardware, the more damage a rogue AI could cause with it before being eliminated. In the worst case, the rogue AI would use that military hardware to cause a complete societal collapse.
  5. Once countries adopt weaponry, they resist giving it up, though it would be in the better interests of the global community. There are some places we've made progress. However, with enough foresight, we (the global community) could plan ahead by placing limits on intelligent lethal machines sooner, rather than later.

I don't think limiting autonomous weapons helps prevent an AI from building autonomous weapons in secret. But building autonomous weapons does little to make the situation better either.

Suppose an AI was building autonomous weapons in secret. This would involve some of the most advanced technology currently available. It would need to construct a sophisticated factory in a secret location, or else hide it in a shell company. The first would be very unlikely, the second is plausible, though still less likely. Better regulation and examination of weapons manufacturers could help mitigate this problem.

I don't think this is core to alignment, though it's probably a good idea overall.  Making it easier to kill or threaten to kill people is anti-human-friendly on its own, even if all the agency involved is from a subset of humanity.

More importantly, I don't know that anyone who disagrees is likely to engage here - "let's agree" doesn't move very far forward unless you can identify those who need to agree and why they don't.  I'd start with how to overcome the argument that some humans (which ones depends on who you ask) need to be stopped in their harmful actions, and killing or threatening to kill is the most expeditious way to do so.  Without that underlying agreement, it's hard to argue that safer (for "us") mechanisms are wrong.

Yes, sometimes we need to prevent humans from causing harm. For sub-national cases, current technology is sufficient for this. On the scale of nations, we should agree to concrete limits on the intelligence of weapons, and have faith in our fellow humans to follow these limits. Our governments have made progress on this issue, though there is more to be made.

For example:


"With such loud public support in prominent Chinese venues, one might think that the U.S. military need only ask in order to begin a dialogue on AI risk reduction with the Chinese military.

Alas, during my tenure as the Director of Strategy and Policy at the DOD Joint Artificial Intelligence Center, the DOD did just that, twice. Both times the Chinese military refused to allow the topic on the agenda.

Though the fact of the DOD’s request for a dialogue and China’s refusal is unclassified—nearly everything that the United States says to China in formal channels is—the U.S. government has not yet publicly acknowledged this fact. It is time for this telling detail to come to light.

...(Gregory C. Allen is the director of the Artificial Intelligence (AI) Governance Project and a senior fellow in the Strategic Technologies Program at the Center for Strategic and International Studies in Washington, D.C)"

On such a vital topic to international welfare, officials from these two countries should have many discussions, especially considering how video-conference technology has made international discussion much more convenient.

Why, then, have we heard of so little progress in this matter? On the contrary, development of lethal AI weapons continues at a brisk pace.