Epistemic Status: Take with a grain of salt. This post was written relatively quickly and relies heavily on an analogy between AI and human behaviour. There are obvious concerns with reasoning this way, but the post largely takes a sledgehammer to them and runs with the analogy anyway. I'd encourage others to consider whether I've fallen into the trap of mistakenly anthropomorphising AI.

  • The less "autistic"[1] we make an AI, the more powerful it is in terms of capabilities:
  • Suppose you tell an autistic AI to steal a diamond. Maybe it has mostly been trained on only a few ways of stealing diamonds, so innovative methods, such as teleportation, score poorly because the algorithm is uncertain whether they are in scope.
  • Powerful autistic AIs can be used for attacks despite not being completely reliable. Suppose you tell the AI to take down an adversary's network. It might do what you want and inflict long-lasting damage, such as by wiping all of their machines. Or it might not do what you want, such as if it just turned their machines off so that they were back online again five minutes later. It may even hurt you as well, such as if it released a virus that took down everyone's networks. So while there are good reasons why you might not want to use it for an attack, and indeed why doing so is genuinely dangerous, it can still be used for attacks if you're willing to bear the risk. This is worrying: even if you have ten enemies and nine of them think it would be too dangerous to use such an AI against you, you could still suffer an attack from your most risk-taking adversary (a minimal probability sketch of this point appears after this list).
  • Autistic AI is very difficult to use for defence. Maybe you tell it to defend your network and it assumes you mean to defend it only physically, so it doesn't even try to stop hackers. Maybe it locks you out of using the network so that no one can accidentally bring in a virus. Sure, without the AI you'd be vulnerable to an adversary's AI, but with it you could be taken out even if no one ever attacks you.
  • For a constant level of power, making AI more autistic favours attackers over defenders (see the toy payoff sketch after this list), so we want it to be as non-autistic as possible. However, reducing autism increases capabilities, so pushing hard on this may be net-negative if it sufficiently shortens the timeline to recursively self-improving AI. This is concerning because it suggests that the optimal policy will involve maintaining a fragile balance between accelerating timelines and enabling attackers.
  • A more nuanced discussion would consider different levels of autism rather than a binary autistic/non-autistic split. Arguably we should expect future AIs to be no more autistic than PaLM - i.e. not very autistic at all - so these examples could be criticised as unrealistic. This is a reasonable critique, but framing the problem this way allows for a clearer exposition.
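
Here is a minimal sketch of the "most risk-taking adversary" point from above. The 10% per-adversary attack probability is purely an illustrative assumption, not a figure from the post; the point is only that the chance of being attacked is driven by whoever among your adversaries is most willing to bear the risk.

```python
# Toy illustration (my addition, not from the post): even if each adversary is
# individually unlikely to judge an attack worth the risk, the chance that at
# least one of them attacks grows quickly with their number.

def p_at_least_one_attack(n_adversaries: int, p_attack_each: float) -> float:
    """Probability that at least one of n independent adversaries chooses to attack."""
    return 1.0 - (1.0 - p_attack_each) ** n_adversaries

if __name__ == "__main__":
    for n in (1, 5, 10):
        print(f"{n:2d} adversaries, 10% each -> "
              f"{p_at_least_one_attack(n, 0.10):.0%} chance of being attacked")
    # 1 adversary   -> 10%
    # 5 adversaries -> ~41%
    # 10 adversaries -> ~65%
```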
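And here is a toy payoff sketch of the attack/defence asymmetry. The numbers are assumptions chosen only to make the asymmetry visible: a misread attack order usually still causes some damage, while a misread defence order can leave you exposed or locked out of your own network.

```python
# Toy numerical sketch (my illustration, not from the post) of why literal-minded
# ("autistic", in the post's terminology) AI favours attackers over defenders at a
# fixed power level. Literalness is modelled as the probability of misreading intent.

def expected_value(literalness: float, value_if_intent_met: float,
                   value_if_misread: float) -> float:
    """Expected value of an instruction when the AI misreads intent with
    probability equal to its 'literalness'."""
    p_misread = literalness
    return (1 - p_misread) * value_if_intent_met + p_misread * value_if_misread

if __name__ == "__main__":
    print("literalness | attack EV | defence EV")
    for literalness in (0.0, 0.3, 0.6, 0.9):
        attack = expected_value(literalness, value_if_intent_met=1.0,
                                value_if_misread=0.5)    # partial damage still helps
        defence = expected_value(literalness, value_if_intent_met=1.0,
                                 value_if_misread=-0.5)  # lockout / exposure hurts
        print(f"   {literalness:.1f}      |   {attack:+.2f}   |   {defence:+.2f}")
```

Under these assumed payoffs, attack value degrades slowly as literalness rises, while defence value eventually goes negative - which is the sense in which a more autistic AI favours attackers.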


  1. ^

    Given the stakes of the alignment problem, I made the decision to emphasise clarity over political correctness.


Comments:

I don't think your use of "autistic" in this post was very clarifying. Do you just mean that the AI doesn't consider the context of the problem we give it in order to deduce the actual problem? If so, it's not clear to me that an AI with greater capabilities will necessarily be "less autistic".

I meant that it takes instructions a bit too literally since it doesn't fully understand implicit instructions.