
Some papers, or ideas for papers, that I'd love to see published in ethics journals like Minds and Machines or Ethics and Information Technology.[1] I'm probably going to submit one of these to a 2023 AI ethics conference myself.

Why should we do this? Because we want today's grad students to see that the ethical problems of superhuman AI are a cool topic that they can publish a cool paper about. And we want to (marginally) raise the waterline for thinking about future AI, nudging the AI ethics discourse towards more mature views of the challenges of AI.

Secondarily, it would be good to leverage the existing skillsets of some ethicists for AI safety work, particularly those already working on AI governance. And having an academic forum where talking about AI safety is normalized bolsters other efforts to work on AI safety in academia.

The Ideas:

Explain the basic ideas of AI safety, and why to take them seriously.

  • Iason Gabriel already has a pretty good paper like this. But it's plausible that, right now, what the ethics discourse needs is more basic explanations of why AI safety is a thing at all.
  • This paper might start by making a case that superhuman AI is going to change the world, likely in the next 10-60 years (definitely unintuitive to many, but there are AI Impacts surveys and recent results to illustrate the point). Then it could present the basic arguments that superhuman AI will not be automatically benevolent (an easy rhetorical trick is to call it "superhuman technology"; everyone knows technology is bad). Then the basic arguments that, to get things to go well, the AI has to know a whole lot about what humans want (and use that knowledge the way we want).
  • One issue with this approach is that it presents the problem but doesn't really point people towards solutions (a problem that may be solvable with quick citations). It also doesn't really motivate why this is an ethics problem, or explain why we want the solution to the key "ethics-genre" problems to use a technical understanding of the AI, rather than a human- or society-centric view.

A more specific defense of transformative-AI-focused thinking as a valid use of ethicists' time.

  • The core claim is that getting AIs to want good things and not bad things is an unsolved ethics problem. Ethics, not engineering, because the question isn't "how do we implement some obvious standard?"; the question is "what is even a good standard in the first place?"
  • But almost as important are secondary claims about what actual progress on this question looks like. The end goal is a standard that is connected to a technical picture of how the AI will learn this information about humans, and how it will use it to make decisions.
  • So the overall thrust is "given that AI safety is important, there is a specific sort of ethics-genre reasoning that is going to be useful, and here are some gestures towards what it might look like."
  • You can put more than one of these ideas into a paper if you want. This particular idea feels to me like it could benefit from being paired with another topic before or after it.
  • Dunking on specific mistakes, like talking about "robots" rather than "optimization processes," should probably be done with care and tact.

A worked example of "dual use" ethics - a connection between thinking about present-day problems and superhuman AI.

  • I expect most of the examples to be problems that sound relevant to the modern day, but that sneakily contain most of the alignment problem.
  • E.g. Xuan's example of an AI that takes actions in response to laws, where we really want it to follow the spirit of the law. Although that's a bit too futuristic, actually, because we don't have much present-day concern about AI systems that themselves interpret laws.
  • Probably something about recommender systems is the most clear-cut example. But non-clear-cut examples (e.g. drawing a connection between rules for industrial robots and desiderata for superhuman AI) are also likely interesting.
  • One thing that makes a good AI ethics paper, in my opinion, is that it doesn't merely point out a single problem; it makes a useful generalization from a single (typically obvious) moral problem to a broader family of problems, and then uses a technical understanding of those problems to formulate guidelines that help ameliorate the entire family. (E.g. "We were building some delivery drones. One potential misuse of these drones would be spying on customers or passers-by, so we took some specific steps to avoid that, and also formulated a general principle: downgrade the drone's camera so that it doesn't have excess capabilities not needed for its intended job."[2]) I'm not saying it's necessary to do this in every paper, but you'll want to lay out ideas with enough clarity that such a thing is possible.

Attempt to defend some specific ethics question as being an open problem relevant to superhuman AI.

  • It's not obvious to me that we know many examples in this category. But it feels like it would stimulate some interesting work.
  • For me, the closest would be some elaboration of "How do we pick a state for a value learning AI to bootstrap from?" This is a thing we don't know how to do, and it sure seems like there are some ethics questions in there - ones you can write a much better essay about than just "which monkey gets the banana?"[3]

Reports on AI safety progress.

  • The basic idea is to relay not-too-technical AI safety research in a way that's of interest to ethicists.
  • Posts fall along a spectrum in terms of fit. I'm best at remembering my own posts, so for a good example my first thought is me trying to develop some rules to avoid Relative Goodhart. Looking through recent posts, Alex's "Reward is not the optimization target" might be interesting - you'd have to spell out the relevance to AI safety (and therefore ethics) more clearly for a broad audience, but it's there. On the other hand, Neel and Tom's recent mechanistic analysis of grokking wouldn't be a good fit for that audience.
  • Bonus points if you demonstrate how some way of thinking developed for use in AI safety has interesting applications to present-day ethics questions.

Talk about principles for AI governance, informed by an awareness that transformative AI is important.

  • I did say in the introduction that AI governance is a field where leveraging the knowledge of practicing ethicists is an especially good idea. It's just so far outside my wheelhouse that I can't say much about it.
  • Most ethics papers like this are mediocre, but the ones that are good tend to be influential.[4]
  • One issue is that actual concrete AI governance ideas are hard to come by. But broad principles are probably easier to talk about than concrete ideas.
  1. ^

    For my overview of the state of the field, see the two Reading the Ethicists posts (1, 2).

  2. ^

    Actual example from Cawthorne and van Wynsberghe, Science and Engineering Ethics, 2020

  3. ^

    A genre of essay characterized by dishy, nontechnical speculation about which humans will benefit from superhuman AI.

  4. ^

    E.g. AI4People—An Ethical Framework for a Good AI Society: Opportunities, Risks, Principles, and Recommendations, by Floridi et al., Minds and Machines, 2018
