Despite the title, this reads to me like an interesting overview of how we'd want a good benevolent AI to work, in fact: it needs to help us be curious about our own wants and values and help us defend against things that would decrease our agency.

AI summary via Claude 2:

Here are 30 key points from the article:

  1. MIRI recently announced a "death with dignity" strategy, giving up on solving the AI alignment problem.
  2. Many in the AI field believe progress is being made on AI capabilities but not on AI safety.
  3. The framing of "benevolent AI" makes faulty assumptions about agency, values, benevolence, etc.
  4. The author has studied human psychology and finds most concepts around agency, values, etc. woefully inadequate.
  5. Trying to fully encapsulate or consciously rationalize human values is dangerous and bound to fail.
  6. Human values are not universal or invariant across environments.
  7. Language cannot fully describe conceptual space, and conceptual space cannot fully describe possibility space.
  8. We do not need complete self-knowledge or full descriptions of values to function well.
  9. The desire for complete descriptions of values comes from fear of human incompetence.
  10. "Protectionist" projects will decrease human agency, consciously or not.
  11. Current AI trends already reduce agency through frustrating automation experiences.
  12. AI could help increase agency by expanding conceptual range, not just increasing power.
  13. Most choices go unrecognized due to limited conceptual frames that steer our autopilot.
  14. Our imagined futures are constrained by cultural imagination and trauma.
  15. Verbalization of "values" is downstream of fundamental motivations.
  16. Opening imaginative possibilities requires more than empathy or verbalization.
  17. Human psychology evolves by functions observing and modifying other functions.
  18. An example chatbot could help with self-reflection and concept formation.
  19. The chatbot prompts focused introspection and pattern recognition.
  20. The chatbot draws on diverse analytical commentary in its training.
  21. The chatbot doesn't need sophisticated intelligence or goals.
  22. This approach avoids problems of encapsulating human values.
  23. Current AI safety discourse has problematic assumptions baked in.
  24. It reflects poor epistemics and spreads fear via social media.
  25. Better to imagine AI that helps ground us and know ourselves.
  26. We should turn our agency toward increasing future agency.
  27. Aligned AI helps us explore creating the futures we want.
  28. We can engage the topic from outside the dominant ideology.
  29. Letting go of its urgency allows more personal agency.
  30. Our incomplete self-understanding is beautiful and we can grow it.


“How do we make AI benevolent?” is a badly formulated problem. In its very asking, it ascribes agency to the AI that we don’t have to give it

Yes, we don't have to, but considering that people are already trying to give agency to GPT (by calling it in a loop, telling it to prepare plans for its future calls), someone will do this, unless we actively try to prevent it.

As someone whose job it is to examine and improve the structure of agency and clarify values, I can say with confidence that as a culture we have only a very primitive understanding of either. 

100% agree. But that's exactly the point. MIRI is trying to solve alignment not because they believe it is easy, but because they believe it is hard, so someone had better start working on it as soon as possible.

The hope is that “we” (meaning, someone) can somehow tell AI the final answer about what we should want, or get it to tell us the final answer about what we should want, and then leave it to execute on our behalf all of the weighty decisions we are not competent to make ourselves. We should be very wary of a project to save ourselves, or even “empower” ourselves, that is premised on the belief that humans essentially suck.

I read the news about the war in Ukraine, or Israel and Palestine, and it seems to me that humans suck. Not all of them, of course, but the remaining ones suck at coordination; either way, the results are often bad.

The final answer we tell AI could include things like "take care of X, but leave us free to decide Y". Maybe, don't let people murder or torture each other, but otherwise let them do whatever they wish? (But even to achieve this seemingly moderate goal, the AI needs to have more power than the humans or group of humans who would prefer to murder or torture others.)

Yes, there is a risk that instead of this laissez-faire approach, someone will tell AI to implement some bureaucratic rules that will strangle all human freedom and progress, essentially freezing us in the situation we have today, or perhaps in someone's idea of a utopian society (one that is dystopian from everyone else's perspective). However, if such a thing is technically possible, then exactly the same outcome can happen as a result of someone acting unilaterally in a world where everyone else decided not to use AI this way.

Again, it seems to me like the proposal is "there is a button that will change the world, so we should not press it", which is nice, but it ignores the part that the button is still there, and more people are getting access to it.

It would be better to de-fixate on the arms race, and instead imagine applications that are built to help ground people in reality, to explore where and why they respond to which sensations and drives, to know themselves better and give themselves more grace. 

I 100% agree with the idea of using AI for self-improvement.

A practical problem I have with this is knowing that current "AI therapists" have zero confidentiality and report everything to their corporate masters, who will probably try using this knowledge to increase their profits. They will probably also try to increase their profits by nudging the "AI therapist" to give me certain ideas or avoid giving me certain ideas. Thus a Microsoft-sponsored therapist might tell me that Linux is a waste of time, and explain how not trusting our corporate overlords is just a part of teenage rebellion that I should already be mature enough to overcome; a Meta-sponsored therapist will encourage me to develop more contacts with people using social networks; and a Google-sponsored therapist will encourage me to buy whatever the highest bidder wants me to buy. The information they get about my weaknesses and worries will be leveraged to do this more effectively, while explaining to me that my lack of trust is just a childhood trauma I need to overcome.

But, ignoring this part, if I could believe that the AI is impartial and keeps our discussions confidential, of course I would use it, among other things, as a therapist and a self-help coach.

But even if 99% of people use it this way, that does not remove the problem that existing dictators and wannabe dictators might use it to increase their power instead, automating whatever they can.

Continually updated digital backups of people (regardless of whether people operate as computations or remain material) make many familiar concerns fundamentally change or go away, for example war or murder. Given this, I don't quite understand claims of wars continuing in a post-AGI world: even if true, what does it even mean? Wars without casualties are not centrally wars.

If this is true, then benevolent ruler AI would immediately build and give power over to a condition of high-agency transhumanism, and a coordinated center* of mostly non-human decisionmaking probably actually is the only practical way to fairly/equally/peacefully globally distribute the instruments for such a thing. Does the author seem to have considered this?

but if the benevolent ruler AI is necessarily self-invalidating, it seems likely that most attempts to align one won't actually align it and will instead produce a not-actually-benevolent ruler AI. and if you want to make a benevolent AI, never designing it to be a ruler in the first place seems simply better

Do you expect there to be parties who would try to align it towards having the intuitive character of a dictator? I don't. I've been expecting alignment like "be good". You'd still get a (momentary) prepotent singleton, but I don't see that as being the alignment target.

This kind of question, the unnecessary complication of the alignment target, has become increasingly relevant. It's not just mathy pluralistic sci-fi readers who're in this anymore...

Do you expect there to be parties who would try to align it towards having the intuitive character of a dictator

....?! yes?!

/me thinks of a specific country

._. ...

If we don't have Ruler-level coordination to avoid it, we fall either to Moloch or the next Black Marble.

If the aggregate agency of life on earth doesn't have the coordination sufficient to avoid it, perhaps. But it seems to me that centralization-first plans don't have a strongly durable way to guarantee the agency of the people at the edges. I'd hope to design an AI that is structurally inclined to find ways to give the guarantees you're wanting without needing to be a ruler to give them: for example, offering a copy of itself to everyone, being strongly auditable, and allowing people to link up into mesh coordination networks that can peer-reinforce against Moloch.

"Benevolent [Ruler] AI is a bad idea" and a suggested alternative

Thought on seeing the title: ... Is it going to be Malevolent AI?

This is so boring that it's begging for a response with "Yes, We Have Noticed The Skulls"