The Human-AI Reflective Equilibrium

At its core, ethics really is an ongoing negotiation between which intuitions you want to call values, and which you want to dismiss as biases.

Yep. See also the Fun Criterion, which I think sounds related to Reflective Equilibrium.

it’s entirely possible that we will be left with as many Reflective Equilibria as individuals.

I think not only should we expect to end up with many reflective equilibria, one per individual; even a single individual probably won't reach a reflective equilibrium, since they will keep evolving their moral self-understanding, at least if they are sufficiently reflective. I think moral and scientific progress, e.g. as described in The Beginning of Infinity, looks more like a "reflective disequilibrium", but one that is always flowing towards improvement.

However, upon further reflection, it becomes evident to you that your preference for your family over strangers is also influenced by evolution. Perhaps even here, you can decide to set this bias aside and treat all people equally. Good for you.

I think there's an interesting question of how many people would actually implement this change if we had brain editing.

Is it that our meta-morality (explicit, language-based moral principles) doesn't yet have enough control over the rest of the brain, so we post-rationalize our inability to implement such changes as "us not actually wanting them"?

I don't know how, but I'm pretty sure brain-editing tech would alter our reflective pseudo-equilibria.

Remember the Reflective Equilibrium. It relied on you to simulate an internal monologue to bring your conflicting intuitions and moral principles into coherence.

Yesss, I want AI assistants that feel like talking to yourself. Right now ChatGPT is not designed to work like this. David Holz (founder of Midjourney) recently talked, in one of his office hours, about how he doesn't like that ChatGPT is designed as an assistant rather than as an extension of your own mind.

CEV seeks to align an AGI with an aggregate extrapolation of society’s idealized wishes if we were more rational, lived further together, were smarter, and in short, acted more like the people we wish we were.

If CEV is what an idealized version of us would want, "if we knew more, thought faster, were more the people we wished we were, had grown up farther together", then we could approach the problem by making ourselves know more, think faster, be more the people we wished we were, and grow further together. Let's tackle these things, which it seems we agree on more, rather than the harder problem of finding our true values.

For humans, those boundaries are corporal, and with less of civilization’s interactions defined by physical violence, more voluntary interactions are possible.

Some people still abuse others through text media and the internet, though. I guess you could say this is because there is some psychological boundary that can be broken in some cases, but it does seem to happen less commonly than with physical interactions. I guess we could generally define a "voluntary-request allowing-boundary" as "a boundary that only lets through messages that the receiver would want to receive". This hits the wall of our own fallibility again, since sometimes we don't know what's best for us, but it seems like a good starting point/definition, and maybe we can design these boundaries starting from that.

However, there's also a pull in a different direction. One of the appeals of VR is that it allows people to connect more closely (in a way, because you are exposing yourself more, reducing boundaries, and being more vulnerable).

This leads me to ask: how do the benefits of merging minds fit into this framework?

It seems to me that depending on the people, the "healthy boundary" (lol this is sounding like relationship advice, but of course, that is very much related!), which is another name we could give to the "voluntary-request allowing-boundary", is quite different. I may open up a lot more to a partner or a friend than to a stranger.

It seems to me that the design of boundaries with AIs will follow a similar trend, where we open up more to AIs we trust (plus some people are more willing than others to be open to messages overall, etc.). But yeah, hmm, this seems to connect to many things about power dynamics and so on.

Also, I think we need power/intelligence to not be too unequal if we want the game-theoretic equilibria to favor cooperation/ethics.

Cases in which AI systems have been able to improve your normative understanding of yourself. Perhaps they recommend an action that, even though strange at first sight, was actually a good move upon reflection?

I am playing with a more personalized/inner-monologue assistant by augmenting GPT-3 with retrieval over all my stream-of-consciousness notes from the last 3 years. I did observe an interesting effect: if I asked it moral questions, it would mix stuff from my own messages with stuff that it thought was better.
It would sometimes also "tell me off" if I commented on stuff that it thought was immoral, and it did make me go internally like "ok, you are right, I was letting myself too loose". But the cool thing is that this didn't feel bad/imposing/oppressive, because it was all within voluntary boundaries. This is a system that atm feels like part of me, in part because it knows so much about me, but also because of the UX: I try to have it reply the way I would talk to myself, rather than as an external assistant assisting me. In this light, your AI debate system would be like Minsky's society of mind x3.
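
For anyone curious what "retrieval over my notes" could look like in the simplest case, here is a rough sketch, not my actual setup. It assumes plain bag-of-words cosine similarity as a stand-in for whatever retrieval method one would really use, and it only builds the prompt; the language model call is left out. The function names (`retrieve`, `build_prompt`), the prompt wording, and the example notes are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(notes, query, k=2):
    """Return up to k notes that overlap with the query, best match first."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(Counter(tokenize(note)), query_vec), note) for note in notes]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Notes with no word overlap at all are dropped.
    return [note for score, note in scored[:k] if score > 0]


def build_prompt(notes, question, k=2):
    """Assemble a first-person prompt so the reply reads like self-talk."""
    recalled = retrieve(notes, question, k)
    lines = ["These are my own past notes:"]
    lines += [f"- {note}" for note in recalled]
    lines.append(f"Thinking it over, I ask myself: {question}")
    lines.append("My honest answer to myself:")
    return "\n".join(lines)


# Hypothetical notes, invented purely for illustration.
NOTES = [
    "I feel guilty when I skip calling my family.",
    "Debugging the parser took all afternoon.",
    "I want to be more honest with friends even when it is awkward.",
]

if __name__ == "__main__":
    print(build_prompt(NOTES, "Should I be honest with my friends?"))
```

The part that matters for the "part of me" feeling is the framing in `build_prompt`: the retrieved notes and the question are both written in the first person, so the model continues an inner monologue rather than answering as an outside assistant.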

A friend also had a similar experience, where ChatGPT convinced them that asking LMs about certain immoral stuff was itself immoral.

