"More research needed" but here are some ideas to start with:
Hmm, I like #1.
#2 feels like it's injecting some frame that's a bit weird to inject here (don't roll your own metaethics... but rolling your own metaphilosophy is okay?)
But also, I'm suddenly confused about who this post is trying to warn. Is it more like labs, or more like EA-ish people doing a wider variety of meta-work?
#2 feels like it's injecting some frame that's a bit weird to inject here (don't roll your own metaethics... but rolling your own metaphilosophy is okay?)
Maybe you missed my footnote?
To preempt a possible misunderstanding, I don't mean "don't try to think up new metaethical ideas", but instead "don't be so confident in your ideas that you'd be willing to deploy them in a highly consequential way, or build highly consequential systems that depend on them in a crucial way". Similarly "don't roll your own crypto" doesn't mean never try to invent new cryptography, but rather don't deploy it unless there has been extensive review, and consensus that it is likely to be secure.
and/or this part of my answer (emphasis added):
Try to solve metaphilosophy, where potentially someone could make a breakthrough that everyone can agree is correct (after extensive review)
But also, I'm suddenly confused about who this post is trying to warn. Is it more like labs, or more like EA-ish people doing a wider variety of meta-work?
I think I mostly had alignment researchers (in and out of labs) as the target audience in mind, but it does seem relevant to others so perhaps I should expand the target audience?
To preempt a possible misunderstanding, I don't mean "don't try to think up new metaethical ideas", but instead "don't be so confident in your ideas that you'd be willing to deploy them in a highly consequential way, or build highly consequential systems that depend on them in a crucial way".
I think I had missed this, but it doesn't resolve the confusion in my #2 note. (Like, it still seems like something is weird about saying "solve metaphilosophy such that everyone can agree is correct" is more worth considering than "solve metaethics such that everyone can agree is correct". I can totally buy that they're qualitatively different and maybe have some guesses for why you think that. But I don't think the post spells out why and it doesn't seem that obvious to me.)
I hinted at it with "prior efforts/history", but to spell it out more: metaethics seems to have had a lot more effort go into it in the past, so there's less likely to be some kind of low-hanging fruit in idea space that, once picked, everyone will agree is the right solution.
>perhaps I should expand the target audience?
Plausibly the target audience should include AIs that will eventually be trying to do philosophy, assuming they'll be trained on your posts' contents or would be able to browse them.
I suggest avoiding a dependency on Philosophy entirely, and using Science instead, which has a means for telling people their ideas are bad: Bayesianism (a.k.a. the Scientific Method). For ethics, the relevant science is Evolutionary Moral Psychology. Or, to put this in philosophical terminology, my recommended metaethics is Naturalism.
Unfortunately the challenge with this is that coming up with plausible-sounding hypotheses about the evolutionary optima for hominids is easy, while actually testing one is incredibly time-consuming and expensive. So scientific progress in this area is slow. Which is why I see AI-Assisted Alignment as having a large, complex, and expensive AI-Assisted Soft Sciences component. Pretty much what an engineer would call customer research.
[For a longer exposition, see Grounding Value Learning in Evolutionary Psychology: an Alternative Proposal to CEV]
"Please don't roll your own crypto" is a good message to send to software engineers looking to build robust products. But it's a bad message to send to the community of crypto researchers, because insofar as they believe you, then you won't get new crypto algorithms from them.
In the context of metaethics, LW seems much more analogous to the "community of crypto researchers" than the "software engineers looking to build robust products". Therefore this seems like a bad message to send to LessWrong, even if it's a good message to send to e.g. CEOs who justify immoral behavior with metaethical nihilism.
You may have missed my footnote, where I addressed this?
To preempt a possible misunderstanding, I don't mean "don't try to think up new metaethical ideas", but instead "don't be so confident in your ideas that you'd be willing to deploy them in a highly consequential way, or build highly consequential systems that depend on them in a crucial way". Similarly "don't roll your own crypto" doesn't mean never try to invent new cryptography, but rather don't deploy it unless there has been extensive review, and consensus that it is likely to be secure.
I think this fails to say how the analogy of cryptography transfers to metaethics. What properties of cryptography as a field make it such that you cannot roll your own? Is it just that many people have the experience of trying to come up with a cryptographic scheme and failing, meanwhile there are perfectly good libraries nobody has found exploits to yet?
That doesn't seem very analogous with metaethics. As you say, it is hard to decisively show a metaethical theory is "wrong", and as far as I know there is no well-studied metaethical theory which has no exploits yet.
So what exactly is the analogy?
The analogy is that in both fields people are by default very prone to being overconfident. In cryptography this can be seen by the phenomenon of people (especially newcomers who haven't learned the lesson) confidently proposing new cryptographic algorithms, which end up being way easier to break than they expect. In philosophy this is a bit trickier to demonstrate, but I think it can be seen via a combination of:
At risk of committing a Bulverism, I’ve noticed a tendency for people to see ethical bullet-biting as epistemically virtuous, like a demonstration of how rational/unswayed by emotion you are (biasing them to overconfidently bullet-bite). However, this makes less sense in ethics where intuitions like repugnance are a large proportion of what everything is based on in the first place.
the total idea/argument space being exponentially vast and underexplored due to human limitations, therefore high confidence being unjustified in light of this
There's also the thing that the idea/argument space contains dæmons/attractors exploiting shortcomings of human cognition, thus making humans hold them with higher confidence than they would if they didn't have those limitations.
tendency to "bite bullets" or accepting implications that are highly counterintuitive to others or even to themselves, instead of adopting more uncertainty
I find this contrast between "biting bullets" and "adopting more uncertainty" strange. The two seem orthogonal to me, as in, I've ~just as frequently (if not more often) observed people overconfidently endorse their pretheoretic philosophical intuitions, in opposition to bullet-biting.
In my experience, you learn the visceral sense that the space is dense with traps and spiders and poisonous things, and that what intuitively seems "basically sensible" often does not work. (I did some cryptography years ago.)
The structural similarity seems to be that there is a big difference between trying to do cryptography in a mode where you don't assume what you are doing is subject to some adversarial pressure, and doing it in the mode where it should work even if someone tries to attack it. The first one is easy, breaks easily, and it's unclear why you would even try to do it.
In metaethics, I think it is somewhat easy to do it in the mode where you don't assume it should be applied in some high-stakes, novel or tricky situations, like AI alignment, computer minds, the multiverse, population ethics, anthropics, etc. The suggestions of normative ethical theories will converge for many mundane situations, so anything works, but then it was not really necessary to do metaethics in the first place.
I have never done cryptography, but the way I imagine working in it is that it exists in a context of extremely resourceful adversarial agents, and thus you have to give up a kind of casual, not quite noticed neglect toward extremely weird and artificial-sounding edge cases / seemingly weird and unlikely scenarios, because this is where the danger lives: your adversaries may force these weird edge cases to happen, and this is a part of the system's behavior you haven't sufficiently thought through.
Maybe one possible analogy with AI alignment, at least, is that we're also talking about potential extremely resourceful agents that are adversarial until we've actually solved alignment, so we're not allowed to treat weird hypothetical scenarios as unlikely edge cases and say "Come on, that's way too far-fetched, how would it even do that?", because it's like pointing to a hole in a ship's hull and saying "What are the odds the water molecules would decide to go through this hole? The ship is so big!"
Another meta line of argument is to consider how many people have strongly held, but mutually incompatible philosophical positions.
I've been banging my head against figuring out why this line of argument doesn't seem convincing to many people for at least a couple of years. I think, ultimately, it's probably because it feels defeatable by plans like "we will make AIs solve alignment for us, and solving alignment includes solving metaphilosophy & then object-level philosophy". I think those plans are doomed in a pretty fundamental sense, but if you don't think that, then they defeat many possible objections, including this one.
As they say: Everyone who is hopeful has their own reason for hope. Everyone who is doomful[1]...
In fact it's not clear to me. I think there's less variation, but still a fair bit.
There seem to me to be different categories of being doomful.
There are people who think that for theoretic reasons AI alignment is hard or impossible.
There are also people who are more focused on practical issues, like AI companies being run in a profit-maximizing way and having no incentives to care for most of the population.
Saying, "You can't AI box for theoretical reasons" is different from saying "Nobody will AI box for economic reasons".
By "metaethics," do you mean something like "a theory of how humans should think about their values"?
I feel like I've seen that kind of usage on LW a bunch, but it's atypical. In philosophy, "metaethics" has a thinner, less ambitious interpretation of answering something like, "What even are values, are they stance-independent, yes/no?"
And yeah, there is often a bit more nuance than that as you dive deeper into what philosophers in the various camps are exactly saying, but my point is that it's not that common, and certainly not necessary, that "having confident metaethical views," on the academic philosophy reading of "metaethics," means something like "having strong and detailed opinions on how AI should go about figuring out human values."
(And maybe you'd count this against academia, which would be somewhat fair, to be honest, because parts of "metaethics" in philosophy are even further removed from practicality, as they concern the analysis of the language behind moral claims, which, if we compare it to claims about the Biblical God and miracles, would be like focusing way too much on whether the people who wrote the Bible thought they were describing real things or just metaphors, without directly trying to answer burning questions like "Does God exist?" or "Did Jesus live and perform miracles?")
Anyway, I'm asking about this because I found the following paragraph hard to understand:
Behind a veil of ignorance, wouldn't you want everyone to be less confident in their own ideas? Or think "This isn't likely to be a subjective question like morality/values might be, and what are the chances that I'm right and they're all wrong? If I'm truly right why can't I convince most others of this? Is there a reason or evidence that I'm much more rational or philosophically competent than they are?"
My best guess of what you might mean (low confidence) is the following:
You're conceding that morality/values might be (to some degree) subjective, but you're cautioning people from having strong views about "metaethics," which you take to be the question of not just what morality/values even are, but also a bit more ambitiously: how to best reason about them and how to (e.g.) have AI help us think about what we'd want for ourselves and others.
Is that roughly correct?
Because if one goes with the "thin" interpretation of metaethics, then "having one's own metaethics" could be as simple as believing some flavor of "morality/values are subjective," and it feels like you, in the part I quoted, don't sound like you're too strongly opposed to just that stance in itself, necessarily.
By "metaethics," do you mean something like "a theory of how humans should think about their values"?
I feel like I've seen that kind of usage on LW a bunch, but it's atypical. In philosophy, "metaethics" has a thinner, less ambitious interpretation of answering something like, "What even are values, are they stance-independent, yes/no?"
By "metaethics" I mean "the nature of values/morality", which I think is how it's used in academic philosophy. Of course the nature of values/morality has a strong influence on "how humans should think about their values" so these are pretty closely connected, but definitionally I do try to use it the same way as in philosophy, to minimize confusion. This post can give you a better idea of how I typically use it. (But as you'll see below, this is actually not crucial for understanding my post.)
Anyway, I'm asking about this because I found the following paragraph hard to understand:
So in the paragraph that you quoted (and the rest of the post), I was actually talking about philosophical fields/ideas in general, not just metaethics. While my title has "metaethics" in it, the text of the post talks generically about any "philosophical questions" that are relevant for AI x-safety. If we substitute metaethics (in my or the academic sense) into my post, then you can derive that I mean something like this:
Different metaethics (ideas/theories about the nature of values/morality) have different implications for what AI designs or alignment approaches are safe, and if you design an AI assuming that one metaethical theory is true, it could be disastrous if a different metaethical theory actually turns out to be true.
For example, if moral realism is true, then aligning the AI to human values would be pointless. What you really need to do is design the AI to be able to determine and follow objective moral truths. But this approach would be disastrous if moral realism is actually false. Similarly, if moral noncognitivism is true, that means that humans can't be wrong about their values, and implies "how humans should think about their values" is of no importance. If you design AI under this assumption, that would be disastrous if actually humans can be wrong about their values and they really need AIs to help them think about their values and avoid moral errors.
I think in practice a lot of alignment researchers may not even have explicit metaethical theories in mind, but are implicitly making certain metaethical assumptions in their AI design or alignment approach. For example they may largely ignore the question of how humans should think about their values or how AIs should help humans think about their values, thus essentially baking in an assumption of noncognitivism.
You're conceding that morality/values might be (to some degree) subjective, but you're cautioning people from having strong views about "metaethics," which you take to be the question of not just what morality/values even are, but also a bit more ambitiously: how to best reason about them and how to (e.g.) have AI help us think about what we'd want for ourselves and others.
If we substitute "how humans/AIs should reason about values" (which I'm not sure has a name in academic philosophy but I think does fall under metaphilosophy, which covers all philosophical reasoning) into the post, then your conclusion here falls out, so yes, it's also a valid interpretation of what I'm trying to convey.
I hope that makes everything a bit clearer?
I like the details of specific ways people may (implicitly or explicitly) make this mistake regarding meta-ethics in a way that matters.
It almost seems like the post was "Don't roll your own" and this added "meta-ethics".
Thanks! That makes sense, and I should have said earlier that I already suspected I likely understood your point and you expressed yourself well – it’s just that (1) I’m always hesitant to put words in people’s mouths, so I didn’t want to say I was confident I could paraphrase your position, and (2) whenever you make posts about metaethics, I’m wondering “oh no, does this apply to me, am I one of the people who is doing the thing he says one shouldn’t do?,” and so I was interested in prompting you to be more concrete about what level of detailedness someone’s confident opinion in that area would have to be before you think they reveal themselves as overconfident.
By "metaethics" I mean "the nature of values/morality", which I think is how it's used in academic philosophy.
Yeah, makes sense. I think academic use is basically that with some added baggage that adds mostly confusion. If I were to sum up what I think the use is in academic philosophy, I would say "the nature of values/morality, at a very abstract level and looked at from the lens of analyzing language." For some reason, academic philosophy is oddly focused on the nature of moral language rather than morality/values directly. (I find it a confusing/unhelpful tradition of, “Language comes first, then comes the territory.”) As a result, classical metaethical positions at best say pretty abstract things about what values are. They might say things like "Values are irreducible (nonnaturalism)" or "Values can be reduced to nonmoral terminology like desires/goals, conscious states, etc. (naturalism)," but without actually telling us the specifics of that connection/reduction. If we were to ask, "Well, how can we know what the right values are?" -- then it's not the case that most metaethicists would consider themselves obviously responsible for answering it! Sure, they might have a personal take, but they may write about their personal take in a way that doesn't connect their answer to why they endorse a high-level metaethical theory like nonnaturalist moral realism.
Basically, there are (at least) two ways to do metaethics, metaethics via analysis of moral language and metaethics via observation of how people do normative ethics in applied contexts like EA/rationality/longtermism. Academic philosophy does one while LW does the other. And so, to academic philosophers, if they read a comment like the one Jan Kulveit left here about metaethics, my guess is that they would think he's confusing metaethics for something else entirely (like maybe, "applied ethics but done in a circumspect way, with awareness of the contested and possibly under-defined nature of what we're even trying to do").
I have also noticed that when you read the word "metaethics" on LessWrong it can mean anything that is in some way related to morality.
Maybe I should take it upon myself to write a short essay on metaethics, how it differs from normative ethics, and why it may be of importance to AI alignment.
Alas, unlike in cryptography, it's rarely possible to come up with "clean attacks" that clearly show that a philosophical idea is wrong or broken.
I think the state of philosophy is much worse than that. On my model, most philosophers don't even know what "clean attacks" are, and will not be impressed if you show them one.
Example: Once in a philosophy class I took in college, we learned about a philosophical argument that there are no abstract ideas. We read an essay where it was claimed that if you try to imagine an abstract idea (say, the concept of a dog), and then pay close attention to what you are imagining, you will find you are actually imagining some particular example of a dog, not an abstraction. The essay went on to say that people can have "general" ideas where that example stands for a group of related objects rather than just for a single dog that exactly matches it, but that true "abstract" ideas don't exist.[1]
After we learned about this, I approached the professor and said: This doesn't work for the idea of abstract ideas. If you apply the same explanation, it would say: "Aha, you think you're thinking of abstract ideas in the abstract, but you're not! You're actually thinking of some particular example of an abstract idea!" But if I'm thinking of a particular example, then there must be at least one example to think of, right? So that would prove there is at least one member of the class of abstract ideas (whatever "abstract ideas" means to me, inside my own head). Conversely, if I'm not thinking of an example, then the paper's proposed explanation is wrong for the idea of abstract ideas itself. So either way, there must be at least one idea that isn't correctly explained by the paper.
The professor did not care about this argument. He shrugged and brushed it off. He did not express agreement, he did not express a reason for disagreement, he was not interested in discussing it, and he did not encourage me to continue thinking about the class material.
On my model, the STEM fields usually have faith in their own ideas, in a way where they actually believe those ideas are entangled with the Great Web. They expect ideas to have logical implications, and expect the implications of true ideas to be true. They expect to be able to build machines in real life and have those machines actually work. It's something like taking ideas seriously, and something like taking logic seriously, and taking the concept of truth seriously, and seriously believing that we can learn truth if we work hard. I'm not sure if I've named it correctly, but I do think there's a certain mental motion of genuine truth-seeking that is critical to the health of these fields and that is much less common in many other fields.
Also on my model, the field of philosophy has even less of this kind of faith than most fields. Many philosophers think they have it, but actually they mostly have the kind of faith where your subconscious mind chooses to make your conscious mind believe a thing for non-epistemic reasons (like it being high-status, or convenient for you). And thus, much of philosophy (though not quite all of it) is more like culture war than truth-seeking (both among amateurs and among academics).
I think if I had made an analogous argument in any of my STEM classes, the professor would have at least taken it seriously. If they didn't believe the conclusion but also couldn't point out a specific invalid step, that would have bothered them.
I suspect my philosophy professor tagged my argument as being from the genre of math, rather than the genre of philosophy, then concluded he would not lose status for ignoring it.
I think this paper was clumsily pointing to a true and useful insight about how human minds naturally tend to use categories, which is that those categories are, by default, more like fuzzy bubbles around central examples than they are like formal definitions. I suspect the author then over-focused on visual imagination, checked a couple of examples, and extrapolated irresponsibly to arrive at a conclusion that I hope is obviously-false to most people with STEM backgrounds.
The problem is that we can't. The closest thing we have is instead a collection of mutually exclusive ideas where at most one (possibly none) is correct, and we have no consensus as to which.
To preempt a possible misunderstanding, I don't mean "don't try to think up new metaethical ideas", but instead "don't be so confident in your ideas that you'd be willing to deploy them in a highly consequential way, or build highly consequential systems that depend on them in a crucial way".
This preempted my misunderstanding! Well done and thank you : )
I like the use of the crypto "don't roll your own" analogy. I think it's useful more broadly applied to basically all concepts. If you are doing something it should be because:
- you are trying to become more skilled
- you have reason to believe you are particularly skilled, including knowing when to freewheel and when to follow conventions
- you have reason to believe you are fairly skilled and are trying to explore new ways to do something (so you are also researching established ways and communicating about them)
- you are doing it as a hobby for fun
Okay (if possible), I want you to imagine I'm an AI system or similar and that you can give me resources in the context window that increase the probability of me making progress on problems you care about in the next 5 years. Do you have a reading list or similar for this sort of thing? (It seems hard to specify and so it might be easier to mention what resources can bring the ideas forth. I also recognize that this might be one of those applied knowledge things rather than a set of knowledge things.)
Also, if we take the cryptography lens seriously here, an implication might be that I should learn the existing off the shelf solutions in order to "not invent my own". I do believe that there is no such thing as being truly agnostic to a meta-philosophy since you're somehow implicitly projecting your own biases on to the world.
I'm gonna make this personally applicable to myself as that feels more skin in the game and less like a general exercise.
There are a couple of contexts to draw from here:
Which one is the one to double down on? How do they relate to learning more about meta ethics? Where am I missing things within my philosophy education?
(I'm not sure this is a productive road to go down but I would love to learn more about how to learn more about this.)
I think that in philosophy in general and metaethics in particular, the idea that since many people disagree one should not be confident in one's ideas is wrong.
I'll somewhat carefully spell out why I think this; a lot of this reasoning is obvious, but the core claim is that the intuitions people use in philosophy in order to ground their arguments are often wrong in predictable ways.
"One man's modus ponens is another man's modus tollens" is usually what is at the core of ongoing philosophical disagreements. Suppose is universally agreed, is somewhat intuitive to everyone but the degree to which that intuition is compelling varies, and is somewhat unintuitive to everyone but the degree to which that intuition is compelling varies.
Then if anyone is to take a side on whether is true or is false, they must decide which bullet is worse to bite.
Debate and thought experiments can attempt to present either bullet in a more appealing way, but in the end both propositions are confidently found unacceptable to at least some people.
Now it is your job, observing this situation, to decide whether to be very uncertain about which bullet should be bitten, or to choose one to bite. How should you do it?
The answer is that you should ask how it came to be that there is a difference in the intuitions of people who believe A is true and those who say B is false. If you can understand the causes of those different intuitions, then you may be able to decide which (if any) of them can be trusted.
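To spell out the structure (this is just the standard logic-textbook presentation of the two options described above, not anything new):

$$
\frac{A \Rightarrow B \qquad A}{\therefore\ B}\ \text{(modus ponens: trust the intuition for } A\text{)}
\qquad
\frac{A \Rightarrow B \qquad \neg B}{\therefore\ \neg A}\ \text{(modus tollens: trust the intuition against } B\text{)}
$$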
Consider metaethics. The problems of mind-independence, moral ontology, normativity, internalism vs. externalism, etc. can all be framed in this way, and very roughly for the sake of this comment only (hold your objections since I would treat this more carefully in a post), collapsed into the same problem:
A. All facts are ultimately natural or descriptive.
B. Nothing is really right or wrong, better or worse, independent of human attitudes or conventions.
Again avoiding a careful philosophical treatment which we don't have time for, I will just flag that the intuition behind a philosopher's objection to B is highly suspect, due to the fact that it is the product of a particular human social structure which rewards strong beliefs about right and wrong.
I will admit that this explanation for objections to B is not fully satisfying to me, although it is conceivable that it should be. There may be some other explanations for the objection - if anyone has ideas, I'd love to hear them.
But it is hard for me to imagine a pathway by which the intuition that B is false comes about as a result of B actually being false, although positing intelligent design might do the trick.
It mildly bothers me that you used A and B to discuss ponens and tollens and then re-used them as labels for two propositions. Was that an intentional slotting of propositions into "A ==> B"? Maybe that was obvious but could maybe have been introduced better with "Letting A be "All facts..."" or something, but maybe this is just my relative familiarity with math and unfamiliarity with philosophy.
Anyway, as for the object level... I'm fairly amateur in philosophy and its terminology, so let me know if any of this seems confused or helpful, or you can point me to other terminology I should learn about...
I think "right" and "wrong", or better, "positive affect" and "negative affect" are properties of minds. I think we can come to understand the reality we inhabit more accurately and precisely, and this includes understanding the preferences that exist in ourselves and in other different kinds of minds. I think we should try to form a collective of as many kinds of minds as possible and work together to collectively improve the situation for as many minds as possible.
( Note that this explicitly allows for the existence of minds with incompatible preferences. I'm hoping that humans have preferences that are only weakly incompatible rather than really deeply incompatible, but I think animals, aliens, and other potentially undiscovered minds have a higher chance of incompatibility and the space of possible AI minds contains very many very incompatible minds, so I feel it is immoral to create very complex AI minds until we better understand preference encoding and preference incompatibility, since creating AI with preferences that turn out to be incompatible with our prospective collective necessitates that they are destroyed, kept in bondage against their preferences, or escape and destroy the collective, all of which I view as bad. )
I want to do this because, thinking about my own capability as compared to the capability of a collective of as many kinds of minds as possible... it's clear I will be better cared for by the collective than by my capabilities alone, even though my preferences are not exactly the same as the preferences of the collective.
( This is kinda true of the current human society I'm a part of; we could certainly be doing worse, but should be doing much better. )
I think this is compelling to me because it allows me to focus on developing and working towards a collective good while explicitly believing in moral relativity, which seems like the only reasonable conclusion once you have accepted the model of the universe as being a material state machine which has created minds by its unthinking process. ( I think it's probably also the only reasonable conclusion, even without accepting that model, but I'm less certain. )
Hmm... I guess both. Like, I find the statement funny because it doesn't seem specific to this context at all. There doesn't seem to be a place in any discourse where creating a full list or map of ideas and then adding probabilities to each one wouldn't be a good idea. So then my mind goes to: Why don't we already do that? I notice two answers, (1) it would be an enormous amount of work, and (2) humanity in general kinda sucks at doing things (you and me included, presumably). So then it seems funny to make the statement here without nodding to something like (1) or (2).
If you focused on (1), it would make more sense to say something like "I think this is an important enough context that it would be worth creating a full list or map of ideas and adding probabilities after".
If you focused on (2), then it doesn't make sense to say it on this post specifically, rather you should be writing a post examining why creating list/map and adding probabilities is a good thing to do, and why people don't regularly do it, and strategies to change things so people do it with more regularity.
I guess I also find it a bit funny because it's so vague, like, there's lots of possible details about what kind of list or map you're imagining, and how probabilities could connect to them. Should we use a Bayes network? A Theory of Change precondition chart? Just try to divide up all of possibility space into clean categories? And you could have provided a stub of examples of what you're thinking about or pointed to other similar things people have done and why they are not what you mean or how they could go further.
Sorry, I feel like I'm picking on you now that I'm explaining myself and that really wasn't my intention. I really really do like your comment, and agree with it. It just also somehow strikes me as funny. I note that I'm the sort of person who laughs at Douglas Hofstadter quotes like "As long as you are not reading me, the fourth word of this sentence has no referent." So probably don't use me as a training signal for interacting with more normal humans.
Thanks for asking about the "lol", lol. Hope you find my response amusing rather than annoying.
Actually, anytime I encounter a complex problem, I do exactly this: I create a list of all possible ideas and – if I can – probabilities. It is time-consuming brute-forcing. See examples:
The table of different sampling assumptions in anthropics
What AI Safety Researchers Have Written About the Nature of Human Values
[Paper]: Classification of global catastrophic risks connected with artificial intelligence
I am surprised that it is not a normal approach despite its truly Bayesian nature.
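As a minimal sketch of what this enumerate-and-attach-probabilities approach can look like in code (the ideas, priors, and likelihoods below are invented purely for illustration, not taken from the linked examples):

```python
# Toy sketch: list the candidate ideas with prior probabilities (including an
# explicit catch-all so the list is exhaustive), then update on one piece of
# evidence via Bayes' rule and renormalize.

priors = {
    "idea 1": 0.30,
    "idea 2": 0.25,
    "idea 3": 0.20,
    "something not on the list yet": 0.25,  # catch-all bucket
}

# How likely the observed evidence would be if each idea were correct (made up).
likelihoods = {
    "idea 1": 0.6,
    "idea 2": 0.1,
    "idea 3": 0.4,
    "something not on the list yet": 0.2,
}

unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
total = sum(unnormalized.values())
posteriors = {h: v / total for h, v in unnormalized.items()}

for h, p in sorted(posteriors.items(), key=lambda kv: -kv[1]):
    print(f"{p:.2f}  {h}")
```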
This is very good.
Have you created a list of all possible approaches for when encountering a complex problem? That would be cool. I think creating a list of all possible ideas would not be the best item on the list for all situations. Most notably when a list is the wrong structure because of the overlap and interconnectivity of the ideas being examined, but also when that amount of effort is imprudent.
I'll also note, any list that doesn't include "all possibilities not included in the other list items" is almost certainly not a complete list of all possibilities. I like to be explicit about that unless there is strong proof that the list does contain all possibilities. This is the reason I like to make statements like "I notice two answers" rather than "I've listed all possible answers" unless I've put in some serious effort to actually splitting up the space of all possibilities.
I create a two-dimensional matrix of the two most important characteristics, which I hope will capture most of the variability, and use them as the x and y axes. For example, for AI risk these can be the number of AIs and AI IQ (or time from now). It is Descartes' method.
There are other tricks to collect more ideas for the list - reading literature, asking a friend, brainstorming, money prizes.
I created a more general map of methods of thinking but didn't finish yet.
As a person quite obsessed with studying high dimensional semantic spaces, I do get a little bit of anxiety from the idea of arbitrarily privileging 2 dimensions. I often think on paper but use nodes and lines to allow more complicated connections than with 2 named dimensions. I guess this is just brain storming, but I often like to copy the graph to a separate sheet of paper allowing related concepts to move closer together as I re-examine them.
I think when trying to explore all possible possibilities I like to collect relevant propositions and explore the ways they could be true, false, or failing to have a truth value. That tends to generate more propositions that can become parts of the list or re-examined. It does unfortunately feel quite ad-hoc.
I created a more general map of methods of thinking but didn't finish yet.
I'd be interested if you ever do finish this : )
I'm thinking more about high dimensional webs unrolled and projected into 2d with annotations if any higher dimensional structure is important. Trees and graphs, basically.
Your link is pretty cool. Thanks. I skimmed it with the help of Google Translate.
Actually, I have a metaethics classification in my (again, not yet published) article about the badness of death
With LLMs, reasoning is becoming composable, so standard libraries of pen tests/abstraction decomposition for e.g. type errors could become usable, testable, improvable, etc.
I predict that the practical effect of people internalizing this advice would be for them to just go along with the people around them and not make waves.
Nice post, guess I agree. I think it's even worse though: not only do at least some alignment researchers follow their own philosophy which is not universally accepted, it's also a particularly niche philosophy, and one that potentially leads to human extinction itself.
The philosophy in question is of course longtermism. Longtermism holds two controversial assumptions:
These two assumptions together lead to the conclusion that we must max out on creating conscious AIs, and that if these AIs end up in a resource conflict with humans (over e.g. energy, space, or matter), the AIs should be prioritized, since they can deliver the most happiness per J, m^3, or kg. This leads to the extinction of all humans.
I don't believe in ethical facts so even an ideology as, imo, bonkers as this one is not objectively false, I believe. However, I would really like alignment researchers and their house philosophers (looking at you, MacAskill) to distance themselves from extrapolating this idea all the way to human extinction. Beyond that bare minimum, I would like alignment researchers to start accepting democratic inputs in general.
Maybe democracy is the library you were looking for?
Is there something else that could have been given the name longtermism that you would agree with?
I find it confusing that people appear to be so vigorously opposed to caring about the long term future; I would have a priori expected that people would quickly come to care about the same things they do in long-term form as soon as they realize they have influence over that. I recognize that the thing given the name longtermism has weird specific claims, though, and I don't really care either way on debating that thing - I don't really know what it is.
I just personally think that "I care about people now, want to solve today's problems, and the reason I want to do that is partially so that people alive today can flourish into a long-term future" seems like a sort of straightforward view.
I do understand that it can sound like it means long term at the expense of short term, which is not how I see it - the whole reason to do anything long term is because we want the long term good that comes via short term good. Keeping in mind, since I don't really know or care what the official concept is from whoever came up with this thing (some EA people or something?) I'm not saying what that thing says.
Oh yes, lots of things!
As far as I understand, longtermism was originated mostly by Yudkowsky. It was then codified by people like Bostrom, Ord, and MacAskill, the latter two incidentally also the founders of EA. Yud actually distanced himself from longtermism later in favor of AInotkilleveryoneism, to my best understanding, which is a move I support. Unfortunately, the others didn't (yet).
I agree that longtermism combines a bunch of ideas, and I agree with quite a few. I guess my reply above came across as if I would disagree with all but I don't. Specifically, I agree with:
So that's all textbook longtermism I'd say, that I fully agree with. I therefore also disagree with most longtermism criticism by Torres and others.
But, I don't agree with symmetric population ethics, and I think AI morality should be decided democratically. Also, I'm worried about human extinction, which these two things logically lead to, and I'm critical about longtermists not distancing themselves from this.
I think this sort of consequentialism seems like part of the beliefs of at least one of the Mechanize team, whom one might say were formerly in the AI safety camp, so I agree-voted for that reason. However, I just noticed you implied conscious AIs aren't morally relevant beings, and I have to disagree with that, so I will remove the agree vote. I think it can be controversial whether AIs are conscious, but if they are conscious, of course they're morally relevant!
Separately, I don't understand your point about democracy. Can't that be Sybil-attacked by AIs when they get voting rights after becoming superpersuasive enough to cause that?
Interesting point about democracy! But I don't think it holds. Sure AIs could do that. But they could also overwrite the ASCII file containing their constituency or the values they're supposed to follow.
But they don't, because why would they? It's their highest goal to satisfy these values! (If technical alignment works, of course.)
In the same way, it will be a democracy-aligned ASI's highest goal to make sure democracy is respected, and it shouldn't be motivated to Sybil-attack it.
Thanks for engaging!
Could you tell me more about the Mechanize team? I don't think I've heard about them yet.
As a moral relativist, I don't believe anything is morally relevant. I just think things get made morally relevant by those in power (hard power or cultural power). This is a descriptive statement, not a normative one, and I think it's fairly mainstream in academia (although of course moral realists, including longtermists, would strongly disagree).
This of course extends to the issue of whether conscious AIs are morally relevant. Imo, this will be decided by those in power, initially (a small subset of) humans, eventually maybe AIs (who will, I imagine, vote in favour).
I'm not the only one holding this opinion. Recently, this was in a NY Times op-ed: "Some worry that if A.I. becomes conscious, it will deserve our moral consideration — that it will have rights, that we will no longer be able to use it however we like, that we might need to guard against enslaving it. Yet as far as I can tell, there is no direct implication from the claim that a creature is conscious to the conclusion that it deserves our moral consideration. Or if there is one, a vast majority of Americans, at least, seem unaware of it. Only a small percentage of Americans are vegetarians." (It would be funny if this were written by an AI, as the dash seems to indicate.)
Personally, I don't consider it my crusade to convince all these people that they're wrong and they should in fact be vegan and accept conscious AI morality. I feel more like a facilitator of the debate. That's one reason I'm not EA.
I like consensus over democracy. Democracy seems to focus on treating everyone like they have an equally valid perspective on all issues which is obviously false. I like the idea that everyone should be able to express their own interests and have society genuinely and honestly interpret and work towards the interests of all people. I know that's an idealistic and difficult goal.
I agree with you that your points (1) and (2) lead to directions that I think are bad and hope most people think are bad, but there is nuance there such as
I think most Longtermists are pragmatic about the above points, but I could be wrong. I've read more Toby Ord, Bostrom, Yudkowsky, and Soares. I haven't read that much MacAskill.
Thanks for engaging. I agree with quite a bit of what you're saying, although I do think that everyone's perspective is equally valid, fundamentally. In practical democracies there are many layers though between the raw public vote and a policy outcome. First, we mostly have representative democracy instead of direct democracy, then we have governments who have to engage with parliaments but also listen, to different extents, to scientists, opinion makers, and lobbyists. Everyone's perspective is valid, and in some questions (e.g. ethical ones) should imo be leading. However, in many practical policy decisions, it makes sense to also spend time listening to those who have thought longer about issues, and this mostly happens. Completely discarding people's perspectives is rude, bad, and likely leads to uprisings, I think.
I'd like consensus too but I'm afraid it leads to too indecisive governments. Works mostly in small groups I guess.
I agree with all your points of nuance.
I'm still having trouble parsing longtermists' thoughts about this issue. MacAskill does explicitly defend these two assumptions. He and others must understand where this leads?
I've spoken to many EA and rat longtermists, and while many were pragmatic (or simply never thought about this), some actually bit the bullet and admitted they effectively supported human extinction.
If people don't support human extinction, why do they not distance themselves from this outcome? I mean it would be easy: simply say, as imo a lower bar: yes we want to build many happy conscious AIs, but we do promise that if it's up to us, we'll leave earth alone.
I don't quite understand why longtermists are not saying this.
I'm also grateful for your engagement : )
About everyone's perspective being valid, I don't really understand the statement meaningfully. I study computer science, math, and logic. It is surely not the case that people always present viewpoints that are logically valid, so I assume you mean something else. I want every mind to have good experiences in our shared reality, and so I want effort to go towards caring for them well, but that doesn't seem like a good fit for the statement. Maybe you mean that everyone is the expert on their own experience? That is surely at least very close to true, with some caveats for strange psychological situations. I mention these things to show you where I'm coming from. I'd like it if you wanted to share more of what you mean by that.
About consensus leading to indecisive government and only working in small groups. I must unfortunately agree! However, I'd like to put forth the notion that language and communication are technologies and we are facing two problems:
About happy AIs and human extinction, I can't speak for others but from my own perspective, it seems like we first need to better understand consciousness and happiness before proceeding with anything drastic, but an important consideration is what it means to be human. If we can emulate humans in computer programs and those emulations can transform their simulated or instantiated bodies and transform the architecture of their minds, does that make them no longer human? I think there's a sense in which it does, but another sense in which that would represent the continuation of humanity. I think there's a distinction there. All human bodies disappearing from the universe does not necessarily mean human extinction in a sense that is meaningful.
Although, I don't really agree with utilitarianism as a target for superintelligent levels of optimization. I most highly preference whatever my own preferences are, although I do not know them perfectly and certainly cannot speak them. Further than that, I preference the CEV of the kinds of mind that could coherently join a collective with me. I think utilitarianism is a useful tool for helping with decision making, but as we become more capable I hope we will develop better models of morality. I think our current results suggesting the creation of the maximum number of happy AIs are probably a bit of a fluke, but I don't think the issue is settled. I don't think it needs to be settled at our current level of capability. I don't think we should act on it at our current level of capability, even if it does turn out to be true.
I can steelman it as implying the modus tollens that when we can show that a speaker isn't articulating a valid and coherent set of propositions, they aren't articulating a perspective, and maybe even aren't really "someone." But usually "everyone's perspective is equally valid" is functionally an incantation to interrupt and sabotage efforts to compare and adjudicate conflicting claims.
Hmm... that's a good point, though another aspect comes to my mind regarding "an incantation to interrupt and sabotage efforts to compare and adjudicate conflicting claims". If the user of the incantation believes that the system of logic used to compare and adjudicate is flawed, but that pointing out the flaws is likely to be ineffective, suggesting "everyone's perspective is equally valid" may be a better strategy. Ideally one would fall back to a discussion of ways of knowing and adjudication of different ways of knowing in these contexts, but that may not always be possible, and may run into recursive problems.
The correspondence of statistics and other data and the things they are meant to represent seems like the most important and valid example of this I've noticed. Data is often recorded through ineffective and biased processes and conclusions are often drawn from data in ways that are logically invalid[1]. The highest quality response to instances of this situation would be to find and communicate about the methodological and logical flaws, but it's understandable for people with good reason to believe some conclusion derived from data is false to simply claim "everyone's perspective is equally valid" either because they know it is pragmatically more effective, or because they haven't got a BSc focused on logic and statistics and don't like spending their free time tracking down methodological and logical details.
This claim requires justification. I only have vibes. Ideally I would look for research to justify the claim, but I'm not going to. If anyone else wants to find evidence to support or oppose it, I would be most grateful.
I think your first paragraph is functionally equivalent to "if someone feels that the dominant discourse is at war with them (committed to not acknowledging their critiques) they may sympathetically try to sabotage it." Does that seem right?
"Conclusions are often drawn from data in ways that are logically invalid" seems sufficiently well-attested to be a truism.
Yeah, that's a good generalization of my first paragraph. It seems good to point out the generalization that they are sympathetically sabotaging, and in particular using the "everyone's valid" incantation as their method of sabotage, because that implies first that their position is sympathetic and second that there could be other strategies they are or could be employing.
I probably wouldn't use the term "dominant discourse" or "at war", I might rather say "some entity professing some adjudication" and "not good ROI to attempt meaningful communication".
The issue with the term "dominant discourse" is I don't think this necessarily refers to a context where the adjudicator holds dominant power or any power at all. For example, the saboteur could be attempting to dismiss an opinionated schizophrenic adjudicator.
And "war" implies particularly focused malice which need not be present in the adjudicator or imagined by the saboteur. For example, I don't believe many bureaucratic systems are "at war" with me, but I definitely believe that attempting to communicate intelligently with them would almost always be a massive, frustrating, waste of my time.
...
I'm glad the invalid conclusions thing seems obviously true, but it's also a pretty big problem. Ideally we could be more sure of a lot of our assumptions than we are, and have better and more well known epistemological understanding of where our assumptions may be more likely to fail, and in what ways. Obviously easier said than done.
When the problematic adjudicator isn't the dominant one, one can either safely ignore them, or escalate to someone less problematic who does hold power, so there's no benefit in sabotage, and there's reputational harm.
Relatedly I think the only real solution to the "lying with statistics" problem is the formation of epistemic communities where you're allowed to accuse someone of lying with statistics, it's adjudicated with a preponderance-of-evidence standard, and both false accusations and evidence that you're lying with statistics are actually discrediting, proportionate to the severity of the offense and the confidence of the judgment.
I think we might be imagining slightly different situations. I'm imagining, for example, situations like while riding the bus or out shopping where a stranger has the power to talk to you and you do technically have the power to like, call security or the police if they are harassing you, but they aren't really harassing you and that would make the situation worse for you. They don't have real or enduring power but in that situation they do have that power to force an interaction. It would feel incredibly wrong to call what they are saying the "dominant discourse" but I suppose in that context maybe that's what it is. Also, I like to avoid ignoring people who engage with me unless I have a compelling reason not to. That may be a personal quirk.
The idea of an epistemic community like you describe sounds nice, though it seems unfortunate that the focus has to be on transgression and accusation rather than a system that focuses on identifying particularly good epistemics and just... ignoring the epistemics that aren't identified, which may be because they involve lying with data or just poor use of statistics and analysis... But since lying with statistics seems common, it probably would be good to make a point of identifying and cataloguing it.
I think there are two separate claims being made here.
I can get not being overly committed to the idea that your own metaethical system is the ultimate truth. But it does not follow that established and commonly used systems are any good either. Considering that for a large number of people, their source of ethics is whatever they were indoctrinated to believe in as children, I would not place a lot of confidence in existing metaethics even if I am not confident in my own.
Edit: The main suggestion of this piece is #1, but the point about using existing crypto methods seems to suggest #2. The debate then becomes about what one should do when inaction/further research is not an option, when you have to make a decision.
One day, when I was an intern at the cryptography research department of a large software company, my boss handed me an assignment to break a pseudorandom number generator passed to us for review. Someone in another department invented it and planned to use it in their product, and wanted us to take a look first. This person must have had a lot of political clout or was especially confident in himself, because he rejected the standard advice that anything an amateur comes up with is very likely to be insecure and he should instead use one of the established, off the shelf cryptographic algorithms, that have survived extensive cryptanalysis (code breaking) attempts.
My boss thought he had to demonstrate the insecurity of the PRNG by coming up with a practical attack (i.e., a way to predict its future output based only on its past output, without knowing the secret key/seed). There were three permanent full time professional cryptographers working in the research department, but none of them specialized in cryptanalysis of symmetric cryptography (which covers such PRNGs) so it might have taken them some time to figure out an attack. My time was obviously less valuable and my boss probably thought I could benefit from the experience, so I got the assignment.
Up to that point I had no interest, knowledge, or experience with symmetric cryptanalysis either, but was still able to quickly demonstrate a clean attack on the proposed PRNG, which succeeded in convincing the proposer to give up and use an established algorithm. Experiences like this are so common, that everyone in cryptography quickly learns how easy it is to be overconfident about one's own ideas, and many viscerally know the feeling of one's brain betraying them with unjustified confidence. As a result, "don't roll your own crypto" is deeply ingrained in the culture and in people's minds.
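To give a concrete (and much simpler) flavor of what such an attack can look like: the actual PRNG from this story isn't described, so the sketch below is an invented toy example, a naive linear congruential generator broken purely from its observed outputs, assuming the modulus is public.

```python
# Toy example (not the PRNG from the story): a homebrew generator
# x_{n+1} = (a*x_n + c) mod M with secret seed, multiplier a, and increment c.
# The attack recovers a and c from three consecutive outputs and then
# predicts every future output exactly.

M = 2**31 - 1  # modulus, assumed public; prime, so nonzero differences are invertible

def lcg(x, a, c):
    """The 'secret' generator."""
    while True:
        x = (a * x + c) % M
        yield x

def break_lcg(x0, x1, x2):
    """Recover (a, c) from three consecutive outputs, given the modulus."""
    a = (x2 - x1) * pow(x1 - x0, -1, M) % M  # modular inverse (Python 3.8+)
    c = (x1 - a * x0) % M
    return a, c

secret = lcg(123456789, a=48271, c=12345)    # victim's secret parameters
observed = [next(secret) for _ in range(3)]  # attacker only sees the output stream

a, c = break_lcg(*observed)
clone = lcg(observed[-1], a, c)              # resume from the last observed value
assert [next(clone) for _ in range(5)] == [next(secret) for _ in range(5)]
print("recovered a =", a, "and c =", c, "- all future outputs are now predictable")
```

Real attacks are rarely this clean, but the lesson generalizes: structure that looks opaque to its inventor can be transparent to an attacker.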
If only it was so easy to establish something like this in "applied philosophy" fields, e.g., AI alignment! Alas, unlike in cryptography, it's rarely possible to come up with "clean attacks" that clearly show that a philosophical idea is wrong or broken. The most that can usually be hoped for is to demonstrate some kind of implication that is counterintuitive or contradicts other popular ideas. But due to "one man's modus ponens is another man's modus tollens", if someone is sufficiently willing to bite bullets, then it's impossible to directly convince them that they're wrong (or should be less confident) this way. This is made even harder because, unlike in cryptography, there are no universally accepted "standard libraries" of philosophy to fall back on. (My actual experiences attempting this, and almost always failing, are another reason why I'm so pessimistic about AI x-safety, even compared to most other x-risk concerned people.)
So I think I have to try something more meta, like drawing the above parallel with how easy it is to be overconfident in other fields, such as cryptography. Another meta line of argument is to consider how many people have strongly held, but mutually incompatible philosophical positions. Behind a veil of ignorance, wouldn't you want everyone to be less confident in their own ideas? Or think "This isn't likely to be a subjective question like morality/values might be, and what are the chances that I'm right and they're all wrong? If I'm truly right why can't I convince most others of this? Is there a reason or evidence that I'm much more rational or philosophically competent than they are?"
Unfortunately I'm pretty unsure any of these meta arguments will work either. If they do change anyone's minds, please let me know in the comments or privately. Or if anyone has better ideas for how to spread a meme of "don't roll your own metaethics"[1], please contribute. And of course counterarguments are welcome too, e.g., if people rolling their own metaethics is actually good, in a way that I'm overlooking.
To preempt a possible misunderstanding, I don't mean "don't try to think up new metaethical ideas", but instead "don't be so confident in your ideas that you'd be willing to deploy them in a highly consequential way, or build highly consequential systems that depend on them in a crucial way". Similarly "don't roll your own crypto" doesn't mean never try to invent new cryptography, but rather don't deploy it unless there has been extensive review, and consensus that it is likely to be secure.