52 Can We Naturalize Moral Epistemology?

21st May 2025

7 min read

52

Epistemic status: shower thought quickly sketched, but I do have a PhD in this.

As we approach AGI and need to figure out what goals to give it we will need to find tractable ways to resolve moral disagreement. One of the most intractable moral disagreements is between the moral realists and the moral antirealists.

There's an oversimplified view of this disagreement that goes:

If you're a moral realist, you want to align AGI to the best moral-epistemic deliberative processes you can find to figure out what is right
If you're a moral antirealist and you're a unilateralist, you want to stage a coup and tile the world with your values
If you're a moral antirealist and you're cooperative, you want to align AGI to a democratic process that locks in whatever values people have today forever

This oversimplified picture gets a lot of play, but it turns moral disagreement into a wholly intractable process of philosophical methodology which we haven't been able to progress beyond Plato's Euthyphro in the last 2400 years. The most important lesson from the positivists and the history of science is that the way you make progress on philosophical disagreements is to make them empirically tractable.

The moral realist / antirealist debate really runs together two distinct questions:

Is there a REAL TRUTH about morality? (An intractable "external" question)
Empirically, what moral values would humans accept if we each went through a deliberative process of encountering lots of arguments and updating our moral values in the direction that aligns with our (broadly construed) theoretical preferences? (A tractable "internal" question)

To (1), the moral realist answers "yes" and the moral antirealist says "I don't know what you're talking about, can you please make this clearer?"

But (1) is practically inert. It doesn't make a difference to what practical actions we take, just whether we baptize them with capital letters.

(2) is empirically tractable, practically relevant, and something that realists and antirealists could in principle agree upon. For example:

An antirealist Kantian and a naturalist realist could agree that all humans will converge on the same moral ideals at the end of a certain kind of reflection, but disagree about whether this is the REAL TRUTH.
An antirealist Humean and a non-naturalist realist could agree that, empirically, human values are way too incommensurable to achieve any kind of convergence, but the realist thinks that one set is still REALLY TRUE and the Humean disagrees.

If we set aside the external question, we can arrive at a set of value-neutral, philosophically ecumenical empirical hypotheses about moral epistemology that allow us to make tractable progress on what we should align AGI to without having to make any progress on the realism vs antirealism debate whatsoever. You can just Taboo Your Words.

Moral epistemology naturalized

Here are some empirical psychological hypotheses we could consider, building on one another:

MORAL REASONING IS REAL: Are humans such that they can be brought through a series of arguments and endorse all of the premises in the arguments and end up with radically different moral views from where they started from that they are satisfied with?

If you think that, empirically, humans tend to intrinsically value things in state space, then you'll think no. If you think that, empirically, humans tend to intrinsically value actioning deliberative processes, you'll think yes.

FAULTLESS CONVERGENCE IS POSSIBLE: Could we find such a series of arguments that is convincing to everyone, such that we all arrive in the same place?

If you think that, empirically, we all share enough of the same values about deliberative processes, then you'll think yes. If you think that at least some of us don't share those values about deliberative processes, you'll think no.

UNIQUENESS: Is there one unique such series of arguments? (As opposed to the view that there is at least one series of arguments we would all agree on, but also other series of arguments that we would happily accept but would make us diverge — a kind of non-uniqueness thesis.)

If you think that the arguments humans would find acceptable would, empirically, have only one direction and not have multiple stable equilibria, then you'll accept Uniqueness. Otherwise you'll accept Non-Uniqueness.

SEMI-UNIQUENESS: If non-uniqueness, is there a unique series of arguments that would maximally satisfy everyone's preferences over theoretical choices, broadly construed?

If you think that there are multiple stable equilibria for human moral reasoning, some of these paths have a higher degree of theoretical preference satisfaction, and one of these paths has the highest degree of theoretical preference satisfaction for absolutely everyone, you'll accept Semi-Uniqueness. (This is fairly value-laden, and we'd need to be more precise about what "theoretical preference satisfaction" amounts to and how to aggregate it if we wanted to make this empirically tractable.)

UNIFICATION: Can this set of arguments be described coherently as a unified "process" that we could understand and write down, or is it merely an incoherent hodge podge of ideas?

This is again fairly value-laden, and we'd need to be more precise to make this an empirically tractable question.

I'm not necessarily saying that these are highly tractable questions, but (made suitably precise) they are questions that have empirical answers we could find out with a sufficiently advanced study of empirical psychology, and they are the kind of hypotheses that we can update on based on empirical data, unlike realism and antirealism. This also makes them the kinds of questions we know how to solve, and could solve with the help of AI, unlike the external question whether morality is REALLY TRUE.

Implications

Depending on the choice points you take, you'll adopt different views on what we exact process we should align AGI to. For example:

If you reject Moral Reasoning is Real, so you think people value things rather than deliberative processes, and you also think people value different things, and you also want to cooperate with other humans, then you'll want to find an alignment procedure that gets all humans as much of what they want as possible. Something like an idealized form of quadratic voting that fully captures the cardinality in people's moral preferences.
If you accept Moral Reasoning is Real, but reject all of the other principles, then you'll at least want deliberative processes to be part of the thing we align AGI to. The best version of this might be an idealized form of quadratic voting that is over not only states of the world but also ways to reason about value. Or perhaps you'll want to use AGI to help people speed-run their own moral reasoning processes before doing (1) above, depending on which is more tractable.
If you accept that Moral Reasoning is Real and Faultless Convergence is Possible but reject the other claims, then, depending on the other details, you might think that we should just go ahead and converge to one of the sets of moral rules that we'd all happily converge to, since this would massively ease coordination. Or if this would lead to too much loss in people's theoretical preference loss landscape, you might go back to (2).
If you accept everything above (including either Uniqueness or Semi-Uniqueness), you'll want to use AGI to find the one unique deliberative process that all of us are going to find we like the best.
If you accept everything above (including either Uniqueness or Semi-Uniqueness) but you reject Unification, then you'll want to do (4), but you'll think this is a much messier and more complicated process than finding a single well-described way of reasoning about morality, which will have implications for how you go about it.

How to reject my view

The main reason I could see someone rejecting my view goes as follows: "man, moral epistemology is so deeply pre-paradigmatic that all we can do is wander around in the wilderness until we figure out what's going on; it's like if medieval peasants tried to think about physics from the armchair!"

If you have that view, then you're not going to want to hang your hat on a particular way of carving up the moral-epistemic landscape and you'll probably be skeptical of my attempt to naturalize the question. But this is still a broadly empirical view that is susceptible to broadly empirical evidence from the history of science and successful theorizing — just look at the analogy you made!

Moreover, even with a view this despairing, there may well still be tractable, action-guiding implications for what you should align AGI to! If you think morality is this pre-paradigmatic, you might worry that we use AGI to lock in the wrong reasoning processes before we have the requisite million years to wander around in the wilderness and gaze long into the Abyss and stumble around in darkness for the right way to answer the question. And then perhaps your best bet is to try to create emulations of all human brains and run them very quickly to speed-run the trip through the desert. That's what I would do if I was this confused, and it seems like a very robust process regardless of your views on all of the above.

I suppose a different sort of response I might hear from a moral realist is "I don't think people would converge to my view on ideal reflection, but they should because it's right." But if there is no broadly convincing argument for you position then it's hard to see how you are really doing something different from the antirealist who wants to stage a coup.

ETA: I should also add I've heard people express views like "p(moral nihilism | ~moral realism) ≈ 1, so we should act on the assumption that we'll converge to the right thing. I wholeheartedly reject this view, but:

If people are assuming this I really wish they would state this explicitly since otherwise it's so hard to model what you actually believe is true. Your entire account is conditionalizing on a Pascal's Wager!
This doesn't get you out of placing your bets on what kind of moral updating process is going to be the one to lead to convergence! So you're still going to have to commit to some empirical predictions, conditional on moral realism! Otherwise anything could be licensed, including erasing all human moral knowledge, which presumably you don't think. But I never hear these predictions made explicitly.

Appreciate the comments so far, looking forward to engaging and precissifying this as I do!

Ethics & MoralityAI

Frontpage

52

New Comment

22 comments, sorted by

top scoring

Click to highlight new comments since: Today at 3:04 AM

[-]Satya Benson7mo80

What's the state of existing empirical evidence on whether Moral Reasoning is Real?

My own observations tell me that it is not. Certainly, some people engage in moral reasoning and are satisfied with their results to varying degrees, but it appears to me that this is a small proportion of humans.

My preliminary investigation into the research confirms my existing belief that most moral reasoning is post-hoc, and that while human values can change it is almost never due to reasoned arguments and instead a social and emotional process. When moral reasoning seems to work, endorsement is often shallow and attitudes can revert within days.

I am frequently reminded that I underestimate the degree to which my own view on this is not universally held, however.

[-]TAG7mo40

My own observations tell me that it is not. Certainly, some people engage in moral reasoning and are satisfied with their results to varying degrees, but it appears to me that this is a small proportion of humans

Something can be real and scarce.

[-]tylermjohn7mo41

I wanted to answer as OP, but I don't have a particularly informed view on this. And my view is biased by having spent a lot of time around moral philosophers who are hugely, hugely disproportionately likely to fall into the "moral reasoning is real" camp — and by having evolved from conservative Evangelical ethics to classical utilitarianism!

One fairly robust observation in psychology is that people do feel pressure to engage in moral consistency reasoning. One example of this comes from the literature on the Meat Eating Paradox. Let me reproduce an overview from pp. 573-574 here:

In a series of five studies, Brock Bastian and colleagues have demonstrated a link between seeing animals as food, on one hand, and seeing animals as having diminished mental lives and moral value, on the other hand. We will here describe three.
In a first study, participants were asked to rate the degree to which each of a diverse group of thirty-two animals possessed ten mental capacities, and then were asked how likely they would be to eat the animal and how wrong they believe eating that animal is. Perceived edibility was negatively associated with mind possession (r = –.42, p < .001), which was in turn associated with how the perceived wrongness of eating the animal (r = .80, p < .001).
In a second study, participants were asked to eat dried beef or dried nuts and then judge a cow’s cognitive abilities and desert of moral treatment on two seven-point scales. Participants in the beef condition (M = 5.57) viewed the cow as significantly less deserving of moral concern than those in the control condition (M = 6.08).
In a third study, participants were informed about Papua New Guinea’s tree kangaroo and informed variably that tree kangaroos have a steady population, that they are killed by storms, that they are killed for food, or that they are foraged for food. Bastian and colleagues found that categorizing tree kangaroos as food and no other features of these cases led participants to attribute less capacity for suffering and less moral concern.
Additionally, a sequence of five studies from Jonas Kunst and Sigrid Hohle demonstrates that processing meat, beheading a whole roasted pig, watching a meat advertisement without a live animal versus one with a live animal, describing meat production as “harvesting” versus “killing” or “slaughtering,” and describing meat as “beef/pork” rather than “cow/pig” all decreased empathy for the animal in question and, in several cases, significantly increased willingness to eat meat rather than an alternative vegetarian dish.
Psychologists involved in these and several other studies believe that these phenomena occur because people recognize an incongruity between eating animals and seeing them as beings with mental life and moral status, so they are motivated to resolve this cognitive dissonance by lowering their estimation of animal sentience and moral status.

I think studies like these at least imply that people are driven to resolve local tensions in their moral intuitions. (E.g. tensions between views about moral status, consciousness, right action, and belief about one's own virtues.) How often would resolving these tensions lead to radical departures in moral views? I'm not sure! It seems to mostly depend on how much psychological pressure this creates and how interconnected people's web of moral intuitions are. In people with highly interconnected moral webs and a lot of psychological pressure towards consistency, you'd expect much bigger changes.

[-]RogerDearnaley7mo10

Evolutionary ethics is a real, scientific field of biology — it makes actual, falsifiable predictions. However, that's not what most people think of when they say "moral reasoning".

[-]Satya Benson7mo101

Evolutionary ethics aims to help people understand why we value the things we do. It doesn't have the ability to say anything about what we ought to value.

[-]RogerDearnaley7mo*30

Evolutionary ethics provides a solution to the "ought-from-is" problem — in a cold uncaring universe governed by physical laws, where does the preference ordering/utility function of human values come from? That is a question about humans, and evolutionary ethics is the name of the scientific field that studies and answers it.

In order to decide "what we ought to value", you need to create a preference ordering on moral systems, to show that one is better than another. You can't use a moral system to do that — any moral system (that isn't actually internally inconsistent) automatically prefers itself to all other moral systems, so using a moral system to select a moral system is just a circular argument — the same logic applies to any moral system you plug in to such an argument.

So to discuss "what we ought to value" you need to judge moral systems and their consequences using something that is both vaguer and more practical than a moral system. Such as psychology, or sociology, or political expedience, or some combination of these. All of which occur in the context of human nature and human moral intuitions and instincts — which is exactly what evolutionary ethics studies and provides a theoretical framework to explain.

[-]Satya Benson7mo91

Thanks for explaining.

So to discuss "what we ought to value" you need to judge moral systems and their consequences using something that is both vaguer and more practical than a moral system. Such as psychology, or sociology, or political expedience, or some combination of these.

I think this is tempting but ultimately misguided, because the choice of a 'more practical and vague' system by which to judge moral systems is just a second order moral system in itself which happens to be practical and vague. This is metanormative regress.

The only coherent solution to the "ought-from-is" problem I've come across is normative eliminativism - 'ought' statements are either false or a special type of descriptive statement.

[-]RogerDearnaley7mo10

I encourage you to look into evolutionary ethics (and evolutionary psychology in general): I think it provides both a single, well-defined (though vague) ethical foundation and an answer to the "ought-from-is" problem. It's a branch of science, rather than philosophy, so we are able to do better than just agreeing to disagree.

[-]Said Achmiz7mo113

I’ve looked into these things, and as far as I can tell, all such fields or theories either do not attempt to solve the is-ought problem (as e.g. evo psych does not), or attempt to do so but (absolutely unsurprising) completely fail.

What am I missing? What’s the answer?

[-]RogerDearnaley7mo20

Humans are living, evolved agents. They thus each individually have a set of goals they attempt to optimize: a preference ordering on possible outcomes. Evolution predicts that, inside the distribution the creature evolved in, this preference ordering will be nearly as well aligned to the creature's evolutionary fitness as is computationally feasible for the creature.

This is the first step in ought-from-is: it gives us a preference ordering, which if approximately coherent (i.e. not significantly Dutch-bookable — something evolution seems likely to encourage) implies an approximate utility function — a separate one for each human (or other animal). As in "this is what I want (for good evolutionary reasons)". So, using agent fundamentals terminology, the answer to the ought-from-is question "where does the preference ordering on states of the world come from?" is "every evolved intelligent animal is going to have a set of evolved and learned behaviors that can be thought of as encoding a preference ordering (albeit one that may not be completely coherent, to the extent that it only approximately fulfills the criteria for the coherence theorems). " [It even gives us a scale on the utility function, something a preference ordering doesn't give us, in terms of the approximat effect on the evolutionary fitness of the organism: which ought correlate fairly well with the effort the organism is willing to put in to optimizing the outcome. This solves things like the utility monster problem.]

So far, that's just Darwinism, or arguably the subfield Evolutionary Psychology, since it's about the evolution of behavior. And so far the preference ordering "ought" is "what I want" rather than an ethical system, so arguably doesn't yet deserve the term "ought" — I want to have a billion dollars, but saying that I thus "ought" to have a billion dollars is a bit of a stretch linguistically. Arguably so far we've only solved "want-from-is".

Evolutionary Ethics goes on to explain why humans, as an intelligent social animal, are evolved to have a set of moral instincts that lets them form a set of conventions for compromises between the preference orderings of all the individual members of a tribe or other society of humans, in order to reduce intra-group conflicts by forming a "social compact" (to modify Hobbes' terminology slightly). For example, the human sense of fairness encourages sharing of food from successful hunting or gathering expeditions, our habit of forming friendships produces alliances, and so forth. The results of this are not exactly a single coherent preference ordering on all outcomes for the society in question , let alone a utility function, more a set of heuristics on how the preference orderings of individual tribal members should be reconciled ('should' is here being used in the sense that, if you don't do this and other members of the society find out, there are likely to be consequences). In general, members of the society are free to optimize whatever their own individual preferences are, unless this significantly decreases the well-being (evolutionary fitness) of other members of the society. My business is mine, until it intrudes on someone else: but then we need to compromise.

So now we have a single socially agreed "ought" per society — albeit one fuzzier and with rather more internal structure than people generally encode into utility functions: it's a preference ordering produced by a process whose inputs are many preference orderings, (and might thus be less coherent). This moral system will be shaped both by humans' evolved moral instincts (which are mostly shared across members of our species, albeit less so by socipaths), as is predicted by evolutionary ethics, and also by sociological, historical and political processes.

So, in philosophical terminology:

moral realism: no (However, human evolved moral instincts do tend to provide some simple consistent moral patterns across human societies, as long as you qualify all your moral statements with the rider "For humans, …". So one could argue for a sort of 'semi-realism' for some simple moral statements, like "incest is bad" — that has a pretty clear evolutionary basis, and is pretty close to socially universal.)
moral relativism: yes — per society, and for some basic patterns/elements for the entire human species, but with no guarantees that these would apply to a very different intelligent social species (though there might well be commonalities for good evolutionary reasons — anything with sexual reproduction and deleterious recessives is likely to evolve an incest taboo.).

Given Said Achmiz's comment already has 11 upvotes and 2 agreement points, should I write a post explaining all this? I had thought it all rather obvious to anyone who looks into evolutionary ethics and thinks a bit about what this means for moral philosophy (as quite a number of moral philosophers have done), but perhaps not.

[-]Satya Benson7mo30

This comment does really help me understand what you're saying better. If you write a post expanding it, I would encourage you to address the following related points:

Can you have some members of a society who don't share some of the consistent moral patterns which evolved, or do you claim that every member reliably holds these morals?
Can someone decide what they ought to value using this system? How?
Is it wrong if someone simply doesn't care about what society values? Why?
How can we tell that your story tells us what we ought to value rather than simply explaining why we value the things we do?
Do you make a clear distinction between normative ethics and descriptive ethics? What is it?

[-]RogerDearnaley7mo*41

Thanks, I'll keep that in mind when deciding what to cover in the post when I write it.

Briefly for now, just to continue the discussion a bit:

Can you have some members of a society who don't share some of the consistent moral patterns which evolved, or do you claim that every member reliably holds these morals?
The former (sociopaths, for example, are genetically predisposed to be less moral, and it has often been suggested this behavior is an adapted form of social opportunism, in game theory terms a different strategy, perhaps one with a stable equilibrium frequency, rather than being simply a genetic disease) — though they may get punished or shunned as a result, if their morality is different in a way that other members of the society disapprove of.
Can someone decide what they ought to value using this system? How?
How a person wants to make decisions is up to them. Most people make these decisions in a way that is influenced by their own moral instincts, social pressures, their circumstances and upbringing, their personality, expedience, and so forth. Generally, acting contrary to your instincts and impulses is challenging to do and stressful — it's probably easier to go against them only when there's a clear rational need. For example, if you're rationally aware that they are maladaptive or antisocial in modern society.
Is it wrong if someone simply doesn't care about what society values? Why?
In the context of their society of humans, yes, it is considered wrong (in almost all societies). Note that this is a morally relative statement, not a morally realist one. However, simply not caring at all is pretty atypical behavior under human moral intuitions, and is generally also pretty maladaptive (unless, say, you have absolute power). So from an evolutionary ethics point of view, it seems likely to be maladaptive behavior that will often get you imprisoned, exiled or killed. So as relative statements go, this is a pretty strong one.
How can we tell that your story tells us what we ought to value rather than simply explaining why we value the things we do?
The point of evolutionary ethics is that there is no meaningful, uniquely defined, separate sense of "ought" much stronger than "according to most common moral systems for this particular social species, or most similar species". So the best you can do is explain why we, or most societies of a certain type, or most societies of a certain species, believe that that's something you "ought" to do. This approach isn't a form of moral realism.
Do you make a clear distinction between normative ethics and descriptive ethics? What is it?
Normative ethics describes my opinion about what I think people should do. Descriptive ethics describes what many people think people should do. In a society that has a social compact, the latter carries a lot more weight. However, I'm perfectly happy to discuss ethical system design: if we altered the ethics of our (or some other) society in a certain way, then the effects on the society would be this or that, which would or wouldn't tend to increase or decrease things like human flourishing (which is itself explained by evolutionary psychology). That sounds a lot like normative ethics, but there's a key difference: the discussion is based on a (hopefully mutually agreed) assessment of the relative merits of the predicted consequences, not "because I said so" or "because I heard God say so".

[-]Said Achmiz7mo30

Given Said Achmiz’s comment already has 11 upvotes and 2 agreement points, should I write a post explaining all this? I had thought it all rather obvious to anyone who looks into evolutionary ethics and thinks a bit about what this means for moral philosophy (as quite a number of moral philosophers have done), but perhaps not.

I’m afraid that what you’ve written here seems… confused, and riddled with gaps in reasoning, unjustified leaps, etc. I do encourage you to expand this into a post, though. In that case I will hold off on writing any detailed critical reply, since the full post will be a better place for it.

[-]RogerDearnaley7mo20

Fair enough — then I'll add that to my list of posts to write.

[-]Davidmanheim7mo50

Another question to ask, even assuming faultless convergence, related to uniqueness, is whether the process of updates has a endpoint at all.

That is, I could imagine that there exists series of arguments that would convince someone who believes X to believe Y, and a set that would convince someone who believes Y to believe X. If both of these sets of arguments are persuasive even after someone has changed their mind before, we have a cycle which is compatible with faultless convergence, but has no endpoint.

[-]Kaarel7mo*102

I'd rather your "that is" were a "for example". This is because:

It's also possible for the process of updates to not be getting arbitrarily close to any endpoint (with a notion of closeness that is imo appropriate for this context), without there being any sentence on which one keeps changing one's mind. If we're thinking of one's "ethical state of mind" as being given by the probabilities one assigns to some given countable collection of sentences, then here I'm saying that it can be reasonable to use a notion of convergence which is stronger than pointwise convergence. For math, if one just runs a naive proof search and assigns truth value 1 to proven sentences and 0 to disproven sentences, one could try to say this sequence of truth value assignments is converging to the assignment that gives 1 to all provable sentences and 0 to all disprovable sentences (and whatever the initialization assigns to all independent sentences, let's say), but I think that in our context of imagining some long reflection getting close to something in finite time, it's more reasonable to say that one isn't converging to anything in this example — it seems pretty intuitive to say that after any finite number of steps, one hasn't really made much progress toward this kinda-endpoint (after all, one will have proved only finitely many things, and one still has infinitely many more things left to prove). Bringing this a tad closer to ethical reality: we could perhaps imagine someone repeatedly realizing that projects they hadn't really considered before are worth working on, infinitely many times, with what they are up to thus changing [by a lot] [infinitely many times]. To spell out the connection of this to the math example a bit more: the common point is that novelty can appear in the sentences/things considered, so one can have novelty even if novelty doesn't keep showing up in how one relates to any given sentence/thing. I say more about these themes here.

[-]Davidmanheim7mo30

Thank you, that is a great point.

[-]Charlie Steiner7mo52

I feel sad that your hypotheses are almost entirely empirical, but seem like they include just enough metaethically-laden ideas that you have to go back to describing what you think people with different commitments might accept or reject.

My checklist:

Moral reasoning is real (or at least, the observables you gesture towards could indeed be observed, setting aside the interpretation of what humans are doing)

Faultless convergence is maybe possible (I'm not totally sure what observables you're imagining - is an "argument" allowed to be a system that interacts with its audience? If it's a book, do all people have to read the same sequence of words, or can the book be a choose your own adventure that tells differently-inclined readers to turn to different pages? Do arguments have to be short, or can they take years to finish, interspersed with real-life experiences?), but also I disagree with the connotation that this is good, that convergence via argument is the gold standard, that the connection between being changed by arguments and sharing values is solid rather than fluid.

No Uniqueness

No Semi-uniqueness

Therefore Unificiation is N/A

[-]tylermjohn7mo*41

I feel sad about this too. But this is common in impure scientific disciplines, e.g. medical studies often refer to value-laden concepts like proper functioning. The ideal would be to gradually naturalize all of this so we can talk to each other about observables without making any assumptions about interpretation of open-textured terminology. What I want to show here is primarily an existence proof that we can fully naturalize this discussion, but I haven't yet managed to do this.

I think this is a very good question about arguments. And I do think we will have to make value judgments about what kinds of moral deliberation processes we think are "good" otherwise we are merely making predictions about behaviour rather than proposing an approach to alignment. An end result I would like would be one where the moral realist and the antirealist can neutrally discuss empirical hypotheses about what kinds of arguments would lead to what kind of updating, and discuss this separately from the question of which kinds of updating we like. This would allow for a more nuanced conversation, where instead of saying "I'm a realist therefore keep the future open" or "I'm an antirealist therefore lock it down" we can say "Let's set aside capital letters and talk about what really motivated people in moral cognition. I think, empirically, this is how people reason morally and what people care about; personally, I want to make X intervention in the way people reason morally and would invite you to agree with me."

[-]WhatsTrueKittycat7mo24

This is very well put, and I think it drives at the heart of the matter very cleanly. It also jives with my own (limited) observations and half-formed ideas about how AI alignment also in some ways demands progress in ethical philosophy towards a genuinely universal and more empirical system of ethics.

Also, have you read C.S. Lewis' Abolition of Man, by chance? I am put strongly in mind of what he called the "Tao", a systematic (and universal) moral law of sorts, with some very interesting desiderata, such as being potentially tractable to empirical (or at least intersubjective) investigation, and having a (to my mind) fairly logical idea of how moral development could take place through such a system. It appears to me to be a decent outline of how your naturalized moral epistemology could be cashed out (though not necessarily the only way).

[-]Joseph Miller7mo*20

This is the best proposal I've read for making progress on the question of "What should we point the ASI at?"

Are humans such that they can be brought through a series of arguments and endorse all of the premises in the arguments and end up with radically different moral views

I don't feel quite satisfied with this operationalization. It could Goodharted (especially by an ASI) by finding adversarial inputs to humans.

I also think someone could decide that their meta-preference is to endorse their unconsidered preferences and that the act of thinking about moral arguments is actually against their morality. This may sound absurd but probably accurately describes many people's responses to drowning child thought experiments. Most people don't act as if they have utility functions and I don't see why people should be required to adopt one. To the extent that it matters a lot what people think about their preferences upon reflection, they will probably notice the inconsistencies in their preference once they start Dutch-booking themselves by manifesting inconsistent preferences with the help of ASI.

kinds of questions we know how to solve, and could solve with the help of AI

I predict that once people can talk to ASIs, many will quickly see the bizareness of the realist view and a lot more people will start saying "I don't know what you're talking about, can you please make this clearer?".

[-]RogerDearnaley7mo*-4-9

So many disagreements… Moral philosophy is not functionally useful: it just enumerates all the ways one could agree to disagree. It doesn't actually answer any questions, it just lists and categorizes the possible answers and arguments, including things like disagreeing whether a moral question even has an answer (that's Realism vs Antirealism for you).

Are there any more practical alternatives that might interest a rationalist? Yes, there are. I wrote an entire sequence about them. Rather than my trying to sketch them here, go read that, comment on and discuss it.

Moderation Log

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

52

Can We Naturalize Moral Epistemology?

52

Moral epistemology naturalized

Implications

How to reject my view

52