This post seems to be a duplicate of this one originally posted in 2019. Right? Sorry if I'm confused.
Yes. I didn't intentionally post this; it seems to have been automatically crossposted from my blog (but I'm not sure why).
I'm open to deleting it, but there are already a bunch of comments; not sure what the best move is.
RSS feed imports are a bit finicky and usually only show the last N posts or so. Maybe you deleted an old post of yours, which caused it to show up in the RSS feed again, and then we imported it on the assumption that it's a new post. Though we do have some sanity-checks to not double-import posts, so I wonder what happened.
You're not wrong, but that's only the beginning. Metamathematics shows you can't achieve every desideratum for a formal proof system simultaneously. Similar, though less formal, arguments apply to epistemology.
My main objection to this view is, broadly speaking, that there is no canonical “idealised version” of a person.
Then the project of ethics becomes about this subproblem of identifying a close enough idealization/tempering of a person.
Value aggregation seems solvable with this component and not at all solvable without it.
Are my values taken from the state of mind in which I'm most agentic, the state of mind when my awareness is as expansive as it gets, the things I would decide after a hundred years' reflection, or the person I wish I was in spite of the one that I am? Or have I internalized the values of my community to such a deep extent that a portion of the decision as to who I will crystalize as lies with them? Where in our daily stream of desires do we sample: beginning, middle, or end? Should the (simulated) moral judgement interview room be brown or white or green? Should we judge in prospect or in retrospect? Alone, or with friends?
I think this may be the one component of the specification of alignment, or of a single human's desires, that is actually culturally specific; this may be the part of the job that humanities majors are supposed to do, if their art is real.
What's the logical and theoretical basis for the supposed existence of such a construct?
If there isn't, then the default assumption is that there's no such thing, much as we assume by default that there are no porcelain teapots orbiting the Sun.
I feel like you're asking me the logical and theoretical basis for the existence of feet: I don't know what must be going on in your head to ask such a question, why you would have a blindspot for such an obvious thing, so I don't know how to help.
You have not sensed that humans are often like agents, or agency-pursuing, or that they have consistent enough desires, or that they aren't attached to their inconsistencies, or that the inconsistencies they're most attached to could be formalized as a kind of discontinuity in a utility function; that people want to have passions that are taken seriously. I don't know what it means for a person to lack that sense.
You think the average person has, or believes in, a canonical “idealised version” of themselves in some form?
Did you forget that you're quoting Richard_Ngo, who has also expressed reservations along the same lines?
I don't want to burst your bubble, but the chances of your views being in the minority are not zero. Certainly far greater than the certitude of the reply would suggest.
Has, doesn't believe in, but would after the right series of conversations.
Quoting? I see the reservations. I'm trying to engage with them.
Not a single extrapolation, but an exploration of the parameter space of valid extrapolations, where the criteria for what counts as valid are part of what's being explored. E.g. if each step of each extrapolation needs to be self-authorizing (i.e. grant consent for its own existence), then that bounds a volume defined by all the paths that are so endorsed. One might then be tempted to ask which chasms are worth crossing. But I think that question is better considered from the point of view of the CEV of the endorsed volume first.
I would guess that many anti-realists are sympathetic to the arguments I’ve made above, but still believe that we can make morality precise without changing our meta-level intuitions much - for example, by grounding our ethical beliefs in what idealised versions of ourselves would agree with, after long reflection. My main objection to this view is, broadly speaking, that there is no canonical “idealised version” of a person, and different interpretations of that term could lead to a very wide range of ethical beliefs.
I agree with this ("there is no canonical 'idealized version' of a person...") but don't actually see how it is an objection to the proposed grounding method?
CEV is an extrapolation, and I think it's likely that there are multiple valid ways to do the extrapolation when starting from humans. A being that results from one possible extrapolation may find the existence of a being that results from a different extrapolation morally horrifying, or at least way lower utility than beings like itself.
But (by definition of CEV), they should all be broadly acceptable to the original thing that was first extrapolated. The extrapolation process will probably require deciding on some tough questions and making tradeoffs where the answers feel unacceptable or at least arbitrary and unsatisfying to the original. But they probably won't feel arbitrary to the extrapolated beings that result - each possible being will be self-consistently and reflectively satisfied with the particular choices that were made in its history.
Another way of looking at it: I expect CEV() to be a lossy many-to-many map, which is non-value-destroying only in the forwards direction. That is, humans can be mapped to many different possible extrapolated beings, and different possible extrapolated beings reverse-map back to many different possible kinds of humans. But actually applying the reverse mapping to an extant mind is likely to be a moral horror according to the values of a supermajority (or at least a large coalition) of all possible beings. Applying the forwards map slightly incorrectly, or possibly even at all, might be horrifying to a lot of possible minds as well, but I expect the ratio to be tiny. Among humans (or at least LWers) I expect people to be mostly OK with having CEV() applied to them, but absolutely not want CEV^-1() applied afterwards.
I interpret the quote to mean that there's no guarantee that the reflection process converges. Its attractor could be a large, possibly infinite, set of states rather than a single point.
I think that's possible, but I'm saying we can just pick one of the endpoints (or pick an arbitrary, potentially infinitely-long path towards an endpoint), and most people (the original people, and the people who result from that picking) will probably be fine with that, even if it involves making some tough and/or arbitrary choices along the way.
Or, if humans on reflection turn out to never want to make all of those choices, that's maybe also OK. But we probably need at least one person (or AI) to fully "grow up" into a coherent being, in order to actually do really big stuff, like putting up some guardrails in the universe.
That growing up process (which is hopefully causally descended from deliberate human action at some point far back enough) might involve making some arbitrary and tough choices in order to force it to converge in a reasonable length of time. But those choices seem worth making, because the guardrails are important, and an entity powerful enough to set them up is probably going to run into moral edge cases unavoidably. Better its behavior in those cases be decided by some deliberate process in humans, rather than left to some process even more arbitrary and morally unsatisfying.
CEV also has another problem that gets in the way of practically implementing it: it isn't embedded. At least in its current form, CEV doesn't have a way of accounting for side-effects (either physical or decision-theoretic) of the reflection process. When you have to deal with embeddedness, the distinction between reflection and action breaks down and you don't end up getting endpoints at all. At best, you can get a heuristic approximation.
I'd say your position holds even if someone is a moral realist because the problem of the criterion shows us that we cannot be certain about moral facts, just as we cannot be certain about any facts. To claim otherwise is to suppose unjustifiable special ability to know the truth, which is a position some people take but which, for anyone other than themselves, does nothing to resolve uncertainty.
Moral indefinability is the term I use for the idea that there is no ethical theory which provides acceptable solutions to all moral dilemmas, and which also has the theoretical virtues (such as simplicity, precision and non-arbitrariness) that we currently desire. I think this is an important and true perspective on ethics, and in this post will explain why I hold it, with the caveat that I'm focusing more on airing these ideas than constructing a watertight argument.
Here’s another way of explaining moral indefinability: let’s think of ethical theories as procedures which, in response to a moral claim, either endorse it, reject it, or do neither. Moral philosophy is an attempt to find the theory whose answers best match our intuitions about what answers ethical theories should give us (e.g. don’t cause unnecessary suffering), and whose procedure for generating answers best matches our meta-level intuitions about what ethical theories should look like (e.g. they should consistently apply impartial principles rather than using ad-hoc, selfish or random criteria). None of these desiderata are fixed in stone, though - in particular, we sometimes change our intuitions when it’s clear that the only theories which match those intuitions violate our meta-level intuitions. My claim is that eventually we will also need to change our meta-level intuitions in important ways, because it will become clear that the only theories which match them violate key object-level intuitions. In particular, this might lead us to accept theories which occasionally evince properties such as:
Of course, we're able to adjust our principles so that we are more satisfied with their performance on novel moral dilemmas. But I claim that in some cases this comes at the cost of those principles conflicting with the intuitions which make sense on the scales of our normal lives. And even when it's possible to avoid that, there may be many ways to make such adjustments whose relative merits are so divorced from our standard moral intuitions that we have no good reason to favour one over the other. I'll give some examples shortly.
A second reason to believe in moral indefinability is the fact that human concepts tend to have an open texture: there is often no unique "correct" way to rigorously define them. For example, we all know roughly what a table is, but it doesn’t seem like there’s an objective definition which gives us a sharp cutoff between tables and desks and benches and a chair that you eat off and a big flat rock on stilts. A less trivial example is our inability to rigorously define which entities qualify as being "alive": edge cases include viruses, fires, AIs and embryos. So when moral intuitions are based on these sorts of concepts, trying to come up with an exact definition is probably futile. This is particularly true when it comes to very complicated systems in which tiny details matter a lot to us - like human brains and minds. It seems implausible that we’ll ever discover precise criteria for when someone is experiencing contentment, or boredom, or many of the other experiences that we find morally significant.
I would guess that many anti-realists are sympathetic to the arguments I’ve made above, but still believe that we can make morality precise without changing our meta-level intuitions much - for example, by grounding our ethical beliefs in what idealised versions of ourselves would agree with, after long reflection. My main objection to this view is, broadly speaking, that there is no canonical “idealised version” of a person, and different interpretations of that term could lead to a very wide range of ethical beliefs. I explore this objection in much more detail in this post. (In fact, the more general idea that humans aren’t really “utility maximisers”, even approximately, is another good argument for moral indefinability.) And even if idealised reflection is a coherent concept, it simply passes the buck to your idealised self, who might then believe my arguments and decide to change their meta-level intuitions.
So what are some pairs of moral intuitions which might not be simultaneously satisfiable under our current meta-level intuitions? Here’s a non-exhaustive list - the general pattern being clashes between small-scale perspectives, large-scale perspectives, and the meta-level intuition that they should be determined by the same principles:
I suspect that many readers share my sense that it'll be very difficult to resolve all of the dilemmas above in a satisfactory way, but also have a meta-level intuition that they need to be resolved somehow, because it's important for moral theories to be definable. But perhaps at some point it's this very urge towards definability which will turn out to be the weakest link. I do take seriously Parfit's idea that secular ethics is still young, and there's much progress yet to be made, but I don't see any principled reason why we should be able to complete ethics, except by raising future generations without whichever moral intuitions are standing in the way of its completion (and isn't that a horrifying thought?). From an anti-realist perspective, I claim that perpetual indefinability would be better. That may be a little more difficult to swallow from a realist perspective, of course. My guess is that the core disagreement is whether moral claims are more like facts, or more like preferences or tastes - if the latter, moral indefinability would be analogous to the claim that there’s no (principled, simple, etc) theory which specifies exactly which foods I enjoy.
There are two more plausible candidates for moral indefinability which were the original inspiration for this post, and which I think are some of the most important examples:
The former may seem the obvious choice, until we take into account the problem of maximisation. Consider that a true (non-person-affecting) hedonic utilitarian would kill everyone who wasn't maximally happy if they could replace them with people who were (see here for a comprehensive discussion of this argument). And for any precise definition of welfare, they would search for edge cases where they could push it to extreme values. In fact, reasoning about a "true utilitarian" feels remarkably like reasoning about an unsafe AGI. I don't think that's a coincidence: psychologically, humans just aren't built to be maximisers, and so a true maximiser would be fundamentally adversarial. And yet many of us also have strong intuitions that there are some good things, that it's always better for there to be more good things, and that it’s best if there are the most good things.
How to reconcile these problems? My answer is that utilitarianism is pointing in the right direction, which is “lots of good things”, and in general we can move in that direction without moving maximally in that direction. What are those good things? I use a vague conception of welfare that balances preferences and hedonic experiences and some of my own parochial criteria - importantly, without feeling like it's necessary to find a perfect solution (although of course there will be ways in which my current position can be improved). In general, I think that we can often do well enough without solving fundamental moral issues - see, for example, this LessWrong post arguing that we’re unlikely to ever face the true repugnant dilemma, because of empirical facts about psychology.
To be clear, this still means that almost everyone should focus much more on utilitarian ideas, like the enormous value of the far future, because in order to reject those ideas it seems like we’d need to sacrifice important object- or meta-level moral intuitions to a much greater extent than I advocate above. We simply shouldn’t rely on the idea that such value is precisely definable, nor that we can ever identify an ethical theory which meets all the criteria we care about.