Crosspost from the EA Forum.
Why think about goalsets?
Societies need many distinct systems: a transport system, a school system, etc. These systems cannot be justified if they are amoral, so they must serve morality. Each system cannot, however, achieve the best moral outcome on its own: If your transport system doesn’t cure cancer, it probably isn’t doing everything you want; if it does cure cancer, it isn’t just a “transport” system. So each system must have a bounded domain and, ideally, still be morally optimal. To solve this problem, systems can be assigned goals, which are statements that we want the system to satisfy. Goals weakly constrain a system’s domain because if a system satisfies its goals, there is no more for it to do.
The problem, then, is deciding which goals to pursue. Goals should not be evaluated one at a time. Instead, we should evaluate the entire set of goals: the goalset. There are two reasons for this. (1) Any problem with a goal will manifest in every goalset that contains it, so we can rephrase any criticism of a goal in terms of goalsets. (2) Sometimes we have to look at the goalset as a whole before we can identify an issue. For example, suppose two identical twins, Robinson and Crusoe, are stranded on an island, and the best outcome occurs when either Robinson hunts for food and Crusoe builds a shelter, or vice versa. So “Crusoe builds a shelter” is a goal that fits in one ideal outcome, and “Robinson builds a shelter” fits in another. However, if their goalset contains both statements, the pair will starve (albeit in the comfort of an excellent shelter).
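The Robinson and Crusoe problem can be checked mechanically. This is a minimal sketch under assumptions that are not in the text: each ideal outcome is a plain set of statements, and a goalset fits an outcome when every goal is among them. The names `ideal_outcomes` and `fits_some_ideal_outcome` are hypothetical.

```python
# Toy model (assumption): an outcome is the set of statements true in it.
ideal_outcomes = [
    frozenset({"Robinson hunts", "Crusoe builds a shelter"}),
    frozenset({"Crusoe hunts", "Robinson builds a shelter"}),
]


def fits_some_ideal_outcome(goalset):
    """True if at least one ideal outcome contains every goal in the goalset."""
    return any(goalset <= outcome for outcome in ideal_outcomes)


# Each goal, judged on its own, fits an ideal outcome...
print(fits_some_ideal_outcome({"Crusoe builds a shelter"}))    # True
print(fits_some_ideal_outcome({"Robinson builds a shelter"}))  # True
# ...but the goalset containing both fits none of them.
print(fits_some_ideal_outcome({"Crusoe builds a shelter",
                               "Robinson builds a shelter"}))  # False
```

Judging each goal separately would pass both; only the goalset as a whole reveals the problem.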
Clearly, your goalset should be related in some way to your morality. But how, precisely? Can goalsets even be a component of making ideal systems? Does it make sense to talk about “ideal” systems? What does it mean for a system to “satisfy” a goal? What are the “consequences” of a goalset? What is “equivalence” between goalsets? We’ll define all of these terms from the same four objects, which allows us to (1) see how these concepts relate to each other, (2) generate theorems about policy arguments, and (3) provide stronger foundations for informal reasoning.
Some of the later conclusions
- Vague goals have no place within a goalset. However, we can satisfy concrete goals that are related to the vague goals.
- If goals are “narrow” when they are not decomposable into smaller goals, it is invalid to criticize a goal for being narrow.
- It’s invalid to criticize a goalset for not including its consequences.
- Evaluating the goals that a system satisfies is not sufficient to determine whether the system is ideal or not.
- Criticizing a system because it doesn’t satisfy additional goals beyond what it was assigned is invalid, unless the system is not fully determined by its assigned goals.
- Ideal systems don’t necessarily do the greatest good at the margin. In other words, they don’t necessarily do constrained maximization of an objective function.
- We can’t fully evaluate systems when we evaluate them individually.
- Avoid selecting arbitrary numbers for your systems: Try to select a system whose goals imply that it will determine your optimal numbers for you.
The base concepts
Timelines and forecasts
A timeline can be represented as the set containing all statements that are empirically true at all times and places within that timeline. So “[Unique identifier for you] is reading [unique identifier for this sentence]” would not be in the representation of the timeline that we exist in, because once you begin the next sentence, it’s no longer true. Instead, the statement would have to say something like “[Unique identifier for you] read [unique identifier for that sentence] during [the time period at which you read that sentence]”. For brevity's sake, the remaining example statements won't contain unique identifiers or time periods, as the precise meaning of the statements should be clear from the context.
The trolley problem makes you choose between two timelines ($t_1, t_2 \in T$, where $T$ is the set of all timelines). Our timeline representation allows us to neatly state whether something is true within a given timeline or not: “You pull the lever” $\in t_1$, and “You pull the lever” $\notin t_2$. Timelines contain statements that are combined as well as statements that are atomized. For example, since “You pull the lever”, “The five live”, and “The one dies” are all elements of $t_1$, you can string these into a larger statement that is also in $t_1$: “You pull the lever, and the five live, and the one dies”. Therefore, each timeline contains a very large statement that uniquely identifies it within any finite set of timelines (i.e. any finite subset of $T$). Timelines won’t be our unit of analysis because the statements they contain have no subjective empirical uncertainty.
This uncertainty can be incorporated by using forecasts ($f \in F$), each of which contains all statements that are either empirically true at all times and places or false at all times and places, except that each statement is amended with an associated credence. Though there is no uncertainty in the trolley problem, we could still represent it as a choice between two forecasts: $f_1$ guarantees $t_1$ (the pull-the-lever timeline) and $f_2$ guarantees $t_2$ (the no-action timeline). So $f_1$ would contain the statement “The five live with a credence of 1”. Since each timeline contains a statement that uniquely identifies it within a finite set of timelines, each forecast can roughly be thought of as a probability distribution over timelines. So the trolley problem reveals that you either morally prefer $f_1$ (denoted $f_1 \succ f_2$), prefer $f_2$ (denoted $f_2 \succ f_1$), or believe that both forecasts are morally equivalent (denoted $f_1 \sim f_2$).
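The representation above can be sketched in a few lines. Assumptions for illustration only: statements are plain strings without the unique identifiers and time periods described earlier, a timeline is a `frozenset` of statements, and a forecast uses the rough “probability distribution over timelines” reading, stored as a dictionary from timeline to credence. The helper `credence` is hypothetical.

```python
# Toy representation (assumption): a timeline is a frozenset of statements,
# and a forecast is treated as a probability distribution over timelines.
t_pull = frozenset({"You pull the lever", "The five live", "The one dies"})
t_none = frozenset({"The five die", "The one lives"})

f_pull = {t_pull: 1.0}  # f_1: guarantees the pull-the-lever timeline
f_none = {t_none: 1.0}  # f_2: guarantees the no-action timeline


def credence(forecast, statement):
    """Credence in a statement: total probability of the timelines containing it."""
    return sum(p for timeline, p in forecast.items() if statement in timeline)


print(credence(f_pull, "The five live"))       # 1.0
print(credence(f_none, "You pull the lever"))  # 0
```

An uncertain forecast is just a dictionary with several timelines, e.g. `{t_pull: 0.25, t_none: 0.75}`.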
Rather than evaluate every moral dilemma with your intuitions, you could think of a moral rule that would give a moral ranking over forecasts. A moral rule does this by selecting the best forecast (or forecasts) from any given set of forecasts. Let’s say Jane believes that pulling the lever (forecast $f_1$) is morally better. Her first guess at a moral rule is “More people are better than fewer”. This rule selects the pull-the-lever forecast as best, which fits with her intuition; so far, so good. She then imagines another dilemma: Choose a forecast with 1,000,000 happy people in perpetuity, or one with 1,000,001 miserable people in perpetuity. Her rule selects the miserable forecast. This does not fit with her intuition, so she discards her hypothesized moral rule and thinks of a new rule to test.
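Jane's test can be written out as a small check. Assumption: each forecast is summarised only by a population and a single per-person wellbeing level; the names are hypothetical, and `matches_intuition` simply compares the rule's selection with the forecast intuition picks as best.

```python
# Hypothetical forecasts, summarised by population and per-person wellbeing.
forecasts = {
    "happy": {"people": 1_000_000, "wellbeing": +1},
    "miserable": {"people": 1_000_001, "wellbeing": -1},
}


def more_people_is_better(options):
    """Jane's first guess: select the forecast(s) with the most people."""
    most = max(f["people"] for f in options.values())
    return {name for name, f in options.items() if f["people"] == most}


def matches_intuition(rule, options, intuitive_best):
    """True if the rule selects exactly the forecasts intuition picks as best."""
    return rule(options) == intuitive_best


print(more_people_is_better(forecasts))                                # {'miserable'}
print(matches_intuition(more_people_is_better, forecasts, {"happy"}))  # False
```

A failed check is the cue to discard the hypothesized rule, or to revisit the intuition, as Figure 1 describes.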
You might wonder, if moral intuitions arbitrate which moral rules are “correct”, why not just use moral intuition to evaluate forecasts? What’s the point of a moral rule? Unless we’re born with moral intuitions that are perfectly accurate, we’re going to have some priors that are wrong. Sometimes our moral priors are outright contradictory. For example, “Violence is never moral” and “Culture determines morality” contradict because of the possibility of violent cultures. Sometimes our stated justifications for our moral intuitions are contradictory, but those intuitions could be represented with a non-contradictory moral rule. Often, our intuitive ranking will be incomplete, which makes moral decisions difficult even if we could perfectly predict the consequences. Finding a moral rule gives us a way to correct poor moral intuitions, explain our good moral intuitions with more clarity, and guide us where our intuitions don’t exist.
Figure 1: Moral intuitions (the left set) help us guess a moral rule ($R$). Your moral rule implies a ranking of forecasts (the right set), which you compare to your moral intuitions. Whether you discard your rule or update your intuition depends on how well the rule fits your intuitions, and how adamant you are about your conflicting priors.
Only some moral theories have a moral rule. Some consequentialist moral rules, such as total utilitarianism, assign each forecast a number: The higher the number, the better the moral standing of the forecast. A deontological theory has a moral rule if it resolves all dilemmas where prohibitions and imperatives are traded off against each other, either by prioritizing them over one another (lexicographic preferences), weighting the importance of each prohibition and imperative (which effectively assigns each forecast a number), or by assigning any violation “infinite badness” (i.e. any forecast in which you break any prohibition or imperative is unacceptable, and any other forecast is acceptable). A moral rule can relate to the actions and thoughts of all people (which is typical of consequentialism), or only to your own actions and thoughts (which is typical of deontology, e.g. whether your nonviolence causes violence in others has no effect on your moral rule).
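Two of the deontological options above can be sketched as moral rules over a toy set of forecasts. Assumption: each forecast is summarised only by how many prohibitions and imperatives it violates; the forecast names and weights are hypothetical. Python compares tuples lexicographically, which gives the prioritising rule directly.

```python
# Hypothetical forecasts, each summarised only by
# (prohibitions violated, imperatives violated).
forecasts = {"A": (0, 3), "B": (1, 0), "C": (0, 2)}


def lexicographic_rule(options):
    """Prohibitions come first; imperatives only break ties.

    Python compares tuples lexicographically, so min() does the prioritising.
    """
    best = min(options.values())
    return {name for name, v in options.items() if v == best}


def weighted_rule(options, w_prohibition=2.0, w_imperative=1.0):
    """Weight each kind of violation, which assigns every forecast a number."""
    cost = {n: w_prohibition * p + w_imperative * i for n, (p, i) in options.items()}
    best = min(cost.values())
    return {n for n, c in cost.items() if c == best}


print(lexicographic_rule(forecasts))     # {'C'}
print(sorted(weighted_rule(forecasts)))  # ['B', 'C']
```

With these weights, one broken prohibition costs as much as two broken imperatives, so B ties with C; the lexicographic rule never allows that trade.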
Some arguments undermine a proposed moral rule with empirical facts. These arguments can show that a moral rule’s base concepts need to be better defined or are undefinable, such as the ideas of personal identity or causality. They can also show that a moral rule’s real-world application leads to ridiculous moral evaluations, such as the paralysis argument. Thus, fully analysing a moral rule requires us to have an empirical model.
Fitting the base concepts together with plans
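Your empirical model ($E$) is used to make a forecast that is conditional on you attempting a given plan ($p$, drawn from the set of all plans $P$):
$$E: P \to F, \qquad p \mapsto E(p)$$
Therefore, it can be used to determine the set of currently feasible forecasts ($F_{\text{feas}}$):
$$F_{\text{feas}} = \{E(p) : p \in P\}$$
Forecasts can be evaluated by your moral rule, which selects the best forecast (or forecasts) from any set of forecasts $X \subseteq F$:
$$R(X) \subseteq X$$
The set of ideal forecasts ($F_{\text{ideal}}$) is the subset of feasible forecasts that are morally at least as good as every other feasible forecast:
$$F_{\text{ideal}} = R(F_{\text{feas}}) = \{f \in F_{\text{feas}} : f \succeq f' \text{ for all } f' \in F_{\text{feas}}\}$$
where $f \succeq f'$ means that $f \succ f'$ or $f \sim f'$.
Note that an ideal forecast is, with the right plan, achievable: ideal forecasts are the best possible futures.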
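A minimal sketch of these definitions, assuming a finite set of plans so that $E$ can be a lookup table and $R$ can be represented by a numeric score (one of the consequentialist options mentioned earlier). The plan and forecast labels are hypothetical.

```python
# Toy model (assumption): E is a lookup table from plan to a named forecast,
# and R is represented by a numeric score, as in total utilitarianism.
E = {
    frozenset(): "status quo",
    frozenset({"rail"}): "faster commutes",
    frozenset({"rail", "buses"}): "faster, cheaper commutes",
}
score = {"status quo": 0, "faster commutes": 5, "faster, cheaper commutes": 8}


def R(options):
    """Select the best forecast(s) from any set of forecasts."""
    best = max(score[f] for f in options)
    return {f for f in options if score[f] == best}


F_feas = set(E.values())  # forecasts reachable by attempting some plan
F_ideal = R(F_feas)       # feasible forecasts at least as good as all others
print(F_ideal)            # {'faster, cheaper commutes'}
```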
Goalsets and their properties
If each of your goals is satisfied by its respective system, you have satisfied your goalset ($G$). This avoids any issues of assigning causality from a system to its goals. Goalsets are satisfied by the feasible forecasts that contain all of your goals:
$$S(G) = \{f \in F_{\text{feas}} : G \subseteq f\}$$
The problem, then, is to show that your goals “match” your moral rule.
Ideal goalset: $\emptyset \neq S(G) \subseteq F_{\text{ideal}}$. There exists a feasible forecast that satisfies your goalset, and all such forecasts are ideal.
Note that an ideal goalset is, by construction, possible.
The criterion for an ideal goalset shows us what we hope to achieve: Satisfying an ideal goalset will guarantee an ideal forecast.
Since we ultimately need to formulate a plan, the purpose of creating a goalset is to slice up an ideal forecast in such a way that each goal is solvable by a system, and provably so.
The problem with idealness as a criterion is that it can only differentiate between goalsets that have the best possible forecast and ones that don’t. It’s too restrictive: Rejecting a goalset because it isn’t ideal would be like rejecting a theorem because it doesn’t solve all of mathematics. We need a more realistic condition on which to judge policy arguments.
Aligned goalset: $S(G) \cap F_{\text{ideal}} \neq \emptyset$. There exists an ideal forecast that satisfies your goalset.
For each aligned goalset, there’s a corresponding superset that is ideal. That is, an aligned goalset can be turned into an ideal goalset by appending more goals—you don’t need to remove any. Thus, goals that belong to an aligned goalset are ideal, and the systems that satisfy those goals might be as well. You can determine whether your goalset is aligned by asking “Would systems in an ideal forecast satisfy this goalset?”
Feasible goalset: $S(G) \neq \emptyset$. At least one feasible forecast satisfies your goalset.
Feasibility is the weakest criterion by which to judge a goalset. And yet, it’s not always satisfied. Demanding an infeasible goalset is invalid since it’s an indiscriminate argument: Its users can criticize any possible policy, so it cannot be used to differentiate between our options, which is the very purpose of an argument. Thus, demonstrating a goalset is infeasible removes that goalset from our set of choices. An example of this is Arrow’s Impossibility Theorem. Vague goals, such as “Support veterans”, are also infeasible, because only predictions that can be empirically evaluated are contained within forecasts.
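The three goalset criteria can be checked on a toy model. Assumptions: there are finitely many feasible forecasts, each is a set of statements held with credence 1, and a numeric score stands in for the moral rule. `satisfying` mirrors $S(G)$; all other names and statements are hypothetical.

```python
# Toy model (assumption): three feasible forecasts, each a frozenset of
# statements held with credence 1, and a score standing in for the moral rule.
feasible = {
    "f1": frozenset({"trains run on time", "fares are low", "emissions fall"}),
    "f2": frozenset({"trains run on time", "fares are high"}),
    "f3": frozenset({"fares are low"}),
}
score = {"f1": 10, "f2": 6, "f3": 2}
ideal = {name for name in feasible if score[name] == max(score.values())}


def satisfying(goalset):
    """S(G): the feasible forecasts that contain every goal in G."""
    return {name for name, f in feasible.items() if goalset <= f}


def is_feasible(goalset):
    return bool(satisfying(goalset))


def is_aligned(goalset):
    return bool(satisfying(goalset) & ideal)


def is_ideal(goalset):
    s = satisfying(goalset)
    return bool(s) and s <= ideal


G = {"trains run on time"}
print(is_feasible(G), is_aligned(G), is_ideal(G))        # True True False
print(is_ideal(G | {"fares are low"}))                   # True
print(is_feasible({"fares are low", "fares are high"}))  # False
```

Appending “fares are low” turns the aligned goalset into an ideal one without removing anything.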
Equivalent goalsets: $G_1 \equiv G_2 \iff S(G_1) = S(G_2)$. Goalsets $G_1$ and $G_2$ are equivalent when a forecast satisfies $G_1$ if and only if it satisfies $G_2$.
An example of a pair of equivalent goals is “Person births are maximized” and “Person deaths are maximized”. They’re equivalent because every born person must die and every dead person must have previously been born. Of course, anyone who says they want to maximize human death would probably concern you quite a bit more, but that’s only because, by convention, goals don’t have negative terminal value. That convention isn’t actually necessary: an ideal goal can be an “undesirable” side effect. Because these two goals are equivalent, saying that one of them is ideal while the other is not is contradictory.
Equivalent goalsets can show us that criticizing a goal for “narrow scope” is invalid. Suppose that a goal is narrow if it can’t be decomposed into an equivalent set of narrower goals. Take any goalset composed of non-narrow goals. Each goal can be replaced with its decomposition to generate an equivalent goalset. This process can be repeated on the decomposed goals until they’re not decomposable, i.e. until they are narrow. Thus, any goalset has an equivalent goalset that contains only narrow goals. If criticizing the narrowness of a goal were valid, this equivalent goalset would be worse than the original. But equivalent goalsets are satisfied by exactly the same forecasts, so one can’t be worse than the other. Therefore, criticizing a goal for being narrow is invalid. Intuitively this makes sense: Systems can satisfy multiple goals, so it doesn’t matter whether or not a particular goal is narrow.
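The decomposition step can be illustrated with a toy model in which a non-narrow goal is a conjunction. Assumption: forecasts are sets of atomic statements, and a conjunction is satisfied when all of its conjuncts are present; the helper names are hypothetical.

```python
# Toy model (assumption): forecasts are sets of atomic statements; a goal is an
# atomic statement (str) or a conjunction written as a tuple of atoms.
forecasts = [
    frozenset({"A", "B", "C"}),
    frozenset({"A", "B"}),
    frozenset({"A"}),
    frozenset(),
]


def atoms(goal):
    return {goal} if isinstance(goal, str) else set(goal)


def satisfying(goalset):
    """Indices of the forecasts that satisfy every goal in the goalset."""
    return {i for i, f in enumerate(forecasts)
            if all(atoms(g) <= f for g in goalset)}


broad = {("A", "B")}  # one non-narrow goal: "A and B"
narrow = {"A", "B"}   # the same goal decomposed into two narrower goals
print(satisfying(broad) == satisfying(narrow))  # True: the goalsets are equivalent
```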
Consequences: $C(G) = \bigcap_{f \in S(G)} f$. The set of all consequences of $G$ is the set of all statements that are true in all forecasts that satisfy $G$.
Note that this includes statements of the form “$A$ or $B$”. So if “$A$” is true in some goalset-satisfying forecasts and “$B$” is true in the rest, but neither is true in all of them, then “$A$ or $B$” is a consequence, but neither “$A$” nor “$B$” is a consequence.
Goalsets losslessly compress their set of consequences: each of your goalset-satisfying forecasts contains the consequences of your goalset ($C(G) \subseteq f$ for all $f \in S(G)$), so amending your goalset with its consequences produces an equivalent goalset (i.e. $S(G') = S(G)$ for any $G'$ with $G \subseteq G' \subseteq G \cup C(G)$). Thus, it’s invalid to criticize a goalset for not containing some of its consequences, because those consequences are, effectively, already part of your goalset. (None of this implies that explaining consequences has no value, just that adding consequences to your goalset has no value.)
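A toy check of the consequence definition and of the compression claim. Assumptions: forecasts are finite sets of statement strings, and disjunctions such as “A or B” are listed explicitly rather than derived; the goal statement “X” and all names are hypothetical.

```python
# Toy model (assumption): each feasible forecast is a frozenset of statements,
# with disjunctions like "A or B" listed explicitly.
feasible = [
    frozenset({"X", "A", "A or B"}),
    frozenset({"X", "B", "A or B"}),
    frozenset({"B", "A or B"}),
]


def satisfying(goalset):
    """S(G): forecasts containing every goal in G."""
    return [f for f in feasible if goalset <= f]


def consequences(goalset):
    """C(G): statements true in every forecast that satisfies G.

    Simplification: an infeasible G gets an empty set here, although the
    definition vacuously makes every statement a consequence.
    """
    sat = satisfying(goalset)
    return frozenset.intersection(*sat) if sat else frozenset()


G = {"X"}
C = consequences(G)
print(sorted(C))                           # ['A or B', 'X']
print(satisfying(G | C) == satisfying(G))  # True: amending G with C(G) changes nothing
```

“A or B” is a consequence of `G`, while neither “A” nor “B” is.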
Any number of desirable consequences can follow from the satisfaction of a single goal. Therefore, a system that has only one goal is not necessarily “doing less” than a competing system with many goals.
Fixed-value goalset: $f \sim f'$ for all $f, f' \in S(G)$. All forecasts that satisfy your goalset are morally equivalent.
A fixed-value aligned goalset must be ideal, and vice versa. So, unless you have an ideal goalset, your aligned goalset can’t have a fixed moral value. For an extreme example, suppose your goalset is the empty set, which is satisfied by every feasible forecast, from the very best (thus an empty goalset is aligned) to the very worst. Since aligned goalsets can be satisfied by forecasts with such large moral differences, we shouldn’t be content with looking only at the goalset; we need to see how the goalset is satisfied. We need to look at the plan.
Figure 2: Goalset types and their relationships to each other.
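The empty-goalset example can be reproduced on a toy model in which a numeric score stands in for the moral rule and moral equivalence means equal scores (both assumptions). The forecast labels and statements are hypothetical.

```python
# Toy model (assumption): scored forecasts; equal scores mean moral equivalence.
feasible = {
    "best": frozenset({"x", "y"}),
    "worst": frozenset({"z"}),
}
score = {"best": 10, "worst": -10}
ideal = {n for n in feasible if score[n] == max(score.values())}


def satisfying(goalset):
    return {n for n, f in feasible.items() if goalset <= f}


def is_aligned(goalset):
    return bool(satisfying(goalset) & ideal)


def is_fixed_value(goalset):
    """All forecasts satisfying the goalset have the same moral value."""
    return len({score[n] for n in satisfying(goalset)}) <= 1


print(is_aligned(set()), is_fixed_value(set()))  # True False
print(is_aligned({"x"}), is_fixed_value({"x"}))  # True True (so {"x"} is ideal)
```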
Plans and their properties
Let $f_p$ be the forecast that results from attempting plan $p$:
$$f_p = E(p)$$
Valid plan: $f_p \in S(G)$. Your plan is valid if its forecast satisfies your goalset.
Valid plans must have a feasible goalset, because infeasible goalsets can’t be satisfied. But invalid plans can happen even with feasible goalsets. An example of this is the story of the cobra effect, where the exact opposite of the goal was “achieved”.
Ideal plan: $f_p \in F_{\text{ideal}}$. Attempting your plan would engender an ideal forecast.
Given a valid plan, an ideal goalset implies an ideal plan, but an ideal plan only implies an aligned goalset, not an ideal one. You could, for instance, have an ideal plan without any specific goals whatsoever (note that an empty goalset is always aligned, but it is ideal only in the degenerate case where every feasible forecast is ideal).
I think of a plan as a set of subplans (particularly where each subplan specifies how to implement and maintain a system). This allows us to define the set of forecasts where you attempt a superset of a given plan:
$$F^{\supseteq}(p) = \{E(q) : q \in P,\ q \supseteq p\}$$
Aligned plan: $F^{\supseteq}(p) \cap F_{\text{ideal}} \neq \emptyset$. At least one of the forecasts in which you attempt a superset of your plan is ideal.
If your plan is aligned, each of your subplans is ideal. You can determine whether your plan is aligned by asking “Would systems in an ideal forecast be exactly like this?”
Given a valid plan, an aligned plan implies an aligned goalset. But an aligned goalset does not imply an aligned plan: you can get what you asked for (your plan is valid), which is what you wanted (your goalset is aligned), while also getting something you definitely didn’t want (your plan is not aligned), because you never said that you didn’t want it. For example, you want a cup of tea and get it, but break a vase in the process.
Figure 3: Assuming a valid plan, an ideal goalset implies an ideal plan, which implies an aligned plan, which implies an aligned goalset, and all of these implications are one-way. The smallest ellipse that encapsulates a word is the set to which the word refers (e.g. “Ideal plan” refers to the second-smallest ellipse) and each ellipse is a sufficient condition for any ellipse that encapsulates it.
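The tea-and-vase example can be written as a toy model. Assumptions: plans are `frozenset`s of subplans, the empirical model is a lookup table `E` covering only the plans listed, and a numeric score (tea worth +1, a broken vase costing 2) stands in for the moral rule. All names and values are hypothetical.

```python
# Toy model (assumption): plans are frozensets of subplans, E covers only the
# plans listed, and a score stands in for the moral rule.
E = {
    frozenset(): frozenset(),
    frozenset({"make tea quickly"}): frozenset({"you have tea", "vase broken"}),
    frozenset({"make tea carefully"}): frozenset({"you have tea"}),
}


def score(forecast):
    # Hypothetical values: tea is worth +1, a broken vase costs 2.
    return int("you have tea" in forecast) - 2 * int("vase broken" in forecast)


feasible = set(E.values())
best = max(score(f) for f in feasible)
ideal = {f for f in feasible if score(f) == best}


def is_valid(plan, goalset):
    return goalset <= E[plan]


def is_aligned_goalset(goalset):
    return any(goalset <= f for f in ideal)


def is_aligned_plan(plan):
    """Some superset of the plan (within E) yields an ideal forecast."""
    return any(plan <= q and E[q] in ideal for q in E)


p = frozenset({"make tea quickly"})
G = {"you have tea"}
print(is_valid(p, G), is_aligned_goalset(G), is_aligned_plan(p))  # True True False
```

The plan is valid and the goalset is aligned, yet no superset of the plan in this table leads to an ideal forecast.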
Just as we shouldn’t evaluate goals individually, we shouldn’t evaluate systems individually. An error related to the Narrow Scope fallacy is demanding that a single system handle goals beyond those it was assigned. Let’s call it the “Kitchen Sink” fallacy. The earlier example of this fallacy was the transport system that cures cancer, which is obviously ridiculous. But people often miss the same error when systems are closely related. For example, a prison system doesn’t need to determine what the laws of a society should be, nor does it need to prevent the crimes of first-time offenders, nor does it need to determine the punishments for inmates: these goals can be better handled by other systems. Punishments, for instance, are better specified within the law than left to the ad hoc cruelty of the prison officers or the other inmates.
A related but valid criticism may show that amending the goalset with higher-priority goals produces an infeasible goalset.
Another example of the fallacy is demanding that every system fix inequality. Some people reject stable hospital-intern matching systems because “The best interns would go to the best hospitals, which are unaffordable to the poorest patients.” But equal health outcomes are not obviously incompatible with a stable hospital-intern matching system: You might be able to get stability and equality by, for instance, redistributing income via your taxation system. Whether that would work or not is beside the point: To show that inequality is a consequence, you must show that no possible system can achieve equality in the presence of a stable hospital-intern matching system. You should also show that equality, however you define it, is an ideal goal. The correct response to the fallacy is usually “Another system can handle that problem”.
The Kitchen Sink and Narrow Scope errors are not to be confused with the valid criticism that a system is not fully determined by its goals, i.e. the system is arbitrary. (One caveat is that the choice between two ideal forecasts is necessarily arbitrary, which is one reason why it’s probably good to have a highly discriminating moral rule, i.e. one that doesn’t rank many pairs of forecasts as morally equivalent.) For example, we might want a welfare payment to satisfy two goals: (1) it’s always in the interests of eligible customers to apply, and (2) more private income is always better. These goals can be represented mathematically, which allows us to show that certain payment types, like a max rate of payment with a hard income cut-off, fail them. However, these goals are not sufficient to determine exactly what the payment function should be, and some of the goalset-satisfying functions are clearly not ideal (e.g. “Pay everyone a million dollars each week”). Lots of different payment functions satisfy both goals, so choosing the winner from that set of functions can’t be based on those goals alone. The goals do not need to be changed; they’re aligned, so removing them can’t be beneficial, but we need more goals before the best payment function can be derived.
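The two welfare-payment goals can be checked numerically. This is a sketch, not the payment functions from the source: the maximum rate, cut-off, and taper are hypothetical numbers, goal (1) is interpreted as the payment never being negative (assuming applying is costless), goal (2) as total income strictly increasing with private income, and both are checked on a grid of whole-dollar incomes rather than proved.

```python
MAX_RATE = 500.0           # hypothetical weekly maximum payment
CUTOFF = 1000.0            # hypothetical hard income cut-off
TAPER = 0.5                # hypothetical withdrawal per dollar of private income
INCOMES = range(0, 3001)   # whole-dollar private incomes to check


def hard_cutoff(y):
    return MAX_RATE if y <= CUTOFF else 0.0


def tapered(y):
    return max(0.0, MAX_RATE - TAPER * y)


def million_a_week(y):
    return 1_000_000.0


def goal_1(pay):
    """Applying never leaves an eligible customer worse off."""
    return all(pay(y) >= 0 for y in INCOMES)


def goal_2(pay):
    """More private income always means more total income."""
    totals = [y + pay(y) for y in INCOMES]
    return all(a < b for a, b in zip(totals, totals[1:]))


for pay in (hard_cutoff, tapered, million_a_week):
    print(pay.__name__, goal_1(pay), goal_2(pay))
# hard_cutoff True False
# tapered True True
# million_a_week True True
```

The million-dollar payment passes both goals, which is the point: the goals are aligned but do not determine the function.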
Besides alignment, there are other properties of plans that we’re interested in. You don’t want a plan that can be improved by removing subplans from it: if you can get a better forecast by simply removing some subset of your plan, you should do so (at least temporarily). To state this property, we first define the set of forecasts where you attempt any subset of a given plan:
$$F^{\subseteq}(p) = \{E(q) : q \in P,\ q \subseteq p\}$$
Subset-dominant plan: $f_p \succeq f$ for all $f \in F^{\subseteq}(p)$. Your plan’s forecast is at least as good as all the forecasts where you attempt a subset of your plan.
In other words, each subset of your plan has a non-negative marginal contribution to your moral ranking of the plan’s forecast. Subset dominance is a necessary condition for an ideal plan. And it leads us to a symmetrical property: plans that can’t be improved by including more subplans.
Superset-dominant plan: $f_p \succeq f$ for all $f \in F^{\supseteq}(p)$. Your plan’s forecast is at least as good as all the forecasts where you attempt a superset of your plan.
Superset dominance is worth mentioning for two reasons: (1) it can be used to define idealness, i.e. a plan is ideal if and only if it is aligned and superset dominant, and (2) it can be used to define local optimality, i.e. a plan is locally optimal if and only if it is both superset and subset dominant, which is worth knowing so that you don’t confuse local optimality with idealness.
Locally optimal: $f_p \succeq f$ for all $f \in F^{\subseteq}(p) \cup F^{\supseteq}(p)$. Your plan’s forecast is at least as good as all the forecasts where you attempt a subset or a superset of your plan.
People often confuse ideal plans with locally optimal ones. Let’s suppose that longer prison sentences cause inmates to be more likely to reoffend. Should we decrease sentence lengths? Not necessarily. In an ideal forecast, prisons probably rehabilitate their inmates, in which case, maybe ideal forecasts have longer prison sentences. Ideal systems don’t necessarily do the greatest good at the margin under all circumstances.
Then what are goal and plan alignment for? Shouldn’t you just generate some plans and choose one that results in a forecast selected by your moral rule? Yes, but if an ideal system is not the best at the margin, you can expand the set of things you want to reform until the systems you’re considering only do the greatest good at the margin when they are ideal. This way, you don’t get stuck with locally optimal, non-ideal plans. Usually this isn’t a problem, since ideal systems tend to produce forecasts that your moral rule ranks higher. For example, suppose you’re implementing a retirement savings system. If you think “maximizing returns” is an ideal goal, then some system that maximizes returns will probably do the most good. But ideal policy options are not always available. There are problems that, so far, lack ideal solutions but have systems that seem to do well enough for the time being. So, I think it makes sense to divide your plan into a misaligned (but satisfactory) partition and an aligned partition that you amend over time.
The properties of plans can be thought of in terms of a moral landscape (termed by Sam Harris, under a different interpretation). Imagine a mountain range with many valleys and peaks. The higher the point on the territory, the better that place is. All peaks are locally optimal. Some peaks are ideal because they are second to none. Superset dominance means you can’t get higher by any combination of exclusively forward steps (i.e. amending subplans). Subset dominance means you can’t get higher by any combination of exclusively backward steps (i.e. removing subplans). Plan alignment means there’s some combination of exclusively forward steps that lead you to the highest peak.
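Establishable: $F^{\supseteq}(p) \cap F_{\text{ideal}} \neq \emptyset$ and $f_p \succeq f$ for all $f \in F^{\subseteq}(p)$. Your plan can become ideal by adding subplans without removing any, and cannot be improved by removing subplans without adding any.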
Figure 4: Relationships between plan types. (The smallest convex shape that encapsulates a word is the set to which the word refers. E.g. according to the diagram, “Aligned” must refer to the entire left circle, so ideal and establishable plans must also be aligned).
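The plan properties can be explored on a small, made-up moral landscape. Assumptions: there are only three subplans, every subset of them is an attemptable plan, and each plan’s forecast is summarised by a single hypothetical score. In this landscape, plan `{a}` is a local peak that is not ideal, while plan `{b}` is establishable.

```python
# Hypothetical moral scores for the forecast of each plan built from subplans
# "a", "b", and "c". frozenset("ab") is the plan {"a", "b"}.
score = {
    frozenset(): 0,
    frozenset("a"): 3, frozenset("b"): 1, frozenset("c"): 1,
    frozenset("ab"): 2, frozenset("ac"): 2, frozenset("bc"): 5,
    frozenset("abc"): 2,
}
plans = list(score)  # assumption: every subset of the subplans is attemptable
best = max(score.values())


def subset_dominant(p):
    return all(score[p] >= score[q] for q in plans if q <= p)


def superset_dominant(p):
    return all(score[p] >= score[q] for q in plans if q >= p)


def aligned(p):
    return any(score[q] == best for q in plans if q >= p)


def ideal(p):
    return score[p] == best


def locally_optimal(p):
    return subset_dominant(p) and superset_dominant(p)


def establishable(p):
    return aligned(p) and subset_dominant(p)


a, b = frozenset("a"), frozenset("b")
print(locally_optimal(a), ideal(a), aligned(a))  # True False False
print(establishable(b), ideal(b))                # True False
# Check the claim that a plan is ideal exactly when aligned and superset dominant.
print(all(ideal(p) == (aligned(p) and superset_dominant(p)) for p in plans))  # True
```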
The law of round numbers: 100 percent of the time, they’re wrong
Misaligned goals often contain numeric targets, for example aiming for rice to be between $3 and $4 per kilogram. One way to avoid picking numbers that are arbitrary (i.e. numbers that are not well justified) is to not select numbers at all. Instead, select a system whose goal implies that it will determine your optimal numbers for you. For example, rather than select a price for rice out of thin air, you could let a Pareto-efficient market determine prices (if you consider the goal of Pareto efficiency to be aligned). Note that markets have different prices for the exact same good at the same time (due to things like transportation costs), so having a goal for a universal rice price is the wrong approach from the outset. If a single number is the right approach, arbitrary numeric goals are not aligned when the goal’s specified range does not contain the ideal value. And even if you pick the right number upon the system’s implementation, the ideal value might move out of the range at a later time. Arbitrary numeric goals almost always seem to end in a zero or a five. People’s skepticism apparently disengages at the sight of a round number. For example, a call for a fixed 15 percent fee on all student loans resulted in zero challenges in the comment section. If they had proposed 14.97 percent instead, I imagine many of the comments would be asking where they got the specific number from. Of course, round numbers aren’t wrong 100 percent of the time. But when round numbers for systems are proposed, you’ll tend to find that it’s not clear why increasing or decreasing those numbers by a small amount would lead to a worse system. This means you have no reason to believe these numbers are ideal.
Goalsets are personal
A person’s goalset is for their moral rule, not for yours, not for mine, and not for some hypothetical representative of humanity. Each person has a different idea of what is best for society. So arguing that someone’s proposed system is not aligned with your moral rule won’t change that person’s mind (though it could convince others who share your moral rule). Your moral preferences can affect someone else’s policy choices, but not beyond what is already incorporated into their moral rule. The valid modification of this argument is to try to change someone’s moral rule by showing them implications of their rule that (you expect) they will find troubling.
Identifying and refuting unsound feasibility arguments
During my government work in the Farm Household Allowance team, I demonstrated that our welfare payment could avoid the problems of the old and new payment functions, which the rest of the team thought was infeasible. The old payment function failed to ensure it was always in the interests of eligible customers to apply; the new payment function, a max payment rate with a hard income cut-off, failed to ensure that customers would always be better off if they earned more private income. I represented these goals mathematically and showed both goals were satisfied by my proposed system. Someone outside the team said that “Maybe we also want to pay the max rate of payment for farmers with higher income, so that’s why we can’t satisfy both goals”, which is an infeasibility argument. Rather than go down the rabbit hole of moral argument, I simply showed there was a function that satisfied both goals while still paying the max rate to everyone who currently received it.
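One way to construct a function with the properties described, offered as a hedged sketch rather than the function actually proposed (the source does not give it): keep the maximum rate for every income up to the current cut-off, then withdraw the payment at a taper rate below 1. The rate, cut-off, and taper are hypothetical, and the goals are checked on an integer income grid rather than proved.

```python
MAX_RATE = 500.0  # hypothetical maximum payment
CUTOFF = 1000.0   # hypothetical income up to which the max rate is currently paid
TAPER = 0.5       # hypothetical withdrawal rate; any rate below 1 would do


def proposed(y):
    """Pay the max rate up to the old cut-off, then withdraw it gradually."""
    if y <= CUTOFF:
        return MAX_RATE
    return max(0.0, MAX_RATE - TAPER * (y - CUTOFF))


incomes = range(0, 4001)
totals = [y + proposed(y) for y in incomes]
keeps_max_rate = all(proposed(y) == MAX_RATE for y in incomes if y <= CUTOFF)
goal_1 = all(proposed(y) >= 0 for y in incomes)
goal_2 = all(a < b for a, b in zip(totals, totals[1:]))
print(keeps_max_rate, goal_1, goal_2)  # True True True
```

Any taper rate strictly between 0 and 1 keeps total income rising, because each extra dollar of private income removes less than a dollar of payment.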
You can, in a roundabout way, satisfy vague goals
For prisons, some people think that we should “Increase inmate wellbeing”. Okay, to what level? We probably shouldn’t maximize it, because it’s possible that doing so comes at the expense of total societal wellbeing. So how do we figure out the optimal amount of inmate wellbeing? What proxies do we have to measure it? The problem is that inmate wellbeing is hard to “goalify”. But this doesn’t mean we can’t have higher inmate wellbeing. A prison policy that maximizes the total societal contribution has better inmate wellbeing as a “consequence” for two reasons: (1) Societal contribution includes the crimes that occur within the prison, so prisons want to prevent their inmates from committing crimes against each other, and (2) societal contribution is probably maximized when former inmates integrate into society, so prisons want their inmates to learn new skills and to have any psychological issues treated. When you can’t properly express a goal, select another goal that should have the consequences you want.
Originally posted August 21, 2020 4:51 PM AEST.
Edited October 11, 2020 3:11 AM AEDT
Since our priors can be wrong, rather than say we prefer $f_1$ over $f_2$, we should be assigning credences to the three possibilities: $f_1 \succ f_2$, $f_2 \succ f_1$, or $f_1 \sim f_2$. This representation is equivalent to the Parliamentary Model if we gave a credence to every possible moral rule. Alternatively, you might want to have a distribution over cardinal differences in your moral valuations (e.g. “I give a 3.0 percent credence that $f_1$ is 1.4 times as good as $f_2$”).
As a side note, you could modify this set of tools to talk about goal alignment between multiple pre-existing agents. Start with a set of agents, $A$, and have an empirical function that takes all those agents’ plans as exogenous, i.e. $E: P^{|A|} \to F$, and then go from there.