Consider the following degenerate case: there is only one decision to be made, between options A and B. Theory 1 holds that B is vastly better than A, while theory 2 holds that A is just slightly better than B.
And suppose you find theory 2 just slightly more probable than theory 1.
Then it seems like any parliamentary model is going to say that theory 2 wins, and you choose option A. That seems like a bad outcome.
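To see the failure concretely, here's a toy computation. The specific credences and utilities are my own illustrative numbers, not from the original comment: a pure majority-of-delegates vote picks A on credence alone, while a stake-sensitive rule like maximizing expected choiceworthiness picks B.

```python
# Illustrative numbers (mine, not from the comment): theory 2 is slightly
# more probable, but theory 1 has vastly more at stake.
credences = {"theory1": 0.49, "theory2": 0.51}
utilities = {
    "theory1": {"A": 0.0, "B": 100.0},  # theory 1: B vastly better
    "theory2": {"A": 1.0, "B": 0.0},    # theory 2: A slightly better
}

def majority_vote(options):
    """Each theory casts its full delegate weight for its preferred option."""
    votes = {opt: 0.0 for opt in options}
    for theory, cred in credences.items():
        best = max(options, key=lambda o: utilities[theory][o])
        votes[best] += cred
    return max(votes, key=votes.get)

def expected_choiceworthiness(options):
    """A stake-sensitive alternative: credence-weighted utility."""
    return max(options, key=lambda o: sum(
        cred * utilities[t][o] for t, cred in credences.items()))

print(majority_vote(["A", "B"]))              # 'A': credence alone decides
print(expected_choiceworthiness(["A", "B"]))  # 'B': stakes matter
```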
Accordingly, I suggest that to arrive at a workable parliamentary model we need to do at least one of the following:
As you might gather, I find the last option the most promising.
Yes, I think we need something like this veil of ignorance approach.
In a paper (preprint) with Ord and MacAskill we prove that for similar procedures, you end up with cyclical preferences across choice situations if you try to decide after you know the choice situation. The parliamentary model isn't quite within the scope of the proof, but I think more or less the same proof works. I'll try to sketch it.
Suppose you have equal credence in three theories with the following preferences: theory 1 prefers A > B > C, theory 2 prefers B > C > A, and theory 3 prefers C > A > B.
Then in a decision between A and B there is no scope for negotiation, so since two of the theories prefer A, the parliament will too. Similarly, in a choice between B and C the parliament will prefer B, and in a choice between C and A it will prefer C: the parliament's preferences are cyclic.
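Here's a quick sanity check of that cycle in code, using the three-theory Condorcet-cycle profile assumed above:

```python
# Pairwise majority voting over the preference profile from the sketch.
preferences = {
    "theory1": ["A", "B", "C"],  # A > B > C
    "theory2": ["B", "C", "A"],  # B > C > A
    "theory3": ["C", "A", "B"],  # C > A > B
}

def pairwise_winner(x, y):
    """Majority vote between two options; no scope for negotiation."""
    votes_x = sum(1 for order in preferences.values()
                  if order.index(x) < order.index(y))
    return x if votes_x > len(preferences) / 2 else y

for pair in [("A", "B"), ("B", "C"), ("C", "A")]:
    print(pair, "->", pairwise_winner(*pair))
# ('A', 'B') -> A, ('B', 'C') -> B, ('C', 'A') -> C: a cycle.
```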
My reading of the problem is that a satisfactory Parliamentary Model should:
Since bargaining in good faith appears to be the core feature, my mind immediately goes to models of bargaining under complete information rather than voting. What are the pros and cons of starting with the Nash bargaining solution as implemented by an alternating offer game?
The two obvious issues are how to translate delegates' preferences into utilities and what the disagreement point is. Assuming a utility function is fairly mild if each delegate has preferences over lotteries. Plus, there's no interpersonal utility comparison problem even though you need cardinal utilities. The lack of a natural disagreement point is trickier. What intuitions might be lost going this route?
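As a starting point, here's a minimal sketch of computing the Nash bargaining solution for two delegates over a finite option set with lotteries allowed. The utilities and, crucially, the disagreement point `d` are placeholder assumptions, since (as noted above) the model supplies no natural disagreement point:

```python
from itertools import product

# Placeholder joint utilities (u1, u2) for each pure option.
options = {"A": (4.0, 1.0), "B": (1.0, 4.0), "C": (2.5, 2.5)}
d = (0.0, 0.0)  # assumed disagreement point

def nash_solution(options, d, steps=1000):
    """Maximize the Nash product over lotteries between pairs of options."""
    best, best_product = None, float("-inf")
    for (x, ux), (y, uy) in product(options.items(), repeat=2):
        for i in range(steps + 1):
            p = i / steps  # lottery: x with probability p, else y
            u1 = p * ux[0] + (1 - p) * uy[0]
            u2 = p * ux[1] + (1 - p) * uy[1]
            nash_product = (u1 - d[0]) * (u2 - d[1])
            if nash_product > best_product:
                best_product, best = nash_product, ((x, p), (y, 1 - p))
    return best

print(nash_solution(options, d))  # a 50/50 lottery between A and B
```

One design note: because the Nash product is invariant under positive affine rescaling of each delegate's utility, no interpersonal utility comparison is needed, which matches the observation above.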
In order to get a better handle on the problem, I'd like to try walking through the mechanics of how a vote by the moral parliament might work. I don't claim to be doing anything new here; I just want to describe the parliament in more detail to make sure I understand it, and so that it's easier to reason about.
Here's the setup I have in mind:
Each MP wants to maximize the utility of the results according to their own scores, and they can engage in negotiation before the voting starts to accomplish this.
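To make this concrete, here's a toy version in code. Since the setup above is only partially specified, the binary issues, the credence weights, and the 0.5 score threshold are all my own assumptions, and negotiation is left out:

```python
# A toy version of the setup above; details are assumed, not given.
mps = {  # MP -> delegate weight (credence) and score for passing each issue
    "util":  {"weight": 0.6, "scores": [0.9, 0.2, 0.5]},
    "deont": {"weight": 0.4, "scores": [0.1, 0.8, 0.6]},
}

def vote(issue):
    """Each MP casts its full weight for 'pass' iff its score favors passing."""
    for_pass = sum(mp["weight"] for mp in mps.values()
                   if mp["scores"][issue] > 0.5)
    return "pass" if for_pass > 0.5 else "fail"

print([vote(i) for i in range(3)])  # ['pass', 'fail', 'fail']
```

Negotiation before the vote would then amount to MPs committing to vote against their raw scores on issues they care little about, in exchange for support on issues they care a lot about.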
Does this seem to others like a reasonable description of how the parliamentary vote might work? Any suggestions for improvements to the descr...
In Ideal Advisor Theories and Personal CEV, my co-author and I describe a particular (but still imprecisely specified) version of the parliamentary approach:
we determine the personal CEV of an agent by simulating multiple versions of them, extrapolated from various starting times and along different developmental paths. Some of these versions are then assigned to a parliament where they vote on various choices and make trades with one another.
We then very briefly argue that this kind of approach can overcome some objections to parliamentary models (and similar theories) made by philosopher David Sobel.
The paper is short and non-technical, but still manages to summarize some concerns that we'll likely want a formalized parliamentary model to overcome or sidestep.
...It seems that specifying the delegates' informational situation creates a dilemma.
As you write above, we should take the delegates to think that Parliament's decision is a stochastic variable such that the probability of the Parliament taking action A is proportional to the fraction of votes for A, to avoid giving the majority bloc absolute power.
However, your suggestion generates its own problems (as long as we take the parliament to go with the option with the most votes):
Suppose an issue The Parliament votes on involves options A1, A2, ..., An
We discussed this issue at the two MIRIx Boston workshops. A big problem with parliamentary models, which we were unable to solve, is what we've been calling ensemble stability. The issue is this: suppose your AI, whose value system is made from a collection of value systems in a voting-like system, is constructing a successor, more powerful AI, and is considering constructing the successor so that it represents only a subset of the original value systems. Each value system which is represented will be in favor; each value system which is not represented will ...
It seems to me that if we're going to be formalizing the idea of the relative "moral importance" of various courses of action to different moral theories, we'll end up having to use something like utility functions. It's unfortunate, then, that deontological rules (which are pretty common) can't be specified with finite utility functions because of the timelessness issue (i.e., a deontologist who doesn't lie won't lie even if doing so would prevent them from being forced to tell ten lies in the future).
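One standard workaround, not claimed by the comment above, is to drop real-valued utilities for such theories and use lexicographic preferences instead. Here's a minimal sketch, where the specific rule ("never lie now, then maximize welfare") is my own assumption:

```python
def deont_value(option):
    # Lexicographic: first never lie *now*, then maximize welfare.
    # No finite real-valued utility encodes this ordering: any finite
    # penalty for lying is outweighed by a large enough welfare gain.
    return (0 if option["i_lie_now"] else 1, option["welfare"])

lie_once = {"i_lie_now": True, "welfare": 100}   # prevents ten future lies
refuse   = {"i_lie_now": False, "welfare": -50}  # leads to ten forced lies

# Python compares tuples lexicographically, so the deontologist refuses
# to lie even to prevent the ten future lies.
print(max([lie_once, refuse], key=deont_value)["welfare"])  # -50: refuses
```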
...To me it looks like the main issues are in configuring the "delegates" so that they don't "negotiate" quite like real agents - for example, there's no delegate that will threaten to adopt an extremely negative policy in order to gain negotiating leverage over other delegates.
The part where we talk about these negotiations seems to me like the main pressure point on the moral theory qua moral theory - can we point to a form of negotiation that is isomorphic to the "right answer", rather than just being an aw...
One route towards analysing this would be to identify a unit of currency which was held in roughly equal value by all delegates (at least at the margin), so that we can analyse how much they value other things in terms of this unit of currency -- this could lead to market prices for things (?).
Perhaps a natural choice for a currency unit would be something like 'unit of total say in the parliament'. So for example a 1% chance that things go the way of your theory, applied before whatever else would happen.
I'm not sure if this could even work, just throwing it out there.
Is there some way to rephrase this without bothering with the parliament analogy at all? For example, how about just having each moral theory assign each available action a "goodness number" (basically an expected utility)? Normalize the goodness numbers somehow, then just take the weighted average across moral theories to decide what to do.
If we normalize by dividing each moral theory's answers by its biggest-magnitude answer (only closed sets of actions allowed :)), I think this regenerates the described behavior, though I'm not sure. Obviously this cuts out the "human-ish" behavior of parliament members, but I think that's a feature, since they don't exist.
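Here's a sketch of that normalize-and-average rule with the max-magnitude normalization; all numbers are invented for illustration:

```python
# Normalize each theory's goodness numbers by its biggest magnitude,
# then take the credence-weighted average to pick an action.
credences = {"t1": 0.7, "t2": 0.3}
goodness = {  # theory -> action -> raw goodness number
    "t1": {"a": 10.0, "b": -2.0, "c": 4.0},
    "t2": {"a": -1.0, "b": 3.0, "c": 2.0},
}

def normalized(theory):
    biggest = max(abs(g) for g in goodness[theory].values())
    return {a: g / biggest for a, g in goodness[theory].items()}

def decide(actions):
    scores = {a: sum(credences[t] * normalized(t)[a] for t in credences)
              for a in actions}
    return max(scores, key=scores.get)

print(decide(["a", "b", "c"]))  # 'a'
```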
Any parliamentary model will involve voting.
When voting, Arrow's impossibility theorem is going to impose constraints that can't be avoided: http://en.m.wikipedia.org/wiki/Arrow's_impossibility_theorem
In particular, it is impossible to have all of the below:

- If every voter prefers alternative X over alternative Y, then the group prefers X over Y.
- If every voter's preference between X and Y remains unchanged, then the group's preference between X and Y will also remain unchanged (even if voters' preferences between other pairs like X and Z, Y and Z, or Z and W chang...
I was thinking last night of how vote trading would work in a completely rational parliamentary system. To simplify things a bit, let's assume that each issue is binary, that each delegate holds a position on every issue, and that each position can be normalized to a 0.0 - 1.0 ranking. (E.g., if I have a 60% belief that I will gain 10 utility from this issue being approved, it may have a normalized score of .6; if it is a 100% belief that I will gain 10 utility, it may be a .7; while a 40% chance of -1000 utility may be a .1.) The mapping function doesn't really matt...
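Since the comment is cut off, the following sketch fills in details with my own assumptions: intensity of preference measured as distance from the 0.5 indifference point, and a trade counting as profitable when each delegate concedes the issue they care about less.

```python
# Normalized scores in [0, 1] per binary issue; 0.5 means indifferent.
scores = {  # delegate -> score per issue (>0.5 means "vote yes")
    "d1": [0.9, 0.45],  # cares a lot about issue 0, mildly against issue 1
    "d2": [0.4, 0.95],  # mildly against issue 0, cares a lot about issue 1
}

def intensity(delegate, issue):
    """How much the delegate cares, as distance from indifference."""
    return abs(scores[delegate][issue] - 0.5)

def profitable_swap(d1, d2, i, j):
    """d2 concedes issue i to d1; d1 concedes issue j to d2.

    The swap helps both iff each delegate gives up the issue they
    care about less."""
    return (intensity(d1, i) > intensity(d1, j) and
            intensity(d2, j) > intensity(d2, i))

print(profitable_swap("d1", "d2", 0, 1))  # True: a deal both prefer
```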
Can MPs have unknown utility functions? For example, I might have a relatively low confidence in all explicitly formulated moral theories, and want to give a number of MPs to System 1 - but I don't know in advance how System 1 will vote. Is that problem outside the scope of the parliamentary model (i.e., I can't nominate MPs who don't "know" how they will vote)?
Can MPs have undecidable preference orderings (or sub-orderings)? E.g., such an MP might have some moral axioms that provide orderings for some bills but not others.
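One possible way to accommodate such MPs, offered only as a sketch: represent the comparisons the axioms do settle explicitly, and have the MP abstain on incomparable bills.

```python
# An MP with a partial preference ordering that abstains when its
# axioms don't rank a pair of bills (my own sketch, not from the model).
class PartialOrderMP:
    def __init__(self, known):
        self.known = known  # set of (better, worse) pairs the axioms settle

    def vote(self, bill_x, bill_y):
        if (bill_x, bill_y) in self.known:
            return bill_x
        if (bill_y, bill_x) in self.known:
            return bill_y
        return None  # abstain: the axioms don't rank these bills

mp = PartialOrderMP(known={("ban_torture", "status_quo")})
print(mp.vote("ban_torture", "status_quo"))  # 'ban_torture'
print(mp.vote("carbon_tax", "status_quo"))   # None -> abstains
```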
Thanks to ESRogs, Stefan_Schubert, and the Effective Altruism summit for the discussion that led to this post!
This post is to test out Polymath-style collaboration on LW. The problem we've chosen to try is formalizing and analyzing Bostrom and Ord's "Parliamentary Model" for dealing with moral uncertainty.
I'll first review the Parliamentary Model, then give some of the Polymath style suggestions, and finally suggest some directions the conversation could take.
The Parliamentary Model is an under-specified method of dealing with moral uncertainty, proposed in 2009 by Nick Bostrom and Toby Ord. Reposting Nick's summary from Overcoming Bias:
Suppose that you have a set of mutually exclusive moral theories, and that you assign each of these some probability. Now imagine that each of these theories gets to send some number of delegates to The Parliament. The number of delegates each theory gets to send is proportional to the probability of the theory. Then the delegates bargain with one another for support on various issues; and the Parliament reaches a decision by the delegates voting. What you should do is act according to the decisions of this imaginary Parliament. (Actually, we use an extra trick here: we imagine that the delegates act as if the Parliament's decision were a stochastic variable such that the probability of the Parliament taking action A is proportional to the fraction of votes for A. This has the effect of eliminating the artificial 50% threshold that otherwise gives a majority bloc absolute power. Yet – unbeknownst to the delegates – the Parliament always takes whatever action got the most votes: this way we avoid paying the cost of the randomization!)
The idea here is that moral theories get more influence the more probable they are; yet even a relatively weak theory can still get its way on some issues that the theory thinks are extremely important, by sacrificing its influence on other issues that other theories deem more important. For example, suppose you assign 10% probability to total utilitarianism and 90% to moral egoism (just to illustrate the principle). Then the Parliament would mostly take actions that maximize egoistic satisfaction; however, it would make some concessions to utilitarianism on issues that utilitarianism thinks are especially important. In this example, the person might donate some portion of their income to existential risks research and otherwise live completely selfishly.
I think there might be wisdom in this model. It avoids the dangerous and unstable extremism that would result from letting one’s current favorite moral theory completely dictate action, while still allowing the aggressive pursuit of some non-commonsensical high-leverage strategies so long as they don’t infringe too much on what other major moral theories deem centrally important.
In a comment, Bostrom continues:
there are a number of known issues with various voting systems, and this is the reason I say our model is imprecise and under-determined. But we have some quite substantial intuitions and insights into how actual parliaments work so it is not a complete black box. For example, we can see that, other things equal, views that have more delegates tend to exert greater influence on the outcome, etc. There are some features of actual parliaments that we want to postulate away. The fake randomization step is one postulate. We also think we want to stipulate that the imaginary parliamentarians should not engage in blackmail etc. but we don't have a full specification of this. Also, we have not defined the rule by which the agenda is set. So it is far from a complete formal model.
It's an interesting idea, but clearly there are a lot of details to work out. Can we formally specify the kinds of negotiation that delegates can engage in? What about blackmail or prisoners' dilemmas between delegates? In what ways does this proposed method outperform other ways of dealing with moral uncertainty?
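Before diving in, here's a minimal executable sketch of the mechanism as quoted above. The example theories, utilities, and delegate voting behavior are placeholder assumptions, and negotiation is not modeled:

```python
import random

# Delegates proportional to credence; delegates *believe* the outcome is
# sampled with probability proportional to votes, but the Parliament
# *actually* takes the plurality action (Bostrom's "extra trick").
random.seed(0)
theories = {"egoism": 0.9, "total_util": 0.1}
utilities = {
    "egoism":     {"live_selfishly": 1.0, "donate_to_xrisk": 0.0},
    "total_util": {"live_selfishly": 0.0, "donate_to_xrisk": 1.0},
}

def tally(actions):
    votes = {a: 0.0 for a in actions}
    for theory, credence in theories.items():
        best = max(actions, key=lambda a: utilities[theory][a])
        votes[best] += credence  # delegate weight proportional to credence
    return votes

def perceived_outcome(votes):
    """What delegates believe happens: sampled proportionally to votes."""
    actions, weights = zip(*votes.items())
    return random.choices(actions, weights=weights)[0]

def actual_outcome(votes):
    """What the Parliament really does: plurality, no randomization cost."""
    return max(votes, key=votes.get)

votes = tally(["live_selfishly", "donate_to_xrisk"])
print(perceived_outcome(votes))  # stochastic from the delegates' viewpoint
print(actual_outcome(votes))     # 'live_selfishly'
```

The gap between `perceived_outcome` and `actual_outcome` is exactly the device that removes the 50% threshold from the delegates' incentives while avoiding the cost of actually randomizing.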
I was discussing this with ESRogs and Stefan_Schubert at the Effective Altruism summit, and we thought it might be fun to throw the question open to LessWrong. In particular, we thought it'd be a good test problem for a Polymath-project-style approach.
The Polymath comment style suggestions are not so different from LW's, but numbers 5 and 6 are particularly important. In essence, they point out that the idea of a Polymath project is to split up the work into minimal chunks among participants, and to get most of the thinking to occur in comment threads. This is as opposed to a process in which one community member goes off for a week, meditates deeply on the problem, and produces a complete solution by themselves. Polymath rules 5 and 6 are instructive:
5. If you are planning to think about some aspect of the problem offline for an extended length of time, let the rest of us know. A polymath project is supposed to be more than the sum of its individual contributors; the insights that you have are supposed to be shared amongst all of us, not kept in isolation until you have resolved all the difficulties by yourself. It will undoubtedly be the case, especially in the later stages of a polymath project, that the best way to achieve progress is for one of the participants to do some deep thought or extensive computation away from the blog, but to keep in the spirit of the polymath project, it would be good if you could let us know that you are doing this, and to update us on whatever progress you make (or fail to make). It may well be that another participant may have a suggestion that could save you some effort.
6. An ideal polymath research comment should represent a "quantum of progress". On the one hand, it should contain a non-trivial new insight (which can include negative insights, such as pointing out that a particular approach to the problem has some specific difficulty), but on the other hand it should not be a complex piece of mathematics that the other participants will have trouble absorbing. (This principle underlies many of the preceding guidelines.) Basically, once your thought processes reach a point where one could efficiently hand the baton on to another participant, that would be a good time to describe what you’ve just realised on the blog.
It seems to us as well that an important part of the Polymath style is to have fun together and to use the principle of charity liberally, so as to create a space in which people can safely be wrong, point out flaws, and build up a better picture together.
If you're still reading, then I hope you're interested in giving this a try. The overall goal is to clarify and formalize the Parliamentary Model, and to analyze its strengths and weaknesses relative to other ways of dealing with moral uncertainty. Here are the three most promising questions we came up with:
The original OB post had a couple of comments that I thought were worth reproducing here, in case they spark discussion, so I've posted them.
Finally, if you have meta-level comments on the project as a whole instead of Polymath-style comments that aim to clarify or solve the problem, please reply in the meta-comments thread.