Barring a major collapse of human civilization (due to nuclear war, asteroid impact, etc.), many experts expect the intelligence explosion Singularity to occur within 50-200 years.

That fact means that many philosophical problems, about which philosophers have argued for millennia, are suddenly very urgent.

Those concerned with the fate of the galaxy must say to the philosophers: "Too slow! Stop screwing around with transcendental ethics and qualitative epistemologies! Start thinking with the precision of an AI researcher and solve these problems!"

If a near-future AI will determine the fate of the galaxy, we need to figure out what values we ought to give it. Should it ensure animal welfare? Is growing the human population a good thing?

But those are questions of applied ethics. More fundamental are the questions about which normative ethics to give the AI: How would the AI decide if animal welfare or large human populations were good? What rulebook should it use to answer novel moral questions that arise in the future?

But even more fundamental are the questions of meta-ethics. What do moral terms mean? Do moral facts exist? What justifies one normative rulebook over another?

The answers to these meta-ethical questions will determine the answers to the questions of normative ethics, which, if we are successful in planning the intelligence explosion, will determine the fate of the galaxy.

Eliezer Yudkowsky has put forward one meta-ethical theory, which informs his plan for Friendly AI: Coherent Extrapolated Volition. But what if that meta-ethical theory is wrong? The galaxy is at stake.

Princeton philosopher Richard Chappell worries about how Eliezer's meta-ethical theory depends on rigid designation, which in this context may amount to something like a semantic "trick." Previously and independently, an Oxford philosopher expressed the same worry to me in private.

Eliezer's theory also employs something like the method of reflective equilibrium, about which there are many grave concerns from Eliezer's fellow naturalists, including Richard Brandt, Richard Hare, Robert Cummins, Stephen Stich, and others.

My point is not to beat up on Eliezer's meta-ethical views. I don't even know if they're wrong. Eliezer is wickedly smart. He is highly trained in the skills of overcoming biases and properly proportioning beliefs to the evidence. He thinks with the precision of an AI researcher. In my opinion, that gives him large advantages over most philosophers. When Eliezer states and defends a particular view, I take that as significant Bayesian evidence for reforming my beliefs.

Rather, my point is that we need lots of smart people working on these meta-ethical questions. We need to solve these problems, and quickly. The universe will not wait for the pace of traditional philosophy to catch up.


I think Eliezer's meta-ethics is wrong because it's possible that we live in a world where Eliezer's "right" doesn't actually designate anything. That is, where a typical human's morality, when extrapolated, fails to be coherent. "Right" should still mean something in a world like that, but it doesn't under Eliezer's theory.

Also, to jump the gun a bit, your own meta-ethics, desirism, says:

Thus, morality is the practice of shaping malleable desires: promoting desires that tend to fulfill other desires, and discouraging desires that tend to thwart other desires.

What does this mean in the FAI context? To a super-intelligent AI, its own desires, as well as those of everyone else on Earth, can be considered "malleable", in the sense that it can change all of them if it wanted to. But there might be some other super-intelligent AIs (created by aliens) whose desires it is powerless to change. I hope desirism doesn't imply that it should change my desires so as to fulfill the alien AIs' desires...

Eliezer Yudkowsky
What should it mean in a world like that?

I haven't found a satisfactory meta-ethics yet, so I still don't know. But whatever the answer is, it has to be at least as good as "my current (unextrapolated) preferences". "Nothing" is worse than that, so it can't be the correct answer.

This is actually a useful way of looking at what metaethics (decision theory) is: tools for self-improvement, explaining specific ways in which correctness of actions (or correctness of other tools of the same kind) can be judged. In this sense, a useless metaethics is one that doesn't help you with determining what should be done, and a wrong metaethics is one that's actively stupid, suggesting that you do things that you clearly shouldn't (for FAI based on that metaethics, correspondingly doing things that it shouldn't). In this sense, the injunction of doing nothing in response to failed assumptions (i.e. no coherence actually present) in CEV is not stupid, since your own non-extrapolated mind is all you'll end up with in case CEV shuts down. It is a contingency plan for the case it turns out to be useless. (I find new obvious things everywhere after the recent realization that any explicit consideration an agent knows is subject to the whole agent's judgment, even "preference" or "logical correctness". This also explains a bit of our talking past each other in the other thread.)
Wei Dai
I don't have much idea what you mean here. This seems important enough to write up as more than a parenthetical remark.
I spent a lot of time laboring under the intuition that there's some "preference" thingie that summarizes all we care about, that we can "extract" from (define using a reference to) people and have an AI optimize it. In the lingo of meta-ethics, that would be "right" or "morality", and it distanced itself from the overly specific "utility" that also has the disadvantage of forgetting that prior is essential. Then, over the last few months, as I was capitalizing on finally understanding UDT in May 2010 (despite having convinced a lot of people that I understood it long before that, I completely failed to get the essential aspect of controlling the referents of fixed definitions, and only recognized in retrospect that what I figured out by that time was actually UDT), I noticed that a decision problem requires many more essential parts than just preference, and so to specify what people care about, we need a whole human decision problem. But the intuition that linked to preference in particular, which was by then merely a part of the decision problem, still lingered, and so I failed to notice that now not preference, but the whole decision problem, is analogous to "right" and "morality" (but not quite, since that decision problem still won't be the definition of right, it can be judged in turn), and the whole agent that implements such decision problem is the best tool available to judge them. This agent, in particular, can find itself judging its own preference, or its own inference system, or its whole architecture that might or might not specify an explicit inference system as its part, and so on. Whatever explicit consideration it's moved by, that is whatever module in the agent (decision problem) it considers, there's a decision problem of self-improvement where the agent replaces that module with something else, and things other than that module can have a hand in deciding. Also, there's little point in distinguishing "decision problem" and "agent", even thou

it follows that no human can know what they care about

This sounds weird, like you've driven off a cliff or something. A human mind is a computer of finite complexity. If you feed it a complete description of itself, it will know what it cares about, up to logical uncertainty which may or may not be reduced by applying powerful math. Or do I misunderstand you? Maybe the following two questions will help clarify things:

a) Can a paperclipper know what it cares about?

b) How is a human fundamentally different from a paperclipper with respect to (a)?

Hence "explicit considerations", that is not up to logical uncertainty. Also, you need to know that you care about logic to talk of "up to logical uncertainty" as getting you closer to what you want. Similarly (unhelpfully), everyone knows what they should do up to moral uncertainty. No, at least while it's still an agent in the same sense, so that it still has the problem of self-improvement on its hands, and hasn't disassembled itself into actual paperclips. For a human, its philosophy of precise reasoning about paperclips won't look like an adequate activity to spend resources on, but for the paperclipper, understanding paperclips really well is important.
OK, how about this: do you think an AI tasked with proving the Goldbach conjecture from the axioms of ZFC will find itself similarly confused about morality? I doubt it. ETA: I defy the possibility that we may "not care about logic" in the sense that you suggest.
(Not "morality" here, of course, but its counterpart in the analogy.) What is to guide its self-improvement? How is it to best convert the Sun into more computing machinery, in the face of logical uncertainty about consequences of such an action? What is meant by "actually proving it"? Does quantum suicide count as a method for achieving its goal? When should it risk performing an action in the environment, given that it could damage its own hardware as a result? When should it risk improving its inference system, given that there's a risk that this improvement will turn out to increase the time necessary to perform the proof, perhaps even eventually leading to moving this time outside what's physically available in our universe? Heuristics everywhere, no easy methods for deciding what should be done.
In a decision between what's logical and what's right, you ought to choose what's right.
If you can summarize your reasons for thinking that's actually a conflict that can arise for me, I'd be very interested in them.
Consider a possible self-improvement that changes your inference system in such a way that it (1) becomes significantly more efficient at inferring the kinds of facts that help you with making right decisions, and (2) obtains an additional tiny chance of being inconsistent. If all you care about is correctness, then notice that implementing this self-improvement will make you less correct, will increase the probability that you'll produce incorrect inferences in the future. On the other hand, expected utility of this decision argues that you should take it. This is a conflict, resolved either by self-improving or not.
That's fair. Yes, agreed that this is a decision between maximizing my odds of being logical and maximizing my odds of being right, which is a legitimate example of the conflict you implied. And I guess I agree that if being right has high utility then it's best to choose what's right. Thanks.
Seeking high utility is right (and following rules of logic is right), not the other way around. "Right" is the unreachable standard by which things should be, which "utility" is merely a heuristic for representation of.
It isn't clear to me what that statement, or its negation, actually implies about the world. But I certainly don't think it's false.
Wei Dai
I'm generally sympathetic towards these intuitions, but I have a few reservations: 1. Isn't it possible that it only looks like "heuristics all the way down" because we haven't dug deep enough yet? Perhaps in the not too distant future, someone will come up with some insights that will make everything clear, and we can just implement that. 2. What is the nature of morality according to your approach? You say that a human can't know what they care about (which I assume you use interchangeably with "right", correct me if I'm wrong here). Is it because they can't, in principle, fully unfold the logical definition of right, or is it that they can't even define "right" in any precise way? 3. This part assumes that your answer to the last question is "the latter". Usually when someone says "heuristic" they have a fully precise theory or problem statement that the heuristic is supposed to be an approximate solution to. How is an agent supposed to design a set of heuristics without such a precise definition to guide it? Also, if the agent itself uses the words "morality" or "right", what do they refer to? 4. If the answer to the question in 2 is "the former", do you have any idea what the precise definition of "right" looks like?
Everything's possible, but doesn't seem plausible at this point, and certainly not at human level. To conclude that something is not a heuristic, but the thing itself, one would need too much certainty to be expected of such a question. I did use that interchangeably. Both (the latter). Having an explicit definition would correspond to "preference" which I discussed in the grandparent comment. But when we talk of merely "precise", at least in principle we could hope to obtain a significantly more precise description, maybe even on human level, which is what meta-ethics should strive to give us. Every useful heuristic is an element of such a description, and some of the heuristics, such as laws of physics, are very precise. The current heuristics, its current implementation, which is understood to be fallible. Don't know (knowing would give a definition). To the extent it's known, see the current heuristics (long list), maybe brains.
Wei Dai
Essentially, what you're describing is just the situation that we are actually faced with. I mean, when I use the word "right" I think I mean something but I don't know what. And I have to use my current heuristics, my current implementation without having a precise theory to guide me. And you're saying that this situation is unlikely to change significantly by the time we build an FAI, so the best we can expect to do is equivalent to a group of uploads improving themselves to the best of their abilities. I tend to agree with this (although I think I assign a higher probability that someone does make a breakthrough than you perhaps do), but it doesn't really constitute a meta-ethics, at least not in the sense that Eliezer and philosophers use that word.
I'm glad it all adds up to normality, given the amount of ink I spilled getting to this point. Not necessarily. The uploads construct could in principle be made abstract, with efficient algorithms figuring out the result of the process much more quickly than if it's actually simulated. More specific heuristics could be figured out that make use of computational resources to make better progress, maybe at early stages by the uploads construct. I'm not sure about that. If it's indeed all we can say about morality right now, then that's what we have to say, even if it doesn't belong to the expected literary genre. It's too easy to invent fake explanations, and absence of conclusions invites that, where a negative conclusion could focus the effort elsewhere. (Also, I don't remember particular points on which my current view disagrees with Eliezer's sequence, although I'd need to re-read it to have a better idea, which I really should, since I only read it as it was posted, when my understanding of the area was zilch.)
I second this request. In particular, please clarify whether "preference" and "logical correctness" are presented here as examples of "explicit considerations". And whether whole agent should be parsed as including all the sub-agents? Or perhaps as extrapolated agent?
Perhaps he's referring to the part of CEV that says "extrapolated as we wish that extrapolated, interpreted as we wish that interpreted". Even logical coherence becomes in this way a focus of extrapolation dynamics, and if this criterion should be changed to something else - as judged by the whole of our extrapolated morality in a strange-loopy way - well, so be it. The dynamics should reflect on itself and consider the foundational assumptions it was built upon, including the compellingness of basic logic we are currently so certain about - and of course, if it really should reflect on itself in this way. Anyway, I'd really like to hear what Vladimir has to say about this. Even though it's often quite hard for me to parse his writings, he does seem to clear things up for me or at least direct my attention towards some new, unexplored areas...
...and continuing from the other comment, the problem here is that one meta-ethical conclusion seems to be that no meta-ethics can actually define what "right" is. So any meta-ethics would only shed a limited amount of light on the question, and is expected to have failure modes, where the structure of the theory is not quite right. It's a virtue of a meta-ethical theory to point out explicitly some of its assumptions, which, if not right, would make the advice it gives incorrect. In this case, we have an assumption of reflective coherence in human value, and a meta-ethics that says that if it's not so, then it doesn't know anything. I'm pretty sure that Eliezer would disagree with the assertion that if any given meta-ethics, including some version of his own, would state that the notion of "right" is empty, then "right" is indeed empty (see "the moral void").
Better - for you, maybe. Under your hypothesis, what is good for you would be bad for others - so unless your meta-ethical system privileges you, this line of argument doesn't seem to follow.
Wei_Dai, Alonzo Fyfe and I are currently researching and writing a podcast on desirism, and we'll eventually cover this topic. The most important thing to note right now is that desirism is set up as a theory that explains very specific things: human moral concepts like negligence, excuse, mens rea, and a dozen other things. You can still take the foundational meta-ethical principles of desirism - which are certainly not unique to desirism - and come up with implications for FAI. But they may have little in common with the bulk of desirism that Alonzo usually talks about. But I'm not trying to avoid your question. These days, I'm inclined to do meta-ethics without using moral terms at all. Moral terms are so confused, and carry such heavy connotational weights, that using moral terms is probably the worst way to talk about morality. I would rather just talk about reasons and motives and counterfactuals and utility functions and so on. Leaving out ethical terms, what implications do my own meta-ethical views have for Friendly AI? I don't know. I'm still catching up with the existing literature on Friendly AI.
Wei Dai
What are the foundational meta-ethical principles of desirism? Do you have a link?
Hard to explain. Alonzo Fyfe and I are currently developing a structured and technical presentation of the theory, so what you're asking for is coming but may not be ready for many months. It's a reasons-internalist view, and actually I'm not sure how much of the rest of it would be relevant to FAI.
In what way? Since the idea hasn't been given much technical clarity, even if it moves conceptual understanding a long way, it's hard for me to imagine how one can arrive at confidence in a strong statement like that.
Wei Dai
I'm not sure what you're asking. Are you asking how it is possible that Eliezer's "right" doesn't designate anything, or how that implies Eliezer's meta-ethics is wrong?
I'm asking (1) how is it possible that Eliezer's "right" doesn't designate anything, and (2) how could you arrive at such a strong conclusion based on his non-technical writings, since he could just mean something different, or could have insufficient precision in his own idea to determine this property (this is a meta-point possibly subsumed by the first point).

how is it possible that Eliezer's "right" doesn't designate anything

Eliezer identifies "right" with "the ideal morality that I would have if I heard all the arguments, to whatever extent such an extrapolation is coherent." It is possible that human morality, when extrapolated, shows no coherence, in which case Eliezer's "right" doesn't designate anything.

how could you arrive at such a strong conclusion based on his non-technical writings, since he could just mean something different, or could have insufficient precision in his own idea to determine this property

Are you saying that Eliezer's general approach might still turn out to be correct, if we substitute better definitions or understandings of "extrapolation" and/or "coherence"? If so, I agree, and I didn't mean to exclude this possibility with my original statement. Should I have made it clearer when I said "I think Eliezer's meta-ethics is wrong" that I meant "based on my understanding of Eliezer's current ideas"?

For example, I have no idea what this means. I don't know what "extrapolated" means, apart from some vague intuitions, and even what "coherent" means. Better than what? I have no specific adequate candidates, only a direction of research.
Wei Dai
Did you read the thread I linked to in my opening comment, where Marcello and I argued in more detail why we think that? Perhaps we can move the discussion there, so you can point out where you disagree with or don't understand us?
To respond to that particular argument (though I don't see how it substantiates the point that morality according to Eliezer's meta-ethics could be void): When you're considering what a human mind would conclude upon considering certain new arguments, you're thinking of ways to improve it. A natural heuristic is to add opportunity for reflection, but obviously exposing one to "unbalanced" argument can lead a human mind anywhere. So you suggest a heuristic of looking for areas of "coherence" in conclusions reached upon exploration of different ways of reflecting. But this "coherence" is also merely a heuristic. What you want is to improve the mind in the right way, not in a coherent way, or a balanced way. So you let the mind reflect on strategies for exposing itself to more reflection, and then on strategies for reflecting on reflecting on strategies for getting more reflection, and so on, in any way deemed appropriate by the current implementation. There's probably no escaping this unguided stage, for the most right guide available is the agent itself (unfortunately). What you end up with won't have opportunity to "regret" past mistakes, for every regret is recognition of an error, and any error can be corrected (for the most part). What's wrong with "incoherent" future growth? Does lack of coherence indicate a particular error, something not done right? If it does, that could be corrected. If it doesn't, everything is fine. (By the way, this argument could potentially place advanced human rationality and human understanding of decision theory and meta-ethics directly on track to a FAI, with the only way of making a FAI using a human (upload) group self-improvement.)
Wei Dai
I believe that in Eliezer's meta-ethics, both the extrapolation procedure and the coherence property are to be given fixed logical definitions as part of the meta-ethics, and are not just "heuristics" to be freely chosen by the subject being extrapolated. You seem to be describing your own ideas, which are perhaps similar enough to Eliezer's to be said to fall under his general approach, but I don't think can be said to be Eliezer's meta-ethics. Seems like a reasonable idea, but again, almost surely not what Eliezer intended.
Why "part of meta-ethics"? That would make sense as part of FAI design. Surely the details are not to be chosen "freely", but still there's only one criterion for anything, and that's full morality. For any fixed logical definition, any element of any design, there's a question of what could improve it, make the consequences better.
Wei Dai
I think because Eliezer wanted to ensure a good chance that right_Eliezer and right_random_human turn out to be very similar. If you let each person choose how to extrapolate using their own current ideas, you're almost certainly going to end up with very different extrapolated moralities.
The point is not that they'll be different, but that mistakes will be made, making the result not quite right, or more likely not right at all. So at the early stage, one must be very careful, develop a reliable theory of how to proceed instead of just doing stuff at random, or rather according to current human heuristics. An extended amount of reflection looks like one of the least invasive self-improvement techniques, something that's expected to make you more reliably right, especially if you're given opportunity to decide how the process is to be set up. This could get us to the next stage, and so on. More invasive heuristics can prove too disruptive, wrong in unexpected and poorly-understood ways, so that one won't be able to expect the right outcome without close oversight from a moral judgment, which we don't have in any technically strong enough form as of yet.
Wei Dai
Suppose you have the intuition that extended reflection and coherence are good heuristics to guide your extrapolation. I, on the other hand, think that extended reflection as a base human is dangerous, and coherence has nothing to do with what's right. I'd rather that the extrapolated me experiment with self-modification after only a moderate amount of theorizing, and at the end merge with its counter-factual versions through acausal negotiation. Suppose further that you end up in control of FAI design, and you want it to take my morality into account. Would you have it extrapolate me using your preferred method, or mine?
What these heuristics discuss are ways of using more resources. The resources themselves are heuristically assumed to be useful, and so we discuss how to use them best. (Now to slip to an object-level argument for a change.) Notice the "especially if you're given opportunity to decide how the process is to be set up" in my comment. I agree that unnaturally extended reflection is dangerous, we might even run into physiological problems with computations in the brains that are too chronologically old. But 50 years is better than 6 months, even if both 50 years and 6 months are dangerous. And if you actually work on planning these reflection sessions, so that you can set up groups of humans to work for some time, then maybe resetting them and only having them pass their writings to new humans, filtering such findings using not-older-than-50 humans trained on more and more improved findings and so on. For most points you could raise with the reason it's dangerous, we could work on finding a solution for that problem. For any experiment with FAI design, we would be better off thinking about it first. Likewise, if you task 1000 groups of humans to work on coming up with possible strategies for using the next batch of computational resources (not for doing most good explicitly, but for developing even better heuristic understanding of the problem), and you use the model of human research groups as having a risk of falling into reflective death spirals where all members of a group can fall to memetic infection that gives no answers to the question they considered, then it seems like a good heuristic to place considerably less weight on suggestions that come up very rarely and don't get supported by some additional vetting process. For example, the first batches of research could focus on developing effective training programs in rationality, then in social engineering, voting schemes, and so on. Overall architecture of future human-level meta-ethics necessary for more dr
It means, for instance, that segments of the population who have different ideas on controversial moral questions like abortion or capital punishment actually have different moralities and different sets of values, and that we as a species will never agree on what answers are right, regardless of how much debate or discussion or additional information we have. I strongly believe this to be true.
Clearly, I know all this stuff, so I meant something else. Like not having more precise understanding (that could also easily collapse this surface philosophizing).
Well, yes, I know you know all this stuff. Are you saying we can't meaningfully discuss it unless we have a precise algorithmic definition of CEV? People's desires and values are not that precise. I suspect we can only discuss it in vague terms until we come up with some sort of iterative procedure that fits our intuition of what CEV should be, at which point we'll have to operationally define CEV as that procedure.
So if a system of ethics entails that "right" doesn't designate anything actual, you reject that system. Can you say more about why?
Wei Dai
Does my answer to Eliezer answer your question as well?
I'm not sure. Projecting that answer onto my question I get something like "Because ethical systems in which "right" has an actual referent are better, for unspecified reasons, than ones in which it doesn't, and Wei Dai's current unextrapolated preferences involve an actual though unspecified referent for "right," so we can at the very least reject all systems where "right" doesn't designate anything actual in favor of the system Wei Dai's current unextrapolated preferences implement, even if nothing better ever comes along." Is that close enough to your answer?
Wei Dai
Yes, close enough.
In that case, not really... what I was actually curious about is why "right" having a referent is important.
Wei Dai
I can make the practical case: If "right" refers to nothing, and we design an FAI to do what is right, then it will do nothing. We want the FAI to do something instead of nothing, so "right" having a referent is important. Or the philosophical case: If "right" refers to nothing, then "it's right for me to save that child" would be equivalent to the null sentence. From introspection I think I must mean something non-empty when I say something like that. Do either of these answer your question?
Congratulations, you just solved the Fermi paradox.
(sigh) Sure, agreed... if our intention is to build an FAI to do what is right, it's important that "what is right" mean something. And I could ask why we should build an FAI that way, and you could tell me that that's what it means to be Friendly, and on and on. I'm not trying to be pedantic here, but this does seem sort of pointlessly circular... a discussion about words rather than things. When a Jewish theist says "God has commanded me to save that child," they may be entirely sincere, but that doesn't in and of itself constitute evidence that "God" has a referent, let alone that the referent of "God" (supposing it exists) actually so commanded them. When you say "It's right for me to save that child," the situation may be different, but the mere fact that you can utter that sentence with sincerity doesn't constitute evidence of difference. If we really want to save children, I would say we should talk about how most effectively to save children, and design our systems to save children, and that talking about whether God commanded us to save children or whether it's right to save children adds nothing of value to the process. More generally, if we actually knew everything we wanted, as individuals and groups, then we could talk about how most effectively to achieve that and design our FAIs to achieve that and discussions about whether it's right would seem as extraneous as discussions about whether it's God-willed. The problem is that we don't know what we want. So we attach labels to that-thing-we-don't-understand, and over time those labels adopt all kinds of connotations that make discussion difficult. The analogy to theism applies here as well. At some point, it becomes useful to discard those labels. A CEV-implementing FAI, supposing such a thing is possible, will do what we collectively want done, whatever that turns out to be. A FAI implementing some other strategy will do something else. Whether those things are right is just as
Wei Dai
TheOtherDave, I don't really want to argue about whether talking about "right" adds value. I suspect it might (i.e., I'm not so confident as you that it doesn't), but mainly I was trying to argue with Eliezer on his own terms. I do want to correct this: CEV will not do "what we collectively want done", it will do what's "right" according to Eliezer's meta-ethics, which is whatever is coherent amongst the volitions it extrapolates from humanity, which as others and I have argued, might turn out to be "nothing". If you're proposing that we build an AI that does do "what we collectively want done", you'd have to define what that means first.
OK. The question I started out with, way at the top of the chain, was precisely about why having a referent for "right" was important, so I will drop that question and everything that descends from it. As for your correction, I actually don't understand the distinction you're drawing, but in any case I agree with you that it might turn out that human volition lacks a coherent core of any significance.
Wei Dai
To me, "what we collectively want done" means somehow aggregating (for example, through voting or bargaining) our current preferences. It lacks the elements of extrapolation and coherence that are central to CEV.
Gotcha... that makes sense. Thanks for clarifying.
What is the source of criteria such as voting or bargaining that you suggest? Why polling everyone and not polling every prime-indexed citizen instead? It's always your judgment about what is the right thing to do.

So let's say that you go around saying that philosophy has suddenly been struck by a SERIOUS problem, as in lives are at stake, and philosophers don't seem to pay any attention. Not to the problem itself, at any rate, though some of them may seem annoyed at outsiders infringing on their territory, and nonplussed at the thought of their field trying to arrive at answers to questions where the proper procedure is to go on coming up with new arguments and respectfully disputing them with other people who think differently, thus ensuring a steady flow of papers for all.

Let us say that this is what happens; which of your current beliefs, which seem to lead you to expect something else to happen, would you update?

No, that is exactly what I expect to happen with more than 99% of all philosophers. But we already have David Chalmers arguing it may be a serious problem. We have Nick Bostrom and the people at Oxford's Future of Humanity Institute. We probably can expect some work on SIAI's core concerns from philosophy grad students we haven't yet heard from because they haven't published much, for example Nick Beckstead, whose interests are formal epistemology and the normative ethics of global catastrophic risks.

As you've said before, any philosophy that would be useful to you and SIAI is hard to find. But it's out there, in tiny piles, and more of it is coming.

The problems appear to be urgent, and in need of actual solutions, not simply further debate, but it's not at all clear to me that people who currently identify as philosophers are, as a group, those most suited to work on them.
I'm not saying they are 'most suited to work on them', either. But I think they can contribute. Do you think that Chalmers and Bostrom have not already contributed, in small ways?
Bostrom, yes. Chalmers, I have to admit that I haven't followed his work enough to issue an opinion.
At the risk of repeating myself, or worse, sounding like an organizational skills guru rambling on about win-win opportunities, might it not be possible to change the environment so that philosophers can do both - publish a steady flow of papers containing respectful disputation AND work on a serious problem?
I might be wrong here, but I wonder if at least some philosophers have a niggling little worry that they are wasting their considerable intellectual gifts (no, I don't think that all philosophers are stupid) on something useless. If these people exist they might be pleased rather than annoyed to hear that the problems they are thinking about were actually important, and this might spur them to rise to the challenge. This all sounds hideously optimistic of course, but it suggests a line of attack if we really do want their help.

I don't remember the specifics, and so don't have the terms to do a proper search, but I think I recall being taught in one course about a philosopher who, based on the culmination of all his own arguments on ethics, came to the conclusion that being a philosopher was useless, and thus changed careers.

I know of a philosopher who claimed to have finished a grand theory he was working on, concluded that all life was meaningless, and thus withdrew from society and lived on a boat for many years fishing to live and practicing lucid dreaming. His doctrine was that we can't control reality, so we might as well withdraw to dreams, where complete control can be exercised by the trained. I also remember reading about a philosopher who finished some sort of ultra-nihilist theory, concluded that life was indeed completely meaningless, and committed suicide-- getting wound up too tightly in a theory can be hazardous to your physical as well as epistemic health!
This doesn't automatically follow unless you first prove he was wrong =P

As a layman I'm still puzzled how the LW sequences do not fall into the category of philosophy. Bashing philosophy seems to be over the top, there is probably as much "useless" mathematics.

I think the problem is that philosophy has, as a field, done a shockingly bad job of evicting obsolete and incorrect ideas (not just useless ones). Someone who seeks a philosophy degree can expect to waste most of their time and potential on garbage. To use a mathematics analogy, it's as if mathematicians were still holding debates between binaryists, decimists, tallyists and nominalists.

Most of what's written on Less Wrong is philosophy, there's just so much garbage under philosophy's name that it made sense to invent a new name ("rationalism"), pretend it's unrelated, and guard that name so that people can use it as a way to find good philosophy without wading through the bad. It's the only reference class I know of for philosophy writings that's (a) larger than one author, (b) mostly sane, and (c) enumerable by someone who isn't an expert.

Totally agree. Not exactly. The subfields are more than specialized enough to make it pretty easy to avoid garbage. Once you're in the field it isn't hard to locate the good stuff. For institutional and political reasons the sane philosophers tend to ignore the insane philosophers and vice versa, with just the occasional flare up. It is a problem. Er, I suspect the majority of "naturalistic philosophy in the analytic tradition" would meet the sanity waterline of Less Wrong, particularly the sub-fields of epistemology and philosophy of science.

They do. (Many of EY's own posts are tagged "philosophy".) Indeed, FAI will require robust solutions to several standard big philosophical problems, not just metaethics; e.g. subjective experience (to make sure that CEV doesn't create any conscious persons while extrapolating, etc.), the ultimate nature of existence (to sort out some of the anthropic problems in decision theory), and so on. The difference isn't (just) in what questions are being asked, but in how we go about answering them. In traditional philosophy, you're usually working on problems you personally find interesting, and if you can convince a lot of other philosophers that you're right, write some books, and give a lot of lectures, then that counts as a successful career. LW-style philosophy (as in the "Reductionism" and "Mysterious Answers" sequences) is distinguished in that there is a deep need for precise right answers, with more important criteria for success than what anyone's academic peers think.

Basically, it's a computer science approach to philosophy: any progress on understanding a phenomenon is measured by how much closer it gets you to an algorithmic description of it. Academic philosophy occasionally generates insights on that level, but overall it doesn't operate with that ethic, and it's not set up to reward that kind of progress specifically; too much of it is about rhetoric, formality as an imitation of precision, and apparent impressiveness instead of usefulness.

e.g. subjective experience (to make sure that CEV doesn't create any conscious persons while extrapolating, etc.), Also, to figure out whether particular uploads have qualia, and whether those qualia resemble pre-upload qualia, if that's wanted.
I should just point out that these two goals (researching uploads, and not creating conscious persons) are starkly antagonistic.
Not in the slightest. First, uploads are continuing conscious persons. Second, creating conscious persons is a problem if they might be created in uncomfortable or possibly hellish conditions - if, say, the AI was brute-forcing every decision it would simulate countless numbers of humans in pain before it found the least painful world. I do not think we would have a problem with the AI creating conscious persons in a good environment. I mean, we don't have that problem with parenthood.
What if it's researching pain qualia at ordinary levels because it wants to understand the default human experience? I don't know if we're getting into eye-speck territory, but what are the ethics of simulating an adult human who's just stubbed their toe, and then ending the simulation?
I feel like the consequences are net positive, but I don't trust my human brain to correctly determine this question. I would feel uncomfortable with an FAI deciding it, but I would also feel uncomfortable with a person deciding it. It's just a hard question.
What if they were created in a good environment and then abruptly destroyed because the AI only needed to simulate them for a few moments to get whatever information it needed?
What if they were created in a good environment, (20) stopped, and then restarted (goto 20) ? Is that one happy immortal life or an infinite series of murders?
I think closer to the latter. Starting a simulated person, running them for a while, and then ending and discarding the resulting state effectively murders the person. If you then start another copy of that person, then depending on how you think about identity, that goes two ways: Option A: The new person, being a separate running copy, is unrelated to the first person identity-wise, and therefore the act of starting the second person does not change the moral status of ending the first. Result: Infinite series of murders. Option B: The new person, since they are running identically to the old person, is therefore actually the same person identity-wise. Thus, you could in a sense un-murder them by letting the simulation continue to run after the reset point. If you do the reset again, however, you're just recreating the original murder as it was. Result: Single murder. Neither way is a desirable immortal life, which I think is a more useful way to look at it than "happy".
Well - what if a real person went through the same thing? What does your moral intuition say?
That it would be wrong. If I had the ability to spontaneously create fully-formed adult people, it would be wrong to subsequently kill them, even if I did so painlessly and in an instant. Whether a person lives or dies should be under the control of that person, and exceptions to this rule should lean towards preventing death, not encouraging it.
The sequences are definitely philosophy, but written (mostly) without referencing the philosophers who have given (roughly) the same arguments or defended (roughly) the same positions. I really like Eliezer's way of covering many of these classic debates in philosophy. In other cases, for example in the meta-ethics sequence, I found EY's presentation unnecessarily difficult.
I'd appreciate an annotation to EY's writings that includes such references, as I'm not aware of philosophers who have given similar arguments (except Dennett and Drescher).
That would make for a very interesting project! If I find the time, maybe I'll do this for a post here or there. It would integrate Less Wrong into the broader philosophical discussion, in a way.

I have mixed feelings about that. One big difference in style between the sciences and the humanities lies in the complete lack of respect for tradition in the sciences. The humanities deal in annotations and critical comparisons of received texts. The sciences deal with efficient pedagogy.

I think that the sequences are good in that they try to cover this philosophical material in the great-idea oriented style of the sciences rather than the great-thinker oriented style of the humanities. My only complaint about the sequences is that in some places the pedagogy is not really great - some technical ideas are not explained as clearly as they might be, some of the straw men are a little too easy to knock down, and in a few places Eliezer may have even reached the wrong conclusions.

So, rather than annotating The Sequences (in the tradition of the humanities), it might be better to re-present the material covered by the sequences (in the tradition of the sciences). Or, produce a mixed-mode presentation which (like Eliezer's) focuses on getting the ideas across, but adds some scholarship (unlike Eliezer) in that it provides the standard Googleable names to the ideas discussed - both the good ideas and the bad ones.

I like this idea.
You and EY might find it particularly useful to provide such an annotation as an appendix for the material that he's assembling into his book. Or not.
I certainly think that positioning the philosophical foundations assumed by the quest for Friendly AI would give SIAI more credibility in academic circles. But right now SIAI seems to be very anti-academia in some ways, which I think is unfortunate.
I really don't think it is, as a whole. Vassar and Yudkowsky are somewhat, but there are other people within and closely associated with the organization who are actively trying to get papers published, etc. And EY himself just gave a couple of talks at Oxford, so I understand. (In fact it would probably be more accurate to say that academia is somewhat more anti-SIAI than the other way around, at the moment.) As for EY's book, my understanding is that it is targeted at popular rather than academic audiences, so it presumably won't be appropriate for it to trace the philosophical history of all the ideas contained therein, at least not in detail. But there's no reason it can't be done elsewhere.
I'm thinking of what Dennett did in Consciousness Explained, where he put all the academic-philosophy stuff in an appendix so that people interested in how his stuff relates to the broader philosophical discourse can follow that, and people not interested in it can ignore it.
Near the end of the meta-ethics sequence, Eliezer wrote that he chose to postpone reading Good and Real until he finished writing about meta-ethics because otherwise he might not finish it. For most of his life, writing for public consumption was slow and tedious, and he often got stuck. That seemed to change after he started blogging daily on Overcoming Bias, but the change was recent enough that he probably questioned its permanence.
Why, they do fall in the category of philosophy (for the most part). You can imagine that bashing bad math is just as rewarding.
There definitely is, and I would suspect that many pure mathematicians have the same worry (in fact I don't need to suspect it, sources like A Mathematician's Apology provide clear evidence of this). These people might be another good source of thinkers for a different side of the problem, although I do wonder if anything they can do to help couldn't be done better by an above average computer programmer. I would say the difference between the sequences and most philosophy is one of approach rather than content.
"Most philosophers" is not necessarily the target audience of such argument.

Just want to flag that it's not entirely obvious that we need to settle questions in meta-ethics in order to get the normative and applied ethics right. Why not just call for more work directly in the latter fields?

Yes, that's a claim that in my experience, most philosophers disagree with. It's one I'll need to argue for. But I do think one's meta-ethical views have large implications for one's normative views that are often missed.
Even if we grant that one's meta-ethical position will determine one's normative theory (which is very contentious), one would like some evidence that it would be easier to find the correct meta-ethical view than it would be to find the correct (or appropriate, or whatever) normative ethical view. Otherwise, why not just do normative ethics?
My own thought is that doing meta-ethics may illuminate normative theory, but I could be wrong about that. For example, I think doing meta-ethics right seals the deal for consequentialism, but not utilitarianism.
Since nobody understands these topics with enough clarity, and they seem related, I don't see how anyone can claim with confidence that they actually aren't related. So you saying that you "could be wrong about that" doesn't communicate anything about your understanding.
Many attempts to map out normative ethics wander substantially into meta-ethics, and vice versa. Especially the better ones. So I doubt it matters all that much where one starts - the whole kit and caboodle soon will figure into the discussion.
What exactly does normative ethics mean?

I like this post but I'd like a better idea of how it's meant to be taken to the concrete level.

Should SIAI try to hire or ask for contributions from the better academic philosophers? (SIAI honchos could do that.)

Should there be a concerted effort to motivate more research in "applied" meta-ethics, the kind that talks to neuroscience and linguistics and computer science? (Philosophers and philosophy students anywhere could do that.)

Should we LessWrong readers, and current or potential SIAI workers, educate ourselves about mainstream meta-ethics, so that we know more about it than just the Yudkowsky version, and be able to pick up on errors? (Anyone reading this site can do that.)

Note that the Future of Humanity Institute is currently hiring postdocs, either with backgrounds in philosophy or alternatively in math/cognitive science/computer science. There is close collaboration between FHI and SIAI, and the FHI is part of Oxford University, which is a bit less of a leap for a philosophy graduate student.

Folks who are anyhow heading into graduate school, and who have strengths and interests in social science, should perhaps consider focusing on moral psychology research. But I'm not at all sure of that -- if someone is aiming at existential risk reduction, there are many other useful paths to consider, and a high opportunity cost to choosing one and not others.
That's true -- I'm just trying to get a sense of what lukeprog is aiming at. Just thinking out loud, for a moment: if AI really is an imminent possibility, AI strong enough that what it chooses to do is a serious issue for humanity's safety, and if we think that we can lessen the probability of disaster by defining and building moral machines, then it's very, very important to get our analysis right before anyone starts programming. (This is just my impression of what I've read from the site, please correct me if I misunderstood.) In which case, more moral psychology research (or research in other fields related to metaethics) is really important, unless you think that there's no further work to be done. Is it the best possible use of any one person's time? I'd say, probably not, except if you are already in an unusual position. There are not many top students or academics in these fields, and even fewer who have heard of existential risk; if you are one, and you want to, this doesn't seem like a terrible plan.
I don't yet have much of an opinion on what the best way to do it is, I'm just saying it needs doing. We need more brains on the problem. Eliezer's meta-ethics is, I think, far from obviously correct. Moving toward normative ethics, CEV is also not obviously the correct solution for Friendly AI, though it is a good research proposal. The fate of the galaxy cannot rest on Eliezer's moral philosophy alone. We need critically-minded people to say, "I don't think that's right, and here are four arguments why." And then Eliezer can argue back, or change his position. And then the others can argue back, or change their positions. This is standard procedure for solving difficult problems, but as of yet I haven't seen much published dialectic like this in trying to figure out the normative foundations for the Friendly AI project. Let me give you an explicit example. CEV takes extrapolated human values as the source of an AI's eventually-constructed utility function. Is that the right way to go about things, or should we instead program an AI to figure out all the reasons for action that exist and account for them in its utility function, whether or not they happen to be reasons for action arising from the brains of a particular species of primate on planet Earth? What if there are 5 other intelligent species in the galaxy whose interests will not at all be served when our Friendly AI takes over the galaxy? Is that really the right thing to do? How would we go about answering questions like that?

or should we instead program an AI to figure out all the reasons for action that exist and account for them in its utility function

...this sentence makes me think that we really aren't on the same page at all with respect to naturalistic metaethics. What is a reason for action? How would a computer program enumerate them all?

A 'reason for action' is the standard term in Anglophone philosophy for a source of normativity of any kind. For example, a desire is the source of normativity in a hypothetical imperative. Others have proposed that categorical imperatives exist, and provide reasons for action apart from desires. Some have proposed that divine commands exist, and are sources of normativity apart from desires. Others have proposed that certain objects or states of affairs can ground normativity intrinsically - i.e. that they have intrinsic value apart from being valued by an agent. A source of normativity (a reason for action) is anything that grounds/justifies an 'ought' or 'should' statement. Why should I look both ways before crossing the street? Presumably, this 'should' is justified by reference to my desires, which could be gravely thwarted if I do not look both ways before crossing the street. If I strongly desired to be run over by cars, the 'should' statement might no longer be justified. Some people might say I should look both ways anyway, because God's command to always look before crossing a street provides me with reason for action to do that even if it doesn't help fulfill my desires. But I don't believe that proposed reason for action exists.

Okay, see, this is why I have trouble talking to philosophers in their quote standard language unquote.

I'll ask again: How would a computer program enumerate all reasons for action?


I wonder, since it's important to stay pragmatic, if it would be good to design a "toy example" for this sort of ethics.

It seems like the hard problem here is to infer reasons for action, from an individual's actions. People do all sorts of things; but how can you tell from those choices what they really value? Can you infer a utility function from people's choices, or are there sets of choices that don't necessarily follow any utility function?
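
The last question above has a concrete answer in the simplest setting. As a minimal sketch, assume we only observe strict pairwise choices ("the agent took X when Y was also available") and that the agent is deterministic. Under those assumptions, an ordinal utility function consistent with the choices exists exactly when the "chosen over" relation has no cycle. The function name `infer_ordinal_utility` and the option labels are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def infer_ordinal_utility(observed_choices):
    """Try to build an ordinal utility from strict pairwise choices.

    observed_choices: iterable of (chosen, rejected) pairs.
    Returns a dict mapping option -> rank (higher = more preferred),
    or None if the choices contain a preference cycle, in which case
    no utility function can reproduce them.
    """
    better_than = defaultdict(set)  # option -> options it was chosen over
    options = set()
    for chosen, rejected in observed_choices:
        better_than[chosen].add(rejected)
        options.update((chosen, rejected))

    # Depth-first search; the finishing order is a topological sort.
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {option: UNSEEN for option in options}
    finish_order = []

    def visit(option):
        state[option] = IN_PROGRESS
        for worse in better_than[option]:
            if state[worse] == IN_PROGRESS:
                return False  # found a cycle such as A > B > C > A
            if state[worse] == UNSEEN and not visit(worse):
                return False
        state[option] = DONE
        finish_order.append(option)
        return True

    for option in sorted(options):
        if state[option] == UNSEEN and not visit(option):
            return None

    # Options that finish first are the least preferred.
    return {option: rank for rank, option in enumerate(finish_order)}
```

Called on choices like `[("fruit", "pellet"), ("pellet", "ghost")]`, this returns a ranking that puts fruit above pellet above ghost. Called on a cycle such as `[("a", "b"), ("b", "c"), ("c", "a")]`, it returns `None`. Real behavior is noisy and context-dependent, so this is only the idealized case; it says nothing about how strong each preference is.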

The sorts of "toy" examples I'm thinking of here are situations where the agent has a finite number of choices. Let's say you have Pac-Man in a maze. His choices are his movements in four cardinal directions. You watch Pac-Man play many games; you see what he does when he's attacked by a ghost; you see what he does when he can find something tasty to eat; you see when he's willing to risk the danger to get the food.

From this, I imagine you could do some hidden Markov stuff to infer a model of Pac-Man's behavior -- perhaps an if-then tree.

Could you guess from this tree that Pac-Man likes fruit and dislikes dying, and goes away from fruit only when he needs to avoid dying? Yeah, you could (though I don't know how to...

Something like: Run simulations of agents that can choose randomly out of the same actions as the agent has. Look for regularities in the world state that occur more or less frequently in the sensible agent compared to random agent. Those things could be said to be what it likes and dislikes respectively. To determine terminal vs instrumental values look at the decision tree and see which of the states gets chosen when a choice is forced.
Thanks. Come to think of it that's exactly the right answer.
Perhaps the next step would be to add to the model a notion of second-order desire, or analyze a Pac-Man whose apparent terminal values can change when they're exposed to certain experiences or moral arguments.
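
The random-baseline suggestion above can be sketched in a few lines. Everything here is a toy assumption rather than anything specified in the thread: the world is reduced to four moves that each lead to a randomly drawn cell type, `sensible_policy` is a hand-written stand-in for the observed agent, and the "regularities in the world state" are simply how often each cell type is reached.

```python
import random
from collections import Counter

CELL_TYPES = ["fruit", "ghost", "empty"]  # hypothetical world features
ACTIONS = ["up", "down", "left", "right"]


def sensible_policy(neighbours, rng):
    """Stand-in for the observed agent: prefers fruit, avoids ghosts."""
    preference = {"fruit": 2, "empty": 1, "ghost": 0}
    return max(ACTIONS, key=lambda action: preference[neighbours[action]])


def random_policy(neighbours, rng):
    """Baseline agent: picks uniformly among the same actions."""
    return rng.choice(ACTIONS)


def feature_frequencies(policy, steps, seed):
    """Run one agent and return how often it lands on each cell type."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(steps):
        # Each step, every move leads to a randomly drawn cell type.
        neighbours = {action: rng.choice(CELL_TYPES) for action in ACTIONS}
        counts[neighbours[policy(neighbours, rng)]] += 1
    return {cell: counts[cell] / steps for cell in CELL_TYPES}


def infer_likes_and_dislikes(steps=5000, seed=0):
    """Positive score: reached more often than the random baseline."""
    observed = feature_frequencies(sensible_policy, steps, seed)
    baseline = feature_frequencies(random_policy, steps, seed)
    return {cell: observed[cell] - baseline[cell] for cell in CELL_TYPES}


def forced_choice(policy, option_a, option_b):
    """Offer only two outcomes and report which one the agent takes."""
    neighbours = {"up": option_a, "down": option_a,
                  "left": option_b, "right": option_b}
    return neighbours[policy(neighbours, random.Random(0))]
```

In this toy world the score for fruit comes out clearly positive and the score for ghosts clearly negative. Empty cells also score negative, even though the stand-in agent only ranks them below fruit, which is why frequency comparison alone is not enough. A forced choice such as `forced_choice(sensible_policy, "empty", "ghost")` recovers the ranking between two features; whether that ranking reflects terminal or instrumental value is not something this sketch settles.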


I think the reason you're having trouble with the standard philosophical category of "reasons for action" is because you have the admirable quality of being confused by that which is confused. I think the "reasons for action" category is confused. At least, the only action-guiding norm I can make sense of is desire/preference/motive (let's call it motive). I should eat the ice cream because I have a motive to eat the ice cream. I should exercise more because I have many motives that will be fulfilled if I exercise. And so on. All this stuff about categorical imperatives or divine commands or intrinsic value just confuses things.

How would a computer program enumerate all motives (which according to me, is co-extensional with "all reasons for action")? It would have to roll up its sleeves and do science. As it expands across the galaxy, perhaps encountering other creatures, it could do some behavioral psychology and neuroscience on these creatures to decode their intentional action systems (as it had done already with us), and thereby enumerate all the motives it encounters in the universe, their strengths, the relations between them, and so on.

Bu...

As it expands across the galaxy, perhaps encountering other creatures, it could do some behavioral psychology and neuroscience on these creatures to decode their intentional action systems

Now, it's just a wild guess here, but I'm guessing that a lot of philosophers who use the language "reasons for action" would disagree that "knowing the Baby-eaters evolved to eat babies" is a reason to eat babies. Am I wrong?

I'm merely raising questions that need to be considered very carefully.

I tend to be a bit gruff around people who merely raise questions; I tend to view the kind of philosophy I do as the track where you need some answers for a specific reason, figure them out, move on, and dance back for repairs if a new insight makes it necessary; and this being a separate track from people who raise lots of questions and are uncomfortable with the notion of settling on an answer. I don't expect those two tracks to meet much.

I count myself among the philosophers who would say that "knowing the Baby-eaters want to eat babies" is not a reason (for me) to eat babies. Some philosophers don't even think that the Baby-eaters' desires to eat babies are reasons for them to eat babies, not even defeasible reasons. Interesting. I always assumed that raising a question was the first step toward answering it - especially if you don't want yourself to be the only person who tries to answer it. The point of a post like the one we're commenting on is that hopefully one or more people will say, "Huh, yeah, it's important that we get this issue right," and devote some brain energy to getting it right. I'm sure the "figure it out and move on" track doesn't meet much with the "I'm uncomfortable settling on an answer" track, but what about the "pose important questions so we can work together to settle on an answer" track? I see myself on that third track, engaging in both the 'pose important questions' and the 'settle on an answer' projects.

Interesting. I always assumed that raising a question was the first step toward answering it

Only if you want an answer. There is no curiosity that does not want an answer. There are four very widespread failure modes around "raising questions" - the failure mode of paper-writers who regard unanswerable questions as a biscuit bag that never runs out of biscuits, the failure mode of the politically savvy who'd rather not offend people by disagreeing too strongly with any of them, the failure mode of the religious who don't want their questions to arrive at the obvious answer, the failure mode of technophobes who mean to spread fear by "raising questions" that are meant more to create anxiety by their raising than by being answered, and all of these easily sum up to an accustomed bad habit of thinking where nothing ever gets answered and true curiosity is dead.

So yes, if there's an interim solution on the table and someone says "Ah, but surely we must ask more questions" instead of "No, you idiot, can't you see that there's a better way" or "But it looks to me like the preponderance of evidence is actually pointing in this here other ...

Awesome. Now your reaction here makes complete sense to me. The way I worded my original article above looks very much like I'm in either the 1st category or the 4th category.

Let me, then, be very clear:

  • I do not want to raise questions so that I can make a living endlessly re-examining philosophical questions without arriving at answers.

  • I want me, and rationalists in general, to work aggressively enough on these problems so that we have answers by the time AI+ arrives. As for the fact that I don't have answers yet, please remember that I was a fundamentalist Christian 3 years ago, with no rationality training at all, and a horrendous science education. And I didn't discover the urgency of these problems until about 6 months ago. I've had to make extremely rapid progress from that point to where I am today. If I can arrange to work on these problems full time, I think I can make valuable contributions to the project of dealing safely with Friendly AI. But if that doesn't happen, well, I hope to at least enable others who can work on this problem full time, like yourself.

  • I want to solve these problems in 15 years, not 20. This will make most academic philosophers, and most

...
Any response to this, Eliezer?

Well, the part about you being a fundamentalist Christian three years ago is damned impressive and does a lot to convince me that you're moving at a reasonable clip.

On the other hand, a good metaethical answer to the question "What sort of stuff is morality made out of?" is essentially a matter of resolving confusion; and people can get stuck on confusions for decades, or they can breeze past confusions in seconds. Comprehending the most confusing secrets of the universe is more like realigning your car's wheels than like finding the Lost Ark. I'm not entirely sure what to do about the partial failure of the metaethics sequence, or what to do about the fact that it failed for you in particular. But it does sound like you're setting out to heroically resolve confusions that, um, I kinda already resolved, and then wrote up, and then only some people got the writeup... but it doesn't seem like the sort of thing where you spending years working on it is a good idea. 15 years to a piece of paper with the correct answer written on it is for solving really confusing problems from scratch; it doesn't seem like a good amount of time for absorbing someone else's solution. If y...

I should add that I don't think I will have meta-ethical solutions in 15 years, significantly because I'm not optimistic that I can get someone to pay my living expenses while I do 15 years of research. (Why should they? I haven't proven my abilities.) But I think these problems are answerable, and that we are in a fantastic position to answer them if we want to do so. We know an awful lot about physics, psychology, logic, neuroscience, AI, and so on. Even experts that were active 15 years before now did not have all these advantages. More importantly, most thinkers today do not even take advantage of them.

Have you considered applying to the SIAI Visiting Fellows program? It could be worth a month or 3 of having your living expenses taken care of while you research, and could lead to something longer term.

Seconding JGWeissman — you'd probably be accepted as a Visiting Fellow in an instant, and if you turn out to be sufficiently good at the kind of research and thinking that they need to have done, maybe you could join them as a paid researcher.
15 years is much too much; if you haven't solved metaethics after 15 years of serious effort, you probably never will. The only things that're actually time consuming on that scale are getting stopped with no idea how to proceed, and wrong turns into muck. I see no reason why a sufficiently clear thinker couldn't finish a correct and detailed metaethics in a month.
I suppose if you let "sufficiently clear thinker" do enough work this is just trivial. But it's a sui generis problem... I'm not sure what information a time table could be based on other than the fact that it has been way longer than a month and no one has succeeded yet. It is also worth keeping in mind that scientific discoveries routinely impact the concepts we use to understand the world. The computational model of the human brain was not generated as a hypothesis until after we had built computers and could see what they do, even though, in principle, that hypothesis could have been invented at nearly any point in history. So it seems plausible the crucial insight needed for a successful metaethics will come from a scientific discovery that someone concentrating on philosophy for a month wouldn't make.
Supposing anyone had already succeeded, how strong an expectation do you think we should have of knowing about it?
Not all that strong. It may well be out there in some obscure journal but just wasn't interesting enough for anyone to bother replying to. Hell, multiple people may have succeeded. But I think "success" might actually be underdetermined here. Some philosophers may have had the right insights, but I suspect that if they had communicated those insights in the formal method necessary for Friendly AI the insights would have felt insightful to readers and the papers would have gotten attention. Of course, I'm not even familiar with cutting edge metaethics. There may well be something like that out there. It doesn't help that no one here seems willing to actually read philosophy in non-blog format.
Related question: suppose someone handed us a successful solution, would we recognize it?
So Yudkowsky came up with a correct and detailed metaethics but failed to communicate it?
I think it's correct, but it's definitely not detailed; some major questions, like "how to weight and reconcile conflicting preferences", are skipped entirely.
What do you believe to be the reasons? Didn't he try or fail? I'm trying to fathom what kind of person is a sufficiently clear thinker. If not even EY is a sufficiently clear thinker, then your statement that such a person could come up with a detailed metaethics in a month seems self-evident. If someone is a sufficiently clear thinker to accomplish a certain task then they will complete it if they try. What's the point? It sounds like you are saying that there are many smart people that could accomplish the task if they only tried. But if in fact EY is not one of them, that's bad. Yesterday I read In Praise of Boredom. It seems that EY also views intelligence as something proactive: No doubt I am a complete layman when it comes to what intelligence is. But as far as I am aware it is a kind of goal-oriented evolutionary process equipped with a memory. It is evolutionary insofar as it still needs to stumble upon novelty. Intelligence is not a meta-solution but an efficient searchlight that helps to discover unknown unknowns. Intelligence is also a tool that can efficiently exploit previous discoveries, combine and permute them. But claiming that you just have to be sufficiently intelligent to solve a given problem sounds like it is more than that. I don't see that. I think that if something crucial is missing, something you don't know that it is missing, you'll have to discover it first and not invent it by the sheer power of intelligence.
By "a sufficiently clear thinker" you mean an AI++, right? :)
Nah, an AI++ would take maybe five minutes.
A month sounds considerably overoptimistic to me. Wrong steps and backtracking are probably to be expected, and it would probably be irresponsible to commit to a solution before allowing other intelligent people (who really want to find the right answer, not carry on endless debate) to review it in detail. For a sufficiently intelligent and committed worker, I would not be surprised if they could produce a reliably correct metaethical theory within two years, perhaps one, but a month strikes me as too restrictive.
Of course, this one applies to scaremongers in general, not just technophobes.
Knowing the Baby-eaters want to eat babies is a reason for them to eat babies. It is not a reason for us to let them eat babies. My biggest problem with desirism in general is that it provides no reason for us to want to fulfill others' desires. Saying that they want to fulfill their desires is obvious. Whether we help or hinder them is based entirely on our own reasons for action.
That's not a bug, it's a feature.
Are you familiar with desirism? It says that we should want to fulfill others' desires, but, AFAI can tell, gives no reason why.
No. This is not what desirism says.
From your desirism FAQ: The moral thing to do is to shape my desires to fulfill others' desires, insofar as they are malleable. This is what I meant by "we should want to fulfill others' desires," though I acknowledge that a significant amount of precision and clarity was lost in the original statement. Is this all correct?
The desirism FAQ needs updating, and is not a very clear presentation of the theory, I think. One problem is that much of the theory is really just a linguistic proposal. That's true for all moral theories, but it can be difficult to separate the linguistic from the factual claims. I think Alonzo Fyfe and I are doing a better job of that in our podcast. The latest episode is The Claims of Desirism, Part 1.
I will listen to that.
Unfortunately, we're not making moral claims yet. In meta-ethics, there is just too much groundwork to lay down first. Kinda like how Eliezer took like 200 posts to build up to talking about meta-ethics.
So, just to make sure, what I said in the grandparent is not what desirism says?
Ah, oops. I wasn't familiar with it, and I misunderstood the sentence.
Is knowing that Baby-eaters want babies to be eaten a reason, on your view, to design an FAI that optimizes its surroundings for (among other things) baby-eating?
I very much doubt it. Even if we assume my own current meta-ethical views are correct - an assumption I don't have much confidence in - this wouldn't leave us with reason to design an FAI that optimizes its surroundings for (among other things) baby-eating. Really, this goes back to a lot of classical objections to utilitarianism.
For the record, I currently think CEV is the most promising path towards solving the Friendly AI problem, I'm just not very confident about any solutions yet, and am researching the possibilities as quickly as possible, using my outline for Ethics and Superintelligence as a guide to research. I have no idea what the conclusions in Ethics and Superintelligence will end up being.
Here's an interesting juxtaposition... Eliezer-2011 writes: Eliezer-2007 quotes Robyn Dawes, saying that the below is "so true it's not even funny": Is this a change of attitude, or am I just not finding the synthesis? Eliezer-2011 seems to want to propose solutions very quickly, move on, and come back for repairs if necessary. Eliezer-2007 advises that for difficult problems (one would think that FAI qualifies) we take our time to understand the relevant issues, questions, and problems before proposing solutions.
There's a big difference between "not immediately" and "never". Don't propose a solution immediately, but do at least have a detailed working guess at a solution (which can be used to move to the next problem) in a year. Don't "merely" raise a question; make sure that finding an answer is also part of the agenda.
It's a matter of the twelfth virtue of rationality, the intention to cut through to the answer, whatever the technique. The purpose of holding off on proposing solutions is to better find solutions, not to stop at asking the question.
I suggest that he still holds both of those positions (at least, I know I do, so I do not see why he wouldn't) but that they apply to slightly different contexts. Eliezer's elaboration in the descendant comments from the first quote seemed to illustrate why fairly well. They also, if I recall, allowed that you do not fit into the 'actually answering is unsophisticated' crowd, which further narrows down just what he means.
The impression I get is that EY-2011 believes that he has already taken the necessary time to understand the relevant issues, questions, and problems and that his proposed solution is therefore unlikely to be improved upon by further up-front thinking about the problem, rather than by working on implementing the solution he has in mind and seeing what difficulties come up. Whether that's a change of attitude, IMHO, depends a lot on whether his initial standard for what counts as an adequate understanding of the relevant issues, questions, and problems was met, or whether it was lowered. I'm not really sure what that initial standard was in the first place, so I have no idea which is the case. Nor am I sure it matters; presumably what matters more is whether the current standard is adequate.
The point of the Dawes quote is to hold off on proposing solutions until you've thoroughly comprehended the issue, so that you get better solutions. It doesn't advocate discussing problems simply for the sake of discussing them. Between both quotes there's a consistent position that the point is to get the right answer, and discussing the question only has a point insofar as it leads to getting that answer. If you're discussing the question without proposing solutions ad infinitum, you're not accomplishing anything.
Keep in mind that talking with regard to solutions is just so darn useful. Even if you propose an overly specific solution early, then it has a large surface area of features that can be attacked to prove it incompatible with the problem. You can often salvage and mutate what's left of the broken idea. There's not a lot of harm in that; rather, there is a natural give and take whereby dismissing a proposed solution requires identifying what part of the problem requirements are contradicted, and it may very well not have occurred to you to specify that requirement in the first place. I believe it has been observed that experts almost always talk in terms of candidate solutions, and amateurs attempt to build up from a platform of the problem itself, with experts of course having objectively better performance. The algorithm for provably moral superintelligences might not have a lot of prior solutions to draw from, but you could, for instance, find some inspiration even from the outside view of how some human political systems have maintained generally moral dispositions. There is a bias to associate your status with ideas you have vocalized in the past since they reflect on the quality of your thinking, but you can't throw the baby out with the bathwater. The Maier quote comes off as way too strong for me. And what's with this conclusion:
I think there's a synthesis possible. There's a purpose of finding a solid answer, but finding it requires a period of exploration rather than getting extremely specific in the beginning of the search.
If you don't spend much time on the track where people just raise questions, how do you encounter the new insights that make it necessary to dance back for repairs on your track? Just asking. :) Though I do tend to admire your attitude of pragmatism and impatience with those who dither forever.
I presume you encounter them later on. Maybe while doing more ground-level thinking about how to actually implement your meta-ethics you realise that it isn't quite coherent. I'm not sure if this flying-by-the-seat-of-your-pants approach is best, but as has been pointed out before, there are costs associated with taking too long as well as with not being careful enough, there must come a point where the risk is too small and the time it would take to fix it too long.
Well, I'll certainly agree that more potential problems are surfaced by moving ahead with the implementation than by going back to the customer with another round of questions about the requirements.
I can see that you might question the usefulness of the notion of a "reason for action" as something over and above the notion of "ought", but I don't see a better case for thinking that "reason for action" is confused. The main worry here seems to have to do with categorical reasons for action. Diagnostic question: are these more troubling/confused than categorical "ought" statements? If so, why? Perhaps I should note that philosophers talking this way make a distinction between "motivating reasons" and "normative reasons". A normative reason to do A is a good reason to do A, something that would help explain why you ought to do A, or something that counts in favor of doing A. A motivating reason just helps explain why someone did, in fact, do A. One of my motivating reasons for killing my mother might be to prevent her from being happy. By saying this, I do not suggest that this is a normative reason to kill my mother. It could also be that R would be a normative reason for me to A, but R does not motivate me to do A. (ata seems to assume otherwise, since ata is getting caught up with who these considerations would motivate. Whether reasons could work like this is a matter of philosophical controversy. Saying this more for others than you, Luke.) Back to the main point, I am puzzled largely because the most natural ways of getting categorical oughts can get you categorical reasons. Example: simple total utilitarianism. On this view, R is a reason to do A if R is the fact that doing A would cause someone's well-being to increase. The strength of R is the extent to which that person's well-being increases. One weighs one's reasons by adding up all of their strengths. One then does the thing that one has most reason to do. (It's pretty clear in this case that the notion of a reason plays an inessential role in the theory. We can get by just fine with well-being, ought, causal notions, and addition.) Utilitarianism, as always, is a simple case. But it seems like ma
utilitymonster, For the record, as a good old Humean I'm currently an internalist about reasons, which leaves me unable (I think) to endorse any form of utilitarianism, where utilitarianism is the view that we ought to maximize X. Why? Because internal reasons don't always, and perhaps rarely, support maximizing X, and I don't think external reasons for maximizing X exist. For example, I don't think X has intrinsic value (in Korsgaard's sense of "intrinsic value"). Thanks for the link to that paper on rational choice theories and decision theories!
So are categorical reasons any worse off than categorical oughts?
Categorical oughts and reasons have always confused me. What do you see as the difference, and which type of each are you thinking of? The types of categorical oughts or reasons with which I'm most familiar are Kant's and Korsgaard's.
R is a categorical reason for S to do A iff R counts in favor of doing A for S, and would so count for other agents in a similar situation, regardless of their preferences. If it were true that we always have reasons to benefit others, regardless of what we care about, that would be a categorical reason. I don't use the term "categorical reason" any differently than "external reason". S categorically ought to do A just when S ought to do A, regardless of what S cares about, and it would still be true that S ought to do A in similar situations, regardless of what S cares about. The rule "always maximize happiness" would, if true, ground a categorical ought. I see very little reason to be more or less skeptical of categorical reasons than of categorical oughts.
Agreed. And I'm skeptical of both. You?
Hard to be confident about these things, but I don't see the problem with external reasons/oughts. Some people seem to have some kind of metaphysical worry...harder to reduce or something. I don't see it.
Nitpick: Wallach & Collin are cited only for the term 'artificial moral agents' (and the paper is by myself and Roko Mijic). The comparison in the paper is mostly just to the idea of specifying object-level moral principles.
Oops. Thanks for the correction.

A 'reason for action' is the standard term in Anglophone philosophy for a source of normativity of any kind. For example, a desire is the source of normativity in a hypothetical imperative. Others have proposed that categorical imperatives exist, and provide reasons for action apart from desires. Some have proposed that divine commands exist, and are sources of normativity apart from desires. Others have proposed that certain objects or states of affairs can ground normativity intrinsically - i.e. that they have intrinsic value apart from being valued by an agent.

Okay, but all of those (to the extent that they're coherent) are observations about human axiology. Beware of committing the mind projection fallacy with respect to compellingness; you find those to be plausible sources of normativity because your brain is that of "a particular species of primate on planet Earth". If your AI were looking for "reasons for action" that would compel all agents, it would find nothing, and if it were looking for all of the "reasons for action" that would compel each possible agent, it would spend an infinite amount of time enumerating stupid pointless motivatio...

If you want to be run over by cars, you should still look both ways. You might miss otherwise!
One way might be enough, in that case.
That depends entirely on the street, and the direction you choose to look. ;)
Depends on how soon you insist it happen.
Sorry... what I said above is not quite right. There are norms that are not reasons for action. For example, epistemological norms might be called 'reasons to believe.' 'Reasons for action' are the norms relevant to, for example, prudential normativity and moral normativity.
This is either horribly confusing, or horribly confused. I think that what's going on here is that you (or the sources you're getting this from) have taken a bundle of incompatible moral theories, identified a role that each of them has a part playing, and generalized a term from one of those theories inappropriately. The same thing can be a reason for action, a reason for inaction, a reason for belief and a reason for disbelief all at once, in different contexts depending on what consequences these things will have. This makes me think that "reason for action" does not carve reality, or morality, at the joints.
I'm sort of surprised by how people are taking the notion of "reason for action". Isn't this a familiar process when making a decision? 1. For all courses of action you're thinking of taking, identify the features (consequences, if that's how you think about things) that count in favor of taking that course of action and those that count against it. 2. Consider how those considerations weigh against each other. (Do the pros outweigh the cons, by how much, etc.) 3. Then choose the thing that does best in this weighing process. It is not a presupposition of the people talking this way that if R is a reason to do A in a context C, then R is a reason to do A in all contexts. The people talking this way also understand that a single R might be both a reason to do A and a reason to believe X at the same time. You could also have R be a reason to believe X and a reason to cause yourself to not believe X. Why do you think these things make the discourse incoherent/non-perspicuous? This seems no more puzzling than the familiar fact that believing a certain thing could be epistemically irrational while it is prudentially rational to (cause yourself to) believe it.
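For anyone who finds the three-step weighing process above easier to read as code, here is a minimal sketch of it. Everything in it is an illustrative assumption rather than anything from the philosophical literature: the `Reason` class, the numeric weights, and the example options are all made up, and real reasons need not reduce to a single number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reason:
    """A consideration bearing on an action (hypothetical toy model)."""
    description: str
    weight: float  # positive counts in favor, negative counts against


def net_support(reasons):
    """Step 2: weigh the considerations against each other by summing them."""
    return sum(r.weight for r in reasons)


def choose(options):
    """Step 3: pick the action whose reasons do best in the weighing.

    `options` maps an action name to the list of reasons for and against it
    (step 1 is whatever process produced that list).
    """
    return max(options, key=lambda action: net_support(options[action]))


# Made-up example: weights apply in this context only, not in all contexts.
options = {
    "take the job": [Reason("higher pay", 3.0), Reason("longer commute", -1.5)],
    "stay put": [Reason("short commute", 1.0), Reason("no raise", -0.5)],
}
best = choose(options)
```

Note that nothing here requires a reason to carry the same weight in every context; a different `options` table is a different context.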
All the reasons for action that exist? Like, the preferences of all possible minds? I'm not sure that utility function would be computable... Edit: Actually, if we suppose that all minds are computable, then there's only a countably infinite number of possible minds, and for any mind with a utility function U(x), there is a mind somewhere in that set with the utility function -U(x). So, depending on how you weight the various possible utility functions, it may be that they'd all cancel out. Notice that you're a human but you care about that. If there weren't something in human axiology that could lead to sufficiently smart and reflective people concluding that nonhuman intelligent life is valuable, you wouldn't have even thought of that — and, indeed, it seems that in general as you look at smarter, more informed, and more thoughtful people, you see less provincialism and more universal views of ethics. And that's exactly the sort of thing that CEV is designed to take into account. Don't you think that there would be (at least) strong support for caring about the interests of other intelligent life, if all humans were far more intelligent, knowledgeable, rational, and consistent, and heard all the arguments for and against it? And if we were all much smarter and still largely didn't think it was a good idea to care about the interests of other intelligent species... I really don't think that'll happen, but honestly, I'll have to defer to the judgment of our extrapolated selves. They're smarter and wiser than me, and they've heard more of the arguments and evidence than I have. :)
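The cancellation point in the edit above can be illustrated with a finite toy case. This is only a sketch under strong assumptions: the three base utility functions, the weights, and the `aggregate` helper are all hypothetical, and the real argument concerns a countably infinite set of minds, where whether anything cancels depends on the weighting and on convergence.

```python
def aggregate(utilities, weights, outcome):
    """Weighted sum of every mind's utility for a single outcome."""
    return sum(w * u(outcome) for u, w in zip(utilities, weights))


# Three made-up utility functions over a numeric outcome x.
base = [lambda x: x, lambda x: x * x - 3, lambda x: 2 - x]

# For every U in the set, add a mind with utility -U.
mirrored = [lambda x, u=u: -u(x) for u in base]
population = base + mirrored

uniform = [1.0] * len(population)          # every mind weighted equally
skewed = [2.0, 1.0, 1.0, 1.0, 1.0, 1.0]    # first mind counts double
```

With `uniform` weights each U is offset by its mirror, so the aggregate is zero for any outcome; with `skewed` weights the first mind's preference survives. That is the sense in which the result depends on how the utility functions are weighted.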
The same argument applies to just using one person as the template and saying that their preference already includes caring about all the other people. The reason CEV might be preferable to starting from your own preference (I now begin to realize) is that the decision to privilege yourself vs. grant other people fair influence is also subject to morality, so to the extent you can be certain about this being more moral, it's what you should do. Fairness, also being merely a heuristic, is subject to further improvement, as can be inclusion of volition of aliens in the original definition. Of course, you might want to fall back to a "reflective injunction" of not inventing overly elaborate plans, since you haven't had the capability of examining them well enough to rule them superior to more straightforward plans, such as using volition of a single human. But this is still a decision point, and the correct answer is not obvious.
This reminds me of the story of the people who encounter a cake, one of whom claims that what's "fair" is that they get all the cake for themself. It would be a mistake for us to come to a compromise with them on the meaning of "fair". Does the argument for including everyone in CEV also argue for including everyone in a discussion of what fairness is?
But making humans more intelligent, more rational would mean to alter their volition. An FAI that would proactively make people become more educated would be similar to one that altered the desires of humans directly. If it told them that the holy Qur'an is not the word of God it would dramatically change their desires. But what if people actually don't want to learn that truth? In other words, any superhuman intelligence will have a very strong observer effect and will cause a subsequent feedback loop that will shape the future according to the original seed AI, or the influence of its creators. You can't expect to create a God and still be able to extrapolate the natural desires of human beings. Human desires are not just a fact about their evolutionary history but also a mixture of superstructural parts like environmental and cultural influences. If you have some AI God leading humans into the future then at some point you have altered all those structures and consequently changed human volition. The smallest bias in the original seed AI will be maximized over time by the feedback between the FAI and its human pets. ETA You could argue that all that matters is the evolutionary template for the human brain. The best way to satisfy it maximally is what we want, what is right. But leaving aside the evolution of culture and the environment seems drastic. Why not go a step further and create a new better mind as well? I also think it is a mistake to generalize from the people you currently know to be intelligent and reasonable as they might be outliers. Since I am a vegetarian I am used to people telling me that they understand what it means to eat meat but that they don't care. We should not rule out the possibility that the extrapolated volition of humanity is actually something that would appear horrible and selfish to us "freaks". That is only reasonable if matters of taste are really subject to rational argumentation and judgement. If it really doesn't matter
Judging from his posts and comments here, I conclude that EY is less interested in dialectic than in laying out his arguments so that other people can learn from them and build on them. So I wouldn't expect critically-minded people to necessarily trigger such a dialectic. That said, perhaps that's an artifact of discussion happening with a self-selected crowd of Internet denizens... that can exhaust anybody. So perhaps a different result would emerge if a different group of critically-minded people, people EY sees as peers, got involved. The Hanson/Yudkowsky debate about FOOMing had more of a dialectic structure, for example. With respect to your example, the discussion here might be a starting place for that discussion, btw. The discussions here and here and here might also be salient. Incidentally: the anticipated relationship between what humans want, what various subsets of humans want, and what various supersets including humans want, is one of the first questions I asked when I encountered the CEV notion. I haven't gotten an explicit answer, but it does seem (based on other posts/discussions) that on EY's view a nonhuman intelligent species valuing something isn't something that should motivate our behavior at all, one way or another. We might prefer to satisfy that species' preferences, or we might not, but either way what should be motivating our behavior on EY's view is our preferences, not theirs. What matters on this view is what matters to humans; what doesn't matter to humans doesn't matter. I'm not sure if I buy that, but satisfying "all the reasons for action that exist" does seem to be a step in the wrong direction.
TheOtherDave, Thanks for the links! I don't know that "satisfying all the reasons for action that exist" is the solution, but I listed it as an example alternative to Eliezer's theory. Do you have a preferred solution?
Not really. Rolling back to fundamentals: reducing questions about right actions to questions about likely and preferred results seems reasonable. So does treating the likely results of an action as an empirical question. So does approaching an individual's interests empirically, and as distinct from their beliefs about their interests, assuming they have any. The latter also allows for taking into account the interests of non-sapient and non-sentient individuals, which seems like a worthwhile goal. Extrapolating a group's collective interests from the individual interests of its members is still unpleasantly mysterious to me, except in the fortuitous special case where individual interests happen to align neatly. Treating this as an optimization problem with multiple weighted goals is the best approach I know of, but I'm not happy with it; it has lots of problems I don't know how to resolve. Much to my chagrin, some method for doing this seems necessary if we are to account for individual interests in groups whose members aren't peers (e.g., children, infants, fetuses, animals, sufferers of various impairments, minority groups, etc., etc., etc.), which seems good to address. It's also at least useful for addressing groups of peers whose interests don't neatly align... though I'm more sanguine about marketplace competition as an alternative way of addressing that. Something like this may also turn out to be critical for fully accounting for even an individual human's interests, if it turns out that the interests of the various sub-agents of a typical human don't align neatly, which seems plausible. Accounting for the probable interests of probable entities (e.g., aliens) I'm even more uncertain about. I don't discount them a priori, but without a clearer understanding of what such an accounting would actually look like I really don't know what to say about them. I guess if we have grounds for reliably estimating the probability of a particular interest being had by
To respond to your example (while agreeing that it is good to have more intelligent people evaluating things like CEV and the meta-ethics that motivates it): I think the CEV approach is sufficiently meta that if we would conclude on meeting and learning about the aliens, and considering their moral significance, that the right thing to do involves giving weight to their preferences, then an FAI constructed from our current CEV would give weight to their preferences once it discovers them.
If they are to be given weight at all, then this could as well be done in advance, so prior to observing aliens we give weight to preferences of all possible aliens, conditionally on future observations of which ones turn out to actually exist.
From a perspective of pure math, I think that is the same thing, but in considering practical computability, it does not seem like a good use of computing power to figure out what weight to give the preferences of a particular alien civilization out of a vast space of possible civilizations, until observing that the particular civilization exists.
Such considerations could have some regularities even across all the diverse possibilities, which are easy to notice with a Saturn-sized mind.
One such regularity comes to mind: most aliens would rather be discovered by a superintelligence that was friendly to them than not be discovered, so spreading and searching would optimize their preferences.

CEV also has the problem that nothing short of a superintelligence could actually use it, so unless AI has a really hard takeoff you're going to need something less complicated for your AI to use in the meantime.

Personally I've always thought EY places too much emphasis on solving the whole hard problem of ultimate AI morality all at once. It would be quite valuable to see more foundation-building work on moral systems for less extreme sorts of AI, with an emphasis on avoiding bad failure modes rather than trying to get the best possible outcome. That’s the sort of research that could actually grow into an academic sub-discipline, and I’d expect it to generate insights that would help with attempts to solve the SI morality problem.

Of course, the last I heard EY was still predicting that dangerous levels of AI will come along in less time than it would take such a discipline to develop. The gradual approach could work if it takes 100 years to go from mechanical kittens to Skynet’s big brother, but not if it only takes 5.

Agreed. Also note that a really hard fast takeoff is even more of a reason to shift emphasis away from distant uncomputable impracticable problems and focus on the vastly smaller set of actual practical choices that we can make now.

I'd like to add the connection between the notions of "meta-ethics" and "decision theory" (of the kind we'd want a FAI/CEV to start out with). For the purpose of solving FAI, these seem to be the same, with "decision theory" emphasizing the outline of the target, and "meta-ethics" the source of correctness criteria for such theory in human intuition.

Hmm. I thought metaethics was about specifying a utility function, and decision theory was about algorithms for achieving the optimum of a given utility function. Or do you have a different perspective on this?
Even if we assume that "utility function" has anything to do with FAI-grade decision problems, you'd agree that prior is also part of specification of which decisions should be made. Then there's the way in which one should respond to observations, the way one handles logical uncertainty and decides that given amount of reflection is sufficient to suspend an ethical injunction (such as "don't act yet"), the way one finds particular statements first in thinking about counterfactuals (what forms agent-provability), which can be generalized to non-standard inference systems, and on and on this list goes. This list is as long as morality, and it is morality, but it parses it in a specific way that extracts the outline of its architecture and not just individual pieces of data. When you consider methods of more optimally solving a decision problem, how do you set criteria of optimality? Some things are intuitively obvious, and very robust to further reflection, but ultimately you'd want the decision problem itself to decide what counts as an improvement in the methods of solving it. For example, obtaining superintelligent ability to generate convincing arguments for a wrong statement can easily ruin your day. So efficient algorithms are, too, a subject of meta-ethics, but of course in the same sense as we can conclude that we can include an "action-definition" as a part of general decision problems, we can conclude that "more computational resources" is an improvement. And as you know from agent-simulates-predictor, that is not universally the case.
I think it is important to keep in mind that the approach currently favored here, in which your choice of meta-ethics guides your choice of decision theory, and in which your decision theory justifies your metaethics (in a kind of ouroborean epiphany of reflective equilibrium) - that approach is only one possible research direction. There are other approaches that might be fruitful. In fact, it is far from clear to many people that the problem of preventing uFAI involves moral philosophy at all. (ETA: Or decision theory.) To a small group, it sometimes appears that the only way of making progress is to maintain a narrow focus and to ruthlessly prune research subtrees as soon as they fall out of favor. But pruning in this way is gambling - it is an act of desperation by people who are made frantic by the ticking of the clock. My preference (which may turn out to be a gamble too), is to ignore the ticking and to search the tree carefully with the help of a large, well-trained army of researchers.
Much depends of course on the quantity of time we have available. If the market progresses to AGI on its own in 10 years, our energies are probably best spent focused on a narrow set of practical alternatives. If we have a hundred years, then perhaps we can afford to entertain several new generations of philosophers.
But the problem itself seems to suggest that if you don't solve it on its own terms, and instead try to mitigate the practical difficulties, you still lose completely. AGI is a universe-exploding A-Bomb which the mad scientists are about to test experimentally in a few decades; you can't improve the outcome by building better shelters (or better casing for the bomb).
Yudkowsky apparently counsels ignoring the ticking as well - here: I have argued repeatedly that the ticking is a fundamental part of the problem - and that if you ignore it, you just lose (with high probability) to those who are paying their clocks more attention. The "blank them completely out of your mind" advice seems to be an obviously-bad way of approaching the whole area. It is unfortunate that getting more time looks very challenging. If we can't do that, we can't afford to dally around very much.
Yes, and that comment may be the best thing he has ever written. It is a dilemma. Go too slow and the bad guys may win. Go too fast, and you may become the bad guys. For this problem, the difference between "good" and "bad" has nothing to do with good intentions.
Another analysis is that there are at least two types of possible problem:
* One is the "runaway superintelligence" problem - which the SIAI seems focused on;
* Another type of problem involves the preferences of only a small subset of humans being respected.
The former problem has potentially more severe consequences (astronomical waste), but an engineering error like that seems pretty unlikely - at least to me. The latter problem could still have some pretty bad consequences for many people, and seems much more probable - at least to me. In a resource-limited world, too much attention on the first problem could easily contribute to running into the second problem.
Vladimir_Nesov, Gary Drescher has an interesting way of grabbing deontological normative theory from his meta-ethics, coupled with decision theory and game theory. He explains it in Good and Real, though I haven't had time to evaluate it much yet.
Given your interest, you probably should read it (if not having read it is included in what you mean by not having had time to evaluate it). Although I still haven't, I know Gary is right on most things he talks about, and expresses himself clearly.
Right; you can extend decision theory to include reasoning about which computations the decision theoretic 'agent' is situated in and how that matters for which decisions to make, shaper/anchor semantics-style. Meta-ethics per se is just the set of (hopefully mathematical-ish) intuitions we draw on and that guide how humans go about reasoning about what is right, and that we kind of expect to align somewhat with what a good situated decision theory would do, at least before the AI starts trading with other superintelligences that represented different contexts. If meta-level contextual/situated decision theory is convergent, the only differences between superintelligences are differences about what kind of world they're in. Meta-ethics is thus kind of superfluous except as a vague source of intuitions that should probably be founded in math, whereas practical axiology (fueled by evolutionary psychology, evolutionary game theory, social psychology, etc) is indicative of the parts of humanity that (arguably) aren't just filled in by the decision theory.

Would someone familiar with the topic be able to do a top level treatment similar to the recent one on self-help? A survey of the literature, etc.

I am a software engineer, but I don't know much about general artificial intelligence. The AI research I am familiar with is very different from what you are talking about here.

Who is currently leading the field in attempts at providing mathematical models for philosophical concepts? Are there simple models that demonstrate what is meant by computational meta-ethics? Is that a correct search term -- as in a term ...

It sounds like you're asking for something broader than this, but I did just post a bibliography on Friendly AI, which would make for a good start. Unfortunately, meta-ethics is one of the worst subjects to try to "dive into," because it depends heavily on so many other fields. I was chatting with Stephen Finlay, a meta-ethicist at USC, and he said something like: "It's hard to have credibility as a professor teaching meta-ethics, because meta-ethics depends on so many fields, and in most of the graduate courses I teach on meta-ethics, I know that every one of my students knows more about one of those fields than I do."

How much thought has been given to hard coding an AI with a deontological framework rather than giving it some consequentialist function to maximize? Is there already a knockdown argument showing why that is a bad idea?

EDIT: I'm not talking about what ethical system to give an AI that has the potential to do the most good, but one that would be capable of the least bad.

It is very hard to get an AI to understand the relevant deontological rules. Once you have accomplished that, there is no obvious next step easier and safer than CEV.
Intuitively, it seems easier to determine if a given act violates the rule "do not lie" than the rule "maximize the expected average utility of population x". Doesn't this mean that I understand the first rule better than the second?
Yes, but you're a human, not an AI. Your brain comes factory-equipped with lots of machinery for understanding deontological injunctions, and no (specific) machinery for understanding the concept of expected utility maximization. Programming each of those concepts into an AI and conveying them to a human are entirely different tasks.
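A minimal sketch may make the contrast in this exchange concrete. All names are hypothetical. The rule check looks trivial only because the hard part, recognizing that an act is a lie, is hidden inside a hand-supplied feature flag, which is the point made in the reply above. The consequentialist check needs an explicit outcome model with probabilities and utilities, which are also supplied by hand here.

```python
from typing import Dict, List, Tuple

Act = Dict[str, bool]            # hypothetical feature description of an act
Outcome = Tuple[float, float]    # (probability, utility)

def violates_do_not_lie(act: Act) -> bool:
    """Deontological check: inspects only a feature of the act itself."""
    return act.get("asserts_known_falsehood", False)

def expected_utility(outcomes: List[Outcome]) -> float:
    """Consequentialist check: needs a full outcome distribution for the act."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * u for p, u in outcomes)

def best_act(model: Dict[str, List[Outcome]]) -> str:
    """Pick the act whose outcome distribution has the highest expected utility."""
    return max(model, key=lambda name: expected_utility(model[name]))

# Illustrative numbers only; nothing here comes from the discussion.
acts = {
    "tell_truth": {"asserts_known_falsehood": False},
    "tell_lie": {"asserts_known_falsehood": True},
}
model = {
    "tell_truth": [(0.9, 5.0), (0.1, -20.0)],
    "tell_lie": [(0.5, 10.0), (0.5, -10.0)],
}

forbidden = [name for name, act in acts.items() if violates_do_not_lie(act)]
chosen = best_act(model)
print(forbidden, chosen)
```

Neither function settles which approach is easier to program into an AI; the sketch only shows what each one takes as input.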
Logical uncertainty, which is unavoidable no matter how smart you are, blurs the line. AI won't "understand" expected utility maximization completely either; it won't see all the implications no matter how many computational resources it has. And so it needs more heuristics to guide its decisions where it can't figure out all the implications. Those are the counterparts of deontological injunctions, although of course they must be subject to revision on sufficient reflection (and what "sufficient" means is one of these injunctions, also subject to revision). Some of them will even have normative implications; in fact that's one reason preference is not a utility function.
That said, it's hard to reason about what preferences/morality/meta-ethics/etc. an AI actually converges to if you give it vague deontological injunctions like "be nice" or "produce paperclips". It'd be really cool if more people were thinking about likely attractors on top of or instead of the recognized universal AI drives. (Also I'll note that I agree with Nesov that logical uncertainty / the grounding problem / no low level language etc. problems pose similar difficulties to the 'you can't just do ethical injunctions' problem. That said, humans are able to do moral reasoning somehow, so it can't be crazy difficult.)
You are making a huge number of assumptions here: Such as? Where is this machinery? How do you understand the concept of expected utility maximization? Is it not through the highly general machinery of your cortex? And how can we expect that the algorithm of "expected utility maximization" actually represents our best outcome? debatable
"Machinery" was a figure of speech, I'm not saying we're going to find a deontology lobe. I was referring, for instance, to the point that there are evolutionary reasons why we'd expect to find (as we do) that an understanding of deontological injunctions is fairly universal among humans. Oops, sorry, I accidentally used the opposite of the word I meant. That should have been "specific", not "general". Yes, we understand expected utility maximization with highly general machinery, and in very abstract terms.
EY's theory linked in the 1st post that deontological injunctions evolved as some sort of additional defense against black swan events does not appear especially convincing to me. The cortex is intrinsically predictive consequentialist at a low level, but simple deontological rules are vast computational shortcuts. An animal brain learns the hard way, the way AIXI does, thoroughly consequentialist at first, but once predictable pattern matches are learned at higher levels they can sometimes be simplified down to simpler rules for quick decisions. Even non-verbal animals find ways to pass down some knowledge to their offspring, but in humans this is vastly amplified through language. Every time a parent tells a child what to do, the parent is transmitting complex consequentialist results down to the younger mind in the form of simpler cached deontological behaviors. Ex: It would be painful for the child to learn a firsthand consequentialist account of why stealing is detrimental (the tribe will punish you). Once this machinery was in place, it could extend over generations and develop into more complex cultural and religious deontologies. All of this can be accomplished through cortical reinforcement learning as the child develops. Feral children, for all intents and purposes, act like feral animals. Human minds are cultural/linguistic software phenomena. I'm not aware of any practical approach to AI which consists of programming concepts directly into an AI. All modern approaches program only the equivalent of an empty brain; the concepts and the resulting mind form through learning. Human concepts are expressed in natural language, and for an AGI to compete with humans it will need to learn extant human knowledge. Learning natural language thus seems like the most practical approach. The problem is this: if we define an algorithm to represent our best outcome and use that as the standard of rationality, and the algorithm's predictions then differ significantl
Even if deontological injunctions are only transmitted through language, they are based on human predispositions (read brain wiring) to act morally and cooperate, which has evolved. This somewhat applies to animals too, there's been research on altruism in animals.
That he makes assumptions is no point against him; the question is do those assumptions hold. To support the first one: the popularity and success of the fallacy of appealing to authority, Milgram's comments on his experiment, the "hole-shaped God" theory (well supported). For the second one: First, it's not entirely clear we do understand expected utility maximisation. Certainly, I know of no-one who acts as though they are maximising their expected utility. Second, to the extent that we do understand it, I would draw the metaphor of a Turing tarpit - I would say that we understand it only in the sense that we can hack together a bunch of neural processes that do other things, in such a way that they produce the words "expected utility maximisation" and the concept "act to get the most of what you really want". This is still an understanding, of course, but in no way do we have machinery for that purpose like how we have machinery for orders from authority / deontological injunctions. "Expected utility maximisation" is, by definition what actually represents our best outcome. To the extent that it doesn't, it is a failure of our ability to grasp and apply the concept, not a failure in the concept itself. As for the third, and for your claim of debatable: Yes, you could debate it. You would have to stand on some very wide definitions of entirely and different, and you'd lose the debate. For example: speaking aloud to an AI and speaking aloud to a human are entirely different tasks. Not to mention that conveying a concept to a human carries no instructions; programming concepts into an AI is all instructions. Another entire difference.
No, it's based on certain axioms that are not unbreakable in strange contexts, which in turn assume a certain conceptual framework (where you can, say, enumerate possibilities in a certain way).
Name one exception to any axiom other than the third or to the general conceptual framework.
There's no point in assuming completeness, being able to compare events that you won't be choosing between (in the context of utility function having possible worlds as domain). Updateless analysis says that you never actually choose between observational events. And there are only so many counterfactuals to consider (which in this setting are more about high-level logical properties of a fixed collection of worlds, which lead to their different utility, and not presence/absence of any given possible world, so in one sense even counterfactuals don't give you nontrivial events).
Is there ever actually a pair of events for which this would not hold if you did need to make such a choice? I'm not sure what you mean. Outcomes do not have to be observed in order to be chosen between. Isn't this just separating degrees of freedom and assuming that some don't affect others? It can be derived from the utility axioms.
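The disagreement over completeness can be made concrete on a finite set of outcomes. The sketch below is illustrative only: the outcome labels and the two example relations are hypothetical, and `prefers(a, b)` is read as "a is at least as good as b". It shows that a preference relation can be transitive without being complete, which is the kind of relation the objection to assuming completeness has in mind.

```python
from itertools import combinations, permutations
from typing import Callable, Iterable

Prefers = Callable[[str, str], bool]

def is_complete(items: Iterable[str], prefers: Prefers) -> bool:
    """Completeness: every pair of distinct outcomes is comparable."""
    return all(prefers(a, b) or prefers(b, a)
               for a, b in combinations(list(items), 2))

def is_transitive(items: Iterable[str], prefers: Prefers) -> bool:
    """Transitivity: a >= b and b >= c together imply a >= c."""
    return all(not (prefers(a, b) and prefers(b, c)) or prefers(a, c)
               for a, b, c in permutations(list(items), 3))

outcomes = ["A", "B", "C"]

# A relation induced by a utility function is complete and transitive.
utility = {"A": 3, "B": 2, "C": 1}
def by_utility(a: str, b: str) -> bool:
    return utility[a] >= utility[b]

# A partial relation: A beats B and C, but B and C are never compared.
known_pairs = {("A", "B"), ("A", "C")}
def partial(a: str, b: str) -> bool:
    return (a, b) in known_pairs

print(is_complete(outcomes, by_utility), is_transitive(outcomes, by_utility))
print(is_complete(outcomes, partial), is_transitive(outcomes, partial))
```

Whether an agent ever needs to compare B with C is exactly what the two commenters dispute; the check itself takes no side.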

Rather, my point is that we need lots of smart people working on these meta-ethical questions.

I'm curious if the SIAI shares that opinion. Is Michael Vassar trying to hire more people or is his opinion that a small team will be able to solve the problem? Can the problem be subdivided into parts, is it subject to taskification?

I'm curious if the SIAI shares that opinion.

I do. More people doing detailed moral psychology research (such as Jonathan Haidt's work), or moral philosophy with the aim of understanding what procedure we would actually want followed, would be amazing.

Research into how to build a powerful AI is probably best not done in public, because it makes it easier to make unsafe AI. But there's no reason not to engage as many good researchers as possible on moral psychology and meta-ethics.

Is the SIAI concerned with the data security of its research? Is the latest research saved unencrypted on EY's laptop and shared between all SIAI members? Could a visiting fellow just walk into the SIAI house, plug in a USB stick and run with the draft for a seed AI? Those questions arise when you make a distinction between you and the "public". Can that research be detached from decision theory? Since you're working on solutions applicable to AGI, is it actually possible to differentiate between the mathematical formalism of an AGI's utility function and the fields of moral psychology and meta-ethics? In other words, can you learn a lot by engaging with researchers if you don't share the math? That is why I asked if the work can effectively be subdivided if you are concerned with security.
I find this dubious - has this belief been explored in public on this site? If AI research is completely open and public, then more minds and computational resources will be available to analyze safety. In addition, in the event that a design actually does work, it is far less likely to have any significant first mover advantage. Making SIAI's research public and open also appears to be nearly mandatory for proving progress and joining the larger scientific community.
I tend to think that the broadest issues, such as those of meta-ethics, may be discussed by a wider professional community, though of course most meta-ethicists will have little to contribute (divine command theorists, for example). It may still be the case that a small team is best for solving more specific technical problems in programming the AI's utility function and proving that it will not reprogram its own terminal values. But I don't know what SIAI's position is.
My intuition leads me to disagree with the suggestion that a small team might be better. The only conceivable (to me) advantage of keeping the team small would be to minimize the pedagogical effort of educating a large team on the subtleties of the problem and the technicalities of the chosen jargon. But my experience has been that investment in clarity of pedagogy yields dividends in your own understanding of the problem, even if you never get any work or useful ideas out of the yahoos you have trained. And, of course, you probably will get some useful ideas from those people. There are plenty of smart folks out there. Whole generations of them.
People argue that small is better for security reasons. But one could as well argue that large is better for security reasons because there is more supervision and competition. Would you rather trust 5 people (likely friends) or a hundred strangers working for fame and money? After all we're talking about a project that will result in the implementation of a superhuman AI to determine the future of the universe. A handful of people might do anything, regardless of what they are signaling. But a hundred people are much harder to control. So the security argument runs both ways. The question is what will maximize the chance of success. Here I agree that it will take many more people than are currently working on the various problems.
I agree. But, with Luke, I am assuming that the problem of AGI Friendliness can be addressed independently of the question of actually achieving AGI. Only the second of those two questions requires security - there is no reason not to pursue Friendliness theory openly.
That is probably not true. There may well be some differences, though. For instance, it is hard to see how the corner cases in decision theory that are so discussed around here have much relevance to the problem of actually constructing a machine intelligence - UNLESS you want to prove things about how its goal system behaves under iterative self-modification.
The "smaller is better" idea seems linked to "security through obscurity" - a common term of ridicule in computer security circles. The NSA manage to get away with some security through obscurity - but they are hardly a very small team.
On this sort of problem that can be safely researched publicly, SIAI need not hire people to get them to work on the problem. They can also engage the larger academic community to get them interested in finding practical answers to these questions.

I would ordinarily vote down a post that restated things that most people on LW should already know, but... LW is curiously devoid of discussion on this issue, whether criticism of CEV, or proposals of alternatives. And LP's post hits all the key points, very efficiently.

If LW has a single cultural blind spot, it is that LWers claim to be Bayesians, yet routinely analyze potential futures as if the single "most-likely" scenario, hypothesis, or approach accepted as dogma on LessWrong (fast takeoff, Friendly AI, multiple worlds, CEV, etc.) had probability 1.

"Devoid"? * * * * * * Not to mention various comments elsewhere...
Eliezer has stated that he will not give his probability for the successful creation of Friendly AI. Presumably because people would get confused about why working desperately towards it is the rational thing to do despite a low probability. As for CEV 'having a probability of 1', that doesn't even make sense. But an awful lot of people have said that CEV as described in Eliezer's document would be undesirable even assuming the undeveloped parts were made into more than hand wavy verbal references.
I dunno, I perceive a lot of criticism of CEV here-- if I recall correctly there have been multiple top-level posts expressing skepticism of it. And doesn't Robin Hanson (among others) disagree with the hard takeoff scenario?
That's true. (Although notice that not one of those posts has ever gotten the green button.) CEV does not fit well into my second paragraph, since it is not a prerequisite for anything else, and therefore not a point of dependency in an analysis.

It's not just a matter of pace; this perspective also implies a certain prioritization of the questions.

For example, as you say, it's important to conclude soon whether animal welfare is important. (1) (2) But if we preserve the genetic information that creates new animals, we preserve the ability to optimize animal welfare in the future, should we at that time conclude that it is important. (2) If we don't, then later concluding it's important doesn't get us much.

It seems to follow that preserving that information (either in the form of a breeding popula...

Am I correct in saying that there is not necessarily any satisfactory solution to this problem?

Also, this seems relevant: The Terrible, Horrible, No Good Truth About Morality.

Depends on what would satisfy us, I suppose. I mean, for example, if it turns out that implementing CEV creates a future that everyone living in desires and is made happy and fulfilled and satisfied by and continues to do so indefinitely, and that everyone living now would if informed of the details of also desire and etc., but we are never able to confirm that any of that is right... or worse yet, later philosophical analysis somehow reveals that it isn't right, despite being desirable and fulfilling and satisfying and so forth... well, OK, we can decide at that time whether we want to give up what is desirable and etc. in exchange for what is right, but in the meantime I might well be satisfied by that result. Maybe it's OK to leave future generations some important tasks to implement. Or, if it turns out that EY's approach is all wrong because nobody agrees on anything important to anyone, so that extrapolating humanity's coherent volition leaves out everything that's important to everyone, so that implementing it doesn't do anything important... in that case, coming up with an alternate plan that has results as above would satisfy me. Etc.
It might turn out that what does satisfy us is to be "free", to do what we want, even if that means that we will mess up our own future. It might turn out that humans are only satisfied if they can work on existential problems, "no risk no fun". Or we might simply want to learn about the nature of reality. The mere existence of an FAI might spoil all of it. Would you care to do science if there was some AI-God that already knew all the answers? Would you be satisfied if it didn't tell you the answers or made you forget that it does exist so that you'd try to invent AGI without ever succeeding? But there is another possible end. Even today many people are really bored and don't particularly enjoy life. What if it turns out that there is no "right" out there or that it can be reached fairly easily without any way to maximize it further. In other words, what if fun is something that isn't infinite but a goal that can be reached? What if it all turns out to be wireheading, the only difference between 10 minutes of wireheading or 10^1000 years being the number enumerating the elapsed time? Think about it, would you care about 10^1000 years of inaction? What would you do if that was the optimum? Maybe we'll just decide to choose the void instead.
This is a different context for satisfaction, but to answer your questions:
* yes, I often find satisfying working through problems that have already been solved, though I appreciate that not everyone does;
* no, I would not want to be denied the solutions if I asked (assuming there isn't some other reason why giving me the solution is harmful), or kept in ignorance of the existence of those solutions (ibid);
* if it turns out that all of my desires as they currently exist are fully implemented, leaving me with no room for progress and no future prospects better than endless joy, fulfillment and satisfaction, I'd be satisfied and fulfilled and joyful.
* Admittedly, I might eventually become unsatisfied with that and desire something else, at which point I would devote efforts to satisfying that new desire. It doesn't seem terribly likely that my non-existence would be the best possible way of doing so, but I suppose it's possible, and if it happened I would cease to exist.
It might indeed.
No, once ostensibly-Friendly AI has run CEV and knows what it wants, it won't matter if we eventually realize that CEV was wrong after all. The OFAI will go on to do what CEV says it should do, and we won't have a say in the matter.
Agreed: avoiding irreversible steps is desirable.

Here is a simple moral rule that should make an AI much less likely to harm the interests of humanity:

Never take any action that would reduce the number of bits required to describe the universe by more than X.

where X is some number smaller than the number of bits needed to describe an infant human's brain. For information-reductions smaller than X, the AI should get some disutility, but other considerations could override. This 'information-based morality' assigns moral weight to anything that makes the universe a more information-filled or complex place,...

This doesn't work, because the universe could require many bits to describe while those bits were allocated to describing things we don't care about. Most of the information in the universe is in non-morally-significant aspects of the arrangement of molecules, such that things like simple combustion increase the number of bits required to describe the universe (aka the entropy) by a large amount while tiling the universe with paperclips only decreases it by a small amount.
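The objection can be illustrated with a crude toy, under loud assumptions: compressed length from `zlib` stands in for "bits required to describe" a state, short byte strings stand in for the universe, and `X_BITS` and the three example states are hypothetical. Compressed size is only a rough upper-bound proxy for description length and says nothing about physical entropy; the sketch just shows how a bit-count rule can approve of replacing structured content with noise.

```python
import random
import zlib

def description_bits(state: bytes) -> int:
    """Rough proxy for description length: size of the zlib-compressed state."""
    return 8 * len(zlib.compress(state, 9))

def permitted(before: bytes, after: bytes, x_bits: int) -> bool:
    """The proposed rule: never reduce the description length by more than X."""
    return description_bits(before) - description_bits(after) <= x_bits

# A stand-in for structured content we might care about (hypothetical).
meaningful = b"".join(
    str(i).encode() + b" relates to " + str(i * i).encode() + b"; "
    for i in range(1500)
)
n = len(meaningful)

# The same amount of space filled with one repeated object.
paperclips = (b"paperclip " * (n // 10 + 1))[:n]

# The same amount of space filled with incompressible noise ("combustion").
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(n))

X_BITS = 8 * 1000  # hypothetical threshold

for label, state in [("meaningful", meaningful),
                     ("paperclips", paperclips),
                     ("noise", noise)]:
    print(label, description_bits(state))

# Burning everything raises the bit count, so the rule allows it.
print(permitted(meaningful, noise, X_BITS))
# Tiling with paperclips alone is blocked, but adding noise offsets the loss.
print(permitted(meaningful, paperclips, X_BITS))
print(permitted(meaningful, paperclips + noise, X_BITS))
```

In this toy the rule blocks pure paperclip tiling but permits it once enough noise is added alongside, which matches the reply's claim that most of the bits live in arrangements nobody cares about.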