This is for anyone in the LessWrong community who has made at least some effort to read the sequences and follow along, but is still confused on some point, and is perhaps feeling a bit embarrassed. Here, newbies and not-so-newbies are free to ask very basic but still relevant questions with the understanding that the answers are probably somewhere in the sequences. Similarly, LessWrong tends to presume a rather high threshold for understanding science and technology. Relevant questions in those areas are welcome as well.  Anyone who chooses to respond should respectfully guide the questioner to a helpful resource, and questioners should be appropriately grateful. Good faith should be presumed on both sides, unless and until it is shown to be absent.  If a questioner is not sure whether a question is relevant, ask it, and also ask if it's relevant.


Well, hmmm. I wonder if this qualifies as "stupid".

Could someone help me summarize the evidence for MWI in the quantum physics sequence? I tried once, and came up with only 1) the fact that collapse postulates are "not nice" (i.e., nonlinear, nonlocal, and so on) and 2) the fact of decoherence. Compare, however, the following quote from Many Worlds, One Best Guess (emphasis added):

The debate should already be over. It should have been over fifty years ago. The state of evidence is too lopsided to justify further argument. There is no balance in this issue. There is no rational controversy to teach. The laws of probability theory are laws, not suggestions; there is no flexibility in the best guess given this evidence. Our children will look back at the fact that we were STILL ARGUING about this in the early 21st-century, and correctly deduce that we were nuts.

Is there other evidence as well, then? 1) seems depressingly weak, and as for 2)...

As was mentioned in Decoherence is Falsifiable and Testable, and brought up in the comments, the existence of so-called "microscopic decoherence" (which we have evidence for) is independent from so-called "macroscopic decoherence"…

(There are two different argument sets here: 1) against random collapse, and 2) for MWI specifically. It's important to keep these distinct.)
Unless I'm missing something, EY argues that evidence against random collapse is evidence for MWI. See that long analogy on Maxwell's equations with angels mediating the electromagnetic force.
It's also evidence for a bunch of other interpretations though, right? I meant "for MWI specifically"; I'll edit my comment to be clearer.
I agree, which is one of the reasons why I feel 1) alone isn't enough to substantiate "There is no rational controversy to teach", etc.
Quantum mechanics can be described by a set of postulates. (Sometimes five, sometimes four; it depends how you write them.) In the "standard" interpretation, one of these postulates invokes something called "state collapse". MWI can be described by the same set of postulates without that one. When two theories describe the same data, the simpler one is usually the right one.
This falls under 1) above, and is also covered here below. Was there something new you wanted to convey?
I think 1) should probably be split into two arguments, then. One of them is that Many Worlds is strictly simpler (by any mathematical formalization of Occam's Razor). The other one is that collapse postulates are problematic (which could itself be split into sub-arguments, but that's probably unnecessary). Grouping those makes no sense. They can stand (or fall) independently, they aren't really connected to each other, and they look at the problem from different angles.
Ah, okay, that makes more sense. 1a) (that MWI is simpler than competing theories) would be vastly more convincing than 1b) (that collapse is bad, mkay). I'm going to have to reread the relevant subsequence with 1a) in mind.
I really don't think 1a) is addressed by Eliezer; no offense meant to him, but I don't think he knows very much about interpretations besides MWI (maybe I'm wrong and he just doesn't discuss them for some reason?). E.g. AFAICT the transactional interpretation has what people 'round these parts might call an Occamian benefit in that it doesn't require an additional rule that says "ignore advanced wave solutions to Maxwell's equations". In general these Occamian arguments aren't as strong as they're made out to be.
If you read Decoherence is Simple while keeping in mind that EY treats decoherence and MWI as synonymous, and ignore the superfluous references to MML, Kolmogorov and Solomonoff, then 1a) is addressed there.
The claim in parentheses isn't obvious to me and seems to be probably wrong. If one replaced any with "many" or "most" it seems more reasonable. Why do you assert this applies to any formalization?
Kolmogorov Complexity/Solomonoff Induction and Minimum Message Length have been proven equivalent in their most-developed forms. Essentially, correct mathematical formalizations of Occam's Razor are all the same thing.
The whole point is superfluous, because nobody is going to sit around and formally write out the axioms of these competing theories. It may be a correct argument, but it's not necessarily convincing.
This is a pretty unhelpful way of justifying this sort of thing. Kolmogorov complexity doesn't give a unique result. What programming system one uses as one's basis can change things up to a constant. So simply looking at the fact that Solomonoff induction is equivalent to a lot of formulations isn't really that helpful for this purpose. Moreover, there are other formalizations of Occam's razor which are not formally equivalent to Solomonoff induction. PAC learning is one natural example.
Is it really so strange that people are still arguing over "interpretations of quantum mechanics" when the question of whether atoms existed wasn't settled until one hundred years after John Dalton published his work?
From the Wikipedia fine-tuned universe page (Ikeda & Jefferys are linked at note 21): In a nutshell, MWI provides a mechanism whereby a spectrum of universes is produced, some life-friendly and some life-unfriendly. Consistent with the weak anthropic principle, life can only exist in the life-friendly (hence fine-tuned) universes. So MWI provides an explanation of observed fine-tuning, whereas the standard QM interpretation does not.
That line of reasoning puzzles me, because the anthropic-principle explanation of fine tuning works just fine without MWI: Out of all the conceivable worlds, of course we find ourselves in one that is habitable.
This only works if all worlds that follow the same fundamental theory exist in the same way our local neighborhood exists. If all of space has just one set of constants even though other values would fit the same theory of everything equally well, the anthropic principle does not apply, and so the fact that the universe is habitable is ordinary Bayesian evidence for something unknown going on.
The word "exist" doesn't do any useful work here. There are conceivable worlds that are different from this one, and whether they exist depends on the definition of "exist". But they're still relevant to an anthropic argument. The habitability of the universe is not evidence of anything because the probability of observing a habitable universe is practically unity.
Can you clarify why a conceivable world that doesn't exist in the conventional sense of existing is relevant to an anthropic argument? I mean, if I start out as part of a group of 2^10 people, and that group is subjected to an iterative process whereby we split the group randomly into equal subgroups A and B and kill group B, then at every point along the way I ought to expect to have a history of being sorted into group A if I'm alive, but I ought not expect to be alive very long. This doesn't seem to depend in any useful way on the definition of "alive." Is it different for universes? Why?
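The arithmetic behind this small-scale example can be checked with a quick simulation (a sketch of the comment's 2^10 setup; the code and its names are mine, not from the thread):

```python
import random

ROUNDS = 10  # 2**10 people halved once per round leaves one expected survivor

def survives(rounds=ROUNDS):
    """One person's fate: each round they land in group A (live) or group B (die)."""
    return all(random.choice("AB") == "A" for _ in range(rounds))

# Analytic survival probability: (1/2)**10
p_analytic = 0.5 ** ROUNDS
print(p_analytic)  # 0.0009765625

# Monte Carlo check; anyone who does survive necessarily has an all-A history,
# which is why the history itself is unsurprising once survival is given.
trials = 100_000
p_empirical = sum(survives() for _ in range(trials)) / trials
print(p_empirical)
```

Any survivor's history is all A's by construction, yet surviving at all is a roughly 0.1% event, which is the asymmetry the comment is pointing at.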
I agree with all that. I don't quite see where that thought experiment fits into the discussion here. I see that the situation where we have survived that iterative process is analogous to fine-tuning with MWI, and I agree that fine-tuning is unsurprising given MWI. I further claim that fine-tuning is unsurprising even in a non-quantum universe. Let me describe the thought experiment I have in mind: Imagine a universe with very different physics.

(1) Suppose the universe, by nature, splits into many worlds shortly after the beginning of time, each with different physical constants, only one of which allows for life. The inhabitants of that one world ought not to be surprised at the fine-tuning they observe. This is analogous to fine-tuning with MWI.
(2) Now suppose the universe consists of many worlds at its inception, and these other worlds can be observed only with great difficulty. Then the inhabitants still ought not to be surprised by fine-tuning.
(3) Now suppose the universe consists of many worlds from its inception, but they are completely inaccessible, and their existence can only be inferred from the simplest scientific model of the universe. The inhabitants still ought not to be surprised by fine-tuning.
(4) Now suppose the simplest scientific model describes only one world, but the physical constants are free parameters. You can easily construct a parameterless model that says "a separate world exists for every choice of parameters somehow", but whether this means that those other worlds "exist" is a fruitless debate. The inhabitants still ought not to be surprised by fine-tuning.

This is what I mean when I say that fine-tuning is not surprising even without MWI. In cases (1)-(4), the inhabitants can make an anthropic argument: "If the physical constants were different, we wouldn't be here to wonder about them. We shouldn't be surprised that they allow us to exist." Does that make sense?
Ah, I see. Yes, I agree: as long as there's some mechanism for the relevant physical constants to vary over time, anthropic arguments for the "fine-tuned" nature of those constants can apply; anthropic arguments don't let us select among such mechanisms. Thanks for clarifying.
Hm, only the first of the four scenarios in the grandparent involves physical constants varying over time. But yes, anthropic arguments don't distinguish between the scenarios.
Huh. Then I guess I didn't understand you after all. You're saying that in scenario 4, the relevant constants don't change once set for the first time? In that case this doesn't fly. If setting the constants is a one-time event in scenario 4, and most possible values don't allow for life, then while I ought not be surprised by the fine-tuning given that I observe something (agreed), I ought to be surprised to observe anything at all. That's why I brought up the small-scale example. In that example, I ought not be surprised by the history of A's given that I observe something, but I ought to be surprised to observe anything in the first place. If you'd asked me ahead of time whether I would survive, I'd have estimated roughly a .001 chance... a low-probability event. If my current observed environment can be explained by positing scenarios 1-4, and scenario 4 requires assuming a low-probability event that the others don't, that seems like a reason to choose 1-3 instead.
I'm saying that in all four scenarios, the physical constants don't change once set for the first time. And in scenarios (2)-(4), they are set at the very beginning of time. I was confused as to why you started talking about changing constants, but it occurs to me that we may have different ideas about how the MWI explanation of fine-tuning is supposed to run. I admit I'm not familiar with cosmology. I imagine the Big Bang occurs, the universal wavefunction splits locally into branches, the branches cool down and their physical constants are fixed, and over the next 14 billion years they branch further but their constants do not change, and then life evolves in some of them. Were you imagining our world constantly branching into other worlds with slightly different constants?
No, I wasn't; I don't think that's our issue here. Let me try it this way. If you say "I'm going to roll a 4 on this six-sided die", and then you roll a 4 on a six-sided die, and my observations of you are equally consistent with both of the following theories:

Theory T1: You rolled the die exactly once, and it came up a 4.
Theory T2: You rolled the die several times, and stopped rolling once it came up 4.

...I should choose T2, because the observed result is less surprising given T2 than T1. Would you agree? (If you don't agree, the rest of this comment is irrelevant: that's an interesting point of disagreement I'd like to explore further. Stop reading here.) OK, good. Just to have something to call it, let's call that the Principle of Least Surprise. Now, suppose that in all scenarios constants are set shortly after the creation of a world, and do not subsequently change, but that the value of a constant is indeterminate prior to being set. Suppose further that life-supporting values of constants are extremely unlikely. (I think that's what we both have been supposing all along, I just want to say it explicitly.) In scenarios 1-3, we have multiple worlds with different constants. Constants that support life are unlikely, but because there are multiple worlds, it is not surprising that at least one world exists with constants that support life. We'd expect that, just like we'd expect a six-sided die to come up '4' at least once if tossed ten times. We should not be surprised that there's an observer in some world, and that world has constants that support life, in any of these cases. In scenario 4, we have one world with one set of constants. It is surprising that that world has life-supporting constants. We ought not expect that, just like we ought not expect a six-sided die to come up '4' if tossed only once. We should be surprised that there's an observer in some world. So. If I look around, and what I observe is equally consistent with scenarios 1-4, the Pr…
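With equal priors on the two theories, the "Principle of Least Surprise" in the die example reduces to a Bayesian likelihood comparison. A minimal sketch (illustrative, not from the original comment):

```python
from fractions import Fraction

# P(the die you're shown reads '4' | T1: rolled exactly once) = 1/6
p_given_t1 = Fraction(1, 6)

# P(the die you're shown reads '4' | T2: rolled until a 4 came up) = 1
p_given_t2 = Fraction(1, 1)

# With equal priors, posterior odds equal the likelihood ratio
odds_t2_vs_t1 = p_given_t2 / p_given_t1
print(odds_t2_vs_t1)  # 6

# Posterior probability of T2 given the observation
posterior_t2 = p_given_t2 / (p_given_t1 + p_given_t2)
print(posterior_t2)  # 6/7
```

The observation favors T2 by 6:1, which is exactly the "less surprising given T2 than T1" judgment stated in words above.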
This bit is slightly ambiguous. I would agree if Theory T1 were replaced by "You decided to roll the die exactly once and then show me the result", and Theory T2 were replaced by "You decided to roll the die until it comes up '4', and then show me the result", and the two theories have equal prior probability. I think this is probably what you meant, so I'll move on. I agree that we should not be surprised. Although I have reservations about drawing this analogy, as I'll explain below. If we take scenario 4 as I described it — there's a scientific model where the constants are free parameters, and a straightforward parameterless modification of the model (of equal complexity) that posits one universe for every choice of constants — then I disagree; we should not be surprised. I disagree because I think the die-rolling scenario is not a good analogy for scenarios 1-4, and scenario 4 resembles Theory T2 at least as much as Theory T1.

* Scenario 4 as I described it basically is scenario 3. The theory with free parameters isn't a complete theory, and the parameterless theory sorta does talk about other universes which kind of exist, in the sense that a straightforward interpretation of the parameterless theory talks about other universes. So scenario 4 resembles Theory T2 at least as much as it resembles Theory T1.
* You could ask why we can't apply the same argument in the previous bullet point to the die-rolling scenario and conclude that Theory T1 is just as plausible as Theory T2. (If you don't want to ask that, please ignore the rest of this bullet point, as it could spawn an even longer discussion.) We can't because the scenarios differ in essential ways. To explain further I'll have to talk about Solomonoff induction, which makes me uncomfortable. The die-rolling scenario comes with assumptions about a larger universe with a causal structure such that (Theory T1 plus the observation '4') has greater K-complexity than (Theory T2 plus the observation '4'). Bu…
I didn't really follow this, I'm afraid. It seems to follow from what you're saying that the assertions "a world containing an observer exists in scenario 4" and "a world containing an observer doesn't exist in scenario 4" don't make meaningfully different claims about scenario 4, since we can switch from a model that justifies the first to a model that justifies the second without any cost worth considering. If that's right, then I guess it follows from the fact that I should be surprised to observe an environment in scenario 4 that I should not be surprised to observe an environment in scenario 4, and vice versa, and there's not much else I can think of to say on the subject.
By 'explain observed fine-tuning', I mean 'answer the question why does there exist a universe (which we inhabit) which is fine-tuned to be life-friendly.' The anthropic principle, while tautologically true, does not answer this question, in my view. In other words, the existence of life does not cause our universe to be life-friendly (of course it implies that the universe is life friendly); rather, the life-friendliness of our universe is a prerequisite for the existence of life.
We may have different ideas of what sort of answers a "why does this phenomenon occur?" question deserves. You seem to be looking for a real phenomenon that causes fine-tuning, or which operates at a more fundamental level of nature. I would be satisfied with a simple, plausible fact that predicts the phenomenon. In practice, the scientific hypotheses with the greatest parsimony and predictive power tend to be causal ones, or hypotheses that explain observed phenomena as arising from more fundamental laws. But the question of where the fundamental constants of nature come from will be an exception if they are truly fundamental and uncaused.
You're right that observing that we're in a habitable universe doesn't tell us anything. However, there are a lot more observations about the universe that we use in discussions about quantum mechanics. And some observations suit the idea that we know what's going on better than others. "Know what's going on" here means that a theory that is sufficient to explain all of reality in our local neighborhood is also followed more globally.
I glanced at Ikeda & Jefferys, and they seem to explicitly not presuppose MWI: At first glance, they seem to render the fine-tuning phenomenon unsurprising using only an anthropic argument, without appealing to multiverses or a simulator. I am satisfied that someone has written this down.
As a step toward this goal, I would really appreciate someone rewriting the post you mentioned to sound more like science and less like advocacy. I tried to do that, but got lost in the forceful emotional assertions about how collapse is a gross violation of Bayes, and how "The discussion should simply discard those particular arguments and move on."
Here's some evidence for macroscopic decoherence.
The interpretations of quantum mechanics that this sort of experiment tests are not all the same ones that Eliezer argues against. You can have "one world" interpretations that appear exactly identical to many-worlds, and indeed that's pretty typical. Maybe I should have written this in reply to the original post.
Actually, this is evidence for making a classical object behave in a quantum way, which seems like the opposite of decoherence.
I don't understand your point. How would you demonstrate macroscopic decoherence without creating a coherent object which then decoheres?

If the SIAI engineers figure out how to construct friendly super-AI, why would they care about making it respect the values of anyone but themselves? What incentive do they have to program an AI that is friendly to humanity, and not just to themselves? What's stopping LukeProg from appointing himself king of the universe?

Not an answer, but a solution:

You know what they say the modern version of Pascal's Wager is? Sucking up to as many Transhumanists as possible, just in case one of them turns into God. -- Julie from Crystal Nights by Greg Egan


What's stopping LukeProg from appointing himself king of the universe?

Personal abhorrence at the thought, and lack of AI programming abilities. :)

(But, your question deserves a more serious answer than this.)

Too late - Eliezer and Will Newsome are already dual kings of the universe. They balance each other's reigns in a Yin/Yang kind of way.

This is basically what I was asking before. Now, it seems to me highly unlikely that SIAI is playing that game, but I still want a better answer than "Trust us to not be supervillains".
Serious or not, it seems correct. There might be some advanced game theory that says otherwise, but it only applies to those who know the game theory.
Lots of incorrect answers in other replies to this one. The real answer is that, from Luke's perspective, creating Luke-friendly AI and becoming king of the universe isn't much better than creating regular friendly AI and getting the same share of the universe as any other human. Because it turns out, after the first thousand galaxies worth of resources and trillion trillion millennia of lifespan, you hit such diminishing returns that having another seven-billion times as many resources isn't a big deal. This isn't true for every value - he might assign value to certain things not existing, like powerful people besides him, which other people want to exist. And that last factor of seven billion is worth something. But these are tiny differences in value, utterly dwarfed by the reduced AI-creation success-rate that would happen if the programmers got into a flamewar over who should be king.
The rest of your question has the same answer as "why is anyone an altruist to begin with", I think.
I understand CEV. What I don't understand is why the programmers would ask the AI for humanity's CEV, rather than just their own CEV.

I understand CEV. What I don't understand is why the programmers would ask the AI for humanity's CEV, rather than just their own CEV.

The only (sane) reason is for signalling - it's hard to create a you-Friendly AI without someone else stopping you. Given a choice, however, your own CEV is strictly superior: if you actually do want humanity-Friendly AI, then your own CEV will be equivalent to it. But if you just think you want humanity-Friendly AI, and it turns out that, for example, humanity's CEV gets dominated by jerks in a way you didn't expect, then your own CEV will end up better than humanity's... even from a purely altruistic perspective.

Yeah, I've wondered this for a while without getting any closer to an understanding. It seems that everything that some human "really wants" (and therefore could potentially be included in the CEV target definition) is either something that, if I were sufficiently well-informed about it, I would want for that human (in which case my CEV, properly unpacked by a superintelligence, includes it for them), or something that, no matter how well informed I was, I would not want for that human (in which case it's not at all clear that I ought to endorse implementing it). If CEV-humanity makes any sense at all (which I'm not sure it does), it seems that CEV-arbitrary-subset-of-humanity leads to results that are just as good by the standards of anyone whose standards are worth respecting. My working answer is therefore that it's valuable to signal the willingness to do so (so nobody feels left out), and one effective way to signal that willingness consistently and compellingly is to precommit to actually doing it.
Is this question any different from the question of why there are altruists?
Sure. For example, if I want other people's volition to be implemented, that is sufficient to justify altruism. (Not necessary, but sufficient.) But that doesn't justify directing an AI to look at other people's volition to determine its target directly... as has been said elsewhere, I can simply direct an AI to look at my volition, and the extrapolation process will naturally (if CEV works at all) take other people's volition into account.
I think it would be significantly easier to make FAI than LukeFriendly AI: for the latter, you need to do most of the work involved in the former, but also work out how to get the AI to find you (and not accidentally be friendly to someone else). If it turns out that there's a lot of coherence in human values, FAI will resemble LukeFriendlyAI quite closely anyway.

I think it would be significantly easier to make FAI than LukeFriendly AI

Massively backwards! Creating an FAI (presumably 'friendly to humanity') requires an AI that can somehow harvest and aggregate preferences over humans in general, but a LukeFriendly AI just needs to scan one brain.

Scanning is unlikely to be the bottleneck for a GAI, and it seems most of the difficulty with CEV is from the Extrapolation part, not the Coherence.
It doesn't matter how easy the parts may be, scanning, extrapolating and cohering all of humanity is harder than scanning and extrapolating Luke.
Not if Luke's values contain pointers to all those other humans.
If FAI is HumanityFriendly rather than LukeFriendly, you have to work out how to get the AI to find humanity and not accidentally optimize for the extrapolated volition of some other group. It seems easier to me to establish parameters for "finding" Luke than for "finding" humanity.
Yes, it depends on whether you think Luke is more different from humanity than humanity is from StuffWeCareNotOf.
Of course an arbitrarily chosen human's values are more similar to the aggregated values of humanity as a whole than humanity's values are to an arbitrarily chosen point in value-space. Value-space is big. I don't see how my point depends on that, though. Your argument here claims that "FAI" is easier than "LukeFriendlyAI" because LFAI requires an additional step of defining the target, and FAI doesn't require that step. I'm pointing out that FAI does require that step. In fact, target definition for "humanity" is a more difficult problem than target definition for "Luke".
I find it much more likely that it's the other way around; making one for a single brain that already has a utility function seems much easier than finding a good compromise between billions. Especially if the form "upload me, then perform this specific type of enhancement to enable me to safely continue self-improving" turns out to be safe enough.
Game theory. If different groups compete in building a "friendly" AI that respects only their personal coherent extrapolated volition (extrapolated sensible desires), then cooperation is no longer an option, because the other teams have become "the enemy". I have a value system that is substantially different from Eliezer's. I don't want a friendly AI that is created in some researcher's personal image (except, of course, if it's created based on my ideals). This means that we have to sabotage each other's work to prevent the other researchers from getting to friendly AI first, because the moment somebody reaches "friendly" AI the game is over and all parties except one lose. And if we get uFAI everybody loses. That's a real problem: if different factions in friendly AI research have to destructively compete with each other, then the probability of unfriendly AI will increase. That's real bad. From a game theory perspective all FAI researchers agree that any version of FAI is preferable to uFAI, and yet they're working towards a future where uFAI is becoming more and more likely! Luckily, if the FAI researchers take the coherent extrapolated volition of all of humanity, the problem disappears. All FAI researchers can work toward a common goal that will fairly represent all of humanity, not some specific researcher's version of "FAI". It also removes the problem of different morals/values. Some people believe that we should look at total utility; other people believe we should consider only average utility. Some people believe abstract values matter; some people believe consequences of actions matter most. Here too the solution of an AI that looks at a representative set of all human values is the solution that all people can agree on as most "fair". Cooperation beats defection. If Luke were to attempt to create a LukeFriendlyAI, he knows he's defecting from the game-theoretically optimal strategy and thereby increasing the probability of a world with uFAI.
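The cooperate/defect structure described above can be made concrete with a toy expected-utility model. Every number here is an illustrative assumption of mine, not a figure from the thread:

```python
# Toy payoff model for two FAI-research factions choosing to cooperate on a
# shared CEV target or defect and race for a private "friendly" AI.

P_UFAI = {                      # chance the race ends in unfriendly AI...
    ("coop", "coop"): 0.2,      # ...when both factions pool their work
    ("coop", "defect"): 0.5,    # ...when one defects and sabotage begins
    ("defect", "coop"): 0.5,
    ("defect", "defect"): 0.7,  # ...when both race for a private AI
}

U_UFAI = 0.0     # everybody loses
U_SHARED = 0.95  # CEV of all humanity: nearly everything each faction wants
U_WIN = 1.0      # your own faction's AI wins the race
U_LOSE = 0.8     # a rival's AI: still human values, but worse for you

def expected_utility(me, other):
    p_ufai = P_UFAI[(me, other)]
    if (me, other) == ("coop", "coop"):
        u_success = U_SHARED
    else:
        u_success = 0.5 * U_WIN + 0.5 * U_LOSE  # 50/50 chance you finish first
    return p_ufai * U_UFAI + (1 - p_ufai) * u_success

for me in ("coop", "defect"):
    for other in ("coop", "defect"):
        print(me, other, round(expected_utility(me, other), 3))
# coop coop 0.76 / coop defect 0.45 / defect coop 0.45 / defect defect 0.27
```

Under these made-up payoffs, cooperating on a shared CEV target is the better move regardless of what the other faction does, which is the sense in which "cooperation beats defection" here.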
Game theory only helps us if it's impossible to deceive others. If one is able to engage in deception, the dominant strategy becomes to pretend to support CEV FAI while actually working on your own personal God in a jar. AI development in particular seems an especially susceptible domain for deception. The creation of a working AI is a one-time event; it's not like most stable games in nature, which allow one to detect defections over hundreds of iterations. The creation of a working AI (FAI or uFAI) is so complicated that it's impossible for others to check if any given researcher is defecting or not. Our best hope then is for the AI project to be so big it cannot be controlled by a single entity, and definitely not by a single person. If it only takes a guy in a basement getting lucky to make an AI go FOOM, we're doomed. If it takes ten thousand researchers collaborating in the biggest group coding project ever, we're probably safe. This is why doing work on CEV is so important: so we can have that piece of the puzzle already built when the rest of AI research catches up and is ready to go FOOM.
This doesn't apply to all of humanity, just to AI researchers good enough to pose a threat.
As I understand the terminology, an AI that only respects some humans' preferences is uFAI by definition. Thus, the "friendly" AI described above is actually unFriendly, as Eliezer uses the term, and the researcher you describe is already an "uFAI researcher".
----------------------------------------
What do you mean by "representative set of all human values"? Is there any reason to think that the resulting moral theory would be acceptable to implement on everyone?
Absolutely. I used "friendly" AI (with scare quotes) to denote it's not really FAI, but I don't know if there's a better term for it. It's not the same as uFAI because Eliezer's personal utopia is not likely to be valueless by my standards, whereas a generic uFAI is terrible from any human point of view (paperclip universe, etc).
I guess it just doesn't bother me that uFAI includes both indifferent AI and malicious AI. I honestly think that indifferent AI is much more likely than malicious (Clippy is malicious, but awfully unlikely), but that's not good for humanity's future either.
Right now, and for the foreseeable future, SIAI doesn't have the funds to actually create FAI. All they're doing is creating a theory for friendliness, which can be used when someone else has the technology to create AI. And of course, nobody else is going to use the code if it focuses on SIAI.
Funds are not a relevant issue for this particular achievement at present time. It's not yet possible to create a FAI even given all the money in the world; a pharaoh can't build a modern computer. (Funds can help with moving the time when (and if) that becomes possible closer, improving the chances that it happens this side of an existential catastrophe.)
Yeah, I was assuming that they were able to create FAI for the sake of responding to the grandparent post. If they weren't, then there wouldn't be any trouble with SIAI making AI only friendly to themselves to begin with.
If they have all the theory and coded it and whatnot, where is the cost coming from?
The theory for friendliness is completely separate from the theory of AI. So, assuming they complete one does not mean that they complete the other. Furthermore, for something as big as AI/FAI, the computing power required is likely to be huge, which makes it unlikely that a small company like SIAI will be able to create it. Though, I suppose it might be possible if they were able to get large enough loans, I don't have the technical knowledge to say how much computing power is needed or how much that would cost.
??? Maybe I'm being stupid, but I suspect it's fairly hard to fully and utterly solve the friendliness problem without, by the end of doing so, AT LEAST solving many of the tricky AI problems in general.
Now that I understand your question better, here's my answer: Let's say the engineers decide to make the AI respect only their values. But if they were the sort of people who were likely to do that, no one would donate money to them. They could offer to make the AI respect the values of themselves and their donors, but that would alienate everyone else and make the lives of themselves and their donors difficult. The species boundary between humans and other living beings is a natural place to stop expanding the circle of enfranchised agents.
This seems to depend on the implicit assumption that their donors (and everyone else powerful enough to make their lives difficult) don't mind having the values of third parties respected. If some do mind, then there's probably some optimally pragmatic balancing point short of all humans.
Probably, but defining that balancing point would mean a lot of bureaucratic overhead to determine who to exclude or include.
Can you expand on what you mean by "bureaucratic" here?
Are people going to vote on whether someone should be included? Is there an appeals process? Are all decisions final?
OK, thanks. It seems to me all these questions arise for "include everyone" as well. Somewhere along the line someone is going to suggest "don't include fundamentalist Christians", for example, and if I'm committed to the kind of democratic decision process you imply, then we now need to have a vote, or at least decide whether we have a vote, etc., etc., all of that bureaucratic overhead.

Of course, that might not be necessary; I could just unilaterally override that suggestion, mandate "No, we include everyone!", and if I have enough clout to make that stick, then it sticks, with no bureaucratic overhead. Yay! This seems to more or less be what you have in mind. It's just that the same goes for "Include everyone except fundamentalist Christians."

In any case, I don't see how any of this cumbersome democratic machinery makes any sense in this scenario. Actually working out CEV implies the existence of something, call it X, that is capable of extrapolating a coherent volition from the state of a group of minds. What's the point of voting, appeals, etc. when that technology is available? X itself is a better solution to the same problem. Which implies that it's possible to identify a smaller group of minds as the Advisory Board and say to X "Work out the Advisory Board's CEV with respect to whose minds should be included as input to a general-purpose optimizer's target definition, then work out the CEV of those minds with respect to the desired state of the world."

Then anyone with enough political clout to get in my way, I add to the Advisory Board, thereby ensuring that their values get taken into consideration (including their values regarding whose values get included). That includes folks who think everyone should get an equal say, folks who think that every human should get an equal say, folks who think that everyone with more than a certain threshold level of intelligence and moral capacity get a say, folks who think that everyone who agrees with them get a
There is no clear bright line determining who is or is not a fundamentalist Christian. Right now, there pretty much is a clear bright line determining who is or is not human. And that clear bright line encompasses everyone we would possibly want to cooperate with. Your advisory board suggestion ignores the fact that we have to be able to cooperate prior to the invention of CEV deducers. And you're not describing a process for how the advisory board is decided either. Different advisory boards may produce different groups of enfranchised minds. So your suggestion doesn't resolve the problem. In fact, I don't see how putting a group of minds on the advisory board is any different than just making them the input to the CEV. If a person's CEV is that someone's mind should contribute to the optimizer's target, that will be their CEV regardless of whether it's measured in an advisory board context or not.
There is no clear bright line determining what is or isn't a clear bright line. I agree that the line separating "human" from "non-human" is much clearer and brighter than that separating "fundamentalist Christian" from "non-fundamentalist Christian", and I further agree that for minds like mine the difference between those two lines is very important. Something with a mind like mine can work with the first distinction much more easily than with the second.

So what? A mind like mine doesn't stand a chance of extrapolating a coherent volition from the contents of a group of target minds. Whatever X is, it isn't a mind like mine. If we don't have such an X available, then it doesn't matter what defining characteristic we use to determine the target group for CEV extrapolation, because we can't extrapolate CEV from them anyway. If we do have such an X available, then it doesn't matter what lines are clear and bright enough for minds like mine to reliably work with; what matters is what lines are clear and bright enough for systems like X to reliably work with.

I have confidence < .1 that either one of us can articulate a specification determining who is human that doesn't either include or exclude some system that someone included in that specification would contest the inclusion/exclusion of. I also have confidence < .1 that, using any definition of "human" you care to specify, the universe contains no nonhuman systems I would possibly want to cooperate with.

Sure, but so does your "include all humans" suggestion. We're both assuming that there's some way the AI-development team can convincingly commit to a policy P such that other people's decisions to cooperate will plausibly be based on the belief that P will actually be implemented when the time comes; we are neither of us specifying how that is actually supposed to work. Merely saying "I'll include all of humanity" isn't good enough to ensure cooperation if nobody believes me.

I have confidence that, giv
Complicated or ambiguous schemes take more time to explain, get more attention, and risk folks spending time trying to gerrymander their way in instead of contributing to FAI. I think any solution other than "enfranchise humanity" is a potential PR disaster. Keep in mind that not everyone is that smart, and there are some folks who would make a fuss about disenfranchisement of others even if they themselves were enfranchised (and therefore, by definition, those they were making a fuss about would be enfranchised if they thought it was a good idea).

I agree there are potential ambiguity problems with drawing the line at humans, but I think the potential problems are bigger with other schemes. I agree there are potential problems with credibility, but that seems like a separate argument.

It's not all or nothing. The more inclusive the enfranchisement, the more cooperation there will be in general.

With that scheme, you're incentivizing folks to prove they have enough political clout to get in your way. Moreover, humans aren't perfect reasoning systems. Your way of determining enfranchisement sounds a lot more adversarial than mine, which would affect the tone of the effort in a big and undesirable way.

Why do you think that the right to vote in democratic countries is as clearly determined as it is? Restricting voting rights to those of a certain IQ or higher would be a politically unfeasible PR nightmare.

Again, this is a different argument about why people cooperate instead of defect. To a large degree, evolution hardwired us to cooperate, especially when others are trying to cooperate with us. I agree that if the FAI project seems to be staffed with a lot of untrustworthy, selfish backstabbers, we should cast a suspicious eye on it regardless of what they say about their project. Ultimately it probably doesn't matter much what their broadcasted intention towards the enfranchisement of those outside their group is, since things will largely come down to w
That's not clear to me. Suppose the Blues and the Greens are political opponents. If I credibly commit to pointing my CEV-extractor at all the Blues, I gain the support of most Blues and the opposition of most Greens. If I say "at all Blues and Greens" instead, I gain the support of some of the Greens, but I lose the support of some of the Blues, who won't want any part of a utopia patterned even partially on hateful Green ideologies. This is almost undoubtedly foolish of the Blues, but I nevertheless expect it. As you say, people aren't all that smart.

The question is, is the support I gain from the Greens by including them worth the support I lose from the Blues by including the Greens? Of course it depends. That said, the strong support of a sufficiently powerful small group is often more valuable than the weak support of a more powerful larger group, so I'm not nearly as convinced as you sound that saying "we'll incorporate the values of both you and your hated enemies!" will get more net support than picking a side and saying "we'll incorporate your values and not those of your hated enemies."

Sure, that's true. Heck, they don't have to prove it; if they give me enough evidence to consider it plausible, I'll include 'em. So what?

I think you underestimate how threatening egalitarianism sounds to a lot of people, many of whom have a lot of power. Cf including those hateful Greens, above. That said, I suspect there's probably ways to spin your "include everyone" idea in such a way that even the egalitarianism-haters will not oppose it too strongly. But I also suspect there's ways to spin my "don't include everyone" idea in such a way that even the egalitarianism-lovers will not oppose it too strongly.

Because many people believe it represents power. That's also why it's not significantly more clearly determined. It's also why that right is not universal.

Sure, I agree. Nor would I recommend announcing that we're restricting the advisory board to people of
What do you see as the factors holding back people from cooperating with modern analogues of FAI projects? Do you think those modern analogues could derive improved cooperation through broadcasting a specific enfranchisement policy?

As a practical matter, it looks to me like the majority of wealthy, intelligent, rational modern folks an FAI project might want to cooperate with lean towards egalitarianism and humanism, not Blues-versus-Greens-type sectarianism.

If you don't think someone has enough political clout to bother with, they'll be incentivized to prove you wrong. Even if you're right most of the time, you'll be giving yourself trouble.

I agree that very young humans are a potential difficult gray area. One possible solution is to simulate their growth into adults before computing their CEV. Presumably the age up to which their growth should be simulated is not as controversial as who should be included.

FAI team trustworthiness is a different subject than optimal enfranchisement structure.
I'm not sure what those modern analogues are, but in general here are a few factors I see preventing people from cooperating on projects where both mutual cooperation and unilateral cooperation would be beneficial:

* Simple error in calculating the expected value of cooperating.
* Perceiving more value in obtaining higher status within my group by defending my group's wrong beliefs about the project's value than in defecting from my group by cooperating in the project.
* Perceiving more value in continuing to defend my previously articulated position against the project (e.g., in being seen as consistent or as capable of discharging earlier commitments) than in changing my position and cooperating in the project.

Why do you ask?

I suspect that would be an easier question to answer with anything other than "it depends" if I had a specific example to consider. In general, I expect that it depends on who is motivated to support the project now to what degree, and the specific enfranchisement policy under discussion, and what value they perceive in that policy.

Sure, that's probably true, at least for some values of "lean towards" (there's a lot to be said here about actual support and signaled support but I'm not sure it matters). And it will likely remain true for as long as the FAI project in question only cares about the cooperation of wealthy, intelligent, rational modern folks, which they are well advised to continue doing for as long as FAI isn't a subject of particular interest to anyone else, and to stop doing as soon as possible thereafter.

(shrug) Sure, there's some nonzero expected cost to the brief window between when they start proving their influence and I concede and include them.

Can you clarify what the relevant difference is between including a too-young person in the target for a CEV-extractor, vs. pointing a growth-simulator at the too-young-person and including the resulting simulated person in the target for a CEV-extractor?

I agree with
It was mainly rhetorical; I tend to think that what holds back today's FAI efforts is lack of rationality and inability of folks to take highly abstract arguments seriously. Potentially bad things that could happen from implementing the CEV of a two-year-old.
I conclude that I do not understand what you think the CEV-extractor is doing.
Humans acquire morality as part of their development. Three-year-olds have a different, more selfish morality than older folks. There's no reason in principle why a three-year-old who was "more the person he wished he was" would necessarily be a moral adult... CEV does not mean considering the preferences of an agent who is "more moral". There is no such thing. Morality is not a scalar quantity. I certainly hope the implementation would end up favoring the sort of morals I like enough to calculate the CEV of a three-year-old and get an output similar to that of an adult, but it seems like a bad idea to count on the implementation being that robust.
Consider the following three target-definitions for a superhuman optimizer:

a) one patterned on the current preferences of a typical three-year-old
b) one patterned on the current preferences of a typical thirty-year-old
c) one that is actually safe to implement (aka "Friendly")

I understand you to be saying that the gulf between A and C is enormous, and I quite agree. I have not the foggiest beginnings of a clue how one might go about building a system that reliably gets from A to C and am not at all convinced it's possible. I would say that the gulf between B and C is similarly enormous, and I'm equally ignorant of how to build a system that spans it. But this whole discussion (and all discussions of CEV-based FAI) presumes that this gulf is spannable in practice. If we can span the B-C gulf, I take that as strong evidence indicating that we can span the A-C gulf.

Put differently: to talk seriously about implementing an FAI based on the CEV of thirty-year-olds, but at the same time dismiss the idea of doing so based on the CEV of three-year-olds, seems roughly analogous to seriously setting out to build a device that lets me teleport from Boston to Denver without occupying the intervening space, but dismissing the idea of building one that goes from Boston to San Francisco as a laughable fantasy because, as everyone knows, San Francisco is further away than Denver.

That's why I said I don't understand what you think the extractor is doing. I can see where, if I had a specific theory of how a teleporter operates, I might confidently say that it can span 2k miles but not 3k miles, arbitrary as that sounds in the absence of such a theory. Similarly, if I had a specific theory of how a CEV-extractor operates, I might confidently say it can work safely on a 30-year-old mind but not a 3-year-old. It's only in the absence of such a theory that such a claim is arbitrary.
It seems likely to me that the CEV of the 30-year-old would be friendly and the CEV of the three-year-old would not be, but as you say at this point it's hard to say much for sure.
(nods) That follows from what you've said earlier. I suspect we have very different understandings of how similar the 30-year-old's desires are to their volition. Perhaps one way of getting at that difference is thus: how likely do you consider it that the CEV of a 30-year-old would be something that, if expressed in a form that 30-year-old can understand (say, for example, the opportunity to visit a simulated world for a year that is constrained by that CEV), would be relatively unsurprising to that 30-year-old... something that would elicit "Oh, cool, yeah, this is more or less what I had in mind" rather than "Holy Fucking Mother of God what kind of an insane world IS this?!?"? For my own part, I consider the latter orders of magnitude more likely.
I'm pretty uncertain.
Is there? What about unborn babies? What about IVF fetuses? People in comas? Cryo-presevered bodies? Sufficiently-detailed brain scans?
Short answer is that they're nice people, and they understand that power corrupts, so they can't even rationalize wanting to be king of the universe for altruistic reasons. Also, a post-Singularity future will probably (hopefully) be absolutely fantastic for everyone, so it doesn't matter whether you selfishly get the AI to prefer you or not.
I for one welcome our new singularitarian overlords!

Before I ask these questions, I'd like to say that my computer knowledge is limited to "if it's not working, turn it off and turn it on again" and the math I intuitively grasp is at roughly a middle-school level, except for statistics, which I'm pretty talented at. So, uh... don't assume I know anything, okay? :)

How do we know that an artificial intelligence is even possible? I understand that, in theory, assuming that consciousness is completely naturalistic (which seems reasonable), it should be possible to make a computer do the things neurons do to be conscious and thus be conscious. But neurons work differently than computers do: how do we know that it won't take an unfeasibly high amount of computer-form computing power to do what brain-form computing power does?

I've seen some mentions of an AI "bootstrapping" itself up to super-intelligence. What does that mean, exactly? Something about altering its own source code, right? How does it know what bits to change to make itself more intelligent? (I get the feeling this is a tremendously stupid question, along the lines of "if people evolved from apes then why are there still apes?")

Finally, why is SIAI the best place for artificial intelligence? What exactly is it doing differently than other places trying to develop AI? Certainly the emphasis on Friendliness is important, but is that the only unique thing they're doing?

Consciousness isn't the point. A machine need not be conscious, or "alive", or "sentient," or have "real understanding" to destroy the world. The point is efficient cross-domain optimization. It seems bizarre to think that meat is the only substrate capable of efficient cross-domain optimization. Computers already surpass our abilities in many narrow domains; why not technology design or general reasoning, too?

Neurons work differently than computers only at certain levels of organization, which is true for every two systems you might compare. You can write a computer program that functionally reproduces what happens when neurons fire, as long as you include enough of the details of what neurons do when they fire. But I doubt that replicating neural computation is the easiest way to build a machine with a human-level capacity for efficient cross-domain optimization.
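To make "functionally reproduces what happens when neurons fire" concrete, here is a toy sketch of my own (a leaky integrate-and-fire model with made-up constants; real neuron models such as Hodgkin-Huxley track far more biophysical detail): membrane potential accumulates input current, leaks over time, and the unit "fires" when it crosses a threshold.

```python
# Toy leaky integrate-and-fire neuron (illustrative only, not from the thread).
def simulate(inputs, threshold=1.0, leak=0.9):
    v, spike_times = 0.0, []
    for t, current in enumerate(inputs):
        v = v * leak + current      # leak a little, then integrate the input
        if v >= threshold:          # past threshold: fire and reset
            spike_times.append(t)
            v = 0.0
    return spike_times

# A steady input current produces a regular spike train:
print(simulate([0.3] * 20))  # -> [3, 7, 11, 15, 19]
```

The point is only that "what neurons do when they fire" is an ordinary computable process at this level of description; how much detail you need to include is a separate empirical question.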

How does it know what bits to change to make itself more intelligent?

There is an entire field called "metaheuristics" devoted to this, but nothing like improving general abilities at efficient cross-domain optimization. I won't say more about this at the moment because I'm writi... (read more)

Thank you for the link to the Chalmers article: it was quite interesting and I think I now have a much firmer grasp on why exactly there would be an intelligence explosion.
(I see what you mean, but technically speaking your second sentence is somewhat contentious and I don't think it's necessary for your point to go through. Sorry for nitpicking.)
(Slepnev's "narrow AI argument" seems to be related. A "narrow AI" that can win world-optimization would arguably lack person-like properties, at least on the stage where it's still a "narrow AI".)
This is wrong in a boring way; you're supposed to be wrong in interesting ways. :-)
What prevents you from making a meat-based AI?
Obligatory link.
A couple of things come to mind, but I've only been studying the surrounding material for around eight months, so I can't guarantee a wholly accurate overview of this. Also, even if accurate, I can't guarantee that you'll take to my explanation.

Anyway, the first thing is that brain-form computing probably isn't a necessary or likely approach to artificial general intelligence (AGI) unless the first AGI is an upload. There doesn't seem to be good reason to build an AGI in a manner similar to a human brain, and in fact, doing so seems like a terrible idea. The issues with opacity of the code would be nightmarish (I can't just look at a massive network of trained neural networks and point to the problem when the code doesn't do what I thought it would).

The second is that consciousness is not necessarily even related to the issue of AGI; the AGI certainly doesn't need any code that tries to mimic human thought. As far as I can tell, all it really needs (and really this might be putting more constraints than are necessary) is code that allows it to adapt to general environments (transferability) that have nice computable approximations it can build by using the data it gets through its sensory modalities (these can be anything from something familiar, like a pair of cameras, or something less so, like a geiger counter or some kind of direct feed from thousands of sources at once). Also, a utility function that encodes certain input patterns with certain utilities, and some [black box] statistical hierarchical feature extraction [/black box] so it can sort out useful/important features in its environment that it can exploit. Researchers in the areas of machine learning and reinforcement learning are working on all of this sort of stuff; it's fairly mainstream.

As far as computing power goes - the computing power of the human brain is definitely measurable, so we can do a pretty straightforward analysis of how much more is possible. As far as raw computing power, I think we're
I am not entirely sure I understood what was meant by those two paragraphs. Is a rough approximation of what you're saying "an AI doesn't need to be conscious, an AI needs code that will allow it to adapt to new environments and understand data coming in from its sensory modules, along with a utility function that will tell it what to do"?
Yeah, I'd say that's a fair approximation. The AI needs a way to compress lots of input data into a hierarchy of functional categories. It needs a way to recognize a cluster of information as, say, a hammer. It also needs to recognize similarities between a hammer and a stick or a crow bar or even a chair leg, in order to queue up various policies for using that hammer (if you've read Hofstadter, think of analogies) - very roughly, the utility function guides what it "wants" done, the statistical inference guides how it does it (how it figures out what actions will accomplish its goals). That seems to be more or less what we need for a machine to do quite a bit. If you're just looking to build any AGI, he hard part of those two seems to be getting a nice, working method for extracting statistical features from its environment in real time. The (significantly) harder of the two for a Friendly AI is getting the utility function right.
Interestingly, hypothetical UFAI (value drift) risk is something like other existential risks in its counterintuitive impact, but more so, in that (compared to some other risks) there are many steps where you can fail that don't appear dangerous beforehand (because nothing like that ever happened), but that might also fail to appear dangerous after the fact, and likewise might not look dangerous as properties of imagined scenarios where they're allowed to happen. The grave implications aren't easy to spot. Assuming soft takeoff, a prototype AGI escapes to the Internet - would that be seen as a big deal if it didn't get enough computational power to become too disruptive? In 10 years it has grown up to become a major player, and in 50 years it controls the whole future... Even without assuming intelligence explosion or other extraordinary effects, the danger of any misstep is absolute, and yet arguments against these assumptions are taken as arguments against the risk.
As far as we know, it easily could require an insanely high amount of computing power. The thing is, there are things out there that have as much computing power as human brains—namely, human brains themselves. So if we ever become capable of building computers out of the same sort of stuff that human brains are built out of (namely, really tiny machines that use chemicals and stuff), we'll certainly be able to create computers with the same amount of raw power as the human brain.

How hard will it be to create intelligent software to run on these machines? Well, creating intelligent beings is hard enough that humans haven't managed to do it in a few decades of trying, but easy enough that evolution has done it in three billion years. I don't think we know much else about how hard it is.

Well, "bootstrapping" is the idea of AI "pulling itself up by its own bootstraps", or, in this case, "making itself more intelligent using its own intelligence". The idea is that every time the AI makes itself more intelligent, it will be able to use its newfound intelligence to find even more ways to make itself more intelligent. Is it possible that the AI will eventually "hit a wall", and stop finding ways to improve itself? In a word, yes.

There's no easy way to know which bits to change. If it knows the purpose of each of its parts, then it might be able to look at a part, and come up with a new part that does the same thing better. Maybe it could look at the reasoning that went into designing itself, and think to itself something like, "What they thought here was adequate, but the system would work better if they had known this fact." Then it could change the design, and so change itself.
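The "use the gains to find more gains" loop, plus the possibility of hitting a wall, can be caricatured in a few lines. This is entirely my own toy (arbitrary numbers, not a claim about how real self-improvement would work): a searcher whose search budget depends on an "ability" level that the search itself improves, with a hard cap standing in for the wall.

```python
import random

random.seed(0)  # make the toy run deterministic

def improve(ability, budget):
    """Spend `budget` tries searching for a better ability level."""
    best = ability
    for _ in range(budget):
        # diminishing returns built into the "design space": hard cap at 10
        candidate = min(ability + random.gauss(0, 0.5), 10.0)
        best = max(best, candidate)
    return best

ability = 1.0
history = [ability]
for _ in range(20):
    # more ability -> a bigger search budget for the next round
    ability = improve(ability, budget=int(ability * 5))
    history.append(ability)

print(round(history[1], 2), "->", round(history[-1], 2))  # early gains, then a plateau
```

Each round's output feeds back in as the next round's capability, which is the whole idea of "bootstrapping"; the cap makes the gains level off, which is the "hit a wall" scenario.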
The highlighted portion of your sentence is not obvious. What exactly do you mean by work differently?

There's a thought experiment (that you've probably heard before) about replacing your neurons, one by one, with circuits that behave identically to each replaced neuron. The point of the hypo is to ask when, if ever, you draw the line and say that it isn't you anymore. Justifying any particular answer is hard (since it is axiomatically true that the circuit reacts the way that the neuron would).

I'm not sure that circuit-neuron replacement is possible, but I certainly couldn't begin to justify (in physics terms) why I think that. That is, the counter-argument to my position is that neurons are physical things and thus should obey the laws of physics. If the neuron was built once (and it was, since it exists in your brain), what law of physics says that it is impossible to build a duplicate? I'm no physicist, so I don't know that it is feasible (or understand the science well enough to have an intelligent answer). That said, it is clearly feasible with biological parts (again, neurons actually exist).

By hypothesis, the AI is running a deterministic process to make decisions. Let's say that the module responsible for deciding Newcomb problems is originally coded to two-box. Further, some other part of the AI decides that this isn't the best choice for achieving AI goals. So, the Newcomb module is changed so that it decides to one-box. Presumably, doing this type of improvement repeatedly will make the AI better and better at achieving its goals. Especially if the self-improvement checker can itself be improved somehow.

It's not obvious to me that this leads to super intelligence (i.e. Straumli-perversion level intelligence, if you've read [EDIT] A Fire Upon the Deep), even with massively faster thinking. But that's what the community seems to mean by "recursive self-improvement."
(A Fire Upon the Deep) ETA: Oops! Deepness in the Sky is a prequel, didn't know and didn't google. (Also, added to reading queue.)
Thanks, edited.

Given that utility functions are only defined up to positive affine transforms, what do total utilitarians and average utilitarians actually mean when they're talking about the sum or the average of several utility functions? I mean, taking what they say literally, if Alice's utility function were twice what it actually is, she would behave the exact same way but she would be twice as ‘important’; that cannot possibly be what they mean. What am I missing?
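To make the puzzle concrete, here's a quick sketch with toy numbers of my own invention: rescaling Alice's utility function leaves every choice she would make unchanged, yet it can flip which outcome maximizes the "sum" with Bob.

```python
outcomes = ["A", "B", "C"]
alice = {"A": 1.0, "B": 2.0, "C": 0.5}   # one representation of Alice's preferences
bob   = {"A": 2.0, "B": 1.0, "C": 3.0}   # Bob's utility function (toy numbers)

def affine(u, a, b):
    """Positive affine transform u'(x) = a*u(x) + b, a > 0: same preferences."""
    return {x: a * v + b for x, v in u.items()}

alice2 = affine(alice, 2.0, 3.0)  # behaviorally identical to `alice`

# Alice's own choice is unchanged by the transform...
assert max(outcomes, key=alice.get) == max(outcomes, key=alice2.get) == "B"

# ...but "summing with Bob" crowns different winners for the two
# representations of the very same preferences:
print(max(outcomes, key=lambda x: alice[x] + bob[x]))    # -> C
print(max(outcomes, key=lambda x: alice2[x] + bob[x]))   # -> B
```

So a naive interpersonal sum depends on an arbitrary choice of representation, which is exactly the question being asked.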

This is actually an open problem in utilitarianism; there were some posts recently looking to bargaining between agents as a solution, but I can't find them at the moment, and in any case that's not a mainstream LW conclusion.
See here.
They don't know. In most cases, they just sort of wave their hands. You can combine utility functions, but "sum" and "average" do not uniquely identify methods for doing so, and no method identified so far has seemed uniquely compelling.
There isn't a right answer (I think), but some ways of comparing are better than others. Stuart Armstrong is working on some of this stuff, as he mentions here.
I think you figure out common units to denote utilons in through revealed preference. This only works if both utility functions are coherent. Also, last time this came up I linked this to see if anyone knew anything about it, and got downvoted. Shrug.
If two possible futures have different numbers of people, those will be subject to different affine transforms, so the utility function as a whole will have been transformed in a non-affine way. See repugnant conclusion for a concrete example.
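As a concrete instance (welfare numbers are my own toy choices), total and average aggregation rank a small happy world and a huge barely-happy world oppositely, which is the engine of the repugnant conclusion:

```python
small_happy = [10.0] * 100         # 100 people, each with welfare 10
huge_meh    = [0.01] * 1_000_000   # a million people, each barely above zero

def total(pop):
    return sum(pop)

def average(pop):
    return sum(pop) / len(pop)

# Total utilitarianism prefers the huge barely-happy world (~10000 vs 1000);
# average utilitarianism prefers the small happy one (10 vs ~0.01).
print(total(small_happy), total(huge_meh))
print(average(small_happy), average(huge_meh))
```

Since the two rules disagree on which future is better, any affine rescaling argument that works for one population size fails to carry over to the other.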
I think you misunderstood my question. I wasn't asking about what would the difference between summing and averaging be, but how to sum utility functions of different people together in the first place.
Oh, I completely misunderstood that. The right answer is that utilitarians aren't summing utility functions, they're just summing some expression about each person. The term hedonic function is used for these when they just care about pleasure or when they aren't worried about being misinterpreted as just caring about pleasure and the term utility function is used when they don't know what a utility function is or when they are willing to misuse it for convenience.

I would like someone who understands Solomonoff Induction/the univeral prior/algorithmic probability theory to explain how the conclusions drawn in this post affect those drawn in this one. As I understand it, cousin_it's post shows that the probability assigned by the univeral prior is not related to K-complexity; this basically negates the points Eliezer makes in Occam's Razor and in this post. I'm pretty stupid with respect to mathematics, however, so I would like someone to clarify this for me.

Solomonoff's universal prior assigns a probability to every individual Turing machine. Usually the interesting statements or hypotheses about which machine we are dealing with are more like "the 10th output bit is 1" than "the machine has the number 643653". The first statement describes an infinite number of different machines, and its probability is the sum of the probabilities of those Turing machines that produce 1 as their 10th output bit (as the probabilities of mutually exclusive hypotheses can be summed). This probability is not directly related to the K-complexity of the statement "the 10th output bit is 1" in any obvious way.

The second statement, on the other hand, has probability exactly equal to the probability assigned to the Turing machine number 643653, and its K-complexity is essentially (that is, up to an additive constant) equal to the K-complexity of the number 643653.

So the point is that generic statements usually describe a huge number of different specific individual hypotheses, and that the complexity of a statement needed to delineate a set of Turing machines is not (necessarily) directly related to the complexities of the individual Turing machines in the set.
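A drastically simplified sketch of that point (my own toy, not real Solomonoff induction): treat "machines" as binary strings under a length-penalizing prior, give them trivial made-up semantics, and compare the mass of a statement (a whole set of machines) with the mass of one specific machine.

```python
from fractions import Fraction

def prior(p):
    # Toy length prior: total mass 2^-(len+1) per length, split uniformly
    # among the 2^len strings of that length, so each string gets 2^-(2*len+1).
    return Fraction(1, 2 ** (2 * len(p) + 1))

def output_bit(p):
    # Toy "machine semantics": the machine's output bit is the parity of p.
    return sum(int(c) for c in p) % 2

programs = [format(i, "b").zfill(l)
            for l in range(1, 11) for i in range(2 ** l)]

# The *statement* "the output bit is 1" gets the summed mass of every
# machine satisfying it -- unrelated to any one short description...
p_statement = sum(prior(p) for p in programs if output_bit(p) == 1)
# ...while the probability of one specific machine is just its own weight.
p_single = prior("101")

print(p_statement, p_single)  # -> 1023/4096 1/128
```

Even in this cartoon, the statement's probability is a sum over many machines of many lengths, while the single machine's probability tracks its own description length; that's the asymmetry the comment describes.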
I don't think there's very much conflict. The basic idea of cousin-it's post is that the probabilities of generic statements are not described by a simplicity prior. Eliezer's post is about the reasons why the probabilities of every mutually exclusive explanation for your data should look like a simplicity prior (an explanation is a sort of statement, but in order for the arguments to work, you can't assign probabilities to any old explanations - they need to have this specific sort of structure).
Stupid question: Does everyone agree that algorithmic probability is irrelevant to human epistemic practices?
I see it as a big open question.
I don't think it's a clear-cut issue. Algorithmic probability seems to be the justification for several Sequence posts, most notably this one and this one. But, again, I am stupid with respect to algorithmic probability theory and its applications.
Kolmogorov Complexity is defined with respect to a Turing-complete language. I think cousin_it is saying we should be careful about what language our hypothesis is encoded in before checking its complexity. It's easy to make mistakes when trying to explain something in terms of Solomonoff Induction. "A or B or C or D" is invalid to judge the complexity of directly because it is a set of different alternative hypotheses rather than a single one. This one is invalid to judge the complexity of directly because K-complexity is not computable and the algorithm to compute it in this special case is very large. If the term K-complexity were expanded, this statement would be astronomically complex. Here we must be careful to note that (presumably) all alternative low-complexity worlds where the "brother's wife's first son's best friend" does not flip the coin have already been ruled out.

(I super-upvoted this, since asking stupid questions is a major flinch/ugh field)

Ok, my stupid question, asked in a blatantly stupid way, is: where does the decision theory stuff fit in The Plan? I have gotten the notion that it's important for Value-Preserving Self-Modification in a potential AI agent, but I'm confused because it all sounds too much like game theory - there are all these other agents it deals with. If it's not for VPSM, and is in fact some exploration of how AI would deal with potential agents, why is this important at all? Let the AI figure that out; it's going to be smarter than us anyway.

If there is some Architecture document I should read to grok this, please point me there.

My impression is that, with self-modification and time, continuity of identity becomes a sticky issue. If I can become an entirely different person tomorrow, how I structure my life is not the weak game theory of "how do I bargain with another me?" but the strong game theory of "how do I bargain with someone else?"
I think Eliezer's reply (point '(B)') to this comment by Wei Dai provides some explanation, as to what the decision theory is doing here. From the reply (concerning UDT):
Other agents are complicated regularities in the world (or a more general decision problem setting). Finding problems with understanding what's going on when we try to optimize in other agents' presence is a good heuristic for spotting gaps in our understanding of the idea of optimization.
I think the main reason is simple. It's hard to create a transparent/reliable agent without decision theory. Also, since we're talking about a super-power agent, you don't want to mess this up. CDT and EDT are known to mess up, so it would be very helpful to find a "correct" decision theory. Though you may somehow be able to get around it by letting an AI self-improve, it would be nice to have one less thing to worry about, especially because how the AI improves is itself a decision.

What exactly is the difference in meaning of "intelligence", "rationality", and "optimization power" as used on this site?

Optimization power is a processes' capacity for reshaping the world according to its preferences. Intelligence is optimization power divided by the resources used. "Intelligence" is also sometimes used to talk about whatever is being measured by popular tests of "intelligence," like IQ tests. Rationality refers to both epistemic and instrumental rationality: the craft of obtaining true beliefs and of achieving one's goals. Also known as systematized winning.

If I had a moderately powerful AI and figured out that I could double its optimisation power by tripling its resources, my improved AI would actually be less intelligent? What if I repeat this process a number of times; I could end up with an AI that had enough optimisation power to take over the world, and yet its intelligence would be extremely low.

We don't actually have units of 'resources' or optimization power, but I think the idea would be that any non-stupid agent should at least triple its optimization power when you triple its resources, and possibly more. As a general rule, if I have three times as much stuff as I used to have, I can at the very least do what I was already doing but three times simultaneously, and hopefully pool my resources and do something even better.
For "optimization power", we do now have some fairly reasonable tests:

* AIQ
* Generic Compression Benchmark
Machine learning and AI algorithms typically display the opposite of this, i.e. sub-linear scaling. In many cases there are hard mathematical results that show that this cannot be improved to linear, let alone super-linear. This suggest that if a singularity were to occur, we might be faced with an intelligence implosion rather than explosion.
If intelligence=optimization power/resources used, this might well be the case. Nonetheless, this "intelligence implosion" would still involve entities with increasing resources and thus increasing optimization power. A stupid agent with a lot of optimization power (Clippy) is still dangerous.
I agree that it would be dangerous. What I'm arguing is that dividing by resource consumption is an odd way to define intelligence. For example, under this definition is a mouse more intelligent than an ant? Clearly a mouse has much more optimisation power, but it also has a vastly larger brain. So once you divide out the resource difference, maybe ants are more intelligent than mice? It's not at all clear. That this could even be a possibility runs strongly counter to the everyday meaning of intelligence, as well as definitions given by psychologists (as Tim Tyler pointed out above).

Intelligence is optimization power divided by the resources used.

I checked with: A Collection of Definitions of Intelligence.

Out of 71 definitions, only two mentioned resources:

“Intelligence is the ability to use optimally limited resources – including time – to achieve goals.” R. Kurzweil

“Intelligence is the ability for an information processing system to adapt to its environment with insufficient knowledge and resources.” P. Wang

The paper suggests that the nearest thing to a consensus is that intelligence is about problem-solving ability in a wide range of environments.

Yes, Yudkowsky apparently says otherwise - but: so what?

I don't think he really said this. The exact quote is This seems like just a list of different measurements trying to convey the idea of efficiency. When we want something to be efficient, we really just mean that we have other things to use our resources for. The right way to measure this is in terms of the marginal utility of the other uses of resources. Efficiency is therefore important, but trying to calculate efficiency by dividing is oversimplifying.
What about a giant look-up table, then?
That requires lots of computing resources. (I think that's the answer.)
That would surely be very bad at solving problems in a wide range of environments.
For any agent, I can create a GLUT that solves problems just as well (provided the vast computing resources necessary to store it), by just duplicating that agent's actions in all of its possible states.
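That construction can be made concrete with a toy example. The "policy" below is an invented stand-in for an agent, and a real agent's state space would be astronomically larger, which is exactly the problem:

```python
from itertools import product

# A toy GLUT: precompute an agent's response for every possible input
# state. The "policy" here is an invented stand-in; the point is the
# exponential blow-up in table size.
def agent_policy(state_bits):
    return sum(state_bits) % 2  # a tiny computed behavior

n = 16  # even 16 input bits already means 2**16 table rows
glut = {state: agent_policy(state) for state in product([0, 1], repeat=n)}

# The table reproduces the agent's behavior exactly on every state...
assert all(glut[s] == agent_policy(s) for s in glut)
# ...but its size grows as 2**n, while the computed policy stays tiny.
print(len(glut))  # 65536
```

The one-line policy and the 65,536-row table are behaviorally identical; they differ only in resources consumed, which is the aspect of "intelligence" under dispute here.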
Surely its performance would be appalling on most problems - vastly inferior to a genuinely intelligent agent implemented with the same hardware technology - and so it will fail to solve many of the problems with time constraints. The idea of a GLUT seems highly impractical. However, if you really think that it would be a good way to construct an intelligent machine, go right ahead.
I agree. That's the point of the original comment- that "efficient use of resources" is as much a factor in our concept of intelligence as is "cross-domain problem-solving ability". A GLUT could have the latter, but not the former, attribute.
"Cross-domain problem-solving ability" implicitly includes the idea that some types of problem may involve resource constraints. The issue is whether that point needs further explicit emphasis - in an informal definition of intelligence.
Sure, if you had an infinitely big and fast computer. Of course, even then you still wouldn't know what to put in the table. But if we're in infinite theory land, then why not just run AIXI on your infinite computer? Back in reality, the lookup table approach isn't going to get anywhere. For example, if you use a video camera as the input stream and after just one frame of data your table would already need something like 256^1000000 entries. The observable universe only has 10^80 particles.
You misunderstand me. I'm pointing out that a GLUT is an example of something with (potentially) immense optimization power, but whose use of computational resources is ridiculously prodigal, and which we might hesitate to call truly intelligent. This is evidence that our concept of intelligence does in fact include some notion of efficiency, even if people don't think of this aspect without prompting.
Right, but the problem with this counter example is that it isn't actually possible. A counter example that could occur would be much more convincing. Personally, if a GLUT could cure cancer, cure aging, prove mind-blowing mathematical results, write an award-winning romance novel, take over the world, and expand out to take over the universe... I'd be happy considering it to be extremely intelligent.
It's infeasible within our physics, but it's possible for (say) our world to be a simulation within a universe of vaster computing power, and to have a GLUT from that world interact with our simulation. I'd say that such a GLUT was extremely powerful, but (once I found out what it really was) I wouldn't call it intelligent- though I'd expect whatever process produced it (e.g. coded in all of the theorem-proof and problem-solution pairs) to be a different and more intelligent sort of process. That is, a GLUT is the optimizer equivalent of a tortoise with the world on its back- it needs to be supported on something, and it would be highly unlikely to be tortoises all the way down.
A 'featherless biped' definition. That is, it's decent attempt at a simplified proxy but massively breaks down if you search for exceptions.
What Intelligence Tests Miss is a book about the difference between intelligence and rationality. The linked LW article about the book should answer your questions about the difference between the two. A short answer would be that intelligence describes how well you think, but not some important traits and knowledge, like: Do you use your intelligence (are you a reflective person)? Do you have a strong need for closure? Can you override your intuitions? Do you know Bayes' theorem, probability theory, or logic?
"Intelligence" is often defined as being the "g-factor" of humans - which is a pretty sucky definition of "rationality". Go to definitions of "intelligence" used by machine intelligence researchers and it's much closer to "rationality".

If I understand it correctly, the FAI problem is basically about making an AI whose goals match those of humanity. But why does the AI need to have goals at all? Couldn't you just program a question-answering machine and then ask it to solve specific problems?

This idea is called "Oracle AI"; see this post and its dependencies for some reasons why it's probably a bad idea.

That's exactly what I was looking for. Thank you.
In addition to the post Vladimir linked, see also this paper.
Presumably once AGI becomes smarter than humans, it will develop goals of some kind, whether we want it or not. Might as well try to influence them.
A better wording would probably be that you can't design something with literally no goals and still call it an AI. A system that answers questions and solves specific problems has a goal: to answer questions and solve specific problems. To be useful for that task, its whole architecture has to be crafted with that purpose in mind. For instance, suppose it was provided questions in the form of written text. This means that its designers will have to build it in such a way that it interprets text in a certain way and tries to discover what we mean by the question. That's just one thing that it could do to the text, though - it could also just discard any text input, or transform each letter to a number and start searching for mathematical patterns in the numbers, or use the text to seed its random-number generator that it was using for some entirely different purpose, and so forth. In order for the AI to do anything useful, it has to have a large number of goals such as "interpret the meaning of this text file I was provided" implicit in its architecture. As the AI grows more powerful, these various goals may manifest themselves in unexpected ways.

In this interview between Eliezer and Luke, Eliezer says that the "solution" to the exploration-exploitation trade-off is to "figure out how much resources you want to spend on exploring, do a bunch of exploring, use all your remaining resources on exploiting the most valuable thing you’ve discovered, over and over and over again." His point is that humans don't do this, because we have our own, arbitrary value called boredom, while an AI would follow this "pure math."

My potentially stupid question: doesn't this strategy assu... (read more)

You got me curious, so I did some searching. This paper gives fairly tight bounds in the case where the payoffs are adaptive (i.e. can change in response to your previous actions) but bounded. The algorithm is on page 5.
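For concreteness, the explore-then-exploit strategy Eliezer describes can be sketched on a toy two-armed Bernoulli bandit. All the numbers below are invented, and note this is the fixed-payoff setting; the paper above concerns adaptive payoffs, where a strategy that commits after a fixed exploration phase can indeed be exploited:

```python
import random

# Explore-then-exploit on a two-armed Bernoulli bandit: spend a fixed
# budget sampling both arms, then commit to the best observed arm.
def explore_then_exploit(payoffs, budget, explore_steps, rng):
    counts = [0] * len(payoffs)
    wins = [0] * len(payoffs)
    total = 0
    for t in range(budget):
        if t < explore_steps:
            arm = t % len(payoffs)  # sample each arm in turn
        else:
            # exploit: commit to the arm with the best observed win rate
            arm = max(range(len(payoffs)),
                      key=lambda a: wins[a] / counts[a] if counts[a] else 0.0)
        reward = 1 if rng.random() < payoffs[arm] else 0
        counts[arm] += 1
        wins[arm] += reward
        total += reward
    return total

# With fixed payoffs, a modest exploration budget usually identifies the
# better arm, after which the strategy exploits it for the rest of the run.
total_reward = explore_then_exploit([0.3, 0.7], budget=1000,
                                    explore_steps=100, rng=random.Random(0))
print(total_reward)
```

Against a fixed environment this does well; the multiplicative-update rule's advantage shows up when the payoffs can shift under you.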
Thanks for the link. Their algorithm, the “multiplicative update rule,” which goes about "selecting each arm randomly with probabilities that evolve based on their past performance," does not seem to me to be the same strategy as Eliezer describes. So does this contradict his argument?
You should probably be prepared to change how much you plan to spend on exploring based on the initial information received.
This has me confused as well. Assume a large area divided into two regions. Region A has slot machines with average payout 50, while region B has machines with average payout 500. I am blindfolded and randomly dropped into region A or B. The first slot machine I try has payout 70. I update in the direction of being in region A. Doesn't this affect how many resources I wish to spend doing exploration?
Are you also assuming that you know all of those assumed facts about the area? I would certainly expect that how many resources I want to spend on exploration will be affected by how much a priori knowledge I have about the system. Without such knowledge, the amount of exploration-energy I'd have to expend to be confident that there are two regions A and B with average payout as you describe is enormous.
Do you mean to set the parameter specifying the amount of resources (e.g., time steps) to spend exploring (before switching to full-exploiting) based on the info you receive upon your first observation? Also, what do you mean by "probably"?
Sure. For example, if your environment is such that the process of exploitation can alter your environment in such a way that your earlier judgment of "the most valuable thing" is no longer reliable, then an iterative cycle of explore-exploit-explore can potentially get you better results. Of course, you can treat each loop of that cycle as a separate optimization problem and use the abovementioned strategy.
Could I replace "can potentially get you better results" with "will get you better results on average"?
Would you accept "will get you better results, all else being equal" instead? I don't have a very clear sense of what we'd be averaging.
I meant averaging over the possible ways that the environment could change following your exploitation. For example, it's possible that a particular course of exploitation action could shape the environment such that your exploitation strategy actually becomes more valuable upon each iteration. In such a scenario, exploring more after exploiting would be an especially bad decision. So I don't think I can accept "will" without "on average" unless "all else" excludes all of these types of scenarios in which exploring is harmful.
OK, understood. Thanks for clarifying. Hm. I expect that within the set of environments where exploitation can alter the results of what-to-exploit-next calculations, there more possible ways for it to do so such that the right move in the next iteration is further exploration than further exploitation. So, yeah, I'll accept "will get you better results on average."

So in Eliezer's meta-ethics he talks about the abstract computation called "right", whereas in e.g. CEV he talks about stuff like reflective endorsement. So in other words in one place he's talking about goodness as a formal cause and in another he's talking about goodness as a final cause. Does he argue anywhere that these should be expected to be the same thing? I realize that postulating their equivalence is not an unreasonable guess but it's definitely not immediately or logically obvious, non? I suspect that Eliezer's just not making a clear... (read more)

Not explicitly. He does in various places talk about why alternative considerations of abstract 'rightness' - some sort of objective morality or something - are absurd. He does give some details on his reductionist moral realism about the place but I don't recall where. Incidentally I haven't seen Eliezer talk about formal or final causes about anything, ever. (And they don't seem to be especially useful concepts to me.)
Aren't "formal cause" and "final cause" just synonyms for "shape" and "purpose", respectively?
Basically, but Aristotle applied naive philosophical realism to them, and Will might have additional connotations in mind.
Sweet phrase, thanks. Maybe there should be a suite of these? I've noticed naive physical realism and naive philosophical (especially metaphysical) realism.
They're not the same. CEV is an attempt to define a procedure that can infer morality by examining the workings of a big bunch of sometimes confused human brains just like you might try to infer mathematical truths by examining the workings of a big bunch of sometimes buggy calculators. The hope is that CEV finds morality, but it's not the same as morality, any more than math is defined to be the output of a certain really well made calculator.

I keep scratching my head over this comment made by Vladimir Nesov in the discussion following “A Rationalist’s Tale”. I suppose it would be ideal for Vladimir himself to weigh in and clarify his meaning, but because no objections were really raised to the substance of the comment, and because it in fact scored nine upvotes, I wonder if perhaps no one else was confused. If that’s the case, could someone help me comprehend what’s being said?

My understanding is that it’s the LessWrong consensus that gods do not exist, period; but to me the comment seems to ... (read more)


"Magical gods" in the conventional supernatural sense generally don't exist in any universes, insofar as a lot of the properties conventionally ascribed to them are logically impossible or ill-defined, but entities we'd recognize as gods of various sorts do in fact exist in a wide variety of mathematically-describable universes. Whether all mathematically-describable universes have the same ontological status as this one is an open question, to the extent that that question makes sense.

(Some would disagree with referring to any such beings as "gods", e.g. Damien Broderick who said "Gods are ontologically distinct from creatures, or they're not worth the paper they're written on", but this is a semantic argument and I'm not sure how important it is. As long as we're clear that it's probably possible to coherently describe a wide variety of godlike beings but that none of them will have properties like omniscience, omnipotence, etc. in the strongest forms theologians have come up with.)

Thanks, that makes more sense to me. I didn't think qualities like omnipotence and such could actually be realized. Any way you can give me an idea of what these godlike entities look like though? You indicate they aren't actually "magical" per se - so they would have to be subject to whatever laws of physics reign in their world, no? I take it we must talking about superintelligent AIs or alien simulators or something weird like that?
Why, we could come up with abstract universes where the Magical Gods have exactly the powers and understanding of what's going on befitting Magical Gods. I wasn't thinking of normal and mundane things like superintelligent AIs or alien simulators. Take Thor, for example: he doesn't need to obey Maxwell's equations or believe himself to be someone other than a hammer-wielding god of lightning and thunder.
Maybe I'm just confused by your use of the term "magical". I am imagining magic as some kind of inexplicable, contracausal force - so for example, if Thor wanted to magically heal someone he would just will the person's wounds to disappear and, voila, without any physical process acting on the wounds to make them heal up, they just disappear. But surely that's not possible, right?
Your imagining what's hypothetically-anticipated to happen is the kind of lawful process that magical worlds obey by stipulation.
Our world doesn't have gods; but if all possible worlds exist (which is an attractive belief for various reasons), then some of those have gods. However, they're irrelevant to us.

When people talk about designing FAI, they usually say that we need to figure out how to make the FAI's goals remain stable even as the FAI changes itself. But why can't we just make the FAI incapable of changing itself?

Database servers can improve their own performance, to a degree, simply by performing statistical analysis on tables and altering their metadata. Then they just consult this metadata whenever they have to answer a query. But we never hear about a database server clobbering its own purpose (do we?), since they don't actually alter their own ... (read more)

But why can't we just make the FAI incapable of changing itself?

Because it would be weak as piss and incapable of doing most things that we want it to do.

Would upvote twice for this expression if I could :-)

The majority of Friendly AI's ability to do good comes from its ability to modify its own code. Recursive self improvement is key to gaining intelligence and ability swiftly. An AI that is about as powerful as a human is only about as useful as a human.

I disagree. AIs can be copied, which is a huge boost. You just need a single Stephen Hawking AI to come out of the population, then you make 1 million copies of it and dramatically speed up science.
I don't buy any argument saying that an FAI must be able to modify its own code in order to take off. Computer programs that can't modify their own code can be Turing-complete; adding self-modification doesn't add anything to Turing-completeness. That said, I do kind of buy this argument about how if an AI is allowed to write and execute arbitrary code, that's kind of like self-modification. I think there may be important differences.
It makes sense to say that a computer language is Turing-complete. It doesn't make sense to say that a computer program is Turing-complete.
Arguably, a computer program with input is a computer language. In any case, I don't think this matters to my point.
In addition to these other answers, I read a paper, I think by Eliezer, which argued that it was almost impossible to stop an AI from modifying its own source code, because it would figure out that it would gain a massive efficiency boost from doing so. Also, remember that the AI is a computer program. If it is allowed to write other algorithms and execute them, which it has to be to be even vaguely intelligent, then it can simply write a copy of its source code somewhere else, edit it as desired, and run that copy. I seem to recall the argument being something like the "Beware Seemingly Simple Wishes" one. "Don't modify yourself" sounds like a simple instruction for a human, but isn't as obvious when you look at it more carefully. However, remember that a competent AI will keep its utility function or goal system constant under self modification. The classic analogy is that Gandhi doesn't want to kill people, so he also doesn't want to take a pill that makes him want to kill people. I wish I could remember where that paper was where I read about this.
Well, let me describe the sort of architecture I have in mind. The AI has a "knowledge base", which is some sort of database containing everything it knows. The knowledge base includes a set of heuristics. The AI also has a "thought heap", which is a set of all the things it plans to think about, ordered by how promising the thoughts seem to be. Each thought is just a heuristic, maybe with some parameters. The AI works by taking a thought from the heap and doing whatever it says, repeatedly. Heuristics would be restricted, though. They would be things like "try to figure out whether or not this number is irrational", or "think about examples". You couldn't say, "make two more copies of this heuristic", or "change your supergoal to something random". You could say "simulate what would happen if you changed your supergoal to something random", but heuristics like this wouldn't necessarily be harmful, because the AI wouldn't blindly copy the results of the simulation; it would just think about them. It seems plausible to me that an AI could take off simply by having correct reasoning methods written into it from the start, and by collecting data about what questions are good to ask.
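A minimal sketch of the architecture described above, with invented heuristic names and priorities: the heuristics form a fixed menu, so the agent can add knowledge and queue new thoughts, but nothing in the main loop rewrites the agent's own code.

```python
import heapq

# Toy agent: a knowledge base plus a "thought heap" of
# (priority, heuristic-name, params) entries. Heuristic names and
# priorities are invented for illustration.
class ToyAgent:
    def __init__(self):
        self.knowledge = {"facts": set()}
        self.heap = []  # min-heap; lower number = more promising thought
        # The fixed menu of heuristics -- the agent can't extend this.
        self.heuristics = {
            "note_fact": self.note_fact,
            "combine_facts": self.combine_facts,
        }

    def push_thought(self, priority, heuristic, params):
        heapq.heappush(self.heap, (priority, heuristic, params))

    def note_fact(self, fact):
        self.knowledge["facts"].add(fact)

    def combine_facts(self, _):
        # A stand-in for "think about examples": derive a new fact from
        # existing ones, rather than modifying the agent itself.
        combined = "&".join(sorted(self.knowledge["facts"]))
        self.knowledge["facts"].add(combined)

    def run(self, steps):
        for _ in range(steps):
            if not self.heap:
                break
            _, heuristic, params = heapq.heappop(self.heap)
            self.heuristics[heuristic](params)  # dispatch from the fixed menu

agent = ToyAgent()
agent.push_thought(1, "note_fact", "A")
agent.push_thought(2, "note_fact", "B")
agent.push_thought(3, "combine_facts", None)
agent.run(10)
print(sorted(agent.knowledge["facts"]))
```

Of course, this says nothing about whether such a restriction survives contact with a smart optimizer, which is exactly the objection raised below: writing and running new code elsewhere is effectively self-modification.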
I found the paper I was talking about. The Basic AI Drives, by Stephen M. Omohundro. From the paper:
I'm not really qualified to answer you here, but here goes anyway. I suspect that either your base design is flawed, or the restrictions on heuristics would render the program useless. Also, I don't think it would be quite as easy to control heuristics as you seem to think. Also, AI people who actually know what they're talking about, unlike me, seem to disagree with you. Again, I wish I could remember where it was I was reading about this.
Maturing isn't a magical process. It happens because of good modifications made to source code.
Why can't it happen because of additional data collected about the world?
It could, although frankly I'm sceptical. I've had 18 years to collect data about the world, and so far it hasn't led me to a point where I'd be confident in modifying myself without changing my goals; if an AI takes much longer than that, another UFAI will probably beat it to the punch. If it is possible to figure out friendliness through purely empirical reasoning, without intelligence enhancement, why not figure it out ourselves and then build the AI? (This seems roughly the approach SIAI is counting on.)
"Safety" of own source code is actually a weak form of the problem. An AI has to keep the external world sufficiently "safe" as well, because the external world might itself host AIs or other dangers (to the external world, but also to AI's own safety), that must either remain weak, or share AI's values, to keep AI's internal "safety" relevant.

Are there any intermediate steps toward the CEV, such as individual EV, and if so, are they discussed anywhere?

Only preliminary research into potential EV algorithms has been done. See these citations... Brandt 1979; Railton 1986; Lewis 1989; Sobel 1994; Zimmerman 2003; Tanyi 2006 ...from The Singularity and Machine Ethics.
  1. Where should I ask questions like question 2?

  2. I've been here less than thirty days. Why does my total karma sometimes but not always show a different number from my karma from the last 30 days?

Presumably because the respective caches are recalculated at different intervals.
One of the numbers updates a couple minutes before the other one does. I forget which.

Why are flowers beautiful? I can't think of any "just so" story why this should be true, so it's puzzled me. I don't think it's justification for a God or anything, just something I currently cannot explain.


Many flowers are optimized for being easily found by insects, who don't have particularly good eyesight. To stick out from their surroundings, they can use bright unnatural colors (i.e. not green or brown), unusual patterns (concentric circles is a popular one), have a large surface, etc.

Also, flowers are often quite short-lived, and thus mostly undamaged; we find smoothness and symmetry attractive (for evolutionary reasons - they're signs of health in a human).

In addition, humans select flowers that they themselves find pretty to place in gardens and the like, so when you think of "flowers", the pretty varieties are more likely to come to mind than the less attractive ones (like, say, that of the plane tree, or of many species of grass; many flowers are also prettier if you look at them in the ultraviolet). If you take a walk in the woods, most plants you encounter won't have flowers you'll find that pretty; ugly or unremarkable flowers may not even register in your mind as "flowers".

Peter Wildeford
That makes sense, thanks. Do you have any more references on this?
I think one possible "just so" explanation is: Humans find symmetry more beautiful than asymmetry. Flowers are symmetrical. Standard caveats: There are more details, and that's not a complete explanation, but it might prove a good starting point to look into if you're curious about the explanation.
Is it flowers in particular that puzzle you? Or is it more generally the fact that humans are wired so as to find anything at all beautiful?
Peter Wildeford
I suppose it would be finding beauty in things that don't seem to convey a survival advantage or that I can't personally draw a connection to something with a survival advantage. Another good example would be the beauty of rainbows.

How do I stop my brain from going "I believe P and I believe something that implies not P -> principle of explosion -> all statements are true!" and instead go "I believe P and I believe something that implies not P -> one of my beliefs is incorrect"? It doesn't happen too often, but it'd be nice to have an actual formal refutation for when it does.

Do you actually do this - "Oh, not P! I must be the pope." - or do you just notice this - "Not P, so everything's true. Where do I go from here?". If you want to know why you shouldn't do this it's because you never really learn not P, you just learn evidence against P which you should update with Bayes' rule. If you want to understand this process more intuitively (and you've already read the sequences and are still confused), I would recommend this short tutorial or studying belief propagation in Bayesian networks, for which I don't know a great source for the intuitions behind, but units 3 and 4 of the online Stanford AI class might help.
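The "update with Bayes' rule" step can be made numeric with a toy two-update example (the likelihood numbers are invented). Conflicting evidence moves a probability around instead of producing a contradiction:

```python
# Update on evidence against P instead of concluding "not P" outright.
def bayes_update(prior, likelihood_if_p, likelihood_if_not_p):
    """Return P(P | evidence) via Bayes' rule."""
    joint_p = prior * likelihood_if_p
    joint_not_p = (1 - prior) * likelihood_if_not_p
    return joint_p / (joint_p + joint_not_p)

p = 0.5                        # initial credence in P
p = bayes_update(p, 0.9, 0.2)  # evidence that strongly favors P
p = bayes_update(p, 0.1, 0.8)  # later evidence that strongly favors not-P
# Instead of "P and not P, therefore everything", we end up with a
# probability strictly between 0 and 1 -- one belief just got weaker.
print(round(p, 3))
```

Because you never assigned probability exactly 1 to either P or not-P, the explosion never gets off the ground.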
I've actually done that class and gotten really good grades. Looking at it, it seems I have automatic generation of nodes for new statements, and the creation of a new node does not check for an already existing node for its inversion. To complicate matters further, I don't go "I'm the pope" nor "all statements are true"; I go "NOT Bayes' theorem, NOT induction, and NOT Occam's razor!"
Well, one mathematically right thing to do is to make a new node descending from both other nodes representing E = (P and not P) and then observe not E. Did you read the first tutorial? Do you find the process of belief-updating on causal nets intuitive, or do you just understand the math? How hard would it be for you to explain why it works in the language of the first tutorial? Strictly speaking, causal networks only apply to situations where the number of variables does not change, but the intuitions carry over.
That's what I try to do; the problem is I end up observing E to be true. And E leads to an "everything" node. I'm not sure how well I understand the math, but I feel like I probably do...
You don't observe E to be true, you infer it to be (very likely) true by propagating from P and from not P. You observe it to be false using the law of noncontradiction. Parsimony suggests that if you think you understand the math, it's because you understand it. Understanding Bayesianism seems easier than fixing a badly-understood flaw in your brain's implementation of it.
How can I get this law of noncontradiction? It seems like a useful thing to have.
The reason is that you don't believe anything with logical conviction: if your "axioms" imply absurdity, you discard the "axioms" as untrustworthy, thus refuting the arguments for their usefulness (which always precede any beliefs, if you look for them). Why do I believe this? My brain tells me so, and its reasoning is potentially suspect.
I think I've found the problem: I don't have any good intuitive notion of absurdity. The only clear association I have with it is under "absurdity heuristic" as "a thing to ignore". That is, it's not self-evident to me that what it implies IS absurd. After all, it was implied by a chain of logic I grok and can find no flaw in.
I used "absurdity" in the technical math sense.
To the (mostly social) extent that concepts were useful to your ancestors, one is going to lead to better decisions than the other, and so you should expect to have evolved the latter intuition. (You trust two friends, and then one of them tells you the other is lying- you feel some consternation of the first kind, but then you start trying to figure out which one is trustworthy.)
It seems a lot of intuitions all humans are supposed to have were overwritten by noise at some point...
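The two-friends case above (trusting two sources that turn out to contradict each other) can be put into a minimal Bayesian sketch. The priors here are invented for illustration:

```python
# Toy model: argument A concludes P, argument B concludes not-P.
# Observing the contradiction does not make everything true; it is
# evidence that at least one argument is unsound, so we condition on
# "not both sound" and renormalize.

p_a_sound = 0.9   # prior trust in argument A (concludes P)
p_b_sound = 0.8   # prior trust in argument B (concludes not P)

joint = {
    (a, b): (p_a_sound if a else 1 - p_a_sound)
            * (p_b_sound if b else 1 - p_b_sound)
    for a in (True, False)
    for b in (True, False)
}
del joint[(True, True)]   # both sound would imply P and not P: ruled out

z = sum(joint.values())
posterior = {k: v / z for k, v in joint.items()}

p_a_after = posterior[(True, False)]   # only A sound
p_b_after = posterior[(False, True)]   # only B sound
# Both drop from their priors (to roughly 0.64 and 0.29), and the
# initially weaker argument takes the larger hit.
```

This is the "make a node for E and observe not E" move from upthread, carried out by brute-force enumeration rather than message passing, which is enough for two variables.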

Is there an easy way to read all the top level posts in order starting from the beginning? There doesn't seem to be a 'first post' link anywhere.

There is a draft of a suggested reading order. As I understand it, the sequences of LessWrong more or less grew out of prior writings by Eliezer, especially out of his posts at Overcoming Bias, so there isn't a definitive first post.
I've read most of the posts in the suggested order; it's more to satisfy my completionist streak, and because Eliezer's early posts have an ongoing narrative to them. The brute-force solution would simply be to find an early post and click 'previous' until they run out, but I would hope there would be an easier way, as sort-by-oldest-first tends to be one of the default options in such things.
Isn't the first one "The Martial Art of Rationality"?
You may find one of these helpful. As a heads up, though, you may want to begin with the essays on Eliezer's website (Bayes' Theorem, Technical Explanation, Twelve Virtues, and The Simple Truth) before you start his OB posts.
Check out this page. Additionally, at the end of each post, there is a link labeled "Article Navigation". Click that, and it will open links to the previous and next post by the author.

Is there a proof that it's possible to prove Friendliness?

No. There's also no proof that it's possible to prove that P!=NP, and for the Friendliness problem it's much, much less clear what the problem even means. You aren't entitled to that particular proof, it's not expected to be available until it's not needed anymore. (Many difficult problems get solved or almost solved without a proof of them being solvable appearing in the interim.)
Why is it plausible that Friendliness is provable? Or is it more a matter that the problem is so important that it's worth trying regardless?

There is no clearly defined or motivated problem of "proving Friendliness". We need to understand what goals are, what humane goals are, what process can be used to access their formal definition, and what kinds of things can be done with them, how, and to what end. We need to understand these things well, which (on a psychological level) triggers an association with mathematical proofs, and will probably actually involve some mathematics suited to the task. Whether the answers take the form of something describable as "provable Friendliness" seems to me an unclear/unmotivated consideration. Unpacking that label might make it possible to provide a more useful response to the question.

I wonder what SI would do next if they could prove that friendly AI was not possible. For example, if it could be shown that value drift was inevitable and that utility functions are unstable under recursive self-improvement.
That doesn't seem like the only circumstances in which FAI is not possible. If moral nihilism is true, then FAI is impossible even if value drift is not inevitable. In that circumstance, shouldn't we try to make any AI we decide to build "friendly" to present-day humanity, even if it wouldn't be friendly to Aristotle or Plato or Confucius? Based on hidden-complexity-of-wishes analysis, consistency with our current norms is still plenty hard.
My concerns are more that it will not be possible to adequately define "human", especially as transhuman tech develops, and that there might not be a good enough way to define what's good for people.
As I understand it, the modest goal of building an FAI is that of giving an AGI a push in the "right" direction, what EY refers to as the initial dynamics. After that, all bets are off.

How do I work out what I want and what I should do?

Strictly speaking, this question may be a bit tangential to LessWrong, but this was never supposed to be an exclusive thread. The answer will depend on a lot of things, mostly specific to you personally. Bodies differ. Even your own single body changes over the course of time. You have certain specific goals and values, and certain constraints. Maybe what you're really looking for is recommendations for a proper physical fitness forum, which is relatively free of pseudoscience and what they call "woo." I can't advise, myself, but I'm sure that some LessWrongians can.
I think you've taken the wrong meaning of the words "work out".
[Literally laughing out loud at myself.]
Oscar is correct. A better phrasing might be: given that I have difficulty inferring my own desires, what might be useful methods for discovering my pre-existing desires or choosing long-term goals? [On an unrelated note, reddit's r/fitness is an extremely good source for scientifically based 'work out' advice, which I believe would satisfy Contanza's criteria.]
Read things like this at least, and assume there are a lot of things like that that we don't know about. That's another of my stopgap solutions.
Interesting link. But in general, I would like just a tiny bit of context so I might know why I want to click on "this".
Devise and execute a highly precise ritual wherein you invoke the optimal decision theory. That's my stopgap solution.

I think I may be incredibly confused.

Firstly, if the universe is distributions of complex amplitudes in configuration space, then shouldn't we describe our knowledge of the world as probability distributions over complex amplitude distributions? Is there some incredibly convenient simplification I'm missing?

Secondly, have I understood correctly that the universe, in quantum mechanics, is a distribution of complex values in an infinite-dimensional space, where each dimension corresponds to the particular values some attribute of some particle in the universe t... (read more)

More or less. I find I get a lot of mileage out of using words. It does lose a lot of information - which I suppose is rather the point.
What I meant by that is that distributions of other distributions are the sort of thing you would kind of expect to be incredibly impractical to use, but also could have some handy mathematical ways to look at them. Since I am unfamiliar with the formalisms involved, I was wondering if anybody could enlighten me.
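For what it's worth (and if I'm reading the first question right), the object it describes, a probability distribution over amplitude distributions, has a standard and very convenient formalism in quantum mechanics: the density matrix. If the system is in state $|\psi_i\rangle$ with probability $p_i$, the ensemble is summarized by

```latex
\rho = \sum_i p_i \, |\psi_i\rangle\langle\psi_i| ,
\qquad
\langle A \rangle = \operatorname{Tr}(\rho A)
```

Every observable prediction depends on the ensemble only through $\rho$, so distinct mixtures with the same $\rho$ are empirically indistinguishable, and you never need to carry the full distribution-over-distributions around. That is the convenient simplification.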

There's an argument that I run into occasionally that I have some difficulty with.

Let's say I tell someone that voting is pointless, because one vote is extremely unlikely to alter the outcome of the election. Then someone might tell me that if everyone thought the way I do, democracy would be impossible.

And they may be right, but since everyone doesn't think the way I do, I don't find it to be a persuasive argument.

Other examples would be littering, abusing community resources, overusing antibiotics, et cetera. They may all be harmful, but if only one add... (read more)

Try searching for "free rider problem" or "tragedy of the commons." Here are the relevant Wiki pages:
That's exactly it. I used to know that, can't believe I forgot it. Thanks!
The related behavior pattern, where each individual contributes a negligible share of a collective problem, is sometimes referred to as the tragedy of the commons. I'm fonder of "no single raindrop feels responsible for the flood," myself.

How would I set up a website with a similar structure to LessWrong? That is, including user-submitted posts, comments, and an upvote/downvote system.

(Depending on your level of technical ability) You could get the source of LW and follow the setup instructions (changing the icons and styling etc as appropriate).
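If you end up rolling your own instead, the core voting mechanic is small. Here is a hypothetical minimal sketch (illustrative only, not the actual LessWrong/Reddit code): one vote per user per post, with revoting replacing the earlier vote.

```python
# Minimal, hypothetical sketch of an up/down-vote mechanic.
class Post:
    def __init__(self, title):
        self.title = title
        self.votes = {}                  # user_id -> +1 (up) or -1 (down)

    def vote(self, user_id, direction):
        if direction not in (+1, -1):
            raise ValueError("direction must be +1 or -1")
        self.votes[user_id] = direction  # revoting overwrites the old vote

    @property
    def score(self):
        return sum(self.votes.values())

p = Post("Example post")
p.vote("alice", +1)
p.vote("bob", -1)
p.vote("alice", -1)   # alice changes her mind; only her latest vote counts
# p.score is now -2
```

A real site adds persistence, authentication, and sorting by score, which is most of what the LessWrong source linked above provides out of the box.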