Previously: round 1, round 2, round 3

From the original thread:

This is for anyone in the LessWrong community who has made at least some effort to read the sequences and follow along, but is still confused on some point, and is perhaps feeling a bit embarrassed. Here, newbies and not-so-newbies are free to ask very basic but still relevant questions with the understanding that the answers are probably somewhere in the sequences. Similarly, LessWrong tends to presume a rather high threshold for understanding science and technology. Relevant questions in those areas are welcome as well.  Anyone who chooses to respond should respectfully guide the questioner to a helpful resource, and questioners should be appropriately grateful. Good faith should be presumed on both sides, unless and until it is shown to be absent.  If a questioner is not sure whether a question is relevant, ask it, and also ask if it's relevant.

Ask away!

Can anyone attest to getting real instrumental benefit from SI/LW rationality training, whether from SI bootcamps or just from reading LessWrong?

I don't just mean "feeling better about myself," but identifiable and definite improvements, like getting a good job in one week after two years without success.

At the moment, LW has provided negative benefit to my life. I recently quit my job to start learning positive psychology. My initial goal was to blog about positive psychology, and eventually use my blog as a platform to sell a book.

LW has made me deeply uncertain of the accuracy of the research I read, the words I write on my blog, and the advice I am writing in the book I intend to sell. Long-term, the uncertainty will probably help me by making me more knowledgeable than my peers, but in the short term it demotivates me (e.g. if I were sure that what I was learning was correct, I would enthusiastically proselytize, which is a much more effective blogging strategy).

Still, I read on, because I've passed the point of ignorance.

I also think that LW has provided negative benefit to my life. Since I decided that I wanted my beliefs to be true, rather than pleasing to me, I've felt less connected to my friendship group. I used to have certain political views that a lot of my friends approved of. Now, I think I was wrong about many things (not totally wrong, but I'm far less confident of the views that I continue to hold). Overall, I'd rather believe true things, but I think so far it's made me less happy.

Why would you rather believe true things?
1. I would just rather know the right answer! 2. I think believing true things has better consequences than the reverse, for many people. I'm not sure if it will for me. 3. It's too late. I can't decide to go back to believing things that aren't true to make me feel better, because I'd know that's what I was doing. Would you not prefer to believe true things?
No, I would not not-prefer to believe true things. That said I also don't experience believing true things as making me unhappy the way you describe. It's the combination of those statements that intrigues me: X makes you unhappy and you would rather do X. So I was curious as to why you would rather do it. I have to admit, though, your answers leave me even more puzzled.
Here are a couple of other reasons: 4. I suppose in some ways, feeling that my beliefs are more accurate has given me some sort of satisfaction. I don't know if it outweighs feeling disconnected socially, though. 5. Altruism. I used to put a lot of energy into UK politics. I gained moral satisfaction and approval from my friends for this, but I've come to think that it's really not a very effective way of improving the world. I would rather learn about more effective ways of making the world better (e.g., donating to efficient charity). Does that make sense? If you did feel that believing true things made you unhappy, would you try to make yourself believe not-true but satisfying things?
Altruism makes some sense to me as an answer... if you're choosing to sacrifice your own happiness in order to be more effective at improving the world, and believing true things makes you more effective at improving the world, then that's coherent. Unrelatedly, if the problem is social alienation, one approach is to find a community in which the things you want to do (including believe true things) are socially acceptable. There are areas in which I focus my attention on useful and probably false beliefs, like "I can make a significant difference in the world if I choose to take action." It's not clear to me that I believe those things, though. It's also not clear to me that it matters whether I believe them or not, if they are motivating my behavior just the same.
That's how I felt for the first few months after discovering that Jesus wasn't magic after all. At that moment, all I could see was that (1) my life up to that point had largely been wasted on meaningless things, (2) my current life plans were pointless, (3) my closest relationships were now strained, and (4) much of my "expertise" was useless. Things got better after a while.
I'm tempted to conclude that your current accumulated utility given LW is lower than given (counterfactual no-LW), but that in counterpart/compensation your future expected utility has risen considerably by unknown margins with a relatively high confidence. Is this an incorrect interpretation of the subtext? Am I reading too much into it?
That interpretation is correct. I've noticed that I don't even need to be knowledgeable to gain utility - there is a strong correlation between the signaling of my 'knowledgeableness' and the post popularity - the most popular had the largest number of references (38), and so on. When writing the post, I just hide the fact that I researched so much because of my uncertainty :)

Absence of evidence is evidence of absence :-) Most of us don't seem to get such benefits from reading LW, so learning about an individual case of benefit shouldn't influence your decisions much. It will probably be for spurious reasons anyway. Not sure about the camps, but my hopes aren't high.

Can anyone attest to getting real instrumental rationality benefits from reading Wikipedia? (As a control question; everyone seems to think that Wikipedia is obviously useful and beneficial, so is anyone getting "real instrumental rationality benefits" from it?)

I suspect that the "success equation", as it were, is something like expected_success = drive * intelligence * rationality, and for most people the limiting factor is drive, or maybe intelligence. Also, I suspect that changes in your "success equation" parameters can take years to manifest as substantial levels of success, where people regard you as "successful" and not just "promising". And I don't think anyone is going to respond to a question like this with "reading Less Wrong made me more promising" because that would be dopey, so there's an absence of data. (And promising folks may also surf the internet, and LW, less.)
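A minimal sketch of the "success equation" idea, assuming the three terms are multiplied and each is scored between 0 and 1. The scores and the helper names (`expected_success`, `limiting_factor`) are made up for illustration and are not the commenter's. The sketch shows why, under a multiplicative model, the same improvement is worth more when applied to the smallest factor.

```python
def expected_success(drive, intelligence, rationality):
    """Toy multiplicative model: each factor is a hypothetical score in [0, 1]."""
    return drive * intelligence * rationality


def limiting_factor(factors):
    """Return the name of the smallest factor."""
    return min(factors, key=factors.get)


# Hypothetical person whose weakest factor is drive.
person = {"drive": 0.2, "intelligence": 0.7, "rationality": 0.6}
baseline = expected_success(**person)

# Gain in expected_success from adding the same 0.1 to each factor in turn.
gains = {}
for name in person:
    bumped = dict(person)
    bumped[name] += 0.1
    gains[name] = expected_success(**bumped) - baseline

print(limiting_factor(person))
print(max(gains, key=gains.get))
```

In this toy model the largest gain comes from raising the limiting factor, which is one way to read the claim that extra rationality may not show up as success for people who are limited by drive.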

It's worth differentiating between these two questions, IMO: "does reading LW foster mental habits that make you better at figuring out what's true?" and "does being better at figuring out what's true make you significantly more successful?" I tend to assign more credence to the first than the second.

John, Wikipedia is generally valued for epistemic benefit, i.e., it teaches you facts. Only rarely does it give you practically useful facts, like the fact that lottery tickets are a bad buy. I agree that LW-rationality gives epistemic benefits. And as for "years to manifest": Diets can make you thinner in months. Likewise, PUA lessons get you laid, weightlifting makes you a bit stronger, bicycle repair workshops get you fixing your bike, and Tim Ferriss makes you much better at everything, in months -- if each of these is all it's cracked up to be. Some changes do take years, but note also that LW-style rationality has been around for years, so at least some people should be reporting major instrumental improvements.
One point is that if a specific diet helps you, it's easy to give credit to that diet. But if LW changes your thinking style, and you make a decision differently years later, it's hard to know what decision you would have made if you hadn't found LW. Another point is that rationality should be most useful for domains where there are long feedback cycles--where there are shorter feedback cycles, you can just futz around and get feedback, and people who study rationality won't have as much of an advantage. I think I've gotten substantial instrumental benefits from reading LW. It makes me kind of uncomfortable to share personal details, but I guess I'll share one example: When I was younger, I was very driven and ambitious. I wanted to spend my time teaching myself programming, etc., but in actuality I would spend my time reading reddit and feeling extremely guilty that I wasn't teaching myself programming. At a certain point I started to realize that my feeling of guilt was counterproductive, and if I actually wanted to accomplish my goals then I should figure out what emotions would be useful for accomplishing my goals and try to feel those. I think it's likely that if I didn't read LW I wouldn't have had this realization, or would've had this realization but not taken it seriously. And this realization, along with others in the same vein, seems to have been useful for helping me get more stuff done.

I was at July rationality minicamp, and in addition to many "epiphanies", one idea that seems to work for me is this, very simplified -- forget the mysterious "willpower" and use self-reductionism, instead of speaking in far mode what you should and want to do, observe in near mode the little (irrational) causes that really make you do things. Then design your environment to contain more of those causes which make you do things you want to do. And then, if the theory is correct, you find yourself doing more of what you want to do, without having to suffer the internal conflict traditionally called "willpower".

Today it's almost one month since the minicamp, and here are the results so far. I list the areas where I wanted to improve myself, and assign a score from 0 to 5, where 5 means "works like a miracle; awesome" and 0 means "no change at all". (I started to work on all these goals in parallel, which may be a good or bad idea. The bad part is that there is probably no chance of succeeding in all of them at once. The good part is that success in any one area still counts as a success.)

  • (5) avoiding sugar and soda
  • (4) sleeping regularly, avoiding sl
...
Hm, I've been trying to get rid of one particular habit (drinking while sitting at my computer) for a long time. Recently I've considered the possibility of giving myself a reward every time I go to the kitchen to get a beer and come back with something else instead. The problem was that I couldn't think of a suitable reward (there's not much that I like). I hadn't thought of just making something up, like pieces of paper. Thanks for the inspiration!

I was a July minicamp attendee. I did the big reading through the Sequences thing when lukeprog was doing it at Common Sense Atheism, so I'd say fewer of the benefits were rationality level-ups and more were life hacking. Post-minicamp I am:

  • doing sit-ups, push-ups, and squats every day (using the apps from the 200 situps guy), up from not doing this at all
  • martial arts training four times a week (aikido and krav) again, up from not doing things at all
  • using RTM to manage tasks which means
      • dropping way fewer small tasks
      • breaking tasks down into steps more efficiently
      • knocked off about three lagging tasks (not timebound, so I was making no progress on them) in the month that I got back
      • stopped using inbox as task manager, so I could actually only keep emails I was replying to in there
  • using beeminder to get down to inbox zero (currently three)
  • working in pomodoros has sped up my writing to the point where:
      • I miss a daily post to my blog more rarely (one over two weeks compared to 0-2 a week) and have had more double post days than previously (which translates into higher page views and more money for me)
      • Less time writing left me more time for leisure reading

I should add that I had a bit of a crestfallen feeling for the first few days of minicamp, since being more efficient and organized feels like a really lame superpower. I expected a bit more of it to be about choosing awesome goals. But then I realized that I'd always be grateful for a genie that magically gave me an extra hour, and I shouldn't look a gift genie in the mouth, just because it wasn't magic.

So, now that I've got more time, it's up to me to do superheroic things with it. Once I finish my Halloween costume.

This. Holy cow, I worried I was the only one who felt a bit of a letdown during minicamp and then started noticing afterwards that my ways of dealing with problems had suddenly become more effective.
OK, those count as benefits. We shouldn't just give all the credit to the lifehacking community, since LW/SI successfully got you to implement lifehacking techniques. Of course, anything can be called instrumentally rational if it works, but I wonder how other approaches compare to explicit rationality in successfully convincing oneself to lifehack. For example, the sort of motivational techniques used for salespeople.
I'm not sure. One thing that worked pretty well for me at minicamp was that the instructors were pretty meticulous about describing levels of confidence in different hacks. Everything from "Here are some well-regarded, peer reviewed studies you can look at" to "It's worked pretty well for us, and most of the people who've tried, and here's how we think it fits into what we know about the brain" to "we don't know why this works, but it has for most people, so we think it's worth trying out, so make sure you tell us if you try and get bupkis so we're hearing about negative data" to "this is something that worked for me that you might find useful." I think this is a pretty audience-specific selling point, but it did a great job of mitigating the suspicious-seeming levels of enthusiasm most lifehackers open with.
How are you both posting more to your blog, and spending less time writing?

I'm writing faster when I work in pomodoros and when I write on the train on the long schlep to aikido.

Where I just broke my toe. Oh no, negative utility alert!
This topic has been raised dozens of times before, but the stories are scattered. Here's a sampling:

  • Louie's What I've Learned from Less Wrong
  • cousin_it on how LW helps him notice bullshit
  • FrankAdamek on gains from LW
  • cata's group rationality diary thread contains lots of stories of people benefiting from applying the lessons learned in rationality camps
  • A couple people have posted about how LW deconverted them from their religions, but I can't recall where

But also see this comment from Carl Shulman.

That comment of mine was from 2010 and I disagree with it now. My current opinion is better expressed in the "Epiphany addiction" post and comments.

Are you saying you now don't think LW is "useful for noticing bullshit and cutting it away from my thoughts", or that the value of doing this isn't as high as you thought?

Looking back today, the improvement seems smaller than I thought then, and LW seems to have played a smaller role in it.

I used to be very skeptical of Eliezer's ideas about improving rationality when he was posting the Sequences, but one result that's hard to deny is that all of a sudden there is a community of people who I can discuss my decision theory ideas with, whereas before that I seemingly couldn't get them across to anyone except maybe one or two people, even though I had my own highly active mailing list.

I'd say that being able to achieve this kind of subtle collective improvement in philosophical ability is already quite impressive, even if the effect is not very dramatic in any given individual. (Of course ultimately the improvement has to be graded against what's needed to solve FAI and not against my expectations, and it seems to still fall far short of that.)

It's indeed nice to have a community that discusses decision-theoretic ideas, but a simpler explanation is that Eliezer's writings attracted many smart folks and also happened to make these ideas salient, not that Eliezer's writings improved people's philosophical ability.

6 Wei Dai 12y
Attracting many smart folks and making some particular ideas salient to them is no mean feat in itself. But do you think that's really all it took? That any group of smart people, if they get together and become interested in some philosophical topic, could likely make progress instead of getting trapped in a number of possible ways?
I think it's always helpful when a community has a vernacular and a common library of references. It's better if the references are unusually accurate, but even bland ones might still speed up progress on projects.
Eliezer's writings were certainly the focus of my own philosophical development. The current me didn't exist before processing them, and was historically caused by them, even though it might have formed on its own a few years later.
Hmm. Thanks for that update. I had been considering earlier today that since I started reading lesswrong I noticed a considerable increase in my ability to spot and discern bullshit and flawed arguments, without paying much attention to really asking myself the right questions in order to favor other things I considered more important to think about. Reading this made me realize that I've drawn a conclusion too early. Perhaps I should re-read those "epiphany addiction" posts with this in mind.
Thanks. In most of those links, the author says that he gained some useful mental tools, and maybe that he feels better. That's good. But no one said that rationality helped them achieve any goal other than the goal of being rational. For example:

  • Launch a successful startup
  • Get a prestigious job
  • Break out of a long-term abusive relationship
  • Lose weight (Diets are discussed, but I don't see that a discussion driven by LW/SI-rationality is any more successful in this area than any random discussion of diets.)
  • Get lucky in love (and from what I can tell, the PUAs do have testimonials for their techniques)
  • Avoid akrasia (The techniques discussed are gathered from elsewhere; so to the extent that rationality means "reading up on the material," the few successes attested in this area can count as confirmation.)
  • Break an addiction to drugs/gambling

... and so on. Religious deconversion doesn't count for the purpose of my query unless the testimonial describes some instrumental benefit. Carl's comment about the need for an experiment is good; but if someone can just give a testimonial, that would be a good start!
There's also Zvi losing weight with TDT. :)
Losing weight is a core human value?
Thanks, I edited it.
I think LW-style thinking may have helped me persist better at going to the gym (which has been quite beneficial for me) than I otherwise would have, but obviously it's hard to know for sure.
Or even better:

  • "I used to buy lottery tickets every day but now I understand the negative expectation of the gamble and the diminishing marginal utility of the ticket, so I don't."
  • A doctor says "I now realize that I was giving my patients terrible advice about what it meant when a test showed positive for a disease. Now that I have been inducted into the Secret Order of Bayes, my advice on that is much better."

.... etc.
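Both hypothetical testimonials come down to short calculations. A rough sketch follows, with every number invented for illustration: the test's sensitivity and false positive rate, the disease prevalence, and the lottery's price, jackpot, and odds are all assumptions, not figures from the thread.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """Bayes' theorem: P(disease | positive test)."""
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


def lottery_expected_value(ticket_price, jackpot, p_win):
    """Expected monetary gain from buying one ticket."""
    return p_win * jackpot - ticket_price


# Assumed numbers: 1% prevalence, 90% sensitivity, 9% false positive rate.
p_disease = posterior_given_positive(
    prior=0.01, sensitivity=0.90, false_positive_rate=0.09
)

# Assumed numbers: a 1.00 ticket, a 1,000,000 jackpot, a 1-in-10-million chance.
ev = lottery_expected_value(ticket_price=1.0, jackpot=1_000_000, p_win=1e-7)

print(f"P(disease | positive) = {p_disease:.3f}")
print(f"Expected value per ticket = {ev:.2f}")
```

With these assumed inputs a positive result leaves the probability of disease below 10%, and each ticket loses money on average. Different inputs would give different answers, so the figures only illustrate the shape of the reasoning.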
July minicamper here. My own life has had enough variance in the past few months over many variables (location, job, romantic status) with too many exogenous variables for me to be very confident about the effect of minicamp, aside from a few things (far fewer totally wasted days than I used to suffer from what I saw as being inescapably moody). But I've gained an identifiable superpower in the realm of talking helpfully to other people by modeling their internal conflicts more accurately, by steering them toward "making deals with themselves" rather than ridiculous memes like "using willpower", and by noticing confusion and getting to the root of it via brainstorming and thought experiments. And the results have absolutely floored people, in three different cases. If you're worried about epiphany addiction, then I suppose you might label me a "carrier" (although there's the anomalous fact that friends have followed through on my advice to them after talking to me).
Great, I'd love to have that superpower!
I think the probability of my having got my current job without LW etc. is under 20%.
Subjectively I feel happier and more effective, but there's not reliable external evidence for this. I've gotten better at talking to people and interacting in positive ways thanks to using metacognition, and my views have become more consistent. Timeless thinking has helped me adopt a diet and stick to it, as well as made me start wearing my seatbelt.

While we're on the subject of decision theory... what is the difference between TDT and UDT?

Maybe the easiest way to understand UDT and TDT is:

  • UDT = EDT without updating on sensory inputs, with "actions" to be understood as logical facts about the agent's outputs
  • TDT = CDT with "causality" to be understood as Pearl's notion of causality plus additional arrows for logical correlations

Comparing UDT and TDT directly, the main differences seem to be that UDT does not do Bayesian updating on sensory inputs and does not make use of causality. There seems to be general agreement that Bayesian updating on sensory inputs is wrong in a number of situations, but disagreement and/or confusion about whether we need causality. Gary Drescher put it this way:

Plus, if you did have a general math-counterfactual-solving module, why would you relegate it to the logical-dependency-finding subproblem in TDT, and then return to the original factored causal graph? Instead, why not cast the whole problem as a mathematical abstraction, and then directly ask your math-counterfactual-solving module whether, say, (Platonic) C's one-boxing counterfactually entails (Platonic) $1M? (Then do the argmax over the respective math-counterfactual consequences of C's candidate outputs.)

(Eliezer didn't give an answer. ETA: He did answer a related question here.)

I can see what updating on sensory inputs does to TDT (causing it to fail counterfactual mugging). But what does it mean to say that TDT makes use of causality and UDT doesn't? Are there any situations where this causes them to give different answers?
7 Wei Dai 12y
(I added a link at the end of the grandparent comment where Eliezer does give some of his thoughts on this issue.) Eliezer seems to think that causality can help deal with Gary Drescher's "5-and-10" problem: But it seems possible to build versions of UDT that are free from such problems (such as the proof-based ones that cousin_it and Nesov have explored), although there are still some remaining issues with "spurious proofs" which may be related. In any case, it's unclear how to get help from the notion of causality, and as far as I know, nobody has explored in that direction and reported back any results.
I'm not an expert but I think this is how it works: Both decision theories (TDT and UDT) work by imagining the problem from the point of view of themselves before the problem started. They then think "From this point of view, which sequence of decisions would be the best one?", and then they follow that sequence of decisions. The difference is in how they react to randomness in the environment. When the algorithm is run, the agent is already midway through the problem, and so might have some knowledge that it didn't have at the start of the problem (e.g. whether a coinflip came up heads or tails). When visualising themselves at the start of the problem TDT assumes they have this knowledge, UDT assumes they don't. An example is Counterfactual Mugging: TDT visualises itself before the problem started, knowing that the coin will come up tails. From this point of view the kind of agent that does well is the kind that refuses to give $100, and so that's what TDT does. UDT visualises itself before the problem started, and pretends it doesn't know what the coin does. From this point of view the kind of agent that does well is the kind that gives $100 in the case of tails, so that's what UDT does.
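The two viewpoints described above can be sketched as two ways of scoring the same policy. This is only a toy illustration of the comment's informal description, not an implementation of TDT or UDT. The $100 cost is from the comment; the $10,000 reward on heads and the fair coin are assumptions taken from the usual statement of Counterfactual Mugging, and the function names are hypothetical.

```python
P_HEADS = 0.5             # assumption: fair coin
COST_IF_TAILS = 100       # from the comment above
REWARD_IF_HEADS = 10_000  # assumption: standard version of the problem


def value_without_coin_knowledge(pays_on_tails):
    """Score a policy from before the flip, not knowing the result."""
    heads_payoff = REWARD_IF_HEADS if pays_on_tails else 0
    tails_payoff = -COST_IF_TAILS if pays_on_tails else 0
    return P_HEADS * heads_payoff + (1 - P_HEADS) * tails_payoff


def value_knowing_tails(pays_on_tails):
    """Score a policy while treating 'the coin came up tails' as known."""
    return -COST_IF_TAILS if pays_on_tails else 0


def best_policy(evaluate):
    """Return True if paying on tails scores best under `evaluate`."""
    return max([True, False], key=evaluate)


print(best_policy(value_without_coin_knowledge))  # pays
print(best_policy(value_knowing_tails))           # refuses
```

Scored without knowledge of the coin, the paying policy is worth 4,950 in expectation against 0 for refusing. Once tails is treated as known, paying is simply a loss of 100, which matches the disagreement described in the comment.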
Why do we still reference TDT so much if UDT is better?
Many people think of UDT as being a member of the "TDT branch of decision theories." And in fact, much of what is now discussed as "UDT" (e.g. in A model of UDT with a halting oracle) is not Wei Dai's first or second variant of UDT but instead a new variant of UDT sometimes called Ambient Decision Theory or ADT.
Follow-up: Is it in how they compute conditional probabilities in the decision algorithm? As I understand it, that's how CDT and EDT and TDT differ.
I don't think that is how CDT and EDT differ, actually. Instead, it's that EDT cares about conditional probabilities and CDT doesn't. For instance, in Newcomb's problem, a CDT agent could agree that his expected utility is higher conditional on him one-boxing than it is conditional on him two-boxing. But he two-boxes anyway because the correlation isn't causal. I guess TDT/UDT does compute conditional probabilities differently in the sense that they don't pretend that their decisions are independent of the outputs of similar algorithms.
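A small sketch of that contrast on Newcomb's problem. The predictor accuracy (99%) and the $1,000 in the transparent box are assumptions from the usual statement of the problem, and `edt_value` and `cdt_value` are hypothetical names. The EDT-style calculation conditions the box contents on the action; the CDT-style calculation holds the contents fixed whatever the action.

```python
ACCURACY = 0.99      # assumption: how often the predictor is right
BIG = 1_000_000      # opaque box, if the predictor expected one-boxing
SMALL = 1_000        # assumption: contents of the transparent box


def edt_value(action):
    """Expected payoff conditional on the action (evidential reasoning)."""
    p_full = ACCURACY if action == "one-box" else 1 - ACCURACY
    bonus = SMALL if action == "two-box" else 0
    return p_full * BIG + bonus


def cdt_value(action, p_full):
    """Expected payoff with the box contents held fixed (causal reasoning)."""
    bonus = SMALL if action == "two-box" else 0
    return p_full * BIG + bonus


# Conditional on the action, one-boxing looks far better.
print(edt_value("one-box"), edt_value("two-box"))

# With contents held fixed, two-boxing gains SMALL for any belief p_full.
for p_full in (0.0, 0.5, 1.0):
    print(cdt_value("two-box", p_full) - cdt_value("one-box", p_full))
```

The CDT-style agent can accept the first pair of numbers and still two-box, because in its own calculation the extra 1,000 is available regardless of what the opaque box contains.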

Why haven't SI and LW attracted or produced any good strategists? I've been given to understand (from someone close to SI) that various people within SI have worked on Singularity strategy but only produced lots of writings that are not of an organized, publishable form. Others have attempted to organize them but also failed, and there seems to be a general feeling that strategy work is bogged down or going in circles and any further effort will not be very productive. The situation on LW seems similar, with people arguing in various directions without much feeling of progress. Why are we so bad at this, given that strategic thinking must be a core part of rationality?

There are some but not lots of "writings" produced internally by SingInst that are not available to the public. There are lots of scribbled notes and half-finished models and thoughts in brains and stuff like that. We're working to push them out into written form, but that takes time, money, and people — and we're short on all three. The other problem is that to talk about strategy we first have to explain lots of things that are basic (to a veteran like you but not to most interested parties) in clear, well-organized language for the first time, since much of this hasn't been done yet (these SI papers definitely help, though: 1, 2, 3, 4, 5, 6). To solve this problem we are (1) adding/improving lots of articles on the LW wiki like you suggested a while back (you'll see a report on what we did, later), and (2) working on the AI risk wiki (we're creating the map of articles right now). Once those resources are available it will be easier to speak clearly in public about strategic issues. We hit a temporary delay in pushing out strategy stuff at SI because two of our most knowledgeable researchers & strategists became unavailable for different reasons: Anna took over launching CFAR and Carl took an extended (unpaid) leave of absence to take care of some non-SI things. Also, I haven't been able to continue my own AI risk strategy series due to other priorities, and because I got to the point where it was going to be a lot of work to continue that sequence if I didn't already have clear, well-organized write-ups of lots of standard material. (So, it'll be easier for me to continue once the LW wiki has been improved and once the AI risk wiki exists, both of which we've got people working on right now.) Moreover, there are several papers in the works — mostly by Kaj (who is now a staff researcher), with some help from myself — but you won't see them for a while. You did see this and this, however. Those are the product of months of part-time work from several remote resear
7 Wei Dai 12y
Luke, with the existing people at SI and FHI's disposal, how long do you think it would take (assuming they're not busy with other projects) to produce a document that lays out a cogent argument for some specific Singularity strategy? An argument that takes into account all of the important considerations that have already been raised (for example my comment that Holden quoted)? I will concede that strategy work is not bogged down if you think it can be done in a reasonable time frame. (2 years, perhaps?) But if SI and FHI are merely producing writings that explain the strategic considerations, but which we can't foresee forming into an overall argument for some specific strategy, that seems very weak evidence at best against my claim that we are bad at strategic thinking.
I know that FHI plans to produce a particular set of policy recommendations relevant to superintelligence upon the release of Nick's book or shortly thereafter. FHI has given no timeline for Nick's book but I expect it to be published in mid or late 2013. The comparably detailed document from SI will be the AI risk wiki. We think the wiki format makes even more sense than a book for these purposes, though an OUP book on superintelligence from Nick Bostrom sounds great to us. Certainly, we will be busy with other projects, but even still I think the AI risk wiki (a fairly comprehensive version 1.0, anyway) could be finished within 2 years. I'm not that confident it will be finished in 2 years, though, given that we've barely begun. Six months from now I'll be more confidently able to predict the likelihood of finishing the AI risk wiki version 1.0 within 2 years. Despite this, I would describe the current situation as "bogged down" when it comes to singularity strategy. Luckily, the situation is changing due to 2 recent game-shifting events: (1) FHI decided to spend a few years focusing on AI risk strategy while Nick wrote a monograph on the subject, and (2) shortly thereafter, SI began to rapidly grow its research team (at first, mostly through part-time remote researchers) and use that team to produce a lot more research writing than before (only a small fraction of which you've seen thus far). And no, I don't know in advance what strategic recommendations FHI will arrive at, nor which strategic recommendations SI's scholarly AI risk wiki will arrive at, except to say that SI's proposals will probably include Friendly AI research as one of the very important things humanity should be doing right now about AI risk. ETA: My answer to your original question — "Why haven't SI and LW attracted or produced any good strategists?" — is that it's very difficult and time-consuming to acquire all the domain knowledge required to be good at singularity strategy, especially
6 Wei Dai 12y
What methodology will be used to produce SI's strategic recommendations (and FHI's, if you know the answer)? As far as I can tell, we currently don't have a way to make the many known strategic considerations/arguments commensurable (e.g., suitable for integrating into a quantitative strategic framework) except by using our intuitions which seem especially unreliable on matters related to Singularity strategy. The fact that you think the AI risk wiki can be finished in 2 years seems to indicate that you either disagree with this evaluation of the current state of affairs, or think we can make very rapid progress in strategic reasoning. Can you explain?
We certainly could integrate known strategic arguments into a quantitative framework like this, but I'm worried that, for example, "putting so many made-up probabilities into a probability tree like this is not actually that helpful." I think for now both SI and FHI are still in the qualitative stage that normally precedes quantitative analysis. Big projects like Nick's monograph and SI's AI risk wiki will indeed constitute "rapid progress" in strategic reasoning, but it will be rapid progress toward more quantitative analyses, not rapid progress within a quantitative framework that we have already built. Of course, some of the work on strategic sub-problems is already at the quantitative/formal stage, so quantitative/formal progress can be made on them immediately if SI/FHI can raise the resources to find and hire the right people to work on them. Two examples: (1) What do reasonable economic models of past jumps in optimization power imply about what would happen once we get self-improving AGI? (2) If we add lots more AI-related performance curve data to Nagy's Performance Curve Database and use his improved tech forecasting methods, what does it all imply about AI and WBE timelines?
3Wei Dai12y
There are many strategic considerations that greatly differ in nature from one another. It seems to me that at best they will require diverse novel methods to analyze quantitatively, and at worst a large fraction may resist attempts at quantitative analysis until the Singularity occurs. For example we can see that there is an upper bound on how confident a small FAI team, working in secret and with limited time, can be (assuming it's rational) about the correctness of an FAI design, due to the issue raised in my comment quoted by Holden, and this is of obvious strategic importance. But I have no idea what method we can use to derive this bound, other than to "make it up". Solving this problem alone could easily take a team several years to accomplish, so how do you hope to produce the strategic recommendations, which must take into account many such issues, in 2 years?
Two answers: 1. Obviously, our recommendations won't be final, and we'll try to avoid being overconfident — especially where the recommendations depend on highly uncertain variables. 2. In many (most?) cases, I suspect our recommendations will be for policies that play a dual role of (1) making progress in directions that look promising from where we stand now, and also (2) purchasing highly valuable information, like how feasible an NGO FAI team is, how hard FAI really is, what the failure modes look like, how plausible alternative approaches are, etc. SI, FHI, you, others — we're working on tough problems with many unknown and uncertain strategic variables. Those challenges are not unique to AI risk. Humans have many tools for doing the best they can while running on spaghetti code and facing decision problems under uncertainty, and we're gaining new tools all the time. I don't mean to minimize your concerns, though. Right now I expect to fail. I expect us all to get paperclipped (or turned off), though I'll be happy to update in favor of positive outcomes if (1) research shows the problem isn't as hard as I now think, (2) financial support for x-risk reduction increases, (3) etc.
2Wei Dai12y
I think you may have misunderstood my intent here. I'm not trying to make you more pessimistic about our overall prospects but arguing (i.e., trying to figure out) the absolute and relative importance of solving various strategic problems. Another point was to suggest that perhaps SI ought to give higher priority to recruiting/training "hero strategists" as opposed to "hero mathematicians". For example your So You Want to Save the World says: which fails to credit the importance of strategic contributions (even though later in the post there is a large section on strategic problems).
Sorry if that was unclear; I mean to identify the strategists as "philosophers", like this. As you say, I went on to include a large section on strategy. I certainly agree on the importance of strategy. Most of the research SI and FHI have done is strategic, after all — and most of the work in progress is strategic, too. I do tend to talk a lot about "hero mathematicians," though. Maybe that's because "hero mathematician" is more concrete (to me) than "hero strategist." Anyway, it seems like we may be failing to disagree on anything, here.
0Wei Dai12y
I see. I had interpreted you to mean philosophers as part of a team to build FAI. What do you mean by "more concrete", and do you think it's a good reason to talk a lot more about "hero mathematicians"?
That could also be true, but I'm not sure. Re: "hero mathematicians" and "hero strategists", here's a more detailed version of what I currently think. Result of saying we need "hero mathematicians"? A few mathematicians (perhaps primed by HPMoR to be rationality heroes) come to us and learn what the technical research program looks like, help put our memes into the math community, etc. Result of saying we need "hero strategists"? I'm inundated with people who say they can contribute to singularity strategy after thinking about the issues for one month and reading less than 100 pages on the subject. SI staff wastes valuable time trying to steer amateur strategists along more valuable paths before giving up due to low ROI. Basically, the recruiting problem is different for mathematicians and strategists, and I think these problems can be tackled more effectively by tackling them separately. Mathematicians can prove themselves useful rather quickly, by offering constructive comments on the problems we will (in the next 12 months) have written up somewhat formally, or by spreading our memes in their research communities. But to tell whether someone can be a useful strategist they need to read 500 pages of material and spend months chatting regularly with SI and/or FHI, and that's very costly for both them and for SI+FHI. The best result might be if some of the mathematicians themselves turn out to be good strategists. I don't know that I can count on that, but for example I already count both you and Paul Christiano as among the few strategists whose strategy work I would spend my time reading, even though your primary life work has been in math and compsci (and not, say, civil engineering, business management, political science, or economics).
2Wei Dai12y
You could direct them to LW and let them prove their mettle here?
I just tried to picture what "hero strategist" could mean, if distinct from 'person who knows LW rationality' or 'practical guy like Luke'. I came up with someone who could hire the world's best mathematicians plus a professional cat-herder and base the strategy on the result.
So, you're currently thinking hard about the best way to approach someone like Terence Tao? (Doesn't have to be him, someone else's blog might also have comments and give you a better opportunity to raise the issue.)
Actually, yes. We had a meeting about that a couple weeks ago. Tao was specifically named. :)
I love working on problems like these. If the specifics of the "circles" are written down anywhere, or if you care to describe them, I'd happily give it a whack. I won't claim to be an expert, but I enjoy complex problem solving tasks like this one too much not to offer.
1Wei Dai12y
My understanding is that there are a lot of writings produced internally by SingInst that are not available to the public. If you think you are up to the task of organizing and polishing them into publishable form (which I guess probably also requires filling in lots of missing pieces) you should contact them and volunteer. (But I guess they probably don't want to hand their unfinished work to just anyone so you'll have to prove yourself somehow.) If you just want to get an idea of the issues involved, here are some of my writings on the topic: * * * And see also the unfinished AI risk sequence by lukeprog.
From time to time people ask lukeprog about SI writings. He talks about the AI papers they are working on, and at some point, for the sake of security, he stops and says "it's confidential" or something similar. Evaluating who can do good work is important, 60% fail, besides the historical aversion to formatted papers. Note: I'm still waiting for EY's books.

I finally decided it's worth some of my time to try to gain a deeper understanding of decision theory...

Question: Can Bayesians transform decisions under ignorance into decisions under risk by assuming the decision maker can at least assign probabilities to outcomes using some kind of ignorance prior(s)?

Details: "Decision under uncertainty" is used to mean various things, so for clarity's sake I'll use "decision under ignorance" to refer to a decision for which the decision maker does not (perhaps "cannot") assign probabilities to some of the possible outcomes, and I'll use "decision under risk" to refer to a decision for which the decision maker does assign probabilities to all of the possible outcomes.

There is much debate over which decision procedure to use when facing a decision under ignorance when there is no act that dominates the others. Some proposals include: the leximin rule, the optimism-pessimism rule, the minimax regret rule, the info-gap rule, and the maxipok rule.
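To make the disagreement between these rules concrete, here is a minimal sketch of three of them applied to a single decision under ignorance. The payoff table and act names are invented for illustration, and `maximin` is the simple version of leximin (leximin additionally breaks ties by looking at the second-worst outcome, and so on). The info-gap and maxipok rules are not sketched here.

```python
# Hypothetical payoff table: each act maps to its payoff in states s1, s2, s3.
# No probabilities are assigned to the states (decision under ignorance).
payoffs = {
    "act_a": [1, 14, 13],
    "act_b": [-1, 17, 11],
    "act_c": [0, 20, 6],
}

def maximin(table):
    """Pick the act whose worst-case payoff is highest."""
    return max(table, key=lambda act: min(table[act]))

def optimism_pessimism(table, alpha):
    """Weight each act's best payoff by alpha and its worst payoff by 1 - alpha."""
    def score(act):
        return alpha * max(table[act]) + (1 - alpha) * min(table[act])
    return max(table, key=score)

def minimax_regret(table):
    """Pick the act whose largest regret (shortfall from the best act in a state) is smallest."""
    n_states = len(next(iter(table.values())))
    best_in_state = [max(row[s] for row in table.values()) for s in range(n_states)]
    def worst_regret(act):
        return max(best_in_state[s] - table[act][s] for s in range(n_states))
    return min(table, key=worst_regret)

print(maximin(payoffs))                  # act_a
print(minimax_regret(payoffs))           # act_b
print(optimism_pessimism(payoffs, 1.0))  # act_c
```

On this made-up table no act dominates another, and the three rules each recommend a different act, which is the kind of situation the debate is about.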

However, there is broad agreement that when facing a decision under risk, rational agents maximize expected utility. Because we have a clearer procedure for dealing w...

You could always choose to manage ignorance by choosing a prior. It's not obvious whether you should. But as it turns out, we have results like the complete class theorem, which imply that EU maximization with respect to an appropriate prior is the only "Pareto efficient" decision procedure (any other decision can be changed so as to achieve a higher reward in every possible world).

This analysis breaks down in the presence of computational limitations; in that case it's not clear that a "rational" agent should have even an implicit representation of a distribution over possible worlds (such a distribution may be prohibitively expensive to reason about, much less integrate exactly over), so maybe a rational agent should invoke some decision rule other than EU maximization.

The situation is sort of analogous to defining a social welfare function. One approach is to take a VNM utility function for each individual and then maximize total utility. At face value it's not obvious if this is the right thing to do--choosing an exchange rate between person A's preferences and person B's preferences feels pretty arbitrary and potentially destructive (just like choosing prior odds between possible world A and possible world B). But as it turns out, if you do anything else then you could have been better off by picking some particular exchange rate and using it consistently (again, modulo practical limitations).

I found several books which give technical coverage of statistical decision theory, complete classes, and admissibility rules (Berger 1985; Robert 2001; Jaynes 2003; Liese & Miescke 2010), but I didn't find any clear explanation of exactly how the complete class theorem implies that "EU maximization with respect to an appropriate prior is the only 'Pareto efficient' decision procedure (any other decision can be changed so as to achieve a higher reward in every possible world)." Do you know any source which does so, or are you able to explain it? This seems like a potentially significant argument for EUM that runs independently of the standard axiomatic approaches, which have suffered many persuasive attacks.
The formalism of the complete class theorem applies to arbitrary decisions; the Bayes decision procedures correspond to EU maximization with respect to an appropriate choice of prior. An inadmissible decision procedure is not Pareto efficient, in the sense that a different decision procedure does better in all possible worlds (which feels analogous to making all possible people happier). Does that make sense? There is a bit of weasel room, in that the complete class theorem assumes that the data is generated by a probabilistic process in each possible world. This doesn't seem like an issue, because you just absorb the observation into the choice of possible world, but this points to a bigger problem: If you define "possible worlds" finely enough, such that e.g. each (world, observation) pair is a possible world, then the space of priors is very large (e.g., you could put all of your mass on one (world, observation) pair for each observation) and can be used to justify any decision. For example, if we are in the setting of AIXI, any decision procedure can trivially be described as EU maximization under an appropriate prior: if the decision procedure outputs f(X) on input X, it corresponds to EU maximization against a prior which has the universe end after N steps with probability 2^(-N), and when the universe ends after you see X, you receive an extra reward if your last output was f(X). So the conclusion of the theorem isn't so interesting, unless there are few possible worlds. When you argue for EUM, you normally want some stronger statement than saying that any decision procedure corresponds to some prior.
That was clear. Thanks!

What AlexMennen said. For a Bayesian there's no difference in principle between ignorance and risk.

One wrinkle is that even Bayesians shouldn't have prior probabilities for everything, because if you assign a prior probability to something that could indirectly depend on your decision, you might lose out.

A good example is the absent-minded driver problem. While driving home from work, you pass two identical-looking intersections. At the first one you're supposed to go straight, at the second one you're supposed to turn. If you do everything correctly, you get utility 4. If you goof and turn at the first intersection, you never arrive at the second one, and get utility 0. If you goof and go straight at the second, you get utility 1. Unfortunately, by the time you get to the second one, you forget whether you'd already been at the first, which means at both intersections you're uncertain about your location.

If you treat your uncertainty about location as a probability and choose the Bayesian-optimal action, you'll get demonstrably worse results than if you'd planned your actions in advance or used UDT. The reason, as pointed out by taw and pengvado, is that your probability of arriving at the second intersection depends on your decision to go straight or turn at the first one, so treating it as unchangeable leads to weird errors.
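As a rough illustration of the "planned in advance" side of this comparison, the sketch below computes the best advance plan using the payoffs given above. It assumes (this is not stated in the comment) that the driver may randomize, going straight with the same probability p at both intersections because they look identical. It does not reproduce the flawed location-probability calculation, only the benchmark that calculation falls short of.

```python
from fractions import Fraction

def expected_utility(p):
    """Expected utility of the advance plan 'go straight with probability p at each intersection'."""
    turn_at_first = (1 - p) * 0           # goof at the first intersection: utility 0
    straight_then_turn = p * (1 - p) * 4  # correct route: utility 4
    straight_twice = p * p * 1            # goof at the second intersection: utility 1
    return turn_at_first + straight_then_turn + straight_twice

# Search a grid of exact fractions; 4p - 3p^2 peaks at p = 2/3, which is on the grid.
grid = [Fraction(k, 300) for k in range(301)]
best_p = max(grid, key=expected_utility)

print(best_p, expected_utility(best_p))  # 2/3 4/3
print(expected_utility(Fraction(1)))     # 1  (always go straight)
print(expected_utility(Fraction(0)))     # 0  (always turn)
```

Under these assumptions the best plan is to go straight two times out of three, for an expected utility of 4/3, which beats either deterministic policy.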

"Unchangeable" is a bad word for this, as it might well be thought of as unchangeable, if you won't insist on knowing what it is. So a Bayesian may "have probabilities for everything", whatever that means, if it's understood that those probabilities are not logically transparent and some of the details about them won't necessarily be available when making any given decision. After you do make a decision that controls certain details of your prior, those details become more readily available for future decisions. In other words, the problem is not in assigning probabilities to too many things, but in assigning them arbitrarily and thus incorrectly. If the correct assignment of probability is such that the probability depends on your future decisions, you won't be able to know this probability, so if you've "assigned" it in such a way that you do know what it is, you must have assigned a wrong thing. Prior probability is not up for grabs etc.
The prior probability is unchangeable. It's just that you make your decision based on the posterior probability taking into account each decision. At least, that's what you do if you use EDT. I'm not entirely familiar with the other decision theories, but I'm pretty sure they all have prior probabilities for everything.

So if you're a Bayesian decision-maker, doesn't that mean that you only ever face decisions under risk, because at the very least you're assigning ignorance priors to the outcomes for which you're not sure how to assign probabilities?

Correct. A Bayesian always has a probability distribution over possible states of the world, and so cannot face a decision under ignorance as you define it. Coming up with good priors is hard, but to be a Bayesian, you need a prior.

Bayesian decisions cannot be made under an inability to assign a probability distribution to the outcomes. As mentioned, you can consider a Bayesian probability distribution of what the correct distributions will be; if you have no reason to say which state, if any, is more probable, then they have the same meta-distribution as each other: If you know that a coin is unfair, but have no information about which way it is biased, then you should divide the first bet evenly between heads and tails (assuming logarithmic payoffs). It might make sense to consider the probability distribution of the fairness of the coin as a graph: the X axis, from 0 to 1, being the chance of each flip coming up heads, and the Y axis being the odds that the coin has that particular property; because of our prior information, there is a removable discontinuity at x=1/2. Initially, the graph is flat, but after the first flip it changes: if it came up tails, the odds of a two-headed coin are now 0, the odds of a 99.99%-heads coin are infinitesimal, and the odds of a tail-weighted coin are significantly greater. Having no prior information on how weighted the coin is, you could assume that all weightings (except fair) are equally likely. After the second flip, however, you have information about what the bias of the coin was, but no information about whether the bias of the coin is time-variable, such that it is always heads on prime flips, and always tails on composite flips. If you consider it just as likely that the coin is rigged to a fixed sequence as that each flip is randomly determined, then you have a problem. No information can update some specific lacks of a prior probability.
This reminds me of a recent tangent on Kelly betting. Apparently it's claimed that the unusualness of this optimum betting strategy shows that you should treat risk and ignorance differently - but of course the difference between the two situations is entirely accounted for by two different conditional probability distributions. So you can sort of think of situations (that is, the probability distribution describing possible outcomes) as "risk-like" or "ignorance-like."
If you're talking about what I think you're talking about, then by "risk", you mean "frequentist probability distribution over outcomes", and by "ignorance", you mean "Bayesian probability distribution over what the correct frequentist probability distribution over outcomes is", which is not the way Luke was defining the terms.

This question may come off as a bit off topic: people often say cryonics is a scam. What is the evidence for that, and to the contrary? How should I gather it?

The thing is, cryonics is a priori awfully suspect. It appeals to one of our deepest motives (not dying), is very expensive, has unusual payment plans, and is just plain weird. So the prior of it being a scam designed to rip us off is quite high. On the other hand, reading about it here, I acquired a very strong intuition that it is not a scam, or at least that Alcor and CI are serious. The problem is, I don't have solid evidence I can tell others about.

Now, I doubt the scam argument is the main reason why people don't buy it. But I'd like to get that argument out of the way.

I think cryonics is more likely to be a mistake than a scam, but that might just be my general belief that incompetence is much more common than malice.
I think there is a very good chance some cryonics organizations are in fact scams.
Good. Is this just an intuition, or can you communicate more precise reasons? A list of red flags could be useful (whether they are present or not).

Alcor: Improperly trained personnel, unkempt and ill-equipped facilities.

[...] Saul Kent invited me over to his home in Woodcrest, California to view videotapes of two Alcor cases which troubled him – but he couldn’t quite put his finger on why this was so.[...] Patients were being stabilized at a nearby hospice, transported to Alcor (~20 min away) and then CPS was discontinued, the patients were placed on the OR table and, without any ice on their heads, they were allowed to sit there at temperatures a little below normal body temperature for 1 to 1.5 hours, while burr holes were drilled, [...] smoke could be seen coming from the burr wound! Since the patient had no circulation to provide blood to carry away the enormous heat generated by the action of the burr on the bone, the temperature of the underlying bone (and brain) must have been high enough to literally cook an egg. In one case, a patient’s head was removed in the field and, because they had failed to use a rectal plug, the patient had defecated in the PIB. The result was that feces had contaminated the neck wound, and Alcor personnel were seen pouring saline over the stump of the neck whilst holding the patient’s seve

...
Okay, looks like I have to lower my probability that « Alcor and CI are serious ». Now this is from over a year ago. Maybe there's some sign things have changed since? I guess not, unless they acquired some Lukeprog-like leadership. I'll read the whole thing to try and determine to what extent this is incompetence, and to what extent this is scammy (for instance, dust and dirt look like incompetence, but the hardened doors with plywood roof look a bit more suspect).
It might be difficult to tell incompetence apart from malice. Moreover, it is possible to transition from one to the other: Let's say you start a cryonics organization with all good intentions, then you start running into problems: costs are higher than expected, mishaps occur during the cryopreservation process, evidence that your process is flawed starts to accumulate and you have no idea how to fix it, etc. So what do you do? Apologize for the bad service you sold, thaw and bury the frozen corpses (since you know they are already damaged beyond repair), disband the organization and find a new job, risking legal action? That's what a perfectly honest person would do. But if you are not perfectly honest, you might find yourself hiding or downplaying technical issues, cutting the costs at the expense of service quality, using deceitful marketing strategies, and so on. Maybe you could rationalize that the continued existence of your organization is so important that it should be preserved even at the cost of deceiving some people, maybe you could even deceive yourself into ignoring your essentially fraudulent behavior and maintain a positive self-image (if you were attracted to cryonics in the first place, chances are high that you are prone to wishful thinking). But, whatever your intentions are, at this point your business has become a de facto scam.
That's a mighty low bar to clear. Thank goodness CI and Alcor have standards.
Well, I have this theory that CI stores its neuropatients in the dewar with the dead cats in it.
In seriousness, it just floors me the degree to which every player worth speaking of in the field of cryonics seems to be managed (and micromanaged, at that) by Bad Decision Dinosaur. The concept of suspended animation is not inherently crackpot material; the idea that clinical death and information-theoretic death are different things (with implications for comparative medical treatment in different eras) is actually kind of profound -- yet the history of cryonics is a sordid tale full of expensive boondoggles, fraud, ethical nightmares and positively macabre events. And that's the stuff cryonicists will admit to! Look at that Alcor case: the only way I can avoid shuddering is by imagining it set to Yakety Sax.
To the best of my knowledge, doctors don't experiment on patients without their consent, drill burr holes without circulation, or generally just do anything they want without fear of prosecution (Since cryonics is considered a form of interment, whether the person was completely turned into a glass sculpture or straight-frozen like so many people were does not affect the organizations). Doctors may forget rectal plugs or leave patients if funds are unavailable, though. What do you define as 'very recently'?
Sure, if you leave out the much longer history and ignore that it was substantially leavened with good faith efforts to restore health, arrest decline and reduce suffering, a substantial number of which also succeed. (As for "until very recently" -- flagrant abuse still happens in medicine, that's not a thing that recently stopped happening.) What I'm saying is that this simply means medicine isn't special as an endeavor... whereas cryonics seems to have little to show for it other than that some bodies are, in fact, vitrified or just garden-variety frozen, depending, many of them even standing a good chance of being reasonably intact after going through the handling process. There's such a vast asymmetry between the two fields; if they were really that comparable, most doctors would be this guy.
Things people are willing to pay lots of money for are a strong signal to unscrupulous people. Examples abound of people doing scams as investment advice, counterfeiting art, or selling knock-off designer jewelry. Cryonics is something where you pay a lot of money for a service many years down the line. Someone could easily take in cryonics payments for years without ever having to perform a cryopreservation, and only have it become known after they've disappeared with the profits. Alternately, the impossibility of checking results means that a cryonics provider can profit off of shoddy service and equipment, and you might never realize. On these lines, any organization that is unwilling to let you inspect their preservation equipment etc. is suspect in my eyes. Cryonics organizations are also susceptible to drift in motives of their owners. Maybe the creators 10 years ago were serious about cryonics, but if the current CEO or board of directors cares more about optimizing cheap equipment and profits, then that group might become a de facto scam.

If I understand correctly, I can extract those flags, in descending order of redness:

  • Their cryopreservation facility does not exist (yet).
  • Their cryopreservation facility is not open to scrutiny.
  • Governance shows signs of "for profit" behaviour, or fails to demonstrate "non profit" behaviour.
  • Governance merely changed, while you trusted the previous one.

That also suggests signs of trustworthiness:

  • Their cryopreservation facility exists and is open to scrutiny.
  • This is a non profit with open and clean accounts.
  • They are researching or implementing technical improvements.

I'd like to have more such green and red flags, but this is starting to look actionable. Thank you.

One strong signal that I think some cryonics orgs implement is preferentially hiring people who have family members in storage.
Or pets.
In the longer run, the governance of a cryo organization should be designed to try and prevent drift. I like how Alcor requires board members to be signed up as well as to have relatives or significant others signed up, but this still doesn't work against someone who's actually unscrupulous.

Question: Why don't people talk about Ems / Uploads as just as disastrous as uncontrolled AGI? Has there been work done or discussion about the friendliness of Ems / Uploads?

Details: Robin Hanson seems to describe the Em age like a new industrial revolution. Eliezer seems to, well, he seems wary of them but doesn't seem to treat them like an existential threat. Though Nick Bostrom sees them as an existential threat. A lot of people on Lesswrong seem to talk of it as the next great journey for humanity, and not just a different name for uFAI. For my pa...

6Wei Dai12y
* * Why can't the first upload FOOM, but in a nice way? Some people suggest uploads only as a stepping stone to FAI. But if you read Carl's paper (linked above) there are also ideas for how to create stable superorganisms out of uploads that can potentially solve your regulation problem.
Thank you for the links, they were exactly what I was looking for. As for friendly upload FOOMs, I consider the chance of them happening at random about equivalent to FIA happening at random.
6Wei Dai12y
(I guess "FIA" is a typo for "FAI"?) Why talk about "at random" if we are considering which technology to pursue as the best way to achieve a positive Singularity? From what I can tell, the dangers involved in an upload-based FOOM are limited and foreseeable, and we at least have ideas to solve all of them: 1. unfriendly values in scanned subject (pick the subject carefully) 2. inaccurate scanning/modeling (do a lot of testing before running upload at human/superhuman speeds) 3. value change as a function of subjective time (periodic reset) 4. value change due to competitive evolution (take over the world and form a singleton) 5. value change due to self-modification (after forming a singleton, research self-modification and other potentially dangerous technologies such as FAI thoroughly before attempting to apply them) Whereas FAI could fail in a dangerous way as a result of incorrectly solving one of many philosophical and technical problems (a large portion of which we are still thoroughly confused about) or due to some seemingly innocuous but erroneous design assumption whose danger is hard to foresee.
Wei, do you assume uploading capability would stay local for long stretches of subjective time? If yes, why? (WBE seems to require large-scale technological development, which I'd expect to be fueled by many institutions buying the tech and thus fueling progress -- compare genome sequencing -- so I'd expect multiple places to have the same currently-most-advanced systems at any point in time, or at least being close to the bleeding edge.) If no, why expect the uploads that go FOOM first to be ones that work hard to improve chances of friendliness, rather than primarily working hard to be the first to FOOM?
0Wei Dai12y
No, but there are ways for this to happen that seem more plausible to me than what's needed for FAI to be successful, such as a Manhattan-style project by a major government that recognizes the benefits of obtaining a large lead in uploading technology.
Ok, thanks for clarifying!
This one is a little silly. Humans get hijacked by meme-viruses as well, all the time; it does cause problems, but mostly other humans manage to keep them in line. But as for the rest, yes, I agree with you that an upload scenario would have huge risks as well. Not to mention the fact that there might be a considerable pressure towards uploads merging together and ceasing to be individuals in any meaningful sense of the term. Humanity's future seems pretty hopeless to me.
Human uploads have been discussed as dangerous. But a friendly AI is viewed as an easier goal than a friendly upload, because an AI can be designed.
Now, I have to admit I'm not too familiar with the local discourse re:uploading, but if a functional upload requires emulation down to individual ion channels (PSICS-level) and the chemical environment, I find it hard to believe we'll have the computer power to do that, a million times faster, and in a volume of space small enough that we don't have to put it under a constant waterfall of liquid Helium. I don't expect femtotechnology or rod logic any time soon, the former may not even be possible at all and the latter is based on some dubious math from Nanosystems; so where does that leave us in terms of computing power? (Assuming, of course, that Clarke's law is a wish-fulfilling fantasy). I understand the reach of Bremermann's Limit, but it may not be possible to reach it, or there may be areas in between zero and the Limit that are unreachable for lack of a physical substrate for them.
Ems have human-like psychology, with additions. I presume they can't scale up as well as AIs, even in cases of coalescence. The possible dangers are the same ones we face now, but with more structural changes, for example if they have artificial agents in their realm, some cheap nanotech, etc. Conflicts have costs too.

Person A and B hold a belief about proposition X.

Person A has purposively sought out, and updated on, evidence related to X since childhood.

Person B has sat on her couch and played video games.

Yet both A and B have arrived at the same degree-of-belief in proposition X.

Does the Bayesian framework equip its adherents with an adequate account of how Person A should be more confident in her conclusion than Person B?

The only viable answer I can think of is that every reasoner should multiply every conclusion with some measure of epistemic confidence, and re-normalize. But I have not yet encountered such a pervasive account of confidence-measurement from leading Bayesian theorists.

If X is just a binary proposition that can be true or false once and for all, and A and B have arrived at the same degree-of-belief, they are equally confident. A has updated on evidence related to X since childhood, and found that it's perfectly balanced in either direction. The only way A can be said to be "more confident" than B is that A has seen a lot of evidence already, so she won't update her conclusion upon seeing the same evidence again; on the other hand, all evidence is new to B. Things get more interesting if X is some sort of random variable. Let's say we have a bag of black and white marbles. A has seen people draw from the bag 100 times, and 50 of them ended up with white marbles. B only knows the general idea. Now, both of them expect a white marble to come up with 50% probability. But actually, they each have a probability distribution on the fraction of white marbles in the bag. The mean is 1/2 for both of them, but the distribution is flat for B, and has a sharp peak at 1/2 for A. This is what determines how confident they are. If C comes along and says "well, I drew a white marble", then B will update to a new distribution, with mean 2/3, but A's distribution will barely shift at all.
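The marble example can be checked with a few lines of arithmetic. This is only a sketch under stated assumptions: "flat" is read as a uniform Beta(1, 1) prior over the fraction of white marbles, A is assumed to have started from that same prior before seeing the 100 draws, and the helper name `posterior_mean` is made up for illustration.

```python
from fractions import Fraction


def posterior_mean(white, black, prior_white=1, prior_black=1):
    """Mean of the Beta posterior over the fraction of white marbles.

    Assumes a Beta(prior_white, prior_black) prior; the default
    Beta(1, 1) is the flat prior from the comment.
    """
    a = prior_white + white
    b = prior_black + black
    return Fraction(a, a + b)


# Before C speaks: B has seen nothing, A has seen 50 white out of 100.
b_before = posterior_mean(0, 0)
a_before = posterior_mean(50, 50)

# After C reports one more white marble.
b_after = posterior_mean(1, 0)
a_after = posterior_mean(51, 50)

print("B:", b_before, "->", b_after)
print("A:", a_before, "->", a_after)
```

Both start at a mean of 1/2, but one extra white marble moves B to 2/3 while A only moves to 52/103, which is the sense in which A's estimate is more stable.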
The example of stochastic evidence is indeed interesting. But I find myself stuck on the first example. If a new reasoner C were to update Pc(X) based on the testimony of A, and had an extremely high degree of confidence in her ability to generate correct opinions, he would presumably strongly gravitate towards Pa(X). Alternatively, suppose C is going to update Pc(X) based on the testimony of B. Further, C has evidence outlining B's apathetic proclivities. Therefore, he would presumably only weakly gravitate towards Pb(X). The above account may be shown to be confused. But if it is not, why can C update based on evidence of informed belief, but A and B are precluded from similarly reflecting on their own testimony? Or, if such introspective activity is not non-normative, should they not strive to perform such an activity consistently?
They essentially have already updated on their own testimony.
Okay. I'm assuming everyone has the same prior. I'm going to start by comparing the case where C talks to A and learns everything A knows, to the case where C talks to B and learns everything B knows; that is, when C ends up conditioning on all the same things. If you already see why those two cases are very different, you can skip down to the second section, where I talk about what this implies about how C updates when just hearing that A knows a lot and what Pa(X) is, compared to how he updates when learning what B thinks. It's the same scenario as you described: knowledgeable A, ignorant B, Pa(X) = Pb(X). What happens when C learns everything B knows depends on what evidence C already has. If C knows nothing, then after talking to B, Pc(X) = Pb(X), because he'll be conditioning on exactly the same things. In other words, if C knows nothing, then C is even more ignorant than B is. When he talks to B, he becomes exactly as ignorant as B is, and assigns the probability that you have in that state of ignorance. It's only if C already has some evidence that talking to A and talking to B become different. As Kindly said, Pa(X) is very stable. So once C learns everything that A knows, C ends up with the probability Pa(X|whatever C knew), which is probably a lot like Pa(X). To take an extreme case, if A is well-informed enough, then she already knows everything C knows, and Pa(X|whatever C knew) is equal to Pa(X), and C comes out with exactly the same probability as A. But if C's info is new to A, then it's probably a lot like telling your biochemistry professor about a study that you read weighing in on one side of a debate: she's seen plenty of evidence for both sides, and unless this new study is particularly conclusive, it's not going to change her mind a whole lot. However, B's probability is not stable. That biochemistry study might change B's mind a lot, because for all she knows, there isn't even a debate, and she has this pretty good evidence for one side of

When discussing the repugnant conclusion, Eliezer commented:

I have advocated that "lives barely worth living" always be replaced with "lives barely worth celebrating" in every discussion of the 'Repugnant' Conclusion, to avoid equilibrating between "lives almost but not quite horrible enough to imply that a pre-existing person should commit suicide despite their intrinsic desire to live" versus "lives which we celebrate as good news upon learning about them, and hope to hear more such news in the future, but only to a

Nick Bostrom in Infinite Ethics terms this "the causal approach" to the problem of infinities, and comments: ...though this might not be relevant to Eliezer's actual reasons to reject total utilitarianism, because infinite ethics a la Bostrom would make average utilitarianism just as infeasible:
The domain of a utility function is possible states of the world. The whole world, not just the parts you can physically affect. Some utility functions (such as total utilitarianism) can be factored into an integral over spacetime (and over other stuff for Tegmark IV) of some locally-supported function, and some can't. If you have a non-factorable utility function, then even if the world is partitioned into non-interacting pieces x and y and you're in x, the value of y still affects ∂U/∂x, and is thus relevant to decisions.

Do you know of any game (video or board game, singleplayer or multiplayer, for adults or kids, I'm interested in all) that makes good use of rationality skills and trains them?

For example, we could imagine a "Trivial Pursuit" game in which you give your answer and how confident you are in it. If you're confident in it, you earn more if you're right, but you lose more if you're wrong.
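One way such a confidence-weighted game could be scored is with a logarithmic scoring rule. This is a sketch only: the comment does not name a scoring rule, so the choice of a log rule, the zero point at 50% confidence, and the function names `log_score` and `expected_score` are all assumptions made for illustration.

```python
import math


def log_score(confidence, correct):
    """Points for an answer given with `confidence` strictly between 0 and 1.

    Hypothetical rule: log2 of the probability assigned to what actually
    happened, shifted so that a 50% guess scores exactly zero.
    """
    p = confidence if correct else 1.0 - confidence
    return math.log2(p) + 1.0


def expected_score(stated, true_accuracy):
    """Average points per question when you state `stated` confidence
    but are in fact right `true_accuracy` of the time."""
    return (true_accuracy * log_score(stated, True)
            + (1.0 - true_accuracy) * log_score(stated, False))


for stated in (0.6, 0.9):
    print(stated, log_score(stated, True), log_score(stated, False))

# A player who is right 70% of the time does best by saying 70%.
print(expected_score(0.7, 0.7), expected_score(0.9, 0.7))
```

Higher stated confidence earns more when right and loses more when wrong, and under this rule overstating your confidence lowers your expected score, so the game rewards calibration and not bravado.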

Role-playing games do teach quite a bit about probabilities; they help you "feel" what a 1% chance is, or what it means to have a higher expectation but a higher deviation. Card games like poker probably do too, though I have never played much poker.


The board game "Wits and Wagers" might qualify for what you are looking for. Game play is roughly as follows: A trivia question is asked and the answer is always a number (e.g., "How many cups of coffee does the average American drink each year?", "How wide, in feet, is an American football field?"). All the players write their estimate on a slip of paper and then they are arranged in numerical order on the board. Everybody then places a bet on the estimate they like the best (it doesn't have to be your own). The estimates near the middle have a low payback (1:1, 2:1) and the estimates near the outside have a larger payback (4:1). If your estimate is closest to the actual number, or if you bet on that one, you will get a payback on your bet.

I'll second Wits and Wagers. Great for learning how to calibrate yourself.
Zendo, Nomic, Eleusis, Master Mind ... and yeah, probably Poker for probability. Petals around the rose: I asked a similarish question here: - for example, these games for simple stats.
Settlers of Catan isn't a rationality game, but it's great for teaching economics. I play Catan with my little brothers and it has helped them understand concepts like comparative advantage, supply and demand, cartels, opportunity cost, time value of money, and derivatives markets. Just make sure you play with everyone showing what resource cards they have, instead of keeping their resource cards hidden. More interesting trades that way.

In the discussion about AI-based vs. upload-based singularities, and the expected utility of pushing for WBE (whole-brain emulation) first, has it been taken into account that an unfriendly AI is unlikely to do something worse than wiping out humanity, while the same isn't necessarily true in an upload-based singularity? I haven't been able to find discussion of this point yet (unless you think that Robin's Hardscrabble Frontier scenario would be significantly worse than nonexistence, which it doesn't feel like, to me).

[ETA: To be clear, I'm not trying to...

Wei Dai (12y, 5 points)
"Yes" in the sense that people are aware of the argument, which goes back at least as far as Vernor Vinge, 1993, but "no" in the sense that there are also arguments that it may not be highly unlikely that a failed attempt at FAI will be worse than extinction (especially since some of the FAI proposals, such as Paul Christiano's, are actually very closely related to uploading), and also "no" in the sense that we don't know how to take into account considerations like this one except by using our intuitive judgments which seem extremely unreliable.
The non-negligible chance of waking up to a personal hell-world (including, partial+failed revivification) is the main non-akratic reason I'm not signed up for cryonics. I currently think AGI is coming sooner than WBE, but if WBE starts pulling ahead then I would be even more disinclined to sign up for cryonics. Wei, do you know of any arguments better than XiXiDu's that a failed attempt at FAI could very well be worse than extinction?
Wei Dai (12y, 3 points)
I'm not aware of an especially good writeup, but here's a general argument. Any attempt to build an AGI induces a distribution of possible outcomes, and specifically the distribution induced by an attempt at FAI can be thought of as a circle of uncertainty around an FAI in design space. AGIs that cause worse-than-extinction outcomes are clustered around FAIs in design space. So an attempt at FAI may be more likely to hit one of these worse-than-extinction AGIs than an attempt to build an AGI without consideration of Friendliness.
Yes, that's the part I'd like to see developed more. Maybe SI or FHI will get around to it eventually, but in the meantime I wouldn't mind somebody like Wei Dai taking a crack at it.
Part of the problem in developing the argument is that you need a detailed concept of what a successful FAI design would look like, in order to then consider what similar-but-failed designs are like. One approach is to think in terms of the utility function or goal system. Suppose that a true FAI has a utility function combining some long list of elemental values with a scheme for rating their importance. Variations away from this miss an essential value, add a false value, and/or get the recipe for combining elementary values wrong. Another way to fail is to have the values right in principle but then to apply them wrongly in practice. My favorite example was, what if the AI thinks that some class of programs is conscious, when actually they aren't. It might facilitate the creation of an upload civilization which is only a simulation of utopia and not actually a utopia. It might incorrectly attach moral significance to the nonexistent qualia of programs which aren't conscious but which fake it. (Though neither of these is really "worse than extinction". The first one, taken to its extreme, just is extinction, while the worst I can see coming from the second scenario is a type of "repugnant conclusion" where the conscious beings are made to endure privation for the sake of vast sim-populations that aren't even conscious.) Still another way to conceptualize "successful FAI design", in order to then think about unsuccessful variations, is to think of the FAI as a developmental trajectory. The FAI is characterized by a set of initial conditions, such as a set of specific answers to the questions: how does it select its utility function, how does it self-modify, how does it obtain appropriate stability of values under self-modification. And then you would consider what goes wrong down the line, if you get one or more of those answers wrong.
Wei Dai (12y, 1 point)
I'm not sure what more can be said about "AGIs that cause worse-than-extinction outcomes are clustered around FAIs in design space". It's obvious, isn't it? I guess I could write about some FAI approaches being more likely to cause worse-than-extinction outcomes than others. For example, FAIs that are closely related to uploading or try to automatically extract values from humans seem riskier in this regard than FAIs where the values are coded directly and manually. But this also seems obvious and I'm not sure what I can usefully say beyond a couple of sentences.
FWIW, that superhuman environment-optimizers (e.g. AGIs) that obtain their target values from humans using an automatic process (e.g., uploading or extraction) are more likely to cause worse-than-extinction outcomes than those using a manual process (e.g. coding) is not obvious to me.

WRT CEV: What happens if my CEV is different than yours? What's the plan for resolving differences between different folks' CEVs? Does the FAI put us all in our own private boxes where we each think we're getting our CEVs, take a majority vote, or what?

I've asked this several times before. As far as I can make out, no (published) text answers this question. (If I'm wrong I am very interested in learning about it.) The CEV doc assumes without any proof, not just that we (or a superintelligent FAI) will find a reconciling strategy for CEV, but that such a strategy exists to be found. It assumes that there is a unique such strategy that can be defined in some way that everyone could agree about. This seems to either invite a recursion (everyone does not agree about metaethics, CEV is needed to resolve this, but we don't agree about the CEV algorithm or inputs); or else to involve moral realism.
Individuals have Volitions and (hopefully) Extrapolatable Volitions. If many people have EVs that 'agree', they interfere constructively (like waves), and that becomes part of the group's Coherent Extrapolated Volition. If they 'disagree' on some issue, they interfere destructively, and CEV has nothing to say on the issue. (It'd be nice to be able to explain this by saying that individuals have EVs but not CEVs, except clearly they have a degenerate case of CEV if they have an EV.)
It needn't be as degenerate as all that, actually, depending on just how coherent the mechanisms generating an individual's volition(s) is/are.
Then the "coherent" qualifier does not apply, does it? Are you asking how to construct CEV from the multitude of PEVs (P for personal)? Presumably those folks whose PEV does not mind boxing will get boxed, and the rest will have to be reconciled into the CEV, if possible. Or maybe commensurate PEVs get boxed together into partial CEV worlds. The hard part is what to do with those whose PEV is incompatible with other people having different ideas from theirs. Eh, maybe not that hard. Trickery or termination is always an option when nothing better is available.

Is there a complete list of known / theoretical AI risks anywhere? I searched and couldn't find one.

I can see how the money pump argument demonstrates the irrationality of an agent with cyclic preferences. Is there a more general argument that demonstrates the irrationality of an agent with intransitive preferences of any kind (not merely one with cyclic preferences)?

I don't understand what you mean. Can you give me an example of preferences that are intransitive but not cyclic?

A little bit of googling turned up this paper by Gustafsson (2010) on the topic, which says that indifference allows for intransitive preferences that do not create a strict cycle. For instance, A>B, B>C, and C=A.

The obvious solution is to add epsilon to break the indifference. If A>B, then there exists e>0 such that A>B+e. And if e>0 and C=A, then C+e>A. So A>B+e, B+e>C+e, and C+e>A, which gives you a strict cycle that allows for money pumping. Gustafsson calls this the small-bonus approach.

Gustafsson suggests an alternative, using lotteries and applying the principle of dominance. Consider the 4 lotteries:

Lottery 1: heads you get A, tails you get B
Lottery 2: heads you get A, tails you get C
Lottery 3: heads you get B, tails you get A
Lottery 4: heads you get C, tails you get A

Lottery 1 > Lottery 2, because if it comes up tails you prefer Lottery 1 (B>C) and if it comes up heads you are indifferent (A=A).
Lottery 2 > Lottery 3, because if it comes up heads you prefer Lottery 2 (A>B) and if it comes up tails you are indifferent (C=A)
Lottery 3 > Lottery 4, because if it comes up heads you prefer Lottery 3 (B>C) and if it comes up tails you are indifferent (A=A)
Lottery 4 > Lottery 1, because if it comes up tails you prefer Lottery 4 (A>B) and if it comes up heads you are indifferent (C=A)
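The four dominance comparisons above can be checked mechanically. The sketch below encodes only the preferences stated in the comment (A>B, B>C, C=A) and the four lotteries; the function names and the exact definition of dominance used here (at least as good on every coin outcome, strictly better on at least one) are assumptions made for illustration, not Gustafsson's own formalism.

```python
# Preferences over outcomes, as given: A>B, B>C, and C=A.
STRICT = {("A", "B"), ("B", "C")}
INDIFFERENT = {frozenset(("A", "C"))}

# Each lottery maps to (outcome on heads, outcome on tails).
LOTTERIES = {
    1: ("A", "B"),
    2: ("A", "C"),
    3: ("B", "A"),
    4: ("C", "A"),
}


def prefers(x, y):
    """True if outcome x is strictly preferred to outcome y."""
    return (x, y) in STRICT


def indifferent(x, y):
    """True if the agent is indifferent between x and y."""
    return x == y or frozenset((x, y)) in INDIFFERENT


def dominates(first, second):
    """True if lottery `first` is at least as good as `second` on both
    coin outcomes and strictly better on at least one of them."""
    pairs = list(zip(LOTTERIES[first], LOTTERIES[second]))
    at_least_as_good = all(prefers(x, y) or indifferent(x, y) for x, y in pairs)
    strictly_better = any(prefers(x, y) for x, y in pairs)
    return at_least_as_good and strictly_better


cycle = [(1, 2), (2, 3), (3, 4), (4, 1)]
results = [dominates(a, b) for a, b in cycle]
print(results)
```

All four comparisons come out as dominance, so the agent strictly prefers Lottery 1 to 2, 2 to 3, 3 to 4, and 4 back to 1: a strict cycle over lotteries, even though the preferences over outcomes contain no strict cycle.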

This is the kind of thing I was looking for; thanks!
Just in case - synchronising the definitions. I usually consider something transitive if "X≥Y, Y≥Z then X≥Z" holds for all X,Y,Z. If this holds, preferences are transitive. Otherwise, there are some X,Y,Z: X≥Y, Y≥Z, Z>X. I would call that cyclical.

Don't know if this has been answered, or where to even look for it, but here goes.

Once FAI is achieved and we are into the Singularity, how would we stop this superintelligence from rewriting its "friendly" code to something else and becoming unfriendly?

We wouldn't. However, the FAI knows that if it changed its code to unFriendly code, then unFriendly things would happen. It's Friendly, so it doesn't want unFriendly things to happen, so it doesn't want to change its code in such a way as to cause those things - so a proper FAI is stably Friendly. Unfortunately, this works both ways: an AI that wants something else will want to keep wanting it, and will resist attempts to change what it wants.

There's more on this in Omohundro's paper "Basic AI Drives"; relevant keyword is "goal distortion". You can also check out various uses of the classic example of giving Gandhi a pill that would, if taken, make him want to murder people. (Hint: he does not take it, 'cause he doesn't want people to get murdered.)

Dragging up anthropic questions and quantum immortality: suppose I am Schrodinger's cat. I enter the box ten times (each time it has a .5 probability of killing me), and survive. If I started with a .5 belief in QI, my belief is now 1024/1025.

But if you are watching, your belief in QI should not change. (If QI is true, the only outcome I can observe is surviving, so P_me(I survive | QI) = 1. But someone else can observe my death even if QI is true, so P_you(I survive | QI) = 1/1024 = P_you(I survive | ~QI).)
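The two updates described above follow directly from Bayes' rule. The sketch below just reproduces the arithmetic using the likelihoods stated in the comment; the helper name `posterior` is an illustrative assumption, and the code takes no position on whether those likelihoods are the right ones to use.

```python
from fractions import Fraction


def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' rule for a binary hypothesis."""
    numerator = prior * likelihood_if_true
    return numerator / (numerator + (1 - prior) * likelihood_if_false)


prior_qi = Fraction(1, 2)
survive_by_luck = Fraction(1, 2) ** 10  # ten rounds at .5 each = 1/1024

# The cat: P_me(I survive | QI) = 1, P_me(I survive | ~QI) = 1/1024.
me = posterior(prior_qi, Fraction(1), survive_by_luck)

# The watcher: P_you(I survive | QI) = P_you(I survive | ~QI) = 1/1024.
you = posterior(prior_qi, survive_by_luck, survive_by_luck)

print("cat's belief in QI:", me)
print("watcher's belief in QI:", you)
```

The cat ends up at 1024/1025 while the watcher stays at 1/2, because for the watcher the likelihood ratio is exactly 1.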

Aumann's agreement theorem says that if we share ...

I don't think Aumann's agreement theorem is the problem here. What does it mean for QI to be true or false? What would you expect to happen differently? Certainly, whether or not QI is true, the only outcome you can observe is surviving, so I don't see how you're updating your belief.
If QI is true, I expect to observe myself surviving. If QI is false, I expect not to be able to observe anything. I don't know exactly what that means, but I don't feel like this confusion is the problem. I think that surviving thousand-to-one odds must be strong evidence that I am somehow immortal (if you disagree, we can make it 3^^^^3-to-one), and QI is the only form of immortality that I currently assign non-negligible probability to. I briefly thought that this made QI a somehow privileged hypothesis, because I can't observe the strongest evidence against it (my death). But I don't think that's the case, because there are other observations that would reduce my belief in QI. For example, if wavefunction collapse turns out to be a thing, I understand that would make QI much less likely. (But I don't actually know quantum mechanics beyond Eliezer's sequence, so the actual observations would be along the lines of "people who know QM saying that QI is incompatible with other observations that have been made, and appearing to know what they're talking about".)
If QI is true, you still don't observe anything in 1023/1024 of all worlds. Nothing makes the 1-in-1024 event happen in any case, you just happen to only wake up in the situation where you legitimately get to be surprised about it happening.
If QI is true then my probability of observing myself survive is 1. That's pretty much what QI is. It is true that most of my measure does not survive, but I don't think it's relevant in this case.
In 1023/1024 worlds your observer doesn't update on QI, and neither do you. In 1/1024 worlds, you update on QI and so does the version of the person you interact with. ;)
The person watching me gives 1/1024 chance of my survival, regardless of whether QI is true or false. So if I survive, he does not update his belief in QI. (That said, if I observed a 1/3^^^^3 probability, that might well increase my belief in MWI (I'm not sure if it should do, but it would be along the lines of "there's no way I would have observed that unless all possible outcomes were observed by some part of my total measure"). And I'm not sure how MWI could be true but QI false, so it would also increase my belief in QI. So maybe 1/1024 would do the same, but certainly not to anything like the same extent as personally surviving those odds.)

What is anthropic information? What is indexical information? Is there a difference?

The use of external computation (like a human using a computer to solve a math problem or an AI expanding its computational resources) is a special case of inferring information about mathematical statements from your observations about the universe.

What is the general algorithm for accomplishing this in terms of pure observations (no action, observation cycles)? How does the difficulty of the mathematical statements you can infer to be probably true relate to the amount of computation you have expended approximating Solomonoff induction?

In MWI, do different Everett worlds share the same spacetime?

As far as I understand, there is still no satisfactory theory that would include both quantum mechanics and general relativity (i.e. the possibility for spacetime not to be the same). I would expect that in a unified theory, spacetime structure would be a part of the state undergoing quantum superposition.
That's true. I was wondering what the standard claim that "MWI is just decoherence" has to say about the spacetime. Does it also decohere into multiple outcomes? If so, how? Does it require quantum gravity to understand? In this case "just decoherence" is not a valid claim.
Probably it does lose coherence. What specifically that means has to be shown in the future by a working theory that accepts GR and QM as its limit cases... Whether it will be any of the current research directions called quantum gravity or something else is hard to predict. I have no intuitions here, as I am between a mathematician and a programmer and have catastrophically not enough knowledge of physics to try to predict unknown areas of it.

I'm a bit late on this, obviously, but I've had a question that I've always felt was a bit too nonsensical (and no doubt addressed somewhere in the sequences that I haven't found) to bring up but it kinda bugs me.

Do we have any ideas/guesses/starting points about whether or not "self-awareness" is some kind of weird quirk of our biology and evolution or if it would be an inevitable consequence of any general AI?

I realize that's not a super clear definition- I guess I'm talking about that feeling of "existing is going on here" and you c...

The question makes sense, but the answers probably won't. Questions like this are usually approached in an upside-down way. People assume, as you are doing, that reality is "just neurons" or "just atoms" or "just information", then they imagine that what they are experiencing is somehow "just that", and then they try to live with that belief. They will even construct odd ways of speaking, in which elements of the supposed "objective reality" are substituted for subjective or mentalistic terms, in order to affirm the belief. You're noticing that "self-awareness" or "the feeling that something is happening" or "the feeling that I exist" doesn't feel like it's the same thing as "neurons"; though perhaps you will tell yourself - as Wittgenstein may have done - that you don't actually know what being a pack of neurons should feel like, so how do you know that it wouldn't feel exactly like this? But if you pay attention to the subjective component of your thought, even when you're thinking objectively or scientifically, you'll notice that the reduction actually goes in the other direction. You don't have any direct evidence of the "objective existence" of neurons or atoms or "information". The part of reality that you do know about is always some "experience" that is "happening", which may include thoughts about an objective world, that match up in some way with elements of the experience. In other words, you don't know that there are neurons or atoms, but you can know that you are having thoughts about these hypothetical objects. If you're really good at observing and analyzing your thoughts, you may even be able to say a lot about the conscious mental activity which goes into making the thought and applying it to experience. The fundamental problem is that the physical concept of reality is obtained by taking these conscious states and amputating the subjective part, leaving only the "object" end. Clearly there is a sense in which the conscious subject is itself an
Your labeling of physicalism as an "upside-down" approach reminded me of this quote from Schopenhauer, which you would no doubt approve of: I still think it is a confused philosophy, but it is a memorable and powerful passage.
Although Schopenhauer set himself against people like Hegel, his outlook still seems to have been a sort of theistic Berkeleyan idealism, in which everything that exists owes its existence to being the object of a universal consciousness; the difference between Hegel and Schopenhauer being, that Hegel calls this universal consciousness rational and good, whereas Schopenhauer calls it irrational and evil, a cosmic Will whose local manifestation in oneself should be annulled through the pursuit of indifference. But I'm much more like a materialist, in that I think of the world as consisting of external causal interactions between multiple entities, some of which have a mindlike interior, but which don't owe their existence to their being posited by an overarching cosmic mind. I say a large part of the problem is just that physicalism employs an insufficiently rich ontology. Its categories don't include the possibility of "entity with a mindlike interior". To put it another way: Among all the entities that the world contains, are entities which we can call subjects or persons or thinking beings, and these entities themselves "contain" "ideas of objects" and "experiences of objects". My problem with physicalism is not that it refuses to treat all actual objects as ideas, or otherwise embed all objects into subjects; it is just that it tries to do without the ontological knowledge obtained by self-reflection, which is the only way we know that there are such things as conscious beings, with their specific properties. Somehow, we possess the capacity to conceive of a self, as well as the capacity to conceive of objects independent of the self. Physicalism tries to understand everything using only this second capacity, and as such is methodologically blind to the true nature of anything to do with consciousness, which can only be approached through the first capacity. This bias produces a "mechanistic, materialistic" concept of the universe, and then we wonder where the
Have you read Zen and the Art of Motorcycle Maintenance? The climactic realization is gung vzzrqvngr rkcrevrapr vf havgnel, ohg gur zvaq dhvpxyl qvivqrf vg vagb jung vf vafvqr gur frys naq bhgfvqr gur frys.

The climactic realization is gung vzzrqvngr rkcrevrapr vf havgnel, ohg gur zvaq dhvpxyl qvivqrf vg vagb jung vf vafvqr gur frys naq bhgfvqr gur frys.

...That's the sound made by a poorly maintained motorcycle.

No, I haven't read it; thanks for the recommendation.

Question on posting norms: What is the community standard for opening a discussion thread about an issue discussed in the sequences? Are there strong norms regarding minimum / maximum length? Is formalism required, or frowned on, or just optional? Thanks

The general rule seems to be "if your post is interesting and well-written enough, it's fine"; hard to say anything more specific than that. No strong norms about length (even quite short posts have been heavily upvoted), and formalism is optional, but a post that's heavy on math or formal logic will probably get fewer readers.

Say you start from merely the axioms of probability. From those, how do you get to the hypothesis that "the existence of the world is probable"? I'm curious to look at it in more detail because I'm not sure if it's philosophically sound or not.

Has Eliezer written about what theory of meaning he prefers? (Or does anyone want to offer a guess?)

I've also been doing searches for topics related to the singularity and space travel (this thought came up after playing a bit of Mass Effect ^ _ ^). It would seem to me that biological restrictions on space travel wouldn't apply to a sufficiently advanced AI. This AI could colonize other worlds using near speed of light travel with minimal physical payload and harvest the raw materials on some new planet using algorithms programmed in small harvesting bots. If this is possible then it seems to me that unfriendly AI might not be that much of a threat since ...

Pretty much right. We would eventually like to inhabit the currently uninhabitable planets. Terraforming, self modification, sealed colonies, or some combination of those will eventually make this feasible. At that time, we would rather that those planets not fight back. Symmetrically, an unfriendly process will not be satisfied with taking merely Mercury, Venus, Mars, Jupiter, Saturn, Uranus, Neptune, and the rest of the universe; it will want to do its thing on Earth as well. The choice between "kill the humans and take over Earth" and "don't kill the humans and don't take over Earth" is independent of the existence of other territory, so it doesn't matter and it will kill us. (The short answer is that there is no "satisfied" or "enough" among nonhuman agents.) You mean the Fermi paradox? You'll have to expand, but note that a singularity will expand at lightspeed (=we wouldn't see it until it were here), and it will consume all resources (= if it had been here, we wouldn't).

Do people in these parts think that creating new people is a moral good? If so, is it because of utilitarian considerations; i.e. "overall utility is the sum of all people's utility; therefore we should create more people?" Conversely, if you are a utilitarian who sums the utilities of all people, why aren't you vigorously combating falling birth-rates in developed countries? Perhaps you are? Perhaps most people here are not utilitarians of this sort?

That topic is full of a lot of confusion and everyone seems to have different intuitions. I for one am not a proper utilitarian because it seems unnaturally simple (when we have no reason to suspect that human value should be simple). But an additional awesome person seems like a good thing, if it doesn't make the world suck more. I am (as much as I can) vigorously trying to make lots of money to fund positive-singularity research.

What's that quote, from an Ancient Greek I think, about how the first question in argument should be "What do you mean by that?" and the second question should be "And how do you know that?"

Sounds like Socrates.
I thought there was some pithy quote from him (well, Plato) about it, but I can't find the pithy quote version of the idea.
