On the subject of how an FAI team can avoid accidentally creating a UFAI, Carl Shulman wrote:

If we condition on having all other variables optimized, I'd expect a team to adopt very high standards of proof, and recognize limits to its own capabilities, biases, etc. One of the primary purposes of organizing a small FAI team is to create a team that can actually stop and abandon a line of research/design (Eliezer calls this "halt, melt, and catch fire") that cannot be shown to be safe (given limited human ability, incentives and bias).

In the history of philosophy, there have been many steps in the right direction, but virtually no significant problems have been fully solved, such that philosophers can agree that some proposed idea can be the last words on a given subject. An FAI design involves making many explicit or implicit philosophical assumptions, many of which may then become fixed forever as governing principles for a new reality. They'll end up being last words on their subjects, whether we like it or not. Given the history of philosophy and applying the outside view, how can an FAI team possibly reach "very high standards of proof" regarding the safety of a design? But if we can foresee that they can't, then what is the point of aiming for that predictable outcome now?

Until recently I haven't paid a lot of attention to the discussions here about inside view vs outside view, because the discussions have tended to focus on the applicability of these views to the problem of predicting intelligence explosion. It seemed obvious to me that outside views can't possibly rule out intelligence explosion scenarios, and even a small probability of a future intelligence explosion would justify a much higher than current level of investment in preparing for that possibility. But given that the inside vs outside view debate may also be relevant to the "FAI Endgame", I read up on Eliezer and Luke's most recent writings on the subject... and found them to be unobjectionable. Here's Eliezer:

On problems that are drawn from a barrel of causally similar problems, where human optimism runs rampant and unforeseen troubles are common, the Outside View beats the Inside View

Does anyone want to argue that Eliezer's criteria for using the outside view are wrong, or don't apply here?

And Luke:

One obvious solution is to use multiple reference classes, and weight them by how relevant you think they are to the phenomenon you're trying to predict.

[...]

Once you've combined a handful of models to arrive at a qualitative or quantitative judgment, you should still be able to "adjust" the judgment in some cases using an inside view.

These ideas seem harder to apply, so I'll ask for readers' help. What reference classes should we use here, in addition to past attempts to solve philosophical problems? What inside view adjustments could a future FAI team make, such that they might justifiably overcome (the most obvious-to-me) outside view's conclusion that they're very unlikely to be in possession of complete and fully correct solutions to a diverse range of philosophical problems?
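To make Luke's suggestion concrete, here is a minimal arithmetic sketch (with reference classes, relevance weights, and base rates invented purely for illustration, not estimates anyone has endorsed) of what "weight several reference classes by relevance, combine them, then adjust with an inside view" could look like:

```python
# Illustrative only: the reference classes, relevance weights, and base rates
# below are made up for the example, not drawn from any real data.
reference_classes = {
    # name: (relevance weight, base rate of "complete, correct solution found")
    "past attempts to solve philosophical problems": (0.6, 0.05),
    "formalizations later found incomplete (e.g. naive set theory)": (0.3, 0.15),
    "math/physics problems with verifiable answers": (0.1, 0.60),
}

def combined_outside_view(classes):
    """Relevance-weighted average of the reference-class base rates."""
    total_weight = sum(w for w, _ in classes.values())
    return sum(w * rate for w, rate in classes.values()) / total_weight

def inside_view_adjustment(prior, likelihood_ratio):
    """Adjust the combined estimate with an inside-view likelihood ratio (odds form)."""
    odds = prior / (1 - prior)
    posterior_odds = odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

prior = combined_outside_view(reference_classes)
# Suppose (hypothetically) the team judges its verification process to be
# three times likelier to pass a correct design than an incorrect one.
posterior = inside_view_adjustment(prior, likelihood_ratio=3.0)
print(f"outside view: {prior:.3f}, after inside-view adjustment: {posterior:.3f}")
```

The numbers are the whole question, of course; the sketch only makes explicit where the disagreement has to live: in the choice of reference classes, their weights, and the size of the inside-view adjustment.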

In the history of philosophy, there have been many steps in the right direction, but virtually no significant problems have been fully solved, such that philosophers can agree that some proposed idea can be the last words on a given subject.

This is a selection effect. Those problems that once were considered "philosophy", and that have been solved, have largely ceased to be considered fitting subjects for philosophizing. They are now regarded as the subject matter of sciences — or, in the cases where the solution was to explain away the problem, superstitions.

Here's David Chalmers addressing that claim:

the 2009 PhilPapers Survey surveyed around 1000 professional philosophers on answers to thirty important questions in philosophy, and typically found that answers to major questions were distributed something like 50-50 or 60-40 or 70-30, once agnostics and other intermediate options were removed. This suggests that at least where these questions are concerned, large collective convergence has not been achieved.

Now, you might say that these are the big questions of the moment and therefore are precisely those that are unanswered, so the result is no surprise. There is correspondingly little agreement on the current big questions of physics: the status of string theory, for example. To avoid this worry, it is important that the big questions be individuated not by current debate but by past importance.

To properly address this issue, we would need analogs of the PhilPapers survey in (for example) 1611, 1711, 1811, 1911, and 2011, asking members of the community of philosophers at each point first, what they take to be the big questions of philosophy, and second, what they take to be the answers to those questions (and also the answers to any big questions from past surveys). We would also need to have analogous longitudinal surveys in other fields: the MathPapers Survey, the PhysPapers survey, the ChemPapers Survey, the BioPapers Survey, and so on. And we would need a reasonable measure of convergence to agreement over time. I predict that if we had such surveys and measures, we would find much less convergence on answers to the big questions suggested by past surveys of philosophers than we would find for corresponding answers in other fields.
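Chalmers doesn't say what would count as "a reasonable measure of convergence to agreement over time". Here is one crude candidate, the modal-answer share, sketched in Python with invented survey numbers just to show the shape of the computation (an entropy-based measure would do as well):

```python
# Hypothetical answer distributions for one "big question" across imagined
# surveys; every number below is invented for illustration.
surveys = {
    1811: {"answer A": 0.40, "answer B": 0.35, "answer C": 0.25},
    1911: {"answer A": 0.50, "answer B": 0.30, "answer C": 0.20},
    2011: {"answer A": 0.55, "answer B": 0.30, "answer C": 0.15},
}

def agreement(distribution):
    """Crude convergence measure: the share held by the single most popular answer."""
    return max(distribution.values())

for year, dist in sorted(surveys.items()):
    print(year, f"modal-answer share: {agreement(dist):.2f}")
# Convergence would show up as this share trending toward 1.0 over successive
# surveys; Chalmers's prediction is that the trend for philosophy's big
# questions would be flatter than the corresponding trend in other fields.
```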

Did people in 1711 classify their work into "Math, Phys, Chem, Bio, and Phil"? What if ideas that we call Philosophy now are a subset of what someone in 1711 would be working on?

They wouldn't classify their work that way, and in fact I thought that was the whole point of surveying these other fields. Like, for example, a question for philosophers in the 1600s is now a question for biologists, and that's why we have to survey biologists to find out if it was resolved.

The way I see this, among the problems once considered philosophical, there are some subsets that turned out to be much easier than others, and which are no longer considered part of philosophy. These are generally problems where a proposed solution can be straightforwardly verified, for example by checking a mathematical proof, or through experimental testing.

Given that the philosophical problems involved in designing FAI do not seem to fall into these subsets, it doesn't obviously make sense to include "problems once considered philosophical" in the reference class for the purposes I described in the OP, but maybe I should give this some more thought. To be clear, are you actually making this suggestion?

It seems to me that we can't — in the general case — tell in advance which problems will turn out to be easier and which harder. If it had turned out that the brain wasn't the engine of reasoning, but merely a conduit for the soul, then cognitive science would be even harder than it actually is.

For the examples I can think of (mostly philosophy of mind), it seems to me that the sciences would have emerged whether or not any progress was made while it was still considered the domain of philosophy. Are there better examples, where the "philosophical" progress was actually important for the later "scientific" progress?

It's my impression that many scholars whom we now might regard as astronomers, mathematicians, or physicists — such as Galileo, Descartes, or Newton — thought of their own work as being in the tradition of philosophy, and were thought of as philosophers by their contemporaries.

For instance: Galileo expounded his astronomy (or Copernicus's) in a work with the style of Socratic dialogues. Descartes' Geometry was an appendix to his philosophical Discourse on Method. The social role of "scientist" didn't exist until much later.

Do you agree or disagree or remain neutral on my arguments for why we should expect that 'hard' philosophical problems aren't really hard in the way that protein structure prediction is hard? In other words, we should expect the real actual answers to be things that fit on a page or less, once all confusion is dispelled? It seems to me that this is a key branch point where we might disagree.

What does the length of the answer have to do with how hard a problem is? The answer to P=NP can fit in 1 bit, but that's still a hard problem, I assume you agree?

Perhaps by "answer" you also mean to include all the the justifications necessarily to show that the answer is correct. If so, I don't think we can fit the justification to an actual answers to a hard philosophical problem on one page or less. Actually I don't think we know how to justify a philosophical answer (in the way that we might justify P!=NP by giving a mathematical proof), so the best we can do is very slowly gain confidence in an idea, by continuously trying (and failing) to poke holes in it or trying (and failing) to find better solutions.

In a PM you imply that you've found the true answers to 'free will', 'does a tree fall in the forest', 'the nature of truth'. I'll grant you 'does a tree fall in the forest' (since your solution appears to be the standard answer in philosophy, although note how it says the problem is "untypically simple"). However I have strong reservations about 'free will' and 'the nature of truth' from both the inside-view perspective and (more relevant to the current post) the outside-view perspective. Given the history of philosophy and the outside view, I don't see how you can be as confident about your ideas as you appear to be. Do you think the outside view is inapplicable here, or that I'm using it wrong?

Well, given what you seem to believe, you must either be more impressed with the alleged unsolvability of the problems than I am (implying that you think I would need more of a hero license than I think I would need to possess), or we agree about the problems being ultimately simple but you think it's unreasonable to try to solve some ultimately simple problems with the fate of the world at stake. So it sounds like it's mostly the former fork; but possibly with a side order of you thinking that it's invalid for me to shrug and go 'Meh' at the fact that some other people taking completely different approaches failed to solve some ultimately simple problems, because the fact that they're all arguing with each other means I can't get into an epistemic state where I know I'm right, or something like that, whereas I don't particularly see them as being in my reference class one way or another - their ways of thinking, the way they talk, the way they approach the problem, etc., all seem completely unlike anything I do or would ever consider trying.

Let's say when I'd discovered Gary Drescher, he'd previously solved 'free will' the same way I had, but had spent decades using the same type of approaches I would intend to use on trying to produce a good nonperson predicate. Then although it would be only N=1, and I do kinda intend to surpass Drescher, I would still be nervous on account of this relevant evidence. The philosophers who can't agree on free will seem like entirely different sorts of creatures to me.

but you think it's unreasonable to try to solve some ultimately simple problems with the fate of the world at stake

To be clear, I'm not afraid that you'll fail to solve one or more philosophical problems and waste your donors' money. If that was the only worry I'd certainly want you to try. (ETA: Well, aside from the problem of shortening AI timelines.) What I'm afraid of is that you'll solve them incorrectly while thinking that you've solved them correctly.

I recall you used to often say that you've "solved metaethics". But when I looked at your solution I was totally dissatisfied, and wrote several posts explaining why. I also thought you were overconfident about utilitarianism and personal identity, and wrote posts pointing out holes in your arguments about those. "Free will" and "nature of truth" happen to be topics that I've given less time to, but I could write down my inside view of why I'm not confident they are solved problems, if you think that would help with our larger disagreements.

The philosophers who can't agree on free will seem like entirely different sorts of creatures to me.

Is there anyone besides Gary Drescher who you'd consider to be in your reference class? What about the people who came up with the same exact solution to "tree falls in forest" as you? (Did you follow the links I provided?) Or the people who originally came up with utilitarianism, Bayesian decision theory, and Solomonoff induction (all of whom failed to notice the problems later discovered in those ideas)? Do you consider me to be in your reference class, given that I independently came up with some of the same decision theory ideas as you?

Or if it's just Drescher, would it change your mind on how confident you ought to be in your ideas if he was to express disagreement with several of them?

"Free will" and "nature of truth" happen to be topics that I've given less time to, but I could write down my inside view of why I'm not confident they are solved problems

I'd be really interested in reading that.

For "truth" see this comment. The problem with understanding "free will" is that it has a dependency on "nature of decisions" which I'm not entirely sure I understand. The TDT/UDT notion of "decisions as logical facts" seems to be a step in the right direction, but there are still unresolved paradoxes with that approach that make me wonder if there isn't a fundamentally different approach that makes more sense. (Plus, Gary Drescher, when we last discussed this, wasn't convinced to a high degree of confidence that "decisions as logical facts" is the right approach, and was still looking for alternatives, but I suppose that's more of an outside-view reason for me to not be very confident.)

"Free will" and "nature of truth" happen to be topics that I've given less time to, but I could write down my inside view of why I'm not confident they are solved problems

This depends on the threshold of "solved", which doesn't seem particularly relevant to this conversation. What philosophy would consider "solved" is less of an issue than its propensity to miss/ignore available insight (as compared to, say, mathematics). "Free will" and "nature of truth", for example, still have important outstanding confusions, but they also have major resolved issues, and those remaining confusions are subtle, hard to frame/notice when one is busy arguing on the other sides of the resolved issues.

Free will and Nature of Truth are subjects I have devoted plenty of time to, and it seems to me that confusion - and overconfidence - abound on the Less Wrong side of the fence.

This depends on the threshold of "solved", which doesn't seem particularly relevant to this conversation.

As the ultimate question seems to be "Is this FAI design safe?", I think "solved" should have a high bar.

Solomonoff induction

(Of algorithmic information theorists, I'm familiar with only Chaitin's writings on the philosophy thereof; I think that though he wouldn't have found some of the problems later found by others, he also wouldn't have placed the confidence in it that would lead to premature AGI development failure modes. (I am, as usual, much too lazy to give references.))

My main objection is that securing positive outcomes doesn't seem to inherently require solving hard philosophical problems (in your sense). It might in principle, but I don't see how we can come to be confident about it or even why it should be much more likely than not. I also remain unconvinced about the conceptual difficulty and fundamental nature of the problems, and don't understand the cause for confidence on those counts either.

To make things more concrete: could you provide a hard philosophical problem (of the kind for which feedback is impossible) together with an argument that this problem must be resolved before human-level AGI arrives? What do you think is the strongest example?

To try to make my point clearer (though I think I'm repeating myself): we can aim to build machine intelligences which pursue the outcomes we would have pursued if we had thought longer (including machine intelligences that allow human owners to remain in control of the situation and make further choices going forward, or bootstrap to more robust solutions). There are questions about what formalization of "thought longer" we endorse, but of course we must face these with or without machine intelligence. For the most part, the questions involved in building such an AI are empirical though hard-to-test ones---would we agree that the AI basically followed our wishes, if we in fact thought longer?---and these don't seem to be the kinds of questions that have proved challenging, and probably don't even count as "philosophical" problems in the sense you are using the term.

I don't think it's clear or even likely that we necessarily have to resolve issues like metaethics, anthropics, the right formalization of logical uncertainty, decision theory, etc. prior to building human-level AI. No doubt having a better grasp of these issues is helpful for understanding our goals, and so it seems worth doing, but we can already see plausible ways to get around them.

In general, one reason that doing X probably doesn't require impossible step Y is that there are typically many ways to accomplish X, and without a strong reason it is unlikely that they will all require solving Y. This view seems to be supported by a reasonable empirical record. A lot of things have turned out to be possible.

(Note: in case it's not obvious, I disagree with Eliezer on many of these points.)

I suspect I also object to your degree of pessimism regarding philosophical claims, but I'm not sure and that is probably secondary at any rate.

It's hard for me to argue with multiple people simultaneously. When I argue with someone I tend to adopt most of their assumptions in order to focus on what I think is the core disagreement, so to argue with someone else I have to "swap in" a different set of assumptions and related arguments. The OP was aimed mostly at Eliezer, so it assumed that intelligence explosion is relatively easy. (Would you agree that if intelligence explosion was easy, then it would be hard to achieve a good outcome in the way that you imagine, by incrementally solving "the AI control problem"?)

If we instead assume that intelligence explosion isn't so easy, then I think the main problem we face is value drift and Malthusian outcomes caused by competitive evolution (made worse by brain emulations and AGIs that can be easily copied), which can only be prevented by building a singleton. (A secondary consideration involves other existential risks related to technological progress, such as physics/nanotech/biotech disasters.) I don't think humanity as a whole is sufficiently strategic to solve this problem before it's too late (meaning a lot of value drift has already occurred or building a singleton becomes impossible due to space colonization). I think the fact that you are much more optimistic about this accounts for much of our disagreement on overall strategy, and I wonder if you can explain why. I don't mean to put the burden of proof on you, but perhaps you have some ready explanation at hand?

I don't think that fast intelligence explosion ---> you have to solve the kind of hard philosophical problems that you are alluding to. You seem to grant that there are no particular hard philosophical problems we'll have to solve, but you think that nevertheless every approach to the problem will require solving such problems. Is it easy to state why you expect this? Is it because approaches we can imagine in detail today involve solving hard problems?

Regarding the hardness of defining "remain in control," it is not the case that you need to be able to define X formally in order to accomplish X. Again, perhaps such approaches require solving hard philosophical problems, but I don't see why you would be confident (either about this particular approach or more broadly). Regarding my claim that we need to figure this out anyway, I mean that we need to implicitly accept some process of reflection and self-modification as we go on reflecting and self-modifying.

I don't see why a singleton is necessary to avert value drift in any case; they seem mostly orthogonal. Is there a simple argument here? See e.g. Carl's post on this and mine. I agree there is a problem to be solved, but it seems to involve faithfully transmitting hard-to-codify values (again, perhaps implicitly).

I'll just respond to part of your comment since I'm busy today. I'll respond to the rest later or when we meet.

I don't see why a singleton is necessary to avert value drift in any case; they seem mostly orthogonal. Is there a simple argument here?

Not sure if this argument is original to me. I may have read it from Nick Bostrom or someone else. When I said "value drift" I meant value drift of humanity in aggregate, not necessarily value drift of any individual. Different people have values that are more or less difficult to transmit. Or some just think that their values are easy to transmit, for example those who think they should turn the universe into hedonium, or should maximize "complexity". Competitive evolution will favor (in the sense of maximizing descendants/creations of) such people since they can take advantage of new AGI or other progress more quickly than those who think their values are harder to transmit.

I think there's an additional argument that says people who have shorter planning horizons will take advantage of new AGI progress more quickly because they don't particularly mind not transmitting their values into the far future, but just care about short term benefits like gaining academic fame.

Yes, if it is impossible to remain in control of AIs then you will have value drift, and yes a singleton can help with this in the same way they can help with any technological risk, namely by blocking adoption of the offending technology. So I concede they aren't completely orthogonal, in the sense that any risk of progress can be better addressed by a singleton + slow progress. (This argument is structurally identical to the argument for danger from biology progress, physics progress, or even early developments in conventional explosives.) But this is a very far cry from "can only be prevented by building a singleton."

To restate how the situation seems to me: you say "the problems are so hard that any attempt to solve them is obviously doomed," and I am asking for some indication that this is the case besides intuition and a small number of not-very-representative examples, which seems unlikely to yield a very confident conclusion. Eliezer makes a similar claim, with you two disagreeing about how likely Eliezer is to solve the problems but not about how likely the problems are to get solved by people who aren't Eliezer. I don't understand either of your arguments too well; it seems like both of you are correct to disagree with the mainstream by identifying a problem and noticing that it may be an unusually challenging one, but I don't see why either of you is so confident.

To isolate a concrete disagreement, if there was an intervention that sped up the onset of serious AI safety work twice as much as it sped up the arrival of AI, I would tentatively consider that a positive (and if it sped up the onset of serious AI work ten times as much as it sped up the arrival of AI it would seem like a clear win; I previously argued that 1.1x as much would also be a big win, but Carl convinced me to increase the cutoff with a very short discussion). You seem to be saying that you would consider it a loss at any ratio, because speeding up the arrival of AI is so much worse than speeding up the onset of serious thought about AI safety, because it is so confidently doomed.
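To spell out the arithmetic behind those ratios, here is a toy timeline model (my own construction for illustration, with invented dates; the discussion with Carl about where to put the cutoff clearly turns on considerations this model leaves out):

```python
# Toy model: serious safety work starts in year S, AGI arrives in year T.
# An intervention pulls the safety-work onset earlier by r*d years while
# pulling AGI arrival earlier by d years. All numbers are invented.
def preparation_window(S, T, d, r):
    """Years of serious safety work available before AGI, after the intervention."""
    return (T - d) - (S - r * d)

S, T, d = 2030, 2060, 5  # hypothetical dates and a 5-year AI speed-up
for r in (1.0, 1.1, 2.0, 10.0):
    print(f"ratio {r:>4}: window = {preparation_window(S, T, d, r)} years "
          f"(baseline {T - S})")
# In this crude accounting any r > 1 lengthens the window; whether a longer
# window is worth less total calendar time before AGI is exactly the point
# in dispute.
```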

Yes, if it is impossible to remain in control of AIs then you will have value drift

Wait, that's not my argument. I was saying that while people like you are trying to develop technologies that let you "remain in control", others who have shorter planning horizons or who think they have simple, easy-to-transmit values will already be deploying new AGI capabilities, so you'll fall behind with every new development. This is what I'm suggesting only a singleton can prevent.

You could try to minimize this kind of value drift by speeding up "AI control" progress but it's really hard for me to see how you can speed it up enough to not lose a competitive race with those who do not see a need to solve this problem, or think they can solve a much easier problem. The way I model AGI development in a slow-FOOM scenario is that AGI capability will come in spurts along with changing architectures, and it's hard to do AI safety work "ahead of time" because of dependencies on AI architecture. So each time there is a big AGI capability development, you'll be forced to spend time to develop new AI safety tech for that capability/architecture, while others will not wait to deploy it. Even a small delay can lead to a large loss since AIs can be easily copied and more capable but uncontrolled AIs would quickly take over economic niches occupied by existing humans and controlled AIs. Even assuming secure rights for what you already own on Earth, your share of the future universe will become smaller and smaller as most of the world's new wealth goes to uncontrolled AIs or AIs with simple values.

Where do you see me going wrong here? If you think I'm just too confident in this model, what alternative scenario can you suggest, where people like you and I (or our values) get to keep a large share of the future universe just by speeding up the onset of serious AI safety work?

I'll respond to the rest later or when we meet.

Did you talk about this at the recent workshop? If you're willing to share publicly, I'd be curious about the outcome of this discussion.

A singleton (even if it is a world government) is argued to be a good thing for humanity by Bostrom here and here.

could you provide a hard philosophical problem (of the kind for which feedback is impossible) together with an argument that this problem must be resolved before human-level AGI arrives?

I can't provide a single example because it depends on the FAI design. I think multiple design approaches are possible but each involves its own hard philosophical problems.

To try to make my point clearer (though I think I'm repeating myself): we can aim to build machine intelligences which pursue the outcomes we would have pursued if we had thought longer (including machine intelligences that allow human owners to remain in control of the situation and make further choices going forward, or bootstrap to more robust solutions). There are questions about what formalization of "thought longer" we endorse, but of course we must face these with or without machine intelligence.

At least one hard problem here is, as you point out, how to formalize "thought longer", or perhaps "remain in control". Obviously an AGI will inevitably influence the options we have and the choices we end up making, so what does "remain in control" mean? I don't understand your last point here, that "we must face these with or without machine intelligence". If people weren't trying to build AGI and thereby forcing us to solve these kinds of problems before they succeed, we'd have much more time to work on them and hence a much better chance of getting the answers right.

For the most part, the questions involved in building such an AI are empirical though hard-to-test ones---would we agree that the AI basically followed our wishes, if we in fact thought longer?---and these don't seem to be the kinds of questions that have proved challenging, and probably don't even count as "philosophical" problems in the sense you are using the term.

If we look at other empirical though hard-to-test questions (e.g., what security holes exist in this program) I don't see much reason to be optimistic either. What examples are you thinking of, that makes you say "these don't seem to be the kinds of questions that have proved challenging"?

I suspect I also object to your degree of pessimism regarding philosophical claims, but I'm not sure and that is probably secondary at any rate.

I suspect that even the disagreement we're currently discussing isn't the most important one between us, and I'm still trying to figure out how to express what I think may be the most important disagreement. Since we'll be meeting soon for the decision theory workshop, maybe we'll get a chance to talk about it in person.

Since we'll be meeting soon for the decision theory workshop, maybe we'll get a chance to talk about it in person.

If you get anywhere, please share your conclusions here.

(I am not satisfied with the state of discourse on this question (to be specific, I don't think MIRI proponents have adequately addressed concerns like those expressed by Wei Dai here and elsewhere), so I don't want to be seen as endorsing what might naively seem to be the immediate policy-relevant implications of this argument, but:) Bla bla philosophical problems once solved are no longer considered philosophical bla bla [this part of the argument is repeated ad nauseam and is universally overstated], and then this. Steve's comment also links to quite similar arguments made by Luke Muehlhauser. Also it links to one of Wei Dai's previous posts closely related to this question.

Wei Dai, I noticed on the MIRI website that you're slotted to appear at some future MIRI workshop. I find this a little bit strange—given your reservations, aren't you worried about throwing fuel on the fire?

Thanks for the link. I don't think I've seen that comment before. Steve raises the examples of Bayesian decision theory and Solomonoff induction to support his position, but to me both of these are examples of philosophical ideas that looked really good at some point but then turned out to be incomplete / not quite right. If the FAI team comes up with new ideas that are in the same reference class as Bayesian decision theory and Solomonoff induction, then I don't know how they can gain enough confidence that those ideas can be the last words in their respective subjects.

Wei Dai, I noticed on the MIRI website that you're slotted to appear at some future MIRI workshop. I find this a little bit strange—given your reservations, aren't you worried about throwing fuel on the fire?

Well I'm human which means I have multiple conflicting motivations. I'm going because I'm really curious what direction the participants will take decision theory.

Wei Dai, I noticed on the MIRI website that you're slotted to appear at some future MIRI workshop. I find this a little bit strange—given your reservations, aren't you worried about throwing fuel on the fire?

I don't see why that would be strange. Maybe Wei Dai thinks he can improve the way MIRI handles their FAI endgame from the inside, or that helping MIRI make progress on decision theory will not make MIRI more likely to screw up and develop UFAI, or both.

Showing up at a workshop is probably a good way to check the general sanity level/technical proficiency of MIRI. I would anticipate having a largish update (although I don't know in which direction, obviously).

That's not my main motivation for attending. I think I already have a rough idea of their sanity level/technical proficiency from online discussions and their technical writings, so I'm not anticipating having a particularly large update. (I've also attended a SIAI decision theory workshop a few years ago and met some SIAI people at that time.)

(to be specific, I don't think MIRI proponents have adequately addressed concerns like those expressed by Wei Dai here and elsewhere)

I do wonder why MIRI people often do not respond to my criticisms about their strategy. For example the only MIRI-affiliated person who responded to this post so far is Paul Christiano (but given his disagreements with Eliezer, he isn't actually part of my intended audience for this post). The upcoming workshop might be a good opportunity to see if I can get MIRI people to take my concerns more seriously, if I talk to them face to face. If you or anyone else has any ideas on what else I should try, please let me know.

I do wonder why MIRI people often do not respond to my criticisms about their strategy.

Speaking for myself...

Explaining strategic choices, and replying to criticisms, takes enormous amounts of time. For example, Nick Bostrom set out to explain what MIRI/FHI insiders might consider to be "10% of the basics about AI risk" in a clear and organized way, and by the time he's done with the Superintelligence book it will have taken him something like 2.5 years of work just to do that, with hundreds of hours of help from other people — and he was already an incredibly smart, productive academic writer who had a strong comparative advantage writing exactly that book. It would've taken me, or Carl, or anybody else besides Nick a lot more time and effort to write that book at a similar level of quality.

Which of your many discussion threads on AI risk strategy do you most wish would be engaged further by somebody on staff at MIRI?

This seems like a generic excuse you've developed, and it's not a bad one to use when waving off random comments from people who have little idea what they're talking about. But my particular arguments already share most of the same assumptions as MIRI, with each post focusing only on one or two key points of disagreement. If it's not worth your time to reply to my criticisms, then I don't see whose criticisms you could possibly find it worthwhile to respond to.

Which of your many discussion threads on AI risk strategy do you most wish would be engaged further by somebody on staff at MIRI?

I was going to say "this post" but now that Eliezer has responded I'm satisfied with MIRI's level of engagement (assuming he doesn't abruptly disappear at some point as he occasionally does during our previous discussions).

Looking at my other top level posts, I'd be interested to know what MIRI thinks about this and this.

I'd be interested to know what MIRI thinks about this and this.

My initial replies are here and here.

Which other writings of yours would you most like at least an initial reply to? Or, if there were discussions that were dropped by the MIRI party too soon (from your perspective), I could try to continue them, at least from my own perspective.

Thanks, I don't currently have a list of "writings that I wish MIRI would respond to", but I'll certainly keep your offer in mind in the future.

Explaining strategic choices, and replying to criticisms, takes enormous amounts of time.

It's also enormously important.

Personally, I didn't respond to this post because my reaction to it was mostly "yes, this is a problem, but I don't see a way by which talking about it will help at this point; we'll just have to wait and see". In other words, I feel that MIRI will just have to experiment with a lot of different strategies and see which ones look like they'll have promise, and then that experimentation will maybe reveal a way by which issues like this one can be solved, or maybe MIRI will end up pursuing an entirely different strategy. But I expect that we'll actually have to try out the different strategies before we can know.

I'm not sure what kind of strategies you are referring to. Can you give some examples of strategies that you think MIRI should experiment with?

For instance, MIRI's 2013 strategy mostly involves making math progress and trying to get mathematicians in academia interested in these kinds of problems, which is a different approach from the "small FAI team" one that you focus on in your post. As another kind of approach, the considerations outlined in AGI Impact Experts and Friendly AI Experts would suggest a program of generally training people with an expertise in AI safety questions, in order to have safety experts involved in many different AI projects. There have also been various proposals about eventually pushing for regulation of AI, though MIRI's comparative advantage is probably more on the side of technical research.

I thought "making math progress and trying to get mathematicians in academia interested in these kinds of problems" was intended to be preparation for eventually doing the "small FAI team" approach, by 1) enlarging the talent pool that MIRI can eventually hire from, and 2) offloading the subset of problems that Eliezer thinks are safe onto the academic community. If "small FAI team" is not a good idea, then I don't see what purpose "making math progress and trying to get mathematicians in academia interested in these kinds of problems" serves, or how experimenting with it is useful. The experiment could be very "successful" in making lots of math progress and getting a lot of mathematicians interested, but that doesn't help with the endgame problem that I point out in the OP.

Generally training people with an expertise in AI safety questions and pushing for regulation of AI both sound good to me, and I'd be happy to see MIRI try them. You could consider my post as an argument for redirecting resources away from preparing for "small FAI team" and into such experiments.

I thought "making math progress and trying to get mathematicians in academia interested in these kinds of problems" was intended to be preparation

Yes, I believe that is indeed the intention, but it's worth noting that the things that MIRI's currently doing really allow them to pursue either strategy in the future. So if they give up on the "small FAI team" strategy because it turns out to be too hard, they may still pursue the "big academic research" strategy, based on the information collected at this and other steps.

If "small FAI team" is not a good idea, then I don't see what purpose "making math progress and trying to get mathematicians in academia interested in these kinds of problems" serves, or how experimenting with it is useful.

"Small FAI team" might turn out to be a bad idea because the problem is too difficult for a small team to solve alone. In that case, it may be useful to actually offload most of the problems to a broader academic community. Of course, this may or may not be safe, but there may come a time when it turns out that it is the least risky alternative.

I think "big academic research" is almost certainly not safe, for reasons similar to my argument to Paul here. There are people who do not care about AI safety due to short planning horizons or because they think they have simple, easy to transmit values, and will deploy the results of such research before the AI safety work is complete.

Of course, this may or may not be safe, but there may come a time when it turns out that it is the least risky alternative.

This would be a fine argument if there weren't immediate downsides to what MIRI is currently doing, namely shortening AI timelines and making it harder to create a singleton (or get significant human intelligence enhancement, which could help somewhat in the absence of a singleton) before AGI work starts ramping up.

immediate downsides to what MIRI is currently doing, namely shortening AI timelines

To be clear, based on what I've seen you write elsewhere, you think they are shortening AI timelines because the mathematical work on reflection and decision theory would be useful for AIs in general, and are not specific to the problem of friendliness. Is that right?

This isn't obvious to me. In particular, the reflection work seems much more relevant to creating stable goal structures than to engineering intelligence / optimization power.

I do wonder why MIRI people often do not respond to my criticisms about their strategy.

I'm not sure why you're wondering, when in the history of MIRI and its predecessor they've only ever responded to about two criticisms about their strategy.

On problems that are drawn from a barrel of causally similar problems, where human optimism runs rampant and unforeseen troubles are common, the Outside View beats the Inside View.

Does anyone want to argue that Eliezer's criteria for using the outside view are wrong, or don't apply here?

"Optimism" is one kind of distortion - and "paranoia" is another kind.

Highlighting "optimism" distortions while ignoring "paranoid" ones is a typical result of paranoid distortions.

One suggestion I'd make is to minimise (you can't eliminate) your reliance on philosophy. "Consciousness" is definitely one example of a problematic philosophical concept.

http://citizensearth.wordpress.com/2014/08/23/is-placing-consciousness-at-the-heart-of-futurist-ethics-a-terrible-mistake-are-there-alternatives/

I'd much prefer we replace it with far more definable goals like "preservation of homo sapiens / genetic species" and the like.

In terms of inside view vs. outside view, I think you may have a point, but it's important to consider the options available to us. Permanently preventing an intelligence explosion, if an explosion is possible, might be extremely difficult. So the level of safety would have to be considered relative to other developments.

I'd much prefer we replace it with far more definable goals like "preservation of homo sapiens / genetic species" and the like.

Then how about going and defining it? How many genes do you have to exchange via gene therapy for someone to stop being part of the genetic species homo sapiens?

I didn't say I'D define it lol. Merely that it seems quite reasonable to say it's more definable. I'm not sure I'm capable of formalising it in code at anywhere near my current level of knowledge. However, it does occur to me that we categorise species in biology quite often - my current inclination is to go with a similar definition. Genetic classification of species is a relatively well explored scientific topic, and while I'm sure there are methodological disagreements, it's NOTHING compared to philosophy. So it's a very feasible improvement.

EDIT> Philosophically speaking, I think I might take a crack at defining it at some point, but not yet.

Transhumanism is a thing. With increasing technology we do have the ability to exchange a bunch of genes. Our descendants in 10,000 years might share less DNA with us than Neanderthals do.

In general transhumanist thought, genetic change isn't something worth fighting. If we exchange a bunch of genes and raise our IQs to 200 while restoring native Vitamin C production, that's a good thing. We might move away from homo sapiens, but there's no reason that an FAI has to ensure that we stay at a certain suboptimal DNA composition.

Our descendants in 10,000 years might share less DNA with us than Neanderthals do.

Well, for that to occur we'll almost certainly need an FAI long before then. So I'd suggest optimising for that first, and thinking about the fun stuff once survival is ensured.

certain suboptimal DNA composition.

Suboptimal for what goal? There's no such thing as an "optimal" DNA composition that I'm aware of. Genetic survival is totally contextual.

Well for that to occur we'll almost certainly need a FAI long before then.

No.

I have met, face to face, multiple people who today have implants that let them perceive magnetic fields.

We have the technology to make grown monkeys perceive an additional color via gene therapy.

Cloning isn't completely trivial, but it is possible to clone mammals today. I'm pretty confident that in the next few decades we'll solve the technological issues that make cloning harder than growing a normal human being.

At that point re-adding a Vitamin C gene is trivial. For a lot of enzymes we can search for the best version in neighboring species and replace the human version with it.

In the West we might not legally allow such products, but I can very well imagine a few scientists in a freer legal environment going along with such a plan.

So I'd suggest optimising for that first, and thinking about the fun stuff once survival is ensured.

If you write into the FAI that it prevents the fun stuff because the fun stuff is bad, we might not have that option.

OK, fair comment, although I note that the genetic approach doesn't (and imo shouldn't) consider only the welfare of humans, but also that of other species. Human genetics would probably have to be the starting point for prioritising them, though, otherwise we might end up with an FAI governing a planet of plankton or something.

While I'm quite interested in the potential of things like wearable tech and cyborgism, I feel we ought to be fairly cautious with the gene side of things, because the unintentional potential for fashion eugenics, branching off competing species etc. I feel existential risk questions have to come first even if that's not always the fun option. I see what you're saying though, and I hope we find a way to have our cake and eat it too if possible.

Human genetics would probably have to be the starting point for prioritising them, though, otherwise we might end up with an FAI governing a planet of plankton or something.

Plankton doesn't have what we consider consciousness. That's why that goal is in the mission statement.

I feel we ought to be fairly cautious with the gene side of things, because the unintentional potential for fashion eugenics, branching off competing species

Given that you aren't the only person who thinks that way, Western countries might indeed be cautious, but that doesn't mean that the same goes for East Asia or entrepreneurs in Africa.

Do you expect a "best effort" FAI to be worse than UFAI or human extinction? Also, could there be a solution to the problem of philosophical principles being "fixed forever as governing principles for a new reality"?

In the history of philosophy, there have been many steps in the right direction, but virtually no significant problems have been fully solved, such that philosophers can agree that some proposed idea can be the last words on a given subject.

That's a strange definition of "solved".