How can we ensure that a Friendly AI team will be sane enough?

Wei Dai

How can we ensure that a Friendly AI team will be sane enough? — LessWrong


16th May 2012



One possible answer to the argument "attempting to build FAI based on Eliezer's ideas seems infeasible and increases the risk of UFAI without helping much to increase the probability of a good outcome, and therefore we should try to achieve a positive Singularity by other means" is that it's too early to decide this. Even if our best current estimate is that trying to build such an FAI increases risk, there is still a reasonable chance that this estimate will turn out to be wrong after further investigation. Therefore, the counter-argument goes, we ought to mount a serious investigation into the feasibility and safety of Eliezer's design (as well as other possible FAI approaches), before deciding to either move forward or give up.

(I've been given to understand that this is a standard belief within SI, except possibly for Eliezer, which makes me wonder why nobody gave this counter-argument in response to my post linked above. ETA: Carl Shulman did subsequently give me a version of this argument here.)

This answer makes sense to me, except for the concern that even seriously investigating the feasibility of FAI is risky if the team doing so isn't fully rational. For example, they may be overconfident about their abilities and thereby overestimate the feasibility and safety, commit the sunk cost fallacy once they have developed lots of FAI-relevant theory in the attempt to study feasibility, or become too attached to their status and identity as FAI researchers. Alternatively, some team members may disagree with a consensus of "give up" and leave to form their own AGI teams, taking the dangerous knowledge they developed with them.

So the question comes down to: how rational is such an FAI feasibility team likely to be, and is that enough for the benefits to exceed the costs? I don't have a lot of good ideas about how to answer this, but the question seems really important to bring up. I'm hoping this post will prompt SI people to tell us their thoughts, and maybe other LWers have ideas they can share.


steven0461

This post highlights for me that we don't have a good understanding of what things like "more rational" and "more sane" mean, in terms of what dimensions human minds tend to vary along as a result of nature, ordinary nurture, and specialized nurture of the kind CFAR is trying to do. I think more understanding here would be highly valuable, and I mostly don't think we can get it from studies of the general population. (We can locally define "more sane" as referring to whatever properties are needed to get the right answer on this specific question, of course, but then it might not correspond to definitions of "more sane" that we're using in other contexts.)

Not that this answers your question, but there's a potential tension between the goal of picking people with a deep understanding of FAI issues, and the goal of picking people who are unlikely to do things like become attached to the idea of being an FAI researcher.

Your post suggests that an FAI feasibility team would be made of the same people who would then (depending on their findings) go ahead and form an actual FAI team, but does that need to be the case?

Wei Dai

This post highlights for me that we don't have a good understanding of what things like "more rational" and "more sane" mean, in terms of what dimensions human minds tend to vary along as a result of nature, ordinary nurture, and specialized nurture of the kind CFAR is trying to do.

I mentioned some specific biases that seem especially likely to cause risk for an FAI team. Is that the kind of "understanding" you're talking about, or something else?

Your post suggests that an FAI feasibility team would be made of the same people who would then (depending on their findings) go ahead and form an actual FAI team, but does that need to be the case?

I think probably there would just be an FAI research team that is told to continually reevaluate feasibility/safety as it goes. I just called it "FAI feasibility team" to emphasize that at the start its most important aim would be to evaluate feasibility and safety. Having an actual separate feasibility team might buy some additional overall sanity (but how, other than that attachment to being FAI researchers won't be an issue, since they won't continue as FAI researchers either way?). It seems like there would probably be better ways to spend the extra resources if we had them, though.

steven0461

I mentioned some specific biases that seem especially likely to cause risk for an FAI team. Is that the kind of "understanding" you're talking about, or something else?

I think that falls under my parenthetical comment in the first paragraph. Understanding what rationality-type skills would make this specific thing go well is obviously useful, but it would also be great if we had a general understanding of what rationality-type skills naturally vary together, so that we can use phrases like "more rational" and have a better idea of what they refer to across different contexts.

It seems like there would probably be better ways to spend the extra resources if we had them though.

Maybe? Note that if people like Holden have concerns about whether FAI is too dangerous, that might make them more likely to provide resources toward a separate FAI feasibility team than toward, say, a better FAI team, so it's not necessarily a fixed heap of resources that we're distributing.

Dr_Manhattan

Your post suggests that an FAI feasibility team would be made of the same people who would then (depending on their findings) go ahead and form an actual FAI team, but does that need to be the case?

You're implying the possibility of an official separation between the two teams, which seems like a good idea. Between "finding/filtering more rational people" and "social mechanisms", I would vote for mechanisms.

John_Maxwell

If humans are irrational because of lots of bugs in our brain, it could be hard to measure how many bugs have been fixed or worked around, and how reliable these fixes or workarounds are.

steven0461

There's how strong one's rationality is at its peak, and how strong one's rationality is on average, and the rarity and badness of lapses in one's rationality, and how many contexts one's rationality works in, and to what degree one's mistakes and insights are independent of others' mistakes and insights (independence of mistakes means you can correct each other, independence of insights means you don't duplicate insights). All these measures could vary orthogonally.

ghf

Good point. And, depending on your assessment of the risks involved, especially for AGI research, the severity of the lapses might be more important than the peak or even the average. A researcher who is perfectly rational (hand-waving for the moment about how we measure that) 99% of the time but has, say, occasional fits of rage might be even more dangerous than a colleague who is slightly less rational on average but nonetheless stable.

khafra

Or, proper mechanism design for the research team might be able to smooth out those troughs and let you use the highest-EV researchers without danger.

John_Maxwell

You seem to be using rationality to refer to both bug fixes and general intelligence. I'm more concerned about bug fixes myself, for the situation Wei Dai describes. Status-related bugs seem potentially the worst.

steven0461

I meant to refer to just bug fixes, I think. My comment wasn't really responsive to yours, just prompted by it, and I should probably have added a note to that effect. One can imagine a set of bugs that become more fixed or less fixed over time, varying together in a continuous manner, depending on e.g. what emotional state one is in. One might be more vulnerable to many bugs when sleepy, for example. One can then talk about averages and extreme values of such a "general rationality" factor in a typical decision context, and talk about whether there are important non-standard contexts where new bugs become important that one hasn't prepared for. I agree that bugs related to status (and to interpersonal conflict) seem particularly dangerous.

John_Maxwell

Why not have team members whose entire job is to prevent this sort of thing from happening? At least one team writing software for NASA has software testers who have a semi-antagonistic relationship with the software engineers.

http://www.fastcompany.com/magazine/06/writestuff.html

What would a good checks and balances style structure for an FAI team look like?

Wei Dai

In the open, non-classified crypto world, we pick standard crypto algorithms by getting competing designs from dozens of teams, who then attack each other's designs, with the rest of the research community joining in. This seems like a good model for FAI as well, if only the FAI-building organization had enough human and financial resources, which unfortunately probably won't be the case.

gRR

if only the FAI-building organization had enough human and financial resources, which unfortunately probably won't be the case

Why do you think so? Do you expect an actual FAI-building organization to start working in the next few years? Because, assuming the cautionary position is actually the correct one, won't an FAI organization surely get lots of people and resources in time?

DanArmak

Many people are interested in building FAI, or just AGI (which at least isn't harder to build than specifically Friendly AGI). Assuming available funds increase slowly over time, a team trying to build an FAI with few safeguards can be funded before a team that requires many safeguards, and will also work faster (fewer constraints on the result), and so will likely finish first.

gRR

But if AGI is at least several decades away, then, assuming the cautionary position is the correct one, that position will become widespread and universally accepted. Any official well-funded teams will work with many safeguards and lots of constraints. Only stupid cranks will work without these, and they'll work without funding too.

DanArmak

You're not addressing my point about a scenario where available funds increase slowly.

Concretely (with arbitrary dates): suppose that in 2050, FAI theory is fully proven. By 2070, it is universally accepted, but still no-one knows how to build an AGI, or maybe no-one has sufficient processing power.

In 2090, several governments reach the point of being able to fund a non-Friendly AGI (which is much cheaper). In 2120, they will be able to fund a fully Friendly AGI.

What is the chance some of them will try to seize first-mover advantage, and refuse to wait for another 30 years, and ignore Friendliness? I estimate high. The payoff is the biggest in human history: first-mover will potentially control a singleton that will rewrite to order the very laws of physics in its future light-cone, and prevent any other AGI from ever being built! This is beyond even "rule the world forever and reshape it in your image" territory. The greatest temptation ever. Do you seriously expect no-one would succumb to it?

gRR

What is the chance some of them will try to seize first-mover advantage, and refuse to wait for another 30 years, and ignore Friendliness? I estimate high. The payoff is the biggest in human history: first-mover will potentially control a singleton that will rewrite to order the very laws of physics in its future light-cone, and prevent any other AGI from ever being built! This is beyond even "rule the world forever and reshape it in your image" territory. The greatest temptation ever. Do you seriously expect no-one would succumb to it?

Remember, we're describing the situation where the cautionary position is provably correct. So your "greatest temptation ever" is (provably) a temptation to die a horrible death together with everyone else. Anyone smart enough to even start building AI would know and understand this.

JoshuaZ

That one has a provably Friendly AI is not the same thing as that any other AI is provably going to do terrible things.

gRR

My conditional was "cautionary position is the correct one". I meant, provably correct.

Vladimir_Nesov

It's like with dreams of true universal objective morality: even if in some sense there is one, some agents are just going to ignore it.

Wei Dai

My conditional was "cautionary position is the correct one". I meant, provably correct.

Leaving out the "provably" makes a big difference. If you add "provably" then I think the conditional is so unlikely that I don't know why you'd assume it.

gRR

Well, assuming EY's view of intelligence, the "cautionary position" is likely to be a mathematical statement. And then why not prove it? Given several decades? That's a lot of time.

JoshuaZ

One is talking about a much stronger statement than provability of Friendliness (since it is a claim about every other possible AI), so even if it is true, proving or even formalizing it is likely to be very hard. Note that this is all under the assumption that it is true, and that assumption seems wrong. Assume that one has a Friendliness protocol, and then consider an AI that has the rule "be Friendly, but give 5% more weight to the preferences of people who have an even number of letters in their name", or, even subtler, "be Friendly, but if you ever conclude with confidence of at least 1-1/(3^^^^3) that 9/11 was done by time-traveling aliens, then destroy humanity". The second will likely act identically to a Friendly AI.

DanArmak

I thought you were merely specifying that the FAI theory was proven to be Friendly. But you're also specifying that any AGI not implementing a proven FAI theory, is formally proven to be definitely disastrous. I didn't understand that was what you were suggesting.

Even then there remains a (slightly different) problem. An AGI may be Friendly to someone (presumably its builders) at the expense of someone else. We have no reason to think any outcome an AGI might implement would truly satisfy everyone (see other threads on CEV). So there will still be a rush for the first-mover advantage. The future will belong to the team that gets funding a week before everyone else. These conditions increase the probability that the team that makes it will have made a mistake, introduced a bug, cut some corners unintentionally, etc.

[anonymous]

From reading that article, I can confirm that I've worked on programming projects with a validated, spec-everything, someone-explicitly-rechecks-everything, document-everything, sign-off-on-everything approach, and also on projects with more of that Diet Coke and pizza feel, and I didn't even start working until almost 10 years after that article was written. That article helped me pull together some disparate points about checks and risk management by expressing them clearly, so thank you for posting it.

As for your question, I tried to start answering it, but I'm not sure I managed to finish in a timely manner (the team structure was hard to lay out in text; I feel like a flow chart might have been better), so here is the layout of teams I have so far:

I think you would probably need at least some elements of a hierarchical approach: break FAI down into smaller categories and assign each one a team, in addition to having a team that checks the integration of the elements. At each step, part of the team would try to write code for that element and another part would test the existing code's handling, and you would then iterate on that step repeatedly, depending on how large the remaining problem space was.

To give an example, suppose the first step is to break FAI down into "Values", "Problem Solving", and "Knowledge" (this is just a hypothetical; I'm not proposing this specific split). The Problem Solving team would realize that they have to break "Problem Solving" down further, because it would take far too long to test all of the FAI's problem-solving abilities in all fields. You would also need a process to account for "We attempted to break it down into these three things, but now we realize we're going to need this fourth thing."

You would then need another overteam to handle "Well, this particular problem is intractable for now, so can we work on other elements in the meantime and put in some kind of temporary solution?" For instance, if "Black Swan Events" gets its own category within problem solving, the overteam might realize that this category is intractable for now (because even if you code the FAI to handle Black Swan Events, how could the validation team validate the Black Swan Events code?). Then you might have to say something along the lines of "Well, if we can at least get the FAI to recognize a Black Swan, we can have it call for external help when it sees one, because that is sufficiently better than the existing alternatives even though it clearly isn't perfect," or "Damn, there does not seem to be a way to solve this right now, and there isn't a worthwhile 'good enough' solution, so we'll have to put the project on hold until we figure this out." The overteam would of course also need its own validation team, which would almost certainly be the most important part, so you would probably want that team's work validated again, just to make more sure.

Also, since I just coded that, I should now try to get someone else to validate it before I assume it is correct.
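Since the team layout above is hard to show in prose, here is a minimal sketch of it as a tree of components, under assumptions that are not in the comment. The names `Component`, `Status`, `SEVERITY`, and `effective_status` are hypothetical. The rule that a parent is only as far along as its least-finished part is an assumed simplification, and the root node's own status stands in for the integration team's check. The overteam's second layer of validation is not modeled.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    ON_HOLD = "intractable, no good-enough fallback"  # whole project waits
    IN_PROGRESS = "being built and tested"
    FALLBACK = "intractable, temporary fallback in place"  # e.g. "recognize a Black Swan, call for help"
    VALIDATED = "code checked by a separate validation team"


# Worst first (assumption): a parent is only as far along as its least-finished part.
SEVERITY = [Status.ON_HOLD, Status.IN_PROGRESS, Status.FALLBACK, Status.VALIDATED]


@dataclass
class Component:
    name: str
    status: Status = Status.IN_PROGRESS
    children: list["Component"] = field(default_factory=list)

    def effective_status(self) -> Status:
        """Worst status found anywhere in this component's subtree."""
        statuses = [self.status] + [c.effective_status() for c in self.children]
        return min(statuses, key=SEVERITY.index)


# The hypothetical three-way split from the comment, with Black Swan
# Events as an intractable sub-category that has a fallback.
black_swans = Component("Black Swan Events", Status.FALLBACK)
problem_solving = Component("Problem Solving", Status.VALIDATED, [black_swans])
fai = Component("FAI", Status.VALIDATED, [
    Component("Values", Status.VALIDATED),
    problem_solving,
    Component("Knowledge", Status.VALIDATED),
])
print(fai.effective_status())  # Status.FALLBACK

# Realizing that a fourth category is needed just adds another child.
fai.children.append(Component("Fourth category"))
print(fai.effective_status())  # Status.IN_PROGRESS
```

Keeping status on every node means a single intractable leaf with no fallback (`ON_HOLD`) surfaces at the root, which mirrors the "put the project on hold" outcome described above.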

ghf

Strongly seconded. While getting good people is essential (the original point about rationality standards), checks and balances are a critical element of a project like this.

The level of checks needed probably depends on the scope of the project. For the feasibility analysis, perhaps you don't need anything more than splitting your research group into two teams, one assigned to prove, the other disprove the feasibility of a given design (possibly switching roles at some point in the process).

philh

If you had two such teams, working independently, who came to the same conclusions for the same reasons, that would be at least weak evidence that they're both being rational.

khafra

Perhaps it would be best to have as many teams as possible working on different pieces independently, with some form of arithmetic coding operating over their output.

philh

Could you clarify what you mean by "arithmetic coding operating over their output"?

The point of having teams work independently on the same project is that they're unlikely to make exactly the same mistakes. Publishers do this for proofreading: have two proofreaders return error-sets A and B, and estimate the number of uncaught errors as a function of |A\B|, |B\A| and |A∩B|. If A=B, that would be strong evidence that there are no errors left. (But depending on priors, it might not bring P(no errors left) close to 1.) If two proofreaders worked on different parts of the book, you couldn't use the same technique.
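The comment does not say which function of |A\B|, |B\A| and |A∩B| publishers use. One standard choice is the Lincoln-Petersen capture-recapture estimator, sketched below. It assumes that the two proofreaders find errors independently and that every error is equally likely to be caught. Under those assumptions the estimated total is |A|·|B|/|A∩B|, and subtracting the |A∪B| errors already found simplifies to |A\B|·|B\A|/|A∩B|. The function name and the counts in the example are hypothetical.

```python
def estimate_uncaught(a_only: int, b_only: int, both: int) -> float:
    """Estimate how many errors neither proofreader caught (Lincoln-Petersen).

    a_only: errors only proofreader A found, i.e. |A minus B|
    b_only: errors only proofreader B found, i.e. |B minus A|
    both:   errors both proofreaders found, i.e. |A intersect B|

    Assumes independent proofreaders and equally findable errors.
    """
    if both == 0:
        raise ValueError("no overlap between A and B: the estimate is unbounded")
    return a_only * b_only / both


# Hypothetical counts: A finds 20 errors, B finds 15, and 10 are shared.
print(estimate_uncaught(a_only=10, b_only=5, both=10))  # 5.0
# A == B (no one-sided finds) gives a point estimate of zero remaining errors.
print(estimate_uncaught(a_only=0, b_only=0, both=12))   # 0.0
```

The second case matches the intuition that A=B is strong evidence of no remaining errors. However, this is only a point estimate and says nothing about how confident to be, which is where the prior matters.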

Could the arithmetic coding make checks like this unnecessary?

gwern

Unlikely, but not independent. "Are N average software versions better than 1 good version?", Hatton 1997:

The concept of using N parallel versions accompanied with some kind of voting system is a long-established one in high-integrity engineering. The independence of the channels produces a system which is far more reliable than one channel could be. In recent years, the concept has also been applied to systems containing software using diversity of design. In such systems, it is attractive to assume that software systems, like their hardware counterparts also fail independently.

However, in a widely-quoted experiment, [1], [2] showed that this assumption is incorrect, and that programmers tended to commit certain classes of mistake dependently. It can then be argued that the benefit of having N independently developed software channels loses at least some of its appeal as the dependence of certain classes of error means that N channels are less immune to failure than N equivalent independent channels, as occurs typically in hardware implementations.

The above result then brings into question whether it is more cost-effective to develop one exceptionally good software channel or N less good channels. This is particularly relevant to aerospace systems with the Boeing 777 adopting the former policy and the Airbus adopting at least partially, the latter.

This paper attempts to resolve this issue by studying existing systems data and concludes that the evidence so far suggests that the N-version approach is significantly superior and its relative superiority is likely to increase in future.
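To see why dependence between versions matters, here is a toy model of N-version majority voting. It is an illustrative assumption, not the model used by Hatton or by the cited experiment. With probability `common_mode`, all versions fail together because of a shared mistake. Otherwise, each version fails independently with probability `p`. The model counts the vote as wrong whenever a majority of versions fail, which pessimistically assumes that failing versions agree on the same wrong answer. The function name and all failure rates are hypothetical.

```python
from math import comb


def p_majority_fails(n: int, p: float, common_mode: float = 0.0) -> float:
    """Probability that a majority vote over n versions gives a wrong answer.

    Toy model (assumption): with probability common_mode every version fails
    together; otherwise each fails independently with probability p.
    """
    k_min = n // 2 + 1  # smallest number of failing versions that wins the vote
    independent = sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1)
    )
    return common_mode + (1 - common_mode) * independent


# Hypothetical rates: one "good" version vs. three "average" ones.
print(p_majority_fails(1, 0.001))                    # one good version: 0.001
print(p_majority_fails(3, 0.01))                     # independent: ~0.000298
print(p_majority_fails(3, 0.01, common_mode=0.001))  # correlated: ~0.0013
```

With fully independent failures, three average versions beat one good version. With even a small common-mode term, the comparison flips, so the question of one good channel versus N average channels turns largely on how correlated the channels' mistakes are.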

khafra

Could the arithmetic coding make checks like this unnecessary?

No, it would just be a more efficient and error-resistant way to do the checks (with overlapping sections of the work) than a straight duplication of effort. Arithmetic coding has a Wikipedia article; error-correcting output coding doesn't seem to, but it's closer to the actual implementation a team of teams could use.

(edited because the link to ECOC disappeared)
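In machine learning, error-correcting output coding gives each possible answer a codeword. Each of several binary classifiers predicts one bit, and the answer whose codeword is nearest in Hamming distance wins. Below is a minimal sketch of that decoding step, assuming (hypothetically) that each of seven teams independently answers one yes/no question in place of a classifier. The codebook and option names are made up for illustration. Its codewords are pairwise 4 apart, so one team's error is corrected and two errors are detected as a tie. As the correlated-failure results quoted above suggest, this guarantee weakens if teams tend to err together.

```python
# Hypothetical codebook: 4 candidate conclusions, 7 yes/no questions (one per team).
# Every pair of codewords differs in exactly 4 positions.
CODEBOOK = {
    "option A": (0, 0, 0, 0, 0, 0, 0),
    "option B": (1, 1, 1, 1, 0, 0, 0),
    "option C": (0, 0, 1, 1, 1, 1, 0),
    "option D": (1, 1, 0, 0, 1, 1, 0),
}


def hamming(x: tuple, y: tuple) -> int:
    """Number of positions where two equal-length codewords differ."""
    return sum(a != b for a, b in zip(x, y))


def decode(answers: tuple):
    """Return the option nearest to the teams' answers, or None on a tie."""
    ranked = sorted(CODEBOOK, key=lambda name: hamming(CODEBOOK[name], answers))
    best, runner_up = ranked[0], ranked[1]
    if hamming(CODEBOOK[best], answers) == hamming(CODEBOOK[runner_up], answers):
        return None  # too many teams erred to tell which option is meant
    return best


print(decode((0, 0, 1, 1, 1, 1, 0)))  # all teams right: option C
print(decode((1, 0, 1, 1, 1, 1, 0)))  # team 1 wrong, still option C
print(decode((1, 1, 1, 1, 1, 1, 0)))  # teams 1 and 2 wrong: None (detected)
```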

lukeprog

Speaking for myself...

I don't think I have anything special or insightful to say about this. Basically, I hope this all becomes clearer as the situation develops. Right now MIRI is still years away from successfully recruiting an "FAI team," let alone "building FAI."

John Maxwell, philh, and yourself all presented reasonable ideas, and I expect additional opportunities to present themselves as time goes on. Shulman's "caged AGIs/WBEs working on small pieces of the problem, including critiquing each others' results" concept is another idea.

Do you think the question in the OP is significantly and immediately policy-relevant? As explained previously, I'm less confident than (e.g.) Eliezer that MIRI should eventually try to build FAI itself, but in the meantime, it looks really useful to collect a bunch of young people with top cognitive ability and turn their attention to concrete research problems in FAI theory, x-risk strategy, effective altruism, etc. It also looks pretty useful to directly attack the problem of FAI, because I expect strategic information from the exercise, because philosophy is best done from within a science, etc.

Wei Dai

Do you think the question in the OP is significantly and immediately policy-relevant?

Yes, because it implies that building FAI is even harder than it looked before. All the "reasonable ideas" presented so far require doubling the required resources or multiplying them several times, further slowing down FAI progress relative to people unconcerned with Friendliness. Together with the apparent difficulty of solving the FAI-specific problems over and above the AGI problems, and the outside-view conclusion that there's no way to become confident in a bunch of new philosophical ideas in a short amount of time, this suggests that the positive impact of working on FAI now is low relative to the harm it causes (i.e., shortening AI timelines and hence reducing the time available to carry out other interventions).

it looks really useful to collect a bunch of young people with top cognitive ability and turn their attention to concrete research problems in FAI theory, x-risk strategy, effective altruism

I would agree with the latter two categories here, but all your workshops are focused on the first one. I think people initially drawn by FAI theory can subsequently become interested in strategy and altruism, but they can also become interested in AGI capability work. For example, some may become intellectually (i.e., intrinsically as opposed to instrumentally) interested in an FAI problem and continue to work on it even after it turns out to be helpful for AGI in general, or find that the FAI problem they're interested in depends on first solving some AGI capability problem, and switch over to doing that.

Given the obvious lack of a sufficient number of highly talented people working on strategy today, and the lack of any apparent downsides to making people more interested in it, I don't know why you (i.e., MIRI) are not trying to draw people into that work more directly, for example by making more LW posts on important strategic problems, or by hosting workshops on it. Or when other people (like me) make posts about strategy, it seems like you could at least join in the discussions more enthusiastically. It's really hard for me to understand MIRI's current approach of making a big push to popularize FAI work (which I think has net negative impact, or at least highly uncertain impact) while making almost no efforts to make people more interested in strategy. Are you doing something behind the scenes that I don't know about, for example talking with workshop participants privately to try to turn their attention to strategy?

It also looks pretty useful to directly attack the problem of FAI, because I expect strategic information from the exercise, because philosophy is best done from within a science, etc.

What strategic information have you obtained from the FAI work (e.g., workshops held) so far? What further strategic information are you hoping for at this point and what strategic problems are you hoping to solve with that information? I don't understand the relevance of the linked post to the issue at hand. Can you explain?

lukeprog

Yes, because it implies that building FAI is even harder than it looked before.

I'm not sure this is true for me. I've always had the model that FAI is super-hard and probably requires enormous resources, including resources for information security, for scrutinizing philosophical reasoning far more thoroughly and effectively than is done in philosophy departments, for doing the strictly-harder job of FAI before others build uFAI, etc. The chances that a non-profit will build FAI before the whole world builds AGI, or even that a non-profit will meaningfully accelerate AGI progress, look pretty slim to me. That's probably not where MIRI's value is coming from, at least as I see things today.

Maybe this is another case where the OP was written for Eliezer, but now somebody else is responding from a different perspective.

almost no efforts to make people more interested in strategy.

First: I think "almost no efforts" is wrong. Remember, I'm the one who began writing AI Risk & Opportunity: A Strategic Analysis, who writes fairly thoroughly-researched strategic analyses on MIRI's blog, who interviews experts mostly about strategic issues, etc. I've been supervising additional strategic research that hasn't been published yet, too. MIRI has also been paying Carl's salary for years, and Carl mostly does strategic research. Most of MIRI's publications are about strategy, not about FAI.

Second: Some reasons we ended up emphasizing math research in 2013 are given here, and in an earlier strategic document (which you were shown, since I asked for your feedback during our 2013 planning).

Third: Even if FAI is super-hard, wouldn't you rather have an AI that had 120 points of safety/friendliness effort put into it over an AI that had 40 points of safety/friendliness effort put into it? If MIRI or a successor successfully builds FAI before the world builds uFAI, it sure as heck isn't going to be with a big lead time. (I suspect you've responded to this point elsewhere; I just don't recall. Feel free to link me to another comment.)

What strategic information have you obtained from the FAI work (e.g., workshops held) so far?

Very little so far; we haven't been doing this for long.

What further strategic information are you hoping for at this point and what strategic problems are you hoping to solve with that information?

What kinds of researchers are needed for FAI progress? How mathematically hard is FAI progress? How philosophically hard is FAI progress? How subtle is FAI when you actually try to make progress toward building it? (e.g. maybe it turns out to be not as subtle as Eliezer expects, and we can call in Google to help with most of it) How "outsourceable" is FAI work, and which parts are most outsourceable? Which problems contain hidden dangers, such that they should be kept secret if possible?

Also, some kinds of technical progress (e.g. on decision theory and anthropic reasoning) are not just useful for FAI, but also useful for thinking about the relevant strategic issues.

I don't understand the relevance of the linked post to the issue at hand. Can you explain?

The chosen focus on technical research is also, in part, drawing from priors about what kinds of work typically lead to progress. Example 1: when you're starting a highly uncertain business venture, you should do some analysis, but at some point you just need to start building the product because that will teach you things you couldn't get from a prior-to-building-work analysis. Example 2: philosophers commenting on another domain often do less well than scientists working within the domain who are capable of some high-level thinking about it, because it often "takes intimate involvement with the scientific domain in order to do the philosophical thinking." E.g. if you reason rather distantly about what's needed for safe AI you might come up with Tool AI, and it's only when you think through the details of what it would take to build a safe AI that you notice some subtler complications like those in section 2 of this post. Example 3: one might reason from a distance that the greatest risk from molecular nanotechnology is the "grey goo" scenario, but when you think through the physical details of what it would take to build self-replicating machines capable of a grey goo scenario, you realize that grey goo is probably easier to avoid than some other dangers from molecular nanotechnology, and this has strategic implications for what kind of risk mitigation work to focus one's efforts on.

Wei Dai

Maybe this is another case where the OP was written for Eliezer, but now somebody else is responding from a different perspective.

Good point. I need to keep this in mind more in the future. I guess at this point I understand my disagreements with Eliezer and even Paul relatively well, but I don't know where the core of the disagreement between the two of us lies. Do you have a better idea of this? Are there one or two things that you think I'm wrong about, such that if I were to change my mind about them, we'd be in much better agreement on overall strategy? If not, then I guess we need to talk a bunch more to figure it out.

who writes fairly thoroughly-researched strategic analyses on MIRI's blog, who interviews experts mostly about strategic issues, etc

Ah, availability bias on my part, since I follow LW more closely than MIRI's blog. I wonder why you don't post/crosspost those things here though (when Eliezer did post his open FAI questions here), given that the average LW post draws way more comments than the average MIRI blog post, and probably has a higher readership or at least an additional nonoverlapping readership. I mean, why not take advantage of all of MIRI's (multiple years of) hard work in building up this community?

Even if FAI is super-hard, wouldn't you rather have an AI that had 120 points of safety/friendliness effort put into it over an AI that had 40 points of safety/friendliness effort put into it?

No, that's not clear. An almost-Friendly AI could be worse than a completely indifferent AI, since "bad" utility functions (as opposed to "neutral" ones, e.g. suffering maximizers as opposed to paperclip maximizers) are close in design space to "good" utility functions. I think somebody made this argument on LW before, but I've forgotten who/where. There's also an argument that if acausal control/trading is possible, more "philosophically sophisticated" UFAIs can be worse because they are harder for FAIs to control or provide more competition for FAIs in the acausal economy.

I think what may be useful is "AI safety" (for lack of a better term) research that is done explicitly under the assumption that we may have to deploy a not-quite-Friendly AI to head off a greater danger, which would involve approaches quite different from Eliezer's current one. I made some suggestions along these lines previously, but again this isn't MIRI's current focus.

Very little so far; we haven't been doing this for long.

In that case it seems like the cost of the strategic information you're seeking is really high (both in terms of resources and in terms of potential negative impact), and I'm having trouble understanding your current strategy of seeking this information. Again I'm not sure where the core of our disagreement lies so perhaps I should wait for your reply on that before going further.

lukeprog

Quick note: I'm glad we're doing this. Let's keep going.

I guess at this point I understand my disagreement with Eliezer and even Paul relatively well but I don't know where the core of the disagreement between the two of us lies. Do you have a better idea of this?

Is it easy for you to sum up your core disagreements with Eliezer and with Paul? That would be pretty useful to my own strategic thinking.

As for where our core strategic disagreements are… I skimmed through your posts that looked fairly strategy-relevant: Cynical explanations of FAI critics, Work on security instead of Friendliness?, How can we ensure that a Friendly AI team will be sane enough, Reframing the problem of AI progress, Against "AI risk", Modest superintelligences, Wanted: backup plans for "seed AI turns out to be easy", Do we want more publicity, and if so how?, Some thoughts on singularity strategies, Outline of possible singularity scenarios (that are not completely disastrous), Metaphilosophical Mysteries, Hacking the CEV for fun and profit, Late great filter is not bad news, Complexity of value ≠ complexity of outcome, Value uncertainty and the singleton scenario, Non-Malthusian scenarios, Outside view(s) and MIRI's FAI endgame, and Three approaches to "Friendliness".

My first idea is that we might disagree about the plausibility of the alternatives to "AI-foom disaster" you list here. My second idea is that some of our core disagreements are about the stuff we're talking about in this thread already.

I wonder why you don't post/crosspost those things here

I did this once and didn't get much of a response. But maybe I could do it more anyway.

No that's not clear. An almost-Friendly AI could be worse than a completely indifferent AI, since "bad" ...utility functions are [closer] in design space to "good" utility functions. I think somebody made this argument before on LW but I forgot who/where.

Right. I think I was persuaded of this point when we discussed it here. I think the question does deserve more analysis (than I've seen written down, anyway), but I could easily see it being one of the questions that is unwise to discuss in great detail in public. I'd definitely like to know what you, Eliezer, Carl, Bostrom, etc. think about the issue.

The key questions are: how much greater is P(eAI | FAI attempted) than P(eAI | FAI not-attempted), and what tradeoff are we willing to accept? The first part of this question is another example of a strategic question I expect work toward FAI to illuminate more effectively than any other kind of research I can think of to do.

I notice that many of your worries seem to stem from a worry not about the math work MIRI is doing now, but perhaps from a worry about mission lock-in (from cognitive dissonance, inertia, etc.). Is that right? Anyway, I don't think even Eliezer is so self-confident that he would make a serious attempt at Yudkowskian FAI even with mounting evidence that there was a pretty good chance we'd get eAI from the attempt.

There's also an argument that if acausal control/trading is possible

I have no comment on this part since I haven't taken much time to familiarize myself with acausal trade arguments. And I stand by that choice.

I think what may be useful is "AI safety" (for lack of a better term) research that is done explicitly under the assumption that we may have to deploy a not-quite-Friendly AI to head off a greater danger, which would involve approaches quite different from Eliezer's current one

It's not MIRI's focus, but e.g. Carl has spent a lot of time on that kind of thing over the past couple years, and I'm quite happy for FHI to be working on that kind of thing to some degree.

In that case it seems like the cost of the strategic information you're seeking is really high

You might be underestimating how costly it is to purchase new strategic insights at this stage. I think Bostrom & Yudkowsky & company picked up most of the low-hanging fruit over the past 15 years (though most of it hasn't been written up clearly anywhere). Example 1 (of difficulty of purchasing new strategic insights): Bostrom's book has taken many person-years of work to write, but I'm not sure it contains any new-to-insiders strategic insights; it's just work that makes it easier for a wider population to build on the work that's already been done and work toward producing new strategic insights. Example 2: Yudkowsky (2013)+Grace (2013) represents quite a lot of work, but again doesn't contain any new strategic insights; it merely sets up the problem and shows what kind of work would need to be done on a much larger scale to (maybe!) grab new strategic insights.

Also, remember that MIRI's focus on math work was chosen because it purchases lots of other benefits alongside some expected strategic progress – benefits which seem to be purchased less cheaply by doing "pure" strategic research.

Wei Dai

Quick note: I'm glad we're doing this. Let's keep going.

Thanks, that's good to know.

Is it easy for you to sum up your core disagreements with Eliezer and with Paul?

I guess I would describe my overall view as being around 50/50 uncertain about whether the Singularity will be Yudkowsky-style (fast local FOOM) or Hanson-style (slower distributed FOOM). Conditioning on Yudkowsky-style Singularity, I agree with Eliezer that the default outcome is probably a paperclipper-style UFAI, and disagree with him on how hard the FAI problems are (I think they are harder). Conditioning on Hanson-style Singularity, I agree with Hanson's competitive evolution / burning of the cosmic commons scenario, but disagree with him in that I think that would be a terrible outcome rather than an ok outcome. Paul seems to be similarly uncertain about the speed and locality of the intelligence explosion, but apparently much more optimistic than me (or Eliezer and Robin) about the outcome of both scenarios. I'm not entirely sure why yet.

My first idea is that we might disagree about the plausibility of the alternatives to "AI-foom disaster" you list here.

Disagree in which direction?

I did this once and didn't get much of a response. But maybe I could do it more anyway.

You got 11 comments here versus none at MIRI's blog. Seems like a no-brainer to me...

I notice that many of your worries seem to stem from a worry not about the math work MIRI is doing now, but perhaps from a worry about mission lock-in (from cognitive dissonance, inertia, etc.). Is that right? Anyway, I don't think even Eliezer is so self-confident that he would make a serious attempt at Yudkowskian FAI even with mounting evidence that there was a pretty good chance we'd get eAI from the attempt.

I'm not so much worried about Eliezer's own FAI attempts specifically (although I'm not as confident about Eliezer's rationality as you are), as worried about MIRI making a lot of people interested in various problems that it thinks are safe for people to work on in public. But such safety is hard to predict in advance, and FAI problems are intimately connected with AGI problems. Those people will publish interesting technical papers and draw in even more people who might be interested purely in the technical problems, so you'll end up with a lot of people working on FAI/AGI problems who wouldn't be doing so if MIRI hadn't gotten the snowball started.

Suppose at some point MIRI decides that P(eAI | FAI attempted) is too high and FAI shouldn't be attempted, or that people are doing a lot of unwanted AGI capability work as a result of your current activities, how do you get everyone to stop?

You might be underestimating how costly it is to purchase new strategic insights at this stage.

I find it relatively easy to write down my strategic thoughts in blog form, and they tend to be well-received on LW, so they seem to be a counterexample to your "difficulty of purchasing new strategic insights". Unless all my ideas/arguments were already thought of before and taken into account by "insiders", but just never written down somewhere public? But if that's the case, why haven't they been written down? Again, it wasn't that hard for me to write down my thoughts in a way that's understandable to more than just "insiders".

lukeprog

Disagree in which direction?

My guess was that you think the things on this list are more probable and/or bad than I do, especially the first three. But to keep things focused, I suggest we not add those possible disagreements to the current thread.

You got 11 comments here versus none at MIRI's blog. Seems like a no-brainer to me…

In part, I'm also trying to see whether we can cause discussion to happen on MIRI's blog. If not, we'll have to do it on LW. But that'll be better when we've had enough development resources go into LW that we can have subreddits (e.g. one for astronomical stakes stuff). Right now the thing under development is a dashboard that will make it easier to follow the content you want to follow across multiple subreddits, and then after that we plan to create subreddits so that people can more easily follow only the things they care about, rather than having everything get dumped into Discussion, such that busier and less interested people can't be bothered to follow everything in order to occasionally find snippets of things they care about.

Suppose at some point MIRI decides that P(eAI | FAI attempted) is too high and FAI shouldn't be attempted, or that people are doing a lot of unwanted AGI capability work as a result of your current activities, how do you get everyone to stop?

Ok, gotcha. Since you're in town in a few days, Eliezer suggested that we three chat about this when you get here. See you soon!

I find it relatively easy to write down my strategic thoughts in blog form, and they tend to be well-received on LW, so they seem to be a counterexample to your "difficulty of purchasing new strategic insights". Unless all my ideas/arguments were already thought of before and taken into account by "insiders", but just never written down somewhere public? But if that's the case, why haven't they been written down? Again, it wasn't that hard for me to write down my thoughts in a way that's understandable to more than just "insiders".

My impression is that the ideas in your strategic posts are mostly ones already considered by Eliezer, Carl, Bostrom, and perhaps other people, though the three of them could confirm or deny this. And there's a lot more that hasn't been written down. E.g. whenever we have strategy meetings at MIRI, we very quickly find ourselves discussing arguments that people like Eliezer and Carl have known about for 5+ years but which haven't been written down anywhere that I know of.

Why haven't these things been written down? Because there's so much to write up, because other projects often seem like higher value, because sometimes people are hesitant to publish half-baked reasoning under their own name… lots of reasons. I've been passionate about getting things written up since I arrived in April 2011, and tried lots of different things to get the people who know this stuff to write it down, but it's been a very slow process with many detours, and in the end I've had to do quite a bit of it myself. Beckstead's thesis helps. Bostrom's book will help, too.

Wei Dai

Is it just me, or is the situation of Eliezer and Carl having thought of all of these things but never written them down anywhere crazy? If Eliezer and Carl are unwilling or unable to write down their ideas, then the rest of us have no choice but to try to do strategy work ourselves even if we have to retrace a lot of their steps. The alternative is for us to go through the Singularity with only two or three people having thought deeply about how best to make it turn out well. It's hard to imagine getting a good outcome while the world is simultaneously that crazy.

I guess my suggestion to you is that if you agree with me that we need a vibrant community of talented people studying and openly debating what is the best strategy for achieving a positive Singularity, then MIRI ought to be putting more effort into this goal. If it encounters problems like Eliezer and Carl being too slow to write down their ideas, then it should make a greater effort to solve such problems or to work around them, like encouraging independent outside work, holding workshops to attract more attention to strategic problems, or trying to convince specific individuals to turn their attention to strategy.

Since you're in town in a few days, Eliezer suggested that we three chat about this when you get here. See you soon!

While I look forward to talking to Eliezer and you, I do have a concern, namely that I find Eliezer to be much better (either more natively talented, or more practiced, probably both) than I am at making arguments in real time, while I tend to be better able to hold my own in offline formats like email/blog discussions where I can take my time to figure out what points I want to make. So keep that in mind if the chat ends up being really one-sided.

lukeprog

Is it just me, or is the situation of Eliezer and Carl having thought of all of these things but never written them down anywhere crazy?

You're preaching to the choir, here...

And you might be underestimating how many different things I tried, to encourage various experts to write things up at a faster pace.

As for why we aren't spending more resources on strategy work, I refer you to all my previous links and points about that in this thread. Perhaps there are specific parts of my case that you don't find compelling?

Wei Dai

You're preaching to the choir, here...

But here's what you said in the 2013 strategy post:

But it’s not clear that additional expository work is of high value after (1) the expository work MIRI and others have done so far, (2) Sotala & Yampolskiy’s forthcoming survey article on proposals for handling AI risk, and (3) Bostrom’s forthcoming book on machine superintelligence. Thus, we decided to not invest much in expository research in 2013.

Seems a bit inconsistent?

Perhaps there are specific parts of my case that you don't find compelling?

Yes, again quoting from that strategy post:

Valuable strategic research on AI risk reduction is difficult to purchase. Very few people have the degree of domain knowledge and analytic ability to contribute. Moreover, it’s difficult for others to “catch up,” because most of the analysis that has been done hasn’t been written up clearly. (Bostrom’s book should help with that, though.)

My point is that publicly available valuable strategic research on AI risk reduction isn't that difficult to purchase, and that's what we need. All that information locked up in Eliezer and Carl's heads isn't doing much good as far as contributing to the building of a vibrant research community on Singularity strategies. (I would argue it's not even very good for guiding MIRI's own strategy since it's not available for external review/vetting.) To create new publicly available strategic research, we don't need people to catch up to their level, just to catch up to whatever is publicly available now. (Note that you're wrongly discouraging people from doing strategy research by saying that they need to catch up to insiders' unpublished knowledge when they really don't.) The fact that you've tried many different things and failed to get them to write stuff down faster argues more strongly for this, since it means we can't expect them to write much stuff down in the foreseeable future.

Math research can get academic “traction” more easily than strategic research can.

I don't see a compelling argument that getting academic traction for FAI math research is of net positive impact and of similar magnitude compared to getting academic traction for strategic research, so the fact that it's easier to do isn't a compelling argument for preferring it over strategic research.

lukeprog

Seems a bit inconsistent?

Ah. Yes.

What I should have written is: "it's not clear that additional expository work, of the kind we can easily purchase, is of high value..." (I've changed the text now.) What I had in mind, there, is the very basic stuff that is relatively easy to purchase because 20+ people can write it, and some of them are available and willing to help us out on the cheap. But like I say in the post, I'm not sure that additional exposition on the super-basics is of high value after the stuff we've already done and Bostrom's book. (BTW, we've got another super-basics ebook coming out soon that we paid Stuart Armstrong to write, but IIRC that was written before the strategy post.)

I don't remember this far back, but my guess is that I left out the clarification so as to avoid "death by a thousand qualifications and clarifications." But looking back, it does seem like a clarification that should have gone in anyway, so I'm sorry about any confusion caused by its absence.

See, explaining is hard. :)

My point is that publicly available valuable strategic research on AI risk reduction isn't that difficult to purchase, and that's what we need

Turning completed but publicly unavailable strategic research into publicly available strategic research is very difficult to purchase. I tried, many many times. Paying Eliezer and Carl more would not cause them to write things up any faster. Paying people to talk to them and write up notes mostly didn't work, unless the person writing it up was me, but I was busy running the organization. I think there are people who could schedule a 1-hour chat with Eliezer or Carl, take a bunch of notes, and then write up something good, but those people are rarer than you might expect, and so skilled that they're already busy doing other high value work, like Nick Beckstead at FHI.

In any case, "turning completed but publicly unavailable strategic research into publicly available strategic research" is what I was calling "expository research" in that post, not what I was calling "strategic research."

I should also remind everyone reading this that it's not as if Eliezer & Carl have been sitting around doing nothing instead of writing up their strategy knowledge. First, they've done some strategy writing. Second, they've been doing other high-value work. Right now we're talking about, and elevating the apparent importance of, strategy exposition. But if we were having a conversation about the importance of community-building and fundraising, it might feel obvious that it was better for Eliezer to spend some time this summer to write more HPMoR rather than write up strategy exposition. "HPMoR" is now the single most common answer I get when I ask newly useful people (e.g. donors, workshop participants) how they found MIRI and came to care about its work.

Note that you're wrongly discouraging people from doing strategy research by saying that they need to catch up to insiders' unpublished knowledge when they really don't.

What makes you say that? I believe you can reinvent much of what Eliezer and Carl and Bostrom and a few others already know but haven't written down. Not sure that's true for almost everyone else.

Still, you're right that I don't want to discourage people from doing strategy work. There are places people can contribute to the cutting edge of our strategic understanding without needing to rediscover what the experts have already discovered (see one example below).

The fact that you've tried many different things and failed to get them to write stuff down faster argues more strongly for this, since it means we can't expect them to write much stuff down in the foreseeable future.

I'm not so sure. I mean, the work is getting out there, in MIRI blog posts, in stuff that FHI is writing, etc. - it's just not coming out as quickly as any of us would like. There's enough out there already such that people could contribute to the cutting edge of our understanding if they wanted to, and had the ability and resources to do so. E.g. Eliezer's IEM write-up + Katja's tech report describe in great detail what data could be collected and organized to improve our understanding of IEM, and Katja has more notes she could send along to anyone who wanted to do this and asked for her advice.

I don't see a compelling argument that getting academic traction for FAI math research is of net positive impact and of similar magnitude compared to getting academic traction for strategic research, so the fact that it's easier to do isn't a compelling argument for preferring it over strategic research.

Right, this seems to go back to that other disagreement that we're meeting about when you arrive.

Benya

Note that you're wrongly discouraging people from doing strategy research by saying that they need to catch up to insiders' unpublished knowledge when they really don't.

What makes you say that? I believe you can reinvent much of what Eliezer and Carl and Bostrom and a few others already know but haven't written down. Not sure that's true for almost everyone else.

I read the idea as being that people rediscovering and writing up stuff that goes 5% towards what E/C/N have already figured out but haven't written down would be a net positive, and it's a bad idea to discourage this. It seems like there's something to that, to the degree that getting the existing stuff written up isn't an available option: increasing the level of publicly available strategic research could be useful even if the vast majority of it doesn't advance the state of the art, if it leads to many more people vetting it in the long run. I do think there is probably a tradeoff, where Eliezer &c might not be motivated to comment on other people's posts all that much, making it difficult to see what is the current state of the art and what are ideas that the poster just hasn't figured out the straightforward counter-arguments to. I don't know how to deal with that, but encouraging discussion that is high quality compared to currently publicly available strategy work still seems quite likely to be a net positive?

lukeprog

One way to accelerate the production of strategy exposition is to lower one's standards. It's much easier to sketch one's quick thoughts on an issue than it is to write a well-organized, clearly-expressed, well-referenced, reader-tested analysis (like When Will AI Be Created?), and this is often enough to provoke some productive debate (at least on Less Wrong). See e.g. Reply to Holden on Tool AI and Do Earths with slower economic growth have a better chance at FAI?.

So, in the next few days I'll post my "quick and dirty" thoughts on one strategic issue (IA and FAI) to LW Discussion, and see what comes of it.

Benya

Glad to hear that & looking forward to seeing how it works! I very much understand that one might be concerned about posting "quick and dirty" thoughts (I find it so very difficult to lower my own standards even when it's obviously blocking me from getting stuff done), but there seems to be little cost of trying it with a Discussion post and seeing how it goes -- yay value of information! :-)

lukeprog

The experiment seems to have failed.

Benya

Drats. But also, yay, information! Thanks for trying this!

ETA: Worth noting that I found that post useful, though.

Kaj_Sotala

Ok, gotcha. Since you're in town in a few days, Eliezer suggested that we three chat about this when you get here. See you soon!

If any of you three have the time to write up a brief summary of that chat afterwards, I expect that a lot of people would be interested in reading that. (Where "a lot of people" = "I know that I would be, and generalizing from my own example obviously means that other people must be, too". :-))

NancyLebovitz

Would anyone care to compare the risks from lack of rationality to the risks from making as good an effort as possible, but just plain being wrong?

erratio

Are they relevantly different? Actually, now that I think about it, it seems that 'lack of rationality' should be a subset of 'trying hard and failing'.

NancyLebovitz

I think there's a difference between falling prey to one of the usual biases and just not having enough information.

roll

Of course, but one can lack information and conclude "okay, I don't have enough information", or one may fail to arrive at that conclusion due to overconfidence (for example).

James_Miller

Neurofeedback could play a role. I'm a few weeks into a program and will probably eventually write a top-level post on Neurofeedback, intelligence enhancement, and rationality.

Vladimir_Nesov

Sounds irrelevant.

James_Miller

From here:

"After the EEG data are obtained...the neurotherapist can often tell [the client] why they have come for treatment without any additional information about their condition."

gwern

I'd believe that... about gross pathologies that could prompt outright therapy by the norms of ordinary society.

James_Miller

Or about stuff like anxiety, sleeplessness, and difficulty concentrating. I'm hoping neurofeedback will make me "better than well."

Jack R

How'd this go? Just searched LW for "neurofeedback" since I recently learned about it

James_Miller

I stopped doing it years ago. At the time I thought it reduced my level of anxiety. My guess now is that it probably did but I'm uncertain if the effect was placebo.

Alex_Altair

I am very interested in this.

roll

That's an interesting question. Eliezer has said on multiple occasions that most AI researchers now are lunatics, and he is probably correct; how would an outsider distinguish a Friendly AI team from the rest? Concern with safety alone is a poor indicator of sanity; many insane people are obsessed with the safety of foods, medications, air travel, safety from the government, etc.

djcb

The self-improving AI will not suddenly appear; I would expect a number of stages of increasingly powerful sub-self-improving AIs with a decreasing amount of direct human interaction. The key would be to use formal methods and theorem proving, and to ensure that each stage can be formally verified by the stage below it.

Since even formal proofs / theorem provers could contain bugs, using parallel teams (as Gwern mentions) can reduce that risk.

The FAI (or UFAI) level seems much too advanced for any human to comprehend directly, let alone understand its friendliness.

CasioTheSane

To play devil's advocate, what if it's irrational given the current available information to think that self-improving AI can be developed in the near future, or is more important to work on than other existential risks?
