[Link] If we knew about all the ways an Intelligence Explosion could go wrong, would we be able to avoid them?

by the-citizen, 1 min read, 23rd Nov 2014, 15 comments


Personal Blog


I submitted this a while back to the lesswrong subreddit, but it occurs to me now that most LWers probably don't actually check the sub. So here it is again in case anyone that's interested didn't see it.


I would have preferred that you just copy and paste into LW. Why did you put a link?

As a sidenote, why is there a lesswrong subreddit?

Because some guy on reddit (now inactive) made the subreddit 4 years ago.

Edit: I'm serious, that's actually why it exists. u/therationalfuturist made the subreddit, hasn't posted in 4 years, was apparently a student at Yale. Most of the posts made from the account were quoting stuff people said on LW (4 years ago).

Because I don't yet have enough karma to post here.


Answer: clearly, no. If you know all the ways things can go wrong, but don't know how to make them go right, then your knowledge is useless for anything except worrying.

Thanks for the comment. I will reply as follows:

  • Knowing how things could go wrong gives useful knowledge about scenarios/pathways to avoid
  • Our knowledge of how to make things go right is not zero

My intention with the article is to draw attention to some broader non-technical difficulties in implementing FAI. One worrying theme in the responses I've gotten is a conflation between knowledge of AGI risk and building a FAI. I think they are separate projects, and that success of the second relies on comprehensive prior knowledge of the first. Apparently MIRI's approach doesn't really acknowledge the two as separate.

If you know all the ways things can go wrong, but don't know how to make them go right, then your knowledge is useless for anything except worrying.

May I recommend the concept of risk management to you? It's very useful.

It's generally easier to gain the knowledge of how to make things go right when your research is anchored by potential problems.

Eliezer has expressed that ultimately, the goal of MIRI is not just to research how to make FAI, but to be the ones to make it.

In many ways it's a race. While the public is squabbling, someone is going to build the first recursively self-improving system. We're trying to maneuver the situation so that the people who do it first are the people who know what they're doing.

Eliezer has expressed that ultimately, the goal of MIRI is not just to research how to make FAI, but to be the ones to make it.

Hmm... I wasn't aware of that. Is there any source for that statement? Is MIRI actually doing any general AI research? I don't think that you can easily jump from one specific field of AI research (ethics) to general AI research and design.

From here.

9. Is your pursuit of a theory of FAI similar to, say, Hutter's AIXI, which is intractable in practice but offers an interesting intuition pump for the implementers of AGI systems? Or do you intend on arriving at the actual blueprints for constructing such systems? I'm still not 100% certain of your goals at SIAI.

Definitely actual blueprint, but, on the way to an actual blueprint, you probably have to, as an intermediate step, construct intractable theories that tell you what you’re trying to do, and enable you to understand what’s going on when you’re trying to do something. If you want a precise, practical AI, you don’t get there by starting with an imprecise, impractical AI and going to a precise, practical AI. You start with a precise, impractical AI and go to a precise, practical AI. I probably should write that down somewhere else because it’s extremely important, and as(?) various people who will try to dispute it, and at the same time hopefully ought to be fairly obvious if you’re not motivated to arrive at a particular answer there. You don’t just run out and construct something imprecise because, yeah, sure, you’ll get some experimental observations out of that, but what are your experimental observations telling you? And one might say along the lines of ‘well, I won’t know that until I see it,’ and suppose that has been known to happen a certain number of times in history; just inventing the math has also happened a certain number of times in history.

We already have a very large body of experimental observations of various forms of imprecise AIs, both the domain specific types we have now, and the sort of imprecise AI constituted by human beings, and we already have a large body of experimental data, and eyeballing it... well, I’m not going to say it doesn’t help, but on the other hand, we already have this data and now there is this sort of math step in which we understand what exactly is going on; and then the further step of translating the math back into reality. It is the goal of the Singularity Institute to build a Friendly AI. That’s how the world gets saved, someone has to do it. A lot of people tend to think that this is going to require, like, a country’s worth of computing power or something like that, but that’s because the problem seems very difficult because they don’t understand it, so they imagine throwing something at it that seems very large and powerful and gives this big impression of force, which might be a country-size computing grid, or it might be a Manhattan Project where some computer scientists... but size matters not, as Yoda says.

What matters is understanding, and if the understanding is widespread enough, then someone is going to grab the understanding and use it to throw together the much simpler AI that does destroy the world, the one that’s built to much lower standards, so the model of ‘yes, you need the understanding, the understanding has to be concentrated within a group of people small enough that there is not one defector in the group who goes off and destroys the world, and then those people have to build an AI.’ If you condition on that the world got saved, and look back and within history, I expect that that is what happened in the majority of cases where a world anything like this one gets saved, and working back from there, they will have needed a precise theory, because otherwise they’re doomed. You can make mistakes and pull yourself up, even if you think you have a precise theory, but if you don’t have a precise theory then you’re completely doomed, or if you don’t think you have a precise theory then you’re completely doomed.


Aside from that, though, I think that saving the human species eventually comes down to, metaphorically speaking, nine people and a brain in a box in a basement, and everything else feeds into that. Publishing papers in academia feeds into either attracting attention that gets funding, or attracting people who read about the topic, not necessarily reading the papers directly even but just sort of raising the profile of the issues where intelligent people wonder what they can do with their lives think artificial intelligence...

I get the sense that Eliezer wants to be one of the nine people in that basement, if he can be, but I might be stretching the evidence a little to say "Eliezer has expressed that ultimately, the goal of MIRI is not just to research how to make FAI, but to be the ones to make it."

Thanks! I haven't seen that before. I still think it would be better to specialize in the ethics issue and then apply the results to an AGI system developed by another (hopefully friendly) party. But it would be awesome if someone who is genuinely ethical develops AGI first. I'm really hoping that some big organization that has gone furthest in AI research, like Google, decides to cooperate with MIRI on that issue when it reaches the critical point in AGI buildup.

This is something that I think is neglected (in part because it's not the relevant problem yet) in thinking about friendly AI. Even if we had solved all of the problems of stable goal systems, there could still be trouble, depending on whose goals are implemented. If it's a fast take-off, whoever cracks recursive self-improvement first basically gets godlike powers (in the form of a genie that reshapes the world according to their wish). They define the whole future of the expanding visible universe. There are a lot of institutions that I do not trust to have the foresight to think "We can create utopia beyond anyone's wildest dreams" and that I expect would instead default to "We'll skewer the competition in the next quarter."

However, there are unsubstantiated rumors that Google has taken some ex-MIRI people for work on a project of some kind.

I'm pretty new to this, although I've read Kurzweil's book and Bostrom's Superintelligence, and spent a couple of years mostly lurking on LW, so if there's a shitload of thinking about this I hope to be corrected civilly.

If friendly AI is to be not just a substitute for but our guardian against unfriendly AI, won't we end up thinking of all sorts of unfriendly AI tactics, and putting them into the friendly AI so it can anticipate and thwart them? If so, is there any chance of self-modification in the friendly AI turning all that against us? Ultimately, we'd count on the friendly AI itself trying to imagine and develop countermeasures against unfriendly AI tactics that are beyond our imagination, but then the same problem may arise.

I've been pondering for some time, especially prompted by the book Boyd: The Fighter Pilot Who Changed the Art of War by Robert Coram, how one might distinguish qualities of possible knowledge that make it more or less likely to be of general benefit to humankind. Conflict knowledge seems to have a general problem. It is often developed under the optimistic assumption that it will give "us" who are well-intentioned the ability to make everybody else behave -- or what is close to the same thing, it is developed under existential threat such that it is difficult to think a few years out -- we need it or the evil ones will annihilate us. Hence the US and the atom bomb from 1945-49. Note that this kind of situation also motivates some (who have anticipated where I'm going) to insist "We have this advantage today -- we probably won't have it a few years from now -- let's maximize our advantage while we have it" (i.e. bomb the hell out of the USSR in 1946).

Another kind of knowledge might be called value-added knowledge -- knowledge that disproves assumptions about economics being a zero-sum game. Better agriculture, house construction, health measures ... One can always come up with counterexamples and some are quite non-trivial -- the Internet facilitates formation of terrorist groups and other "echo chambers" of people with destructive or somehow non-benign belief systems. Maybe indeed media development falls in some middle ground between value-added and conflict-oriented knowledge. Almost anything that can be considered "beneficial to humankind" might just advantage one supremely evil person, but I still think we can meaningfully speak of its general tendency to be beneficial, while the tendency of conflict knowledge seems mostly in the long run to be neutral at best.

Boyd, while developing a radically new philosophy of war-fighting, got few rewards in the way of promotion and was always embattled in the military establishment. But he collected around him a few strong acolytes, and did really, if inadequately, affect the design of fighter planes and their tactics. His thinking grew more and more ambitious until it embraced the art of war generally, and Coram strongly suggests he was at the side of the planners of the first Gulf War, and had a huge impact on how that was waged.

Unfortunately, as the book was being written several years later, there was speculation that people like Al Qaeda had incorporated some of the lessons of new warfare doctrines developed by the US.

It is generally problematic to predict where knowledge construction is going -- because by definition we are making predictions about stuff the nature of which we don't understand because it hasn't been thought up yet -- yet it seems we had better try, and Moore's law gives one bit of encouragement. MIRI seems to be in part a huge exercise in this problematic sort of thinking.

If I have anything more than maybe "food for thought", it may be to look for general tendencies (perhaps unprovable tendencies like Moore's law) in the way kinds of knowledge affect conflict.