Safety Culture and the Marginal Effect of a Dollar

by jimrandomh · 3 min read · 9th Jun 2011 · 110 comments



We spent an evening at last week's Rationality Minicamp brainstorming strategies for reducing existential risk from Unfriendly AI, and for estimating their marginal benefit-per-dollar. To summarize the issue briefly, there is a lot of research into artificial general intelligence (AGI) going on, but very few AI researchers take safety seriously; if someone succeeds in making an AGI, but they don't take safety seriously or they aren't careful enough, then it might become very powerful very quickly and be a threat to humanity. The best way to prevent this from happening is to promote a safety culture - that is, to convince as many artificial intelligence researchers as possible to think about safety so that if they make a breakthrough, they won't do something stupid.

We came up with a concrete (albeit greatly oversimplified) model which suggests that the marginal reduction in existential risk per dollar, when pursuing this strategy, is extremely high. The model is this: assume that if an AI is created, it's because one researcher, chosen at random from the pool of all researchers, has the key insight; and humanity survives if and only if that researcher is careful and takes safety seriously. In this model, the goal is to convince as many researchers as possible to take safety seriously. So the question is: how many researchers can we convince, per dollar? Some people are very easy to convince - some blog posts are enough. Those people are convinced already. Some people are very hard to convince - they won't take safety seriously unless someone who really cares about it will be their friend for years. In between, there are a lot of people who are currently unconvinced, but would be convinced if there were lots of good research papers about safety in machine learning and computer science journals, by lots of different authors.

Right now, those articles don't exist; we need to write them. And it turns out that neither the Singularity Institute nor any other organization has the resources - staff, expertise, and money to hire grad students - to produce very much research or to substantially alter the research culture. We are very far from the realm of diminishing returns. Let's make this model quantitative.

Let A be the probability that an AI will be created; let R be the fraction of researchers that would be convinced to take safety seriously if there were 100 good papers about it in the right journals; and let C be the cost of one really good research paper. Then the marginal reduction in existential risk per dollar is A*R/(100*C). The total cost of a grad student-year (including recruiting, management and other expenses) is about $100k; take that as the cost of one paper. Estimate a 10% current AI risk, and estimate that 30% of researchers currently don't take safety seriously but would be convinced. That gives us a marginal existential risk reduction per dollar of 0.1*0.3/(100*100k) = 3*10^-9. Counting only the ~7 billion people alive today, and not any of the people who will be born in the future, this comes to a little over twenty expected lives saved per dollar.
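As a rough check on the arithmetic, here is a minimal sketch of the model in Python. The function and variable names are illustrative and not from the original discussion, and it assumes that one good paper costs one grad student-year.

```python
def risk_reduction_per_dollar(p_ai_risk, frac_convinced, papers_needed, cost_per_paper):
    """Marginal reduction in existential risk per dollar under the simplified model.

    Assumes spending papers_needed * cost_per_paper dollars convinces the
    fraction frac_convinced of researchers, and that the benefit is linear
    in dollars spent.
    """
    total_cost = papers_needed * cost_per_paper
    return p_ai_risk * frac_convinced / total_cost


A = 0.10            # estimated current AI risk
R = 0.30            # fraction of researchers who would be convinced
PAPERS = 100        # number of good papers needed
C = 100_000         # dollars per paper (assumed: one grad student-year)
POPULATION = 7e9    # people alive today, approximately

per_dollar = risk_reduction_per_dollar(A, R, PAPERS, C)
lives_per_dollar = per_dollar * POPULATION

print(f"risk reduction per dollar: {per_dollar:.1e}")  # 3.0e-09
print(f"expected lives saved per dollar: {lives_per_dollar:.1f}")
```

Changing any single input by a factor of ten changes the result by the same factor, which makes it easy to see how much each estimate matters.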

That's huge. Enormous. So enormous that I'm instantly suspicious of the model, actually, so let's take note of some of the things it leaves out. First, the "one researcher at random determines the fate of humanity" part glosses over the fact that research is done in groups; but it's not clear whether adding in this detail should make us adjust the estimate up or down. It ignores all the time we have between now and the creation of the first AI, during which a safety culture might arise without intervention; but it's also easier to influence the culture now, while the field is still young, rather than later. In order for promoting AI research safety to not be an extraordinarily good deal for philanthropists, there would have to be at least an additional 10^3 penalty somewhere, and I can't find one.

As a result of this calculation, I will be thinking and writing about AI safety, attempting to convince others of its importance, and, in the moderately probable event that I become very rich, donating money to the SIAI so that they can pay others to do the same.



It worries me a tad that nobody in the discussion group corrected what I consider to be the obvious basic inaccuracy of the model.

Success on FAI is not a magical result of a researcher caring about safety. The researcher who would have otherwise first created AGI does not gain the power to create FAI just by being concerned about it. They would have to develop a stably self-improving AI which learned an understandable goal system which actually did what they wanted. This could be a completely different set of design technologies than what would have gone into something unstable that improved itself by ad-hoc methods well enough to go FOOM and end the game. The researcher who would have otherwise created AGI might not be good enough to do this. The best you might be able to convince them to do would be to retire from the game. It's a lot harder to convince someone to abandon the incredibly good idea they're enthusiastic about, and start over from scratch or leave the game, than to persuade people to be "concerned about safety", which is really cheap (you just put on a look of grave concern).

If I thought all you had to do to win was to convince the otherwise-first cre... (read more)

The main advantage of convincing mainstream AI people that FAI is a problem worth worrying about appears to be not that you will have mainstream AI people thinking twice before they build their AGI, but that you will then have mainstream AI people working on FAI. More people working on a given problem seems to make it massively more likely that the problem will be solved.

If there are rigorous arguments that FAI is worth worrying about, and that there are interesting questions about which people could be doing useful incremental research, then convincing people who work in universities to start doing this research has to be such a massive win that it would take something pretty huge to outweigh it - there are a lot of very clever people working in universities, massively more than will ever work at SingInst, and they already have a huge network in place to give them money to think about the things they find interesting.

7jimrandomh10yIndeed, all of this was discussed at the time, and these complexities do indeed make the model produce an overestimate. However, I really don't think the difference is whole orders of magnitude, and this is definitely wrong. While there is a great deal more that needs to be figured out in order for an AI to be friendly, much of it is research that academia could do, too, if only they thought it was worthwhile. I plan to write an article about just what "being safety conscious" would mean, but it's not "spending a few extra days on safety features before flipping the switch", it's more like handing the whole project over to friendliness experts and taking advantage of whatever friendliness research has been done up to that point. Those experts and that research need to exist, but I don't think those differences are on the margin of current existential risk reduction spending, since the limiting resource there isn't money.
3GuySrinivasan10yAfter reading Eliezer's comment and yours, I now think the "30%" figure for "being safety conscious" needs unpacking. In particular I think there's a tendency to picture the most safety conscious of the converts, and say the entire 30% looks like that, even though (for me at least) the intuitions which let 30% be plausible are based on researchers intellectually believing safety consciousness is very important rather than researchers taking actions as if safety consciousness is very important.
3Wei_Dai10yGuySrinivasan's comment seems to suggest that the estimated marginal effect of a dollar could be at least 2 orders of magnitude smaller if additional considerations are taken into account. See "down by a factor of 10" and "10%-50% that being safety conscious works".
4Dr_Manhattan10yI agree with you that we're stuck in the (arguably unpleasant) position of having to actually go ahead with the FAI as a project; still, academic persuasion might get you funds and some of the best brains for your project.
4CarlShulman10ySafety-speed tradeoffs, the systematic bias in "one randomly selected researcher," and AGI vs FAI difficulty were discussed at the time.
0[anonymous]10yI think you can infer from GuySrinivasan's comment that they did (unfortunately the evidence is presented in an overly cryptic way).
-4Perplexed10yThis comment seems to argue that "trying to assemble a team to solve basic FAI problems over however-many years and then afterward build FAI" is the real goal here and that "convincing someone to take something seriously" is barely worth thinking about. However, it certainly seems to me that convincing people to take the problem seriously is a productive (and perhaps an essential) first step toward assembling a team. However, reading the subtext in that comment, it certainly appears that the real fear expressed here is that if safety consciousness should become endemic in the AGI community, there is a real risk that someone else might produce FAI before Eliezer.
4Stuart_Armstrong10yThat makes no sense. If safety consciousness means that the AGI community is likely to produce FAI before Eliezer, then without safety consciousness, the AGI community is even more likely to produce UFAI before Eliezer produces FAI. Either way, Eliezer gets scooped; but in the second case, we're very dead.
3Nisan10yThat's hardly charitable.
0[anonymous]10yI disagree strongly, on several points.

  • That anyone would attempt to implement FAI with any definition similar to that of SIAI seems highly unlikely, regardless of safety concern.
  • That Eliezer would be upset if someone got it right before he did seems obviously absurd.
  • That there's a fear of safety consciousness being too good, rather than safety consciousness being a farce put on for grant applications and PR purposes, makes no sense.

Finally, the tone of your post, and almost any other post you have regarding the topic of FAI, provokes responses in me which seem out of proportion to what they should be.
7jimrandomh10yAs explicit reasoning, yes, that would be absurd. But we are all primates, and the thought of being overshadowed feels bad on a subconscious level, even if that feeling disagrees with conscious beliefs. "I will do it right and anyone else who tries will do it wrong" is an unlikely thing to believe, but a likely thing to alieve.

We spent an evening at last week's Rationality Minicamp... We came up with a concrete (albeit greatly oversimplified) model...

Just to be clear: this model was drafted by a couple of mini-camp participants, not by the workshop as a whole, and isn't advocated by the Singularity Institute. For example, when I do my own back-of-the-envelopes I don't expect nearly a 30% increase in existential safety from convincing 30% of AI researchers that risk matters. Among other things, this is because there's a distance between "realize risk matters" and "successfully avoid creating UFAI" (much less "create FAI"), since sanity and know-how also play roles in AI design; and partly because there are more players than just AI researchers.

Still, it is good to get explicit models out there where they can be critiqued -- I just want to avoid folks having the impression that this is SingInst's model, or that it was taught at minicamp.

I agree that there is a lot of room for more and better academic work on this topic to reduce existential risk (including other channels like more academic research into AI safety strategies, influence on other actors like large corporations and governments, etc), but as I said at the minicamp, I think the assumptions of this model systematically lead to overestimates of effectiveness of this channel (EDIT: and would lead to overestimates of other strategies as well, including the "FAI team in a basement" strategy as I mention in my comment below).

One of the primary reasons for concern about AI risk is the likelihood of tradeoffs between safety and speed of development. Commercial or military competition makes it plausible that quite extensive tradeoffs along these lines will be made, so that reckless (or self-deceived) projects are more likely to succeed first than more cautious ones. So the "random selection" assumption disproportionately favors safety.

The assumption that safety-conscious researchers always succeed in making any AI they produce safe is also fairly heroic and a substantial upward bias. There may be some cheap and simple safety measures that any ... (read more)

8whpearson10yWhat makes the SIAI team, that will be assembled, any different?
5CarlShulman10yI think many of the same assumptions also lead to overestimates of the success odds of an SIAI team in creating safe AI. In general, some features that I would think conduce to safety and could differ across scenarios include:

  • Internal institutions and social epistemology of a project that makes it possible to slow down, or even double back, upon discovering a powerful but overly risky design, rather than automatically barreling ahead because of social inertia or releasing the data so that others do the same
  • The relative role of different inputs, like researchers of different ability levels, abundant computing hardware, neuroscience data, etc, in designing AI; with some patterns of input favoring higher understanding by designers of the likely behavior of their systems
  • Dispersion of project success, i.e. the longer a period after finding the basis of a design in which one can expect other projects not to reach the same point; the history of nuclear weapons suggests that this can be modestly large (nukes were developed by the first five powers in 1945, 1949, 1952, 1960, 1964) under some development scenarios, although near-simultaneous development is also common in science and technology
  • The type of AI technology: whole brain emulation looks like it could be relatively less difficult to control initially by solving social coordination problems, without developing new technology, while de novo AGI architectures may vary hugely in the difficulty of specifying decision algorithms with needed precision

Some shifts along these dimensions do seem plausible given sufficient resources and priority for safety (and suggest, to me, that there is a large spectrum of safety investments to be made beyond simply caring about it).
1whpearson10yAnother factor to consider, the permeability of the team, how much they are likely to leak information to the outside world. However if the teams are completely impermeable then it becomes hard for external entities to evaluate the other factors for evaluating the project. Does SIAI have procedures/structures in place to shift funding between the internal team and more promising external teams if they happen to arise?
1CarlShulman10yMost potential funding exists in the donor cloud, which can reallocate resources easily enough; SIAI does not have large reserves or an endowment that would be encumbered by the nonprofit status. Ensuring that the donor cloud is sophisticated and well-informed contributes to that flexibility, but I'm not sure what other procedures you were thinking about. Formal criteria to identify more promising outside work to recommend?
0whpearson10yI think that might help. In this matter it all seems to be about trust.

  • People doing outside work have to trust that SIAI will look at their work and may be supportive. Without formal guidelines, they might suspect that their work will be judged subjectively and negatively due to potential conflict of interest due to funding.
  • SIAI also need to be trusted not to leak information from other projects as they evaluate them; having a formal, vetted, well-known evaluation team might help with that.
  • The donor cloud needs to trust SIAI to look at work and make a good decision about it, not just based on monkey instincts. Formal criteria might help instill that trust.

SIAI doesn't need all this now as there aren't any projects that need evaluating. However it is something to think about for the future.
1timtyler10yI don't think the SIAI has much experience writing code, or programming machine learning applications. Superficially, that makes them less likely to know what they are doing, and more likely to make mistakes and screw up.
4CarlShulman10yEliezer's FAI team currently consists of 2 people: himself and Marcello Herreshoff. Whatever its probability of success, most would seem to come from actually recruiting enough high-powered folk for a team. Certainly he thinks so, thus his focus on Overcoming Bias and then the rationality book as a tool to recruit a credible team. Sure, ceteris paribus, although coding errors seem less likely than architectural screwups to result in catastrophic harm rather than the AI not working.

It's hard for me to imagine 100 good papers on the subject of AI safety (as opposed to say, FAI design). Once you have 10 good papers with variations of "AGI is dangerous, please be careful!", what can you say in the 11th one that you haven't already said? Also, 100 papers all carrying the same basic message, all funded by the same organization... that seems a bit surreal.

ETA: Sorry, I'm being overly skeptical and nitpicking. On reflection I think something like this probably is a good idea and should be pursued (unless money is a constraint and someone can come up with better use for it).

ETA2: If someone has done serious thinking about the feasibility of convincing a substantial fraction of AGI researchers about the need for safety, by "publishing X good quality papers", could they please explain their thoughts in more detail? (My mind keeps changing about whether this is feasible or not.)

It's hard for me to imagine 100 good papers on the subject of AI safety (as opposed to say, FAI design). Once you have 10 good papers with variations of "AGI is dangerous, please be careful!", what can you say in the 11th one that you haven't already said?

There's a lot to say at one layer remove - things like stability analyses of particular strategies for implementing goal systems, general safety measures such as fake network interfaces, friendliness analyses of hypothetical programs, and so on. A paper can impart the idea that safety is important, without being directly about safety. (In fact, there's some reason to suspect that articles one layer removed may be better than articles that are directly about safety).

5CarlShulman10yThis seems right. One additional thing to note, however, is that while it looks quite likely that good papers lead to improvements at the margin, high-publicity bad work can harm a developing field's prospects and reputation, and thus outsiders' desire to affiliate with it. Robin Hanson emphasizes this point a lot.
2khafra10yCarl, are you saying that the non-SIAI-affiliated qualified academics among us should attempt to get high-publicity, bad papers published advocating anything-goes GAI design, without regard for safety?

No, for many reasons, including the following:

  • Such things are very likely to backfire, and moreso than they seem; we live in a world of substantial transparency, and dirty laundry gets found
  • Being the kind of people who would do such things would have bad effects and sabotage friendly cooperation with the very AI folk whose cooperation is so important
  • There is already a lot of stuff along these lines
  • Folk actually in a position to do such things would better use their limited time, reputation, and commitment on other projects
3timtyler10yMy impression is that the bridges are mostly burned there. For years, the SIAI has been campaigning against other projects, in the hope of denying them mindshare and funding. We have Yudkowsky saying: "And if Novamente should ever cross the finish line, we all die." and saying he will try to make various other AI projects "look merely stupid". I expect the SIAI looks to most others in the field like a secretive competing organisation, who likes to use negative marketing techniques. Implying that your rivals will destroy the world is an old marketing trick that takes us back to the Daisy Ad. This is not necessarily the kind of organisation one would want to affiliate with.

What is the status of the academic papers from the 2010 Singularity Research Challenge?

In general, there seems to have been substantial planning fallacy on the ease of getting skilled people to make progress on them via the Visiting Fellows program and other means. Versions of many of them have eventually come into being (as discussed below) but with great delays. And it seems that delivery of the planned reporting infrastructure failed badly. With respect to the individual papers:

Containing superintelligence led to this paper which was accepted for a subsequently-cancelled conference and is now seeking a venue, as well as (I believe) an accepted Singularity Hypothesis chapter by Daniel Dewey.

The WBE-AGI one has lagged, but is a submission to the JCS special issue on Chalmers' Singularity paper (by myself and Anders Sandberg), with presentations of the content at FHI, San Diego State University, and the AGI-11 workshop on the future of AI.

Collective Action Problems and AI Risk led to another Singularity Hypothesis submission.

AI risk philanthropy was taken on by an external author who never delivered, and subsequently had to be transferred to a different person who hasn't finished it yet.

There is an incarnation of the Singularity FAQ, and lukeprog, along with Anna Sala... (read more)

3steven046110yNote that that one wasn't actually funded. The ECAP paper is online.
2CarlShulman10yRight, it didn't get earmarked donations, only two papers were specifically funded in the challenge grant. In general, mostly people weren't interested in funding specific projects, and the challenge primarily went to general funds.

The model is this: assume that if an AI is created, it's because one researcher, chosen at random from the pool of all researchers, has the key insight; and humanity survives if and only if that researcher is careful and takes safety seriously.

A human alone can't build a superintelligence. So, companies and other organisations are what we should mostly be concerned with. Targeting the engineering talent with the message is probably the wrong approach - you mostly want the managers and directors, since they are more likely to be the ones who will dec... (read more)

Does anyone know of a historical example of a concerted effort to convince people in an academic discipline to pay attention to something, by funding a bunch of papers on or related to the topic?

If so, how well did it work?

1RichardKennaway10yI believe the tobacco companies tried this (and maybe they still do). How much difference it made I don't know.
6Benquo10yWere they trying to get people to pay attention to something that was neglected before? I thought they were just trying to sow confusion around the smoking-illness connection, which was already being studied.
2satt10yAs part of their efforts to kick up dirt around the smoking-illness link, they did fund some research to try building up fringe hypotheses (as opposed to knocking down mainstream hypotheses). They gave Hans Eysenck money to research the link between personality traits and cancer (with smoking as a possible mediator).

As a result of this calculation, I will be thinking and writing about AI safety, attempting to convince others of its importance, and, in the moderately probable event that I become very rich, donating money to the SIAI so that they can pay others to do the same.

Surely the most existential-risk-reduction-per-buck at this point is not "thinking and writing about AI safety", but thinking up more strategies like it in order to possibly find even better ones? Shouldn't SIAI (or perhaps FHI, depending on the comparative advantage between them) fund... (read more)

1timtyler10yThe number one point of comparison for safety regulations is the cryptography export regulations. I am pretty sceptical about something similar being attempted for machine intelligence. It is possible to imagine the export of smart robots to "bad" countries being banned - for fear that they will reverse-engineer their secrets - but not easy to imagine that anyone will bother. Machine intelligence will ultimately be more useful than cryptography was. It seems pretty difficult to imagine an effective ban. So far, I haven't seen any serious proposals to do that. Governments seem likely to continue promoting this kind of thing, not banning it.


  • If the first researcher with the key insight into general AI is really "safety conscious" we don't automatically get friendly AI first. That's a 10x reduction in marginal value from the original model.
  • Being "safety conscious" correctly is really hard and most of the 30% won't be safety conscious in the way we want, even though they "know" they should. That's another 30x reduction in marginal value from the original model.

One big penalty that was discussed is the likelihood of another researcher having the key insi... (read more)

0[anonymous]10yI don't understand this sentence. Please explain. What negative consequences?
0[anonymous]10yEdited. I would guess that "being safety conscious" isn't enough to guarantee good effects, and we only get some fraction of the benefit that an Ideal Safety Conscious Researcher would give. The negative consequences I was thinking of are in retrospect based on a silly error. Thanks for pointing those out!

After reading through the post and all the comments I think the most important moral is that a simple quantitative model thought up by very smart people in a context emphasizing rationality and examined and found lacking in significant sources of error (to the point that one of these smart people is willing to post it to Less Wrong main) can still ultimately be off by many orders of magnitude.

(Not to say that drafting a simple quantitative model isn't a great starting point, but instead that when interpreting such models one should assume that the margin of error is really really big, especially when pondering implications of the model, especially especially when pondering implications for decision policies.)

5timtyler10yIt is challenging to know what will help:

  • Maybe pointing at machine intelligence and shouting "DANGER!" and "WEAPON!" will just attract the attention of the military.
  • Maybe getting the safety-conscious teams to slow down will mean a greater chance of the unscrupulous teams getting there first.

This is one of my concerns about the SIAI. They seem to be enthusiastic about caution - but excessive caution in this area seems likely to increase the chances of an undesirable outcome - via the mechanism in the link - so they may be having a particularly negative impact.

The model is this: assume that if an AI is created, it's because one researcher, chosen at random from the pool of all researchers, has the key insight; and humanity survives if and only if that researcher is careful and takes safety seriously.

The "key insight" model seems deeply flawed. We know that the technical side of the problem involves performing inductive inference - which is a close cousin of stream compression. So, progress is very likely to look like progress with stream compression. Some low-hanging fruit - and then gradually diminishing returns. Rather like digging a big hole in the ground.

2timtyler10yHere's Bob Mottram making much the same point as I just made:
0GuySrinivasan10yHow confident should we be that general AI involves solely hard work on existing problems like performing inductive inference? I agree that if there are no more Key Insights, and instead just a bunch of insights that some researcher will eventually have, then most of the gains from the proposal can't be realized. Next steps: somehow estimate the probability that there are 0, 1, or several Key Insights remaining before general AI is "just" a matter of tons of hard research/experimentation, and estimate the gains from the 100-paper-strategy for the scenarios in which there are 0 or several Key Insights remaining.
0timtyler10yI didn't really claim that. There's also the whole issue of what utility function to use - and some other things as well - tree pruning strategies, for instance. Just that inductive inference is the key technology for the technical side of the problem - the part not to do with values. Much has been written about the link between induction and intelligence, by Hutter, Mahoney, and me.

"Estimate a 10% current AI risk"... wait, where did that come from? You say "Let A be the probability that an AI will be created", but actually your A is the probability that an AI will be created which then goes on to wipe out humanity unless precautions are taken, but which will also fail to wipe out humanity if the proper precautions are taken.

Your estimate for that is a whopping 10%? Without any sort of substantiating argument??
... Let's say I claim 0.000001% is a much more reasonable figure for this: what would be your rationale s... (read more)

Marginal taking-of-safety-seriously, as Eliezer points out, doesn't look good enough: you just delay the inevitable a little bit, if even that. On the other hand, establishing a widely-accepted consensus that AGI is as dangerous as A-bombs that blow up the whole universe might influence the field in more systematic ways (although it's unclear how, and achieving this goal doesn't look plausible).

3Stuart_Armstrong10yIf AGI is a long way away, then seeding a safety message to current and future grad students could influence the directions they take, and turn the field in the direction of higher safety. If AGI comes soon, then influencing people is much less useful, I agree.

Is there a body of knowledge about controlling self-modifying programs which could be used as a stepping stone to explaining what would be involved in FAI?

2timtyler10yPeople like me wrote self-modifying machine code programs back in the 1980s - but self-modification quickly went out of fashion. For one thing, you couldn't run from read-only storage. For another, it made your code difficult to maintain and debug. Self-modifying code never really came back into fashion. We do have programs writing other programs, though: refactoring, compilers, code-generating wizards and genetic programming.
1NancyLebovitz10yUntil people figure out how to create reliable self-modifying programs that have modest goals, I'm not going to worry about self-improving AI of any sort being likely any time soon. Perhaps the rational question is: How far are we from useful self-modifying programs?
0timtyler10ySelf-modifying programs seems like a bit of a red herring. Most likely groups of synthetic agents will become capable of improving the design of machine minds before individual machines can do that. So, you would then have a self-improving ecosystem of synthetic intelligent agents. This probably helps with the wirehead problem, and with any Gödel-like problems associated with a machine trying to understand its entire mind. Today, companies that work on editing/refactoring/lint etc tools are already using their own software to build the next generation of programming tools. There are still humans in the loop - but the march of automation is working on that gradually.
0Perplexed10yI agree that a multi-agent systems perspective is the most fruitful way of looking at the problem. And I agree that coalitions are far less susceptible to the pathologies that can arise with mono-maniacal goal systems. A coalition of agents is rational in a different, softer way than is a single unified agent. For example, it might split its charitable contributions among charities. Does that weaker kind of rationality mean that coalitions should be denigrated? I think not. To answer Nancy's question, there is a huge and growing body of knowledge about controlling multi-agent systems. Unfortunately, so far as I know, little of it deals with the scenario in which the agents are busily constructing more agents.
0timtyler10yThat does happen quite a bit in genetic and memetic algorithms - and artificial life systems.
0timtyler10yI checked with the Gates Foundation. 7549 grants and counting! It seems as though relatively united agents can split their charitable contributions too.
2GuySrinivasan10yA note, though... if I had a billion dollars and decided just to give it to whoever GiveWell recommended as their top-rated international charities, due to most charities' difficulty in converting significant extra funds into the same level of effect, I would end up giving 1+10+50+1+0.3+5=67.3 million to 6 different charities and then become confused at what to do with my 932.7 million dollars. I know the Gates Foundation does look like a coalition of agents rather than a single agent, but it doesn't look like a coalition of 7549+ agents. I'd guess at most about a dozen and probably fewer Large Components.
0timtyler10yTheir fact sheet says 24 billion dollars.
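As a quick check on the arithmetic in the comment above, here is a minimal sketch. The six amounts (in millions of dollars) and the one-billion budget are taken from the comment; the variable names are illustrative only.

```python
import math

# Per-charity amounts, in millions of dollars, as listed in the comment.
absorbable_millions = [1, 10, 50, 1, 0.3, 5]
budget_millions = 1000  # one billion dollars

total_given = sum(absorbable_millions)
left_over = budget_millions - total_given

print(f"given: {total_given:.1f}M, left over: {left_over:.1f}M")
```

Under these figures, less than 7% of the billion would be absorbed.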
0NancyLebovitz10yIs maintaining sufficient individuality likely to be a problem for the synthetic agents?
0timtyler10yOnly if they are built to want individuality. We will probably start off with collective systems - because if you have one agent, it is easy to make another one the same, whereas it is not easy to make an agent with a brain twice as big (unless you are trivially adding memory or something). So: collective systems are easier to get off the ground with - they are the ones we are likely to build first. You can see this in most data centres - they typically contain thousands of small machines, loosely linked together. Maybe they will ultimately find ways to plug their brains into each other and more comprehensively merge together - but that seems a bit further down the line.
0NancyLebovitz10yI was concerned that synthetic agents might become so similar to each other that the advantages of different points of view would get lost. You brought up the possibility that they might start out very similar to each other.
0timtyler10yIf they started out similar, such agents could still come to differ culturally. So, one might be a hardware expert, another might be a programmer, and another might be a tester, as a result of exposure to different environments. However, today we build computers of various sizes, optimised for various different applications - so probably more like that.
0NancyLebovitz10yThere's a limit to how similar people can be made to each other, but if there are efforts to optimize all the testers (for example), it could be a problem.
0timtyler10yWell, I doubt machines being too similar to each other will cause too many problems. The main case where that does cause problems is with resistance to pathogens - and let's hope we do a good job of designing most of those out of existence. Apart from that, being similar is usually a major plus point. It facilitates mass production, streamlined and simplified support, etc.
0asr10yYes. As tim points out below, the main thing that programmers are taught is "self-modifying programs are almost always more trouble than they're worth -- don't do it." My hunch is that self-modifying AI is far more likely to crash than it is to go FOOM, and that non-self-modifying AI (or AI that self-modifies in very limited ways) may do fairly well by comparison.
0Zetetic10yMy understanding was that the CEV approach is a meta-level approach to stable self improvement, aiming to design code that outputs what we would want an FAI's code to look like (or something like this). I could certainly be wrong of course, and I have very little to go on here, as the Knowability of FAI and CEV are both more vague than I would like (since, of course, the problems are still way open) and several years old, so I have to piece the picture together indirectly. If that interpretation is correct it seems (and I stress that I might be totally off base with this) that stable recursive self-improvement over time is not the biggest conceptual concern, but rather the biggest conceptual difficulty is determining how to derive a coherent goal set from a bunch of Bayesian utility maximizers equipped with each individual person's utility function (and how to extract each person's utility function), or something like that. A stable self-improving code would then (hopefully) be extrapolated by the resulting CEV, which is actually the initial dynamic.
0asr10yMy comment wasn't directed towards CEV at all -- CEV sounds like a sensible working definition of "friendly enough", and I agree that it's probably computationally hard. I was suggesting that any program, AI or no, that is coded to rewrite critical parts of itself in substantial ways is likely to go "splat", not "FOOM" -- to degenerate into something that doesn't work at all.
0[anonymous]10yThis sounds like decision theory stuff that Eliezer and others are trying to figure out.

...if there were a 100 good papers about it in the right journals;

Just one paper (AI safety or FAI design)...I will be very impressed. I will donate a minimum of $10 ($20 for a technical paper on FAI design) per peer-reviewed research paper per journal to the SIAI.

I doubt I'll have to donate even once within the next 50 years. But I would be happy to be proven wrong.

8CarlShulman10yThere are some of those in the works, but note that the Future of Humanity Institute converts funds into research papers on these topics as well (Nick Bostrom is working on an academic book now which pretty comprehensively summarizes the work of folk around SIAI). FHI accepts donations [], and estimates a cost of about $200k (USD, although currency swings may have changed this number) per 2 year postdoc, including travel, share of overhead and administrative costs, conferences, journal fees, etc. As part of Oxford, they have comparative advantage in hiring academics and lending prestige to the work. You can look at their research record on their website and assess things that way.
8steven046110yConverts funds, or converts marginal funds? I've been meaning to start the SIAI vs FHI conversation here in its own thread for some time, if people don't think it falls afoul of Common Interest of Many Causes.
7CarlShulman10yMarginal funds. FHI is funding-limited in its number of positions there. The marginal hires do not average Bostrom-level productivity (it's hard to get academics to pursue a research agenda other than one they were already working on), but you can look at the last several hires and average across them.
5steven046110yI don't know who counts as the last several hires, but while I'm sure everyone at FHI does fine work, only Bostrom and Sandberg seem to be doing research related to AI risks. Also Hanson, I suppose, to the extent that he counts as working at FHI. I don't dispute that some marginal funds would on expectation go to research on these topics, but surely it would be a lot less than half.

Much of the dispersion is caused by the lack of unrestricted funds (and lack of future funding guarantees). Since we don't have enough funding from private philanthropists, we have to chase academic funding pots, and that then forces us to do some work that is less relevant to the important problems we would rather be working on. It would be unfortunate if potential private funders then looked at the fact that we've done some less-relevant work as a reason not to give.

8steven046110yThank you for weighing in! Your point sounds valid. After taking it into account, if you considered marginal dollars donated to FHI without explicit earmarking, what is your estimate for the fraction of such dollars that end up causing a dollar's worth of research into topics that would be seen as highly relevant by someone with roughly SIAI-typical estimates for the future?

A high fraction. "A dollar's worth of research" is not a well-defined quantity - that is, the worth of the research produced by a dollar varies a lot depending on whom the dollar is given to. I like to think FHI is good at converting dollars into research. The kind of research I'd prefer to do with unrestricted funds at the moment probably coincides pretty well with what a person with SIAI-typical estimates would prefer, though what can be researched also depends on the capabilities and interests of the research staff one can recruit. (There are various tradeoffs here - e.g. hiring a weaker researcher who has a long record of working in this area, or taking a chance on a slightly stronger researcher and risking that she will do irrelevant work? Headhunting somebody who is already actively contributing to the area, or attempting to involve a new mind who would otherwise not have contributed? etc.)

There are also indirect effects, which might lead to the fraction being larger than one - for example, if discussions, conferences, and various kinds of influence encourage external researchers to enter the field. FHI does some of that, as does the SIAI.

4steven046110yThanks. When I said "a dollar's worth of research", I had in mind the estimate Carl mentioned of $200k per 2-year postdoc. I guess that doesn't affect the fraction question.
7CarlShulman10yThe details depend on how you count the methodology/general existential risks stuff, e.g. the "probing the improbable" paper by Ord, Sandberg, and Hillerbrand. Also note that many of Bostrom's and Sandberg's publications, including the catastrophic risks book, and events like the Winter Intelligence Conference benefit from help by other FHI staff. Still, some hires have definitely done essentially no existential risk-relevant work. My guess is something like 1 Sandberg or Ord equivalent per 2-3 hires (with differential attrition leading to accumulation of the good). Also, given earmarked funding they can create positions specifically for machine intelligence issues, the results of which are easier to track (the output of that person).
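To make the guess above concrete, a rough sketch: it combines the roughly $200k-per-postdoc figure quoted elsewhere in this thread with the estimate of one Sandberg or Ord equivalent per 2-3 hires. The function name is hypothetical, and the calculation deliberately ignores attrition, earmarking, and indirect effects.

```python
def cost_per_relevant_researcher(cost_per_hire: float, hires_per_equivalent: float) -> float:
    """Implied spend needed to obtain one highly relevant researcher."""
    return cost_per_hire * hires_per_equivalent

# Figure quoted in the thread: about $200k USD per two-year postdoc.
COST_PER_POSTDOC = 200_000

# The thread's guess: one Sandberg/Ord equivalent per 2 to 3 hires.
low = cost_per_relevant_researcher(COST_PER_POSTDOC, 2)
high = cost_per_relevant_researcher(COST_PER_POSTDOC, 3)

print(f"${low:,.0f} to ${high:,.0f} per two-year position")
```

On those assumptions, unearmarked funding implies somewhere between $400,000 and $600,000 per highly relevant two-year position.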
1steven046110yBut presumably that would only be a consideration if FHI received very large amounts of such earmarked funding?
7CarlShulman10y$200k USD for one postdoc. One could save up for that with a donor-advised fund alone or with others, or use something like
7jimrandomh10yComments like this are evidence that focus on getting papers into journals is important, relative to the amount of effort currently going into it.
3steven046110yAnd every time someone doesn't make a comment like this, it's evidence that such a focus is unimportant, so what makes you think it comes out one way rather than the other on net?
4handoflixue10yLessWrong seems significantly more likely than normal to produce vocal dissent ("I wouldn't find this useful") rather than silence. That said, LessWrong is probably also not the majority of AI researchers, who are the actual target audience, so using ourselves as a "test market" is probably flawed on a few levels...
0timtyler10yDoes this one count? It has had some peer review - and should be in the AGI-11 Conference Proceedings.

The model is this: assume that if an AI is created, it's because one researcher, chosen at random from the pool of all researchers, has the key insight; and humanity survives if and only if that researcher is careful and takes safety seriously.

I contest this use of the term "safety". If your goal is for humanity to survive, say that your goal is for humanity to survive. Not to "promote safety".

"Safety" means avoiding certain bad outcomes. By using the word "safety", you're trying to sneak past us the assumption...

2Will_Sawin10yWhat is value? What things are valuable, and what are not? Everything that we know about value, everything that we can know, is encoded within the current state of humanity. As long as that knowledge remains, there is hope for the Best Possible Future. It may be a future that includes no humans, but it will be a future based on that knowledge. If that knowledge is destroyed, or it loses power since it is no longer riding inside the dominant life form, then the future will be, morally, as chaos - as likely to eat babies as to love them. To figure out how we can contribute to the future, what should replace us, and so on, takes time. Time we do not have if we do not focus on safety first.
0nshepperd10yWell, our distant descendants, whether uploads or cyborgs or other life-forms, could be considered part of "generalized humanity", as long as they retain what humans have that is valuable. And regardless, we certainly want current humanity (that is, all the people alive now) to survive, in the sense of not being killed by the AI. My point being, it's not necessarily right to take "the survival of humanity" to mean that we have to retain this physical form, and I don't think the OP was using the words in that sense.
-2timtyler10yAgreed. People seem to get hold of the idea that humans are good, and machines are bad, and then get into an us vs them mindset. Surely all the best possible futures involve an engineered world, where the agony of being a meat brained human who was cobbled together by natural selection is mostly a distant memory.
0Will_Sawin10yBut we have to keep the humans around until humans are capable of engineering that world carefully and without screwing it up. If we don't engineer it, who will?
0timtyler10yRight. There are pretty good instrumental reasons for all the parties concerned to do that. Humans may also be useful for a while for rebooting the system - if there is a major setback. They have successfully booted things up once already. Other backup systems are likely to be less well tested.

humanity survives if and only if that researcher is careful and takes safety seriously

Here's where I'd stick in the 10^-3 penalty. It's reasonable to assume that taking safety seriously will keep you safe from accidental leaks of toxic chemicals, deadly viruses, etc. because these are well-understood phenomena that pose a single, predictable risk. If you can keep the muriatic acid off your skin, it won't burn you. If you can keep the swine flu out of your lungs, it won't infect you.

A truly general AI, though, almost by definition, would be able to thin...

0timtyler10yIt isn't likely to be you vs the superintelligence, though. People keep imagining that - and then wringing their hands. The restraints on intelligent agents while they are being developed and tested are likely to consist of a prison built by the last generation of intelligent agents, featuring them as guards.

You focus on visibly HAL-like or Skynet-like AI - the sort of thing that AI researchers produce as demos. However, we have large, smart, durable, existing entities (businesses and other computer+human teams) that are continuously getting smarter (and entrenching themselves deeper into our society) by automating their existing business practices.

I don't advocate trying to stop business automation, or humans organizing themselves into better and better teams; I think that would be throwing the baby out with the bathwater. However, I do think "business ...

4timtyler10yAutomation leads to a world where humans vote for government welfare for themselves. Governments then seem likely to compete with each other to attract corporations with low tax regimes, and get rid of their human burdens. This scenario is similar to the early parts of Manna. It leads to a world where humans are functionally redundant - though they may persist as a kind of parasitic organic layer on top of the machine world. Meanwhile, many humans seem likely to be memetically hijacked, potentially leading to fertility and population declines. That may be a slow process, though. Well, only around here. Other folk are looking at the effects of automation. Here's my overview: []