Raising safety-consciousness among AGI researchers

by lukeprog · 1 min read · 2nd Jun 2012 · 32 comments


Personal Blog

Series: How to Purchase AI Risk Reduction

Another method for purchasing AI risk reduction is to raise the safety-consciousness of researchers doing work related to AGI.

The Singularity Institute is conducting a study of scientists who decided to either (1) stop researching some topic after realizing it might be dangerous, or who (2) forked their career into advocacy, activism, ethics, etc. because they became concerned about the potential negative consequences of their work. From this historical inquiry we hope to learn some things about what causes scientists to become so concerned about the consequences of their work that they take action. Some of the examples we've found so far: Michael Michaud (resigned from SETI in part due to worries about the safety of trying to contact ET), Joseph Rotblat (resigned from the Manhattan Project before the end of the war due to concerns about the destructive impact of nuclear weapons), and Paul Berg (became part of a self-imposed moratorium on recombinant DNA back when it was still unknown how dangerous this new technology could be).

What else can be done?

Naturally, these efforts should be directed toward researchers who are both highly competent and whose work is very relevant to development toward AGI: researchers like Josh Tenenbaum, Shane Legg, and Henry Markram.


For example, Moshe Looks (head of Google's AGI team) is now quite safety conscious, and a Singularity Institute supporter.

Has he done anything to make the work of Google's AGI team less dangerous?

Update: please see here.

Given that I think "Google develops powerful AI" is much more likely than "SIAI develops powerful AI," I think this effort is a very good idea.

...Moshe Looks (head of Google's AGI team) is now quite safety conscious, and a Singularity Institute supporter.

A supporter? Interesting. In January he told me that he is merely aware of SIAI.

ETA He's the head of Google's AGI team? Did he say that?

Update: please see here.

Ohh, that's easily the area where you guys can do the most harm, by associating the safety concern with crankery, if you look like cranks but do not realize it.

Speaking of which, using complicated things you poorly understand is a surefire way to make it clear you don't know what you are talking about. It is great for impressing people who understand those things even more poorly, or who are very unconfident in their understanding, but it won't work on competent experts.

Simple example [of how not to promote beliefs]: the idea that Kolmogorov complexity or Solomonoff probability favours the many-worlds interpretation because it is 'more compact' [without having any 'observer']. Why wrong: if you are seeking the lowest-complexity description of your input, your theory also needs to somehow locate you within whatever stuff it generates (hence an appropriate discount for something really huge like MWI). Why stupid: because if you don't require that, then the iterator through all possible physical theories is the lowest-complexity 'explanation' and we're back to square one. How it affects other people's opinion of your relevance: very negatively, for me. edit: To clarify, the argument is bad, and I'm not even getting into details such as non-computability, our inability to represent theories in the most compact manner (so we are likely to pick not the most probable theory but the one we can compress more easily), machine/language dependence, etc.

edit: Another issue: there was the mistake with phases in the interferometer. A minor mistake, maybe (or maybe the i was confused with a phase of 180°, in which case it is a major misunderstanding). But it is one that people who refrain from talking about topics they don't understand are exceedingly unlikely to make (it's precisely the thing you double-check). Not being sloppy with MWI, Kolmogorov complexity, etc. is easy: you just need to study what others have concluded. Not being sloppy with AI is a lot harder. Being less biased won't, in itself, make you significantly less sloppy.
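The accounting in the argument above can be sketched in a few lines. This is a toy illustration only, not real Solomonoff induction (which is uncomputable); the "program texts" and their lengths are hypothetical stand-ins, chosen purely to show why the observer-locating term can't be dropped from the comparison.

```python
# Toy sketch: a candidate 'explanation' only counts if its total program
# reproduces the observer's subjective input, so its cost is the
# world-generating code PLUS the code that locates the observer inside
# whatever the world-generator outputs.

def total_description_length(theory_code: str, locator_code: str) -> int:
    """Cost of a candidate explanation: generate the world, then pick out
    the observer's subjective input within it."""
    return len(theory_code) + len(locator_code)

# Hypothetical stand-ins for program texts (lengths are illustrative only).
compact_theory = "run_schroedinger_equation()"           # short world-generator
big_locator    = "search_wavefunction_for(observer)"     # costly locating step

# If the locator term is dropped, the degenerate 'iterate over all possible
# physical theories' program beats any specific theory outright -- which
# shows the comparison without that term was ill-posed.
iterate_everything = "for p in all_programs(): run(p)"
assert len(iterate_everything) < total_description_length(compact_theory,
                                                          big_locator)
```

The point of the final assertion is only that *some* trivially short enumerator undercuts any theory-plus-nothing comparison; requiring the program to single out your input restores a meaningful contest.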

if you are seeking the lowest-complexity description of your input, your theory also needs to somehow locate you within whatever stuff it generates (hence an appropriate discount for something really huge like MWI)

It seems to me that such a discount exists in all interpretations (at least those that don't successfully predict measurement outcomes beyond predicting their QM probability distributions). In Copenhagen, locating yourself corresponds to specifying random outcomes for all collapse events. In hidden variables theories, locating yourself corresponds to picking arbitrary boundary conditions for the hidden variables. Since MWI doesn't need to specify the mechanism for the collapse or hidden variables, it's still strictly simpler.

Well, the goal is to predict your personal observations; in MWI you have a huge wavefunction on which you need to somehow select the subjective you. The predictor will need code for this, whether you call it a mechanism or not. Furthermore, you need to actually derive the Born probabilities from some first principles if you want to make a case for MWI. Deriving those is what would be interesting, and what would actually make things more compact (if the stuff you're adding as extra 'first principles' is smaller than collapse). Also, by the way, CI doesn't have any actual mechanism for collapse either; it's strictly a very un-physical trick.

Much more interestingly, Solomonoff probability hints that one should really search for something that would predict beyond probability distributions, i.e. search for objective collapse of some kind. Another issue: QM actually has a problem at macroscopic scale; it doesn't add up to general relativity (without nasty hacks), so we are matter-of-factly missing something, and this whole issue is really a silly argument over nothing, as what we have is just a calculation rule that happens to work but that we know is wrong somewhere anyway. I think that's the majority opinion on the issue. Postulating a zillion worlds based on a model known to be broken would be a tad silly. I think most physicists believe neither in collapse as in CI (beyond believing it's a trick that works) nor in many worlds, because forming either belief would be wrong.

Much more interestingly, Solomonoff probability hints that one should really search for something that would predict beyond probability distributions, i.e. search for objective collapse of some kind.

We face logical uncertainty here. We do not know if there is a theory of objective collapse that describes our current universe more compactly than MWI or random collapse does. I am inclined to believe that the answer is "no". This issue seems very subtle, and differences on it do not seem clear enough to damn an entire organization.

because forming either belief would be wrong.

This is not really a Bayesian standard of evidence. Do you also believe that, in a Bayesian sense, it is wrong to believe those theories?

I don't really know Solomonoff induction or MWI on a formal level, but... If I know that the universe seems to obey rule X everywhere, and I know what my local environment is like and how applying rule X to that local environment would affect it, isn't that enough? Why would I need to include in my model a copy of the entire wavefunction that made up the universe, if having a model of my local environment is enough to predict how my local environment behaves? In other words, I don't need to spend a lot of effort selecting the subjective me, because my model is small enough to mostly only include the subjective me in the first place.

(I acknowledge that I don't know these topics well, and might just be talking nonsense.)

I don't really know Solomonoff induction or MWI on a formal level

You know more about it than most of the people talking of it: you know you don't know it. They don't. That is the chief difference. (I also don't know it all that well, but at least I can look at the argument that it favours something, and see if it favours the iterator over all possible worlds even more)

If I know that the universe seems to obey rule X everywhere, and I know what my local environment is like and how applying rule X to that local environment would affect it, isn't that enough?

Formally, there's no distinction between rules you know and the environment. You are to construct the shortest self-contained piece of code that will predict the experiment. You will have to include any local environment data as well.

If you follow this approach to its logical end, you get the Copenhagen Interpretation in its shut-up-and-calculate form: you don't need to predict all the outcomes that you'll never see. So you are on the right track.

It doesn't take any extra code to predict all the outcomes that you'll never see, just extra space/time. But those are not the minimized quantity. In fact, predicting all the outcomes that you'll never see is exactly the sort of wasteful space/time usage that programmers engage in when they want to minimize code length: it's hard to write code telling your processor to abandon certain threads of computation when they are no longer relevant.

You missed the point. You need code for picking the outcome that you saw out of the outcomes that you didn't see, if you calculated those. It does take extra code to predict the outcome you did see if you actually calculated the extra outcomes you didn't see. And then it's hard to tell which would require less code: one piece of code is not a subset of the other, and the difference likely depends on the encoding of programs.
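The disagreement in this exchange can be made concrete with a toy contrast. The sketch below is hypothetical (not a real physics simulation): it shows two programs that predict the same observation, where computing the unseen branches costs only runtime, while the step that selects the observed branch out of the computed ones is the additional code the reply is pointing at.

```python
import random

def predict_pruned(rng: random.Random) -> int:
    """'Collapse-style' program: compute only the outcome that occurs."""
    return rng.randrange(2)

def predict_all_branches(rng: random.Random) -> int:
    """'All-branches' program: compute every outcome, then locate the
    observed one. Enumerating branches costs space/time, not code length;
    the final selection line is the extra code the argument is about."""
    branches = [outcome for outcome in range(2)]   # compute all outcomes
    return branches[rng.randrange(len(branches))]  # extra code: pick yours

# Given the same source of randomness, both programs predict the same
# observation -- they differ in program text, not in what they output.
rng_a, rng_b = random.Random(0), random.Random(0)
assert predict_pruned(rng_a) == predict_all_branches(rng_b)
```

Which version is shorter overall is exactly the point under dispute: neither program's text is a subset of the other's, so the comparison depends on how programs are encoded.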

The problem of locating "the subjective you" seems to me to have two parts: first, to locate a world, and second, to locate an observer in that world. For the first part, see the grandparent; the second part seems to me to be the same across interpretations.

The point is, the code of a theory has to produce output matching your personal subjective input. The objective view doesn't suffice (and if you drop that requirement, you are back to square one, because you can iterate over all physical theories). CI has that as part of the theory; MWI doesn't, so you need extra code.

The complexity argument for MWI that was presented doesn't favour MWI, it favours iteration over all possible physical theories, because that key requirement was omitted.

And my original point is not that MWI is false, or that MWI has higher complexity, or equal complexity. My point is that the argument is flawed. I don't care whether MWI is false or true; I am using the argument for MWI as an example of the sloppiness SI should try not to have (hopefully, without this kind of sloppiness they will also be far less sure that AIs are so dangerous).

Most of this seems unrelated to what the OP says. Are you sure you posted this in the right place?

In my opinion, instead of trying to spread awareness broadly, SI should focus on persuading/learning from just a few AI researchers who are most sympathetic to its current position. Those researchers will be able to inform their position and tell them how to persuade the rest most effectively.

Why wrong: if you are seeking the lowest-complexity description of your input, your theory also needs to somehow locate you within whatever stuff it generates (hence an appropriate discount for something really huge like MWI). Why stupid: because if you don't require that, then the iterator through all possible physical theories is the lowest-complexity 'explanation' and we're back to square one.

Are you saying that MWI proponents need to explain why observers like themselves are more likely across the entire wavefunction than other MWI-possible observers? That's an interesting perspective, and a question I'd like to see addressed by smart people.

That too.

My main point, though, is that you can't dispose of the code for generating the subjective view, complete with some code for collapsing the observer (and subsequently collapsing the stuff entangled with the observer). The 'objective' viewpoint doesn't suffice. It does not suffice to output something from which an intelligent observer can figure out the rest. With Solomonoff induction you are to predict your input, not some 'objective' something; if you drop that requirement, the whole thing falls apart. It is unclear whether the shortest subjective-experience-generating code on top of MWI would be simpler than what you have in CI, or even distinct from it.

I agree that MWI doesn't help much in explaining our sensory strings in a Solomonoff induction framework, relative to "compute the wave function, sample experiences according to some anthropic rule and weighted by squared amplitude." This argument is known somewhat widely around here; e.g., see this Less Wrong post by Paul Christiano, under "Born probabilities," and discussions of MWI and anthropic reasoning going back to the 1990s (on the everything-list, in Nick Bostrom's dissertation, etc.).

MWI would help in Solomonoff induction if there was some way of deriving the Born probabilities directly from the theory. Thus Eliezer's praise of Robin Hanson's mangled worlds idea. But at the moment there is no well-supported account of that type, as Eliezer admitted.

It's also worth distinguishing between complexity of physical laws, and anthropic penalties. Accounts of the complexity/prior of anthropic theories and measures to use in cosmology are more contested than simplicity of physical law. The Solomonoff prior implies some contested views about measure.

I don't know how many LessWrongers knew what AGI meant. (Apparently it's artificial general intelligence, aka Strong AI).

I don't know how many LessWrongers knew what AGI meant.

Greater than 90%.

Just looking at Wikipedia, "artificial general intelligence" redirects to Strong AI.

I'm concerned that there's no mention of dangers, risks, or caution in the Wikipedia article.* Is there any "notable" information on the topic that could be added to the article? E.g. discussion of the subject in a publication of some kind (book or magazine/newspaper - not a self-published book).

*haven't read the whole thing - just did a search.

http://singinst.org/upload/artificial-intelligence-risk.pdf appears to have been published in Global Catastrophic Risks, by Oxford University Press.

I have made the addition you suggested; this is a good time for suggestions or improvements...

[This comment is no longer endorsed by its author]

Propaganda - to encourage competitors to slow down.

However, is there a good reason to think that such propaganda would be effective?

IMO, a more obvious approach would be to go directly for public opinion.

Would negative public opinion do much more than (a) force such research underground, or (b) lead to researchers being more circumspect?

(Not a rhetorical question - just unsure whether focusing on public opinion is a useful approach.)

Organisations seem more likely to take advice from their customers than their competitors.

Terminator, Matrix (and soon Robopocalypse) have already had a good go at public opinion on AI, though.