tl;dr An unconstrained search through possible future worlds is a dangerous way of choosing positive outcomes. Constrained, imperfect or under-optimised searches work better.

Some suggested methods for designing AI goals, or controlling AIs, involve unconstrained searches through possible future worlds. This post argues that this is a very dangerous thing to do, because of the risk of being tricked by "siren worlds" or "marketing worlds". The thought experiment starts with an AI designing a siren world to fool us, but that AI is not crucial to the argument: it's simply an intuition pump to show that siren worlds can exist. Once they exist, there is a non-zero chance of us being seduced by them during an unconstrained search, whatever the search criteria are. This is a feature of optimisation: satisficing and similar approaches don't have the same problems.

 

The AI builds the siren worlds

Imagine that you have a superintelligent AI that's not just badly programmed, or lethally indifferent, but actually evil. Of course, it has successfully concealed this fact, as "don't let humans think I'm evil" is a convergent instrumental goal for all AIs.

We've successfully constrained this evil AI in an Oracle-like fashion. We ask the AI to design future worlds and present them to human inspection, along with an implementation pathway to create those worlds. Then if we approve of those future worlds, the implementation pathway will cause them to exist (assume perfect deterministic implementation for the moment). The constraints we've programmed mean that the AI will do all these steps honestly. Its opportunity to do evil is limited exclusively to its choice of worlds to present to us.

The AI will attempt to design a siren world: a world that seems irresistibly attractive while concealing hideous negative features. If the human mind is hackable in the crude sense - maybe through a series of coloured flashes - then the AI would design the siren world to be subtly full of these hacks. It might be that there is some standard of "irresistibly attractive" that is actually irresistibly attractive: the siren world would be full of genuine sirens.

Even without those types of approaches, there's so much manipulation the AI could indulge in. I could imagine myself (and many people on Less Wrong) falling for the following approach:

First, the siren world looks complicated, wrong and scary - but with just a hint that there's something more to it. Something intriguing, something half-glimpsed, something making me want to dig deeper. And as I follow up this something, I see more patterns, and seem to gain a greater understanding. Not just of the world I'm looking at, but of the meaning of good itself. The world seems to confirm to me some of my ideas about what constitutes a worthwhile life - not just the ideas I've been able to articulate, but the ones I've only got a vague half-baked notion of, and the ones I didn't even know I had.

The longer I stare into this world, the greater an understanding I get of my own values. And this is just the starting point: the world subtly opens up new avenues of philosophical musings in my brain, the images I see triggering me to come up with my own insights and thought experiments as to the meaning of goodness - insights that are then carefully confirmed as I dig deeper. I could stay staring at this wonderful, beautiful and complex world for hours, days, years, gaining a deeper wisdom and understanding as I go, truly appreciating how the mysteries of this place unravel into new versions of worth and goodness. Every time I ever have a doubt about it, that doubt is systematically resolved into a new glorious confirmation of how much the AI really gets what I value, even before I knew how much I valued it.

Meanwhile, at some level of complexity safely beyond what my human mind will ever reach, the AI is hiding all the evil and unmitigated suffering.

Your siren world may differ from mine - you may be vulnerable to different tricks. Since people often believe they would be immune to such manipulation, feel free to imagine that the AI is fooling someone else - we know how stupid other people can be, right? - while you watch as a helpless bystander.

The problem is that a future universe is likely to be vast and complicated. When inspecting it, we have certain inspection criteria (IC). These consist of our values and preferences, but also the ways we would look into this universe, how we would follow up on initial impressions, various measures and yardsticks we might use to summarise the data we see, all the clever and devious plans we might come up with to ferret out "what's really going on". These IC are going to be considerably simpler than the totality of the future universe. So the AI's task is to optimise a universe design that passes the IC, while shoving in as much disutility as it can - which in a large universe, is a tremendous amount. Unless our IC are perfect and already include a good solution to the problem of value (in which case we've solved the friendliness problem already), a superintelligent AI will likely succeed at its task.
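To make the structure of that task concrete, here is a minimal sketch, treating the IC as a pass/fail filter and the AI's hidden goal as a disutility score; `candidate_worlds`, `passes_ic` and `disutility` are hypothetical placeholders rather than anything we know how to write:

```python
# Hypothetical sketch: the evil AI's task as a constrained optimisation.
# `candidate_worlds`, `passes_ic` and `disutility` are placeholders.
def design_siren_world(candidate_worlds, passes_ic, disutility):
    # The inspection criteria enter only as a filter; nothing in the
    # objective rewards the world actually being good.
    admissible = (w for w in candidate_worlds if passes_ic(w))
    return max(admissible, key=disutility)
```

The point of the sketch is only that the IC enter the problem purely as a constraint to be passed, not as something the optimisation is trying to serve.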

 

Siren and marketing worlds without builders

The above thought experiment needed a superintelligent evil AI for the design of the siren world. But if we admit that that is possible, we don't actually need the AI any more. The siren worlds exist: there are potential worlds of extreme disutility that satisfy our IC. If we simply did an unconstrained search across all possible future worlds (something like the search in Paul Christiano's indirect normativity - an idea that inspired the siren world concept), then we would at some point find siren worlds. And if we took the time to inspect them, we'd get sucked in by them.

How bad is this problem in general? A full search will not only find the siren worlds, but also a lot of very-seductive-but-also-very-nice worlds - genuine eutopias. We may feel that it's easier to be happy than to pretend to be happy (while being completely miserable and tortured and suffering). Following that argument, we may feel that there will be far more eutopias than siren worlds - after all, the siren worlds have to have bad stuff plus a vast infrastructure to conceal that bad stuff, which should at least have a complexity cost if nothing else. So if we chose the world that best passed our IC - or chose randomly among the top contenders - we might be more likely to hit a genuine eutopia than a siren world.

Unfortunately, there are dangers other than siren worlds. We are now optimising not for quality of the world, but for ability to seduce or manipulate the IC. There's no hidden evil in this world, just a "pulling out all the stops to seduce the inspector, through any means necessary" optimisation pressure. Call a world that ranks high on this scale a "marketing world". Genuine eutopias are unlikely to be marketing worlds, because they are optimised for being good rather than seeming good. A marketing world would be utterly optimised to trick, hack, seduce, manipulate and fool our IC, and may well be a terrible world in all other respects. It's the old "to demonstrate maximal happiness, it's much more reliable to wire people's mouths to smile rather than make them happy" problem all over again: the very best way of seeming good may completely preclude actually being good. In a genuine eutopia, people won't go around all the time saying "Btw, I am genuinely happy!" in case there is a hypothetical observer looking in. If every one of your actions constantly proclaims that you are happy, chances are happiness is not your genuine state. EDIT: see also my comment:

We are both superintelligences. You have a bunch of independently happy people that you do not aggressively compel. I have a group of zombies - human-like puppets that I can make do anything, appear to feel anything (though this is done sufficiently well that outside human observers can't tell I'm actually in control). An outside human observer wants to check that our worlds rank high on scale X - a scale we both know about.

Which of us do you think is going to be better able to maximise our X score?

This can also be seen as an epistemic version of Goodhart's law: "When a measure becomes a target, it ceases to be a good measure." Here the IC are the measure, the marketing worlds are targeting them, and hence the IC cease to be a good measure. But recall that the IC include the totality of approaches we use to rank these worlds, so there's no way around this problem. If, instead of inspecting the worlds, we simply rely on some sort of summary function, then the search will be optimised to find anything that can fool/pass that summary function. If we use the summary as a first filter, then apply some more profound automated checking, then briefly inspect the outcome so we're sure it didn't go stupid - then the search will be optimised for "pass the summary, pass automated checking, seduce the inspector".
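As a rough sketch of that last point, chaining a summary filter, automated checking and a brief inspection just composes them into one bigger measure, and the unconstrained search optimises against the composite; `summary_ok`, `automated_check` and `seduces_inspector` below are hypothetical stand-ins for those stages:

```python
# Hedged sketch: the chained filters become one composite target.
def chained_search(worlds, summary_ok, automated_check, seduces_inspector):
    survivors = (w for w in worlds if summary_ok(w) and automated_check(w))
    # The winner is whatever best "passes the summary, passes automated
    # checking, seduces the inspector" - quality never appears anywhere.
    return max(survivors, key=seduces_inspector)
```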

Different IC will therefore produce different rankings of worlds, but the top worlds in any of these rankings will be marketing worlds (and possibly siren worlds).

 

Constrained search and satisficing our preferences

The issue is a problem of (over) optimisation. The IC correspond roughly with what we want to value, but differ from it in subtle ways, enough that optimising for one could be disastrous for the other. If we didn't optimise, this wouldn't be a problem. Suppose we defined an acceptable world as one that we would judge "yeah, that's pretty cool" or even "yeah, that's really great". Then assume we selected randomly among the acceptable worlds. This would probably result in a world of positive value: siren worlds and marketing worlds are rare, because they fulfil very specific criteria. They triumph because they score so high on the IC scale, but they are outnumbered by the many more worlds that are simply acceptable.
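A minimal sketch of this "select randomly among the acceptable" rule, versus straight optimisation, assuming a hypothetical `ic_score` function and an acceptability threshold:

```python
import random

def optimise(worlds, ic_score):
    # Full optimisation: the winner is whatever scores highest on the IC,
    # which is exactly where marketing and siren worlds live.
    return max(worlds, key=ic_score)

def satisfice(worlds, ic_score, threshold):
    # Satisficing: accept anything that clears the bar, then pick at random.
    acceptable = [w for w in worlds if ic_score(w) >= threshold]
    return random.choice(acceptable) if acceptable else None
```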

This is in effect satisficing over the IC, rather than optimising over them. Satisficing has its own issues, however, so other approaches could be valuable as well. One way could be to use a constrained search. If, for instance, we took a thousand random worlds and IC-optimised over them, we'd be very unlikely to encounter a siren or marketing world. We're also very unlikely to encounter a world of any quality, though; we'd probably need to IC-optimise over at least a trillion worlds to find good ones. There is a tension in the number: as the number of worlds searched increases, their quality increases, but so do the odds of encountering a marketing or siren world. EDIT: Lumifer suggested using a first-past-the-post system: search through worlds, and pick the first acceptable one we find. This is better than the approach I outlined in this paragraph.
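For concreteness, a sketch of both variants, again with placeholder names (`world_stream` stands for some source of candidate worlds):

```python
from itertools import islice

def constrained_search(world_stream, ic_score, sample_size=1000):
    # IC-optimise over a bounded sample rather than the whole space.
    sample = list(islice(world_stream, sample_size))
    return max(sample, key=ic_score)

def first_past_the_post(world_stream, is_acceptable):
    # Lumifer's suggestion: take the first acceptable world and stop searching.
    for world in world_stream:
        if is_acceptable(world):
            return world
    return None
```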

We could also restrict the search by considering "realistic" worlds. Suppose we had to take 25 different yes-no decisions that could affect the future of humanity. This might be something like "choosing which of these 25 very different AIs to turn on and let loose together" or something more prosaic (which stocks to buy, which charities to support). This results in 2^25 different future worlds to search through: barely more than 33 million. Because there are so few worlds, they are unlikely to contain a marketing world (given the absolutely crucial proviso that none of the AIs is an IC-optimiser!). But these worlds are not drawn randomly from the space of future worlds, but are dependent on key decisions that we believe are important and relevant. Therefore they are very likely to contain an acceptable world - or at least far more likely than a random set of 33 million worlds would be. By constraining the choices in this way, we have in effect satisficed without satisficing, which is both Zen and useful.
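A sketch of this restricted search, assuming hypothetical `simulate` and `is_acceptable` stand-ins for "the world these decisions lead to" and our acceptability judgement; 2^25 = 33,554,432 candidate futures is the "barely more than 33 million" above:

```python
from itertools import product

def search_decision_worlds(simulate, is_acceptable, n_decisions=25):
    # Enumerate every combination of the yes-no decisions and return the
    # first combination whose resulting world we would accept.
    for decisions in product([False, True], repeat=n_decisions):
        world = simulate(decisions)
        if is_acceptable(world):
            return decisions, world
    return None
```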

As long as we're aware of the problem, other approaches may also allow for decent search without getting sucked in by a siren or a marketer.

Siren worlds and the perils of over-optimised search

While not generally an opponent of human sexuality, to be kind to all the LW audience including those whose parents might see them browsing, please do remove the semi-NSFW image.

4Stuart_Armstrong
Is the new one more acceptable?
7MugaSofer
See, now I'm curious about the old image...
3Stuart_Armstrong
The image can be found at http://truckstopstruckstop.tumblr.com/post/39569037859/nude-with-skull-via
6Eliezer Yudkowsky
Sure Why Not

LOL. The number of naked women grew from one to two, besides the bare ass we now also have breasts with nipples visible (OMG! :-D) and yet it's now fine just because it is old-enough Art.

4A1987dM
The fact that the current picture is a painting and the previous one was a photograph might also have something to do with it.
2Lumifer
Can you unroll this reasoning?
3A1987dM
It's just what my System 1 tells me; actually, I wouldn't know how to go about figuring out whether it's right.
1[anonymous]
Is there some other siren you'd prefer to see?
5Lumifer
See or hear? :-D

This indeed is why "What a human would think of a world, given a defined window process onto a world" was not something I considered as a viable form of indirect normativity / an alternative to CEV.

To my mind, the interesting part is the whole constrained-search/satisficing idea, which may allow such an approach to be used.

[-][anonymous]170

First question: how on Earth would we go about conducting a search through possible future universes, anyway? This thought experiment still feels too abstract to make my intuitions go click, in much the same way that Christiano's original write-up of Indirect Normativity did. You simply can't actually simulate or "acausally peek at" whole universes at a time, or even Earth-volumes in such. We don't have the compute-power, and I don't understand how I'm supposed to be seduced by a siren that can't sing to me.

It seems to me that the greater danger is that a UFAI would simply market itself as an FAI as an instrumental goal and use various "siren and marketing" tactics to manipulate us into cleanly, quietly accepting our own extinction -- because it could just be cheaper to manipulate people than to fight them, when you're not yet capable of making grey goo but still want to kill all humans.

And if we want to talk about complex nasty dangers, it's probably going to just be people jumping for the first thing that looks eutopian, in the process chucking out some of their value-set. People do that a lot, see: every single so-called "utopian" movement ever ... (read more)

5Stuart_Armstrong
Two main reasons for this: first, there is Christiano's original write-up, which has this problem. Second, we may be in a situation where we ask an AI to simulate the consequences of its choice, have a glance at it, and then approve/disapprove. That's less a search problem, and more the original siren world problem, and we should be aware of the problem.
7[anonymous]
This sounds extremely counterintuitive. If I have an Oracle AI that I can trust to answer more-or-less verbal requests (defined as: any request or "program specification" too vague for me to actually formalize), why have I not simply asked it to learn, from a large corpus of cultural artifacts, the Idea of the Good, and then explain to me what it has learned (again, verbally)? If I cannot trust the Oracle AI, dear God, why am I having it explore potential eutopian future worlds for me?

If I cannot trust the Oracle AI, dear God, why am I having it explore potential eutopian future worlds for me?

Because I haven't read Less Wrong? ^_^

This is another argument against using constrained but non-friendly AI to do stuff for us...

4Stuart_Armstrong
Colloquially, this concept is indeed very close to overfitting. But it's not technically overfitting ("overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship."), and using the term brings in other connotations. For instance, it may be that the AI needs to use less data to seduce us than it would to produce a genuine eutopia. It's more that it fits the wrong target function (having us approve its choice vs a "good" choice) rather than fitting it in an overfitted way.
2[anonymous]
Thanks. My machine-learning course last semester didn't properly emphasize the formal definition of overfitting, or perhaps I just didn't study it hard enough. What I do want to think about here is: is there a mathematical way to talk about what happens when a learning algorithm finds the wrong correlative or causative link among several different possible links between the data set and the target function? Such maths would be extremely helpful for advancing the probabilistic value-learning approach to FAI, as they would give us a way to talk about how we can interact with an agent's beliefs about utility functions while also minimizing the chance/degree of wireheading.
0Stuart_Armstrong
That would be useful! A short search gives "bias" as the closest term, which isn't very helpful.
4[anonymous]
Unfortunately "bias" in statistics is completely unrelated to what we're aiming for here. In ugly, muddy words, what we're thinking is that we give the value-learning algorithm some sample of observations or world-states as "good", and possibly some as "bad", and "good versus bad" might be any kind of indicator value (boolean, reinforcement score, whatever). It's a 100% guarantee that the physical correlates of having given the algorithm a sample apply to every single sample, but we want the algorithm to learn the underlying causal structure of why those correlates themselves occurred (that is, to model our intentions as a VNM utility function) rather than learn the physical correlates themselves (because that leads to the agent wireheading itself). Here's a thought: how would we build a learning algorithm that treats its samples/input as evidence of an optimization process occurring and attempts to learn the goal of that optimization process? Since physical correlates like reward buttons don't actually behave as optimization processes themselves, this would ferret out the intentionality exhibited by the value-learner's operator from the mere physical effects of that intentionality (provided we first conjecture that human intentions behave detectably like optimization). Has that whole "optimization process" and "intentional stance" bit from the LW Sequences been formalized enough for a learning treatment?
2Quill_McGee
http://www.fungible.com/respect/index.html This looks to be very related to the idea of "Observe someone's actions. Assume they are trying to accomplish something. Work out what they are trying to accomplish." Which seems to be what you are talking about.
1[anonymous]
That looks very similar to what I was writing about, though I've tried to be rather more formal/mathematical about it instead of coming up with ad-hoc notions of "human", "behavior", "perception", "belief", etc. I would want the learning algorithm to have uncertain/probabilistic beliefs about the learned utility function, and if I was going to reason about individual human minds I would rather just model those minds directly (as done in Indirect Normativity).
0Stuart_Armstrong
I will think about this idea...
1[anonymous]
The most obvious weakness is that such an algorithm could easily detect optimization processes that are acting on us (or, if you believe such things exist, you should believe this algorithm might locate them mistakenly), rather than us ourselves.
1Stuart_Armstrong
I've been thinking about this, and I haven't found any immediately useful way of using your idea, but I'll keep it in the back of my mind... We haven't found a good way of identifying agency in the abstract sense ("was cosmic phenomenon X caused by an agent, and if so, which one?" kind of stuff), so this might be a useful simpler problem...
2[anonymous]
Upon further research, it turns out that preference learning is a field within machine learning, so we can actually try to address this at a much more formal level. That would also get us another benefit: supervised learning algorithms don't wirehead. Notably, this fits with our intuition that morality must be "taught" (ie: via labelled data) to actual human children, lest they simply decide that the Good and the Right consists of eating a whole lot of marshmallows. And if we put that together with a conservation heuristic for acting under moral uncertainty (say: optimize for expectedly moral expected utility, thus requiring higher moral certainty for less-extreme moral decisions), we might just start to make some headway on managing to construct utility functions that would mathematically reflect what their operators actually intend for them to do. I also have an idea written down in my notebook, which I've been refining, that sort of extends from what Luke had written down here. Would it be worth a post?
0[anonymous]
Hi, there appears to be a lot of work on learning causal structure from data.
0[anonymous]
Keywords? I've looked through Wikipedia and the table of contents from my ML textbook, but I haven't found the right term to research yet. "Learn a causal structure from the data and model the part of it that appears to narrow the future" would in fact be how to build a value-learner, but... yeah. EDIT: One of my profs from undergrad published a paper last year about causal-structure. The question is how useful it is for universal AI applications. Joshua Tenenbaum tackled it from the cog-sci angle in 2011, but again, I'm not sure how to transfer it over to the UAI angle. I was searching for "learning causal structure from data" -- herp, derp.
0IlyaShpitser
Who was this prof?
4[anonymous]
I was referring to David Jensen, who taught "Research Methods in Empirical Computer Science" my senior year.
2IlyaShpitser
Thanks.

This puts me in mind of a thought experiment Yvain posted a while ago (I’m certain he’s not the original author, but I can’t for the life of me track it any further back than his LiveJournal):

“A man has a machine with a button on it. If you press the button, there is a one in five million chance that you will die immediately; otherwise, nothing happens. He offers you some money to press the button once. What do you do? Do you refuse to press it for any amount? If not, how much money would convince you to press the button?”

This is – I think – analogous to y... (read more)

0Stuart_Armstrong
I consider that to be a constrained search as well!

One issue here is that worlds with an "almost-friendly" AI (one whose friendliness was botched in some respect) may end up looking like siren or marketing worlds.

In that case, worlds as bad as sirens will be rather too common in the search space (because AIs with botched friendliness are more likely than AIs with true friendliness) and a satisficing approach won't work.

2Stuart_Armstrong
Interesting thought there...

We could also restrict the search by considering "realistic" worlds. Suppose we had to take 25 different yes-no decisions that could affect the future of humanity. This might be something like "choosing which of these 25 very different AIs to turn on and let loose together" or something more prosaic (which stocks to buy, which charities to support). This results in 2^25 different future worlds to search through: barely more than 33 million. Because there are so few worlds, they are unlikely to contain a marketing world (given the absolutely crucial

... (read more)

I've just now found my way to this post, from links in several of your more recent posts, and I'm curious as to how this fits in with more recent concepts and thinking from yourself and others.

Firstly, in terms of Garrabrant's taxonomy, I take it that the "evil AI" scenario could be considered a case of adversarial Goodhart, and the siren and marketing worlds without builders could be considered cases of regressional and/or extremal Goodhart. Does that sound right?

Secondly, would you still say that these scenarios demonstrate reas... (read more)

3Stuart_Armstrong
To a large extent I do, but there may be some residual effects similar to the above, so some anti-optimising pressure might still be useful.

It seems based on your later comments that the premise of marketing worlds existing relies on there being trade-offs between our specified wants and our unspecified wants, so that the world optimised for our specified wants must necessarily be highly likely to be lacking in our unspecified ones ("A world with maximal bananas will likely have no apples at all").

I don't think this is necessarily the case. If I only specify that I want low rates of abortion, for example, then I think it highly likely that I'd get a world that also has low rates of ST... (read more)

0Stuart_Armstrong
Yes, certainly. That's a problem of optimisation with finite resources. If A is a specified want and B is an unspecified want, then we shouldn't confuse "there are worlds with high A and also high B" with "the world with the highest A will also have high B".
0Stuart_Armstrong
You would get a world with no conception, or possibly with no humans at all.
2PhilosophyTutor
I don't think you have highlighted a fundamental problem since we can just specify that we mean a low percentage of conceptions being deliberately aborted in liberal societies where birth control and abortion are freely available to all at will. My point, though, is that I don't think it is very plausible that "marketing worlds" will organically arise where there are no humans, or no conception, but which tick all the other boxes we might think to specify in our attempts to describe an ideal world. I don't see how there being no conception or no humans could possibly be a necessary trade-off with things like wealth, liberty, rationality, sustainability, education, happiness, the satisfaction of rational and well-informed preferences and so forth. Of course a sufficiently God-like malevolent AI could presumably find some way of gaming any finite list we give it, since there are probably an unbounded number of ways of bringing about horrible worlds, so this isn't a problem with the idea of siren worlds. I just don't find the idea of marketing worlds very plausible because so many of the things we value are fundamentally interconnected.
1Stuart_Armstrong
The "no conception" example is just to illustrate that bad things happen when you ask an AI to optimise along a certain axis without fully specifying what we want (which is hard/impossible). A marketing world is fully optimised along the "convince us to choose this world" axis. If at any point, the AI in confronted with a choice along the lines of "remove genuine liberty to best give the appearance of liberty/happiness", it will choose to do so. That's actually the most likely way a marketing world could go wrong - the more control the AI has over people's appearance and behaviour, the more capable it is of making the world look good. So I feel we should presume that discrete-but-total AI control over the world's "inhabitants" would be the default in a marketing world.
5PhilosophyTutor
I think this and the "finite resources therefore tradeoffs" argument both fail to take seriously the interconnectedness of the optimisation axes which we as humans care about. They assume that every possible aspect of society is an independent slider which a sufficiently advanced AI can position at will, even though this society is still going to be made up of humans, will have to be brought about by or with the cooperation of humans and will take time to bring about. These all place constraints on what is possible because the laws of physics and human nature aren't infinitely malleable. I don't think discreet but total control over a world is compatible with things like liberty, which seem like obvious qualities to specify in an optimal world we are building an AI to search for. I think what we might be running in to here is less of an AI problem and more of a problem with the model of AI as an all-powerful genie capable of absolutely anything with no constraints whatsoever.
0Stuart_Armstrong
Precisely and exactly! That's the whole of the problem - optimising for one thing (appearance) results in the loss of other things we value. Next challenge: define liberty in code. This seems extraordinarily difficult. So we do agree that there are problems with an all-powerful genie? Once we've agreed on that, we can scale back to lower AI power, and see how the problems change. (the risk is not so much that the AI would be an all-powerful genie, but that it could be an all-powerful genie compared with humans).
5PhilosophyTutor
This just isn't always so. If you instruct an AI to optimise a car for speed, efficiency and durability but forget to specify that it has to be aerodynamic, you aren't going to get a car shaped like a brick. You can't optimise for speed and efficiency without optimising for aerodynamics too. In the same way it seems highly unlikely to me that you could optimise a society for freedom, education, just distribution of wealth, sexual equality and so on without creating something pretty close to optimal in terms of unwanted pregnancies, crime and other important axes. Even if it's possible to do this, it seems like something which would require extra work and resources to achieve. A magical genie AI might be able to make you a super-efficient brick-shaped car by using Sufficiently Advanced Technology indistinguishable from magic but even for that genie it would have to be more work than making an equally optimal car by the defined parameters that wasn't a silly shape. In the same way an effectively God-like hypothetical AI might be able to make a siren world that optimised for everything except crime and create a world perfect in every way except that it was rife with crime but it seems like it would be more work, not less. I think if we can assume we have solved the strong AI problem, we can assume we have solved the much lesser problem of explaining liberty to an AI. We've got a problem with your assumptions about all-powerful genies, I think, because I think your argument relies on the genie being so ultimately all-powerful that it is exactly as easy for the genie to make an optimal brick-shaped car or an optimal car made out of tissue paper and post-it notes as it is for the genie to make an optimal proper car. I don't think that genie can exist in any remotely plausible universe. If it's not all-powerful to that extreme then it's still going to be easier for the genie to make a society optimised (or close to it) across all the important axes at once than one opt
1Stuart_Armstrong
The strong AI problem is much easier to solve than the problem of motivating an AI to respect liberty. For instance, the first one can be brute forced (eg AIXItl with vast resources), the second one can't. Having the AI understand human concepts of liberty is pointless unless it's motivated to act on that understanding. An excess of anthropomorphisation is bad, but an analogy could be about creating new life (which humans can do) and motivating that new life to follow specific rules and requirements if they become powerful (which humans are pretty bad at).
5PhilosophyTutor
I don't believe that strong AI is going to be as simple to brute force as a lot of LessWrongers believe, personally, but if you can brute force strong AI then you can just get it to run a neuron-by-neuron simulation of the brain of a reasonably intelligent first year philosophy student who understands the concept of liberty and tell the AI not to take actions which the simulated brain thinks offend against liberty. That is assuming that in this hypothetical future scenario where we have a strong AI we are capable of programming that strong AI to do any one thing instead of another, but if we cannot do that then the entire discussion seems to me to be moot.
8Nornagest
I've met far too many first-year philosophy students to be comfortable with this program.
0Stuart_Armstrong
How? "tell", "the simulated brain thinks" "offend": defining those incredibly complicated concepts contains nearly the entirety of the problem.
2PhilosophyTutor
I could be wrong but I believe that this argument relies on an inconsistent assumption, where we assume we have solved the problem of creating an infinitely powerful AI, but we have not solved the problem of operationally defining commonplace English words which hundreds of millions of people successfully understand in such a way that a computer can perform operations using them. It seems to me that the strong AI problem is many orders of magnitude more difficult than the problem of rigorously defining terms like "liberty". I imagine that a relatively small part of the processing power of one human brain is all that is needed to perform operations on terms like "liberty" or "paternalism" and engage in meaningful use of them so it is a much, much smaller problem than the problem of creating even a single human-level AI, let alone a vastly superhuman AI. If in our imaginary scenario we can't even define "liberty" in such a way that a computer can use the term, it doesn't seem very likely that we can build any kind of AI at all.
0[anonymous]
My mind is throwing a type-error on reading your comment. Liberty could well be like pornography: we know it when we see it, based on probabilistic classification. There might not actually be a formal definition of liberty that includes all actual humans' conceptions of such as special cases, but instead a broad range of classifier parameters defining the variation in where real human beings "draw the line".
4PhilosophyTutor
The standard LW position (which I think is probably right) is that human brains can be modelled with Turing machines, and if that is so then a Turing machine can in theory do whatever it is we do when we decide that something is liberty, or pornography. There is a degree of fuzziness in these words to be sure, but the fact we are having this discussion at all means that we think we understand to some extent what the term means and that we value whatever it is that it refers to. Hence we must in theory be able to get a Turing machine to make the same distinction although it's of course beyond our current computer science or philosophy to do so.
0Stuart_Armstrong
Yes. Here's another brute force approach: upload a brain (without understanding it), run it very fast with simulated external memory, subject it to evolutionary pressure. All this can be done with little philosophical and conceptual understanding, and certainly without any understanding of something as complex as liberty.
-1PhilosophyTutor
If you can do that, then you can just find someone who you think understands what we mean by "liberty" (ideally someone with a reasonable familiarity with Kant, Mill, Dworkin and other relevant writers), upload their brain without understanding it, and ask the uploaded brain to judge the matter. (Off-topic: I suspect that you cannot actually get a markedly superhuman AI that way, because the human brain could well be at or near a peak in the evolutionary landscape so that there is no evolutionary pathway from a current human brain to a vastly superhuman brain. Nothing I am aware of in the laws of physics or biology says that there must be any such pathway, and since evolution is purposeless it would be an amazing lucky break if it turned out that we were on the slope of the highest peak there is, and that the peak extends to God-like heights. That would be like if we put evolutionary pressure on a cheetah and discovered that if we do that we can evolve a cheetah that runs at a significant fraction of c. However I believe my argument still works even if I accept for the sake of argument that we are on such a peak in the evolutionary landscape, and that creating God-like AI is just a matter of running a simulated human brain under evolutionary pressure for a few billion simulated years. If we have that capability then we must also be able to run a simulated philosopher who knows what "liberty" refers to). EDIT: Downvoting this without explaining why you disagree doesn't help me understand why you disagree.
0Stuart_Armstrong
And would their understanding of liberty remain stable under evolutionary pressure? That seems unlikely. Have not been downvoting it.
0PhilosophyTutor
I didn't think we needed to put the uploaded philosopher under billions of years of evolutionary pressure. We would put your hypothetical pre-God-like AI in one bin and update it under pressure until it becomes God-like, and then we upload the philosopher separately and use them as a consultant. (As before I think that the evolutionary landscape is unlikely to allow a smooth upward path from modern primate to God-like AI, but I'm assuming such a path exists for the sake of the argument).
1Stuart_Armstrong
And then we have to ensure the AI follows the consultant (probably doable) and define what querying process is acceptable (very hard). But your solution (which is close to Paul Christiano's) works whatever the AI is, we just need to be able to upload a human. My point was that we could conceivably create an AI without understanding any of the hard problems, still stands. If you want I can refine it: allow partial uploads: we can upload brains, but they don't function as stable humans, as we haven't mapped all the fine details we need to. However, we can use these imperfect uploads, plus a bit of evolution, to produce AIs. And here we have no understanding of how to control its motivations at all.
1PhilosophyTutor
I won't argue against the claim that we could conceivably create an AI without knowing anything about how to create an AI. It's trivially true in the same way that we could conceivably turn a monkey loose on a typewriter and get strong AI. I also agree with you that if we got an AI that way we'd have no idea how to get it to do any one thing rather than another and no reason to trust it. I don't currently agree that we could make such an AI using a non-functioning brain model plus "a bit of evolution". I am open to argument on the topic but currently it seems to me that you might as well say "magic" instead of "evolution" and it would be an equivalent claim.
0Stuart_Armstrong
Why are you confident that an AI that we do develop will not have these traits? You agree the mindspace is large, you agree we can develop some cognitive abilities without understanding them. If you add that most AI programmers don't take AI risk seriously and will only be testing their AI's in controlled environments, that the AI will be likely developed for a military or commercial purpose, I don't see why you'd have high confidence that they will converge on a safe design?
3XiXiDu
Why do you think such an AI wouldn't just fail at being powerful, rather than being powerful in a catastrophic way? If programs fail in the real world then they are not working well. You don't happen to come across a program that manages to prove the Riemann hypothesis when you designed it to prove the irrationality of the square root of 2.
0Stuart_Armstrong
If it fails at being powerful, we don't have to worry about it, so I feel free to ignore those probabilities. But you might come across a program motivated to eliminate all humans if you designed it to optimise the economy...
0TheAncientGeek
So you're not pursuing the claim that a SAI will probably be dangerous, you are just worried that it might be?
0Stuart_Armstrong
My claim has always been that the probability that an SAI will be dangerous is too high to ignore. I fluctuate on the exact probability, but I've never seen anything that drives it down to a level I feel comfortable with (in fact, I've never seen anything drive it below 20%).
-2[anonymous]
This is why the Wise employ normative uncertainty and the learning of utility functions from data, rather than hardcoding verbal instructions that only make sense in light of a complete human mind and social context.
3Stuart_Armstrong
Indeed. But the more of the problem you can formalise and solve (eg maintaining a stable utility function over self-improvements) the more likely the learning approach is to succeed.
2[anonymous]
Well yes, of course. I mean, if you can't build an agent that was capable of maintaining its learned utility while becoming vastly smarter (and thus capable of more accurately learning and enacting capital-G Goodness), then all that utility-learning was for nought.
1TheAncientGeek
Yeah, but hardcoding is an easier sell to people who know how to code but have never done AI... It's like political demagogues selling unworkable but easily understood ideas.
0[anonymous]
Not really, no. Most people don't recognize the "hidden complexity of wishes" in Far Mode, or when it's their wishes. However, I think if I explain to them that I'll be encoding my wishes, they'll quickly figure out that my attempts to hardcode AI Friendliness are going to be very bad for them. Human intelligence evolved for winning arguments when status, wealth, health, and mating opportunities are at issue: thus, convince someone to treat you as an opponent, and leave the correct argument lying right where they can pick it up, and they'll figure things out quickly. Hmmm... I wonder if that bit of evolutionary psychology explains why many people act rude and nasty even to those close to them. Do we engage more intelligence when trying to win a fight than when trying to be nice?
-7TheAncientGeek
-2XiXiDu
The very idea underlying AI is enabling people to get a program to do what they mean without having to explicitly encode all details. What AI risk advocates do is to turn the whole idea upside down, claiming that, without explicitly encoding what you mean, your program will do something else. The problem here is that it is conjectured that the program will do what it was not meant to do in a very intelligent and structured manner. But this can't happen when it comes to intelligently designed systems (as opposed to evolved systems), because the nature of unintended consequences is overall chaotic. How often have you heard of intelligently designed programs that achieved something highly complex and marvelous, but unintended, thanks to the programmers being unable to predict the behavior of the program? I don't know of any such case. But this is exactly what AI risk advocates claim will happen, namely that a program designed to do X (calculate 1+1) will perfectly achieve Y (take over the world). If artificial general intelligence will eventually be achieved by some sort of genetic/evolutionary computation, or neuromorphic engineering, then I can see how this could lead to unfriendly AND capable AI. But an intelligently designed AI will either work as intended or be incapable of taking over the world (read: highly probable). This of course does not ensure a positive singularity (if you believe that this is possible at all), since humans might use such intelligent and capable AIs to wreak havoc (ask the AI to do something stupid, or something that clashes with most human values). So there is still a need for "friendly AI". But this is quite different from the idea of interpreting "make humans happy" as "tile the universe with smiley faces". Such a scenario contradicts the very nature of intelligently designed AI, which is an encoding of “Understand What Humans Mean” AND “Do What Humans Mean”. More here.
2[anonymous]
Alexander, have you even bothered to read the works of Marcus Hutter and Juergen Schmidhuber, or have you spent all your AI-researching time doing additional copy-pastas of this same argument every single time the subject of safe or Friendly AGI comes up? Your argument makes a measure of sense if you are talking about the social process of AGI development: plainly, humans want to develop AGI that will do what humans intend for it to do. However, even a cursory look at the actual research literature shows that the mathematically most simple agents (ie: those that get discovered first by rational researchers interested in finding universal principles behind the nature of intelligence) are capital-U Unfriendly, in that they are expected-utility maximizers with not one jot or tittle in their equations for peace, freedom, happiness, or love, or the Ideal of the Good, or sweetness and light, or anything else we might want. (Did you actually expect that in this utterly uncaring universe of blind mathematical laws, you would find that intelligence necessitates certain values?) No, Google Maps will never turn superintelligent and tile the solar system in computronium to find me a shorter route home from a pub crawl. However, an AIXI or Goedel Machine instance will, because these are in fact entirely distinct algorithms. In fact, when dealing with AIXI and Goedel Machines we have an even bigger problem than "tile everything in computronium to find the shortest route home": the much larger problem of not being able to computationally encode even a simple verbal command like "find the shortest route home". We are faced with the task of trying to encode our values into a highly general, highly powerful expected-utility maximizer at the level of, metaphorically speaking, pre-verbal emotion. Otherwise, the genie will know, but not care. Now, if you would like to contribute productively, I've got some ideas I'd love to talk over with someone for actually doing something about
-2XiXiDu
If I believed that anything as simple as AIXI could possibly result in practical general AI, or that expected utility maximizing was at all feasible, then I would tend to agree with MIRI. I don't. And I think it makes no sense to draw conclusions about practical AI from these models. This is crucial. That's largely irrelevant and misleading. Your autonomous car does not need to feature an encoding of an amount of human values that corresponds to its level of autonomy. That post has been completely debunked. ETA: Fixed a link to expected utility maximization.
-3XiXiDu
I asked several people what they think about it, and to provide a rough explanation. I've also had e-Mail exchanges with Hutter, Schmidhuber and Orseau. I also informally thought about whether practically general AI that falls into the category “consequentialist / expected utility maximizer / approximation to AIXI” could ever work. And I am not convinced. If general AI, which is capable of a hard-takeoff, and able to take over the world, requires less lines of code, in order to work, than to constrain it not to take over the world, then that's an existential risk. But I don't believe this to be the case. Since I am not a programmer, or computer scientist, I tend to look at general trends, and extrapolate from there. I think this makes more sense than to extrapolate from some unworkable model such as AIXI. And the general trend is that humans become better at making software behave as intended. And I see no reason to expect some huge discontinuity here. Here is what I believe to be the case: (1) The abilities of systems are part of human preferences as humans intend to give systems certain capabilities and, as a prerequisite to build such systems, have to succeed at implementing their intentions. (2) Error detection and prevention is such a capability. (3) Something that is not better than humans at preventing errors is no existential risk. (4) Without a dramatic increase in the capacity to detect and prevent errors it will be impossible to create something that is better than humans at preventing errors. (5) A dramatic increase in the human capacity to detect and prevent errors is incompatible with the creation of something that constitutes an existential risk as a result of human error. Here is what I doubt: (1) Present-day software is better than previous software generations at understanding and doing what humans mean. (2) There will be future generations of software which will be better than the current generation at understanding and doing what human
8jimrandomh
This is a much bigger problem for your ability to reason about this area than you think.
1XiXiDu
A relevant quote from Eliezer Yudkowsky (source): And another one (source): So since academic consensus on the topic is not reliable, and domain knowledge in the field of AI is negatively useful, what are the prerequisites for grasping the truth when it comes to AI risks?
3Jiro
I think that in saying this, Eliezer is making his opponents' case for them. Yes, of course the standard would also let you discard cryonics. One solution to that is to say that the standard is bad. Another solution is to say "yes, and I don't much care for cryonics either".
-1[anonymous]
Nah, those are all plausibly correct things that mainstream science has mostly ignored and/or made researching taboo. If you prefer a more clear-cut example, science was wrong about continental drift for about half a century -- until overwhelming, unmistakable evidence became available.
3Jiro
The main reason that scientists rejected continental drift was that there was no known mechanism which could cause it; plate tectonics wasn't developed until the late 1950's. Continental drift is also commonly invoked by pseudoscientists as a reason not to trust scientists, and if you do so too you're in very bad company. There's a reason why pseudoscientists keep using continental drift for this purpose and don't have dozens of examples: examples are very hard to find. Even if you decide that continental drift is close enough that it counts, it's a very atypical case. Most of the time scientists reject something out of hand, they're right, or at worst, wrong about the thing existing, but right about the lack of good evidence so far.
-3[anonymous]
There was also a great deal of institutional backlash against proponents of continental drift, which was my point. Guilt by association? Grow up. There are many, many cases of scientists being oppressed and dismissed because of their race, their religious beliefs, and their politics. That's the problem, and that's what's going on with the CS people who still think AI Winter implies AGI isn't worth studying.
3Jiro
So? I'm pretty sure that there would be backlash against, say, homeopaths in a medical association. Backlash against deserving targets (which include people who are correct but because of unlucky circumstances, legitimately look wrong) doesn't count. I'm reminded of an argument I had with a proponent of psychic power. He asked me what if psychic powers happen to be of such a nature that they can't be detected by experiments, don't show up in double-blind tests, etc.. I pointed out that he was postulating that psi is real but looks exactly like a fake. If something looks exactly like a fake, at some point the rational thing to do is treat it as fake. At that point in history, continental drift happened to look like a fake. That's not guilt by association, it's pointing out that the example is used by pseudoscientists for a reason, and this reason applies to you too. If scientists dismissed cryonics because of the supporters' race, religion, or politics, you might have a point.
-3[anonymous]
I'll limit my response to the following amusing footnote: This is, in fact, what happened between early cryonics and cryobiology. EDIT: Just so people aren't misled by Jiro's motivated interpretation of the link: Obviously political.
3Jiro
You're equivocating on the term "political". When the context is "race, religion, or politics", "political" doesn't normally mean "related to human status", it means "related to government". Besides, they only considered it low status based on their belief that it is scientifically nonsensical. My reply was steelmanning your post by assuming that the ethical considerations mentioned in the article counted as religious. That was the only thing mentioned in it that could reasonably fall under "race, religion, or politics" as that is normally understood.
3Jiro
Most of the history described in your own link makes it clear that scientists objected because they think cryonics is scientifically nonsense, not because of race, religion, or politics. The article then tacks on a claim that scientists reject it for ethical reasons, but that isn't supported by its own history, just by a few quotes with no evidence that these beliefs are prevalent among anyone other than the people quoted. Furthermore, of the quotes it does give, one of them is vague enough that I have no idea if it means in context what the article claims it means. Saying that the "end result" is damaging doesn't necessarily mean that having unfrozen people walking around is damaging--it may mean that he thinks cryonics doesn't work and that having a lot of resources wasted on freezing corpses is damaging.
2nshepperd
At a minimum, a grasp of computer programming and CS. Computer programming, not even AI. I'm inclined to disagree somewhat with Eliezer_2009 on the issue of traditional AI - even basic graph search algorithms supply valuable intuitions about what planning looks like, and what it is not. But even that same (obsoleted now, I assume) article does list computer programming knowledge as a requirement.
0XiXiDu
What counts as "a grasp" of computer programming/science? I can e.g. program a simple web crawler and solve a bunch of Project Euler problems. I've read books such as "The C Programming Language". I would have taken the udacity courses on machine learning by now, but the stated requirement is a strong familiarity with Probability Theory, Linear Algebra and Statistics. I wouldn't describe my familiarity as strong, that will take a few more years. I am skeptical though. If the reason that I dismiss certain kinds of AI risks is that I lack the necessary education, then I expect to see rebuttals of the kind "You are wrong because of (add incomprehensible technical justification)...". But that's not the case. All I see are half-baked science fiction stories and completely unconvincing informal arguments.
2jimrandomh
This is actually a question I've thought about quite a bit, in a different context. So I have a cached response to what makes a programmer, not tailored to you or to AI at all. When someone asks for guidance on development as a programmer, the question I tend to ask is, how big is the biggest project you architected and wrote yourself? The 100 line scale tests only the mechanics of programming; the 1k line scale tests the ability to subdivide problems; the 10k line scale tests the ability to select concepts; and the 50k line scale tests conceptual taste, and the ability to add, split, and purge concepts in a large map. (Line numbers are very approximate, but I believe the progression of skills is a reasonably accurate way to characterize programmer development.)
2trist
New programmers (not jimrandomh), be wary of line counts! It's very easy for a programmer who's not yet ready for a 10k line project to turn it into a 50k lines. I agree with the progression of skills though.
0jimrandomh
Yeah, I was thinking more of "project as complex as an n-line project in an average-density language should be". Bad code (especially with copy-paste) can inflate line numbers ridiculously, and languages vary up to 5x in their base density too.
0Nornagest
I think you're overestimating these requirements. I haven't taken the Udacity courses, but I did well in my classes on AI and machine learning in university, and I wouldn't describe my background in stats or linear algebra as strong -- more "fair to conversant". They're both quite central to the field and you'll end up using them a lot, but you don't need to know them in much depth. If you can calculate posteriors and find the inverse of a matrix, you're probably fine; more complicated stuff will come up occasionally, but I'd expect a refresher when it does.
0[anonymous]
Don't twist Eliezer's words. There's a vast difference between "a PhD in what they call AI will not help you think about the mathematical and philosophical issues of AGI" and "you don't need any training or education in computing to think clearly about AGI".
-6TheAncientGeek
-1jimrandomh
Ability to program is probably not sufficient, but it is definitely necessary. But not because of domain relevance; it's necessary because programming teaches cognitive skills that you can't get any other way, by presenting a tight feedback loop where every time you get confused, or merge concepts that needed to be distinct, or try to wield a concept without fully sharpening your understanding of it first, the mistake quickly gets thrown in your face. And, well... it's pretty clear from your writing that you haven't mastered this yet, and that you aren't going to become less confused without stepping sideways and mastering the basics first.
0Lumifer
That looks highly doubtful to me.
1trist
You mean that most cognitive skills can be taught in multiple ways, and you don't see why those taught by programming are any different? Or do you have a specific skill taught by programming in mind, and think there's other ways to learn it?
4Lumifer
There are a whole bunch of considerations. First, meta: it should be suspicious to see programmers claiming to possess special cognitive skills that only they can have -- it's basically a "high priesthood" claim. Besides, programming became widespread only about 30 years ago; so which cognitive skills were very rare until that time? Second, "presenting a tight feedback loop where ... the mistake quickly gets thrown in your face" isn't a unique-to-programming situation by any means. Third, most cognitive skills are fairly diffuse and cross-linked. Which specific cognitive skills can you not get any way other than programming? I suspect that what the OP meant was "my programmer friends are generally smarter than my non-programmer friends", which is, um, a different claim :-/
5Nornagest
I don't think programming is the only way to build... let's call it "reductionist humility". Nor even necessarily the most reliable; non-software engineers probably have intuitions at least as good, for example, to say nothing of people like research-level physicists. I do think it's the fastest, cheapest, and currently most common, thanks to tight feedback loops and a low barrier to entry. On the other hand, most programmers -- and other types of engineers -- compartmentalize this sort of humility. There might even be something about the field that encourages compartmentalization, or attracts people who are already good at it; engineers are disproportionately likely to be religious fundamentalists, for example. Since that's not sufficient to meet the demands of AGI problems, we probably shouldn't be patting ourselves on the back too much here.
0Lumifer
Can you expand on how you understand "reductionist humility", in particular as a cognitive skill?
4Nornagest
I might summarize it as an intuitive understanding that there is no magic, no anthropomorphism, in what you're building; that any problems are entirely due to flaws in your specification or your model. I'm describing it in terms of humility because the hard part, in practice, seems to be internalizing the idea that you and not some external malicious agency are responsible for failures. This is hard to cultivate directly, and programmers usually get partway there by adopting a semi-mechanistic conception of agency that can apply to the things they're working on: the component knows about this, talks to that, has such-and-such a purpose in life. But I don't see it much at all outside of scientists and engineers.
1A1987dM
IOW, realizing that the reason you get fat if you eat a lot is not that you pissed off God and he's taking revenge, as certain people appear to alieve.
0Lumifer
So it's basically responsibility? Clearly you never had to chase bugs through third-party libraries... :-) But yes, I understand what you mean, though I am not sure in which way this is a cognitive skill. I'd probably call it an attitude common to professions in which randomness or external factors don't play a major role -- sure, programming and engineering are prominent here.
0Nornagest
You could describe it as a particular type of responsibility, but that feels noncentral to me. Heh. A lot of my current job has to do with hacking OpenSSL, actually, which is by no means a bug-free library. But that's part of what I was trying to get at by including the bit about models -- and in disciplines like physics, of course, there's nothing but third-party content. I don't see attitudes and cognitive skills as being all that well differentiated.
-2TheAncientGeek
But randomness and external factors do predominate in almost everything. For that reason, applying programming skills to other domains is almost certain to be suboptimal.
0Lumifer
I don't think so, otherwise walking out of your door each morning would start a wild adventure and attempting to drive a vehicle would be an act of utter madness.
-2TheAncientGeek
They don't predominate overall because you have learnt how to deal with them. If there were no random or external factors in driving, you could do so with a blindfold on.
-1Lumifer
... Make up your mind :-)
-2TheAncientGeek
They predominate in almost every problem; they don't predominate in any solved problem. Learning to drive is learning to deal with other traffic (external) and with not knowing what is going to happen next (random).
0TheAncientGeek
Much of the writing on this site is philosophy, and people with a technology background tend not to grok philosophy, because they are acculturated to answers that can be looked up, or figured out by known methods. If they could keep the logic chops and lose the impatience, they might make good philosophers, but they tend not to.
0Nornagest
Beg pardon?
-1[anonymous]
On a complete sidenote, this is a lot of why programming is fun. I've also found that learning the Coq theorem-prover has exactly the same effect, to the point that studying Coq has become one of the things I do to relax.
-2[anonymous]
People have been telling him this for years. I doubt it will get much better.
0[anonymous]
Too bad. I can download an inefficient but functional subhuman AGI from Github. Making it superhuman is just a matter of adding an entire planet's worth of computing power. Strangely, doing so will not make it conform to your ideas about "eventual future AGI", because this one is actually existing AGI, and reality doesn't have to listen to you. That is exactly the situation we face, your refusal to believe in actually-existing AGI models notwithstanding. Whine all you please: the math will keep on working.

Then I recommend you shut up about matters of highly involved computer science until such time as you have acquired the relevant knowledge for yourself. I am a trained computer scientist, and I held lots of skepticism about MIRI's claims, so I used my training and education to actually check them. And I found that the actual evidence of the AGI research record showed MIRI's claims to be basically correct, modulo Eliezer's claims about an intelligence explosion taking place versus Hutter's claim that an eventual optimal agent will simply scale itself up in intelligence with the amount of computing power it can obtain.

That's right, not everyone here is some kind of brainwashed cultist. Many of us have exercised basic skepticism against claims with extremely low subjective priors. But we exercised our skepticism by doing the background research and checking the presently available object-level evidence rather than by engaging in meta-level speculations about an imagined future in which everything will just work out.

Take a course at your local technical college, or go on a MOOC, or just dust off a whole bunch of textbooks in computer-scientific and mathematical subjects, study the necessary knowledge to talk about AGI, and then you get to barge in telling everyone around you how we're all full of crap.
4private_messaging
Which one are you talking about, to be completely exact? Then use that training and figure out how many galaxies' worth of computing power it's going to take.
0[anonymous]
Of bleeding course I was talking about AIXI. What I find strange to the point of suspiciousness here is the evinced belief on the part of the "AI skeptics" that the inefficiency of MC-AIXI means there will never, ever be any such thing as near-human, human-equivalent, or greater-than-human AGIs. After all, if intelligence is impossible without converting whole galaxies to computronium first, then how do we work? And if we admit that sub-galactic intelligence is possible, why not artificial intelligence? And if we admit that sub-galactic artificial intelligence is possible, why not something from the "Machine Learning for Highly General Hypothesis Classes + Decision Theory of Active Environments = Universal AI" paradigm started by AIXI?

I'm not at all claiming current implementations of AIXI or Goedel Machines are going to cleanly evolve into planet-dominating superintelligences that run on a home PC next year, or even next decade (for one thing, I don't think planet-dominating superintelligences will run on a present-day home PC ever). I am claiming that the underlying scientific paradigm is a functioning reduction of what we mean by the word "intelligence", and that, given enough time to work, this scientific paradigm is very probably (in my view) going to produce software you can run on an ordinary massive server farm that will be able to optimize arbitrary, unknown or partially unknown environments according to specified utility functions.

And eventually, yes, those agents will become smarter than us (causing "MIRI's issues" to become cogent), because we, actual human beings, will figure out the relationships between compute-power, learning efficiency (rates of convergence to error-minimizing hypotheses in terms of training data), reasoning efficiency (moving probability information from one proposition or node in a hypothesis to another via updating), and decision-making efficiency (compute-power needed to plan well given models of the environment). Actual
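For readers who haven't seen the construction being argued about: a deliberately naive Python sketch of the brute-force "enumerate every action sequence, pick the best" planning idea at the core of AIXI-style agents. This is not AIXI itself (there is no Solomonoff prior or environment mixture here) and not the mc-aixi code; `simulate`, the toy reward, and the horizon are illustrative assumptions, and the only point is to show why the cost blows up.

```python
import itertools

def brute_force_plan(simulate, actions, horizon):
    """Enumerate every action sequence of length `horizon`, score each with the
    (assumed-given) environment model `simulate`, and return the best one.
    Cost grows as len(actions) ** horizon - which is the whole point of the
    "planet's worth of computing power" objection."""
    best_seq, best_reward = None, float("-inf")
    for seq in itertools.product(actions, repeat=horizon):
        reward = simulate(seq)
        if reward > best_reward:
            best_seq, best_reward = seq, reward
    return best_seq

# Toy usage: three actions, reward = number of "b"s in the sequence.
toy_simulate = lambda seq: seq.count("b")
print(brute_force_plan(toy_simulate, ["a", "b", "c"], horizon=4))  # ('b', 'b', 'b', 'b')
```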
0private_messaging
The notion that AI is possible is mainstream. The crank stuff, such as "I can download an inefficient but functional subhuman AGI from Github. Making it superhuman is just a matter of adding an entire planet's worth of computing power.", is to computer science as hydrinos are to physics. As for your server farm optimizing unknown environments: the last time I checked, we knew some laws of physics, and did things like making software tools that optimize simulated environments that follow said laws of physics - it being, incidentally, mathematically nonsensical to define a "utility function" without a well-defined domain. So you've got your academic curiosity that does it all on its own, using some very general and impractical representations for modelling the world - so what? You're talking of something that is less - in terms of its market value, power, anything - than its parts and underlying technologies.
-2[anonymous]
Which is why reinforcement learning is so popular, yes: it lets you induce a utility function over any environment you're capable of learning to navigate. Remember, any machine-learning algorithm has a defined domain of hypotheses it can learn/search within. Given that domain of hypotheses, you can define a domain of utility functions. Hence, reinforcement learning and preference learning.

You are completely missing the point. If we're all going to agree that AI is possible, and agree that there's a completely crappy but genuinely existent example of AGI right now, then it follows that getting AI up to dangerous and/or beneficial levels is a matter of additional engineering progress. My whole point is that we've already crossed the equivalent threshold from "Hey, why do photons do that when I fire them at that plate?" to "Oh, there's a photoelectric effect that looks to be described well by this fancy new theory." From there it was less than one century between the raw discovery of quantum mechanics and the common usage of everyday technologies based on quantum mechanics.

The point being: when we can manage to make it sufficiently efficient, and provided we can make it safe, we can set it to work solving just about any problem we consider to be, well, a problem. Given sufficient power and efficiency, it becomes useful for doing stuff people want done, especially stuff people either don't want to do themselves or have a very hard time doing themselves.
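A toy illustration of the "RL induces a value/preference ordering over the environment it learns" point: minimal tabular Q-learning on a hypothetical five-state corridor. Everything here (the corridor, the reward, the hyperparameters) is invented for the example; it's a sketch of the general idea, not of any system discussed above.

```python
import random

# Toy corridor: states 0..4, reward 1 only for reaching the right end.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]                      # step left / step right

def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def greedy(s):
    best = max(Q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

for _ in range(2000):
    s, done = 0, False
    while not done:
        a = random.choice(ACTIONS) if random.random() < epsilon else greedy(s)
        s2, reward, done = step(s, a)
        # Q-learning update: nudge Q(s,a) toward reward + gamma * max_a' Q(s',a')
        Q[(s, a)] += alpha * (reward + gamma * max(Q[(s2, a2)] for a2 in ACTIONS) - Q[(s, a)])
        s = s2

# The learned Q-values define a preference ordering over states: higher for states
# nearer the goal (the terminal state itself stays at 0 by convention).
print({s: round(max(Q[(s, a)] for a in ACTIONS), 2) for s in range(N_STATES)})
```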
5Richard_Kennaway
This is devoid of empirical content.
2private_messaging
Yeah. I can formally write out the resurrection of everyone who ever died, using pretty much the exact same approach: a for loop, iterating over every possible "brain", just like the loops that iterate over every action sequence. Because when you have no clue how to do something, you can always write a for loop. I can put it on github, then cranks can download it and say that resurrecting all the dead is a matter of additional engineering progress. After all, everyone who is dead had once lived, so it's got to be possible for them to be alive.
0[anonymous]
How so?
5Richard_Kennaway
Describing X as "Y, together with the difference between X and Y" is a tautology. Drawing the conclusion that X is "really" a sort of Y already, and the difference is "just" a matter of engineering development is no more than inspirational fluff. Dividing problems into subproblems is all very well, but not when one of the subproblems amounts to the whole problem. The particular instance "here's a completely crappy attempt at making an AGI and all we have to do is scale it up" has been a repeated theme of AGI research from the beginning. The scaling up has never happened. There is no such thing as a "completely crappy AGI", only things that aren't AGI.
0nshepperd
I think you underestimate the significance of reducing the AGI problem to the sequence prediction problem. Unlike the former, the latter problem is very well defined, and progress is easily measurable and quantifiable (in terms of efficiency of cross-domain compression). The likelihood of engineering progress on a problem where success can be quantified seems significantly higher than on something as open-ended as "general intelligence".
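To make "measurable in terms of compression" concrete: a predictor's quality on a bit string is just its cumulative log-loss, which (up to rounding) is the length of the code an arithmetic coder driven by that predictor would produce. A minimal sketch, with a deliberately dumb Laplace-rule predictor standing in where CTW or anything stronger would go; the sequence and the predictor are made up for illustration.

```python
import math

def cumulative_log_loss(bits, predict):
    """Sum of -log2 P(next bit | history): the code length, in bits, that an
    arithmetic coder driven by `predict` would need for the sequence."""
    total, history = 0.0, []
    for b in bits:
        p_one = predict(history)
        p = p_one if b == 1 else 1.0 - p_one
        total += -math.log2(p)
        history.append(b)
    return total

# Laplace-rule predictor: P(next=1) = (#ones + 1) / (len + 2). A stand-in for CTW.
laplace = lambda history: (sum(history) + 1) / (len(history) + 2)

seq = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1] * 10
print("raw length:", len(seq), "bits; code length:",
      round(cumulative_log_loss(seq, laplace), 1), "bits")
```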
2private_messaging
It doesn't "reduce" anything, not in the reductionism sense anyway. If you take that formula and apply the as-yet-unspecified ultra-powerful mathematics package to it - that's what you need to run it on a planet's worth of computers - it's this mathematics package that has to be extremely intelligent and ridiculously superhuman before the resulting AI is even a chimp. It's this mathematics package that has to learn tricks and read books, that has to be able to do something as simple as making use of a theorem it encountered on input.
0[anonymous]
The mathematics package doesn't have to do anything "clever" to build a highly clever sequence predictor. It just has to be efficient in terms of computing time and training data necessary to learn correct hypotheses. So nshepperd is quite correct: MC-AIXI is a ridiculously inefficient sequence predictor and action selector, with major visible flaws, but reducing "general intelligence" to "maximizing a utility function over world-states via sequence prediction in an active environment" is a Big Deal.
0private_messaging
A multitude of AIs have been following what you think the "AIXI" model is - select predictors that work, use them - long before anyone bothered to formulate it as a brute-force loop (AIXI). I think you, like most people over here, have a completely inverted view with regard to the difficulty of the different breakthroughs. There is a point where the AI uses hierarchical models to deal with an environment of greater complexity than the AI itself; getting there is fundamentally difficult, as in, we have no clue how to get there. It is nice to believe that the world is waiting on you for some conceptual breakthrough just roughly within your reach, like AIXI is, but that's just not how it works. edit: Basically, it's as if you were concerned about nuclear-powered 20-foot-tall robots that shoot nuclear hand grenades. After all, the concept of a 20-foot-tall robot is the enormous breakthrough, while a sufficiently small nuclear reactor or hand-grenade-sized nukes are just a matter of "efficiency".
3Nornagest
That's not what's interesting about AIXI. "Select predictors that work, then use them" is a fair description of the entire field of machine learning; we've learned how to do that fairly well in narrow, well-defined problem domains, but hypothesis generation over poorly structured, arbitrarily complex environments is vastly harder. The AIXI model is cool because it defines a clever (if totally impractical, and not without pitfalls) way of specifying a single algorithm that can generalize to arbitrary environments without requiring any pipe-fitting work on the part of its developers. That is (to my knowledge) new, and fairly impressive, though it remains a purely theoretical advance: the Monte Carlo approximation eli mentioned may qualify as general AI in some technical sense, but for practical purposes it's about as smart as throwing transistors at a dart board.
3[anonymous]
What a wonderful quote!
0private_messaging
Hypothesis generation over environments that aren't massively less complex than the machine is vastly harder, and remains vastly harder (albeit there are advances). There's a subtle problem substitution occurring which steals the thunder you originally reserved for something that actually is vastly harder. Thing is, many people could at any time have written a loop over, say, possible neural network parameter values; NNs (with feedback) being Turing complete, it'd work roughly the same. Said for loop would be massively, massively less complicated, ingenious, and creative than what those people actually did with their time instead. The ridiculousness here is that, say, John worked on those ingenious algorithms while keeping in mind that the ideal is the best parameters out of the whole space (which is the abstract concept behind the for loop iterating over those parameters). You couldn't see what John was doing, because he didn't write it out as a for loop. Then James does some work where he - unlike John - has to write out the for loop explicitly, and you go "Whoah!". It isn't new - see Solomonoff induction, the works of Kolmogorov, etc.
1private_messaging
There are AIs that solve novel problems along the lines of "design a better airplane wing" or "route a microchip", and in that field, reinforcement learning of how basic physics works is pretty much one hundred percent irrelevant. Slow, long-term progress, an entire succession of technologies. Really, you're just like the free-energy pseudoscientists. They do all the same things. Ohh, you don't want to give money for cold fusion? You must be a global warming denialist. That's the way they think, and that's precisely the way you think about the issue. That you can literally make cold fusion happen with muons in no way, shape or form supports what the cold fusion crackpots are doing. Nor does it make cold fusion power plants any more or less a matter of "additional engineering progress" than they would be otherwise. edit: by the same logic, resurrection of the long-dead, never-preserved is merely a matter of "additional engineering progress", because you can resurrect the dead using the exact same programming construct that AIXI uses to solve problems. It's called a "for loop"; there's such a for loop in monte carlo AIXI. The loop goes over every possible [thing] when you have no clue whatsoever how to actually produce [thing]. Thing = the action sequence for AIXI, and the brain data for resurrection of the dead.
-1[anonymous]
Ok, hold on, halt, major question: how closely do you follow the field of machine learning? And computational cognitive science? Because on the one hand, there is very significant progress being made. On the other hand, when I say "additional engineering progress", that involves anywhere from years to decades of work before we'll be able to make an agent that can compose an essay, because we need classes of learners capable of inducing fairly precise hypotheses over large spaces of possible programs. What it doesn't involve is solving intractable, magical-seeming philosophical problems like the nature of "intelligence" or "consciousness" that have always held the field of AI back. No, that's just plain impossible. Even in the case of cryonic so-called "preservation", we don't know what we don't know about what information we would have needed to preserve in order to restore someone.
3private_messaging
(makes the gesture with the hands) Thiiiiis closely. Seriously though, not so far as to start claiming that mc-AIXI does something interesting when run on a server with root access, or that it would be superhuman if run on all the computers we've got, or the like. Do I need to write the code for that and put it on github? It iterates over every possible brain (represented as, say, a Turing machine) and runs it for enough timesteps. It just requires too much computing power.
0[anonymous]
Tell me, if I signed up as the PhD student of one among certain major general machine learning researchers, and built out their ideas into agent models, and got one of those running on a server cluster showing interesting proto-human behaviors, might it interest you?
-2TheAncientGeek
Progress in (1) the sense of incrementally throwing more resources at AIXI, or (2) forgetting AIXI and coming up with something more parsimonious? Because if it's (2), there is no other AGI to use as a starting point for incremental progress.
0[anonymous]
Is that what they tell you?
3V_V
I think you are underestimating this by many orders of magnitude.
2private_messaging
Yeah. A starting point could be the AI writing some 1000-letter essay (an action space of 27^1000, without punctuation) or talking through a sound card (an action space of 2^(16*44100) per second). If he was talking about the mc-AIXI on github, the relevant bits seem to be in agent.cpp, and it ain't looking good.
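For the record, the arithmetic behind those action-space figures, expressed in bits (a quick back-of-the-envelope check, nothing more):

```python
import math

essay_bits = 1000 * math.log2(27)   # 1000-letter essay, 27-symbol alphabet: ~4755 bits, |A| = 27^1000
audio_bits = 16 * 44100             # one second of 16-bit, 44.1 kHz audio: 705600 bits, |A| = 2^705600
print(round(essay_bits), audio_bits)  # -> 4755 705600
```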
2David_Gerard
what
7nshepperd
https://github.com/moridinamael/mc-aixi

We won't get a chance to test the "planet's worth of computing power" hypothesis directly, since none of us have access to that much computing power. But, from my own experience implementing mc-aixi-ctw, I suspect that is an underestimate of the amount of compute power required.

The main problem is that the sequence prediction algorithm (CTW) makes inefficient use of sense data by "prioritizing" the most recent bits of the observation string, so it only weakly makes connections between bits that are temporally separated by a lot of noise. Secondarily, plain Monte Carlo tree search is not well suited to decision making in huge action spaces, because it wants to think about each action at least once. But that can most likely be addressed by reusing sequence prediction to reduce the "size" of the action space by chunking actions into functional units.

Unfortunately, both of these problems are only really technical ones, so it's always possible that some academic will figure out a better sequence predictor, lifting mc-aixi on an average laptop from "wins at pacman" to "wins at robot wars", which is about the level at which it may start posing a threat to human safety.
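To illustrate the "wants to think about each action at least once" point: in a standard UCT-style selection rule, untried actions take strict priority, so a node with 2^16 actions needs 2^16 visits before the UCB1 exploration term ever applies. A schematic sketch (my own toy code, not the mc-aixi-ctw implementation):

```python
import math, random

class Node:
    def __init__(self, actions):
        self.actions = actions
        self.children = {}      # action -> child Node
        self.visits = 0
        self.value = 0.0

def uct_select(node, c=1.4):
    """Standard UCT child selection. The first branch is the problem described
    above: every action must be expanded once before the UCB1 term below ever
    gets to trade off exploration against exploitation."""
    untried = [a for a in node.actions if a not in node.children]
    if untried:
        return random.choice(untried)
    return max(node.children,
               key=lambda a: node.children[a].value / node.children[a].visits
               + c * math.sqrt(math.log(node.visits) / node.children[a].visits))

# With 2**16 possible actions (e.g. one 16-bit audio sample), the root needs
# 65536 visits just to get past the "untried" branch, before any real search.
root = Node(actions=range(2 ** 16))
root.visits = 1
print(uct_select(root) in root.actions)   # -> True (it just picked an untried action)
```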
0V_V
Only? Mc-aixi is not going to win at something as open-ended as robot wars just by replacing CTW or CTS with something better. And anyway, even if it did, it wouldn't be at the level at which it may start posing a threat to human safety. Do you think that the human robot wars champions are a threat to human safety? Are they even at the level of taking over the world? I don't think so.
0nshepperd
When I said a threat to human safety, I meant it literally. A robot wars champion won't take over the world (probably) but it can certainly hurt people, and will generally have no moral compunctions about doing so (only hopefully sufficient anti-harm conditioning, if its programmers thought that far ahead).
1V_V
Ah yes, but in this sense cars, trains, knives, etc. can also certainly hurt people, and will generally have no moral compunctions about doing so. What's special about robot-wars-winning AIs?
0Cyan
Domain-general intelligence, presumably.
0private_messaging
Most basic pathfinding plus being a spinner (Hypnodisk-style) = a win vs. most non-spinners.
0Cyan
I took "winning at Robot Wars" to include the task of designing the robot that competes. Perhaps nshepperd only meant piloting, though...
0private_messaging
Well, we're awfully far from that. Automated programming is complete crap; automated engineering is quite cool, but it's practical tools - not a power fantasy where you make some simple software with surprisingly little effort and then it does it all for you.