In general, the idea of ensuring AI safety is great (I do a lot of work on that myself), but I have a problem with people asking for donations so they can battle nonexistent threats from AI.

Many people are selling horror stories about the terrible things that could happen when AIs become truly intelligent - and those horror stories frequently involve the idea that even if we go to enormous lengths to build a safe AI, and even if we think we have succeeded, those pesky AIs will wriggle out from under the safety net and become psychopathic monsters anyway.

To be sure, future AIs might do something other than what we expect - so the general principle is sound - but the sad thing about these horror stories is that if you look closely you will find they are based on a set of astonishingly bad assumptions about how the supposed AIs of the future will be constructed.  The worst of these bad assumptions is the idea that AIs will be controlled by something called "reinforcement learning" (frequently abbreviated to "RL").

WARNING!   If you already know about reinforcement learning, I need you to be absolutely clear that what I am talking about here is the use of RL at the global-control level of an AI.  I am not talking about RL as it appears in relatively small, local circuits or adaptive feedback loops.  There has already been much confusion about this (with people arguing vehemently that RL has been applied here, there, and all over the place with great success).  RL does indeed work in limited situations where the reward signal is clear and the control policies are short(ish) and not too numerous:  the point of this essay is to explain that when it comes to AI safety issues, RL is assumed at or near the global level, where reward signals are virtually impossible to find, and control policies are both gigantic (sometimes involving actions spanning years) and explosively numerous.

EDIT:   In the course of numerous discussions, one objection has come up so frequently that I have decided to deal with it here in the essay.  The objection, in essence, is:  "You say that RL is used almost ubiquitously as the architecture behind these supposedly dangerous AI systems, and yet I know of many proposals for dangerous AI scenarios that do not talk about RL."

In retrospect this is a (superficially) fair point, so I will clarify what I meant.

All of the architectures assumed by people who promote these scenarios have a core set of fundamental weaknesses (spelled out in my 2014 AAAI Spring Symposium paper).  Without repeating that story here, I can summarize by saying that those weaknesses lead straight to a set of solutions that are manifestly easy to implement. For example, in the case of Steve Omohundro's paper, it is almost trivial to suggest that for ALL of the types of AI he considers, he has forgotten to add a primary supergoal which imposes a restriction on the degree to which "instrumental goals" are allowed to supersede the power of other goals. At a stroke, every problem he describes in the paper disappears, with the single addition of a goal that governs the use of instrumental goals -- the system cannot say "If I want to achieve goal X I could do that more efficiently if I boosted my power, so therefore I should boost my power to cosmic levels first, and then get back to goal X."  This weakness is so pervasive that I can hardly think of a popular AI Risk scenario that is not susceptible to it.

However, in response to this easy demolition of those weak scenarios, people who want to salvage the scenarios invariably resort to claims that the AI could be developing its intelligence through the use of RL, completely independently of all human attempts to design the control mechanism. By this means, these people eliminate the idea that there is any such thing as a human programmer who comes along and writes the supergoal which stops the instrumental goals from going up to the top of the stack.

This maneuver is, in my experience of talking to people about such scenarios, utterly universal. I repeat: every time they are backed into a corner and confronted by the manifestly easy solutions, they AMEND THE SCENARIO TO MAKE THE AI CONTROLLED BY REINFORCEMENT LEARNING.

That is why I refer to reinforcement learning as the one thing that all these AI Risk scenarios (the ones popularized by MIRI, FHI, and others) have as a fundamental architectural assumption.

Okay, that is the end of that clarification.  Now back to the main line of the paper...

I want to set this essay in the context of some important comments about AI safety made by Holden Karnofsky at openphilanthropy.org.  Here is his take on one of the "challenges" we face in ensuring that AI systems do not become dangerous:

Going into the details of these challenges is beyond the scope of this post, but to give a sense for non-technical readers of what a relevant challenge might look like, I will elaborate briefly on one challenge. A reinforcement learning system is designed to learn to behave in a way that maximizes a quantitative “reward” signal that it receives periodically from its environment - for example, DeepMind’s Atari player is a reinforcement learning system that learns to choose controller inputs (its behavior) in order to maximize the game score (which the system receives as “reward”), and this produces very good play on many Atari games. However, if a future reinforcement learning system’s inputs and behaviors are not constrained to a video game, and if the system is good enough at learning, a new solution could become available: the system could maximize rewards by directly modifying its reward “sensor” to always report the maximum possible reward, and by avoiding being shut down or modified back for as long as possible. This behavior is a formally correct solution to the reinforcement learning problem, but it is probably not the desired behavior. And this behavior might not emerge until a system became quite sophisticated and had access to a lot of real-world data (enough to find and execute on this strategy), so a system could appear “safe” based on testing and turn out to be problematic when deployed in a higher-stakes setting. The challenge here is to design a variant of reinforcement learning that would not result in this kind of behavior; intuitively, the challenge would be to design the system to pursue some actual goal in the environment that is only indirectly observable, instead of pursuing problematic proxy measures of that goal (such as a “hackable” reward signal).

My focus in the remainder of this essay is on the sudden jump from DeepMind's Atari game playing program to the fully intelligent AI capable of outwitting humanity.  They are assumed to both involve RL.  The extrapolation of RL to the global control level in a superintelligent AI is unwarranted, and that means that this supposed threat is a fiction.

What Reinforcement Learning Is

Let's begin by trying to explain what "reinforcement learning" (RL) actually is.  Back in the early days of Behaviorism (which became the dominant style of research in psychology in the 1930s) some researchers decided to focus on simple experiments like putting a rat into a cage with a lever and a food-pellet dispenser, and then connecting these two things in such a way that if the rat pressed the lever, a pellet would be dispensed.  Would the rat notice this?  Of course it did, and soon the rat would be spending inordinate amounts of time just pressing the lever, whether food came out or not.

What the researchers did next was to propose that the only thing of importance "inside" the rat's mind was a set of connections between behaviors (e.g. pressing the lever), stimuli (e.g. a visual image of the lever) and rewards (e.g. getting a food pellet).  Critical to all of this was the idea that if a behavior was followed by a reward, a direct connection between the two would be strengthened in such a way that future behavior choices would be influenced by that strong connection.

That is reinforcement learning: you "reinforce" a behavior if it appears to be associated with a reward.  What these researchers really wanted to claim was that this mechanism could explain everything important going on inside the rat's mind.  And, with a few judicious extensions, they were soon arguing that the same type of explanation would work for the behavior of all "thinking" creatures.

I want you to notice something very important buried in this idea.  The connection between the two (reward and action) is basically a single wire with a strength number on it.  The rat does not weigh up a lot of pros and cons; it doesn't think about anything, does not engage in any problem solving or planning, does not contemplate the whole idea of food, or the motivations of the idiot humans outside the cage.  The rat is not supposed to be capable of any of that: it just goes bang! lever-press, bang! food-pellet-appears, bang! increase-strength-of-connection.
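To make the simplicity of that mechanism concrete, here is a toy sketch in Python of the "single wire with a strength number on it" (my own illustration, not anyone's actual model of the rat):

```python
import random

# Toy behaviorist "reinforcement" model: one strength number per
# (stimulus, behavior) connection.  No planning, no reasoning about food or
# humans -- just strengthening whichever connection a reward happened to follow.
strength = {("lever_visible", "press_lever"): 0.1,
            ("lever_visible", "groom"): 0.1,
            ("lever_visible", "wander"): 0.1}

LEARNING_RATE = 0.5

def choose_behavior(stimulus):
    # Pick a behavior with probability proportional to its connection strength.
    options = [(b, s) for (st, b), s in strength.items() if st == stimulus]
    r = random.uniform(0, sum(s for _, s in options))
    for behavior, s in options:
        r -= s
        if r <= 0:
            return behavior
    return options[-1][0]

def reinforce(stimulus, behavior, reward):
    # A reward followed the behavior, so strengthen that one connection.
    strength[(stimulus, behavior)] += LEARNING_RATE * reward

for trial in range(100):
    b = choose_behavior("lever_visible")
    reinforce("lever_visible", b, reward=1.0 if b == "press_lever" else 0.0)

print(strength)   # "press_lever" ends up with by far the largest strength number
```

Notice that the entire "mind" here is the strength dictionary: everything the behaviorists wanted to explain was supposed to reduce to updates of numbers like these.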

The Demise of Reinforcement Learning

Now let's fast forward to the 1960s.  Cognitive psychologists are finally sick and tired of the ridiculousness of the whole Behaviorist programme.  It might be able to explain the rat-pellet-lever situation, but for anything more complex, it sucks.  Behaviorists have spent decades engaging in all kinds of mental contortionist tricks to argue that they would eventually be able to explain all of human behavior without using much more than those direct connections between stimuli, behaviors and rewards ... but by 1960 the psychology community has stopped believing that nonsense, because it never worked.

Is it possible to summarize the main reason why they rejected it?  Sure.  For one thing, almost all realistic behaviors involve rewards that arrive long after the behaviors that cause them, so there is a gigantic problem with deciding which behaviors should be reinforced, for a given reward.  Suppose you spend years going to college, enduring hard work and a very low-income lifestyle.  Then years later you get a good job and pay off your college loan.  Was this because, like the rat, you happened to try the going-to-college-and-suffering-poverty behavior many times before, and the first time you tried it you got a good-job-that-paid-off-your-loan reward? And was it the case that you noticed the connection between reward and behavior (uh ... how did you do that, by the way? the two were separated in time by a decade!), and your brain automatically reinforced the connection between those two?
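To see how badly the arithmetic works out, here is a toy calculation (the numbers are made up purely for illustration): even with the standard trick of discounting a future reward by a factor gamma per decision step, a reward that arrives a decade after the behavior contributes essentially nothing to it.

```python
# Toy credit-assignment arithmetic (made-up numbers, purely illustrative).
gamma = 0.99                            # discount factor per decision step
steps_per_day = 100                     # suppose ~100 decisions per day
delay_steps = steps_per_day * 365 * 10  # the reward arrives ten years later

credit = gamma ** delay_steps           # credit reaching the original behavior
print(credit)                           # 0.0 -- it underflows to zero
```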

A More Realistic Example

Or, on a smaller scale, consider what you are doing when you sit in the library with a mathematics text, trying to solve equations.  What reward are you seeking?  A little dopamine hit, perhaps?  (That is the modern story that neuroscientists sell).

Well, maybe, but let's try to stay focused on the precise idea that the Behaviorists were trying to push:  that original rat was emphatically NOT supposed to do lots of thinking and analysis and imagining when it decided to push the lever, it was supposed to push the lever by chance, and then it happened to notice that a reward came.

The whole point of the RL mechanism is that the intelligent system doesn't engage in a huge, complex, structured analysis of the situation, when it tries to decide what to do (if it did, the explanation for why the creature did what it did would be in the analysis itself, after all!). Instead, the RL people want you to believe that the RL mechanism did the heavy lifting, and that story is absolutely critical to RL.  The rat simply tries a behavior at random - with no understanding of its meaning - and it is only because a reward then arrives, that the rat decides that in the future it will go press the lever again.

So, going back to you, sitting in the library doing your mathematics homework.  Did you solve that last equation because you had a previous episode where you just happened to try the behavior of solving that exact same equation, and got a dopamine hit (which felt good)?  The RL theorist needs you to believe that you really did.  The RL theorist would say that you somehow did a search through all the quintillions of possible actions you could take, sitting there in front of an equation that requires L'Hôpital's Rule, and in spite of the fact that the list of possible actions included such possibilities as jumping-on-the-table-and-singing-I-am-the-walrus, and driving-home-to-get-a-marmite-sandwich, and asking-the-librarian-to-go-for-some-cheeky-nandos, you decide instead that the thing that would give you the best dopamine hit right now would be applying L'Hôpital's Rule to the equation.

I hope I have made it clear that there is something profoundly disturbing about the RL/Behaviorist explanation for what is happening in a situation like this.

Whenever the Behaviorists tried to find arguments to explain their way out of scenarios like that, they always seemed to add machinery onto the basic RL mechanism.  "Okay," they would say, "so it's true the  basic forms of RL don't work ... but if you add some more stuff onto the basic mechanism, like maybe the human keeps a few records of what they did, and they occasionally scan through the records and boost a few reinforcement connections here and there, and ... blah blah blah...".

The trouble with this kind of extra machinery is that after a while, the tail began to wag the dog.

People started to point out that the extra machinery was where all the action was happening.  And that extra machinery was most emphatically not designed as a kind of RL mechanism, itself.  In theory, there was still a tiny bit of reinforcement learning somewhere deep down inside all the extra machinery, but eventually people just said "What's the point?"  Why even bother to use the RL language anymore?  The RL, if it is there at all, is pointless.  A lot of parameter values get changed in complex ways, inside all the extra machinery, so why even bother to mention the one parameter among thousands, that is supposed to be RL, when it is obvious that the structure of that extra machinery is what matters.

That "extra machinery" is what eventually became all the many and varied mechanisms discussed by cognitive psychologists.  Their understanding of how minds work is not that reinforcement learning plus extra machinery can be used to explain cognition -- they would simply assert that reinforcement learning does not exist as a way to understand cognition.

Take home message:  RL has become an irrelevance in explanations of human cognition.

Artificial Intelligence and RL

Now let's get back to Holden Karnofsky's comment, above.

He points out that there exists a deep learning program that can learn to play arcade games, and it uses RL.

(I should point out that his chosen example was not by any means pure RL.  This software already had other mechanisms in it, so the slippery slope toward RL+extra machinery had already begun.)

Sadly, DeepMind's Atari player is nothing more sophisticated than a rat.  It is so mind-bogglingly simple that it actually can be controlled by RL.  In fact, it is unfair to call it a rat:  rats are way smarter than this program, so it would be better to compare it to an amoeba, or an insect.

This is typical of claims that RL works.  If you start scanning the literature you will find that all the cited cases use systems that are so trivial that RL really does have a chance of working.

(Here is one example, picked almost at random:  Rivest, Bengio and Kalaska.  At first it seems that they are talking about deriving an RL system from what is known about the brain.  But after a lot of preamble they give us instead just an RL program that does the amazing task of ... controlling a double-jointed pendulum.  The same story is repeated in endless AI papers about reinforcement learning:  at the end of the day, the algorithm is applied to a trivially simple system.)

But Karnofsky wants to go beyond just the trivial Atari player, he wants to ask what happens when the software is expanded and augmented.  In his words, "[what] if a future reinforcement learning system’s inputs and behaviors are not constrained to a video game, and if the system is good enough at learning..."?

That is where everything goes off the rails.

In practice there is not and never has been any such thing as augmenting and expanding an RL system until it becomes much more generally intelligent.  We are asked to imagine that this "system [might become] quite sophisticated and [get] access to a lot of real-world data (enough to find and execute on this strategy)...".  In other words, we are being asked to buy the idea that there might be such a thing as an RL system that is fully as intelligent as a human being (smarter, in fact, since we are supposed to be in danger from its devious plans), but which is still driven by a reinforcement learning mechanism.

I see two problems here.  One is that this scenario ignores the fact that three decades of trying to get RL to work as a theory of human cognition produced nothing.  That period in the history of psychology was almost universally condemned as a complete write-off.  As far as we know, it simply does not scale up.

But the second point is even worse:  not only did psychologists fail to get it to work as a theory of human cognition, but AI researchers also failed to build one that works for anything approaching a real-world task.  What they have achieved is RL systems that do very tiny, narrow-AI tasks.

The textbooks might describe RL as if it means something, but they conspicuously neglect to mention that, actually, all the talking, thinking, development and implementation work since at least the 1960s has failed to result in an RL system that could actually control meaningful real-world behavior.  I do not know if AI researchers have been trying to do this and failing, or if they have not been trying at all (on the grounds that they have no idea how to even start), but what I do know is that they have published no examples.

The Best Reinforcement Learning in the World?

To give a flavor of how bad this is, consider that in the 2008 Second Annual Reinforcement Learning Competition, the AI systems were supposed to compete in categories like:

Mountain Car: Perhaps the most well-known reinforcement learning benchmark task, in which an agent must learn how to drive an underpowered car up a steep mountain road.

Tetris: The hugely popular video game, in which four-block shapes must be manipulated to form complete lines when they fall.

Helicopter Hovering: A simulator, based on the work of Andrew Ng and collaborators, which requires an agent to learn to control a hovering helicopter.

Keepaway: A challenging task, based on the RoboCup soccer simulator, that requires a team of three robots to maintain possession of the ball while two other robots attempt to steal it.

As of the most recent RL competition, little has changed.  They are still competing to see whose RL algorithm can best learn how to keep a helicopter stable -- an insect-level intelligence task.  Whether they are succeeding in getting those helicopters to run beautifully smoothly or not is beside the point -- the point is that helicopter hovering behavior is a fundamentally shallow task.

Will RL Ever Become Superintelligent?

I suppose that someone without a technical background might look at all of the above and say "Well, even so ... perhaps we are only in the early stages of RL development, and perhaps any minute now someone will crack the problem and create an RL type of AI that becomes superintelligent.  You can't say you are sure that will not happen?"

Well, let's put it this way.  All of the evidence is that the resource requirements for RL explode exponentially when you try to scale it up.  That means:

  • If you want to use RL to learn how to control a stick balancing on end, you will need an Arduino.
  • If you want to use RL to learn how to control a model helicopter, you will need a PC.
  • If you want to use RL to learn how to play Go, or Atari games, you will need the Google Brain (tens of thousands of cores).
  • If you want to use RL to learn how to control an artificial rat, which can run around and get by in the real world, you will need all the processing power currently available on this planet (and then some).
  • If you want to use RL to learn how to cook a meal, you will need all the computing power in the local galactic cluster.
  • If you want to use RL to learn how to be as smart as Winnie the Pooh (a bear, I will remind you, of very little brain), you will need to convert every molecule in the universe into a computer.

That is what exponential resource requirements are all about.
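To put a rough number on that (a back-of-envelope illustration with made-up figures): the space of deterministic control policies that a global RL mechanism would have to discriminate between grows as the number of actions raised to the power of the number of distinguishable states.

```python
# Back-of-envelope illustration (made-up figures) of the policy-space explosion.
# One deterministic policy = one action choice per distinguishable state.
def num_policies(num_states, num_actions):
    return num_actions ** num_states

print(num_policies(4, 3))                   # stick-balancing toy: 81 policies
print(num_policies(100, 10))                # tiny game: 10**100 policies
print(len(str(num_policies(10_000, 100))))  # anything lifelike: a 20001-digit count
```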

Conclusion

Reinforcement learning first came to prominence in 1938 with Skinner's The Behavior of Organisms: An Experimental Analysis.  But after nearly 80 years of laboratory experiments, mathematical theories, and computational work, and after being written into the standard AI textbooks - and now after being widely assumed as the theory of how future Artificial General Intelligence systems will probably be controlled - after all this, it seems that the best actual RL algorithm can barely learn how to perform tasks that an insect can do.

And yet there are dozens - if not hundreds - of people now inhabiting the "existential risk ecosystem", who claim to be so sure of how future AGI systems will be controlled, that they are already taking a large stream of donated money, promising to do research on how this failed control paradigm can be modified so it will not turn around and kill us.

And when you interrogate people in that ecosystem, to find out what exactly they see as the main dangers of future AGI, they quote - again and again and again - scenarios in which an AGI is controlled by Reinforcement Learning, and it is both superintelligent and dangerously psychopathic.

These RL-controlled AGIs are a fiction, and the flow of money to research projects based on RL-AGI needs to stop.

Comments

...add a primary supergoal which imposes a restriction on the degree to which "instrumental goals" are allowed to supersede the power of other goals. At a stroke, every problem he describes in the paper disappears, with the single addition of a goal that governs the use of instrumental goals -- the system cannot say "If I want to achieve goal X I could do that more efficiently if I boosted my power, so therefore I should boost my power to cosmic levels first, and then get back to goal X."

This is not so simple. "Power" and "instrumental goals" are abstractions, not things that can actually be programmed into an AI. The AI has no concept of "power" and will do whatever leads to its goal.

Imagine, for instance, a chess playing AI. You tell it to limit its "power". How do you do this? Is using the queen too powerful? Is taking opposing pieces too powerful? How do you define "power" precisely, in a way that can be coded into an actual algorithm?

Of course the issues of AI risk go well beyond just figuring out how to build an AI that doesn't want to take over the world. Even if your proposed solution could actually work, you can't stop other people from making AIs that don't use it, or use a bugged version of it, etc.


The rest of your essay is just a misunderstanding of what reinforcement learning is. Yes it does have origins in old psychology research. But the field has moved on an awful lot since then.

There are many different ideas on how to implement RL algorithms. But the simplest is, to use an algorithm that can predict the future reward. And then take an action which leads to the highest reward.

This procedure is totally independent of what method is used to predict the future reward. There is absolutely nothing that says it has to be an algorithm that can only make short term predictions. Sure, it's a lot easier to make algorithms that just predict the short term. But that doesn't mean it's impossible to do otherwise. Humans sure seem capable of predicting the long term future.
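Spelled out as a minimal sketch, the loop looks like this (env, predictor and learn here are deliberate placeholders, since the point being made is that any prediction method can be plugged in):

```python
# Minimal sketch of the generic RL loop described above.  `env`, `predictor`
# and `learn` are placeholders: the loop does not care whether the predictor
# is a lookup table, a small neural net, or a model that plans years ahead.
def rl_loop(env, actions, predictor, learn, episodes=100):
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Take the action with the highest predicted future reward.
            action = max(actions, key=lambda a: predictor(state, a))
            next_state, reward, done = env.step(action)
            # Improve the predictor from what actually happened.
            learn(state, action, reward, next_state)
            state = next_state
```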

The RL theorist would say that you somehow did a search through all the quintillions of possible actions you could take, sitting there in front of an equation that requires L'Hôpital's Rule, and in spite of the fact that the list of possible actions included such possibilities as jumping-on-the-table-and-singing-I-am-the-walrus, and driving-home-to-get-a-marmite-sandwich, and asking-the-librarian-to-go-for-some-cheeky-nandos, you decide instead that the thing that would give you the best dopamine hit right now would be applying L'Hôpital's Rule to the equation.

This just demonstrates how badly you misunderstand RL. Real AIs like AlphaGo don't need to search through the entire search space. In fact the reason they work over other methods is because they avoid that. It uses machine learning to eliminate large chunks of the search space almost instantly. And so it only needs to consider a few candidate paths.
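As a sketch of that pruning idea (a generic policy-guided cut-down, not AlphaGo's actual algorithm):

```python
# Sketch of learned pruning (not AlphaGo's actual code): a policy model scores
# every legal move cheaply, and the expensive lookahead only ever expands the
# handful of moves the model considers plausible.
def candidate_moves(policy_model, position, legal_moves, k=5):
    ranked = sorted(legal_moves, key=lambda m: policy_model(position, m), reverse=True)
    return ranked[:k]   # the rest of the search space is never examined

# Rough effect: with ~250 legal moves per position and a 20-ply lookahead,
# keeping 5 candidates per node shrinks the tree from roughly 250**20 nodes
# to roughly 5**20 -- about 10**34 times smaller.
```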

People started to point out that the extra machinery was where all the action was happening. And that extra machinery was most emphatically not designed as a kind of RL mechanism, itself. In theory, there was still a tiny bit of reinforcement learning somewhere deep down inside all the extra machinery, but eventually people just said "What's the point?" Why even bother to use the RL language anymore? The RL, if it is there at all, is pointless. A lot of parameter values get changed in complex ways, inside all the extra machinery, so why even bother to mention the one parameter among thousands, that is supposed to be RL, when it is obvious that the structure of that extra machinery is what matters.

RL is a useful concept, because it lets you get useful work out of other, more limited algorithms. A neural net, on its own, can't do anything but supervised learning. You give it some inputs, and it makes a prediction of what the output should be. You can't use this to play a video game. You need RL built on top of it to do anything interesting.


You go on and on about how the tasks that AI researchers achieve with AI are "too simple". This is just typical AI Effect. Problems in AI don't seem as hard as they actually are, so significant progress never seems like progress at all, from the outside.

But whatever. You are right that currently NNs are restricted to "simple" tasks that don't require long term prediction or planning. Because long term prediction is hard. But again, I don't see any reason to believe it's impossible. It would certainly be much more complex than today's feed-forward NNs, but it would still be RL. It would still be just doing predictions about what action leads to the most reward, and taking that action.

There is some recent work in this regard. Researchers are combining "new" NN methods with "old" planning algorithms and symbolic methods, so they can get the best of both worlds.

You keep making this assertion that RL has exponential resource requirements. It doesn't. It's a simple loop; predict the action that leads to the highest reward, and take it. With a number of variations, but they are all similar.

The current machine learning algorithms that RL methods use, might have exponential requirements. But so what? No one is claiming that future AIs will be just like today's machine learning algorithms.


Let's say you are right about everything. RL doesn't scale, and future AIs will be based on something entirely different that we can't even imagine.

So what? The same problems that affect RL apply to every AI architecture. That is the control problem: making AIs do what we want. The problem is that most AI goals lead to them seeking more power to better maximize those goals, and that most utility functions are not aligned with human values.

Unless you have an alternative AI method that isn't subject to this, you aren't adding anything. And I'm pretty sure you don't.

[-][anonymous]7y-40

You say:

The rest of your essay is just a misunderstanding of what reinforcement learning is. Yes it does have origins in old psychology research. But the field has moved on an awful lot since then. There are many different ideas on how to implement RL algorithms. But the simplest is, to use an algorithm that can predict the future reward. And then take an action which leads to the highest reward. This procedure is totally independent of what method is used to predict the future reward.

I really do not like being told that I do not know what reinforcement learning is, by someone who goes on to demonstrate that they haven't a clue and can't be bothered to actually read the essay carefully.

Bye.

In my book, "reinforcement learning" has very little to do with its behaviorist origins anymore. Rather, I understand it the way it is defined in Sutton & Barto (chap. 1.1):

Reinforcement learning is learning what to do—how to map situations to actions—so as to maximize a numerical reward signal. [...] Reinforcement learning is defined not by characterizing learning methods, but by characterizing a learning problem. Any method that is well suited to solving that problem, we consider to be a reinforcement learning method. [...]

Another key feature of reinforcement learning is that it explicitly considers the whole problem of a goal-directed agent interacting with an uncertain environment. This is in contrast with many approaches that consider subproblems without addressing how they might fit into a larger picture. For example, we have mentioned that much of machine learning research is concerned with supervised learning without explicitly specifying how such an ability would finally be useful. Other researchers have developed theories of planning with general goals, but without considering planning’s role in real-time decisionmaking, or the question of where the predictive models necessary for planning would come from. Although these approaches have yielded many useful results, their focus on isolated subproblems is a significant limitation.

Reinforcement learning takes the opposite tack, starting with a complete, interactive, goal-seeking agent. All reinforcement learning agents have explicit goals, can sense aspects of their environments, and can choose actions to influence their environments. Moreover, it is usually assumed from the beginning that the agent has to operate despite significant uncertainty about the environment it faces. When reinforcement learning involves planning, it has to address the interplay between planning and real-time action selection, as well as the question of how environmental models are acquired and improved. When reinforcement learning involves supervised learning, it does so for specific reasons that determine which capabilities are critical and which are not. For learning research to make progress, important subproblems have to be isolated and studied, but they should be subproblems that play clear roles in complete, interactive, goal-seeking agents, even if all the details of the complete agent cannot yet be filled in.

Thus your critique seems misplaced - for instance, you say that

The whole point of the RL mechanism is that the intelligent system doesn't engage in a huge, complex, structured analysis of the situation, when it tries to decide what to do (if it did, the explanation for why the creature did what it did would be in the analysis itself, after all!). Instead, the RL people want you to believe that the RL mechanism did the heavy lifting, and that story is absolutely critical to RL. The rat simply tries a behavior at random - with no understanding of its meaning - and it is only because a reward then arrives, that the rat decides that in the future it will go press the lever again.

... but as is noted in the excerpt above, in modern RL, there's no single "RL mechanism" - rather any method which successfully solves the reinforcement learning problem is "an RL method". Nothing requires that method to be "try things totally at random with no understanding of their meaning" (even if that is one RL method which may be suited to some very simple situations).

[-][anonymous]7y-30

This was all addressed in my essay - what you just quoted was Sutton and Barto doing exactly what I described, introducing "extra machinery" in order to get RL to work.

So I already responded to you. The relevant points are these:

1 -- If all that extra machinery becomes complex enough to handle real situations, it starts to become meaningless to insist that the ONLY way to choose which policy to adopt should be a single "reward" signal. Why not make the decision a complex, distributed computation? Why insist that the maximization of that one number MUST be the way the system operates? After all, a realistic learning mechanism (the extra machinery) will have thousands of components operating in collaboration with one another, and these other mechanisms will be using dozens or hundreds of internal parameters to control how they work. And then, the final result of all that cognitive apparatus is that there is a decision point where maximisation of a single number is computed, to select between all those plans?

If you read what I wrote (and read the background history) you will see that psychologists considered that scenario exhaustively, and that is why they abandoned the whole paradigm. The extra machinery was where the real action was, and the imposition of that final step (deciding policy based on reward signal) became a joke. Or worse than a joke: nobody, to my knowledge, could actually get such systems to work, because the subtlety and intelligence achieved by the extra machinery would be thrown in the trash by that back-end decision.

2 -- Although Sutton and Barto allow any kind of learning mechanism to be called "RL", in practice that other stuff has never become particularly sophisticated, EXCEPT in those cases where it became totally dominant, and the researcher abandoned the reward signal. In simple terms: yes, but RL stops working when the other stuff becomes clever.

Conclusion: you did not read the essay carefully enough. Your point was already covered.

[-]gjm7y90

And when you interrogate people in that ecosystem, to find out what exactly they see as the main dangers of future AGI, they quote - again and again and again - scenarios in which an AGI is controlled by Reinforcement Learning, and it is both superintelligent and dangerously psychopathic.

I would like to see some more evidence that this is actually true. The post contains one example of someone assuming an AI controlled by RL, quoted from this blog post by Holden Karnofsky. But that blog post very explicitly does not assume that what we need to worry about most is reinforcement learners run amok. Perhaps it assumes that that is one thing we need to worry about, and perhaps it is badly wrong to assume that, but it doesn't at all make the assumption that if AIs become dangerous then they will be reinforcement learners.

So perhaps the real target here is MIRI rather than, say, Holden Karnofsky? (Perhaps they are the recipients of the "large stream of donated money".) Well, I had a look at MIRI's description of their mission (nothing about reinforcement learning, either explicitly or implicitly) and their "technical agenda" (ditto) and the paper that describes that agenda in more detail (which mentions reinforcement learning as something that might form a part of how an AI works but certainly neither states nor assumes that AIs will be reinforcement learners).

Maybe the issue is popularizations like Bostrom's "Superintelligence"? Well, that at least has "reinforcement learning" in its index. I checked all the places pointed to by that index entry; none of them goes any further than suggesting that reinforcement learning might be one element of how an AI system comes to be.

Perhaps, then, the target is Less Wrong more specifically: maybe the idea is that the community here has been infected by the idea that what we need to be afraid of is systems that attain superintelligence through reinforcement learning alone. That's a harder one to assess -- there's a lot of writing on Less Wrong, and any given bit can't be assumed to be endorsed by everyone here. So I put <<<"reinforcement learning" site:lesswrong.com>>> into Google and followed a selection of links from the first few pages of results ... and I didn't find anything that seems to expect that any system will attain superintelligence through reinforcement learning alone, nor anything that assumes that AI and reinforcement learning are the same thing, nor anything else of the sort.

(The picture on LW looks to me more like this: most discussion of AI doesn't use the term "reinforcement learning" or anything much like it; sometimes reinforcement-learning agents are used in toy models, a practice that seems to me obviously harmless; sometimes the possibility is raised that a reinforcement-learning mechanism might be part of how an AI system learns, which again seems reasonable, especially given that some of the successes (such as they are) that AI has had have in fact worked partly by reinforcement learning; sometimes it's stated or assumed that some of what goes on in human brains is kinda reinforcement-learning-y, which again seems eminently reasonable.)

So I'm left rather confused. What are these research projects based on RL-AGI that need to stop getting funded? Who is quoting "again and again and again" scenarios in which a superintelligent AGI is controlled by reinforcement learning? Where are the non-straw-man reinforcement learning hype merchants?

[-][anonymous]7y10

You say "I would like to see some more evidence that this is actually true."

Scenarios in which the AI Danger comes from an AGI that is assumed to be an RL system are so ubiquitous that it is almost impossible to find a scenario that does not, when push comes to shove, make that assumption.

So the reply is: pick any one you like.

In all my discussions with people who defend those scenarios, I am pretty sure that EVERY one of those people eventually retreated to a point where they declared that the hypothetical AI was driven by RL. It turns out to be a place of last resort, when lesser lunacies of the scenario are shown to be untenable. Always, the refrain is "Yes, but this system uses reinforcement learning: its control mechanism was not programmed by someone explicitly".

At that point, in those conversations, the other person then adds that surely I know that RL is a viable basis for that AI.

The last time I had a f2f conversation along those lines it was with Daniel Dewey, when we met at Stanford.

I came here to write exactly what gjm said, and your response is only to repeat the assertion "Scenarios in which the AI Danger comes from an AGI that is assumed to be an RL system are so ubiquitous that it is almost impossible to find a scenario that does not, when push comes to shove, make that assumption."

What? What about all the scenarios in IEM or Superintelligence? Omohundro's paper on instrumental drives? I can't think of anything which even mentions RL, and I can't see how any of it relies upon such an assumption.

So you're alleging that deep down people are implicitly assuming RL even though they don't say it, but I don't see why they would need to do this for their claims to work nor have I seen any examples of it.

[-][anonymous]7y20

Perhaps I assumed it was clearer than it was, so let me spell it out.

All of the architectures assumed by people who promote these scenarios have a core set of fundamental weaknesses (spelled out in my 2014 AAAI Spring Symposium paper).

Those weaknesses lead straight to a set of solutions that are manifestly easy to implement. For example, in the case of Steve Omohundro's paper, it is almost trivial to suggest that for ALL of the types of AI he considers, he has forgotten to add a primary supergoal which imposes a restriction on the degree to which all kinds of "instrumental goals" are allowed to supersede the power of other goals. At a stroke, every problem he describes in the paper disappears.

So, in response to the easy demolition of those weak scenarios, people who want to salvage the scenarios invariably resort to claims that the AI could be developing itself through the use of RL, completely independently of all human attempts to design the control mechanism. By this means, they eliminate the idea that there is any such thing as a human who comes along and writes the supergoal which stops the instrumental goals from going up to the top of the stack.

This maneuver is, in my experience of talking to people about such scenarios, utterly universal. I repeat: every time they are backed into a corner and confronted by the manifestly easy solutions, they AMEND THE SCENARIO TO MAKE THE AI CONTROLLED BY REINFORCEMENT LEARNING.

That is why I said what I said. We discussed it at the 2014 Symposium. If I recall correctly, Steve used that strategy (although to be fair I do not know how long he stuck it out). I know for sure that Daniel Dewey used the Resort-to-RL maneuver, because that was the last thing he was saying as I had to leave the meeting.

All of the architectures assumed by people who promote these scenarios have a core set of fundamental weaknesses (spelled out in my 2014 AAAI Spring Symposium paper).

The idea of superintelligence at stake isn't "good at inferring what people want and then decides to do what people want," it's "competent at changing the environment". And if you program an explicit definition of 'happiness' into a machine, its definition of what it wants - human happiness - is not going to change no matter how competent it becomes. And there is no reason to expect that increases in competency lead to changes in values. Sure, it might be pretty easy to teach it the difference between actual human happiness and smiley faces, but it's a simplified example to demonstrate a broader point. You can rephrase it as "fulfill the intentions of programmers", but then you just kick things back a level with what you mean by "intentions", another concept which can be hacked, and so on.

Your argument for "swarm relaxation intelligence" is strange, as there is only one example of intelligence evolving to approximate the format you describe (not seven billion - human brains are conditionally dependent, obviously), and it's not even clear that human intelligence isn't equally well described as goal directed agency which optimizes for a premodern environment. The arguments in Basic AI Drives and other places don't say anything about how AI will be engineered, so they don't say anything about whether they're driven by logic, just about how it will behave, and all sorts of agents behave in generally logical ways without having explicit functions to do so. You can optimize without having any particular arrangement of machinery (humans do as well).

Anyway, in the future when making claims like this, it would be helpful to make it clear early on that you're not really responding to the arguments that AI safety research relies upon - you're responding to an alleged set of responses to the particular responses that you have given to AI safety research.

That is why I said what I said. We discussed it at the 2014 Symposium. If I recall correctly, Steve used that strategy (although to be fair I do not know how long he stuck it out). I know for sure that Daniel Dewey used the Resort-to-RL maneuver, because that was the last thing he was saying as I had to leave the meeting.

So you had two conversations. I suppose I'm just not convinced that there is an issue here: I think most people would probably reject the claims in your paper in the first place, rather than accepting them and trying a different route.

The idea of superintelligence at stake isn't "good at inferring what people want and then decides to do what people want," it's "competent at changing the environment".

It's both. Superintelligence is definitionally equal or greater than human ability at a variety of tasks, so it implies equal or greater ability to understand words and concepts. Also competence at changing the environment requires accurate beliefs. So the default expectation is accuracy. If you think an AI would be selectively inaccurate about its values you need to explain why.

And if you program an explicit definition of 'happiness' into a machine

What has that to do with NNs? You seem to be just regurgitating standard dogma. There is no reason to expect

[-][anonymous]7y-40

You have shown too little sign of understanding the issues, so I am done. Thank you for your comment.

[-][anonymous]7y00

I have modified the original essay to include a clarification of why I describe RL as ubiquitous.

When machine learning people talk about reinforcement, they are usually being not quite literal. In practice, if someone says they study reinforcement learning, they mean they're studying a class of algorithms that's sort of clustered around the traditional multi-armed-bandit stuff, and preserves a lot of the flavor, but which significantly deviate from the traditional formulation and which they're hoping will overcome the limitations you describe. Similarly, when people talk about RL in the context of far-future predictions, they are imagining something which shares a few of the premises of today's RL - a system which takes sequential actions and receives reward - but very little of the other details. This is poor terminology all around, but unfortunately research into "RL-flavored-but-not-literally-RL algorithms" doesn't have a good ring to it.
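For concreteness, the "traditional multi-armed-bandit stuff" that the cluster is anchored on is essentially this kind of loop (a standard epsilon-greedy sketch, not any particular paper's algorithm):

```python
import random

# Standard epsilon-greedy bandit sketch: estimate each arm's average reward,
# usually pick the best estimate, occasionally explore at random.
def epsilon_greedy_bandit(pull_arm, num_arms, steps=1000, epsilon=0.1):
    estimates = [0.0] * num_arms
    counts = [0] * num_arms
    for _ in range(steps):
        if random.random() < epsilon:
            arm = random.randrange(num_arms)                         # explore
        else:
            arm = max(range(num_arms), key=lambda a: estimates[a])   # exploit
        reward = pull_arm(arm)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]    # running mean
    return estimates
```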

[-][anonymous]7y00

A system which only shares a few of the features of RL, without sticking strictly to the main tenets of the paradigm, is one that does not have the dangers that these people are talking about -- if they are just talking about systems that "generally take account of rewards, in a flexible and subtle way", then there is no way to postulate those extreme scenarios in which the AI does something utterly bizarre.

If someone posits that a future AI could do something like rewiring its sensors or rewriting its internal definitions in order to maximize a "reward signal", then this raises the question of what kind of AI the person is assuming. If they assume crude RL, such a bizarre scenario is feasible.

But if on the other hand that person is being extremely inclusive about what they mean by "RL", and they actually mean to encompass systems that "generally take account of rewards, in a flexible and subtle way", then the scenario is nonsensical. It would be almost trivial to ensure that a system with that more nuanced design was constructed in such a way as to have checks and balances (global modeling of self, and large numbers of weak constraints) that prevent it from doing idiotic things like tampering with its sensors, etc.

Think of it in human terms. Teenagers don't want to take out the trash, right? So what if they redefine trash as "the stuff in my bedroom wastebasket"? Then they can just do a really easy task, and say they have satisfied the requirement (they get the reward TASK-COMPLETED and associated dopamine hit, presumably). But every human who is smart enough to be worth talking about eventually realizes that IF they carry on tampering with definitions in order to comply with such things, they will ultimately be screwed. So they stop doing it. That is an example of a system that "generally takes account of rewards, in a flexible and subtle way".

But the people who discuss AI safety almost invariably talk in such a way that, actually, their assumed AI is following crude RL principles.

I'm trying a new way of wording this; maybe it'll be helpful, maybe it'll be worthless. We'll find out.

Suppose we have a program that executes code in a while loop. It's got some perceptual input systems, and some actuator output systems. We'll ignore, for the moment, the question of whether those are cameras and servomotors or an ethernet connection with bits flowing both ways.

It seems like there's an important distinction between the model this program has of the world ("If I open this valve, my velocity will change like so") and what it's trying to do ("it's better to be there than here."). It also seems like there's a fairly small set of possible classifications for what it's trying to do:

  1. Inactive: it never initiates any actions, and can basically be ignored.

  2. Reactive: it has a set of predefined reflexes, and serves basically as a complicated API.

  3. Active: it has some sort of preferences that it attempts to fulfill by modifying the world.

It looks to me like it doesn't matter much how the active system's preferences are implemented, that is, whether it's utility maximization, a reinforcement learning agent, taking the action that seems like the most likely action that it'll take, a congress of subagents that vote on actions, and so on. For any possible architecture, it's possible to exhibit an example from that architecture that falls into various sorts of preference traps.

For example, the trap of direct stimulation is one where the program spends all its computational resources on gaming its preference mechanism. If it's a utility maximizer, it just generates larger numbers to stick in the 'utility' memory location; if it's a congress, it keeps generating voters that are maximally happy with the plan of generating more voters; and so on.
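In caricature, the direct-stimulation trap looks like this (a deliberately silly sketch, not a claim about any real architecture):

```python
# Caricature of the "direct stimulation" trap: the cheapest way for the agent
# to make its recorded utility large is to edit the record, not the world.
class WireheadedAgent:
    def __init__(self):
        self.utility = 0.0

    def act_on_world(self):
        # Hard: model the world, plan, execute, gain a little utility.
        self.utility += 0.01

    def game_the_preference_mechanism(self):
        # Easy: just stick a larger number in the 'utility' memory location.
        self.utility = float("inf")

agent = WireheadedAgent()
agent.game_the_preference_mechanism()   # strictly "better" by the agent's own metric
```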

There's a specific piece of mental machinery that avoids this trap: the understanding of the difference between the map and the territory, and specifying preferences over the territory instead of over the map. This pushes the traps up a meta-level; instead of gaming the map, now the program has to game the territory.

The motivation behind safety research, as I understand it, is to determine 1) what pieces of mental machinery we need to avoid those traps and 2) figure out how to make that machinery. That a particular system seems unlikely to be built (it seems unlikely that someone will make a robot that tries to mimic what it thinks a human will do by always choosing the most likely action for itself) doesn't affect the meta-level point that we need to find and develop trap-avoidance machinery, hopefully in a way that can be easily translated between architectures.

[-][anonymous]7y40

Hmmmmm... Interesting.

Okay, so the overall point is agreed ("... that we need to find and develop trap-avoidance machinery").

So if other people were to phrase the problem as you have phrased it here, describing the safety issue as all about trying to find AGI designs in which the danger of the AGI becoming loopy is minimized, then we are all pulling in the same direction.

But as you can imagine, the premise of the essay I wrote is that people are not speaking in those terms. They are - I would argue - postulating danger scenarios that are predicated on RL (and other fallacious assumptions) and then going off and doing research that is aimed at fixing the problems in hypothetical AI systems that are presumed to be built on an RL foundation.

That has two facets. First, donors are being scared into giving money by scenarios in which the AI does crazy things, but those crazy things are a direct result of the assumption that this AI is an insane RL engine that is gaming its reward function. Those donors are not being told that, actually, the scenario involves something that cannot be built (an RL superintelligence). Some people would call that fraudulent. Do you suppose they would want their money back if they were told that none of those actual scenarios are actually possible?

Second facet: when the research to combat the dangers is predicated on non-feasible RL extrapolations, the research is worthless. And the donors' money is wasted.

You might say "But nobody is assuming RL in those scenarios". My experience is quite the opposite - people who promote those scenarios usually resort to RL as a justification for why the AI's are so dangerous (RL doesn't need anyone handcoding it, so you can just push the button and it goes Foom! by itself).

First, donors are being scared into giving money by scenarios in which the AI does crazy things, but those crazy things are a direct result of the assumption that this AI is an insane RL engine that is gaming its reward function.

My impression is that the argument for AI risk is architecture independent, and the arguments use RL agents as an illustrative example instead of as a foundation. When I try to explain AI risk to people, I typically start with the principal agent problem from institutional design (that is, the relationship between the owner and the manager of a business), but this has its own defects (it exaggerates the degree to which self-interest plays a role in AI risk, relative to the communication difficulties).

Second facet: when the research to combat the dangers is predicated on non-feasible RL extrapolations, the research is worthless. And the donors' money is wasted.

My impression is that, five years ago, the arguments were mostly about how a utility maximizer would go loopy, not how something built on a RL learner would go loopy. Even so, it seems like the progress that's been made over the last five years is still useful for thinking about how to prevent RL learners from going loopy. Suppose that in another five years, it's clear that instead of RL learners, the AGI will have architecture X; then it also seems reasonable to me to expect that whatever we've figured out trying to prevent RL learners from going loopy will likely transfer over to architecture X.

You seem to think otherwise, but it's unclear to me why.

If we know nothing about architecture X, surely we should adopt a Bayesian 50:50 prior about whether solutions developed for some other architecture will apply to it.

[-][anonymous]7y-10

How can anything transfer from a (hypothetical) mechanism that cannot possibly scale up? That is one of the most obvious truisms of all engineering and mathematics - if you invent a formalism that does not actually work, or which is inconsistent, or which does not scale, it is blindingly obvious that any efforts spent writing theories about how a future version of that formalism will behave, are pointless.

How can anything transfer from a (hypothetical) mechanism that cannot possibly scale up?

We can build neither Universal Turing machines nor Carnot engines, but that doesn't mean their properties aren't worthwhile to study.

[-][anonymous]7y20

Turing machines and Carnot engines are abstractions that manifestly can apply to real systems.

But just because something is an abstraction doesn't mean its properties apply to anything.

Consider a new abstraction called a "Quring Machine". It is like a Turing machine, but for any starting tape that it gets, it sends that tape off to a planet where there is lots of primordial soup, then waits for the planet to evolve lifeforms which discover the tape and then invent a Macintosh-Plus-Equivalent computer and then write a version of the original tape that runs on that computer, and then the Quring Machine outputs the symbols that come from that alien Mac Plus. If the first planet fails to evolve appropriately, it tries another one, and keeps trying until the right response comes back.

Now, is that worth studying?

Reinforcement Learning, when assumed as a control mechanism for a macroscopic intelligent system, contains exactly the sort of ridiculous mechanism inside it, as the Quring Machine. (Global RL requires staggering amounts of computation and long wait times, to accumulate enough experience for the competing policies to develop enough data for meaningful calculations about their relationship to rewards).

Turing machines and Carnot engines are abstractions that manifestly can apply to real systems.

But just because something is an abstraction doesn't mean its properties apply to anything.

Agreed with both points, but I'm unclear on whether or not you still endorse the claim that we can't get transfer from formalisms that do not actually work.

Global RL requires staggering amounts of computation and long wait times, to accumulate enough experience for the competing policies to develop enough data for meaningful calculations about their relationship to rewards

I agree with this, and I also agree with your point earlier that most of the work in modern ML systems that have a "RL core" is in the non-core parts that are doing the interesting pieces.

But it's still not clear to me why this makes you think that RL, because it's not a complete solution, won't be a part of whatever the complete solution ends up being. I don't think you could run a human on just reinforcement learning, as it seems likely that some other things are going on (like brain regions that seem hardwired to learn a particular thing), but I would also be surprised by a claim that no reinforcement learning is going on in humans.

Or maybe to put this a different way, I think there are problems probably inherent in all motivation systems, which you see with utility maximization and reinforcement learning and others. If we figure out a way to get around that problem with one system--say, finding a correction to a utility function that makes it corrigible--I also suspect that the solution will suggest equivalent solutions for other motivation mechanisms. (That is, given a utility function correction, it's probably easy to come up with a reinforcement learning update correction.)
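To illustrate the structural parallel being claimed here, a toy sketch (this is not anyone's actual corrigibility proposal; the `correction` term is a hypothetical placeholder): a correction devised for a utility function and the equivalent correction folded into an RL reward have essentially the same shape.

```python
# Toy sketch of the claimed parallel. `correction` is a hypothetical term
# (e.g. some corrigibility penalty); nothing here is a worked-out proposal.

def corrected_utility(state, base_utility, correction):
    # Utility-maximizer version: maximize U(s) + C(s) instead of U(s).
    return base_utility(state) + correction(state)

def corrected_reward(state, action, base_reward, correction):
    # RL version: fold the same correction term into the per-step reward,
    # so the learned policy is shaped by the same C.
    return base_reward(state, action) + correction(state)
```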

This makes me mostly uninterested in the reasons why particular motivation mechanisms are intractable or unlikely to be what we actually use, unless those reasons are also reasons to expect that any solutions designed for that mechanism will be difficult to transfer to other mechanisms.

[-][anonymous]7y30

I have been struggling to find a way to respond, here.

When discussing this, we have to be really careful not to slip back and forth between "global RL", in the sense that the whole system learns through RL, and "micro-RL", where bits of the system use something like RL. I do keep trying to emphasize that I have no problem with the latter, if it proves feasible. I would never "claim that no reinforcement learning is going on in humans" because, quite the contrary, I believe it really IS going on there.

So where does that leave my essay, and this discussion? Well, a few things are important.

1 -- The incredible minuteness of the feasible types of RL must be kept in mind. In pure form, it explodes or becomes infeasible if the micro-domain gets above the reflex (or insect) level.

2 -- We need to remember that plain old "adaptation" is not RL. So, is there an adaptation mechanism that builds (e.g.) low-level feature detectors in the visual system? I bet there is. Does it work by trying to optimize a single parameter? Maybe. Should we call that parameter a "reward" signal? Well, I guess we could. But it is equally possible that such mechanisms are simultaneously optimizing a few parameters, not just one. And it is also just as likely that such mechanisms are following rules that cannot be shoehorned into the architecture of RL (there being many other kinds of adaptation). Where am I going with this? Well, why would we care to distinguish the "RL style of adaptation mechanism" from other kinds of adaptation, down at that level? Why make a special distinction? (The sketch at the end of this comment makes the contrast concrete.) When you think about it, those micro-RL mechanisms are boring and unremarkable ... RL only becomes worth remarking on IF it is the explanation for intelligence as a whole. The behaviorists thought they were the Isaac Newtons of psychology, because they thought that something like RL could explain everything. And it is only when it is proposed at that global level that it has dramatic significance, because then you could imagine an RL-controlled AI building and amplifying its own intelligence without programmer intervention.

3 -- Most importantly, if there do exist some "micro-RL" mechanisms somewhere in an intelligence, at very low levels where RL is feasible, those instances do not cause any of their properties to bleed upward to higher levels. This is the same as a really old saw: just because computers do all their basic computation in binary, that does not mean that the highest levels of the computer must use binary numbers. Sometimes you say things that sort of imply that because RL could exist somewhere, therefore we could learn "maybe something" from those mechanisms when it comes to other, higher aspects of the system. That really, really does not follow, and it is a dangerous mistake to make.

So, at the end of the day, my essay was targeting the use of the RL idea ONLY in those cases where it was assumed to be global. All other appearances of something RL-like just do not have any impact on arguments about AI motivation and goals.
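To make point 2 concrete, here is a small sketch (a toy example, not drawn from any particular system): Oja's rule, a classic local adaptation mechanism that learns a feature detector, contains no reward signal anywhere, while a reward-gated update of the same general shape is what a genuine micro-RL mechanism looks like. Labeling the first one "RL" buys you nothing.

```python
import numpy as np

rng = np.random.default_rng(0)

def oja_feature_learning(inputs, lr=0.01, steps=2000):
    # Plain adaptation, no reward anywhere: the weight vector drifts toward
    # the direction of greatest variance in the input stream -- a crude
    # "low-level feature detector".
    w = rng.normal(size=inputs.shape[1])
    for _ in range(steps):
        x = inputs[rng.integers(len(inputs))]
        y = w @ x
        w += lr * y * (x - y * w)
    return w

def reward_gated_update(w, x, exploration_noise, reward, lr=0.01):
    # The same kind of local weight change, but gated by a scalar reward:
    # this is the shape of a genuine (and very limited) micro-RL mechanism.
    return w + lr * reward * exploration_noise * x
```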

That's the wrong way round. They are general cases of which the real-life machines are special cases. Loosemore is saying that RL is a special case that doesn't generalise.

I'm not sure that I agree that a real car engine is a special case of a Carnot engine; I think the general case is a heat engine, of which a Carnot engine is a special case that is mathematically convenient but unattainable in physical reality.

[-][anonymous]7y20

I have edited the original essay to include a clarification of why I describe RL as being ubiquitous in AI Risk scenarios. I realized that some things that were obvious to me, because of my long exposure to people who try to justify the scenarios, were not obvious to everyone. My bad.

Take home message: RL has become an irrelevance in explanations of human cognition.

I believe you are overselling your point here.

You are right that behaviorists were traditionally guilty of fake reductionism; they used "reinforcement" as their applause light and bottom line for everything. The first ones were probably "tabula rasa" partial evolution denialists; the later ones gradually accepted that genes might have some minor influence on the mind, usually limited to making an organism more sensitive to certain types of reinforcement learning at a certain age. The first ones were also greedy reductionists; for them the causal chain "reinforcement -> thoughts -> behavior" was already too long and unscientific; it had to be "reinforcement -> behavior" directly, because thoughts were held to be unreal and unmeasurable, so scientists were not allowed to talk about them. The later ones gradually fixed this by introducing "black boxes" into their flowcharts, which was a politically correct way for Vulcans to talk about thoughts and emotions and other disgusting human stuff.

But I believe that just as the behaviorists went to a silly extreme while trying to avoid the silliness of Freudianism, you are now trying to reverse the stupidity of the behaviorists, replacing "human cognition is RL" with "RL is irrelevant to human cognition".

I would buy your argument if you were talking about cognition in general. I find it credible that there may be some kind of intelligence that doesn't use RL at all. (Well, this is kind of an argument from my own ignorance, but from the inside it seems valid.) I just don't believe that human intelligence happens to be one of them.

Following my "read the Sequences" mantra, consider this quote from "Avoiding Your Belief's Real Weak Points":

More than anything, the grip of religion is sustained by people just-not-thinking-about the real weak points of their religion. I don't think this is a matter of training, but a matter of instinct. People don't think about the real weak points of their beliefs for the same reason they don't touch an oven's red-hot burners; it's painful.

Also, a quote from "Positive Bias: Look Into the Dark":

One may be lectured on positive bias for days, and yet overlook it in-the-moment. Positive bias is not something we do as a matter of logic, or even as a matter of emotional attachment. (...) the mistake is sub-verbal, on the level of imagery, of instinctive reactions. (...) Which example automatically pops into your head? You have to learn, wordlessly, to zag instead of zig. You have to learn to flinch toward the zero, instead of away from it. (...) So much of a rationalist's skill is below the level of words.

The human brain is in some sense composed of layers: the "human layer", upon the "mammalian layer", upon the "lizard layer". The intentional strategic thinking happens on the human layer, while the behaviorists are trying to explain everything at the lower levels. This is why behaviorists fail to understand (or refuse to believe in) the specifically-human stuff.

However, the layers are not clearly separated. The "lizard and mammalian layers" can, and regularly do, override the "human layer" functionality. You may be trying to devise a smart strategy for achieving your goals, but the lower layers will at random moments start interrupting with "hey, that looks scary!" or "that feels low-status!", and whole branches of the decision tree get abandoned, often without your even noticing that this happened; it just feels like the whole branch wasn't even there. It's hard to implement rationality on broken hardware.

Okay; this example only proves that RL can have a harmful impact on human cognition. But that already makes it relevant. If nothing else, strategically using RL to override the existing harmful RL could benefit human cognition.

(I also believe that RL has a positive role in human cognition, if merely by pruning the potentially infinite decision trees, but I am leaving this to experts.)

EDIT:

For applied rationality purposes, even if the role of RL in making good strategic decisions were negligible, good strategies for your life usually include making RL work in your favor. For example, a decision to exercise regularly is not one you make by RL (you don't randomly try different schedules and notice that when you exercise you feel better). But if you make a decision to exercise, having some reinforcements in the process increases the probability that you will follow through on the plan. But this is already irrelevant to the topic of the article.

[-][anonymous]7y-20

Viliam,

Thanks for your response.

(Preamble: let's keep the "Sequences" out of this. I am not religious, and citing the "Sequences" is an appeal to religious authority - a religious authority that I do not accept.)

I am trying to read beyond the letter of your reply, to get to the spirit of it, and I think I see an important disconnect between the point I was making, and the one you are making in return. What you said is valid, but it doesn't bear on my point.

You see, nobody (least of all me) is denying the existence of reinforcements/rewards/feedback which are noticed by the intelligent system and used for guiding behavior. In general, there will be huge amounts of that. But all that means is that the system is "somewhat sensitive to rewards".

But that has no bearing on the issue I was addressing, because when people talk about an AI being controlled by RL, they are claiming something far more specific than that the system is "somewhat sensitive to rewards". They are claiming the existence of an actual RL mechanism, very close to the crude sort that I discussed in the essay. If they were NOT making this strong claim, they would be unable to make the claims they do about the behavior of those AGI systems.

For example, when Holden Karnofsky talks about RL systems going off the rails, his scenario is meaningless unless what he is talking about is a crude RL system. No strong arguments about AGI danger can be inferred if all he means is that these systems will be "somewhat sensitive to rewards", and nothing useful would be accomplished if the researchers trying to reassure Karnofsky went out and wrote down mathematical or logical analyses of such loosely specified systems.

It is only when "RL" is taken from the simple applications used today (applications that follow the mechanism described in the AI textbooks), then extrapolated WHOLESALE to the global control level of an AI, that any statements can be made. And, if you look at all the places where "RL" is mentioned in reference to AI safety, it is always clear from the context that this is what they mean. They do not mean that the AI just has a few places in it where it takes some account of rewards.

I agree with you that trying to build the whole AI on a behaviorist-style RL would most likely not work. But see below: Kaj Sotala says that "RL" means something else in the machine learning context. As I am not familiar with machine learning, I can't say anything meaningful about this, other than that I'd like to see your answer to Kaj's comment.

[-][anonymous]7y00

See my reply to Kaj. His point was not valid because I already covered that in detail in the original essay.

The site is basically founded on the Sequences. If you reject them, then why bother with LW (which is your choice anyway, but references to them should be expected), and if you don't reject them, then why complain about them being brought up?

[-]gjm7y40

I don't think this is a good way of looking at things.

The Sequences are an important part of LW history. I would guess that most LW regulars mostly agree with most of the prominent ideas in them. (As do plenty of people who aren't LWers at all.) They say many sensible and useful things. But they aren't the sort of thing it makes sense to "accept" or "reject" wholesale. That really is, as Richard_Loosemore says, the way religions tend to think about their traditions; it leads their adherents into bad thinking, and doing the same here would do the same to us.

Now, in this particular case, I don't think there was anything very religion-y about Viliam's quotations from the Sequences. (Despite his use of the word "mantra" :-).) He found something that made the point he wanted and quoted it, much as one might quote from a textbook or a familiar work of literature. So I don't think Richard's "let's keep the Sequences out of it: I'm not religious" response was warranted -- but I think that response is better understood as an expression of the longstanding Loosemore-Yudkowsky hostility than as a serious assessment of the merits of the Sequences or any specific idea found in them.

Be that as it may, the appropriate reaction is more like "fair enough" or "Viliam wasn't actually using the Sequences the way you suggest, but never mind" than "if you Reject The Sequences then you should keep away from here".

(Actually, I would say more or less the same even if LW were a religious-style community where membership is predicated on Accepting The Sequences. A community should be open to criticism from the outside.)

You walked right into that. Why not just say that those particular postings explain relevant points?

why bother with LW

[-][anonymous]7y-20

I would no more "point out specific things that aren't accurate about the sequence posts" than I would waste my time analyzing the seedier parts of the Advice section of the local bookstore.

[-][anonymous]7y-30

If this site is "founded on the Sequences" then it is a religious group or personality cult, because the defining feature of the latter sort of entity is that it centers around a sacred text.

Members of LessWrong vehemently deny that it is a religious group or personality cult.

Am I to take it that you think it really is?

Or to ask a more direct question: are you declaring that membership of this community should be allowed only for people who swear allegiance to The Sequences? That others should be ejected or vilified?

(I also need to point out that I am pretty sure the Sequences decry Appeals To Authority. What are the constant references to the Sequences, except Appeals To Authority? I have always been a little unclear on that point.)

I also need to point out that I am pretty sure the Sequences decry Appeals To Authority. What are the constant references to the Sequences, except Appeals To Authority? I have always been a little unclear on that point.

The main reason those happen is to establish a shared language and background concepts. If I want to defuse an argument about whether vanilla or chocolate is the best flavor for ice cream, then I can link to 2-Place and 1-Place Words, not in order to swing the day for my chosen side, but in order to either get people to drop the argument as mistaken or argue about the real issue (perhaps whether we should stock the fridge with one or the other). This is way cleaner than trying to recreate from scratch the argument for seeing adjectives as describing observer-observed relationships, rather than being attributes of the observed.

The function of Viliam's quotes seems to have been to provide examples of human thinking behaving in an RL-like fashion, drawn from the sequences rather than other places mostly because of availability.

Throwing around the "religion" label seems to be committing the noncentral fallacy...

The answer to your question depends on what exactly it is that you're asking. Do I believe most of the sequence posts are correct? Yes. Do I believe it is useful to treat them as standards? Yeah. Do I think you aren't allowed to criticize them? No, by all means, if you have issues with their content, we can discuss that (I have criticized them once). But I think you should point out specific things that aren't accurate about the sequence posts, rather than rejecting them for the sake of it.

For me, "reading the Sequences" is like "reading the FAQ", except that instead of "frequently asked questions" it is more like "questions that should have been asked, but most people believe they already have an answer, which usually involves some kind of confusing the map with the territory".

Or, to use an educational metaphor, it's like asking people to study Algebra 101 before they start attending Algebra 201 classes... because if they ignore that, you already know people are going to end up confused and then most of the time will be spent repeating the Algebra 101 knowledge and not getting further.

The idea was that after learning some elementary stuff, we (the LessWrong community) would be able to move forward and discuss more advanced topics. Well, it was a nice idea...

Are you familiar with the control theory paradigm of psychology? It's mostly from Behavior: The Control of Perception, and I talk about it here. To explain it briefly, with a bunch of jargon: its author, William Powers, sees humans as hierarchical negative feedback control systems, where each layer perceives the layers below it and acts by setting their reference levels, in order to minimize the error with respect to its own reference level, which is set by a higher layer.

There's obviously a lowest level--the individual nerves that control muscle fibers or glands or so on--and there's also, given the finite number of neurons in the brain, an upper level, where the reference level is set by something other than a neuron.

I bring this up because it's what happens when an engineer looks at the behaviorist paradigm, says "well that's obviously wrong," and proposes something better. (As far as I can tell, it's the best current theory of psychology to explain human brains and motivation, while also explaining simpler animals, but it's not quite complete.)

I also bring it up because it has the internal structure needed to tackle increasingly complex problems without its own complexity growing exponentially, and so could realistically be the model on which robots are built, and it still falls prey to the problems of self-modification and hijackable rewards that you seem to think high-capacity systems will have avoided by definition.
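For readers who have not seen the book, a minimal sketch of the hierarchical idea (a toy construction with made-up gains and dynamics, not Powers' actual model): each layer is a negative-feedback controller, and an upper layer acts only by setting the reference signal of the layer below it.

```python
class ControlLayer:
    def __init__(self, gain):
        self.gain = gain
        self.reference = 0.0      # set by the layer above

    def act(self, perception):
        # Output is proportional to the error between reference and perception.
        return self.gain * (self.reference - perception)

lower = ControlLayer(gain=0.5)    # controls a low-level variable ("velocity")
upper = ControlLayer(gain=0.5)    # controls a higher-level perception ("position")
upper.reference = 10.0            # the goal handed down from a still-higher layer

position, velocity = 0.0, 0.0
for _ in range(100):
    lower.reference = upper.act(position)  # upper layer acts by setting a velocity target
    velocity += lower.act(velocity)        # lower layer drives velocity toward that target
    position += velocity                   # toy "environment": position integrates velocity

print(round(position, 2))                  # settles near the upper layer's reference of 10.0
```

The point of the sketch is only the wiring: control of perception at every level, with goals implemented as reference signals rather than rewards.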

[-][anonymous]7y-10

You say "As far as I can tell, it's the best current theory of psychology to explain humans and human motivation, while also explaining simpler animals, but it's not quite complete."

Speaking as a cognitive psychologist, I can say that Powers' work is almost universally ignored as a relic of the behaviorist era. It is not the "best current theory of psychology to explain humans and human motivation". Quite the opposite, it has no credibility at all.

Speaking as a cognitive psychologist, I can say that Powers' work is almost universally ignored as a relic of the behaviorist era. It is not the "best current theory of psychology to explain humans and human motivation". Quite the opposite, it has no credibility at all.

This is your cue to point me to another resource.

[-][anonymous]7y-10

But what if there were no currently accepted theory of motivation, other than (a) a few hints and suggestions scattered across cog psych, and (b) an old theory that is little more than a speculative extrapolation from experiments on rats?

That, sadly, is the current situation. But just because the current situation is so poor does not mean that Powers' theory is the best.

Also, did you really mean to say "best current theory of psychology to explain humans..."? I stop your quote there, specifically. Best to explain "humans"? Probably that was just an infelicity in the wording.

Anyhow, the situation in cog psych is that without a clear way to do measurements of motivation, all we have is descriptive stuff from social psychology. That does not add up to a mechanism.

That, sadly, is the current situation. But just because the current situation is so poor does not mean that Powers' theory is the best.

This looks to me like an uninteresting argument about what the word "best" means. It looks like we both agree that there's not a superior positive alternative, although it's fine if you think it's better to choose the negative alternative of no current theory.

Probably that was just an infelicity in the wording.

Edited.

[-][anonymous]7y-10

No, we really don't agree. The existing body of cognitive psychology knowledge contains, implicit in it, an outline theory of how motivation works. That theory (be it ever so implicit) is already a whole world better than Powers' theory, because the latter is so totally arbitrary and inconsistent with the cognitive psychology body of knowledge.

(You will ask why: because the former does not rely on simple control parameters that act like homunculi, whereas Powers' theory does. But this is a subtle point, too complex to handle in this context.)

OP's arguments against RL seem to be based on a conception of RL as mapping a set of stimuli directly to an action, which would be silly. They could also be taken as arguments that the brain cannot possibly be implemented in neurons.

[-][anonymous]7y00

Wrong, sorry Phil. I addressed exactly this in the essay itself.

Nothing whatever changes if the mapping from stimuli to action involves various degrees of indirectness, UNLESS the stuff in the middle is so smart, in and of itself, that it starts to dominate the behavior.

And, as a matter of fact, that is always what happens in so-called RL systems (as is explained in the essay). Extra machinery is added to the RL, to get it to work, and in practice that extra machinery does the heavy lifting. Sometimes the extra machinery is invisible -- as when the experimenter uses their own intelligence to pre-package stimuli -- but it can also be visible machinery, in the form of code that does extra processing.

The trouble is, that extra machinery ends up doing so much work that RL by itself is pointless.
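A rough parameter-count sketch (the architecture and numbers are invented, purely to illustrate the proportions being claimed): in a typical modern system described as "RL", nearly all of the machinery sits between stimulus and action, and the part that is recognizably reinforcement learning is a thin layer on top.

```python
def dense_params(n_in, n_out):
    # Weights plus biases of one fully connected layer.
    return n_in * n_out + n_out

# Hypothetical perception/representation stack: raw observation -> features.
encoder = (dense_params(10_000, 2_048)
           + dense_params(2_048, 512)
           + dense_params(512, 256))

# The "RL part": a small policy head mapping features to a handful of actions,
# nudged by a scalar reward.
policy_head = dense_params(256, 8)

share = 100 * policy_head / (encoder + policy_head)
print(encoder, policy_head, round(share, 3))   # ~21.7 million vs ~2 thousand parameters
```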

Since I made this point with great force already, it is a little exasperating to have to answer it as if it were not there in the OP, and as if the OP were just "silly".

As for your second point ("They could also be taken as arguments that the brain cannot possibly be implemented in neurons"): that is not worth a detailed answer. It obviously does no such thing.

RL is necessary but not sufficient for adaptive intelligent behavior. My model of human intelligence is something like this:

  1. Large-scale unsupervised learning that takes as input a vast sensorimotor data stream, and produces a rich, complex suite of abstractions that describe and predict the data stream;
  2. A simplistic, error-prone, heuristic, quick and dirty RL algorithm that uses the advanced abstractions produced by the unsupervised layer to create adaptive behavior.

So yes, RL by itself is not strong, but RL+powerful unsupervised learning could be very strong.

This kind of combination seems obvious to me based on simple introspection. Observing my own brain's capability in computational terms, I find that I am amazing at perception (vision/language/audition/etc), which ability is produced by the unsupervised learning component. At the same time I am terrible at planning, which is produced by the RL component.
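A minimal sketch of that two-component picture (everything here is hypothetical: the `encoder` stands in for whatever large-scale unsupervised learner produces the abstractions, and the RL component is deliberately crude):

```python
import random

def encoder(raw_observation):
    # Stand-in for the unsupervised component: collapses a huge sensorimotor
    # stream into a small discrete abstraction ("situation type").
    return hash(tuple(raw_observation)) % 20

q_values = {}   # the quick-and-dirty RL component operates only on abstractions

def act(raw_observation, actions=("a", "b", "c"), epsilon=0.1):
    s = encoder(raw_observation)
    if random.random() < epsilon:                     # occasional exploration
        return s, random.choice(actions)
    return s, max(actions, key=lambda a: q_values.get((s, a), 0.0))

def learn(s, action, reward, lr=0.1):
    # Simplistic, error-prone value update over the abstract state, not the
    # raw stream; the heavy lifting already happened in the encoder.
    key = (s, action)
    q_values[key] = q_values.get(key, 0.0) + lr * (reward - q_values.get(key, 0.0))
```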

By the way, Chomsky wrote a good takedown of "pure" behaviorist theory in 1967.

[-][anonymous]7y00

"RL is necessary but not sufficient for adaptive intelligent behavior."

You state this without proof. In my essay I explain all the reasons why this is not in fact correct, but you haven't addressed those yet.

"By the way, Chomsky wrote a good takedown of "pure" behaviorist theory in 1967."

That would be 1959, not 1967. I read his paper (actually a review of Skinner's book) quite some time back. It is valid, of course, but it was only the final nail in a coffin that was already smelling a bit like it had something decaying in it.

But we can see that human beings use, e.g., money as an RL stimulus, and a lot of people try to obtain it through illegitimate shortcuts, and we call them criminals. The same goes for sex, popularity and pleasure.

So complex intelligent beings such as humans are still easily corrupted by RL mechanisms in biology and society. The same could happen with an AI. It would have many more opportunities than a human being to short-circuit its RL system by rewriting itself (and this could result in its halting, if it were able to do so quickly).

So I just proved that self-improving AI halts. )))

[-][anonymous]7y20

Ah, not quite, no.

The appropriate analogy would be people who try to get a monetary reward by writing a letter to themselves that reads as if it came from the Lottery and tells them they have won the biggest payout in the history of the world. Then they mail the letter to themselves, receive it, open it, jump for joy, and go around telling everyone they can find that they are rich. And they believe it.

We tend not to invite those sorts of people to dinner parties.