(with thanks to Daniel Dewey, Owain Evans, Nick Bostrom, Toby Ord and BruceyB)

In theory, a satisficing agent has a lot to recommend it. Unlike a maximiser, which will attempt to squeeze every drop of utility it can out of the universe, a satisficer will be content when it reaches a certain level of expected utility (a satisficer that is content with a certain level of utility is simply a maximiser with a bounded utility function). For instance, a satisficer with a utility linear in paperclips and a target level of 9 will be content once it's 90% sure that it's built ten paperclips, and will not try to optimize the universe to either build more paperclips (unbounded utility) or obsessively count the ones it has already (bounded utility).

Unfortunately, a self-improving satisficer has an extremely easy way to reach its satisficing goal: to transform itself into a maximiser. This is because, in general, if E denotes expectation,

E(U(there exists an agent A maximising U))  ≥  E(U(there exists an agent A satisficing U))

How is this true (apart from the special case when other agents penalise you specifically for being a maximiser)? Well, agent A will have to make decisions, and if it is a maximiser, will always make the decision that maximises expected utility. If it is a satisficer, it will sometimes not make the same decision, leading to lower expected utility in that case.
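
As a rough illustration of the argument above, here is a minimal sketch in Python. The action names, their expected utilities and the fallback rule (behave like a maximiser when nothing reaches the threshold) are assumptions made up for this example; only the threshold of 9 comes from the paperclip example.

```python
# Hypothetical decision problem: each action maps to its expected utility.
ACTIONS = {"build_slowly": 9.0, "build_fast": 12.0, "do_nothing": 0.0}
THRESHOLD = 9.0  # satisficing level, as in the paperclip example


def maximiser_choice(actions):
    """Pick the action with the highest expected utility."""
    return max(actions, key=actions.get)


def satisficer_choices(actions, threshold):
    """Return every action the satisficer would be content with."""
    acceptable = [a for a, eu in actions.items() if eu >= threshold]
    # Assumption: if nothing reaches the threshold, fall back to the best action.
    return acceptable or [maximiser_choice(actions)]


best = ACTIONS[maximiser_choice(ACTIONS)]
worst_acceptable = min(ACTIONS[a] for a in satisficer_choices(ACTIONS, THRESHOLD))
print(best, worst_acceptable)
```

Whichever acceptable action the satisficer's tie-breaker selects, its expected utility cannot exceed the maximiser's, which is all the inequality claims.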

Hence, if there were a satisficing agent for U, and it had some strategy S to accomplish its goal, then another way to accomplish this would be to transform itself into a maximising agent and let that agent implement S. If S is complicated, and transforming itself is simple (which would be the case for a self-improving agent), then self-transforming into a maximiser is the easier way to go.

So unless we have exceedingly well programmed criteria banning the satisficer from using any variant of this technique, we should assume satisficers are likely to be as dangerous as maximisers.

Edited to clarify the argument for why a maximiser maximises better than a satisficer.

Edit: See BruceyB's comment for an example where a (non-timeless) satisficer would find rewriting itself as a maximiser to be the only good strategy. Hence timeless satisficers would behave as maximisers anyway (in many situations). Furthermore, a timeless satisficer with bounded rationality may find that rewriting itself as a maximiser would be a useful precaution to take, if it's not sure to be able to precalculate all the correct strategies.

If that were not the case, then the maximising agent would transform itself into a satisficing agent, but, (unless there are other agents out there penalising you for your internal processes), there is no better way of maximising the expected U than by attempting to maximise the expected U.

Is that really true? This seems to be the main and non-trivial question here, presented without proof. It seems to me that there ought to be plenty of strategies that a satisficer would prefer over a maximizer, just like risk-averse strategies differ from optimal risk-neutral strategies. eg. buying +EV lottery tickets might be a maximizer's strategy but not a satisficer's.

I reworded the passage to be: Yes, the satisficer can be more risk averse than the maximiser - but it's precisely that that makes a worse expected utility maximiser.
OK, that makes more sense to me.
Cole Killian
Does it make sense to claim that a satisficer will be content when it reaches a certain level of expected utility though? Some satisficers may work that way, but they don't all need to work that way. Expected utility is somewhat arbitrary. Instead, you could have a satisficer which tries to maximize the probability that the utility is above a certain value. This leads to different dynamics than maximizing expected utility. What do you think? Related post on utility functions here: https://colekillian.com/posts/sbf-and-pascals-mugging/
If U is the utility and u is the value that it needs to be above, define a new utility V, which is 1 if and only if U>u and is 0 otherwise. This is a well-defined utility function, and the design you described is exactly equivalent with being an expected V-maximiser.
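
A small sketch of that equivalence, using exact fractions. The two lotteries and their payoffs are hypothetical; `u` and `V` follow the definitions in the reply above.

```python
from fractions import Fraction

# Hypothetical lotteries: lists of (probability, U) pairs.
LOTTERIES = {
    "safe": [(Fraction(1), 20)],
    "gamble": [(Fraction(1, 10), 0), (Fraction(9, 10), 1000)],
}
u = 9  # the value that U needs to be above


def prob_above(lottery, threshold):
    """Probability that U ends up strictly above the threshold."""
    return sum(p for p, util in lottery if util > threshold)


def expected(lottery, utility):
    """Expected value of utility(U) under the lottery."""
    return sum(p * utility(util) for p, util in lottery)


def V(util):
    """Indicator utility: 1 if and only if U > u, else 0."""
    return 1 if util > u else 0


for name, lottery in LOTTERIES.items():
    # Maximising P(U > u) ranks lotteries exactly as maximising E[V] does.
    assert prob_above(lottery, u) == expected(lottery, V)
```

Since the two quantities coincide for every lottery, an agent maximising one makes the same choices as an agent maximising the other.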

I don't think this follows. Consider the case where there's two choices:

1) 10% chance of no paperclips, 90% chance of 3^^^3 paperclips
2) 100% chance of 20 paperclips

The maximizer will likely pick 1, while the satisficer will definitely prefer 2.

As I understand it, the actual problem in this area is not so much that "satisficers want to become maximisers" - but rather that a simple and effective means of satisficing fairly often involves constructing a maximising resource-gathering minion. Then after the satisficer is satisfied, the minion(s) may continue unsupervised - unless care is taken. I discussed this issue in my 2009 essay on the topic.
The expected utility of option 1 is higher than the threshold for the satisficer, so it could just as easily pick 1) as 2); it'll be indifferent between the two choices, and will need some sort of tie-breaker.
But inasmuch as it will want to want one over the other, it will want to want 2 which is guaranteed to continue to satisfice over 1 which has only a 90% chance of continuing to satisfice, so it should not want to become a maximizer.
So that's actually the "bounded utility" definition, which Stuart says he isn't using. It does seem more intuitive though... I think you can get a paradox out of Stuart's definition, actually, which should not be surprising, since it isn't a utility-maximizer.
A satisficer is not motivated to continue to satisfice. It is motivated to take an action that is a satisficing action, and 1) and 2) are equally satisficing. I know what you're trying to do, I think. I tried to produce a "continuously satisficing agent" or "future satisficing agent", but couldn't get it to work out.
Surely option 1 has a 10% chance of failing to satisfy.
Option 1) already satisfies. Taking option 1) brings the expected utility up above the threshold, so the satisficer is done. If you add the extra requirement that the AI must never let the expected utility fall below the threshold in future, then the AI will simply blind itself or turn itself off, once the satisficing level is reached; then its expected utility will never fall, as no extra information ever arrives.
Sorry - a failure to reread the question on my part :-(
Right, the satisficer will not have an incentive to increase its expected utility by becoming a maximizer when its expected utility (by remaining a satisficer) is already over the threshold. But surely this condition would fail frequently.
If it isn't over the threshold, it could just keep making the same decisions a maximizer would.

As I understand it, your satisficing agent has essentially the utility function min(E[paperclips], 9). This means it would be fine with a 10^-100 chance of producing 10^101 paperclips. But isn't it more intuitive to think of a satisficer as optimizing the utility function E[min(paperclips, 9)]? In this case, the satisficer would reject the 10^-100 gamble described above, in favor of just producing 9 paperclips (whereas a maximizer would still take the gamble and hence would be a poor replacement for the satisficer).
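
The difference between the two readings can be checked directly. This is only a sketch: the function names are made up, and the gamble is assumed to pay nothing when it fails.

```python
from fractions import Fraction

CAP = 9
# Lotteries as (probability, paperclips) pairs.
# Assumption: the gamble yields 0 paperclips when it does not pay off.
gamble = [(Fraction(1, 10**100), 10**101), (1 - Fraction(1, 10**100), 0)]
sure_thing = [(Fraction(1), 9)]


def cap_of_expectation(lottery):
    """min(E[paperclips], CAP): the cap is applied to the expectation."""
    return min(sum(p * n for p, n in lottery), CAP)


def expectation_of_cap(lottery):
    """E[min(paperclips, CAP)]: the cap is applied to each outcome."""
    return sum(p * min(n, CAP) for p, n in lottery)


print(cap_of_expectation(gamble) == cap_of_expectation(sure_thing))
print(expectation_of_cap(gamble) < expectation_of_cap(sure_thing))
```

Under the first reading both options score 9, so the agent is indifferent. Under the second the gamble is worth only 9 × 10^-100, so the sure 9 paperclips win.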

A satisficer might not want to take over ...

Alternately, a satisficer could build a maximiser. For example, if you don't give it the ability to modify its own code. It also might build a paperclip-making Von Neumann machine that isn't anywhere near a maximizer, but is still insanely dangerous.

I notice a satisficing agent isn't well-defined. What happens when it has two ways of satisfying its goals? It may be possible to make a safe one if you come up with a good enough answer to that question.

What I usually mean by it is: maximise until some specified criterion is satisfied - and then stop. However, perhaps "satisficing" is not quite the right word for this. IMO, agents that stop are an important class of agents. I think we need a name for them - and this is one of the nearest things. In my essay, I called them "Stopping superintelligences". That's the same as with a maximiser.
Except much more likely to come up; a maximiser facing many exactly balanced strategies in the real world is a rare occurrence.
Well, usually you want satisfaction rapidly - and then things are very similar again.
Then state that. It's an inverse-of-time-until-satisfaction-is-complete maximiser. The way you defined satisfaction doesn't really work with that. The satisficer might just decide that it has a 90% chance of producing 10 paperclips, and thus its goal is complete. There is some chance of it failing in its goal later on, but this is likely to be made up by the fact that it probably will satisfy its goals with some extra. Especially if it could self-modify.
Yep. Coding "don't unleash (or become) a maximiser or something similar" is very tricky. It may be. But encoding "safe" for a satisficer sounds like it's probably just as hard as constructing a safe utility function in the first place.

I see that a satisficer would assign higher expected utility to being a maximizer than to being a satisficer. But if the expected utility of being a satisficer were high enough, wouldn't it be satisfied to remain a satisficer?

The act of becoming a maximiser is an act that would, in itself, satisfy its satisficing requirement. The act of staying a satisficer might not do so (because if it did, for ever, then the satisficer will just be content with remaining a satisficer for ever, and never getting anything done).

If the way to satisfice best is to act like a maximizer, then wouldn't an optimal satisficer simply act like a maximizer, no self-rewriting required?

Here is a (contrived) situation where a satisficer would need to rewrite.

Sally the Satisficer gets invited to participate on a game show. The game starts with a coin toss. If she loses the coin toss, she gets 8 paperclips. If she wins, she gets invited to the Showcase Showdown where she will first be offered a prize of 9 paperclips. If she turns down this first showcase, she is offered the second showcase of 10 paperclips (fans of The Price is Right know the second showcase is always better).

When she first steps on stage she considers whether she should switch to maximizer mode or stick with her satisficer strategy. As a satisficer, she knows that if she wins the coin toss she won't be able to refuse the 9 paperclip prize since it satisfies her target expected utility of 9. So her expected utility as a satisficer is (1/2) 8 + (1/2) 9 = 8.5. If she won the flip as a maximizer, she would clearly pass on the first showcase and receive the second showcase of 10 paperclips. Thus her expected utility as a maximizer is (1/2) 8 + (1/2) 10 = 9. Switching to maximizer mode meets her target while remaining a satisficer does not, so she rewrites herself to be a maximizer.
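
Sally's arithmetic can be reproduced with a short sketch. The function and constant names are made up; the prizes, the fair coin and the target of 9 are taken from the example above.

```python
from fractions import Fraction

TARGET = 9  # Sally's target expected utility
LOSE_PRIZE, FIRST_SHOWCASE, SECOND_SHOWCASE = 8, 9, 10
HALF = Fraction(1, 2)  # fair coin toss


def showdown_as_satisficer():
    """Move by move, she accepts the first offer that meets her target."""
    return FIRST_SHOWCASE if FIRST_SHOWCASE >= TARGET else SECOND_SHOWCASE


def showdown_as_maximiser():
    """A maximiser passes on the first showcase to take the better one."""
    return max(FIRST_SHOWCASE, SECOND_SHOWCASE)


eu_satisficer = HALF * LOSE_PRIZE + HALF * showdown_as_satisficer()
eu_maximiser = HALF * LOSE_PRIZE + HALF * showdown_as_maximiser()
print(eu_satisficer, eu_maximiser)
```

Evaluated before the coin toss, remaining a satisficer yields 17/2 = 8.5, below the target, while rewriting yields exactly 9.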

Ah, good point. So "picking the best strategy, not just the best individual moves" is similar to self-modifying to be a maximizer in this case. On the other hand, if our satisficer runs on updateless decision theory, picking the best strategy is already what it does all the time. So I guess it depends on how your satisficer is programmed.
This seems to imply that an updateless satisficer would behave like a maximiser - or that an updateless satisficer with bounded rationality would make themselves into a maximiser as a precaution.
A UDT satisficer is closer to the original than a pure maximizer, because where different strategies fall above the threshold the original tie-breaking rule can still be applied.
Cool example! But your argument relies on certain vagueness in the definitions of "satisficer" and "maximiser", namely that between:
* A: an agent "content when it reaches a certain level [of] expected utility"; and
* B: "simply a maximiser with a bounded utility function"
(These definitions are from the OP.) Looking at the situation you presented: "A" would recognise the situation as having an expected utility of 9, and be content with it (until she loses the coin toss...). "B" would not distinguish between the utility of 9 and the utility of 10. Neither agent would see a need to self-modify. Your argument treats Sally as (seeing itself) morphing from "A" before the coin toss to "B" after - this, IMO, invalidates your example.
I like this, I really do. I've added a mention to it in the post. Note that your point not only shows that a non-timeless satisficer would want to become a maximiser, but that a timeless satisficer would behave as a maximiser already.
I realize this is old (which is why I'm replying to a comment to draw attention), but still, the entire post seems to be predicated on a poor specification of the utility function. Remember, the utility function by definition includes/defines the full preference ordering over outcomes, and must therefore include the idea of acting "satisfied" inside it. Here, instead, you seem to define a "fake" utility function of U = E(number of paperclips) and then say that the AI will be satisfied at a certain number of paperclips, even though it clearly won't be because that's not part of the utility function. That is, something with this purported utility function is already a pure maximiser, not a satisficer at all. Instead, the utility function you're constructing should be something like U = {9 if E(paperclips) >= 9, E(paperclips) otherwise}, in which case in the above example the satisficer really wouldn't care if it ended up with 9 or 10 paperclips and would remain a satisficer. The notion that a satisficer wants to become a maximiser arises only because you made the "satisficer's" utility function identical to a maximiser's to begin with. (There may be other issues with satisficers, but I don't think this is one of them. Also, sorry if that came across as confrontational - I just wanted to make my objection as clear as possible.)
Satisficing is a term for a specific type of decision making - quoting wikipedia: "a decision-making strategy that attempts to meet an acceptability threshold. This is contrasted with optimal decision-making, an approach that specifically attempts to find the best option available." So by definition a satisficer is an agent that is content with a certain outcome, even though they might prefer a better one. Do you think my model - utility denoting the ideal preferences, and satisficing being content with a certain threshold - is a poor model of this type of agent?
Yes, as I said, I think any preferences of the agent, including being "satisfied", need to be internalized in the utility function. That is, satisficing should probably be content not with a certain level of utility, but with a certain level of the objective. Anything that's "outside" the utility function, as satisficing is in this case, will naturally be seen as an unnecessary imposition by the agent and ultimately ignored (if the agent is able to ignore it), regardless of what it is. For a contrived analogy, modeling a satisficer this way is similar to modelling an honest man as someone who wants to maximize money, but who lives under the rule of law (and who is able to stop the law applying to him whenever he wants at that).
So I did a post saying that a satisficer would turn into an expected utility maximiser, and your point is... that any satisficer should already be an expected utility maximiser :-)
No, only one that's modeled the way you're modeling. I think I'm somehow not being clear, sorry =( My point is that your post is tautological and does an injustice to satisficers. If you move the satisfaction condition inside the utility function, e.g. U = {9 if E(paperclips) >= 9, E(paperclips) otherwise}, so that its utility increases to 9 as it gains expected paperclips, and then stops at 9 (which is also not really an optimal definition, but an adequate one), the phenomenon of wanting to be a maximiser disappears. With that utility function, it would be indifferent between being a satisficer and a maximiser. If you instead changed to a utility function like, let's say: U = {1 if 8 < E(paperclips) < 11, 0 otherwise}, then it would strictly prefer to remain a satisficer, since a maximiser would inevitably push it into the 0 utility area of the function. I think this is the more standard way to model a satisficer (also with a resource cost thrown in as well), and it's certainly the more "steelmanned" one, as it avoids problems like the ones in this post.
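
The two utility functions in this comment can be compared in a short sketch. The expected paperclip counts assigned to staying a satisficer and to becoming a paperclip maximiser are made-up numbers, chosen only to show the shape of the argument.

```python
def cutoff_utility(expected_paperclips):
    """U = 9 if E(paperclips) >= 9, E(paperclips) otherwise."""
    return min(expected_paperclips, 9)


def window_utility(expected_paperclips):
    """U = 1 if 8 < E(paperclips) < 11, 0 otherwise."""
    return 1 if 8 < expected_paperclips < 11 else 0


# Hypothetical outcomes, measured in expected paperclips.
STAY_SATISFICER = 9.5
BECOME_MAXIMISER = 1e6

print(cutoff_utility(STAY_SATISFICER), cutoff_utility(BECOME_MAXIMISER))
print(window_utility(STAY_SATISFICER), window_utility(BECOME_MAXIMISER))
```

With the cutoff both options score 9, so the agent is indifferent. With the window, overshooting scores 0, so it strictly prefers to stay as it is.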
That's just a utility maximiser with a bounded utility function. But this has become a linguistic debate, not a conceptual one. One version of satisficers (the version I define, which some people intuitively share) will tend to become maximisers. Another version (the bounded utility maximisers that you define) are already maximisers. We both agree on these facts - so what is there to argue about but the linguistics? Since satisficing is more intuitively than rigorously defined (multiple formal definitions on wikipedia), I don't think there's anything more to dispute?
All right, I agree with that. It does seem like satisficers are (or quickly become) a subclass of maximisers by either definition. Although I think the way I define them is not equivalent to a generic bounded maximiser. When I think of one of those it's something more like U = paperclips/(|paperclips|+1) than what I wrote (i.e. it still wants to maximize without bound, it's just less interested in low probabilities of high gains), which would behave rather differently. Maybe I just have unusual mental definitions of both, however.
Maybe bounded maximiser vs maximiser with cutoff? With the second case being a special case of the first (for there are many ways to bound a utility).
Yes, that sounds good. I'll try using those terms next time.

Doesn't follow if an agent wants to satisfice multiple things, since maximizing the amount of one thing could destroy your chances of bringing about a sufficient quantity of another.

Interesting idea, but I think it reduces to the single case. If you want to satisfice, say, E(U1) > N1 and E(U2) > N2, then you could set U3=min(U1-N1,U2-N2) and satisfice E(U3) > 0.
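
A quick numerical check of this reduction, offered as a sketch with made-up numbers. Because E(min(X, Y)) can never exceed min(E(X), E(Y)), satisficing E(U3) > 0 guarantees both original targets. The toy case below suggests the converse can fail when U1 and U2 are anti-correlated, so the combined target may be the stricter one.

```python
from fractions import Fraction

N1, N2 = 4, 4  # hypothetical satisficing thresholds
# Two equally likely worlds, given as (probability, U1, U2); values are made up.
WORLDS = [(Fraction(1, 2), 10, 0), (Fraction(1, 2), 0, 10)]


def expect(f):
    """Expectation of f(U1, U2) over WORLDS."""
    return sum(p * f(u1, u2) for p, u1, u2 in WORLDS)


e_u1 = expect(lambda u1, u2: u1)
e_u2 = expect(lambda u1, u2: u2)
e_u3 = expect(lambda u1, u2: min(u1 - N1, u2 - N2))  # U3 = min(U1-N1, U2-N2)

both_targets_met = e_u1 > N1 and e_u2 > N2
combined_target_met = e_u3 > 0
print(e_u1, e_u2, e_u3, both_targets_met, combined_target_met)
```

Here both separate targets are met (each expectation is 5) while E(U3) is -4, so the single-utility version rejects a situation the two-target version accepts.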
Sure, but there are vastly more constraints involved in maximizing E(U3). It's easy to maximize E(U(A)) in such a way as to let E(U(B)) go down the tubes. But if C is positively correlated with either A or B, it's going to be harder to max-min A and B while letting E(U(C)) plummet. The more accounted-for sources of utility there are, the likelier a given unaccounted-for source of utility X is to be entangled with one of the former, so the harder it will be for the AI to max-min in such a way that neglects X. Perhaps it gets exponentially harder! And humans care about hundreds or thousands of things. It's not clear that a satisficer that's concerned with a significant fraction of those would be able to devise a strategy that fails utterly to bring the others up to snuff.
I'm simply pointing out that a multi-satisficer is the same as a single-satisficer. Putting all the utilities together is a sensible thing; but making a satisficer out of that combination isn't.

E(U(there exists an agent A maximising U) ≥ E(U(there exists an agent A satisficing U)

It's a good idea to define your symbols and terminology in general before (or right after) using them. Presumably U is utility, but what is E? Expectation value? How do you calculate it? What is an agent? How do you calculate utility of an existential quantifier? If this is all common knowledge, at least give a relevant link. Oh, and it is also a good idea to prove or at least motivate any non-trivial formula you present.

Feel free to make your post (which apparently attempts to make an interesting point) more readable for the rest of us (i.e. newbies like me).

Reworded somewhat. E is expectation value, as is now stated; it does not need to be calculated, we just need to know that a maximiser will always make the decision that maximises the expected value of U, while a satisficer may sometimes make a different decision; hence the presence of a U-maximiser increases the expected value of U over the presence of an otherwise equivalent U-satisficer. An agent is "An entity which is capable of Action"; an AI or human being or collection of neurons that can do stuff. It's a general term here, so I didn't define it.

I described this issue - and discussed some strategies for dealing with it - in 2009 here.

What if the satisficer is also an optimiser? That is, its utility function is not only flat in the number of paperclips after 9, but actually decreasing.

See comment http://lesswrong.com/lw/854/satisficers_want_to_become_maximisers/52f4 and the responses.

E(U(there exists an agent A maximising U) ≥ E(U(there exists an agent A satisficing U)

The reason this equation looks confusing is because (I presume) there ought to be a second closing bracket on both sides.

Anyhow, I agree that a satisficer is almost as dangerous as a maximiser. However, I've never come across the idea that a satisficing agent "has a lot to recommend it" on Less Wrong.

I thought that the vast majority of possible optimisation processes - maximisers, satisficers or anything else - are very likely to destroy humanity. That is why CE...

There are second closing brakets on both sides. Look closely. They have always been there. Honest, guv. No, do not look into your cache or at previous versions. They lie! I would never have forgotten to put closing brakets. Nor would I ever misspell the word braket. Or used irony in a public place ;-)
This is a simple argument, that I hadn't seen before as to why satisficers are not a good way to go about things. I've been looking at Oracles and other non-friendly AGIs that may nevertheless be survivable, so it's good to know that satisficers are not to be counted among them.

As I understand what is meant by satisficing, this misses the mark. A satisficer will search for an action until it finds one that is good enough, then it will do that. A maximiser will search for the best action and then do that. A bounded maximiser will search for the "best" (best according to its bounded utility function) and then do that.

So what the satisficer picks depends on what order the possible actions are presented to it in a way it doesn't for either maximiser. Now, if easier options are presented to it first then I guess your conclusion still follows, as long as we grant the premise that self-transforming will be easy.

But I don't think it's right to identify bounded maximisers and satisficers.
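
The order dependence described in this comment can be sketched as follows. The option names and utilities are hypothetical, and returning None when nothing is good enough is an assumption of the sketch.

```python
def satisficer_pick(options, threshold):
    """Return the first option whose utility is good enough, else None."""
    for name, utility in options:
        if utility >= threshold:
            return name
    return None  # assumption: no fallback when nothing qualifies


def maximiser_pick(options):
    """Return the option with the highest utility, whatever the order."""
    return max(options, key=lambda pair: pair[1])[0]


# Hypothetical options as (name, utility) pairs.
options = [("make_9_by_hand", 9), ("self_modify", 12), ("make_10_by_hand", 10)]
reordered = list(reversed(options))

print(satisficer_pick(options, 9), satisficer_pick(reordered, 9))
print(maximiser_pick(options), maximiser_pick(reordered))
```

The satisficer's answer changes with the presentation order, while the maximiser's does not. Whether self-modification gets chosen therefore depends on where it sits in the search order.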

It seems to me that a satisficer that cares about expected utility rather than actual utility is not even much of a satisficer in the first place, in that it doesn't do what we expect of satisficers (mostly ignoring small probabilities of gains much greater than its cap in favor of better probabilities of ones that just meet the cap). Whereas the usual satisficer, maximizer with the bounded utility function (well, not just bounded - cut off) does.

it is very easy to show that any "satisficing problem" can be formulated as an equivalent "optimization problem"

Satisficing seems a great way to describe the behavior of maximizers with multiple-term utility functions and an ordinal ranking of preference satisfaction i.e. humans. This sounds like it should have some fairly serious implications.


So you're defining a satisficing agent as an agent with utility function f that it wants to maximize, but that acts like it's trying to maximize minimum(f, a constant)? In that case, sure, turning itself into an agent that actually tries to maximize f will make it better at maximizing f. This is a fairly trivial case of the general fact that making yourself better at maximizing your utility tends to increase your utility. However, if the satisficing agent with utility function f acts exactly like a maximizing agent with utility function min(f, constant), th...

[This comment is no longer endorsed by its author]

Can you really assume the agent to have a utility function that is both linear in paperclips (which implies risk neutrality) and bounded + monotonic?

No; but you can assume it's linear up to some bound.

Build the utility function such that excesses above the target level are penalized. If the agent is motivated to build 9 paperclips only and absolutely no more, then the idea of becoming a maximizer becomes distasteful.

This amuses me because I know actual human beings who behave as satisficers with extreme aversion to waste, far out of proportion to the objective costs of waste. For example: Friends who would buy a Toyota Corolla based on its excellent value-to-cost ratio, and who would not want a cheaper, less reliable car, but who would also turn down a much nicer car offered to them at a severe discount, on the grounds that the nicer car is "indulgent."

That is already a maximiser - its utility is maximised by building exactly 9 paperclips. It will take over the universe to build more and more sophisticated ways of checking that there are exactly 9 paperclips, and more ways of preventing itself (however it defines itself) from inadvertently building more. In fact it may take over the universe first, put all the precautions in place, and build exactly 9 paperclips just before heat-death wipes out everything remaining.
Ah, I see. Thanks for correcting me. So my friends are maximizers in the sense that they seek very specific targets in car-space, and the fact that those targets sit in the middle of a continuum of options is not relevant to the question at hand.
But you run into other problems then, like the certainty the OP touched on. Then the agent will spend significant resources ensuring that it has exactly 9 paperclips made, and wouldn't accept a 90% probability of making 10 paperclips, because a 99.9999% probability of making 9 paperclips would yield more utility for it.
Sooo - you would normally give such an agent time and resource-usage limits.
But the entire point of building FAI is to not require it to have resource usage limits, because it can't help us if it's limited. And such resource limits wouldn't necessarily be useful for "testing" whether or not an AI was friendly, because if it weren't, it would mimic the behaviour of a FAI so that it could get more resources.
Machines can't cause so much damage if they have resource-usage limits. This is a prudent safety precaution. It is not true that resource-limited machines can't help us. So: the main idea is to attempt damage limitation. If the machine behaves itself, you can carry on with another session. If it does not, it is hopefully back to the drawing board, without too much damage done.

Um, the standard AI definition of a satisficer is:

"optimization where 'all' costs, including the cost of the optimization calculations themselves and the cost of getting information for use in those calculations, are considered."

That is, a satisficer explicitly will not become a maximizer, because it is consciously aware of the costs of being a maximizer rather than a satisficer.

A maximizer might have a utility function like "p", where p is the number of paperclips, while a satisficer would have a utility function like "p-c", ...

According to the page you cite, satisficers are a subset of maximisers. Satisficers are just maximisers whose utility functions factor in constraints.
Yes for some definitions of maximizers. The article Stuart_Armstrong wrote seems to have two differing definitions: maximizers are agents that seek to get as much X as possible, and his satisficers want to get as much E(X) as possible. Then, trivially, those reduce to agents that want to get as much X as possible. I don't see that as novel or relevant since what I would call satisficers are those that try to set marginal gain equal to marginal cost. Those generally do not reduce to agents that seek to get as much X as possible.