Everything I ever needed to know, I learned from World of Warcraft: Goodhart’s law

[-]Jameson Quinn6y60Review for 2018 Review

This is a moderately interesting and well-written example, but did not really surprise me at any point. Worth having, but wouldn't be something I'd go out of my way to recommend.

[-]Said Achmiz8y50

Of course I can’t resist playing devil’s advocate, even to myself.

It’s almost always the case that you have the comparative advantage in doing the secondary thing that avoids the doom; if others have to pick up your slack there, it’ll be way less efficient, overall.

But how true is this, really? And, more importantly, in what circumstances does it fail to be true?

This is no idle pedantry; identifying scenarios when this heuristic (on which the OP strongly insists) fails to hold can allow us to extract additional value, additional performance, out of a system (but at a price—see below).

A WoW example first (generalization to follow):

Supposing I am playing a DPS character in a raid, and we’re fighting a boss which periodically emits intense flames for a short period of time, doing terrible damage to all characters in the immediate vicinity. Being adjacent to (“in melee with”) the boss during this time causes a character to sustain tremendous damage; any character who remains within the area of effect for the duration of this periodic flame eruption—and does not receive massive, sustained healing—will die. Thus, the standard tactic for dealing with this threat is for all characters in melee to immediately run away when the flame eruption begins, and return when it subsides. (Obviously, characters who flee for their lives are doing absolutely no DPS at all in the meantime—which is the price of surviving to resume DPS once the danger passes.)

Suppose, however, I say to the healers in the raid: “If I stay put instead of running—if I make no effort to avoid this harm—I know that you can heal me, and prevent my death. It will be very difficult for you, of course—but consider the benefit: I will be able to keep DPSing non-stop! My impressive damage output will benefit greatly—will be even more substantial—by avoiding these periodic interruptions, and as a result, we will kill the boss more quickly [which is desirable for many reasons].”

The healers will protest, of course. But how seriously should I take their protests? (Suppose I am the raid leader, or a raid officer whose advice the raid leader will weigh heavily; in short, suppose I have the power to tell the raid’s healers whom to heal, and when.) Perhaps I have good reason to believe (based on past performance, for instance, or on some informed analysis, etc.) that they do, indeed, have the capacity to do this difficult thing which I am asking of them; their protests, then, are born of a desire to do less work, to put in less effort, than they are capable of. After all, what is gained by some folks in the raid having had an easier time of it than they might’ve? Nothing… right?

This is the choice that seems to confront us: I (and possibly some other DPSers in the raid) can work harder (by avoiding the doom ourselves), or the healers can work harder (by healing us while we blithely DPS despite the flames raging around us). In the second scenario, the boss dies more quickly (which, again, is beneficial for many reasons). As far as consequences (i.e., outputs of our strategy) go, that seems to be the only difference. This seems to strongly suggest that the second strategy is superior. Certainly that’s true unless we can find a cost of this strategy that outweighs the clear benefit (in that case, the “heal through the damage” approach will be inferior on net).

In terms of inputs, the difference is twofold: who shoulders the burden (the DPS or the healers), and how great the burden differential is. In the first approach (“DPSers save themselves”), the DPSers shoulder the burden, and the burden is relatively light; in the second approach (“DPSers stay put, and the healers heal them”), the healers shoulder the burden, and the burden is massive.

Well, but—so what? Is there any principle which says that every raid encounter must tax each raid member equally, that no one has an easier time of it than anyone else? There is not. Merely the fact that it’s easier for the DPSers to save themselves than for the healers to work extra-hard to save the DPSers, does not, by itself, recommend the first approach over the second. In other words, we’ve uncovered a cost to some individual raid members, but it’s not clear at all whether we ought to count this as a cost to the raid. The fact remains that with the second approach, we kill the boss more quickly, with no detrimental effects on our chances of success.

There are, in fact, two ways in which such a cost to individual raid members may translate into a cost to the raid. One is trivial and contingent, the other is fundamental.

The trivial way first: raid members are human. The healers may not appreciate being asked to work so much harder, just so that the DPSers can work a bit less hard, and “but this benefits the raid” may not suffice to persuade them. Even if we assume perfect rationality, perfect understanding of game theory, and perfect sublimation of one’s own selfish desires to the collective goal, we may still expect that the stress, the strain, the exertion of these great demands that we propose to place on the healers, will accelerate burn-out, will disincentivize prospective raid members from signing on as healers, and will have a myriad other effects that result from a particular position in an organization, a particular role in a collective effort, being very difficult and stressful.

(I say this effect is ‘trivial’, but of course it’s anything but—in practice, a good leader must well understand, and deftly manage, effects of this type. It’s just that this is a phenomenon which is not unique to the specific problem I describe in this comment; we may abstract away from it, and be left with an equally stark dilemma.)

The fundamental way: what we propose to do, in the “let the healers heal the DPSers while the latter take the damage” strategy, is to exploit reserves of capacity (in order to improve overall performance/effectiveness/output). We believe that the healers have reserves of capacity, which they are, currently, not exploiting. We say: let us exploit those reserves, converting them to useful output (damage done to the boss)—at a relatively low rate (it takes a great deal of additional effort on the healers’ part to effect what is certainly a non-trivial, but nonetheless not earth-shaking, amount of additional DPS), but every bit counts, and what good is that reserve capacity doing us, if it’s just (metaphorically) sitting there, unused?

You may begin to see the problems with this approach. There are several; let’s make them explicit:

(1) We may be mistaken about how much reserve capacity there is—or, equivalently, about how much capacity will be required in order to do what we propose. Perhaps the healers try to heal the DPSers through the damage, and fail; some or all of the DPSers die (this is bad!). Or, perhaps, while the healers are trying to heal the DPSers, they can’t quite devote the attention that their other tasks require, such as healing other folks in the raid (themselves included)—and as a result, some of the healers die (this is very bad!) or the tank dies (this is utterly catastrophic!).

This may play out in a subtler, and more insidious, way. Perhaps the healers do pull it off, and all the DPSers live, having taken no action to save themselves, and kept up their damage output all the while; but in order to accomplish this, the healers had to draw on reserves of resources that they would otherwise save for later in the engagement… this “borrowing against future capacity” may then come back to bite the raid in its collective ass.

This is a particularly insidious type of consequence because it may be quite difficult to diagnose. Suppose you do some difficult thing, which severely drains your future capacity; that future comes, and you are found wanting, with collective failure resulting. You say: “this is because I had to do that difficult thing!” “Was it really that?” the leader asks, “or perhaps you could’ve managed your resources better?” (And he may well be right. Or not. How can you tell? Suppose that leader is the one who proposed the “make you do the difficult thing” strategy in the first place. Might he not now resist admitting that he asked too much of you? Isn’t this a bias on his part? On the other hand, perhaps you are using this “I had to do that difficult thing” business as an excuse for sub-par performance on the long-term-resource-management front!)

(2) Reserve capacity is not useless. The unexpected may occur. If we plan to use all of our capacity, what happens when things don’t go according to plan? “Exploit all possible reserve capacity to generate maximum performance” is a brittle strategy; it fails to be resilient. A good strategy is resilient both to the sorts of subjectively random, relatively small fluctuations in conditions that result from a variety of sources, and to large fluctuations caused by specific failure modes that may be easy to reduce in probability, but difficult to eliminate entirely.

The problem, of course, is that some challenges are so difficult that a brittle strategy is the only way to succeed. All the more resilient strategies are also going to be lower in maximum possible performance; these maxima may all fall below what is required to overcome a given challenge. In that case, it’s brittle or go home. How do you tell if that’s the situation you face? By choosing the brittle strategy, might you be neglecting opportunities to improve overall performance in many small, subtle, difficult, unexciting, robust ways? There is no easy answer.

Finally we are equipped to answer the question I asked at the beginning of this comment: in what scenarios does the heuristic “take personal responsibility for avoiding the negative consequences of your actions; do not foist it off on others” fail to generate optimal collective performance?

It seems to me that the answer, suggested by the analysis above, is this:

Ask two questions: first, whether shifting the responsibility for preventing the doom your actions cause improves the overall output (by whatever is the relevant metric of performance) of the group. If the answer is “yes”, then ask the second and more important (and more difficult) question: can you be confident that the remaining reserves of capacity (after using up the reserve capacity you propose to convert into effective output) are sufficient to maintain a suitable degree of resilience in your strategy? (You should, of course, work a suitable margin of error into your estimates of available reserve capacity and of required capacity to effect your proposed responsibility shift, taking into account various sources of uncertainty: about the task at hand, about the capabilities of the group members, etc.)

If the answer is “yes” also to the second question, then seriously consider violating the given heuristic.

In short: ask whether it is beneficial to shift the burden, and ask whether you can afford to shift the burden. If so, do it.
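The two-question test can be written down as a rough sketch. Everything here is an illustrative assumption rather than anything from the post: the field names, the idea of measuring capacity in a single arbitrary unit, the 20% default margin, and the choice to apply the margin pessimistically to both estimates.

```python
from dataclasses import dataclass


@dataclass
class BurdenShift:
    """Hypothetical estimates, all in the same arbitrary 'capacity units'."""
    output_gain: float        # estimated gain in group output (e.g. extra boss damage)
    reserve_available: float  # estimated spare capacity of whoever takes on the burden
    reserve_required: float   # estimated capacity the shift will consume
    resilience_floor: float   # reserve that must remain to absorb the unexpected
    margin: float = 0.2       # assumed fractional margin of error on both estimates


def should_shift_burden(s: BurdenShift) -> bool:
    # Question 1: is shifting the burden beneficial at all?
    if s.output_gain <= 0:
        return False
    # Question 2: can we afford it? Be pessimistic in both directions:
    # assume less reserve than estimated, and a higher cost than estimated.
    pessimistic_available = s.reserve_available * (1 - s.margin)
    pessimistic_required = s.reserve_required * (1 + s.margin)
    remaining = pessimistic_available - pessimistic_required
    return remaining >= s.resilience_floor
```

The hard part, of course, is not the comparison but the estimates that feed it; the sketch only makes explicit that a "yes" to the first question is not enough without a "yes" to the second.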

[-]Vaniver8y40

The healers may not appreciate being asked to work so much harder, just so that the DPSers can work a bit less hard, and “but this benefits the raid” may not suffice to persuade them.

I note also that healers are much less replaceable than DPS are--or at least, that was the way of things when I was playing WoW--and so the maintenance of healer morale is considerably more important for the guild than the maintenance of DPS morale, or potentially even finishing the encounter sooner or more successfully.

[-]Said Achmiz8y50

Very true! This is an excellent point. (Furthermore, healing is a more stressful role than DPSing, and healers are more prone to burnout; and raid healing takes more skill[1] than DPSing, so for these reasons they are certainly less replaceable. Although a raid needs far fewer healers than DPSers, the supply of good healers is lower still.)

[1] Or, to be more precise: the combination of type of skill set + level of competence + disposition, that is required to play a good raid healer, is more rare than the corresponding things that are required to play a good DPSer.

[-]habryka6y40Nomination for 2018 Review

I've referenced this post a few times as a very good and concrete example of Goodhart's law, one that both illustrated the costs and showed the actual (usually good) reasons why people put metrics in place to begin with.

[-]Ben Pace6y20Nomination for 2018 Review

Seconding Habryka.

[-]Kaj_Sotala8y20

Curated this post for:

Having a detailed empirical analysis of all the different ways by which measurements are useful, but also susceptible to Goodharting

[-]Said Achmiz8y20

Uh, what happened to the images in this post? They showed up just fine when it was a draft, but now I don’t see them.

[-]Raemon6y20

Reading this thread in the future, I find myself kinda wishing for a way that comment threads like this could be auto-collapsed or resolved or something after reaching their conclusion.

[-]Said Achmiz6y20

Agreed, that would be a nice feature. The trick would be to have a good way of identifying such “now totally irrelevant, except for esoteric academic reasons” threads that wouldn’t run into any controversy or require non-trivial moderator attention.

[-]Raemon6y20

The latest version of the "offtopic comment" feature that the team had chatted about was a "collapse" feature, where some comments are just forcibly collapsed with a flag, and this is just a generic tool that admins and some authors have access to. Doesn't really require anything automatic, just, when you notice such a thread, you can close it. (It'd still appear in the comment list, just collapsed as if it had low karma, possibly with a reason displayed.)

[-]Said Achmiz6y20

Yes, that is exactly the sort of thing I had in mind, which would clearly be open to all sorts of, perhaps not “abuse”, but at least controversial application. It seems to me that it would be useful to differentiate such threads as this one we are discussing now, where nothing “on-topic” is really being discussed, and about which no one has, nor could have, any strong feelings, etc. (This is not to say that the general-purpose tool you’re talking about would not also be useful; very plausibly it would.)

[-]Raemon8y20

Huh. Were you editing on greaterwrong or lesswrong, and if the latter, were you in Rich or Markdown editing mode?

[-]Said Achmiz8y20

The former. But this hasn’t happened before…

Edit: Also, the code for the images is still there when I edit the post…! They just don’t display…

[-]Raemon8y20

Interestingly, the code *isn't* there in the lesswrong markdown editor.

In the past month or so, when we added the markdown editor, we made some changes to how the markdown and Rich editors work (basically making sure that every time the post is saved, it updates both versions of the data to be in sync with each other). If the last time you made a post with images was a month+ ago, that might be related.

(I think you said greaterwrong specifically saves what _it_ remembers you entering for markdown, so that you don't have to deal with the frustration of our markdown editor, say, preferring underscores to asterisks. Is it possible the code that you still see is a result of that saved version?)

[-]Said Achmiz8y20

It seems that I have now fixed it by opening the post for editing (on GW) and re-saving it (without changing anything).

[-]Raemon8y20

Hmm – apparently this just happened again when Kaj moved the post to curated. Apologies, but could you try re-saving it on greaterwrong again? I haven't had time to fix the images bug yet.

[-]Said Achmiz8y20

Done.

[-]clone of saturn8y20

It looks like opening a markdown post in the rich editor causes all the images to disappear, which probably happened when Ben moved the post to frontpage.
