If you want to understand Goodharting in advertising, this is a great article for it.
At the heart of the problems in online advertising are selection effects, which the article explains with this cute example:
Picture this. Luigi’s Pizzeria hires three teenagers to hand out coupons to passersby. After a few weeks of flyering, one of the three turns out to be a marketing genius. Customers keep showing up with coupons distributed by this particular kid. The other two can’t make any sense of it: how does he do it? When they ask him, he explains: "I stand in the waiting area of the pizzeria."
It’s plain to see that junior’s no marketing whiz. Pizzerias do not attract more customers by giving coupons to people already planning to order a quattro stagioni five minutes from now.
The article goes through an extended case study at eBay, where selection effects were causing particularly expensive results without anyone realizing it for years:
The experiment continued for another eight weeks. What was the effect of pulling the ads? Almost none. For every dollar eBay spent on search advertising, they lost roughly 63 cents, according to Tadelis’s calculations.
The experiment ended up showing that, for years, eBay had been spending millions of dollars on fruitless online advertising excess, and that the joke had been entirely on the company.
To the marketing department everything had been going brilliantly. The high-paid consultants had believed that the campaigns that incurred the biggest losses were the most profitable: they saw brand keyword advertising not as a $20m expense, but a $245.6m return.
The problem, of course, is Goodharting: optimizing for something that's easy to measure rather than for what you actually care about:
The benchmarks that advertising companies use – intended to measure the number of clicks, sales and downloads that occur after an ad is viewed – are fundamentally misleading. None of these benchmarks distinguish between the selection effect (clicks, purchases and downloads that are happening anyway) and the advertising effect (clicks, purchases and downloads that would not have happened without ads).
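To make the distinction in that quote concrete, here is a toy simulation. It is a minimal sketch under loudly hypothetical assumptions (10% of users are "high intent", half of them buy anyway, and ads are shown only to high-intent users, like Luigi's waiting-area coupons). The function name `run_experiment` and all parameters are invented for illustration. A randomized holdout among eligible users separates the advertising effect from the selection effect.

```python
import random


def run_experiment(n_users=100_000, intent_share=0.10, base_rate=0.50,
                   ad_lift=0.01, seed=42):
    """Toy model of selection effect vs. advertising effect.

    All parameters are hypothetical. Ads are shown only to high-intent users
    (think brand-keyword searchers, or Luigi's waiting area), and half of
    those eligible users are randomly held out as a control group.
    """
    rng = random.Random(seed)
    users = {"exposed": 0, "holdout": 0}
    buys = {"exposed": 0, "holdout": 0}
    for _ in range(n_users):
        if rng.random() >= intent_share:
            continue  # low-intent user: never targeted, so ignored here
        arm = "exposed" if rng.random() < 0.5 else "holdout"
        # Purchase probability: baseline intent, plus the ad's true effect.
        p = base_rate + (ad_lift if arm == "exposed" else 0.0)
        users[arm] += 1
        buys[arm] += int(rng.random() < p)

    # Naive benchmark: every purchase by an exposed user is credited to the ad.
    attributed = buys["exposed"]
    # Incremental estimate: exposed rate minus holdout rate, times exposed users.
    rate_gap = buys["exposed"] / users["exposed"] - buys["holdout"] / users["holdout"]
    incremental = rate_gap * users["exposed"]
    return {"attributed": attributed, "incremental": round(incremental)}


print(run_experiment())             # ad has a small true effect
print(run_experiment(ad_lift=0.0))  # ad does nothing at all
```

In both runs the naive benchmark credits the ad with roughly 2,500 purchases, even when `ad_lift` is zero. Only the holdout comparison reflects the (small, noisy) true effect.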
And unsurprisingly, there's an alignment problem hidden in there:
It might sound crazy, but companies are not equipped to assess whether their ad spending actually makes money. It is in the best interest of a firm like eBay to know whether its campaigns are profitable, but not so for eBay’s marketing department.
Its own interest is in securing the largest possible budget, which is much easier if you can demonstrate that what you do actually works. Within the marketing department, TV, print and digital compete with each other to show who’s more important, a dynamic that hardly promotes honest reporting.
The fact that management often has no idea how to interpret the numbers is not helpful either. The highest numbers win.
To this I'll just add that this problem is somewhat solvable, but it's tricky. I previously worked at a company whose entire business model revolved around calculating lift from online advertising spend by matching online ad activity with offline purchase data, and much of that involved maintaining a large and reliable control group against which to calculate lift. The bad news, as we discovered, was that the data was often statistically underpowered: it could only distinguish between negative, neutral, and positive lift, and it could only detect non-neutral lift in cases where the evidence was strong enough that you could have eyeballed it anyway. The worse news was that we had to tell people their ads were not working or, worse yet, were lifting the performance of competitors' products.
Some marketers' reactions to this were pretty much as the authors capture it:
Leaning on the table, hands folded, he gazed at his hosts and told them: "You’re fucking with the magic."
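The negative/neutral/positive lift classification described above can be sketched with a normal-approximation confidence interval for the difference between exposed and control conversion rates. This is not that company's actual method: the helpers `lift_verdict` and `min_detectable_diff` are hypothetical, and the 95% confidence and ~80% power constants are conventional assumptions.

```python
import math


def lift_verdict(exposed_buys, exposed_n, control_buys, control_n, z=1.96):
    """Classify lift as 'negative', 'neutral' or 'positive' (hypothetical helper).

    Uses a normal-approximation confidence interval (95% by default) for the
    difference between exposed and control conversion rates.
    """
    p_e = exposed_buys / exposed_n
    p_c = control_buys / control_n
    diff = p_e - p_c
    se = math.sqrt(p_e * (1 - p_e) / exposed_n + p_c * (1 - p_c) / control_n)
    low, high = diff - z * se, diff + z * se
    if low > 0:
        verdict = "positive"
    elif high < 0:
        verdict = "negative"
    else:
        verdict = "neutral"  # interval includes zero: can't tell
    return verdict, diff, (low, high)


def min_detectable_diff(base_rate, n_per_arm, z_alpha=1.96, z_power=0.84):
    """Approximate smallest absolute lift detectable with ~80% power."""
    se = math.sqrt(2 * base_rate * (1 - base_rate) / n_per_arm)
    return (z_alpha + z_power) * se


# A real 5% relative lift (2.0% -> 2.1%) with 10,000 users per arm reads as neutral.
print(lift_verdict(210, 10_000, 200, 10_000))
# Smallest detectable absolute lift at a 2% base rate, 10,000 users per arm.
print(min_detectable_diff(0.02, 10_000))
```

With these made-up numbers, the minimum detectable lift is about 0.55 percentage points, roughly a 28% relative lift over a 2% base rate. Anything subtler is invisible, which matches the "you could have eyeballed it anyway" experience.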
To this I'll just add that this problem is somewhat solvable, but it's tricky.
This is a very important point. I will self-promote and mention my pre-print paper on metric design and avoiding Goodharting (not in the context of AI): https://mpra.ub.uni-muenchen.de/90649/1/MPRA_paper_90649.pdf
Abstract: Metrics are useful for measuring systems and motivating behaviors. Unfortunately, naive application of metrics to a system can distort the system in ways that undermine the original goal. The problem was noted independently by Campbell and Goodhart, and in some forms it is not only common, but unavoidable due to the nature of metrics. There are two distinct but interrelated problems that must be overcome in building better metrics: first, specifying metrics more closely related to the true goals, and second, preventing the recipients from gaming the difference between the reward system and the true goal. This paper describes several approaches to designing metrics, beginning with design considerations and processes, then discussing specific strategies including secrecy, randomization, diversification, and post-hoc specification. Finally, it will discuss important desiderata and the trade-offs involved in each approach.
(Currently working on a rewrite, but feedback on the ideas and anything missing is especially appreciated.)
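As a rough illustration of how the randomization and post-hoc specification strategies named in the abstract might combine, here is one hypothetical reading. It is not the paper's own method. The pool of metrics is public, but which ones count is drawn only after the period ends, so over-optimizing any single metric pays off less. The function `score_post_hoc` and the example metric names are invented, and the metrics are assumed to be already normalized to comparable scales.

```python
import random


def score_post_hoc(outcomes, candidate_metrics, k=2, seed=None):
    """Hypothetical sketch: average k metrics drawn at random *after* the period.

    outcomes: dict of observed (pre-normalized) quantities for the period.
    candidate_metrics: dict mapping name -> function(outcomes) -> float.
    The agent knows the pool in advance but not which k metrics will count.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(candidate_metrics), k)
    score = sum(candidate_metrics[name](outcomes) for name in chosen) / k
    return chosen, score


# Example pool for a marketing team (names and values are made up).
metrics = {
    "attributed_sales": lambda o: o["attributed_sales"],
    "holdout_lift": lambda o: o["holdout_lift"],
    "new_customers": lambda o: o["new_customers"],
}
period = {"attributed_sales": 0.9, "holdout_lift": 0.1, "new_customers": 0.4}
print(score_post_hoc(period, metrics, k=2, seed=7))
```

A team that pumps up only `attributed_sales` still risks being scored on `holdout_lift`, which is the point of keeping the choice secret until afterwards.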
Cool! I don't have time to look into this now, but I'm excited to see what you produce in this direction. As you know, I'm pretty pessimistic that we can totally solve Goodhart effects, but I do expect we can mitigate them enough to do better than we do now, at least for anything short of superintelligent levels of optimization.
Agreed on both points.
I read this article and its referenced papers when it was posted to Hacker News 12 days ago, and I have reservations about accepting its conclusion that digital advertising is broken.
The article's conclusion is predicated on two specific kinds of evidence:
1. That brand-keyword ads overwhelmingly demonstrate selection effects.
2. That advertising for companies with large advertising impact across different channels demonstrates more selection effects than advertising effects.
The evidence is compelling, but it doesn't warrant the conclusion that digital advertising is ineffective because:
1. Brand-keyword ads (when someone searching for "Macy's" gets an ad linking to Macy's website) are not the only kind or even the most common kind of keyword ads. Targeted keyword ads (having an ad for Macy's website when someone looks up "cashmere sweater") are more common and more competitive, yet haven't been covered or studied in the provided literature.
2. All the studies cited in this article (such as Lewis and Rao 2015 and Gordon et al. 2018) either explicitly deal with firms described as "large" or "having millions of customers" (Lewis and Rao, or the eBay intervention), or neglect to disclose or characterize the firms involved (such as Gordon et al.). A selection bias might be occurring where only brands with a large pre-existing brand identity are being studied; in that case, it would not be surprising that the literature shows more selection effects than advertising effects, as customers would already have heard of the brands by the time these studies ran.
Ideally, the following pieces of evidence would be needed to conclude that digital advertising as-is really is broken:
1. A survey of the effectiveness of targeted keyword ads.
2. The impact of digital advertising among companies without a large brand presence across different channels.
I was unable to find anything in the literature for either, but I confess I did not try very hard beyond a perfunctory Google Scholar search.
I agree that the examples cited in this article are compelling evidence for an application of Goodhart's law in digital advertising.
What you are saying is reasonable, but it feels to me like you put the burden of proof on the author of the article. The question is, why should we believe advertising works at all? So the way I see it, the burden of proof is on the people doing advertising, and the article is asserting that they have not met it.
I partly agree, but burden of proof is often the wrong framing for truth seeking.
The article provides strong evidence that ads are ineffective in certain classes of cases, and that fact in turn provides weaker evidence that ads are ineffective more generally. To support Akshat's skepticism that the result generalizes, we'd need evidence or priors pointing towards ads being differentially effective depending on the type: targeted keywords vs. brand-ad keywords, and brand presence versus no brand presence.
In the first case, I'm somewhat skeptical that the difference between targeted and brand keywords will be large. My prior for the second difference is that there would be some difference, as Gordon argued in another comment. I don't know of any evidence in either direction, but I haven't looked. (The actual result doesn't matter to me except as an exercise in Bayesian reasoning, but if it matters to you or others, it's plausibly high VoI to search a bit.)
What you are saying is reasonable, but it feels to me like you put the burden of proof on the author of the article. The question is, why should we believe advertising works at all?
It seems like a reasonable prior would be that telling people about your product who didn't already know about your product makes them more likely to buy your product.
I think you can certainly make the case against the above statement, but I don't know why you wouldn't start with that prior.
That prior, of course, doesn't make a case for brand advertising, which eBay was doing, but that's not what David's objection was about.
The question is, why should we believe advertising works at all?
This is a fair objection. I decided to look for a review paper summarizing the existing literature on the subject of advertising effectiveness.
Via Google Scholar, I was able to find a particularly useful review paper, summarizing both empirical effects and prior literature reviews for advertisements as well as political and health campaigns across multiple channels (print, TV, etc.). Overall, the literature paints a disjointed, inconclusive view of the value of advertising - there is insufficient data to conclude that advertising in general has no impact.
I invite you or anyone interested to read it in depth, but will, for the purpose of this discussion, summarize its concluding remarks (as available in the section "Behavioral Effects of Advertising or Commercial Campaigns"):
1. Advertising interventions appear to be correlated with short-term boosts in product sales.
A set of case studies has shown strong short-term effects of campaigns on sales (Jones 2002). In a recent study, the buying of a service (use of the weight room in a training facility) increased to almost five times the initial use after an outdoor advertising campaign (Bhargava and Donthu 1999). In another study, exposure to printed store sale flyers led to a doubling of the number of advertised products bought, and more than a doubling of the amount spent on items in ads (Burton, Lichtenstein et al. 1999).
2. There *is* disagreement on the long-term effects.
Ninety percent of advertising effects dissipate after three to fifteen months. The first response is most important; the share returns for advertising diminish fast. After the third exposure advertisers should focus on reach rather than frequency, according to research findings from advertising effects research (Vakratsas and Ambler 1999).
While some claim that advertising seems not to be important for sales in the short term, although more important in the longer term (Tellis 1994; see also Tellis, Chandy et al. 2000), others disagree. Jones found that advertisements must work in the short term to be able to have any medium or long-range effect on sales (Jones 2002).
3. Despite contributing to a short-term boost, advertising by itself is weaker than other kinds of promotional activities, and increased advertising spend yields diminishing results:
The influence of advertising has been estimated to be 9% of the variation in sales for consumer products. The effect of promotional activities – such as offers of reduced prices for shorter periods of time – was more than double that size (Jones 2002). In some studies price reductions have been found to be 20 times more effective for increasing sales than is advertising (Tellis 1994), a consequence being that since the late 1980s the industry has changed its emphasis from advertising to promotion (Turk and Katz 1992; Vakratsas and Ambler 1999; Jones 2002). The solution to the problem of small effects may be that most advertising research has not taken into consideration the fact that only a small amount of advertising seems to increase sales. Increased spending on advertising (increased number of exposures and increased gross rating points) has been found to induce larger sales when ads were persuasive, but not when they were not (Stewart, Paulos et al. 2002).
4. "Likeability", medium, and what is being sold matter a lot to the effectiveness of advertising:
The advertising copy and novelty in ads seemed more important than the amount of advertising itself (Tellis 1994). The two most important qualities of ads that sell products are likeability of the ad (Biel 1998) and its ability to make people believe that a company has an excellent product (Joyce 1998: 20). A study has shown that advertising likeability predicted sales winners 87% of the time (Biel 1998). It is no news that copy research works (Caples 1997; for a review, see Jeffres 1997: 252-263), but new data-processing techniques have made it possible to apply this knowledge almost instantly to TV advertising as well (Woodside 1996). Channel selection may also be an important influence on sales (Tellis, Chandy et al. 2000). For some groups of products (lower-priced daily consumer goods) the first exposure to advertising may contain most of the ad’s effect on behavior (Jones 1995; Jones 2002).
There are many other aspects of advertising influence covered in the conclusions that I have not summarized; I have selected the few that seem most salient here.
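One way to read the quoted claim that "ninety percent of advertising effects dissipate after three to fifteen months" is to assume the effect decays geometrically month over month. That assumption is mine, not the review's. Under it, the monthly retention rate r satisfies r^n = 0.1. The helper names below are hypothetical.

```python
import math


def monthly_retention(months_to_90pct_decay):
    """Per-month retention r such that r**n == 0.1 (geometric decay assumed)."""
    return 0.1 ** (1 / months_to_90pct_decay)


def half_life(months_to_90pct_decay):
    """Months until half of the effect is gone, under the same assumption."""
    return months_to_90pct_decay * math.log(2) / math.log(10)


for n in (3, 15):
    print(f"{n:>2} months: retention {monthly_retention(n):.3f}/month, "
          f"half-life {half_life(n):.1f} months")
```

At the short end, roughly half the effect is gone in under a month (retention about 0.46 per month). At the long end, the half-life is about 4.5 months (retention about 0.86 per month). Either way, most of the payoff has to show up soon after the spend.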
Overall, I think a reasonable prior is that advertising *has* an impact, but has strong situational limits to its effectiveness compared to other sales growth techniques. Since digital advertising is a specific case of advertising in general and there are some effects for advertising in general, it would be difficult to make the case that no digital advertising works at all - it is much safer to expect that digital advertising has *some* (albeit situational and weak) impact.
The other half of this comment is re:
the article is asserting that they have not met it
This is a reductive picture. It is true the article sets out to check marketers' claims about the effects of digital advertising. However, it also sets out to provide an overview of the evidence for whether digital advertising works in general. This last aspect was the focus of my prior comment.
My comment was meant to highlight the flaws in their methodology for reviewing whether digital advertising works. Their review is restricted to a very specific set of claims and cases targeted at larger advertising platforms, and one should not generalize prematurely from those remarks. To do a better job of truth-seeking, I think it is necessary to articulate what specifically remains unknown after the analysis; hence my last comment.
It's true that not all online advertising does nothing. We should expect, if nothing else, online advertising to continue to serve the primary and original purpose of advertising, which is generating choice awareness, and my own experience certainly backs this up: I am aware of any number of products and services only because I saw ads for them on Facebook, Google search, SlateStarCodex, etc. To the extent that advertising makes people aware of choices they otherwise would not have known about, so that on the margin they may take those choices (you make none of the choices you don't know how to make), it would seem to function successfully, assuming it can be had at a price low enough to produce a positive return on investment.
However, my own experience in the industry suggests that most spend beyond what it takes to generate some nonzero awareness is poorly spent. Much to the dismay of marketing departments, you usually can't spend your way to growth through ads. Other forms of marketing look better (content marketing can work really well and can be a win-win when done right).
This experience has been corroborated by countless reviews (summarized in my other comment), so I agree with you.
I think this is relevant: Banner Ads Considered Harmful
My understanding of brand advertising in an AdWords-style bidding environment is that it's important to do when products are in highly competitive, knife-edge markets.
Note that this is not because your ad will make people more likely to know about your brand; it's because if you don't bid on your brand, your competitors will, so a search for you will actually build brand awareness for THEM unless you outbid them for that spot. This amounts to a shakedown by Google, which it can do because it has an essential monopoly on search.
For eBay, this isn't as important because network effects basically mean they don't have any competitors who can actually compete with what they're offering, which is a large market of buyers and sellers for auctions.
It's possible that earlier in eBay's history, these ads actually were effective in preventing a switch to competition but became less effective over time as they monopolized the online auction space.
All this to say:
1. Blindly following tactics like "optimize conversions" without understanding context will lead to goodharting.
2. This particular scenario doesn't mean that online advertising is ineffective, just that you have to know what you're doing.
Side-question: do we have a useful model for whether there WAS a dot-com bubble? Seems like there was a fair bit of churn, and a temporary loss, but the category is bigger than ever. Buying at the peak and holding until now did pretty well, right?
There was a bubble, and there is also secular growth in the market, with a lot of churn that makes buy-and-hold a fairly bad idea. Those aren't inconsistent. Here's a graphic of the churn. Most of the early companies died.
But if you had put all your money in the hot IPOs of Netscape, Yahoo, Lycos, and Excite in 1995, you'd have done very poorly. If you extended this to 1996, you could add Mindspring and Checkpoint (the only one that has done well so far: it is up 29x, a 16% annualized return to date). It took until 1997 to get any fantastic long-term return, from Amazon, which is up 1000x since 1997, a 37% annual return. That's fantastic, but if you were prescient and it was an entire tenth of your portfolio, on average you did just OK. Skipping ahead to 1999-2000, the height of the bubble, here's the list. Nothing made big bucks.
So we can construct a portfolio of 10 stocks, 8 of which went bust and 2 of which, Checkpoint and Amazon, did well. Your compound 22-year return? 5.25%. (And if you had bought an S&P index fund in 1998 at 1,000, you'd have made 6.5% annually.)
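The conversions between total multiples and annualized returns above follow the standard compound-growth formula. Here is a quick sketch that checks the per-stock figures, assuming a horizon of about 22 years for both (my assumption, taken from the "22-year return" framing; Checkpoint's actual holding period would be a year longer). The helper names are hypothetical.

```python
def annualized_return(multiple, years):
    """Compound annual growth rate for an investment that grew by `multiple`."""
    return multiple ** (1 / years) - 1


def total_multiple(annual_rate, years):
    """Total growth factor from compounding `annual_rate` for `years` years."""
    return (1 + annual_rate) ** years


YEARS = 22  # assumed horizon for both holdings
for name, multiple in [("Checkpoint", 29), ("Amazon", 1000)]:
    print(f"{name}: {annualized_return(multiple, YEARS):.1%} per year")

# How the two portfolio-level annual rates quoted above compound over 22 years.
for label, rate in [("bubble portfolio", 0.0525), ("S&P index fund", 0.065)]:
    print(f"{label}: {total_multiple(rate, YEARS):.2f}x")
```

The roughly 1.25-point gap in annual return compounds to about 3.1x versus 4.0x over the period, which is why "did just OK" still trails the index.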
neat graphic, thanks!
What audience is this for? The amount of useless stuff that other people spend their money on is ... overwhelming. A lot of it (say, alcohol and tobacco) doesn't even have anyone claiming it helps in any way. Pointing out that something which may be helpful (to the advertiser; it's definitely helpful to the ad vendors and arguably helpful to subsidized media consumers) is less helpful than claimed doesn't seem like it'll have any impact on any decision-maker.
How Brands Grow is a good take on one part of this space.