tl;dr: In this installment, we look at methods of avoiding the problems related to optimization by proxy. Many potential solutions cluster around two broad categories: Better Measures, and Human Discretion. Distribution of decisions to the local level is a solution that seems more promising and is examined in more depth.
In the previous article I had promised that if there was a good reception, I would post a follow-up article to discuss ways of getting around the problem. That article made it to the front page, so here are my thoughts on how to circumvent Optimization by Proxy (OBP). Given that the previous article was labored over for at least a year and a half, this one will be decidedly less solid, more like a structured brainstorm in which you are invited to participate.
In the comments of the previous article I was pointed to The Importance of Goodhart's Law, a great article, which includes a section on mitigation. Examining those solutions in the context of OBP seems like a good skeleton to build on.
The first solution class is 'Hansonian Cynicism'. In combination with awareness of the pattern, pointing out that various processes (such as organizations) are not actually optimizing around their stated goal, but some proxy, creates cognitive dissonance for the thinking person. This sounds more like a motivation to find a solution than a solution itself. At best, knowing what goes wrong, you can use the process in a way that is informed by its weaknesses. Handling with care may mitigate some symptoms, but it doesn't make the problems go away.
The second solution class mentioned is 'Better Measures'. That is indeed what is usually attempted. The 'purist' approach to this is to work hard on finding a computable definition of the target quality. I cannot exclude the possibility of cases where this is feasible, but no immediate examples come to mind. The proxies that I have in mind are deeply human (quality, relevance, long-term growth) and boil down to figuring out what is 'good'; computing them is no small matter. Coherent Extrapolated Volition is the extreme end of this approach, boiling a few oceans in the process, and certainly not immediately applicable.
A pragmatic approach to Better Measures is to simply monitor better, making the proxy more complex and therefore harder to manipulate. Discussion with Chronos in the comments of the original article was along those lines. By integrating user activity trails, Google makes it harder to game the search engine. I would imagine that if they integrated those logs with Google Analytics and Google Accounts, they would significantly raise the bar for gaming the system, at the expense of user privacy. Of course, by removing most amateur and white/gray-hat SEOs from the pool, and given the financial incentives that exist, they would make it significantly more lucrative to game the system, and therefore the serious black-hat SEOs who can resort to botnets, phishing, and networks of hacked sites would end up being the only game in town. But I digress. Enriching the proxy with more and more parameters is a pragmatic solution that should work in the short term as part of the arms race against manipulators, but does not look like a general or permanent solution from where I'm standing.
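To make the enriched-proxy idea concrete, here is a minimal sketch. The signal names and weights are entirely invented for illustration; the point is only that under a composite proxy, inflating any single signal moves the total far less than it would under a single-signal proxy.

```python
# Hypothetical sketch of an "enriched" ranking proxy.  The signal names
# and weights are invented; real search engines use far more signals,
# and keep them secret.

def proxy_score(signals, weights):
    """Weighted combination of independent quality signals."""
    return sum(w * signals.get(name, 0.0) for name, w in weights.items())

weights = {"inbound_links": 0.4, "click_through": 0.3,
           "dwell_time": 0.2, "account_reputation": 0.1}

honest_page = {"inbound_links": 0.7, "click_through": 0.6,
               "dwell_time": 0.8, "account_reputation": 0.9}

# A manipulator who can only inflate one signal (link spam) still
# scores poorly on the composite proxy.
spam_page = {"inbound_links": 1.0, "click_through": 0.1,
             "dwell_time": 0.05, "account_reputation": 0.0}

print(round(proxy_score(honest_page, weights), 2))  # 0.71
print(round(proxy_score(spam_page, weights), 2))    # 0.44
```

The arms-race aspect survives intact, of course: a manipulator able to fake several signals at once is back in business, which is why this is a short-term measure.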
A special case of 'Better Measures' is that of better incentive alignment. From Charlie Munger's speech A Lesson on Elementary, Worldly Wisdom As It Relates To Investment Management & Business:
From all business, my favorite case on incentives is Federal Express. The heart and soul of their system—which creates the integrity of the product—is having all their airplanes come to one place in the middle of the night and shift all the packages from plane to plane. If there are delays, the whole operation can't deliver a product full of integrity to Federal Express customers.
And it was always screwed up. They could never get it done on time. They tried everything—moral suasion, threats, you name it. And nothing worked.
Finally, somebody got the idea to pay all these people not so much an hour, but so much a shift—and when it's all done, they can all go home. Well, their problems cleared up overnight.
In fact, my initial example was a form of naturally occurring optimization by proxy, where the incentives of the actors are aligned. I guess stock grants and options are another way to align employee incentives with company incentives. As far as I can tell, this has not been generalised either, and does not seem to reliably work in all cases, but where it does work, it may well be a silver bullet that cuts through all the other layers of the problem.
Before discussing the third and more promising avenue, I'd like to look at one unorthodox 'Better Measures' approach that came up while writing the original article. Assume that producing the proxy requires possessing the target quality, and that faking it is computationally infeasible. The only real-world case where I can see an analog to this is cryptography. Perhaps we can stretch OBP such that WWII cryptography can be seen as an example of it. By encrypting with Enigma and keeping their keys secret (the proxies), the Axis forces aimed to maintain the secrecy of their communications (the target quality). When the Allies were able to crack Enigma, this basic assumption stopped being reliable. Modern cryptography makes this actually feasible. As long as the keys don't fall into the wrong hands, and assuming no serious flaws in the cryptographic algorithms used, the fact that a document's signature verifies against someone's public key (the proxy) authenticates the document as coming from the holder of the corresponding private key (the target quality). While this works in cryptography, it may be stretching the OBP analogy too far. On the other hand, there may be a way to transfer this strategy to solve other OBP problems that I have not yet seen. If you have any thoughts around this, please put them forward in the comments.
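The paragraph above concerns public-key signatures, which require a third-party library in Python; but the essential property, that a valid proxy is infeasible to produce without the secret, can be sketched with the standard library's symmetric analogue, HMAC. The key and message here are invented placeholders.

```python
import hmac
import hashlib

# Sketch of "faking the proxy is computationally infeasible", using the
# standard library's symmetric analogue (HMAC) of the public-key
# signatures discussed in the text.  Key and message are invented.

key = b"shared-secret-key"       # the secret that must not leak
message = b"advance at dawn"

tag = hmac.new(key, message, hashlib.sha256).hexdigest()   # the proxy

def verify(key, message, tag):
    """Check a tag; without the key, forging a valid tag is
    computationally infeasible (assuming SHA-256 remains unbroken)."""
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

print(verify(key, message, tag))             # True
print(verify(key, b"advance at dusk", tag))  # False
```

As with Enigma, the guarantee collapses the moment the key leaks or the underlying algorithm is broken; the proxy is only as good as those assumptions.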
The third class of solutions is 'Human Discretion'. This is divided into two diametrically opposed solutions. One is 'Hierarchical rule', inspired by the ideas of Mencius Moldbug. Managers are the masters of all their subordinates, and the slaves of their higher-ups. No rules are written, so there are no proxies to manipulate. Except, of course, for human discretion itself. Besides the tremendous potential for corruption, this does not transfer well to automated systems. Laws may be a luxury for humans, but for machines, code is everything. There is no law-independent discretion that a machine can apply, even if threatened with obliteration. The opposite of that is what the article calls 'Left anarchist Ideas'. I think that puts too much of a political slant on an idea that is much more general. I call it simply 'distribution'. The idea here is that if decisions are taken locally, there is no big juicy proxy to manipulate; it is splintered into multitudes of local proxies, each different from the others. I think this is the way that evolution can be seen to deal with this issue. If, for instance, we see the immune system as an optimizer by proxy, the ability of some individuals to survive a virus that kills others demonstrates that the virus has not fooled everyone's immune system. Perhaps the individuals that survived are vulnerable to other threats, but this would mean that a perfect storm of diseases exploiting everyone's weaknesses would have to hit a population at the same time to extinguish it. Not exactly a common phenomenon. Nature's resilience through diversity usually saves the day.
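The resilience-through-diversity argument can be illustrated with a toy simulation, with all numbers invented: each individual applies its own local proxy (a random threshold on a randomly chosen feature), and an attacker who has perfectly optimised against one global proxy fools only the fraction of the population that happens to share it.

```python
import random

# Toy model of "resilience through diversity".  Feature counts and
# thresholds are invented; each individual's local proxy checks a
# different randomly chosen feature against its own threshold.

random.seed(0)
N_FEATURES, N_INDIVIDUALS = 8, 100

def make_local_proxy():
    feature = random.randrange(N_FEATURES)
    threshold = random.random()
    return lambda x, f=feature, t=threshold: x[f] > t

population = [make_local_proxy() for _ in range(N_INDIVIDUALS)]

# An attacker who has fully optimised against one global proxy: the
# crafted input maxes out feature 0 and nothing else.
attack = [1.0] + [0.0] * (N_FEATURES - 1)

fooled = sum(proxy(attack) for proxy in population)
print(f"{fooled} of {N_INDIVIDUALS} local proxies fooled")
```

Roughly one in eight individuals checks the exploited feature, so the attack that would have fooled a single shared proxy completely leaves most of this population untouched.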
So distribution seems to be a promising avenue that deserves further examination. The use case that I usually gravitate towards is that of the spread of news. Before top-down mass media, news spread by word of mouth. Humans seem to have a gossip protocol hard-coded into their social function centre that works well for this task. To put it simply, we spread relevant information to gain status, and the receivers of this information do the same, until the information is well-known among all those that are reachable and interested. Mass media took over this function for a time, especially with regard to news of general interest, but of course at the social-circle level the old mechanisms kept working uninterrupted. With the advent of social networks, the old mechanisms are reasserting themselves, at scale. The asymmetric following model of Twitter seems well-suited for this scenario, and re-tweeting also helps broadcast news further than the original receivers. Twitter is now often seen as a primary news source, where news breaks before it makes the headlines, even if the signal-to-noise ratio is low. What is interesting in this model is that there is a human decision at each point of re-broadcast. However, by the properties of scale-free networks, it does not require too many decisions for a piece of information to spread throughout the network. Users that spread false information or 'spam' are usually isolated from the graph, and therefore end up with little or no influence at all (with a caveat for socially advantageous falsities). Bear in mind that Twitter is not built or optimised around this model, so these effects appear only approximately. There are a number of changes that should make these effects much more pronounced, but this is a topic for another post. What should be noted is that, contrary to popular belief, this hybrid man-machine system of news transmission scales pretty well.
The involvement of human judgment at multiple steps does not make the system appreciably slower, since nobody is on the critical path and nobody has the responsibility of filtering all the content. A few decisions here and there are enough to keep the system working well.
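A toy diffusion model makes the point that local re-broadcast decisions suffice, with no central filter on the critical path. The topology and probabilities here are invented; the graph is a crude preferential-attachment sketch of an asymmetric, Twitter-like follow structure.

```python
import random

# Toy sketch of horizontal news diffusion.  Topology and probabilities
# are invented; the point is only that independent local re-broadcast
# decisions spread an item widely, with no central filter.

random.seed(1)
N = 500

# Crude preferential attachment: each new account follows two existing
# ones, popular accounts being more likely targets.
followers = {i: [] for i in range(N)}   # followers[j] = who follows j
attach = [0]
for i in range(1, N):
    for _ in range(2):
        j = random.choice(attach)
        if i not in followers[j]:
            followers[j].append(i)
            attach.append(j)            # popularity begets popularity
    attach.append(i)

REBROADCAST_P = 0.5                     # chance a hearer passes it on
heard, frontier, decisions = {0}, [0], 0
while frontier:
    nxt = []
    for node in frontier:
        for f in followers[node]:
            if f not in heard:
                heard.add(f)
                decisions += 1          # one human decision per hearer
                if random.random() < REBROADCAST_P:
                    nxt.append(f)
    frontier = nxt

print(f"reached {len(heard)} of {N} accounts after {decisions} local decisions")
```

Each account decides only about items it actually hears, yet the item propagates far beyond the original poster's followers; that is the "nobody filters everything" property in miniature.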
Transferring this social-graph approach to search is less than straightforward. Again, looking at human societies pre-search engine, people would develop a reputation for knowledge in a specific field. Questions in their field of expertise find their way to them sooner or later. If an expert did not have an answer but another did, a shift in subjective, implicit reputation would occur, which, if repeated on multiple occasions, would result in a shift of the relative trust that the community places on the two experts. Applying this to internet search does not seem immediately feasible, but search engines like Aardvark and Q&A sites like StackOverflow and Yahoo! Answers seem to be heading in such a direction. Wikipedia, by having a network of editors trusted in certain fields, also exhibits similar characteristics. The answer isn't as obvious in search as it is in news, and if algorithmic search engines disappeared tomorrow the world wouldn't have a plan B immediately at hand, but the outline of an alternative is beginning to appear.
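The implicit reputation shift can be sketched as a routing rule: questions go to experts in descending order of trust, and trust shifts whenever a higher-trust expert comes up empty while a rival answers. The names, topics, and fixed-step update rule are all invented for illustration.

```python
# Minimal sketch of implicit reputation shifting among experts.  The
# names, topics, and fixed-step update rule are invented.

trust = {"alice": 0.8, "bob": 0.5}
expertise = {"alice": {"unix", "networks"}, "bob": {"unix", "databases"}}
SHIFT = 0.1

def ask(topic):
    """Route a question through experts in descending trust order."""
    for expert in sorted(trust, key=trust.get, reverse=True):
        if topic in expertise[expert]:
            trust[expert] += SHIFT      # credit for answering
            return expert
        trust[expert] -= SHIFT          # quiet penalty for not knowing
    return None

print(ask("databases"))   # bob: alice is asked first but has no answer
print(trust)              # alice's lead over bob has narrowed
```

Repeated over many questions, this rule re-sorts the community's routing order, which is exactly the slow, distributed proxy-correction the paragraph above describes.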
To conclude, the progression I see in both news and search is this:
- node-to-node (gossip-style)
- top-down, human-curated
- top-down, algorithmic
- node-to-node (distributed)
In news this is loosely instantiated as: gossip -> newspapers -> social news sites -> Twitter-like horizontal diffusion, and in search the equivalents are: community experts -> libraries / human-curated online directories -> algorithmic search engines -> social(?) search / Q&A sites. There seems to be a pattern where things come full circle from horizontal to vertical and back to horizontal, where the intermediate vertical step is a stopgap that allows our natural mechanisms to adapt to the new technology, scale, and vastness of information, and ultimately live up to the challenge. There may be some optimism involved in my assessment, as the events described have not really taken place yet. The application of this pattern to other instances of OBP such as governments and large organizations is not something I feel I could undertake for now, but I do suspect that OBP can provide a general, if not conclusive, argument for distribution of decision making as the ultimately stable state of affairs, without considering smarter-than-human AGI/FAI singletons.
Update: Apparently, Digg's going horizontal. This should be interesting.
Update2: I had mixed up vertical and horizontal. Another form of left-right dyslexia?