Critiquing "What failure looks like"
I find myself somewhat confused as to why I should find Part I of "What failure looks like" (hereafter "WFLL1", like the pastry) likely enough to be worth worrying about. I have 3 basic objections, although I don't claim that any of them is decisive.

First, let me summarize WFLL1 as I understand it: in general, it's easier to optimize easy-to-measure goals than hard-to-measure ones, but this disparity is much larger for ML models than for humans and human-made institutions. As special-purpose AI becomes more powerful, this will lead to a form of differential progress in which easy-to-measure goals get optimized well past the point where they stop correlating with what we actually want. (See also: this critique, although I agree with the existing rebuttals to it.)

Objection 1: Historical precedent

In the late 1940s, George Dantzig invented the simplex algorithm, a practically efficient method for solving linear optimization problems. At the same time, the first modern computers were becoming available, and as a mathematician in the US military he had access to them. For Dantzig and his contemporaries, a wide class of previously intractable problems suddenly became solvable, and they did use the new methods to great effect, playing a major part in developing the field of operations research.

With the new tools in hand, Dantzig also decided to use simplex to optimize his diet. After carefully poring over prior work, and putting in considerable effort to obtain accurate data and correctly specify the coefficients, Dantzig was ready, telling his wife:

> whatever the [IBM] 701 says that's what I want you to feed me each day starting with supper tonight.

The result included 500 gallons of vinegar. After delisting vinegar as a food, the next round came back with 200 bouillon cubes per day. There were several more iterations, none of which worked, and in the end Dantzig simply went with a "common-sense" diet.

The point I am making is: whenever we create new methods for solving p