The Wrights invented the airplane using an empirical, trial-and-error approach. They had to learn from experience. They couldn’t have solved the control problem without actually building and testing a plane. There was no theory sufficient to guide them, and what theory did exist was often wrong. (In fact, the Wrights had to throw out the published tables of aerodynamic data, and make their own measurements, for which they designed and built their own wind tunnel.)
This part in particular is where I think there are a whole bunch of useful lessons for alignment to draw from the Wright brothers.
First things first: "They couldn’t have solved the control problem without actually building and testing a plane" is... kinda technically true, but misleading. What makes the Wright brothers such an interesting case study is that they had to solve the large majority of the problem (i.e. "get the large majority of the bits of optimization/information") without building an airplane, precisely because it was very dangerous to test a plane without the ability to control it. Furthermore, they had to do it without reliable theory. And the Wright brothers are an excellent real-world case study in creating a successful design mostly without relying on either robust theory or trial-and-error on the airplane itself.
Instead of just iterating on an airplane, the Wright brothers relied on all sorts of models. They built kites. They studied birds. They built a wind tunnel. They tested pieces in isolation - e.g. collecting their own aerodynamic data. All that allowed them to figure out how to control an airplane while needing relatively few dangerous attempts to directly control the airplane. That's where there are lots of potentially useful analogies to mine for AI. What would be the equivalent of a wind tunnel, for AI control? Or the equivalent of a kite? How did the Wright brothers get their bits of information other than direct tests of airplanes, and what would analogues of those methods look like?
Major problem with that particular name: in philosophy, "intention" means something completely different from the standard use. From SEP:
In philosophy, intentionality is the power of minds and mental states to be about, to represent, or to stand for, things, properties and states of affairs. To say of an individual’s mental states that they have intentionality is to say that they are mental representations or that they have contents.
So e.g. Dennett's "intentional stance" does not mean what you probably thought it did, if you've heard of it! (I personally learned of this just recently; thank you, Steve Peterson.)
Y'know, I didn't realize until reading this that I hadn't seen a short post spelling it out before. The argument was just sort of assumed background in a lot of conversations. Good job noticing and spelling it out.
Scaling up the data wasn't algorithmic progress. Knowing that they needed to scale up the data was algorithmic progress.
That would, and in general restrictions aimed at increasing price/reducing supply could work, though that doesn't describe most GPU restriction proposals I've heard.
Note that this probably doesn't change the story much for GPU restrictions, though. For purposes of software improvements, one needs compute for lots of relatively small runs rather than one relatively big run, and lots of relatively small runs is exactly what GPU restrictions (as typically envisioned) would not block.
I expect words are usually pointers to natural abstractions, so that part isn't the main issue - e.g. when we look at how natural language fails all the time in real-world coordination problems, the issue usually isn't that two people have different ideas of what "tree" means. (That kind of failure does sometimes happen, but it's unusual enough to be funny/notable.) The much more common failure mode is that a person is unable to clearly express what they want - e.g. a client failing to communicate what they want to a seller. That sort of thing is one reason why I'm highly uncertain about the extent to which human values (or other variations of "what humans want") are a natural abstraction.
Consider two claims:
These two claims should probably not both be true! If any system can be modeled as maximizing a utility function, and it is possible to build a corrigible system, then naively the corrigible system can be modeled as maximizing a utility function.
I expect that many people's intuitive mental models around utility maximization boil down to "boo utility maximizer models", and they would therefore intuitively expect both the above claims to be true at first glance. But on examination, the probable incompatibility is fairly obvious, so the two claims might make a useful test to notice when one is relying on yay/boo reasoning about utilities in an incoherent way.