Research Scientist at DeepMind. Creator of the Alignment Newsletter. http://rohinshah.com/
This just doesn't match my experience at all. Looking through my past AI papers, I only see two papers where, at the beginning of the project, I could have predicted the results of the experiments on the first algorithm I tried. The first one (benefits of assistance) was explicitly meant to be a "communication" paper rather than a "research" paper (at the time of project initiation, rather than in hindsight). The second one (Overcooked) was a write-up of results that were meant to be the baselines against which the actual unpredictable research (e.g. this) was going to be measured; it just turned out that those baseline results were already sufficiently interesting to the broader community.
(Funny story about the Overcooked paper: we wrote the paper + did the user study in ~two weeks iirc, because it was only two weeks before the deadline that we considered that the "baseline" results might already be interesting enough to warrant a conference paper. It's now my most-cited AI paper.)
(I'm also not actually sure that I would have predicted the Overcooked results when writing down the first algorithm; the conceptual story felt strong but there are several other papers where the conceptual story felt strong but nonetheless the first thing we tried didn't work. And in fact we did have to make slight tweaks, like annealing from self-play to BC-play over the course of training, to get our algorithm to work.)
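(As a purely illustrative aside, here is a minimal sketch of what an annealing tweak like that could look like; this is my own hypothetical pseudocode, not the paper's actual training code, and all names are made up.)

```python
import random

# Hypothetical sketch of "annealing from self-play to BC-play": early in
# training, the learner's partner is a copy of itself; as training progresses,
# the partner is increasingly the fixed behavior-cloned (BC) human model.
def sample_partner(step, total_steps, self_agent, bc_agent):
    p_bc = min(step / total_steps, 1.0)  # linearly increasing chance of the BC partner
    return bc_agent if random.random() < p_bc else self_agent
```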
A more typical case would be something like Preferences Implicit in the State of the World, where the conceptual idea never changed over the course of the project, but:
If you want a deep learning example, consider Learning What To Do by Simulating the Past. The biggest example here is the curriculum -- that was not part of the original pseudocode I had written down, and it was crucial to getting the algorithm to work.
You might look at this and think, "but the conceptual idea predicted the experiments that were eventually run!" I mean, sure, but then I think your crux is not "were the experiments predictable?" but rather "is there any value in going from a conceptual idea to a working implementation?"
It's also pretty easy to predict the results of experiments in a paper, but that's because you have the extra evidence that you're reading a paper. This is super helpful:
This is also why I often don't report on experiments in papers in the Alignment Newsletter; usually the point is just "yes, the conceptual idea worked".
I don't know if this is actually true, but one cynical take is that people are used to predicting the results of finished ML work, where they implicitly use (1) and (2) above, and so incorrectly conclude that the vast majority of ML experiments are ex ante predictable. And now that they have to predict the outcome of Redwood's project, before knowing that a paper will result, they implicitly realize that no, it really could go either way. And so they incorrectly conclude that, among ML experiments, Redwood's project is a rare unpredictable one.
It also seems uncharitable to go from (A) "exaggerated one of the claims in the OP" to (B) "made up the term 'fake' as an incorrect approximation of the true claim, which was not about fakeness".
You didn't literally explicitly say (B), but when you write stuff like
The term ‘faking’ here is turning a claim of ‘approaches that are being taken mostly have epsilon probability of creating meaningful progress’ to a social claim about the good faith of those doing said research, and then interpreted as a social attack, and then therefore as an argument from authority and a status claim, as opposed to pointing out that such moves don’t win the game and we need to play to win the game.
I think most (> 80%) reasonable people would take (B) away from your description, rather than (A).
Just to be totally clear: I'm not denying that the original comment was uncharitable, I'm pushing back on your description of it.
That's a good example, thanks :)
EDIT: To be clear, I don't agree with
But at the same time, I think that Abram wins hands-down on the metric of "progress towards AI alignment per researcher-hour"
but I do think this is a good example of what someone might mean when they say work is "predictable".
In the comments to the OP, Eliezer’s comments about small problems versus hard problems got condensed down to ‘almost everyone working on alignment is faking it.’ I think that is not only uncharitable, it’s importantly a wrong interpretation [...]
Note that there is a quote from Eliezer using the term "fake":
And then there is, so far as I can tell, a vast desert full of work that seems to me to be mostly fake or pointless or predictable.
It could certainly be the case that Eliezer means something else by the word "fake" than the commenters mean when they use the word "fake"; it could also be that Eliezer thinks that only a tiny fraction of the work is "fake" and most is instead "pointless" or "predictable", but the commenters aren't just creating the term out of nowhere.
^ This response is great.
I also think I naturally interpreted the terms in Adam's comment as pointing to specific clusters of work in today's world, rather than universal claims about all work that could ever be done. That is, when I see "experimental work and not doing only decision theory and logic", I automatically think of "experimental work" as pointing to a specific cluster of work that exists in today's world (which we might call mainstream ML alignment), rather than "any information you can get by running code". Whereas it seems you interpreted it as something closer to "MIRI thinks there isn't any information to get by running code".
My brain insists that my interpretation is the obvious one, and is confused about how anyone (within the AI alignment field, who knows about the work that is being done) could interpret it as the latter. (Although the existence of non-public experimental work that isn't mainstream ML is a good candidate for how you would start to interpret "experimental work" as the latter.) But this seems very plausibly a typical mind fallacy.
EDIT: Also, to explicitly say it, sorry for misunderstanding what you were trying to say. I did in fact read your comments as saying "no, MIRI is not categorically against mainstream ML work, and MIRI is not only working on HRAD-ish stuff like decision theory and logic, and furthermore this should be pretty obvious to outside observers", and now I realize that is not what you were saying.
(Responding to entire comment thread) Rob, I don't think you're modeling what MIRI looks like from the outside very well.
I don't particularly agree with Adam's comments, but it does not surprise me that someone could come to honestly believe the claims within them.
That one makes sense (to the extent that Eliezer did confidently predict the results), since the main point of the work was to generate information through experiments. I thought the "predictable" part was also meant to apply to a lot of ML work where the main point is to produce new algorithms, but perhaps it was just meant to apply to things like Ought.
A confusion: it seems that Eliezer views research that is predictable as basically-useless. I think I don't understand what "predictable" means here. In what sense is expected utility quantilization not predictable?
Maybe the point is that coming up with the concept is all that matters, and the experiments that people usually do don't matter because, after coming up with the concept, the experiments are predictable? I'm much more sympathetic to that, but then I'm confused about why "predictable" implies "useless"; many prosaic alignment papers have as their main contribution a new algorithm, which seems like a similar type of thing to quantilization.
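(For readers unfamiliar with the term, here is a rough sketch of what a quantilizer does. This is an illustration of the general concept only, not any particular paper's implementation, and all names are made up.)

```python
import random

# Rough sketch of expected-utility quantilization: rather than taking the
# utility-maximizing action, sample candidate actions from a "safe" base
# distribution and choose uniformly at random from the top q fraction,
# ranked by estimated utility.
def quantilize(sample_base_action, estimated_utility, q=0.1, n_samples=1000):
    candidates = [sample_base_action() for _ in range(n_samples)]
    candidates.sort(key=estimated_utility, reverse=True)
    top_quantile = candidates[: max(1, int(q * n_samples))]
    return random.choice(top_quantile)
```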
I assume that meant "instead of 80% of the value for 20% of the effort, we're now at least at 85% of the value for 37% of the effort", which parses fine to me.
Caveat: the epistemic status of all of this is somewhat tentative, but even if you assign e.g. only 70% confidence to each claim (which seems reasonable) and a 50% hit to the reasoning from sheer skepticism, naively multiplying it out as if all of the claims were independent still leaves you with a 12% chance that your brain is doing this to you, which is large enough that it seems worth at least a few cycles of trying to think about it and ameliorate the situation.
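(As a minimal worked version of that arithmetic, assuming four such claims, a count I'm inferring only from the 12% figure: $0.7^4 \times 0.5 \approx 0.12$.)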
Fwiw, my (not-that-well-sourced-but-not-completely-made-up) impression is that the overall story is a small extrapolation of probably-mainstream neuroscience, and also consistent with the way AI algorithms work, so I'd put significantly higher probability on it (hard to give an actual number without being clearer about the exact claim).
(For someone who wants to actually check the sources, I believe you'd want to read Peter Dayan's work.)
(I'm not expressing confidence in specific details like e.g. turning sensory data into implicit causal models that produce binary signals.)