I don't think that the example of kings losing their powers really supports your thesis here. That wasn't a seamless, subtle process of power slipping away. There was a lot of bloodshed and threat of bloodshed involved.
King Charles I tried to exercise his powers as a real king and go against Parliament, but the people rebelled and he lost his head. After that, his son managed to restore the monarchy, though he had to agree to further restrictions on his powers. Then James II tried to go against Parliament again, and got overthrown and replaced by another guy who agreed to relinquish the majority of royal powers. After that, the king still had some limited say, but when he tried to impose unpopular taxes in America, the colonies rebelled and gained independence through a violent revolution. Then next door to England, Louis XVI tried to go against the will of his Assembly, and lost his head. After all this, the British Parliament started to politely ask their kings to relinquish the remainder of their powers, and they wisely agreed, so their family could keep their nominal rulership, their nice castle, and most importantly, their heads.
I think the analogous situation would be AIs violently taking over some countries, and after that, the other countries bloodlessly surrendering to their AIs. I think this is much closer to the traditional picture of AI takeover than to the picture you are painting in Gradual Disempowerment.
Unfortunately, I don't think that "this is how science works" is really true. Science focuses on having a simple description of the world, while Solomonoff induction focuses on having a simple description of the world plus your place in it.
This leads to some really weird consequences, which people sometimes refer to as Solomonoff induction being malign.
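To spell out the contrast a bit more formally (this is my own rough gloss, not anyone's canonical definitions):

```latex
% "Science" (roughly): among hypotheses H consistent with the data,
% prefer the one giving the shortest description of the world itself:
\hat{H} \;\approx\; \arg\min_{H \,\text{consistent with data}} K(H)

% Solomonoff induction: weight every program p whose output extends your
% observation sequence x_{1:n} by its length:
M(x_{1:n}) \;=\; \sum_{p \,:\, U(p)\ \text{extends}\ x_{1:n}} 2^{-\ell(p)}
% Here p must encode both a world and the procedure for reading your
% particular observation stream off of it, i.e. your place in that world.
```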
Even more dramatically, it looks like Haiti's GDP per capita is still lower today than it was during the era of slavery in the 1770s. This of course doesn't mean that Haitians were better off back then than they are now (Haitian slavery was famously brutal, I think significantly worse even than US slavery). Still, it's an interesting data point for how efficient slavery-based cash crop production was in some places.
(My main source is this paper on Haitian economic history, plus looking at historical franc-to-USD conversion rates and inflation calculators.)
Counter-evidence: I first read and watched the play in Hungarian translation, where there is no confusion between "wherefore" and "why". It still never occurred to me that the line doesn't make sense, and I've never heard anyone else in Hungary point this out either.
I also think you are being too literal-minded in your interpretation of the line; I always understood it to mean "oh Romeo, why are you who you are?", which makes perfect sense.
Interesting result.
Did you experiment with how the result depends on the base rate of hacking?
Suppose you start with a model that you have already finetuned to reward-hack 90% of the time. Then you do recontextualized training on non-hack examples. I assume this would decrease reward-hacking: it doesn't make much of a difference that the training teaches the model to consider hacking (it already does), so the main effect is that it trains on examples where the model eventually doesn't hack. In your case, by contrast, the base rate was low enough that teaching the model to consider reward-hacking was the stronger effect.
Do you agree with this assessment? Do you have a guess at the base rate of hacking above which recontextualized training on non-hack examples starts to decrease reward-hacking?
Another question:
What do you think would happen if you continued your recontextualized training on non-hack examples for much longer? I would expect that in the long run, the rate of reward-hacking would go down, eventually dropping below the original base rate, maybe approaching zero in the limit. Do you agree with this guess? Did you test how the length of training affects the outcome?
Sorry, I made a typo: the Fan Hui match was in 2015, I have no idea why I wrote 2021.
I think Scott's description is accurate, though it leaves out the years from 2011 to 2015 when AIs were around the level of the strongest amateurs, which makes the progress look more discontinuous than it was.
Separately from my specific comment on Go, I think that "people are misinformed in one direction, so I will say something exaggerated and false in the other direction to make them snap out of their misconception" is not a great strategy. They might notice that the thing you said is not true, ask a question about it, and then you need to backtrack, and they get confirmation of their belief that these AI people always exaggerate everything.
I once saw an AI safety advocate talking to a skeptical person who was under the impression that AIs still can't piece together three logical steps. The advocate at some point said the usual line about the newest AIs having reached "PhD-level capabilities", the audience immediately called them out on that, and then they had to apologize and clarify that of course they only meant PhD-level on specific narrow tests, and they didn't get to correct any of the audience's misconceptions.
Also, regardless of strategic considerations, I think saying false things is bad.
One could argue that Go engines instantly went from "can't serve as good opponents to train against" to "vastly outstripping the ability of any human to serve as a training opponent" in a similar way.
This is still not true. In 2011, Zen was already 5 (amateur) dan, which is better than the vast majority of hobbyists, and I've known people who used Zen as a training opponent. I think by 2014 it was already useful as a training partner even for people who were preparing for their professional certification.
And even at the professional level, 'instantly' is still an exaggeration. AlphaGo defeated the professional Go player and European champion Fan Hui in October 2015, and Lee Sedol still said at the time that he could defeat AlphaGo, and I think he was probably right. It took another half a year, until March 2016, for Lee Sedol to play against AlphaGo. AlphaGo won, but still didn't vastly outstrip human ability: Lee Sedol won one of the five games.
(Also, this is nitpicking, but if you restrict the question to a computer serving as a training partner in Go, then I'm not sure that even now computers vastly outstrip human ability. There are advantages to training against the best Go programs, but I don't think they are that vast; most of the variance is still in how the student is doing, and I'm pretty sure that professional players still regularly train against other humans too.)
In the Euripides play, I think the moral message is fairly clear: sacrificing an innocent for the greater good (as Agamemnon wants to do) is a vile, cowardly act, but sacrificing yourself for the greater good (as Iphigenia volunteers to do in the end) is heroic.
I think this is a quite good and maybe nontrivial moral message, but I wouldn't classify a play written by a professional playwright in highly civilized Athens as a myth. And I don't know if we have good records of what the older, folk version of the myth said, or whether it had a positive message.
On the other hand, there is another interesting factor in kings losing power that might be more related to what you are talking about (though I don't think this factor is as important as the threat of revolutions discussed in the previous comment).
My understanding is that part of the story of why kings lost their power is that the majority of people were commoners, so the best writers, artists and philosophers were commoners (or at least not the highest aristocrats); the kings and aristocrats read their work, and these writers often argued for more power to the people. The kings and aristocrats sometimes got sincerely convinced, and agreed to relinquish some powers even when it was not absolutely necessary for preempting revolutions.
I think this is somewhat analogous to the story of cultural AI dominance in Gradual Disempowerment: all the most engaging content creators are AIs, humans consume their content, the AIs argue for giving power to AIs, and the humans get convinced.
I agree this is a real danger, but I think there might be an important difference between the case of kings and the AI future.
The court of Louis XVI read Voltaire, but I think if there had been someone as witty as Voltaire who also flattered the aristocracy, they would plausibly have liked him more. But the pool of witty people was limited, and Voltaire was far wittier than any of the few pro-aristocrat humorists, so the royal court put up with Voltaire's hostile opinions.
On the other hand, in a post-AGI future, I think it's plausible that with a small fraction of the resources you can get close to saturating human engagement. Suppose pro-human groups fund 1% of the AIs generating content, and pro-AI groups fund 99%. (For the sake of argument, let's grant the dubious assumption that the majority of the economy is controlled by AIs.) I think it's still plausible that the two groups can generate approximately equally engaging content, and if humans find pro-human content more appealing, then that just wins out.
Also, I'm kind of an idealist, and I think part of the reason Voltaire was successful is that he was just right about a lot of things: parliamentary government really does lead to better outcomes than absolute monarchy from the perspective of a more-or-less shared human morality. So I have some hope (though definitely not certainty) that AI content creators competing in a free marketplace of ideas will only convince humanity to voluntarily relinquish power if relinquishing power is actually the right choice.