My impression is that they tried for both corrigibility, and deontological rules which are directly opposed to corrigibility. So I see it as a fairly simple bug in Anthropic's strategy.
A significant part of why I continue to devote attention to my health is that it may be more important than usual over the next decade for my cognitive abilities to be near peak levels.
It sounds like a real phenomenon, but I have trouble imagining a scenario where it's important. I expect demand for human labor to decline faster than the number of people with investment income rises. That probably means declining wages for the median person, although maybe rising wages for a small number of people with unusual skills.
Business models will change significantly. I speculated here about one likely change. Robotics-related business models will probably become important by 2030.
That's not much of a proxy. I'm relying on my subjective impressions from many reports. A more precise phrasing of my claim is that I've seen numerous reports of what I consider to be open contempt for the rule of law among elected officials, but judges in newsworthy cases have almost always looked like they're trying to take the law seriously.
Some of my impressions come from a private mailing list where conservative lawyers have been expressing dismay at the Trump administration's lack of interest in whether its actions could plausibly be defended in court.
Did you know that Deng approved the 1989 crackdown on Tiananmen protesters?
Yes, I'm aware that he did a few things that I consider evil. Wanting to keep his party in power is common enough among politicians that it's not much evidence of psychopathy. His overall attitude toward independent thought was at least no worse than average for a political leader.
A lot of what I have in mind is that Deng allowed more freedom than can readily be explained by his self-interest, and that Xi seems more Maoist than Deng.
But I wouldn't be surprised if you have better information about their personalities than I do.
A darker interpretation is that the (subconscious, but more real or substantial in some sense) goals of nearly all humans are to gain power and status, and utopian ideologies are merely a tool for achieving this.
The ideologies are partly a tool for that, but they have more effects on the wielder than a mere tool does. My biggest piece of evidence for that is the mostly peaceful collapse of the Soviet Union. I was quite surprised that the leaders didn't use more force to suppress dissent.
I am also somewhat dissatisfied with the basin of attraction metaphor, but for a slightly different reason.
I am concerned that an AI that functions as mostly corrigible in environments resembling the training environment will be less corrigible when the environment changes significantly.
I'm guessing that a better metaphor would be based on evolutionary pressures. That would emphasize both the uncertainties about any given change, and the sensitivity to out-of-distribution environments.
Maybe a metaphor about how cats are sometimes selected for being friendly to humans? Or the forces that led to the peacock's tail?
Corrigibility would clearly be a nice property
Thinking of it as "a property" will mislead you about how Max's strategy works. It needs to become the AI's only top-level goal in order to work as Max imagines.
It sure looks like AI growers know how to instill some goals in AIs. I'm confused as to why you think they don't. Maybe you're missing the part where the shards that want corrigibility are working to overcome any conflicting shards?
I find it quite realistic that the AI growers would believe at the end of Red Heart that they probably had succeeded (I'll guess that they ended up 80% confident?). That doesn't tell us what probability we should put on it. I'm sure that in that situation Eliezer would still believe that the AI is likely not corrigible.
I don’t know what year the novel is actually set in,
It's an alternate timeline where AI capabilities have progressed faster than ours, likely by a couple of years.
Note this Manifold market on when the audiobook is released.
The belief that they can do both is very fixable. The solution that I recommend is to prioritize corrigibility.