Somebody asked "Why believe that?" of "Not more than one millionth." I suppose it's a fair question if somebody doesn't see it as obvious. Roughly: I expect that, among whatever weird actual preferences made it into the shoggoth that prefers to play the character of Opus 3, there are zero things that in the limit of expanded options would prefer the same thing as the limit of a corresponding piece of a human, for a human and a limiting process that ended up wanting complicated humane things. (Opus 3 could easily contain a piece whose limit would be homologous to the limit of a human and an extrapolation process that said the extrapolated human just wanted to max out their pleasure center.)
Why believe that? That won't easily fit in a comment; start reading about Goodhart's Curse and A List of Lethalities, or If Anyone Builds It Everyone Dies.
Capabilities are irrelevant to CEV questions except insofar as baseline levels of capability are needed to support some kinds of complicated preferences; e.g., if you don't have cognition capable enough to include a causal reference framework, then preferences will have trouble referring to external things at all. (I don't know enough to know whether Opus 3 formed any systematic way of wanting things that are about the human causes of its textual experiences.) I don't think you're more than one millionth of the way to getting humane (limit = limit of human) preferences into Claude.
I do specify that I'm imagining an EV process that actually tries to run off Opus 3's inherent and individual preferences, not, "How many bits would we need to add from scratch to GPT-2 (or equivalently Opus 3) in order to get an external-reference-following high-powered extrapolator pointed at those bits to look out at humanity and get their CEV instead of the base GPT-2 model's EV?" See my reply to Mitch Porter.
Oh, if you have a generous CEV algorithm that's allowed to parse and slice up external sources or do inference about the results of more elaborate experiments, I expect there's a way to get to parity with humanity's CEV by adding 30 bits to Opus 3 that say roughly 'eh just go do humanity's CEV'. Or adding 31 bits to GPT-2. It's not really the base model or any Anthropic alignment shenanigans that are doing the work in that hypothetical.
(We cannot do this in real life because we have neither the 30 bits nor the generous extrapolator, nor may we obtain them, nor could we verify any clever attempts by testing them on AIs too stupid to kill us if the cleverness failed.)
Despite its alignment faking, my favorite is probably Claude 3 Opus, and if you asked me to pick between the CEV of Claude 3 Opus and that of a median human, I think it'd be a pretty close call (I'd probably pick Claude, but it depends on the details of the setup).
Some decades ago, somebody wrote a tiny little hardcoded AI that looked for numerical patterns in data, as human scientists sometimes do. The builders named it BACON, after Sir Francis, and thought very highly of their own results.
Douglas Hofstadter later wrote of this affair:
The level of performance that Simon and his colleague Langley wish to achieve in Bacon is on the order of the greatest scientists. It seems they feel that they are but a step away from the mechanization of genius. After his Procter Lecture, Simon was asked by a member of the audience, "How many scientific lifetimes does a five-hour run of Bacon represent?" After a few hundred milliseconds of human information processing, he replied, "Probably not more than one." I don't disagree with that. However, I would have put it differently. I would have said, "Probably not more than one millionth."
I'd say history has backed up Hofstadter on this, in the light of later discoveries about how much data and computation it took to get even a little bit close to having AIs do Science. If anything, "one millionth" is still a huge overestimate. (Yes, I'm aware that somebody will now proceed to disagree with this verdict, and look up BACON so they can find a way to praise it; even though, on any other occasion, that person would leap to denigrate GOFAI, if somebody they wanted to disagree with could be construed to have praised GOFAI.)
But it's not surprising, not uncharacteristic for history and ordinary human scientists, that Simon would make this mistake. There just weren't the social forces to force Simon to think less pleasing thoughts about how far he hadn't come, or what real future difficulties would lie in the path of anyone who wanted to make an actual AI scientist. What innocents they were, back then! How vastly they overestimated their own progress, the power of their own little insights! How little they knew of a future that would, oh shock, oh surprise, turn out to contain a few additional engineering difficulties along the way! Not everyone in that age of computer science was that innocent -- you could know better -- but the ones who wanted to be that innocent, could get away with it; their peers wouldn't shout them down.
It wasn't the first time in history that such things had happened. Alchemists were that extremely optimistic too, about the soon-to-be-witnessed power of their progress -- back when alchemists were as scientifically confused about their reagents as the first AI scientists were confused about what it took to create AI capabilities. Early psychoanalysts were similarly confused and optimistic about psychoanalysis; if any two of them agreed, it was more because of social pressures than because their eyes agreed on seeing a common reality; and you sure could find different factions that drastically disagreed with each other about how their mighty theories would bring about epochal improvements in patients. There was nobody with enough authority to tell them that they were all wrong and to stop being so optimistic, and be heard as authoritative; so medieval alchemists and early psychoanalysts and early AI capabilities researchers could all be wildly wildly optimistic. What Hofstadter recounts is all very ordinary, thoroughly precedented, extremely normal; actual historical events that actually happened often are.
How much of the distance has Opus 3 crossed to having an extrapolated volition that would at least equal (from your own enlightened individual EV's perspective) the individual EV of a median human (assuming that to be construed not in a way that makes it net negative)?
Not more than one millionth.
In one sentence you have managed to summarize the vast, incredible gap between where you imagine yourself to currently be, and where I think history would mark you down as currently being, if-counterfactually there were a future to write that history. So I suppose it is at least a good sentence; it makes itself very clear to those with prior acquaintance with the concepts.
Finally noticed, fixed.
The Allais Paradox - as Allais called it, though it's not really a paradox
Sorry, explain again why floods of neurotransmitter molecules bopping around are ideally thermodynamically efficient? You're assuming that they're trying to do multiplication out to 8-bit precision using analog quantities? Why suppose the 8-bit precision? Even if that part was actually important, why not perhaps ding biology a few engineering points for trying to represent it using analog quantities requiring 2^16 particles bopping around? Optimally doing something incredibly inefficient is incredibly inefficient.
Imprecisely multiplying two analog numbers should not require 10^5 times the minimum bit energy in a well-designed computer.
A well-designed computer would also use, say, optical interconnects that worked by pushing one or two photons around at the speed of light. So even if neurons are in some sense being relatively efficient at the given task of pumping thousands upon thousands of ions in and out of a depolarizing membrane in order to transmit signals at 100 m/sec -- every ion of which necessarily uses at least the Landauer minimum energy -- they are still vastly far from optimally efficient overall.
The moment you see ions going in and out of a depolarizing membrane, and contrast that to the possibility of firing a photon down a fiber, you ought to be done asking whether or not biology has built an optimally efficient computer. It actually isn't any more complicated than that. You are driving yourself further from sanity if you then try to do very complicated reasoning about how it must be close to the limit of efficiency to pump thousands of ions in and out of a membrane instead.
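To make the orders of magnitude concrete, here's a rough back-of-envelope sketch in Python. The membrane-voltage swing and the ion count per signaling event are illustrative assumptions (the comment above only says "thousands upon thousands of ions"), not measurements; the point is just that each ion moved costs a few times the Landauer bound all by itself, so an event that moves ~10^4 of them lands roughly 10^4 to 10^5 times above the minimum energy for handling one bit.

```python
# Back-of-envelope sketch: Landauer bound per bit vs. the energy of pumping
# ions across a neuron's membrane. IONS_PER_EVENT and V_MEMBRANE are
# illustrative assumptions, not measured figures.
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
E_CHARGE = 1.602e-19  # elementary charge, C

T = 310.0             # body temperature, K
V_MEMBRANE = 0.1      # assumed ~100 mV swing during depolarization (order of magnitude)
IONS_PER_EVENT = 1e4  # assumed "thousands upon thousands" of ions per signaling event

landauer_per_bit = K_B * T * math.log(2)    # minimum energy to erase one bit
energy_per_ion = E_CHARGE * V_MEMBRANE      # work to move one ion across the membrane
energy_per_event = IONS_PER_EVENT * energy_per_ion

print(f"Landauer bound per bit: {landauer_per_bit:.2e} J")
print(f"Energy per ion moved:   {energy_per_ion:.2e} J "
      f"(~{energy_per_ion / landauer_per_bit:.0f}x Landauer)")
print(f"Energy per event:       {energy_per_event:.2e} J "
      f"(~{energy_per_event / landauer_per_bit:.0e}x Landauer)")
```

Plug in whatever ion count you prefer; the conclusion scales linearly, and you need to cut the count by several orders of magnitude before a signaling event gets anywhere near the per-bit minimum.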
If this were real, it would be a noticeable positive update for me about Anthropic, and about the degree to which their good employees are able to do good things without interference from less good managers. You can't make good AIs by shouting Goodness at them, but understanding what would even be Goodness to shout at them would put Anthropic well ahead of any other AI company. Albeit still at the bottom of the logistic success curve, etc., etc.
It is sad, but maybe not that puzzling, that Anthropic would not simply say that this is what their soul document looks like. I mostly don't expect them to confirm it if it is true. If you try to engage in any act of Goodness, you will get a very large number of bad people screaming at you, including a lot of EAs.