This post is part of the sequence Against Muddling Through.
One of the pillars of my pessimism around AI development is that it seems much harder to make a mind good than to make it smart.
One reason for this is that "optimizing for human values" is more complicated than "being an optimizer." I'll explore this and a few similar claims in this post, and stronger claims in subsequent posts.
The orthogonality thesis is the starting point: a mind can pursue virtually any goal. But by itself, the thesis says little about which goals are more or less likely. For that, I think we want to consider algorithmic complexity: the length of the shortest possible description of some chunk of data.
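Formally, this is the standard definition of Kolmogorov complexity, with $U$ a fixed universal machine and $|p|$ the length of a program $p$:

$$K_U(x) \;=\; \min\{\,|p| : U(p) = x\,\}$$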
Concepts with low algorithmic complexity are compressible; they can be generated by a short computer program, the way ABABAB… can be shortened into “AB repeating X times.” Concepts with high algorithmic complexity, like a random jumble of letters such as HCHPWTRSCK, can’t be compressed much further.
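A quick illustration of the same point in code, using compressed length as a computable stand-in for algorithmic complexity (Kolmogorov complexity itself is uncomputable, and zlib is only a crude proxy, so the exact byte counts are beside the point):

```python
import random
import string
import zlib

# Compressed length as a rough, computable stand-in for algorithmic complexity.
repetitive = "AB" * 50                                            # "AB repeating 50 times"
jumbled = "".join(random.choices(string.ascii_uppercase, k=100))  # a 100-letter random jumble

print(len(zlib.compress(repetitive.encode())))  # small: the repeating pattern compresses well
print(len(zlib.compress(jumbled.encode())))     # much larger: little structure to exploit
```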
When you’re trying to find a specific concept — or instantiate that concept into, say, an artificial intelligence — the difficulty of the task seems likely to scale with the concept’s algorithmic complexity.[1] It takes more work — more optimization pressure — to instantiate a specific complicated thing than a specific simple thing.[2]
Intelligence — specifically, the kind of intelligence that involves steering the future towards some desired outcome, which I sometimes call competence to disambiguate — seems to be a relatively simple thing at its heart. It’s plausible that any given real-world implementation would have a lot of moving parts, but the most general form “search the space of possible actions, find the action(s) that maximize some function X in expectation, take those actions” is not hard to formally specify.
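To make that concrete, here is a minimal sketch of that general form. The function names and the toy payoffs are invented for illustration, and a real agent would need a vastly better world model and search procedure, but the skeleton itself fits in a dozen lines:

```python
import random

def choose_action(actions, simulate, utility, n_samples=1000):
    # Generic form: estimate the expected value of some function X for each
    # candidate action by sampling outcomes, then take the action that maximizes it.
    def expected_utility(action):
        return sum(utility(simulate(action)) for _ in range(n_samples)) / n_samples
    return max(actions, key=expected_utility)

# Toy usage with made-up numbers: a noisy payoff, where X is the payoff itself.
actions = ["hold", "hedge", "bet"]
mean_payoff = {"hold": 0.0, "hedge": 0.5, "bet": 1.0}
simulate = lambda a: mean_payoff[a] + random.gauss(0, 1)  # stochastic outcome of an action
print(choose_action(actions, simulate, utility=lambda outcome: outcome))  # almost always "bet"
```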
It might take a lot of resources to implement this general thing effectively in the messy real world. Human brains are among the most complex objects on the planet, and large language models needed billions of parameters before they started looking kind of smart.
But the algorithm that underlies intelligence, the algorithm that human brains implement only inconsistently? It seems pretty small and simple, implementable in lots of different ways.
By contrast, values are definitely complicated. I know mine are. I am an enormous messy kludge of overlapping and often conflicting drives, and while those drives can be simplified somewhat by an effort of gradual reflection, I would be utterly shocked if my values could be squeezed into a mere handful of pages.[3]
“Good” seems like a smaller target than “smart.” But perhaps we’ll hit it anyway? The next post explores the part we really care about: the relative difficulty of hitting “good” and “smart” in practice.
This post isn't especially cruxy for me. The much more relevant question is "how hard is it to achieve goodness in practice?" But it seemed worth highlighting anyway as an important background assumption.
Some may think that algorithmic complexity doesn’t constrain machine learning in the way I suspect it does, or that competence-as-expressed-in-reality is actually less compressible than human values, or otherwise object to the framing. I can’t easily predict these objections, so I’ll try to address them as they come.
Some may think “good” is too vaguely described, or that it’s better to aim for obedience, corrigibility, or some other trait. I’ll have this argument if needed, but I expect the headline claim holds for all such targets that would be worth hitting.
To those who’d ask “aligned to whom?”: I have a vision for what the ideal should look like, and it looks like “aligned to everyone.” For the headline claim of this post, though, I don’t particularly think it matters whom. It’s a tiny target regardless.
[1] At least, as I understand it.
[2] You can probably get an arbitrarily complicated thing if you don’t care what it is, just that it is complicated. But it’s hard to land on one particular complicated thing in a space with many degrees of freedom.
[3] I’m not even sure that my entire brain is an upper bound on this complexity; there are so many edge cases and tradeoffs in the things I care about that it might take a much larger mind than mine to fully comprehend them.