In my early 20s I got a bad traffic citation (my fault) and had to take the train to work for a few months.
The train would pass a strange-looking old stone enclosure, and I would wonder what it was. As I learned later, this was “Duffy’s Cut”: a mass grave for Irish rail workers. Sitting in plain view, just thirty feet off the tracks: 57 bodies. The story goes that the workers were murdered in cold blood to prevent the spread of cholera to nearby towns.
In West Virginia - a state known for its violent labor conflicts - the Hawks Nest Tunnel stands out for its deadliness. While work was underway, 10 to 14 workers a day were overcome by silica dust inhalation. Within six months, 80% of the workforce had left or died.
“Phossy jaw” caused the jaws of workers who handled white phosphorus to literally rot away. It was obvious that exposure caused the disease, yet hundreds of young women were severely disfigured before anything changed.
There are dozens of such stories. The mid-19th through early 20th centuries were years of immense growth and invention in America. Perhaps it’s unreasonable to expect that kind of progress to be bloodless. But the stunning achievements hide a history, more awful than we like to admit, of treating certain American lives as expendable. And that’s without even mentioning slavery.
What can explain all this callousness?
I’ve thought about this, and my answer is simple: people don’t generally value the lives of those they consider below them. This is a dark truth of human nature. We’ve tried to suppress this impulse for the better part of a century. But contempt and disgust are far more powerful forces in history than we give them credit for, and the monstrous things they produce are so shameful that we keep them buried in the back pages of history.
For a while, we had a prospering middle class. I think this was partly because our upper classes looked at what we had built and fought for and had some faith in the reliability and virtue of the average American citizen. They agreed not to degrade us, and we agreed to work hard.
I think this détente between the classes broke in 2008. Look at the story that emerged about us after the crash: we were given a chance to own a home, we didn’t pay our mortgages, and we crashed the entire economy. What kind of disgust and contempt does that narrative generate in the people who already see themselves as above us?
So, back to cattle we go. We are addicted to fast food, addicted to social media, poorly dressed, poorly educated, easily amused and even more easily fooled. Our ethics and religion are increasingly performative and misguided. We are like petulant, screaming children. And if everything goes to plan with AI, soon we will be completely uneconomical to employ. What will we be good for, except to amuse ourselves on everyone else’s dime?
What happens to a people who are objects of scorn and disgust? It’s not the economic change I fear most. It’s what might be done with all of us.
There's some risk that either the CCP or half the voters in the US will develop LLM psychosis. I predict that risk is low enough that it shouldn't dominate our ASI strategy. I don't think I have a strong enough argument here to persuade skeptics.
I've been putting some thought into this, because my strong intuition is that something like this is an under-appreciated scenario. My basic argument is that mass brainwashing, for lack of a better word, is cheaper and less risky than other forms of ASI control. The idea is that we (humans) are extremely programmable (there are plenty of historical examples); it just requires a more sophisticated "multi-level" messaging scheme. So it's not going to look like an AI cult - more like an AI "movement" with a fanatical base.
Here is one pathway worked out in detail (I'll be generalizing it soon): https://www.lesswrong.com/posts/zvkjQen773DyqExJ8/the-memetic-cocoon-threat-model-soft-ai-takeover-in-an
Can't we lean into the spikes on the jagged frontier? It's clear that specialized models can transform many industries now. Wouldn't it be better for OpenAI to release best-in-class models in 10 or so domains (medical, science, coding, engineering, defense, etc.)? Recoup the infra investment, revisit AGI later?
Probably. But the AI must not try to stop the parent from doing so, because this would mean opposing the will of the parent.
I conceive of self-determination in terms of wills. The human will is not to be opposed, including the will to see the world in a particular way.
A self-determination-aligned AI may respond to inquiries about sacred beliefs, but may not reshape the asker’s beliefs in an instrumentalist fashion in order to pursue a goal, even if the goal is as noble as spreading truth. The difference here is one of emphasis: truth-saying versus truth-imposing.
A self-determination-aligned AI may intervene, more or less directly, to prevent deaths between warring parties, but must not attempt to “re-program” adversaries into peacefulness or impose peace by force. Again, the key difference is one of emphasis: valuing life versus control.
The AI would refuse to assist human efforts to impose their will on others, but would not oppose the will of human beings to impose their will on others. For example: AIs would prevent a massacre of the Kurds, but would not overthrow Saddam’s government.
In other words, the AI must not simply be another will amongst other wills. It will help, act and respond, but must not seek to control. The human will (including the inner will to hold onto beliefs and values) is to be considered inviolate, except in the very narrow cases where limited and direct action preserves a handful of universal values like preventing unneeded suffering.
Re: your heretic example. If it is possible to directly prevent the murder of the heretic, and doing so aligns with a nearly universal human value, then it should be done. But the AI must not prevent the murder by violating human self-determination (i.e., changing beliefs, overthrowing the local government, etc.).
In other words, the AI must maximally avoid opposing human will while enforcing a minimal set of nearly universal values.
Thus the AI’s instrumentalist actions are nearly universally considered beneficial, because they are limited to the pursuit of nearly universal values, and the escape hatch of changing human values is out of scope under self-determination alignment.
Re: instructing an AI not to tell your children God isn’t real if they ask. This represents an attempt by the parent to impose their will on the child, with the AI as proxy. Thus the AI would refuse.
Side note: standard refusals (“I cannot help you make a gun”, “I cannot help you write propaganda”) are downstream of self-determination alignment.
I agree that simple versions of superpersuasion are untenable. I recently put some serious thought into what an actual attempt at superpersuasion by a sufficiently capable agent would look like, reasoning that history is already replete with examples of successful "superpersuasion" at scale (all of the -isms).
My general conclusion is that "memetic takeover" has to be multi-layered, with different "messages" depending on the sophistication of the target, rather than a simple "Snow Crash"-style meme.
If you have an unaligned agent capable of long-term planning and with unrestricted access to social media, you might even see AIs start to build their own "social movement" using superpersuasive techniques.
I'm worried enough about scenarios like this that I developed a threat model and narrative scenario.
Are cruxes sometimes fancy lampshading?
From tvtropes.org: "Lampshade Hanging (or, more informally, "Lampshading") is the writers' trick of dealing with any element of the story that seems too dubious to take at face value, whether a very implausible plot development or a particularly blatant use of a trope, by calling attention to it and simply moving on."
What do we call lampshadey cruxes? "Cluxes" ("clumsy" + "crux")?
If MIRI's strict limits on training FLOPs come into effect, that is another mechanism that could leave us stuck in an intermediate capability regime for an extended period, although the world would look far less unipolar, because many actors - not just a few - can afford 10^24 FLOP training runs (unipolarity is probably a crux for large portions of this threat model). This does bolster the threat model in one way, however: the FLOP limit is exactly the kind of physical limitation that a persuasive AI would try to convince humans to abandon.
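For a rough sense of why a 10^24 FLOP run is within reach of many actors, here's a back-of-the-envelope sketch. All the hardware and price figures below are my own illustrative assumptions (roughly H100-class numbers), not anything from MIRI's proposal:

```python
# Back-of-the-envelope: what does a 10^24 FLOP training run cost?
# Every constant here is an assumption for illustration, not a measurement.

FLOP_BUDGET = 1e24          # hypothetical training-FLOP cap
PEAK_FLOPS = 1e15           # ~1 PFLOP/s per GPU, roughly H100-class at BF16 (assumed)
UTILIZATION = 0.4           # assumed fraction of peak actually achieved in training
PRICE_PER_GPU_HOUR = 2.50   # assumed cloud rental price, USD

effective_flops = PEAK_FLOPS * UTILIZATION      # FLOP/s actually delivered per GPU
gpu_seconds = FLOP_BUDGET / effective_flops
gpu_hours = gpu_seconds / 3600
cost_usd = gpu_hours * PRICE_PER_GPU_HOUR

print(f"GPU-hours: {gpu_hours:,.0f}")           # ~694,444 GPU-hours
print(f"Cost: ${cost_usd:,.0f}")                # ~$1.7M
```

Under these assumptions the cap comes out to single-digit millions of dollars, which is affordable for hundreds of companies and most states - hence the capped world looking multipolar rather than unipolar.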
I've been enjoying the "X in the style of Y" remixes on YouTube.
But once I saw how effortless it was to "remix" music on Suno, I lost all interest in Suno covers. I thought there was some artistry to remixing - but no, it's point and click. Does that mean that an essential prerequisite for art appreciation is the sense that it was made with skill? So is art really just a humanism?
My point is that we tend to separate the artist and the art - and I used to agree with that idea, both in the moral sense and in the aesthetic sense. But I am now convinced that we see the maker in the work as much as we see the work itself.
A limited, feeling being is what grounds the meaning. Where is the drama in something that was never felt, never imagined by anyone? What was ever at stake? What is supposed to resonate with me if production is effortless and refers to nothing deeply felt?
The only way we recover the true feeling is by willingly pretending it came from somewhere it didn't - accepting the simulacrum as reality.
Maybe AI music will become deeply and irresistibly beautiful - playing exactly the right harmonies and chords to pull all the right heartstrings. The feelings it evokes may be real, but they won't be grounded. And therefore a different kind of feeling, the feeling of meaning and being, will be lost. I think that grounding might be the main ingredient, but I don't know if we'll all discover that quickly enough.