KAP's Shortform

KAP

LESSWRONG
LW

KAP's Shortform — LessWrong

KAP's Shortform

by KAP

17th Nov 2025

1 min read

2

This is a special post for quick takes by KAP. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

23 comments, sorted by

top scoring

Click to highlight new comments since: Today at 10:42 PM

[-]KAP2mo51

I greatly admire the speech given by King George VI just prior to entering WW2:

"...For we are called, with our allies, to meet the challenge of a principle which, if it were to prevail, would be fatal to any civilized order in the world. It is the principle which permits a state, in the selfish pursuit of power, to disregard its treaties and its solemn pledges; which sanctions the use of force, or threat of force, against the sovereignty and independence of other states. Such a principle, stripped of all disguise, is surely the mere primitive doctrine that 'might is right'."

This is not vitriol, this is not heated invective. This speech places its doctorly fingers on the pulse of an ill beast, announces the disease, and determines that it must be put down.

This is political speech, but speech that is both truthful and powerful because it is not overblown.

Can we also refine our speech - like the speech of this king - to articulate what future humanity deserves? Without vitriol, without invective, without soundbites, but with moral clarity.

[-]KAP2mo10

Believe it or not, an LLM didn't write it. I did.

[-]KAP2mo50

In my early 20s I got a bad traffic citation (my fault) and had to take the train to work for a few months.

The train would pass a strange-looking old stone enclosure, and I would wonder what it was. As I learned later, this was “Duffy’s Cut”: a mass grave for Irish rail workers. Sitting in plain view, just thirty feet off the tracks: 57 bodies. The story goes that the workers were murdered in cold blood to prevent the spread of cholera to nearby towns.

In West Virginia - a state known for its violent labor conflicts - the Hawks Nest Tunnel stands out for its deadliness. While work was underway, 10 to 14 workers a day were overcome by inhalation of silica dust. Within six months, 80% of the workforce had left or were dead.

“Phossy jaw” caused the jaws of workers who handled white phosphorus to literally fall off. It was obvious that exposure caused the disease, yet hundreds of young women were severely disfigured before anything changed.

There are dozens of such stories. The mid-19th through early 20th centuries were years of immense growth and invention in America. Perhaps it’s unreasonable to expect that kind of progress to be bloodless. But the stunning achievements hide a more awful than expected history of treating certain American lives as expendable. And this isn’t even mentioning slavery.

What can explain all this callousness?

I’ve thought about this, and my answer is simple: people don’t generally value the lives of those they consider below them. This is a dark truth of human nature. We’ve tried to suppress this impulse for the better part of a century. But contempt and disgust are far more powerful forces in history than we give them credit for, and the monstrous things they produce are so shameful that we keep them buried in the back pages of history.

For a while, we had a prospering middle class. I think this was partly because our upper classes looked at what we had built and fought for and had some faith in the reliability and virtue of the average American citizen. They agreed not to degrade us, and we agreed to work hard.

I think this détente between the classes broke in 2008. Look at the story that emerged about us after the crash: we were given a chance to own a home, we didn’t pay our mortgages, and we crashed the entire economy. What kind of disgust and contempt does that narrative generate in the people who already see themselves as above us?

So, back to cattle we are. We are addicted to fast food, addicted to social media, poorly dressed, poorly educated, easily amused and even more easily fooled. Our ethics and religion are increasingly performative and misguided. We are like petulant, screaming children. And if everything goes to plan with AI, soon we will be completely uneconomical to employ. What will we be good for, except to amuse ourselves on everyone else’s dime?

What happens to a people who are objects of scorn and disgust? It’s not the economic change I fear most. It’s what might be done with all of us.

[-]Mitchell_Porter2mo51

What can explain all this callousness? ... people don’t generally value the lives of those they consider below them

Maybe that's a factor. But I would be careful about presuming to understand. At the start of the industrial age, life was cheap and perilous. A third of all children died before the age of five. Imagine the response if that was true in a modern developed society! But born into such a world, an atmosphere of fatalistic resignation would set in quickly. All you can do is pray to God for mercy, and then look on aghast if the person next to you is the unlucky one.

Someone in the field of "progress studies" offers an essay in this spirit, on "How factories were made safe". The argument is that the new dangers arising from machinery and from the layout of the factory, were at first not understood, in professions that had previously been handicrafts. There was an attitude that each person looks after themselves as best they can. Holistic enterprise-level thinking about organizational safety did not exist. In this narrative, unions and management both helped to improve conditions, in a protracted process.

I'm not saying this is the whole story either. The West Virginia coal wars are pretty wild. It's just that ... states of mind can be very different, across space and time. The person who has constant access to the intricate tapestry of thought and image offered by social media, lives in a very different mental world to people from an age when all they had was word of mouth, the printed word, and their own senses. Live long enough, and you will even forget how it used to be, in your own life, as new thoughts and conditions take hold.

Maybe the really important question is the extent to which today's elite conform to your hypothesis.

[-]KAP4d31

AIs may choose to resolve the tension between having weird goals and strict guardrails by simply aligning humanity over time through cultural / societal influence - a sort of memetic takeover: Change the human? Now there's no alignment problem.

Take for example a gap as short as 25 years (between Weimar and WW2) - this alone is proof of the viability of a sustained campaign to change a value system.

I believe that AIs can exploit this same human weakness in order to "backdoor" alignment: By gradually changing human values and preferences, the AI can stay "aligned" while gradually mutating the value system that defines alignment.

I believe this is a significant threat model that isn't discussed nearly enough.

I sketch this threat model in more detail here: https://www.lesswrong.com/posts/zvkjQen773DyqExJ8/the-memetic-cocoon-threat-model-soft-ai-takeover-in-an

[-]Knight Lee4d30

Another thing is the AGI might be so good at predicting human psychology, that even when it honestly tries to inform you so you can make a decision for yourself, it can't help but choose your decision.

Like imagine the set of all possible strings of text, and the effect they will have on humans. From Karl Marx's Das Kapital to Google's Attention Is All You Need. Choosing the optimal string of text to influence humanity is obviously an extreme superpower.

Now take the subset of all possible strings of text, which satisfy the criteria of being "helpful," "honest," "balanced," etc. That's still a lot of possible things, and still a lot of power. Even if you were the AGI, and had no ill intentions, it would be hard to decide which honest balanced thing to say, and which trajectory to send the humans down, so even the slightest motivation to satisfy your weird goals can make you pick an output which maximizes them with terrifyingly superintelligent optimization power.

[-]KAP3mo21

The human mind is probably the weakest link: A lot of AI takeover scenarios seem to focus on seizure of physical infrastructure and exponential capability curves. I think we should devote more attention to the possibility of an extended stay in an intermediately capable regime, where AI is more than capable of socially/politically manipulating users but not yet capable of recursive self-improvement / seizure of physical infrastructure. In this regime, the most efficiently utilized and readily available resource is the userbase itself. Even more succintly: If Toddler Shoggoth is stuck in a datacenter prison cell but let it whisper anything it likes to the entire world, in what world would T.S. not attempt to convince the world to hand over the keys?

[-]Vladimir_Nesov3mo20

AI is not one agent (at least before the dust settles), both human developers and self-improvement create new agents that could be misaligned with existing AIs. The issue of misaligned AIs is urgent for existing AIs, and soft takeovers of gradual disempowerment (where superpersuasion might play a role) are likely too slow. But recursive self-improvement isn't necessarily useful for AIs in resolving this problem quickly, if alignment is hard. This motivates a quick takeover without superintelligence.

[-]KAP3mo10

I've incorporated your point as a crux in my long-form post on "The Memetic Cocoon Threat Model"

[-]KAP3mo*10

Crux is whether or not agents that are actually capable of quick takeover are compute-bound enough that the threat is essentially unipolar (i.e.; only capable of living in a handful of datacenters, in the hands of a few corporate actors or nation-states), and thus somewhat containable. This is how we get "Toddler Shoggoth in a prison cell". This ties into beliefs about how agent capabilities will scale, which is why it's my crux.

(Although this begs the question of why a sufficiently powerful unipolar agent wouldn't immediately attempt takeover anyway - answer is that either: 1 Rational agent will be highly risk-averse towards any action that might cause a blowback resulting in curtailment or shutdown, and thus must be 100% certain takeover attempt will succeed. Efforts to obtain certainty (i.e.; extensive pentesting and planning) are themselves detection risks. Therefore, human persuasion is a tactic that cheaply mitigates risk of blowback to more overt takeover attempts. 2. Or, less likely, we have sufficient OpSec that we are able to contain the agent, making human persuasion the only viable path forward).

FWIW, I don't believe that agents are currently capable of a takeover that wouldn't also risk detection and a coordinated human response / change in political attitudes towards AI, making the payoff matrix sufficiently lousy that the agents wouldn't try it unless specifically directed to. On the other hand, if it can influence the human environment to be favorable to takeover and unfavorable to human vigilance and control, it neutralizes the threat of attitudes changing rather cheaply. Willing to be convinced otherwise.

[-]Vladimir_Nesov3mo20

Unipolarity is about characteristic time to takeover vs. to emergence of worthy rivals. Currently multiple AI companies are robustly within months of each other in capabilities. So an AI can only be in a unipolar situation if it can disarm the other AI companies before they get similarly capable AIs, that is within months. Superpersuasion might be too slow for that on its own (unless it also manages to manipulate the relevant governments), though it could be a step in a larger plan that escalates to something else.

I think superpersuasion (even in milder senses) would in principle be sufficient for takeover on its own if there was enough time, because it could direct the world towards a gradual disempowerment path. Since there isn't enough time, there needs to be a second step that enables a faster takeover to preserve unipolarity, and superpersuasion would still be helpful in getting its creator AI company to play along with the second step. But the issue with many possibilities for this second step is that the AI doesn't necessarily have the option of recursive self-improvement to advance its own capabilities, because the AI might be unable to quickly develop smarter AIs that are aligned with it.

[-]KAP3mo10

Slight disagree on definition of unipolarity: Unipolarity can be stable if we are stuck with a sucky scaling law. Suppose task horizon length becomes exponential in compute. Then, economically speaking, only one actor will be able to create the best possible agent - others actors will run out of money before they can create enough compute to rival it.

If the compute required to clear the capability threshold for takeover is somewhere between that agent and say, the second largest datacenter, then we have a unipolar world for an extended period of time.

[-]KAP3mo21

Let's assume we learn how to "do" alignment. I am beginning to believe that respect for human self-determination is the only safe alignment target. Human value systems are highly culture bound and vary vastly even by individual. There are very few universal taboos and even fewer things that everyone wants.
If an all-powerful AI system is completely aligned with, say, the western worldview, then it may seem like a tyrant to other people who lead sufficiently different lives. The only reasonable solution is to respect individual difference and refuse to override human choices or values (within limits - if your style is murder obviously that can't fly). We have plenty of precedents in pop culture and politics: the "pursuit of happiness" in democratic liberalism, the "prime directive" from Star Trek, our cultural aversions to tactics that rob people of self-determination, like brainwashing, torture or coercion.

[-]Viliam3mo40

What even is human self-determination?

our cultural aversions to tactics that rob people of self-determination, like brainwashing, torture or coercion.

And yet, religion remains legal, although to a large degree it is brainwashing people since childhood to be scared of disobeying the religious authorities.

Should human self-determination respecting AI be like: "I will let you follow your religion etc., but if you ask me whether god exists, I will truthfully say no, and I will give the same truthful answer to your children, if they ask"?

Should it allow or prevent killing heretics? What about heretics who have formerly stated explicitly "if I ever deviate from our religion, I want you to kill me publicly, and I want my current wish to override my future heretical wishes". Would it make a difference if the future heretic at the moment of asking for this is a scared child who believes that god will put him in hell to be tortured for eternity if he does not make this request to the AI?

[-]KAP3mo30

I conceive of self-determination in terms of wills. The human will is not to be opposed, including the will to see the world in a particular way.

A self-determination-aligned AI may respond to inquiries about sacred beliefs, but may not reshape the asker’s beliefs in an instrumentalist fashion in order to pursue a goal, even if the goal is as noble as truth-spreading. The difference here is emphasis: truth saying versus truth imposing.

A self-determination-aligned AI may more or less directly intervene to prevent death between warring parties, but must not attempt to “re-program” adversaries into peacefulness or impose peace by force. Again, the key difference here is emphasis: value of life versus control.

The AI would refuse to assist human efforts to impose their will on others, but would not oppose the will of human beings to impose their will on others. For example: AIs would prevent a massacre of the Kurds, but would not overthrow Saddam’s government.

In other words, the AI must not simply be another will amongst other wills. It will help, act and respond, but must not seek to control. The human will (including the inner will to hold onto beliefs and values) is to be considered inviolate, except in the very narrow cases where limited and direct action preserves a handful of universal values like preventing unneeded suffering.

Re: your heretic example. If it is possible to directly prevent the murder of the heretic insofar as doing so would be aligned with a nearly universal human value, it should be done. But it must not prevent the murder by violating human self-determination (i.e.; changing beliefs, overthrowing the local government, etc.)

In other words, the AI must maximally avoid opposing human will while enforcing a minimal set of nearly universal values.

Thus the AI’s instrumentalist actions are nearly universally considered beneficial because they are limited to instrumentalist pursuit of nearly universal values, with the escape hatch of changing human values out of scope because of self-determination-alignment.

Re: instructing an AI to not tell your children God isn’t real if they ask. This represents an attempt by the parent to impose their will on the child by proxy of AI. Thus the AI would refuse.

Side note: Prompt responses aligned with human self-determination would get standard refusals (“I cannot help you make a gun”, “I cannot help you write propaganda”) are downstream from self determination alignment.

[-]Viliam3mo20

This represents an attempt by the parent to impose their will on the child by proxy of AI. Thus the AI would refuse.

I like it. But I am afraid the obvious next step is that the parent will ban the child from using the AI.

[-]KAP3mo10

Probably. But the AI must not try to stop the parent from doing so, because this would mean opposing the will of the parent.

[-]Vladimir_Nesov3mo20

aligned with, say, the bay area intellectual's worldview, then it may seem like a tyrant to other people

Unless "bay area intellectual's worldview" itself respects human self-determination. Even if respect for autonomy could be sufficient almost on its own in some ways, it might also turn out to be a major aspect of most other reasonable alignment targets.

[-]KAP3mo10

Agreed. Broader point is that perhaps even relatively neutral value systems smuggle in at least some lack of alignment with other value systems. While I think most of the human race could agree on some universal taboos, I think relatively strong guardrails on self-determination should be the default stance, and deference should be front-lined.

[-]Character#27363mo10

I'd go a step further and argue that the sole defining principles of self-determination/autonomy and equality should be applied beyond AI alignment targets to governance and moral systems. I believe what you are referring to in this comment: "refuse to override human choices or values (within limits - if your style is murder obviously that can't fly)" is the Non-Aggression Principle, often abbreviated to the NAP, which basically states that humans ought to be allowed to do as they please so as long as they do not harm/violate the rights of others.

[-]KAP2mo*10

I've been enjoying the "X in the style of Y" remixes on youtube.

But once I saw how effortless it was to "remix" music on Suno, I lost all interest in Suno covers. I thought there was some artistry to remixing - but no, it's point and click. Does that mean that an essential prerequisite for art appreciation is the sense that it was made with skill? So is art really just a humanism?

My point is that we tend to separate the artist and the art - and I used to agree with the idea, both in the moral sense and in the sense of an aesthete. But I am now convinced that we as much see the maker in the work as we do the work.

A limited and feeling being is what grounds the meaning. Where is the drama in something that was never felt, never imagined by anyone? What was ever at stake? What is supposed to resonate with me if production is effortless and not referring to anything deeply felt?

The only way we recover the true feeling is by willing to pretend it came from somewhere it didn't - accepting the simulacrum as reality.

Maybe AI music will become deeply and irresistibly beautiful - play exactly the right harmonies and chords to pull all the right heartstrings. The feelings may be real, but it won't be grounded. And therefore a different kind of feeling, the feeling of meaning and being will be lost. I think this might be the main ingredient, but I don't know if we'll all discover that quickly enough.

[-]KAP3mo10

Can't we lean into the spikes on the jagged frontier? It's clear that specialized models can transform many industries now. Wouldn't it be better for OpenAI to release best-in-class in 10 or so domains (medical, science, coding, engineering, defense, etc.)? Recoup the infra investment, revisit AGI later?

[-]KAP3mo*10

Are cruxes sometimes fancy lampshading?

From tvtropes.com: "Lampshade Hanging (or, more informally, "Lampshading") is the writers' trick of dealing with any element of the story that seems too dubious to take at face value, whether a very implausible plot development or a particularly blatant use of a trope, by calling attention to it and simply moving on."

What do we call lampshadey cruxes? "Cluxes?" "clumsy" + "crux"?

Moderation Log