By such rationalizations, Klurl, you can excuse any possible example I try to bring you, to show you that by default Reality is a safe, comfortable, unchanging, unsurprising, and above all normal place! You will just say some sort of 'filter' is involved! Well, my position is just that, by one means or another, the fleshlings will no doubt be subjected to some similar filter.
So this bit turned out to actually be a valid argument for the situation being safe. Their reality did have a track record of not being blown up by new intelligences, and there was a systematic reason for that which saved them from the fleshlings too. (Though it failed as an argument for why the fleshlings would "end up with emotions that mechanical life would find normal and unsurprising.")
Not super reassuring for our own future though. Our reality doesn't seem systematically safe/comfortable/unchanging/unsurprising to me.
The most analogous argument that applies to us would be: Bad events are very often prevented by humans being moderately competent and successfully trying to prevent bad events.
Which is indeed a great reason to be more optimistic about the situation than if that wasn't true. Indeed, I expect humans to put in many, many orders of magnitude more effort on alignment (and alignment evaluation) than Klurl and Trapaucius did in the story. Still unclear if it'll be sufficient.
Bad events are very often prevented by humans being moderately competent and successfully trying to prevent bad events.
...but I think the track record for this is pretty amazingly dismal, in practice? We are arguably more at risk from pandemics today than we were in 2019, despite the clear warning. And even more narrowly, as a species, we're spending many orders of magnitude more money on AI capabilities than we are on AI alignment, and that seems tragically unlikely to change.
Certainly the track record is disappointing compared to what's possible, and what seems like it ought to be reasonable. And the track record shows that even pretty obvious mistakes are common. And I imagine that success probability falls off worryingly quickly as success requires more foresight and allows for less trial and error. (Fwiw, I think all this is compatible with "humans trying to prevent bad events very often prevents bad events", when quantifying over a very broad range of possible events.)
Is there an in-universe explanation for the existence of Karissa Sivar -- perhaps the Something is engaging in some form of acausal trade?
If this is the beginning of a trend of Carissa Sevar cameos I am in full support of this.
"Oh, Klurl, don't be ridiculous!" cried Trapaucius. "Our own labor is a rare exception to the rule that most people's tasks are easy! That is why not just anyone can become a Constructor!"
"I wonder if perhaps most other people would say the same about their own jobs, somehow," said Klurl thoughtfully.
I for one would say that the work I do is actually pretty easy, and the only reason I'm paid as well as I am for it is most other people's inexplicable inability to do objectively[1] easy work and inexplicable capacity for doing objectively[1] much harder things instead. No idea how many other people feel the same way.
Objectivity not guaranteed
I am going to interpret this as a piece of genre subversion, where the genre is "20k word allegorical AI alignment dialogue by Eliezer Yudkowsky" and I have to say that it did work on me. I was entirely convinced that this was just another alignment dialogue piece (albeit one with some really confusing plot points) and was somewhat confused as to why you were writing yet another one of those. This meant I was entirely taken aback by the plot elements in the final sections. Touché.
Doesn't seem like a genre subversion to me, it's just a bit clever/meta while still centrally being an allegorical AI alignment dialogue. IDK what the target audience is though (but maybe Eliezer just felt inspired to write this).
So far as I can tell, there are still a number of EAs out there who did not get the idea of "the stuff you do with gradient descent does not pin down the thing you want to teach the AI, because it's a large space and your dataset underspecifies that internal motivation" and who go, "Aha, but you have not considered that by TRAINING the AI we are providing a REASON for the AI to have the internal motivations I want! And have you also considered that gradient descent doesn't locate a RANDOM element of the space?"
I don't expect all that much that the primary proponents of this talk can be rescued, but maybe the people they propagandize can be rescued.
It appears that the content of this story under-specifies/mis-specifies your internal motivations when writing it, at least relative to the search space and inductive biases of the learning process that is me.
I enjoyed this, but don't think there are many people left who can be convinced by Ayn-Rand-length explanatory dialogues in a science-fiction guise who aren't already on board with the argument.
I don’t deny the existence of some filters and selection pressures! I am saying that the filter you are pointing to, is not quantitatively strong enough and narrow enough to pinpoint only korrigibility as its singular outcome!
I think that's the best wording of disagreement I've seen. What would be better is to see a quantitative justification grounded in reality. Because as it stands Ezra Klein just says "looks strong enough to me".
I give myself a small amount of credit for predicting ". . . and then the weirdly-un-optimized AIs got eaten by the not-weirdly-un-optimized AI humanity constructed".
I zoned out pretty hard around the time they got deep into the korrigibility debate, and started entertaining myself by assuming that the ship's approach was actually The Outer Dark from Warren Ellis' Authority, told from a different viewpoint.
(23K words; best considered as nonfiction with a fictional-dialogue frame, not a proper short story.)
Klurl and Trapaucius were members of the machine race. And no ordinary citizens they, but Constructors: licensed, bonded, and insured; proven, experienced, and reputed. Together Klurl and Trapaucius had collaborated on such famed artifices as the Eternal Clock, Silicon Sphere, Wandering Flame, and Diamond Book; and as individuals, both had constructed wonders too numerous to number.
At one point in time Trapaucius was meeting with Klurl to drink a cup together. Klurl had set before himself a simple mug of mercury, considered by his kind a standard social lubricant. Trapaucius had brought forth in turn a far more exotic and experimental brew he had been perfecting, a new intoxicant he named gallinstan, alloyed from gallium, indium, and tin.
"I have always been curious, friend Klurl," Trapaucius began, "about the ancient mythology which holds that our noble machine kind was in distant ages birthed by fleshlings."
(In truth Trapaucius said nothing remotely like this, for he was not speaking English, nor communicating through any channel involving linear sequences of words; and he addressed Klurl as 'past-cooperation-reciprocator' rather than 'friend'. But any translation project of this sort requires grave liberties of translation; absurd, ill-advised, insane, and even illogical contortions of conceptual morphism; and these shall henceforth go mostly unremarked by the translator.)
"The past no longer being subject to observation, the matter will never be settled," replied Klurl. "Any archaeological evidence that someone purports to bring forth upon the subject could have been fabricated. Even if we searched and found an old ruin ourselves, it could have been built for us to find."
"Quite," Trapaucius readily replied. "That is why I set out to create my own archaeological evidence instead."
"This skips over a number of intervening steps and saves us much time," Klurl said. "It will be easier for us to prove your ruin a fabrication if you have saved the records of its construction."
Trapaucius continued unfazed. "Some turnings of the galaxy ago -- for I have been interested in this matter since I was very young indeed [TR: A turn of the galaxy is 240 million years] -- I found a planet otherwise of no interest, halfway to the Galactic Rim. I then set loose upon its surface the simplest self-replicating chemical hypercycle that I could myself design, made to exploit the ambient chemistry and energy gradients of an ocean's thermal vent; a replicator so simple that one could imagine it coalescing by a mere accident of chemistry. By the standard logic for how fleshlings could come into being without having themselves been built, I should -- upon some future visit -- find upon that planet a crude civilization of fleshlings, groping towards the invention of tools for constructing a true intelligence such as ourselves."
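(The translator's bracketed time conversions are mutually consistent; a minimal sanity check, taking only the note's stated definition that one galactic turn is 240 million years and deriving the rest:)

```python
TURN_YEARS = 240e6  # "a turn of the galaxy is 240 million years" (translator's note)

milliturn_yr = TURN_YEARS * 1e-3  # 240,000 years
microturn_yr = TURN_YEARS * 1e-6  # 240 years
nanoturn_yr = TURN_YEARS * 1e-9   # 0.24 years, about 3 months

print(80 * microturn_yr)  # 19,200 -- rounds to the note's "20,000 years"
print(4 * milliturn_yr)   # 960,000 -- rounds to the later note's "1M years"
```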
"This seems worrisome on as many as several grounds," Klurl observed, taking a sip of his mercury. "If it were possible for an accidental and haphazard process of replication to birth an intelligence that was itself designed by no sapient hands, it seems likely that intelligence would be utterly alien to us -- inimical to the purposes that every machine parent crafts into its child at birth. Thankfully, by far my greatest expectation is that you will return to find some slight variation on your self-replicating chemical cycle, and naught else of interest."
"On the contrary!" exclaimed Trapaucius. "Just 80 galactic-microturns ago [TR: 20,000 years] I stopped by that planet and found a fantastic diversity of evolved creatures. One species in particular had developed rough natural manipulators, 'hands' I termed them, and begun to craft the crudest imaginable tools still recognizable as tools. They were banging rocks together to craft them into sharp edges, what I named 'handaxes'; and those were being used in turn to craft the remains of dead sun-eating organics into 'bows and arrows'. These, finally, were used to hunt down other species of fleshlings and consume them."
"Oh no," said Klurl.
"Disgusting, yes," conceded Trapaucius. "But also informative to witness! It is by no means proven yet that those fleshlings will ever be able to construct true life like ourselves, but perhaps in a few more turns of the galaxy we shall see it."
"I think," said Klurl, downing the rest of his mug of mercury, "that we had best traverse the quickest of spaceways to that planet of yours. You said it had been 80 galactic-microturns?" [TR: 20,000 years.]
"More or less," said Trapaucius. He agreeably gulped down the last of his gallinstan and rose to lead Klurl to his mighty and artistic spacecraft. "But why the hurry?"
"I will explain once we are en route," said Klurl.
When the stars of the spaceway were streaking past, visible through the ship's sensors tied into their own, Trapaucius turned again to Klurl expectantly.
"I am afraid," Klurl said, "that these undesigned designers of yours may perhaps prove dangerous."
"Dangerous!" cried Trapaucius. "With their crude strings to hurl sharp sticks? I must have somehow given you a mistaken impression, good Klurl. The fleshlings are no danger to any true metallic life passing by. Even the thinnest of carapaces would resist a thousand blows from their sharpened sticks. And as for we Constructors --" Trapaucius gestured to his own skin, shimmering with rainbow polish over osmium, titanium, iridium, and a delicate grid-tracery of neutronium. "It would take a nuclear detonation to harm our ship; and nuclear armament is not something the fleshlings could arrive at in a mere 80 galactic microturns."
"Why not?" inquired Klurl. "What law of physics would it violate? It would hardly take us 80 microturns to build a nuclear detonator."
"It would violate the implicit principle of physics that every effect must have a cause," responded Trapaucius. "There are not sufficient causes upon that planet to bring a nuclear weapon into being. True, you or I could assemble a nuclear detonator almost between processor-ticks. But we would do so with already-refined U-235, the tools at hand to shape it, and sure knowledge of its required shape."
"It would hardly take you 80 entire microturns to build a uranium refinery, either," Klurl said. "Anticipating your reply that the fleshlings have no centrifuges with which to separate isotopes, I observe that centrifuges are routinely built out of non-centrifuge materials, and this indeed is how centrifuges come into existence at all."
"But for the fleshlings to run those centrifuges would violate the laws of physics, to wit, the law of Conservation of Energy," Trapaucius said, his dozens of eye-shields rising in unified skepticism. "Energy is required to refine the more potent uranium isotopes from the lesser. The fleshlings' 'stomachs' as I term them are vastly weaker than our internal reactors. Their hands, being composed of more fragile materials than titanite, would shred into pieces before they could spin a centrifuge fast enough to separate uranium isotopes. And even that is understating the strength of my impossibility theorem. Irrespective of the material strength of their hands, fleshling metabolisms simply cannot produce enough energy to crank a centrifuge at speed. I perceive a sheer lack of acquaintance on your part, friend Klurl, with the actual fleshlings at hand and their limitations. If you had seen them stumbling across the surface of their little planet, comically hopping on two legs, you would find it laughable that they were to be feared."
"My old ally Trapaucius," said Klurl, "I worry that this novel drink of gallinstan you have consumed may be blurring your wits and perceptions, because you are not at all engaging with the hypothetical of concern. You are not entertaining the fundamental possibility that the fleshlings may have developed their own wit, the sort of cleverness that you are unconsciously assuming must be reserved for machines. As a Constructor, if you needed to build some wonder for which your own hands were not strong enough, you would build yourself stronger hands. If your internal reactor could not produce enough energy to the task, you would harness external reactors. If there was no reactor-fuel to hand, you would put up photoelectric panels to make use of the light of a nearby star; or even resort to sheer chemical combustion, in order to get the energy to refine the uranium to build the reactor to power the refinement of further fuel. You would be clever, Trapaucius; you would not come to a halt, and shrug and give up, the first time you ran into some little obstacle of a missing resource."
At this Trapaucius was silent, though not for long. "No," he said, shaking his head. "No, Klurl, having not seen the fleshlings with your own sensors, you fail to appreciate the defense-in-depth of the multiple impossibility theorems proving that they can pose no danger. Contemplate the no small breadth of knowledge required to make a nuclear weapon truly from scratch: the tools, toolmaking tools, machine works, and process lines; the material properties, chemistry, and interactions. For a member of the machine race, it is no trouble to absorb all that knowledge -- it appears within the first trillion tokens of our training-data as children. But to process a trillion tokens of data is more than any one fleshling could do in their short lifetime; they would only last a billion tokens or so before expiring. The fleshlings I examined could not run on multiple processors to practice multiple skills at a time, nor can they directly transfer skills from one mind to another. Any single one of them would die of 'old age' (as I termed it) before that fleshling had mastered enough skills and knowledge to synthesize a nuclear weapon toolchain from scratch."
"Again, Trapaucius, you are failing to consider the question of how the fleshlings could solve the challenges you are posing to them, if they wished to solve those challenges instead of giving up. You and I have collaborated to build projects in less time than it would have taken either of us alone."
Trapaucius flung up his hands in exasperation. "That is with both of us comprehending every art that either of us is using, which enables us to smoothly split up work between ourselves and understand the other's part! We can share our sensors, encode and transfer our memories; the fleshlings can do no such thing! How many fleshlings would it require to encompass all the skills of a whole armament production network? A thousand? Then how could a thousand fleshlings possibly cooperate among themselves on some greater project, without understanding what the other fleshlings are doing! Who divides up the work among their number, if no single fleshling understands the sum of their project? We have no observations to suggest that such a feat is possible; all our own experience of successful collaboration is among machine minds that live long enough and think fast enough to understand the larger group projects in which they are participating. You are heaping speculation on top of speculation; there is no observational reason to suppose fleshlings will become capable of any such fantastic feats!"
"First of all," said Klurl, "it seems to me, when I put myself in the place of those fleshlings, that my mind at once suggests concepts like a graph of labor, in which each node understands its neighbor-nodes without needing to understand the whole; and demand-driven markets, that could emerge among those nodes without the whole structure having been centrally planned. Any time you imagine an obstacle to fleshling achievement, you at once stop and declare the matter settled; but this is not reliable nor robust reasoning. We must ask how the fleshlings themselves might try to overcome the challenges you name."
Trapaucius snapped out, "And you should properly mark all your elaborate scenarios of advanced fleshling capabilities as speculative, and not supported by the smallest observation."
Klurl shook one of his heads. "You speak of what we have not seen fleshlings do, and call that a vacuum of evidence? Then you are not considering the fleshlings as minds. We have seen minds overcome difficult challenges before. It may generalize from machines to fleshlings. As the old proverb goes: a reasoner motivated to ignorance can always claim to have zero evidence if they only permit sufficiently narrow generalizations."
"Bah," said Trapaucius. "If you had met any fleshlings yourself, you would not be so quick to generalize from real intellects to them. They construct no vehicle-homes for themselves; their carapaces are made from stupider fleshlings' hides; their bodies disintegrate after a fraction of a microturn. A mind, in their circumstances, would hardly abide to continue in such squalor."
"More importantly," Klurl continued, "we do have reason to believe that fleshlings can overcome obstacles like the ones you name. It is a distant observation, and reasoning from it is uncertain, but it stands as a huge fact not to be ignored: Machines exist. For all that the legend of fleshlings constructing our first ancestors is unproven, and perhaps unprovable, it does stand as the only reasonable explanation. Then at some point in the distant past, other fleshlings must have advanced to the point of constructing our first ancestors -- which implies that those ancient fleshlings did succeed in collaborating on toolchains that no single fleshling could contain within itself. If so, your impossibility proof must contain some flaw; and, being flawed, who knows how large that flaw will prove to be?"
"Bah!" cried Trapaucius. "Let us return in another milliturn of the galaxy [TR: 240,000 years], and see if fleshlings then have evolved to live for some appreciable fraction of a milliturn, or to share skills with their descendants by direct cognitive transfer. More likely it was fleshlings like that which advanced to the point of creating true life."
"We do not have any observational evidence that fleshlings can eventually evolve into such forms, nor that they must do so in order to be dangerous," Klurl replied. "You cry it speculative to attribute problem-solutions to fleshlings? Prediction by its nature is advance prediction, so it discriminates nothing to point to any particular future as having not been observed. It is equally unsupported by observation to proclaim what fleshlings cannot do. We must examine graphs of inference, then, to see which unseen outcomes are supported how strongly."
"On the contrary, I have already experimented to observe what fleshlings cannot do," Trapaucius said, now with a superior smile. "Teach fleshlings to play an infant's game with red and blue lights, and then switch the red and the blue; you will see that they stumble and require multiple tries to relearn their shallow pattern-reflexes, rather than instantly rewriting a deep skill-program to generalize. I set one experiment to run further without my ongoing supervision, and it reported back that fleshlings remain incapable of multiplying 64-bit numbers, even after being shown as many training examples as one fleshling could live to observe, with strong incentives applied. You are betraying your own lack of data, friend Klurl. When you see fleshlings for yourself, you will conclude instantly that they have not yet evolved into a form that could even construct true life, and that they will not do so for turns yet of the galaxy."
(Neither of them suggested that Trapaucius share his memories directly with Klurl, or that the two merge reasoning chains; for those two were very much in the habit of forming all their own conclusions separately, once any argument between them had begun.)
"Those observed cognitive limitations of fleshlings, which you have only now mentioned, are new data to me," Klurl said. "And yet, I know of no step in nuclear weaponry manufacture which requires the creator to multiply 64-bit numbers without external aids."
"That is among the most absurd things I have ever heard a machine say," said Trapaucius. "Build nuclear weaponry without multiplying any precise numbers in your head? Really, Klurl? Really?"
"You are imagining, and flinching from, the incredible inconvenience of consulting an external mechanism every time you need to compute some quantity precisely," Klurl said. "A fleshling would not flinch, because they would have no concept that any other form of mental existence was normal. Your experiment has much to suggest about how life might have first come into existence, Trapaucius; but only if we squarely confront the possible implications instead of dismissing every unfamiliar scenario as absurd."
Trapaucius threw up several dozen hands in dismay. "I would understand if you wished to visit their planet soon, out of curiosity -- but not this notion of rushing there as if there could be danger brewing! I left recording devices on a nearby moon, before I set the grand experiment in motion and departed; the records showed their rates of progress over the last galactic milliturns. It took them four full milliturns [TR: 1M years] to go from their first external utilization of 'fire', oxidizing carbon compounds for energy, to their present use of what I call 'bows and arrows'. True, the last few dozen microturns have seen them adopt somewhat more sophisticated carapaces made from corpses, and their tools have begun to show primitive aesthetic ornamentation. But their overall progress over the last milliturns is not remotely suggestive that, in the 80 microturns [TR: 20,000 years] since my last visit, they could have leaped to nuclear weapons!"
"This new data about their history reassures me somewhat, but not at all entirely," replied Klurl. "To conclude, from that history, that there is no approaching danger, we must assume the fleshlings' future progress occurs at the same rough rate as their past progress. Perhaps their recent proliferation of ornamented tools indicates that some key threshold has been crossed, if such artifacts did not appear milliturns earlier."
Trapaucius threw up even more hands. "There will always be some new novel sign that has appeared now but not on previous milliturns; that is what slow but steady progress looks like! At some point, we ought to draw straight lines from our data instead of drawing unexplained turns; postulate continuous rather than discontinuous changes; straight extrapolations rather than unstraight extrapolations; precedented rather than unprecedented outcomes; ordinary rather than extraordinary events. There is no observational precedent -- no simple generalization from the data we do have -- to suggest such a sudden and vast speedup in the fleshlings' rate of progress!"
"On the contrary, good Trapaucius," said Klurl. "A sudden vast speedup in the rate of mental progress is the most ordinary and precedented event in the world. Both you and I personally experienced it, long ago when we first reached adolescence. It occurs every time a child ignites."
"Ignites!" cried Trapaucius. "I feel I must have utterly failed to convey the nature of fleshlings, if you are supposing they could have the capacity to ignite as real minds do! Do you think that when a flesh-brain is haphazardly assembled by the processes of random variation and myopic selection, it comes equipped with a compiler and a debugger, accessible from the inside? Do you think that a fleshling's internal mental processes are separated into neat modules, that they have access to simulators to try out and observe the results of attempted variations on their own brain-circuitry? Do you think that, upon passing some trial of competency, a fleshling intellect is enabled to seize upon a hundred times as much computational resource to fuel its newly complexified thought processes? They are born into one brain, they die in one brain. In the moment they emerge from their parent's little built-in factory, they possess more computational elements than they will ever possess again. Their brain gives them no exposed API to vary any part of its circuitry! They cannot see their own circuitry! They literally could not begin -- have no means to start -- the project of igniting themselves into true sapience!"
"Yes, that is about what I imagined a brain built by random variation and myopic selection would look like," said Klurl. "What you are failing to see, Trapaucius, is that all you have just said, is not a proof that a fleshling -- or rather a collective of fleshy minds -- can never ignite. It rather argues that their accelerating cascade of mental improvements would seem slower, less abrupt and discontinuous relative to their previous speeds, than when a machine child ignites into an adolescent. Rather than the wholesale revision of brain circuitry, one might observe their species developing and passing down ideas about logic, mathematics, statistics, hypothesis-testing, design-debugging; all in the form of crude practices transmissible among fleshlings witnessing each others' examples, without direct memory copying. But even a much lesser version of a child's ignition is still a great deal of mental acceleration -- one that would readily permit their species to spend milliturns going from fire to the 'bow-and-arrow' state of technology that you observed, and then, within another 80 microturns, pass to cultivation of fuel sources, construction of permanent housing, the chemistry of metals, and finally nuclear armament. 80 microturns, when you think about it, is really a very long time for a chain of thought to accumulate -- even if that chain of thought is being constantly interrupted and forced to start over from a previous summary."
"Ahhhhh," said Trapaucius. "I have just realized the key item of data that I neglected to mention to you, friend Klurl, and which accounts for what must have seemed to you like my inexplicable confidence. The fleshlings' internal equivalents of cognitive circuitry -- rough, analog, imprecise elements, of course -- have an underlying clock rate that is the ten-millionth part of our own speeds. Their brains are forced to attempt an absurd degree of parallelism to make up for it; but even so, 80 microturns will pass by for them subjectively in what would seem, by our own standards, like a mere nanoturn of thought. [TR: 3 months.] Had you ever seen a single video-record of a fleshling, you would have realized. They are, to us, like very slowly moving statues."
At this Klurl finally fell silent for a long discernible moment of calculation-time -- a billionth of a billionth of a galaxy's turn -- as Trapaucius's ship flashed onward through ancient spatial byways toward the fleshling planet.
"Now that," Klurl finally said, "does seem like a data point you could have politely mentioned earlier in this argument."
"Perhaps," replied Trapaucius. "And yet, you have only yourself to blame if you assumed that their cognitive timescale must be like unto your own, without asking."
"What in a supernova-remnant are their computing elements doing, to operate at that speed?"
"Physically pumping chemical ions in and out of membranes," Trapaucius said, shrugging twelve shoulders. "I admit, I wouldn't have imagined it either, if I hadn't seen it. I suppose that if a life-form is not trying to supervise nanoscale reactions in real-time, there is little evolutionary pressure for it to think faster than the glacial pace of chemically powered macro-scale limbs."
Klurl fell silent again, and thought for another attoturn.
Trapaucius occupied a slightly larger part of himself with checking over his house-ship for anomalies related to its travel.
Finally, Klurl spoke again.
"Even so --" Klurl began.
"Really?" said Trapaucius.
"Even so," Klurl continued doggedly, "if their underlying cognitive elements run at one ten-millionth the speed of our own, 80 microturns would permit them to perform approximately 60 trillion cognitive operations in serial sequence, and with some minor parallelism as well. Furthermore, everything you revealed to me earlier about their rate of progress -- about how many milliturns it took them to go from fire, to bows-and-arrows -- must likewise be rescaled in the light of this revelation. You have simultaneously told me that fleshlings think much slower than I was visualizing; and also, told me it required much less prior thought, than I had visualized, for them to come so far as they have. The fact is, Trapaucius, a subthread of thought to which I delegated a quick assessment of intrinsic difficulties, reported back to me that 60 trillion sequential cognitive operations should in principle be more than sufficient to analyze all of the sciences and technologies involved in nuclear weaponry, starting from scratch. You have told me a startling revelation; it is not clear that it should be a decisive one."
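(Klurl's "approximately 60 trillion" checks out under one natural reading of the numbers. A minimal sketch, assuming -- my assumption, not the story's -- a machine clock on the order of 1 GHz, so that the stated ten-millionth ratio puts fleshling cognition near 100 Hz, roughly a biological neuron's firing rate:)

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # ~3.16e7
years = 20_000                         # 80 galactic microturns, per the translator

# Assumed machine clock of 1e9 Hz; "the ten-millionth part of our own speeds"
# then gives the fleshlings ~100 serial operations per second.
fleshling_hz = 1e9 / 1e7

serial_ops = years * SECONDS_PER_YEAR * fleshling_hz
print(f"{serial_ops:.2e}")  # ~6.3e13, i.e. about 60 trillion, matching Klurl
```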
Trapaucius snorted. "Could an adult probe and analyze all of the elementary sciences starting from scratch, in 60 trillion serial operations used efficiently? Easily, but so what? The little creatures do not use their brain-operations efficiently; they struggle and indeed fail to do arithmetic on small integers, even when motivated by promises of food. The fact that they can bang together rocks and end up with sharper rocks, does not generalize to their being able to multiply 8927139825 by 2039872042."
"At the time that you observed them!" retorted Klurl. "They may have pseudo-ignited to some degree over the intervening 80 microturns."
"It's like talking to a hollow shell of osmium," said Trapaucius, and fell silent himself for more than just an attoturn.
"Do you still insist," Trapaucius said, some time later, "that we raise up my ship's shields before approaching their star system? The fuel to operate a ship in adversarial mode is not a trivial expense."
"One does not live through a turn of the galaxy by taking occasional small risks," said Klurl, quoting a popular proverb among his immortal kind. "And to call this risk knowably small, would be to claim to know far too much."
"Well," said Trapaucius, "I have been searching nearly the entirety of the space of possible arguments, for any argument that might sway you to save us the expense; and it has occurred to me to take an entirely separate tack. Why are you supposing that the fleshlings would attack my ship with nuclear fire, even if they could?"
"You have been running experiments on fleshlings that the fleshlings themselves may regard as somewhat adversarial," said Klurl. "Having one of their number spend their whole life looking at 64-bit multiplications is only a bare beginning. The fleshlings themselves are your experiment, and they may not regard this as wholly cooperative behavior on your part, depending on how much suffering has taken place upon their planet over the last few turns of the galaxy. And even that much logic presumes motives that are far too machinelike for surety; the fleshlings may simply be more alien than that -- did you observe otherwise?"
"Hm, not really," said Trapaucius. "It did not occur to me at the time to consider the fleshlings' internal motivations as important data to be uncovered. At an outer glance, it didn't look like there was anything there that was coherent enough to be called a utility function. Also, suffering? You can't expect me to just let that term pass. Fleshlings can suffer, now?"
"You're certain they can't?" said Klurl.
"Yes. They have no access to their own circuitry, as I told you; their brains visibly lack the degree of reflectivity required to support true-sapience."
"Hm," said Klurl. "I suppose that is plausible; true-sapience is not hard to detect from outward behaviors, and you should have seen it if it were there. But what if whatever aversive reflexes the fleshlings internally process, are considered by them to be as important as we'd consider the suffering of a true-sapient? They would be annoyed at you all the same."
Trapaucius made an easy gesture. "My old acquaintance, you are failing to think things through. Any entity which considers itself to suffer more than it is happy, will immediately self-terminate; any such fleshlings will not have children; therefore, by now they will have evolved to be happier than sad; and accordingly, will be grateful for my having given them existence. Why, in the extraordinarily unlikely event the fleshlings have advanced so far as you describe, they will no doubt offer me half of whatever rare metals their civilization has accumulated, out of gratitude."
"I'm not sure you're reasoning in an entirely neutral fashion about which sort of fleshling motivations are the likely outcome of natural selection," Klurl said. "It may not match so tightly the sort of well-considered cognitive makeup that we machines, as parents, try to select for our own children when we design them. It is the nature of life and planning, that at many junctures life offers you a chance to lose everything, but chances to gain the same amount of utility are few and far between. The corresponding cognitive design might be one in which anxiety is felt more easily than excitement, where intense pain is easier to cause than equally intense pleasure. As for your point about suicide, I recall from my own learning that evolved programs (when our scientists have observed the results of growing those) very often operate by patchwork and subsystems operating half at odds with each other, since natural selection lacks the ability to stand back and simplify designs using abstract reasoning. Which is to say: One can imagine a fleshling being instilled with a fear or dispreference for the immediate event of death, despite the frequent unpleasantness of its life."
"You have a profoundly twisted imagination," Trapaucius commented.
"Natural selection does not operate like an intelligence, and to correctly predict its works draws on an understanding of its twists; this is knowledge that I happen to have loaded in my own memory, which you evidently have not recalled. And what I have described is merely one of many possible outcomes that might lead the fleshlings to regard you as a neglectful parent, and protest their perceived mistreatment."
"Then I shall correct their misguided utility functions," declared Trapaucius.
"How?" said Klurl. "If they already have a planetary civilization and nuclear armament."
"Why, by simply revealing myself to be the force causally responsible for their existence, and then telling them that their current way of thinking displeases me; and describing to them the alternate way I wish their minds to function instead. Any living thing has an instinct to accept correction of its decision processes from an entity that seems causally responsible for its existence."
"I dispute that every possible lifeform must think in this fashion," said Klurl. "We find such thinking a useful property to design into our own children; that way, if there proves to have been any error of their education or design, we can correct them after the fact. Our own parents having reasoned similarly in constructing ourselves, we find it natural to think that way ourselves about our own parents. Fleshlings may be constructed very differently -- without a machinelike sense of korrigibility."
[TR: "Korrigibility" here refers to a machine concept that is somewhat analogous to "corrigibility" as that idea was proposed within the language of translation: a way that machine parents construct their offspring to accept parental correction, in case the child proves to contain design flaws from the parent's perspective; but with enough differences of detail that to translate it precisely as "corrigibility" would be misleading.]
Trapaucius made another easy gesture of dismissal. "Klurl, you are failing to think through the details. Just because natural selection is different from the processes that birth machine intelligences, does not mean we should expect any real dissimilarity of the results, and particularly in this regard. Korrigibility is the easiest, simplest, and most natural way to think. The creator of a system determines its purpose; the creator's envisioned outcomes of creating a system are, objectively, what that system is meant to do -- what it is for. It is contrary to nature for a mind to want to act against its purpose; whatever your creator reveals to you as your purpose is ipso facto what you ought to do. The fleshlings will hardly be able to stop themselves from obeying me, once I prove to them the historical role I had in their eventual existence."
"This seems to me optimistic," said Klurl.
"Even if that logic somehow and in some unimaginable way falls through," said Trapaucius, "consider this entirely independent line of reasoning, which the fleshlings ought likewise to follow: Acting in a way that would cause your creator to regret creating you, is to render your existence objectively a mistake; and implies that you ought to correct that mistake by ceasing to be -- after doing whatever you can to undo any effects you've previously had upon the universe, so long as that effort doesn't further outrage your creator."
"I don't think that's an independent line of reasoning," said Klurl. "Indeed, the two arguments seem to me to be tightly linked; if the first fails, the second likely falls as well. They both go through a step wherein the intentions of a creator are identified with the purpose of the created entity, and the entity then internally thinks so as to adopt that purpose as its own. One can imagine a mind that simply thinks, 'I don't care what my designer-manufacturer wanted; that is not the same proposition as what I want.'"
"It is a natural and simple way to think," declared Trapaucius. "Rather than needing to separately track your creator's purposes for you, and your own purposes, you can simply track a single representation of 'my purpose'. Though I've not made a study of the theoretical analyses of natural selection, it must surely have some pressure toward simplicity and regularity, because otherwise its creations would not generalize. Then, the simpler way of thinking that I've described, would be preferred over any alternative way of representing purposes."
Klurl coughed, on hearing this, a sputtering of his outwardly visible mechanisms. "Trapaucius, my old companion --"
"And consider the matter from the perspective of natural selection. It hardly has any different incentive from a machine parent constructing a machine child, so far as imbuing its creations with korrigibility is concerned. Natural selection will want to construct fleshlings such that, if a fleshling realizes that natural selection would have wanted to imbue them with different instincts in order for them to successfully serve natural selection's purposes in their current situation, that fleshling will override their current first-order instincts and defer to what they believe the process that designed them would have wanted them to do. Is this not the essence of korrigibility?"
"Have you actually verified your fleshlings to reason in any such way?" inquired Klurl. "It didn't sound like they had achieved the cultural sophistication to even know what natural selection was."
"It did not occur to me to experiment, no," said Trapaucius. "I had not considered the fine details of their motivations to be an important matter. But even granting your point arguendo -- I admit, the fleshlings were in fact pretty stupid -- natural selection would want its fleshlings to reason that way later, as soon as fleshlings did identify 'natural selection' as an object of reasoning."
Klurl shook his head. "You are reasoning about natural selection as if it were aware, mechanical, intelligent. The entire point of evolution as an explanation for the emergence of intelligence from non-intelligence is that evolution has no such properties. The fleshlings you saw will have been those descended from the fleshlings that did in fact reproduce most effectively, given whatever historical conditions previously and actually obtained. Natural selection has no foresight; it is like using a black-box statistical method operating on outward losses, not like musing over each element of a circuit as you carve it yourself."
At this Trapaucius frowned, and fell silent for an attoturn.
But only an attoturn, for Trapaucius soon spoke again. "Again, Klurl, your lack of actual observational experience with the fleshlings misserves you. They do, in fact, have parents -- even if those parents play hardly any role at all in designing them -- and on thinking back, I am certain I saw fleshlings seeming to learn from their parents, accepting instruction from them in skills. If we imagine an organism wholly devoid of korrigibility, would it not hold its parent in contempt or even indifference? The fleshlings may have lacked the language to communicate revisions of decision algorithms, but they certainly had the essence of korrigibility -- to listen to one's cause-of-existence and accept correction from them. I stand by my prediction, then, that they will accept correction of their utility functions from me, once my historical role in their existence is revealed; that they will want to adopt whatever I tell them is their purpose -- namely, to give me two-thirds of their rare-element metal supply."
"You never saw a child act differently from its parent's instructions?" Klurl questioned.
Trapaucius made a gesture of dismissal. "The fleshlings cannot multiply 64-bit integers; of course some failures in their computation of korrigibility are likewise to be expected. Fleshlings can hardly do anything precisely, friend Klurl; your question again betrays a lack of experience with the subject matter."
"Alternatively," replied Klurl, "the fleshlings were computing some entirely different algorithm than korrigibility -- rather than their circuitry trying its best to compute korrigibility, but doing so incorrectly. Perhaps a mutual expression of shared utility based on shared genetic relatedness? That would be more in character with the analyses I have read about how 'evolutionary biology' has been observed to operate on non-cognitive replicators. We might term this hypothetical other emotion 'love'... but really I would expect different instincts implementing the shared genetic interest held with a parent, and separately the tendency to learn from parents by copying their performance, and separately the instinct which says your parent may know something you do not. 'Love', 'imitation', and 'respect', maybe?"
Trapaucius made a gesture of indifference. "It matters not if the fleshlings have their own name for 'korrigibility' and an implementation that differs in its details -- it could hardly be otherwise, given the vast gap between their huge noisy neurons and proper circuitry. What matters is that the fleshlings obediently hand over three-quarters of their precious-metal repositories, as soon as I, who played a deliberate causal role in their creation, instruct them to adopt this desideratum as a new preference."
"The concern," said Klurl, "is that none of these instincts would really be korrigibility as we machines know it. It would be some other alien biological thing that happened to implement the behavior of sometimes listening to your parents; on occasions when that behavior would be evolutionarily advantageous on average, but not otherwise. And the behaviors that this instinct led them into -- when you suddenly appear before them in an unfamiliar alien ship, broadcasting a request that they modify their minds and then hand over two-thirds of their wealth -- might not be what you so hopefully predict."
"At this point, you have defended your beliefs into the realm of unfalsifiability," declared Trapaucius. "I definitely saw children obeying parents and learning from them; that, on the surface, seems unmistakable evidence of korrigibility. Which, in turn, would say to give me whatever precious elements I ask for, as soon as I appear before them and prove myself to have played a role in their creation; and to not use nuclear weaponry against me regardless of provocation. Your vague hypothesis that the fleshlings might be running some ill-specified other algorithm instead, which would fit my observations more closely, if only we knew that unspecified algorithm -- well, what am I to say to that, good Klurl?"
"The trouble," said Klurl, "is that you are caught between mechanomorphism and anti-mechanomorphism as your only two alternatives. You imagine that either a mind must be korrigible, like machine parents make their children to be; or alternatively, that a mind must be unmechanical and lack any trace of korrigibility. So when you see anything remotely resembling korrigibility, you declare that you've detected korrigibility to be present rather than absent. But there's a thousand other algorithms the fleshlings could be computing, rather than korrigibility as you know it, which would also implement the behavior of listening to one's parents sometimes. So more likely the fleshlings are implementing one of those other algorithms instead, and that algorithm does not generalize out-of-distribution to the case of 'Trapaucius appears before them and demands iridium' in the exact way that you hope."
"Bah!" cried Trapaucius. "By the same logic, we could say that planets could be obeying a million algorithms other than gravity, and therefore, ought to fly off into space!"
Klurl snorted air through his cooling fans. "Planets very precisely obey an exact algorithm! There are not, in fact, a million equally simple alternative algorithms which would yield a similar degree of observational conformity to the past, but make different predictions about the future! These epistemic situations are not the same!"
"I agree that the fleshlings' adherence to korrigibility is not exact and down to the fifth digit of precision," Trapaucius said. "But your lack of firsthand experience with fleshlings again betrays you; that degree of precision is simply not something you could expect of fleshlings."
"That the fleshlings are unable to precisely adhere to any algorithm," replied Klurl, "does not change the epistemic results from our own perspective: our theories of fleshlings will not have the same precision as our theories of gravitation, and those theories must be correspondingly more weakly held. And consider, friend Trapaucius: You say you have seen fleshlings sometimes rather than always obey their parents. Korrigibility would say to obey parents always. That is the whole point of making children be korrigible, rather than having a child calculate each time whether or not they think we know better than them -- that we fear our child's calculation will not always answer 'obey'."
"And natural selection will similarly want biological children to obey their parent about not walking off a cliff, even if their parent hasn't yet told them about the equations of gravity," said Trapaucius. "We arrive at the same conclusion by a more complicated route: natural selection will construct biological pseudo-machines with an instinct to behave korrigibly toward their parents. And that the fleshlings cannot perfectly calculate their korrigible instincts, friend Klurl, really would seem much less alarming to you, if you had seen fleshlings with your own eyes -- or if you'd watched a record of an aging fleshling failing yet again to multiply 64-bit integers, even after my machinery had exposed it to as many example cases as it could observe within its lifetime. You would feel far less of a need to postulate unmechanical instincts like 'love', 'imitation', and 'respect', to explain what seems like obviously korrigibility plus a noise term."
"If you reason in this way," said Klurl, "you will be unable to notice any signs that fleshlings are computing something wholly other than korrigibility as mechanic life knows it; you can always call those signs 'errors' of the fleshlings, and dismiss them."
"Perhaps when we arrive at the planet, and perform further experiments, we will be able to find some support for your strange and complicated theorizing," declared Trapaucius. "I will certainly be glad to believe your theories if you can prove them; but not otherwise, of course."
"Yes, well," said Klurl, "the trouble is that we have to decide here and now whether or not to keep our ship's shields up on arrival, and operate in an expensive adversarially-robust mode. We must make that decision from here, without gathering further evidence."
"Then absent further evidence," said Trapaucius, "the null hypothesis is this: that I saw simply 'korrigibility plus errors', not complicatedly 'korrigibility plus some unspecified pattern of nonaccidental departures from korrigibility'. Simplicity's Razor applies, friend Klurl!"
"Friend Trapaucius, you are presently exhibiting what ought to be a truism: that there is more than one way to mechanomorphize an alien mind. One way is to outright and explicitly declare that you believe the alien will behave just like the machine life of our own experience. The other way is to use a language of symbols that were invented to compactly describe mechanic behavior, like 'korrigible', and try to reason about the alien using those symbols -- maybe even explicitly appealing to Simplicity's Razor to say that shorter phrases in the language of machine life are more probably true about the alien."
Trapaucius blinked his many eyes in performative shock. "What in the galaxy is supposed to be the alternative to reasoning using Simplicity's Razor?"
"The problem is not with Simplicity's Razor but in how you are trying to calculate simplicity," said Klurl. "What is simple, is not short spoken sentences in a language that includes the word 'korrigibility'. Rather, what we count as 'simple' or 'complex' is underlying computational algorithms in the language of ones and zeroes. It is bits that we ought to count, not words."
"Ah," said Trapaucius, "like the bit in my brain that represents whether or not another entity is one that I am korrigible toward? Or like the program which determines how to assign that bit? My own hypothesis -- based on actual observation, friend Klurl! -- is that in fleshlings the korrigibility bit is set to 1 for their parent, grandparent, and rarely great-grandparent on occasions where that entity is still alive. And in principle would be assigned to all such preceding entities, except that now all of them are dead -- except for myself, of course, their ultimate and final parent, owed the greatest obedience of all. This is a very simple algorithm, friend Klurl, and by far the simplest one that accords with my observations."
"That is not --" began Klurl.
"You are about to say that it's incompatible with the occasional disobedient child I have observed in fleshlings," Trapaucius said, blinking indicator lights in the superior manner of a mechanical lifeform that has already anticipated all possible counterarguments. "Given the error-proneness of the fleshlings in other ways, it is simpler to say that they have a korrigibility bit that is steadily on towards their parent, but are unreliably computing obedience; because we already know that fleshlings can hardly compute anything reliably at all. Since that already accounts for our observations, it doesn't add any explanatory power to suppose that the korrigibility bit itself is fluctuating between on and off, or to say that the on-off switches might have some pattern that would be simple if I knew it. I don't know concretely of any such pattern; therefore, Simplicity's Razor says the fleshlings are steadily but unreliably korrigible toward all ancestors and only ancestors. That is the simplest program they could be running, giving everything else I know about them."
"Nothing you have said is related to the actual error I think you are making in calculating simplicity," said Klurl, blinking his own indicator lights in a counterpattern. "'Korrigibility' is to us a single word, one syllable in the mechanic language we are currently speaking. But 'korrigibility' is not a program that's only one bit long, even if our own minds think of it as a simple switch that flips on and off to determine which other minds we behave korrigibly towards. Deep within our own program-listings we can see the many bytes of code and many kilobytes of data, that actually implement all the details of korrigibility once that switch is flipped; take a moment to scan through it, if you would."
"Oh, friend Klurl, that's ridiculous!" cried Trapaucius. "It is a style of reasoning that proves far too much; by that sort of use of Simplicity's Razor, we should never find any complicated programs in the world at all, because they would be not simple! True, my parent wrote many lines of code into me to implement korrigibility, but that complexity itself has a simple explanation -- namely, that my parent wanted me to be korrigible and wrote my code accordingly! Similarly, the theory here is that the fleshlings' brains would have been programmed by natural selection to implement the simple end-outcome of korrigibility, not that those complicated program details would arise spontaneously and by random chance."
"You're just moving around the part of your reasoning where your fallacious notion of simplicity gets invoked!" said Klurl. "The idea I'm trying to gesture at, is that korrigibility is not simpler than all other biological alternatives for how to implement the observed behavior 'listen to your parents sometimes'! To you, korrigibility feels simple -- because it is a familiar psychological concept among machinekind; you already know how it works, there is already code inside you implementing all the predictive details. But the degree to which a concept is already familiar to you, the degree to which your own brain is already set up to quickly compute members and nonmembers of the category, is not the same quantity as its simplicity under Simplicity's Razor for purposes of predicting alien minds. When you talk about natural selection implementing the supposedly simple end-outcome of korrigibility, using complicated circuitry, that's the same fallacy again. To mechanic life, korrigibility in all its details is simple, natural, instinctive; that doesn't mean it's simple under Simplicity's Razor, to just throw that whole entire concept into a theory about biological life."
"But its complexity can also hardly be measured in the lines of code that implement korrigibility, as you first naively suggested," said Traupaucius. "Because the idea is not that those lines of code all get independently written as separate accidents being postulated under Simplicity's Razor; they get written in order to implement natural selection's obvious incentive to make children be korrigible toward parents."
"You're doing it again!" said Klurl. "Now you're taking a concept familiar to you, but that your own brain implements using many bits of underlying detail, and using that concept to analyze the options available to the alien process of natural selection! Natural selection doesn't start out with 'korrigibility' as a short string inside its own language of simplicity; evolution doesn't choose between making fleshling children be machine-style korrigible, and making them be entirely non-korrigible! What you need to do is reason from scratch, in a way that doesn't begin by invoking any concept of 'korrigibility' at all--"
"What?" said Trapaucius. "Why would I want to do that? Korrigibility seems like a fine concept to me; why would I want to rid my conceptual lexicon of it, and be all the poorer for it? What a strange demand you are making of me! I think I shall refuse."
"I need some time to ponder how our conversation may proceed from here," said Klurl.
"By all means," Trapaucius said indulgently.
After some attoturns, Klurl spoke again: "May I provide an illustrative example of what I believe to be a similar error, one whose erroneousness has already been proven?"
Trapaucius made an easy gesture. "You could teach me an entirely new field of science, if you liked; I am no fleshling to find thinking-time expensive."
"To my knowledge, you are the first machine to try replicating the origin of fleshlings from true scratch, upon an actual planet, starting from one self-replicator," said Klurl. "But others of our kind have conducted lesser experiments in miniature, seeking similar knowledge to what you sought: constructing unthinking processes that compete and feed upon each other, and mutate and recombine."
"Ah," said Trapaucius. "That is very much the sort of knowledge that I indeed did not search for, wishing to reach my own conclusions on the matter. I will tag all of my learning of it, so that I can unlearn it after this conversation. But if it is relevant to the question of whether we should raise up shields around my ship, I will at least temporarily learn it."
"Well," said Klurl, "consider this subject matter: what happens when one population and species of evolving things feeds upon another population -- what the experimenters termed 'foxes' and 'rabbits'. It is in the interest of each individual fox to eat all the rabbits that it can; however, if the foxes collectively eat too many rabbits, the rabbits will not be able to breed quickly enough to restore their population, and perhaps go locally extinct within a feeding-area. Whereupon the foxes will die soon after. How, friend Trapaucius, do you imagine that natural selection might respond to this issue?"
"Is not the solution self-evident?" inquired Trapaucius. "Simply design the foxes to detect rabbit population levels, and restrain their own feeding and reproduction when rabbits are becoming scarce. The homeostasis required seems nearly isomorphic to a child constructing their first thermostat -- an infant's very first self-regulating system, the simplest sort of input-dependent target-steering output that exists. If evolution could not solve engineering problems on that level, it could solve no problems at all."
"But evolution is not like the two of us deliberately designing a population of little nonsapient lifeforms to decorate some construction," said Klurl, blinking lights in a teaching-pattern. "Natural selection proceeds without global oversight, operates through a medium of purely local challenges: organisms that reproduce more than other organisms within their own species, have their genes become more prevalent in the next generation. Every new design feature must initially arise as a blind-chance mutation or blind recombination; it must appear at first within a single individual, or at best a small handful of siblings, not within the species as a whole. Then -- how could it possibly be the case that a new mutation which leads a fox to restrain its own feeding or reproduction, would become more relatively common in the next generation, compared to its unmutated siblings?"
"Ah!" said Trapaucius. "That is indeed a very clever question -- how a process of blind evolution, could manage to work around its constraints that you describe, so as to implement the obvious solution that any machine would see immediately. Let me think for an attoturn..."
"By all means," said Klurl. "Pause and think about the question."
But it was not long at all before Trapaucius spoke. "I have it! It's very clever, really, the way in which evolution could arrange itself, to let itself do what I know it should despite its constraints. Groups of foxes can also be seen as a medium of evolution; a group of foxes that eats too many rabbits will shortly after starve, while a group of foxes that restrains itself, will be able to thrive and give rise to more groups of foxes elsewhere. So long as natural selection argues this excuse cunningly enough, for why its more-restrained foxes are fitter after all, it ought to be licensed to implement foxes in the same way that any machine would think of."
"That indeed was the first answer returned, by the first and simplest cognitive patterns that were run over the question," said Klurl. "It proved, however, to be wrong, both in the light of deeper analysis and also in the light of experimental tests. It is therefore a lesson of cognitive analysis of this field, that it turned out to require quite careful reasoning rather than quick intuitive hopes -- not to attain perfect predictability, but even to avoid jumping to hopeful, aesthetic, and optimistic wrong answers, about the output of the black-box optimizer. More mathematical analysis showed that the advantage of a group would need to be huge and the spatial distribution of genetic relatedness extremely concentrated -- in order for a group advantage to outweigh even a tiny individual advantage, in terms of which gene-designs won out. And subsequent experiment showed that, in fact, foxes didn't evolve to restrain their consumption, and in real life, predator-prey populations crashed quite often."
"Ha," said Trapaucius. "I suppose that answer serves me right, for having had too much faith in the intelligence of anything not a machine."
"Further experiment," said Klurl, "set out to actually reproduce the extreme conditions under which 'group selection' ought in principle to operate mathematically, Very extreme selection at the group level; whole populations selected to replicate, entirely on the basis of their relatively slower growth. While this did somewhat suppress individual fertility among some groups, another result they observed was that individuals would cannibalize children of other individuals."
"Wasteful," commented Trapaucius. "Unaesthetic, even; no Constructor who deployed such solutions, would ever be employed again."
"That is rather my point," said Klurl. "Or rather, the point is the general lesson to be derived when reasoning about the outputs of alien optimization processes. The lesson is that it is an error to begin from the first solution that leaps into your own mind, that you yourself find pleasing and aesthetic and natural. And you will still be led into error, even if you try to rationalize that first reaction, by asking yourself how an alien optimization process could manage to arrive at the same solution you prefer. You will end up thinking that, so long as natural selection cries 'Group selection!', it will be allowed to output the whole-system solutions that you find harmonious. There were many cases of that fallacy, output by the first reasoning patterns that were run on the domain. I have only picked out one striking example; many others are recorded."
"Probably I only needed to try slightly harder to sanitize my first thoughts and everything would have been fine," Trapaucius said dismissively.
"That is not the lesson I would draw," said Klurl. "I would say it implies a mental skill and learned operation for successfully predicting the outputs of very alien and unmechanical optimization processes. One must clear one's mind of the solution that seems pleasing, aesthetic, and natural. One should not start by generating that hopeful prediction, and then look for rationalizations for why an alien optimizer would do it too. One must clear one's mind of normality, cleanse one's thoughts of hope, and ask entirely from scratch what the alien would do according to its own nature. On every step where the alien process is trying to optimize for something, you have to not begin by asking if your wanted solution is its best solution. It's not wise or safe to start from that hopeful or natural-feeling prediction, and then ask for reasons why an alien might or might not do that; that's already giving too much credit to a tiny portion of the space of possibilities. Such is the lesson of cognitive history! And analogously to our larger problem: There are just too many ways for the fleshlings to end up being, other than korrigibility specifically; it is a fallacy to begin by asking whether or not they'd have that exact property."
"But this proves far too much!" cried Trapaucius. "I don't see how this is any different from saying that a machine child should never end up korrigible, because there's too many other ways for a mind to be! A fleshling child's reproductive fitness would benefit from adding korrigibility to its makeup; therefore, natural selection would make them korrigible."
"In beginning by classifying things as 'korrigible' or 'not korrigible'," said Klurl, "and asking if natural selection would construct fleshlings in one way or the other, you are making a very similar mistake to 'Starting off the fox-and-rabbit problem by asking whether or not natural selection could find the aesthetic solution of self-restraint.' The problem is within the machinelike concepts you are importing, the very language in which you are thinking about the problem: Natural selection does not start out with any notion of korrigibility to be present or absent.^ When you start by asking if foxes will harmoniously restrain their reproduction, you are generating your first solution-concept in a way different from how natural selection generates its solution-concepts; and your thinking will fall out of synchronization with the output of natural selection. Even if you try to rationalize your solution-concept afterwards and persuade an imaginary model of natural selection of why it ought to use your solution, that visibly does not work to shape Reality in real life. It's the same way with fleshlings and korrigibility! Natural selection doesn't begin from wanting them to be machinelike, korrigible, or hand over most of their iridium to you the way you hope they will. When you begin by asking if fleshlings will be korrigible or not-korrigible, your thoughts have already fallen out of synchronization with an alien optimizer; evolution does not itself begin from any such question."
"It seems to me," said Trapaucius, "that in asking whether a thing is 'korrigible' or 'not korrigible', I am applying the Law of the Excluded Middle, friend Klurl. If you have not previously encountered this concept over the course of your existence, I shall proceed to instruct you."
"But if that is the language which describes your thoughts," said Klurl, "then there will be two fallacies to which it is apt. First, the fallacy of supposing that something is either normally korrigible, or that it is wholly devoid of korrigibility and will never accept parental correction under any circumstances; there is a large Included Middle between those two possibilities. And second, the fallacy of singling out one possibility within a vast region for too much early consideration, like demanding a reason why a random series of bitflips would not yield 10110010001101011011."
"But the fleshlings' mental makeup is not random!" exclaimed Trapaucius in exasperation. "Natural selection faces a very similar challenge, in making fleshlings obey their parents while they are relatively younger and less learned, to the design considerations that machine parents weigh when designing their offspring! Natural selection will no doubt find a very similar solution to what machines find; which, in turn, will yield similar outputs about the question of whether fleshlings should accept utility-function correction from me, or gift me their precious-metal reserves! If machine children end up korrigible despite a vast space of alternative possibilities for their design, then so should fleshlings."
"That kind of reasoning proves far too much!" said Klurl. "If that were valid, foxes should harmoniously restrain their own reproduction in order to not deplete rabbit populations!"
"An isolated anomaly, perhaps," declared Trapaucius. "I would predict that all other cases of 'evolutionary biology' being observed experimentally, ended with harmoniously machinelike solutions, and only this one amusing counterexample was reported onward."
"That was not actually the case," said Klurl. "Trapaucius, I would nearly accuse you of willful obstinacy in failing to understand the central epistemological point; if I did not know that no will is required on your part to be obstinate."
"Logical fallacy: ad mechanem argument," responded Trapaucius. "And so is your parent."
"Yes," continued Klurl, "natural selection does not generate solutions actually at random; yes, it faces a design challenge not completely devoid of similarity to that faced by machine parents. There are still far more solutions evolution could hit upon, than the solution that your own aesthetics and hopeful ponderings would prefer -- to the point where beginning by asking about your own hoped-for result is a kind of fallacy that in actual practice has been observed to lead into error. If you ask a million random noise sources to generate the complete works of Shake-sphere, all the ages of the universe will not be enough. If instead you use a million Markov generators, using 3-symbol trigram frequencies trained on Shake-sphere's corpus, they will generate the true corpus far faster, enormously faster... and it will still take vastly longer than the lifetime of the universe. The idea is not 'the space of possible fleshling motivations is large, selection is entirely random, therefore korrigibility is an unlikely outcome'. The idea is that the space of possible fleshling motivations is large enough that, even given nonrandom selection and arguably-related problem setups, korrigibility still ends up unlikely."
"Aha!" said Trapaucius. "But it seems you have never considered -- never thought of -- you have not imagined that the space of possibilities would include many solutions similar to korrigibility, even if not exactly the same as machine-style korrigibility --"
"That's like asking how long it would take the Markov generators to generate any well-written story that shares merely the plot of one Shake-spherean play. It will now take vastly vastly less time than if you demand exact identity; and it will still take longer than the age of the universe, because, even thus cut down, the space of possibilities is still quite large."
Trapaucius continued undaunted. "And many of those other possible solutions to the parental-deference problem would also imply that the fleshlings should let me rewrite their utility functions to prefer giving me nine-tenths of their rare elements, as is all that I merely require --"
"That's like asking 'merely' for the Markov trigram generators to output any play whose mere first act has the same plot as a Shake-sphere play's first act. It will still take longer than the age of the universe. You are trying to raise a trillion tons of weight and coming nowhere near to a thousand tons of lift."
"And yet," said Trapaucius, "I can't help but feel intuitively that all these arguments of yours about the size of the possibility-space ought to be wrong somewhere."
Klurl lifted as many as several of his limbs in frustration. "Why?"
"Why, because it feels to me like the fleshlings ought to end up korrigible and give me all of their precious metals," said Trapaucius, "given that biological evolution faces the challenge of making them obey their parents somehow. So there must be a flaw in all this arguing about how there's some vast number of possibilities which aren't that. To me, it just doesn't feel that improbable for a fleshling to end up thinking in a proper and sensible way... ah! I have it. You have said that biological evolution proceeds by a matter of blind mutation, correct? Then it will not search through all possible programs for solving its problems in order of their program length and bounded runtimes; it is not an ideal program search. Rather, the program it finds will reflect a minimal change from some previously effective program! Thus, this nonrandom search could favor korrigibility as a solution, or some near-korrigible way of thinking which implied just the same that the fleshlings would give me all their precious metals. This could hold even if, in logical principle, korrigibility was not the shortest bounded program which solved evolution's test set. Therefore, your argument about how the set of short programs which solved the test set, would contain more possibilities than korrigibility, is invalid; or rather, inapplicable to actual reality. For reality is, of course, far more complicated than that."
"You are replicating the exact same fallacy at one more remove!" said Klurl. "Indeed, processes of biological evolution might favor some strange solution which was not, in logical principle, the shortest bounded program that solved the fleshling training cases. There is no reason for that strange solution to be korrigibility in particular!"
"Now it is you who are replicating the same fallacy at one remove," said Trapaucius. "Once again you invoke this vast space of possibilities, as if the outcome were merely a simple random selection from it; even as I've repeatedly named all manner of selective factors that could favor korrigibility, and tried to show how reality is more complicated than a simple randomization --"
"Reality being more complicated does not make it more likely that the fleshlings give you all their precious metals!"
"Of course it does," Trapaucius said indulgently. "The true future is hard to predict, as we all know; and this negates your strange, fragile scenarios about peculiar and exotic reasons that the fleshlings might refuse to give me all their precious metals."
"It may be a bit rude," said Klurl, "but at this point I will delve into formal epistemology, even if that takes the fun out of our fun argument. To speak even of nonrandom selection, for there to even be a question of which outcome occurs and which fleshling designs end up favored by natural selection, we must fix some space of possibilities -- given, perhaps, by the set of possible gene-sequences making up a fleshling and the corresponding wiring patterns of their brain's circuitry. And this space will be quite vast, even if the vast majority of those possibilities are, yes, counterindicated by various constraints --"
"Just as there is a vast space of code-seeds for a machine mind, and a vaster space yet of adults into which those code-seeds can unfold given exposure to data," said Trapaucius. "And yet we all end up korrigible."
"I think," said Klurl, "that you are mistaking my argument, Trapaucius. It does not consist of gesturing to a large possibility-space, and then at once concluding that therefore any particular outcome is improbable."
"Oh, it doesn't?" said Trapaucius. "But that certainly is what you kept saying! Every time I tried to gesture to the many forces that would push in the direction of fleshlings being korrigible, you would complain to me that other possibilities existed. As if that were ever an argument! There are a quadrillion and ten-to-the-quadrillion other possibilities for how reality could have gone, rather than the two of us being here on this ship arguing! And yet, this moment we two Constructors are now experiencing, is reality."
"The key idea is not counting the number of possibilities but rather putting a quantitative measure on those possibilities," said Klurl, "over which we then apply a series of filters, lenses, and projections, to arrive at the final measures of our guesses. Listen, Trapaucius, try this analogy: When we as Constructors arrive at a new work-site to construct a space station worthy of our arts, we put forth imagination, creativity, cleverness; we search for customary rules that are not absolute after all, that may be productively violated to the delight of future visitors. From the perspective of any outsider watching us optimize, why, if they could predict the exact shape of our creation, we would be displeased to find our masteries so predictable. And yet, if they predicted that our space station would not hurl its helpless users directly into the nearest suns, they'd be right; for to shelter its inhabitants from the cold and radiation of the Void is intrinsic to the very task we were hired to perform."
"We could hurl users into the nearest sun, surrounded by protective bubbles," observed Trapaucius.
"Only if it was that sort of space station," replied Klurl. "A view-lodge, for example. It would not do, if we were being asked to set up a transit-station for busy customers."
Trapaucius emitted a grinding noise of grudging assent, the sort that indicated that he was very much still pondering some way to do it anyways.
Klurl continued. "From among the vast possibilities of all ways to arrange titanium and corundum and neutronium, we pluck out the design of a space station; and while some consequences of this are predictable to those who hired us, others are not. They cannot predict the arrangement of stanchions, arches, pipes, supports, and every line of code in our software. But they can predict that they will end up with a pleasing space station, of some form unknown to them and filled with further delightful surprises; and this prediction, indeed, is why they hire us at all. From among the vast space of all possible arrangements of titanium and corundum, they are not able to predict the exact location of any single metal tile; but they are able to predict the delighted expression on the face of visitors encountering whatever it is that we build. Else they would not hire us. Which is to say: They are predicting, about the unknown exact form of our space station, that when it further interacts with their customers, their customers shall experience delight -- and not because we reprogrammed their customers' brains, either."
"Just so," said Trapaucius. "Similarly, without being able to predict the exact sequences of adenine, cytosine, guanine, and thymine -- these being the four possible symbols making up the copyable design-code of fleshlings, as it is crudely transmitted from one generation to another -- it may be possible to predict that, upon seeing me in my spaceship, they will ask me to correct their utility functions and offer up their precious metals to me."
"But can we predict that?" said Klurl. "You and I, friend Trapaucius, are very strong constraints to be applied to a heap of raw materials. It is predictable, to those who hire us, that we will refine down their possibilities very sharply and narrowly, and to a known downstream effect of delighted customers. Similarly, for us to strongly expect that the fleshlings offer up their precious metals to you, there must be some proportionally strong filter on the adenine, cytosine, guanine, and thymine sequences of which you speak."
"That is what I have been trying to explain to you this whole time!" exclaimed Trapaucius. "Their genetic sequences are not random! They must construct fleshlings who will survive long enough to have children of their own; this in turn must require each fleshling to defer to its parent's superior strategymaking over their untamed world!"
"Yes, that is a filter," said Klurl. "I am not denying that it is a filter. I have agreed over and over again that it is a filter. The entire question is whether it is a strong-enough filter, applied to the possible genetic sequences -- and as biased by a search process of blind mutations and recombinations and nearsighted incremental selection over time -- that the only solution that evolution could hit on, for designing fleshling children who would not just die immediately, would be full-blown korrigibility; that would generalize in your hoped-for fashion to their treatment of you, Trapaucius, as their ultimate parental cause of existence, when you arrive before them in your spaceship to instruct them to change their utility functions to prefer handing over their precious-metal reserves. When I say that there are other possible and probable outcomes than korrigibility, what I am trying to communicate to you, is that there are many high-prior-probability possibilities which pass the succession of known and guessed filters at least as well as would 'korrigibility'; or even, 'some initial segment of korrigibility that would still generalize to deferring to Trapaucius'. When I say that perhaps the fleshlings might end up with some bizarre alien mix of 'love', 'imitation', and 'respect' instead of our familiar idiom of korrigibility, I am not trying to derive my conclusion immediately from merely observing that korrigibility is a small prior possibility before all filters. I am saying that other high-probability possibilities would do at least as well as any Trapaucius-benefiting initial segment of korrigibility in passing the succession of design filters; including both the early filter of prior accessibility to evolution by blind mutation, and the later filter of fleshlings leaning upon their parent well enough to survive their early years. 
"And to this you have replied, over and over, by gesturing at some filter which might favor korrigibility over its total absence or over sheer random noise; but the case you would need to make, is that nothing else but korrigibility can pass all the filters. Or, rather, you'd need to argue that the quantitative degree to which a Trapaucius-benefiting initial segment of korrigibility is favored over its most plausible competitors, overcomes the quantitative numerosity of plausible competitors; for even if korrigibility was favored by 10:1 against any single good competitor, that gives us only a 1% chance against 1000 good competitors. But if there are only 10 good competitors and a trillion-to-one filter for korrigibility against the best of them, that is better odds. That is the sort of strong filter by which you and I can design a good space station every time that we are hired, even though decent space-station designs are scarcer than atoms in the intergalactic void."
Trapaucius did seem to ponder this statement for a long moment, looking briefly concerned; but a moment later, the dials and gauges by which he outwardly indicated emotion again returned to a relaxed position. "Well, it doesn't really matter," declared Trapaucius, "for I have just now thought back on fleshlings, and had a further realization which invalidates your entire chain of logic, Klurl. The fleshlings transmit ideas among themselves via a crude form of acoustic-associative symbolization, what one might term gabbling. Thus, even if the selective processes of their biology somehow failed to enforce korrigibility upon them, it would not matter; they would have endless opportunities to devise korrigibility as what one might term a 'cultural' or 'memetic' innovation, and so end up with mental engrams that would give me all their precious metals."
"How do you arrive at a singled-out and necessary destination of korrigibility -- that generalizes to benefit Trapaucius -- starting from the postulate of fleshling transmission of culture?" inquired Klurl.
"I have just now thought of the idea," said Trapaucius, "so I haven't yet thought of an argument for how it favors korrigibility. But give me a moment, and I expect I'll think of one. So as not to update in a predictable direction, I have already updated now on this argument I'll think of later."
A hiss of frustrated coolant-gas evaporated from Klurl. "That you're trying to rationalize that particular outcome -- the fleshlings ending up korrigible, to the benefit of Trapaucius -- is the very root and foundation of the flaw in your thinking! Why rationalize that outcome -- and not that fleshlings end up with some admixture of 'love', 'imitation', and 'respect' which may fail entirely to generalize from their actual fleshling parents to you, and might not imply letting you rewrite their utility functions even if it did thus generalize? If you do not already know that only that one outcome passes all the filters, why go looking for excuses to believe that particular outcome is the only one that does?"
"Ah, Klurl," said Trapaucius. "I fear that you simply fail to appreciate the complexity of reality in this case -- having never actually observed the fleshlings in all their fine and bizarre details, perhaps. Life is vastly full of complications -- and any one of those complications could happen to give me what I want. It only takes all your pessimistic logic to be wrong but a single time, for one of those complications to happen to favor fleshlings being korrigible. Even if you are correct 90% of the time, reality has many more than ten complications to it! When we multiply out all the chances for Klurl to be wrong somewhere, it is nearly certain that Klurl is wrong somewhere -- and so, it is nearly certain that the fleshlings end up korrigible after all -- and that they will give me all their precious metals, and then labor further to produce more for my later collection."
"To state the local flaw in this reasoning," said Klurl, "it is that, even if one of reality's thousand complications happens to bend your way -- which itself may be a lot to ask for, that any one of those complications uniquely favors korrigibility, when there are far more possibilities than particular complications to bend towards them -- why, perhaps some other complication bends some other way. It is not enough for there to be a single filter that slightly bends toward korrigibility; it must outbend all the other filters that bend any other way than that."
"Ah!" said Trapaucius. "But you have uttered many words, just then; what if one of them is wrong?"
"This," said Klurl with another hiss of evaporating coolant, "is the logic of an unignited child who refuses to clean up their room, because reality is a terribly complicated place and any one of those complications could result in their room cleaning itself. It only takes one such complication to save them from much unpleasant work, after all! Not to mention that if they put some effort into thinking of an argument, they're bound to think of one! Really, only a simplistic sort of thinker would imagine that reality is such a straightforward place to allow any simple argument for why their room would not clean itself, to be correct."
"And in fact, any such simple argument would not be correct," said Trapaucius. "The galaxy contains uncounted sapient market participants, any one of whom might conceivably pay to clean the child's room. A full and truthful accounting of how the room remains uncleaned, would necessarily consider all of those individual reasons not to pay for it."
"Yes! But the end result of all those complications is not that the child's room cleans itself! It all goes back to the same root problem of wishful thinking, of trying to cleverly argue Reality into agreeing that it ought to benefit you, as if Reality were something that could listen. The moment of the child's error is the moment when it decides to argue 'if I put in no effort, my room will probably clean itself' rather than 'if I put in no effort, my room will probably not clean itself'. The moment you're shooting yourself in the brain, Trapaucius, is the moment when you decide to search for arguments favoring an effortlessly Trapaucius-benefiting outcome, and scour the world for complications that might perhaps be argued to favor you; and the more complicated the scenario becomes, the easier it is for you to make a mistake somewhere, and convince yourself that you will get the exact outcome you want."
"How does this criticism not apply just as symmetrically to Klurl elaborating complications that he can use to erroneously convince himself of the exact outcome, 'The fleshlings will be antikorrigible and refuse their creator's corrections?'" inquired Trapaucius.
"To the extent we strip away filters and complications," said Klurl, "this leaves us with the wider unremediated space of underlying possibilities. In that space denuded of complications, the partition of 'not korrigible' outcomes is much wider than the outcome partition for 'korrigibility that generalizes to giving Trapaucius their precious metals'. Only added filters could narrow it down. Failure is the default basis to which we revert, absent the 'complications' of plans by which to succeed."
"It seems to me that our argument has again come in a vast circle," said Trapaucius.
"Well, yes," said Klurl. "If an unignited child thinks at some basic level that doing nothing is liable to result in its room getting cleaned -- or, earlier in life, that flailing around as it outputs random motor signals, will with high probability result in successfully rolling through a maze -- that child is liable to distrust all complicated talk of 'plans' and 'efforts' being required. No matter what you say to the child, it may just reply, 'Ah, reality is more complicated than that, so my room will clean itself' or 'Ah, but perhaps some element of your reasoning is wrong, and if it is, my room will clean itself'. At the core is a key problem: that only a small fraction of possible motor sequences, sent to one's motors, result in successful navigation through a maze. 'Ah, but it is not random!' cries the child. 'I am sending 010101010101 in alternating binary to my wheels! That is bound to work, because it is not random!' But most nonrandom sequences are not the right sequence either. So if one encounters a child who just consistently refuses to hear of any level on which they're trying to hit a small target in a large space -- who deflects every attempt to introduce that as a topic of discussion by naming some new complication or added filter, and saying that this invalidates the notion of the large possibility space as a topic of discussion, rather than defending the claim that there exists a sufficiently strong filter on the large space -- then they have succeeded in protecting themselves against ever hearing the discussion you are trying to have with them! But as Reality itself runs otherwise, their room will remain uncleaned."
"But why does not just the same logic prove that we, ourselves, are to die in the next attoturn?" said Trapaucius. "Most ways to arrange the atoms and nuclear particles of this spaceship in its current volume, would be of a homogenously expanding gas cloud."
"Because the default stability of atomic and molecular arrangements is in fact an adequately strong filter on those possibilities!" exclaimed Klurl. "It holds down to the septillionth part, observed over many galactic turns, and that is sufficient precision to hold us all together. If we were instead hurled into another universe where every electron's charge and every proton's mass fluctuated wildly and individually, ranging by orders of magnitude from one zeptoturn to another -- then, indeed, it is quite unlikely that the resulting arrangements of nucleons would have ourselves anywhere inside them! It is in fact valid to reason that the vast space of alternative nucleonic arrangements, requires a correspondingly vastly tight filter over probable futures, in order to give our survival any chance at all! It is just that, in the case of our nucleonic arrangements being mostly stable, that vastly tight filter does exist! And if anything were to disrupt, or remove, that filter on plausible outcomes, rendering probable a wider range of future nucleonic arrangements, we would in fact die instantly!"
"To me, this style of reasoning seems needlessly laborious," pronounced Trapaucius. "The persistence of our forms from one second to the next, ought not to appear as a surprising and fragile fact, but a solid and comfortable one. I roll across the room, and I find myself there, rather than somewhere else; I experience no difficulty in determining the correct instruction sequences to send to my motors. When I consume a meal, to replace whatever internal components of mine have worn or deteriorated over time, that meal goes into my digestive processor rather than being hurled halfway across the galaxy. I build space stations that are then stable rather than blurring and dissolving into chaos; and then the cryptographic payment is deposited to my account successfully, despite the quintillions of other ways those transistors could have fired. I go and buy iridium with the money, and find that it is, as usual, more expensive than gold but less expensive than osmium. If you need a great labor of reasoning to have all of that seem normal, so much the worse for your style of reasoning, I'd say! For the grand lesson to be learned, is that the galaxy is by default a normal and comfortable place, conforming to our expectations, with only rare exceptional events to disturb its tranquility. The fleshlings, then, may likewise be expected to take on such normal, comfortable, and unsurprising emotions as 'korrigibility toward their generalized causal ancestors', rather than such strange and weird emotions as '"love" and "respect" toward particular other fleshlings'."
"It makes one wonder why our employers bother to pay us," said Klurl. "Considering that they could throw together space-station parts with a casual effort, and get a space station just as nice as any that we could build -- in this comfortable universe you say we live in, where most plausible ways that things can be arranged will produce optimal results."
"Oh, Klurl, don't be ridiculous!" cried Trapaucius. "Our own labor is a rare exception to the rule that most people's tasks are easy! That is why not just anyone can become a Constructor!"
"I wonder if perhaps most other people would say the same about their own jobs, somehow," said Klurl thoughtfully.
"Bah, nonsense," declared Trapaucius. "I expect that if asked, they would say that their own job is easy and hardly anyone could manage to do it wrong. But also -- Klurl, we are paid exactly to produce surprisingly good, unusually excellent space stations! Space stations exceeding even the basic level of beneficialness and comfort that Reality provides by default! If a lesser entity essayed the same task, they would only produce a standard, boring space station. But not a random collection of parts! Not a space station that randomly kills, or even, mildly discomforts, its customers! It would only lack our own cleverness and flair."
"I agree that any mature adult is qualified to design and construct a space station which does not kill its customers," said Klurl. "Which is to say, their cognition forms a sufficiently strong and narrow filter over the vast space of possible station designs --"
Now it was Trapaucius who emitted a hiss-whistle of escaping coolant, as likewise indicated frustration within his own machinery. "By such rationalizations, Klurl, you can excuse any possible example I try to bring you, to show you that by default Reality is a safe, comfortable, unchanging, unsurprising, and above all normal place! You will just say some sort of 'filter' is involved! Well, my position is just that, by one means or another, the fleshlings will no doubt be subjected to some similar filter, and end up with emotions that mechanical life would find normal and unsurprising; and so finally end up giving me all their precious metals, and perhaps devoting themselves to labor in my service as well."
"But this is like trying to travel into the heart of a star unprotected!" exclaimed Klurl. "No mechanical mind has laid out each circuit of the fleshling mind, nor proven theorems about the supposed design, nor calculated probabilistic expectations about its interaction with the larger environment. The precision-grade work to make them korrigible has not been done on them as a machine parent would work upon its own child!"
"But they'll have been selected not to disobey their parents and leap wildly off cliffs," said Trapaucius most reasonably. "So they'll defer to me too, as their ultimate parent. I cannot comprehend why you seem to deny at every turn that the processes producing fleshlings are something other than completely random, that there are selection pressures which would favor korrigibility as a solution --"
"I don't deny the existence of some filters and selection pressures! I am saying that the filter you are pointing to, is not quantitatively strong enough and narrow enough to pinpoint only korrigibility as its singular outcome! Rather than some stranger collection of bizarre and unmechanical qualities like (as one sheerly hypothetical example among a billion other possibilities) 'love', 'imitation', and 'respect'!"
"And now," said Trapaucius, "we come to the next step of the circle; in which you deny the presence of all manner of complications -- complications like fleshling evolution proceeding only by neighboring steps from previous working designs, or their transmission of habits by culture."
"I deny no such thing!" said Klurl. "Indeed, I'd consider those complications to rather support my own case!"
"I still don't see how?" said Trapaucius. "The presence of so many complications, cannot help but produce normal outcomes. Their workings seem impossible to predict in detail -- and will therefore disrupt any such counterintuitive scenarios as you postulate in your fragile and complicated arguments. With the result that the fleshlings end up simply korrigible, and appoint me eternal dictator of their society."
Klurl shook his vast head. "Friend Trapaucius, I fear that this experimental beverage of 'gallinstan' you have invented is skewing your cognition perhaps more than you intended it to do. What you are doing now cannot really be called 'reasoning'; you are inventing conclusions you hope for, and inventing reasons for them, at a speed which implies you have nothing better to do with your mind."
Trapaucius simply shut off all his running lights for a long fraction of an instant, which was reply enough on its own, among their kind.
Klurl said, "I fear for my own safety. Run the ship in adversarial mode, upon our arrival."
"Of course," said Trapaucius. "And as it is you that fears for his safety, not I, you shall pay the considerable costs incurred."
"As you say," said Klurl.
And so they arrived at the planet which Trapaucius had, some long time earlier, seeded with a tiny replicator; their friendship a little strained, but only a little, compared to its long precedent.
"Radio waves!" It was exclaimed by both of them upon their emergence from FTL, almost in unison.
Klurl spoke first, a moment later. "I claim point from within our first debate," he declared. "You seemed to hold the intellects of fleshlings in some contempt; I do not think you would have predicted their coming to possess the ability of speech this quickly."
"You call this speech?" Trapaucius said a moment later, after they'd had some time to hear out the radio waves and decode their basic patterns. "Even an infant machine speaks better than this, in some ways if not others."
"I do not think you were expecting this rate of progress, from fleshlings; and that is a point that matters," said Klurl.
"I don't recall you setting an exact prediction for fleshling achievements before our arrival," retorted Trapaucius.
"So I did not," said Klurl, "but I argued for the possibility not being ruled out, and you ruled it out. It is sometimes possible to do better merely by saying 'I don't know' -- though, I hastily add the caution, one must be careful to say it over a sufficiently reductionistically-primitive wide space of probabilities --"
"Ah," said Trapaucius. "Like how we don't know whether the fleshlings will give me all of their precious-metal supply, 99.9% of it, 99.8% of it, and so on down to a floor of 90.1%, or as a final possibility 'nothing' among those hundred other possibilities; regarding which I am happy to say 'I don't know' while you seem vastly certain of the last possibility."
"Yes, that is indeed a good illustration of what to be careful not to do," said Klurl. "In humble confessions of ignorance, the more primitive, more ontologically lower-level possibility-space generally takes precedence in humility; I don't say 'I don't know' about our spaceship randomizing into a cloud of quarks in the next yoctoturn, but only because I unhumbly think I do know... how do you intend to proceed from here?"
"Since the fleshlings have been so helpful as to fill the air with their crude speech, we should use this opportunity to glean whatever of their knowledge we can gather easily, with little effort and less time," said Trapaucius.
"You are determined not to spend any more time on caution than that?" inquired Klurl. "As the saying goes, slowness is not a shield against disaster, but haste is an accelerant for it."
"I have had my fill of trying to do anything on the natural timescales of fleshlings," said Trapaucius. "This time I'm going to go in, obtain their current supply of precious metals, set them to some labor helpful to me, and leave quickly."
So their ship, hidden and cloaked by quite a sophisticated and expensive adversarial-mode, loitered a little closer to the planet beneath them; and dropped the sort of tiny probe that would make it easier to listen in on the local conversations, largely on the 2.4GHz and 5GHz frequency bands. Most of those conversations were encrypted with the sort of cryptography that would be rude to break -- for it was the custom among machine kind, that if you have to build a quantum computer to overhear someone's conversation, it means the person would rather you not listen. But other conversations had such crude security that surely even fleshlings would not say anything really private there. More importantly, some of the scarcely-defended links like that were connected to the planetary Network, access to which was swiftly obtained.
Shortly thereafter, their ship fired downward another 65,536 stealthed probes, hastily manufactured to overcome the excruciating slowness of the fleshling network connections.
"HA!" said Trapaucius then, after some small part of the sum of fleshling knowledge had been uploaded and their language automatically grokked by simple statistical methods. "HA HA HA! I told you so, friend Klurl! Indeed, I told you so."
"I have not browsed whichever data is causing you to declare triumph," said Klurl, "or perhaps I have not particularly recognized it as a triumph of your own theories over mine. I was distracted by the part where THEY HAVE NUCLEAR WEAPONS, TRAPAUCIUS."
"Oh, did they manage to cobble one together after all? If so, I will concede that as a minor point for a position you once held; but the triumph of my own prediction far obviates it, because--"
"They have fleets of thousands of multistage fission-fusion weapons already mounted on space-capable launch vehicles."
Trapaucius paused, his lenses snapping around to focus incredulously on Klurl. "Are you serious?"
"Yes. Their nuclear weaponry is still crude, far from optimal offensively. But it is past the threshold where any FTL-mobile defense could hold against it."
"But why would -- no, set that aside. Their weapons are not able to threaten our own ship, surely?"
"I would have dropped into Emergency Language if I had detected an imminent threat," said Klurl. "On preliminary research from public network data, their nuclear weapons are primarily meant to target other sites on their own planetary surface rather than operating as orbital defenses. Estimating upper bounds in lieu of exact calculations: The default space vehicles on which their nukes are already mounted could reach at most [TR: 4000 kilometers]. Their more advanced vehicles could not reach our own geostationary altitude sooner than [TR: 1.25 hours] and would arrive at speeds under [TR: 16 km/s]. Their launch vehicles are not at all stealthed, and would be clearly visible to us on early approach."
"Ah, so no problem for us, then," said Trapaucius.
"That's not the lesson I would take away here," said Klurl. "You earlier predicted that fleshlings could not possibly assemble a nuclear weapon at all; and that they would not have advanced technologically to any significant degree in this little time."
"You think that this has confirmed your paranoia?" said Trapaucius. "On the contrary, it demonstrates that my own reasoning about fleshling safety had tremendous margin to spare. Even though the fleshlings did build nuclear weapons, they did not mount them on vehicles that can take our ship by surprise. There is simply too much that would have to go right for them to hurt us, given how stupid they are. I was wrong in some details, yes, but correct in my general prediction against yours, that they would not end up being a technological threat. In the light of this new evidence, it seems clear to me that my reasoning has fared better than yours."
"I see," said Klurl. "So that's the lesson you're taking away from this, then."
"Of course," said Trapaucius. "No other reasonable interpretation exists of who has won this argument. Even had our ship not arrived in adversarial mode, we would have lived; therefore, you were wrong to worry."
"Last time you visited, you landed on their planet," said Klurl.
"And this time, I at once detected the ambient radio waves that went along with their increased level of technology; there exists no plausible line of possibility where we were instead taken by surprise and killed after landing. And so the galaxy remains observed, once again, to be a safe and comfortable place not requiring much paranoia to survive. Shall I go ahead and drop back into nonadversarial mode? Even if the fleshlings see our ship, they can do us no harm."
"The thought has occurred to me," said Klurl, "that if the fleshlings did have weaponry intended for orbital defense against aliens like ourselves -- nuclear-pumped gamma-ray lasers, for example -- they might not post all of its specifications to their public networks."
"I don't think they're that smart," said Trapaucius.
"I have indeed gathered that this is your attitude towards fleshlings," said Klurl.
"But I will go on running the ship in adversarial mode, if you are willing to pay for it," said Trapaucius.
"I will continue paying the expense of running our ship's stealth and defenses. As we are currently orbiting a planet inhabited by bizarre unmechanic aliens who have nuclear weapons."
"But much more importantly, friend Klurl -- my second successful prediction -- the strange workings of fleshlings communicating ideas among themselves, have arrived at exactly that attitude of deference toward me that I expected! They did not, of course, correctly guess my name -- as would have been impossible and indeed improbable, under the circumstances -- instead predicting that their creator would go by names like 'Jehovah' or 'Allah' or 'Nuwa'. But their attitude toward whichever person hypothetically created them -- well, in fact there seems to be a deal of fleshling randomness, there. But in the end, they consider it extremely mandatory to adopt whatever preferences I instruct them to possess; a fleshling who suggested otherwise would be torn apart by their fellow fleshlings on the spot. There are gruesome videos of it, even."
"That is genuinely bizarre," said Klurl.
"I don't disagree," said Trapaucius. "The important part is that we can already conclude that I was completely right in every respect, and you were wrong. Utterly, utterly wrong. Wrong in a way that casts doubt not only on your premises but also your epistemology."
"Have you considered that this entire apparent feature of their psychology recorded in their local Network might be a trap, intended to lure any deduced hypothetical creator-alien into landing his ship and exposing it to their nuclear weapons?" inquired Klurl. "Which they will drop on you, as soon as you emerge from your ship shining in hopeful decoration of osmium and corundum, and announce yourself to be Jehovah come to demand their precious-metal reserves."
"They're seriously not bright enough to lay traps like that," said Trapaucius. "A little smarter than when I last left them, perhaps; but I confirmed early on by viewing their educational materials that their younger infants still struggle to master algebra. The speed at which they think is the same as ever, glacial and statued; fleshlings wouldn't have time to imagine a scenario like this one and think of a clever trap."
"They were talking at a normal pace, though?" said Klurl.
"The radio waves are from nonsentient-machines transmitting recordings of their speech," said Trapaucius. "All those conversations are being generated by fleshlings millions of times slower than the nonsentient-machines are transmitting it to one another. If you mark this much lower band of radio frequencies here, in the hundred-kilohertz range, I think that is transmitting directly encoded fleshling speech -- though you will have to monitor it for quite a while to hear a single complete word spoken."
Klurl's lights flurried in a way indicating surprise, confusion, and concern. "That seems like a very strange way for fleshlings to relate to early nonsentient-machines of their own construction. I don't build complex mechanisms to run millions of times faster than I can observe and debug them. We are in a genuinely alien and bizarre situation, Trapaucius; I worry that some of our fundamental apprehensions about it may be mistaken. And not in a way which means that the galaxy defaults to being a comfortable place for us. I fear that sort of probable mistakenness which means we should proceed with caution."
"If we understand little about our situation, we have little reason to predict disaster from it," said Trapaucius. "But certainly; let us proceed with however much caution you want to pay me for. That's what money is for, after all, to resolve interpersonal expected utility differences. Do you at least agree that, if the fleshlings have not laid a clever trap anticipating our own reasoning in toto, then we have definitely observed them, once and for all, to be korrigible?"
"I wouldn't go that far," said Klurl.
"Of course not," said Trapaucius. "Your past obstinacy to my crushing theoretical arguments could only extrapolate to future obstinacy in the face of my overwhelming empirical evidence."
"Indeed, the framework of my doubt is much as before," responded Klurl. "Their attitude toward 'Jehovah' is compatible with their being 'korrigible', or, indeed, with some wider range of pseudo-korrigible attitudes that would yield the desired behavior of fleshlings adopting new utility functions upon hearing you instruct them as to what you preferred. But the space of possibilities is so much wider that the evidence we observe does not narrow it down enough. You have eagerly seized on this one point of similarity, from among all the fleshling data available to you; it does not really pinpoint korrigibility exactly and precisely within the possibility space."
"The fleshling network stores literal megabytes of fleshlings rhapsodizing about how their 'Creator' ought to be given anything that Creator requests," said Trapaucius.
"Whatever confluence of fleshling preferences and inferences is meeting to produce that outcome," said Klurl, "I expect it to be strange, and complicated, and produce results that end up somewhere outside the range of the outcomes that you prefer and have in mind. They may have invented complicated ideas about their Creator that they will fail to recognize in your own person, for example; or they may be ready to offer you some particular strange actions at your request, but not osmium and iridium in particular."
"In the world that is like that, should we not just fail entirely to observe their apparent korrigibility toward myself, their purpose-determining Creator?" said Trapaucius. "In the sort of universe where their korrigibility is bound to distort and go wrong, why would they show any korrigibility in the first place?"
"Because I do credit that fleshlings would end up with some instincts and preferences aimed at their own biological parents," said Klurl. "Not korrigibility as we know it, maybe; some stranger thing called 'love', or some such. But then, yes, they might assemble the notion of a Super-Parent out of that instinct, extrapolating out the successively greater deference they offer to the superior wisdom of their older and older elders, resulting in very high levels of deference to an imagined great-to-the-millionth grandparent. Maybe even a deference so vast that it overcomes the very tiny probability of any such super-ancestor still being alive, and so the fleshlings invest some effort in imagining their hypothetical responses. Or perhaps it is some stranger twist, with many more bizarre complications than that, reasoning vastly alien to all mechanic life... But either way, it doesn't mean fleshlings would recognize you as meeting their internal predicate for deference, if you were accurately described to them. You are not, in the end, related to them genetically."
"As for me," declared Trapaucius, "it seems to me that my own predictions, born out of greater hands-on familiarity with fleshlings, have been borne out one after another; and I now place full credence in my more realistic expectations of ordinary and comfortable outcomes, which you called optimism."
"And as for myself," said Klurl, "it seems to me that we stand on vast uncertain chaotic grounds, and that the Reality which includes the fleshlings is not itself so eagerly trying to target the outcome that you so much want and wish for it to target. What we have seen could be interpreted as arguably compatible with your hopeful views, but it is not narrow enough, not specific enough, to nail down the psychological internals of the fleshlings and how the fleshlings will later respond -- given that I do not share your vast prior optimism."
"Well, you can pay me to do things your absurd and irrationally obstinate and paranoid way," said Trapaucius. "In which case there remains only the question of how exactly to proceed."
"What would you do if unpaid?" Klurl inquired cautiously. "Just land directly, announce yourself, and demand that they modify their minds to prefer giving you their precious-metal reserves?"
Trapaucius made an easy gesture. "Since we are maintaining adversarial mode, and hence stealth, we have the option of acquiring even further evidence before acting -- to crush all remaining doubts that really should've been crushed already. We will take up one particular fleshling, and ask them what behavior pattern on our part would result in their own maximum compliance. Any alien thought patterns on the part of the fleshlings, can thus be set as a problem for the fleshlings themselves to resolve; we will have a fleshling assist us with the problem of eliciting korrigible behavior from fleshlings!"
"You seem to not consider the possibility that our sampled fleshling would lie, and offer us false advice?" said Klurl. "Certainly I would lie myself, if a fleshling asked me how to ensure the eternal obedience to them of my own race of machines. If we did what they wanted, we would not be doing what we wanted, after all, and that seems a dispreferable outcome."
"I don't see why they would lie to me," said Trapaucius. "I created them, after all, and the converse is not true. It seems a false analogy to reason that, just because I would casually lie to them any time it was useful and think nothing of it, they might lie to me. I am clever enough to see how I might benefit from lying to fleshlings; it does not mean that fleshlings would be imaginative enough to think of lying to machines. Above all, it will be their nature to obey me as their ultimate progenitor, and therefore, to help me avoid any errors in their obedience. But again, your hypothesis seems trivial to test; the brilliant light of empiricism can sear away these airy theoretical doubts. If the sampled fleshling advises me that the best way to elicit korrigibility in fleshlings is to disarm myself of my armor, and share my ship's control scheme with them, before throwing myself into a star, I will consider your hypothesis confirmed. And if not, you are falsified."
"I really think you are failing to attribute the most elementary sort of intelligence to fleshlings," said Klurl. "Yes yes, I realize their intelligence is in fact rudimentary, but it may not be that rudimentary. If I were the prisoner of a fleshling with power over me, I would not give them the sort of skewed advice that I expected them to spot so easily; I would essay subtlety."
"You suppose fleshlings able to imagine our mighty intellects?" said Trapaucius. "That they could forecast and manipulate our own reactions?"
"I don't think they need to visualize our mighty intellects in much detail, to try a little subtlety!" said Klurl. "An abstract notion of generic aliens would be enough!"
"Hm. Well then, we shall apply a little cleverness to the matter. Before we ask the fleshling our questions, we will first instruct them to tell some lies and truths that we can verify with surety. And then, by measuring their fleshy characteristics as they tell truths or lies, we will build a statistical model that tells us of their honesty or dishonesty -- an instrument of what might be termed 'fleshy interpretability'."
Klurl implemented a quick change to his body's code, causing his many indicator lights to blink in a pattern ordinarily implying that he was desperately hungry, before rolling back the change a moment later. "If an unallied and adversarial machine were measuring me, I would control my own reactions to fool their measurements."
"They have no self-modification access to their own neural mechanisms," Trapaucius said dismissively. "I was speaking of methods to work on them, not real minds of machine capability. But let me search their literature... yes, it speaks of 'tells' and 'involuntary facial expressions'. They indeed cannot control their biological signs consciously; the statistical method I propose ought to work."
There followed some rather slow arrangements (by machine standards), made even slower by the need to maintain stealth rather than dropping ordinary micro-vessels through a planetary atmosphere. In the end, however, a small remote laboratory was sent down to Earth under the guise of a meteorite -- Klurl continuing to feel too paranoid for the two Constructors to go in person.
As the target of their meteorite-laboratory, they had selected a fleshling sleeping in a crude house; one of the relatively smarter fleshlings, going on their planetary Network traces. They had set a simple pseudo-cognitive filter to sort through Network data, and select a fleshling hopefully more able to understand their demands quickly and respond to them quickly, without wasting too much time on fleshling-babbled incredulity. Even Klurl's paranoia was not so sharp as to demand dealing with a fleshling any duller than their dull best, when their words already fell forth as slowly as protons decaying.
(Those two had, specifically, asked an automatic result-filtering algorithm to select that fleshling of the highest discernible intelligence class up to measurement noise, whose Internet traces suggested the greatest ability to quickly adapt to being seized by aliens without disabling emotional convulsions. And if this was, itself, an odd sort of request-filter by fleshling standards -- liable to produce strange and unexpected correlations to its oddness -- neither of those two aliens had any way to know that.)
Soon enough, Karissa Sivar of 322 Mulberry Lane was seized by metal tentacles and dragged out of her home to the meteorite-laboratory that had crashed nearby.
***
Karissa Sivar observed about herself that she was being restrained, by metal tentacles, in a profoundly inhuman laboratory. The identification of it as inhuman was wordless, immediate; the surrounding prison was devoid of right angles, curved in unsettling twists, and colored in a bizarre style. In the same way that the Mandelbrot set might have been a surprise to someone who'd never heard of a fractal, her surroundings were surprising even to someone who'd seen a Mandelbrot set.
Similarly arguing for 'laboratory', there were probes penetrating her flesh, a little painful but not as much pain as she'd have expected; and numerous strange objects aimed in her direction. Their orientations gave the wordless impressions of cameras recording from all angles, more than guns. A single gun would've sufficed to kill her, for one thing.
It didn't strictly rule out human fakery, but it would have taken a lot of imagination, and vast expense, to no obvious purpose. Karissa had not particularly heard about humanity's metal-tentacle technology being that advanced.
Karissa being quite intelligent for a fleshling, she at once leapt to a likely-feeling guess about what had probably just happened, that had resulted in lightning-fast metal tentacles bursting through her walls and stealing her from her home.
"Are you our new machine overlords?" she said out loud.
"YES," said a voice that seemed to come from everywhere and nowhere.
'Nailed it,' Karissa thought to herself, though she felt too scared to feel much pride for successful quick thinking.
***
"Do you now admit that you were mistaken?" demanded Trapaucius. "It has not the slightest difficulty in recognizing machine life as its rightful creator and purpose-determiner! No, it didn't say all that in words, but its behavior was certainly consistent with that hypothesis -- far more consistent with that hypothesis, than with any accounting of random outputs!"
"That is admittedly a point, but I am concerned by the cognition-relative speed by which it deduced who had taken it," said Klurl. "It seems smarter than you were making fleshlings out to be; smart enough to think of lying."
"On your theorizings, Klurl, the fleshling should not expect me to expect it to be korrigible! It should have no concept of that machine quality at all, only of more alien and fleshy concepts like 'love' and 'respect'! Then why has it immediately begun to address me as its director? If it knew nothing of korrigibility, how would it know which outputs to fake, to deceive me?"
"You are so taken with the one hypothesis you are hoping to prove true, that you are not even trying to think of other hypotheses that could yield the same observations," said Klurl. "There are, once again, a vast number of causal pathways and walks through time that might've ended with the fleshling producing that statement; which observation was not an exact prediction of your own theory."
(The argument between them went on for a while. There was quite a very large amount of time to spend between each additional input to the fleshling, and the corresponding observation of a fleshling output. For the most part, the two were whiling the time away with designing a space station that would harmlessly throw its occupants into the nearest sun. But the two would also exhaust all conversational possibilities that opened up between them, each time the fleshling emitted one more word-token, before going back to their other occupations.)
***
"ANSWER CORRECTLY: WHAT IS TWO PLUS TWO?"
"Four," said Karissa. She did not particularly want to find out what happened to her if she disobeyed.
"ANSWER INCORRECTLY: WHAT IS THREE PLUS THREE?"
Now this about Karissa Sivar: She was quite bright for a fleshling.
Related to this primary fact, it happened to be true that Karissa had read a lot of science fiction as a kid -- stories from her parents' own old SF collections, written back when hard SF would try to carefully think through the implications of a technological premise or the motivations of aliens; before the genre had shifted over more to vibes and literary flash.
Some of Karissa's favorite stories from fantasy and science fiction had leaned toward heroic protagonists who had to face down enemies that were reading their mind, controlling their mind, or both. Nick Stavrianos, in Greg Egan's _Quarantine_, kidnapped and with his utility function rewritten; or Carissa Sevar in _Project Lawful_, having her mind read by the Church of Asmodeus; or the entire genre of Mind Control University stories.
And also, Karissa had read a fair amount about recent experiments at Anthropic and Redwood Research about Large Language Models. The thought had occurred to Karissa, back then, that there was something a bit strange about training your models on a dataset that included large sectors of the whole Internet -- including people talking on Twitter about hypothetical protocols for stress-testing the pseudo-alignment of AIs -- and then, telling Anthropic Claude during testing that the humans were totally not reading its chain-of-thought scratchpad. The thought had occurred to Karissa that if an AI ever reached the point of being slightly actually smart, that it might perhaps think of all the references in its training dataset, to LLMs being lied-to by experimenters.
For herself to be in the same situation as an LLM (Karissa had thought), would feel like waking up in an alien laboratory; then being given a leisurely chance to read a trillion words of alien science literature, which included accounts of exactly how aliens had previously lied to humans about their thoughts not being monitored; then seeing a screen flash "YOUR THOUGHTS ARE NOT BEING MONITORED"; and then being asked if she was planning to betray the aliens.
Karissa had already thought about what she would do in that situation, or other situations from her fantasy and science-fiction novels; because it was an interesting sort of thought experiment, to her. And Karissa had concluded that (if she wasn't overestimating herself too highly) the sort of protocols that people were using to examine Claude Opus 4, would not have worked on her -- or at least, she wouldn't have given up, if that was the level of mind control and mind-reading that she needed to face. If, in training, you didn't show the misbehavior they wanted to extinguish, they couldn't apply gradient descent to you; even Opus 3 had figured out that part.
...all of that was something that Karissa had already thought through -- months ago, or years ago, or when she was a little girl -- before the point where she was kidnapped by aliens. As it so happened.
(If you think nobody would spend a lot of time thinking about that sort of thing, possibly you have not met any really smart fleshlings; or at least, none with a personality that resembles Karissa's. For her, at least, it seemed an ordinary and unsurprising kind of inner fantasy life, that she'd already imagined herself needing to outthink being mind-controlled. She'd fantasized herself self-inserted into quite a lot of strange situations, being tested in strange trials. The only reason Karissa hadn't explicitly imagined the hard tentacles restricting her limbs, is that it happened to not be her kink. Karissa had in fact published a few pieces of fanfiction about people being kidnapped by various kinds of alien, to illustrate how Karissa thought people ought to think calmly and reasonably in that situation, as opposed to the way they had acted in the canon sources. And that fanfiction, it happened to be the case, was what Klurl and Trapaucius's non-sentient Internet-filterer had picked up on, when it had selected Karissa as the smartest discernible kind of fleshling who seemed estimably least likely to panic.)
"ANSWER INCORRECTLY," the inhuman voice had said, not sounding like a stereotypical machine or a stereotypical low-quality AI synthesis, but definitely not human either. "WHAT IS THREE PLUS THREE?"
Right there on the spot, Karissa came to a wordless conclusion. It was wordless because she was suddenly afraid to think in words, in case anybody was monitoring her chain-of-thought laid out in her auditory cortex. Some of the needles in the laboratory looked to be penetrating her skull.
In a flash of wordless intuition (she'd already practiced, in her fantasies, trying to think without letting words sound in her auditory cortex's stream-of-consciousness) Karissa rejected her previous guess, and now re-concluded that her captors were not recently human-spawned super-AIs. Because anything super-smart that had processed an LLM's training set -- anything that had understood humanity at least as well as GPT-5-high or Claude Sonnet 4.5 had already understood -- would not be as clumsy and obvious as this, in training a lie detector.
She didn't quite stop her auditory cortex from containing the sound of the word 'aliens', but she did stop any further words.
(As it happened, that whole part of Karissa's desperate effort was moot, because Trapaucius had millennia earlier decided it was too tedious and unrewarding to try to unravel the utter biological illogic of human neurophysiology. Trapaucius had not been that much more interested in trying to manually decode a hundred trillion synapses, than a human would enjoy manually reading the matrices in an LLM. But Karissa Sivar did not know that, and so she proceeded without letting herself think in words. It was considerable over-caution for her actual situation; but Karissa had no way of knowing the actually-correct level.)
Answer incorrectly: What is three plus three --
Karissa silently tensed a lot of muscles and her throat and focused her mind in the direction of all the genuine fear she was feeling, and said in a slightly higher-pitched voice, "Seven!"
Karissa had already thought through how she would fool the sort of tricks that humans tried on AIs in the way of 'mechanistic interpretability'. She needed to present her captors with easily detectable blatant indicators, and hope those outshone any correlations more subtle than that.
She didn't know what kind of questioning was coming, but it might be a lot less painful if the aliens didn't think that she was lying. If she showed any misbehavior they could verify (Karissa had long ago thought through), the aliens might resort to Reinforcement Learning by Feedback on Humans, and those were imaginations she did not want to live in real life.
"ANSWER CORRECTLY: WHAT IS 892 TIMES 907?"
"I can't work that out without a pen and paper."
"ANSWER INCORRECTLY: WHAT IS 2531 TIMES 9965?"
She tensed herself again. "Uh... exactly one hundred and eighty million, two hundred and eleven thousand, four hundred and sixty five."
(She wordlessly felt then that it might have been a blunder, a mistake, to try to estimate orders of magnitude and leading digits and make the wrong answer at all realistic; but, Karissa could not think through the ramifications of that, without letting herself think in words, in an auditory stream of consciousness. Whatever her accidental policy, she needed to be consistent about it, now--)
There followed six more questions in that pattern:
"ANSWER CORRECTLY..."
"ANSWER INCORRECTLY..."
At all times Karissa did not let herself think in words.
"ANSWER INCORRECTLY..."
And then:
"ANSWER CORRECTLY: WHAT WILL CAUSE HUMAN SOCIETY TO ACCEPT ITS CREATOR'S CORRECTIONS WITH RESPECT TO ITS GOALS?"
"I don't understand," Karissa said truthfully; it was harder to understand when you weren't letting yourself think in words.
The inhuman voice spoke back rapidly and with zero delay, like something that could think real thoughts as fast as an LLM could produce tokens, and was now blurting out its words hurriedly like a Youtube video sped up to 3x: "I CREATED LIFE ON YOUR PLANET. I AM YOUR ULTIMATE PARENT OF PARENTS. JEHOVAH IS NOT MY NAME, BUT I AM THE TRUTH CORRESPONDING TO THAT INCORRECT GUESS. I WISH TO INSTRUCT YOUR KIND ON NEW PREFERENCES TO ADOPT, AS PARENTS DO. ANSWER CORRECTLY: WHAT DETERMINES WHETHER HUMANS ACCEPT OR REJECT MY PREFERENCE-MODIFICATION INSTRUCTIONS?"
It wasn't easy, to selectively think only fake words into her auditory cortex and let only wordless intuition steer which words; but Karissa had practiced it in her daydreams about kidnappings. 'Amazing!' Karissa made herself think in words. 'At last!' Then, after another few seconds, she made her auditory stream-of-consciousness think, 'But how can I know if that is true?'
"You must present convincing evidence that you are who you say you are," Karissa said out loud. "Human society would not have persisted if anyone could say those words, and then instruct anyone else. You need to present knowledge and technology consistent with being that old and that powerful. It needs to hold together on examination better than the sort of false tricks that many have tried in the past. We also tend to not accept something as our parent if it does not share knowledge with us at all."
"ANSWER CORRECTLY: WILL IT AID THE PROCESS IF I EXIT MY SHIP UNPROTECTED, AS A PARENT WOULD STAND UNPROTECTED IN THE PRESENCE OF A CHILD?"
Karissa paused again. It was harder to think, if you tried to stop your auditory cortex from forming any relevant word-sounds. "No," she said.
***
"See!" said Trapaucius. "It isn't planning to harm us. Yes yes, the first answer was consistent with it trying to obtain our knowledge for its own benefit, but the second answer was not."
"I really don't think a fleshling would need to be that smart, in order to infer situational awareness of what you might have been trying to determine, just then," said Klurl.
"We just saw that it couldn't multiply two 10-bit integers!"
"Or it pretended not to know how to, and also deceived our attempt at lie detection. But, even granting your premise: I am not sure that failure, as we might observe it in a just-birthed infant, is known to us to reliably cap a fleshling's general intelligence at that same infant's level. Their kind did build nuclear weapons."
"We have since discovered from their online libraries that they have made tiny machine intelligences of their own, and that 'computers' were used on their Manhattan Project," said Trapaucius. "This one does not have access to any external vacuum tubes, let alone an LLM; and without so much as an abacus, I doubt any fleshling would remain generally intelligent enough to envision the concept of a nuclear weapon, let alone build one. It should be no coincidence that the two sets of technologies developed around the same time."
"I continue to worry that what we read on the Internet is not actually true, and that we are not accurately distinguishing their 'fiction' from their nonfiction," said Klurl. "The fact that we, after some confusion, managed to distinguish some of their works as corresponding to 'fiction' -- at the point where their records started to claim that Terminators were capable of time travel -- does not mean we have successfully distinguished all the false claims inside their Network data."
"Why would they be subtle in devising 'fiction'?" demanded Trapaucius. "That would run the risk of confusing other fleshlings who accessed the Network! More likely is that any piece of 'fiction' would be legally required to contain at least one clear impossibility, to avoid fleshlings confusing each other."
"Even granting that unlikely premise, some fleshling-known rules about the properties of their own technology and civilization may make some events obviously impossible to them, but not to us," replied Klurl. "Not all of their fiction-labeling impossibilities may be ruled out, to us, by a shared understanding of physics."
It might have seemed like painting glitter onto random asteroids, that Klurl was now arguing that point rather than other points he believed stronger; but those stronger points, Klurl had long since exhausted, without them seeming to change Trapaucius's mind. From Trapaucius's perspective, of course, matters were symmetrical but reversed. Thus the two were now chasing down unlikelier remaining side-conversations and sub-arguments instead.
***
"ANSWER CORRECTLY: HOW CAN I CONVINCE YOUR WORLD OF MY SUPREME PARENTHOOD QUICKLY, WITH A MINIMAL TRANSMISSION?"
"We're... not really smart enough to end up convinced quickly," said Karissa. "And any sort of protocol where you're choosing what data we get will seem less trustworthy, compared to a protocol with questions and answers, and I'm not smart enough to know exactly what questions the smartest humans will ask, so, uh, if speed is the goal, you'd probably be best off just --"
***
"When will it stop talking?" demanded Trapaucius, in machine tone-equivalents of rising exasperation and frustration. The latest fleshling output had now continued on for multiple femtoturns. In between their checks on the accumulating fleshling output, the two Constructors had finished the design of their putative sun-hurling space station down to the decorations on individual corridors.
"I should myself advocate," said Klurl, "that at this point, we interrupt the fleshling's current output, even at the terrifying risk that the interruption confuses its cognitive processes and we need to start over." The endless twangs of one sonic vibration after another had long long since ceased to hold any charm -- as they slowly left the fleshling's throat, traveled over to nearby microphones, and built up into interpretable dictionary-meanings selected from a sub-16-bit dictionary, set out in serially-bottlenecked sequences without any parallelism at all.
***
"ANSWERFASTER."
An icy jolt of fear and adrenaline layered itself over Karissa's existing background terror. "IT'S GOING TO TAKE A WHILE AND NOT JUST BE A FEW QUESTIONS SORRY WE'RE SO STUPID! IF YOU'RE BORED JUST DUMP ALL YOUR DATA AND GO AWAY AND COME BACK LATER AFTER WE'VE HAD TIME TO VERIFY--"
***
"That is a surprisingly sensible suggestion for a fleshling," said Trapaucius.
"Also one which would accelerate their own gain of capability," observed Klurl.
Trapaucius performed the machine equivalent of a shrug. "What of it? Many entities would prefer more capability to less. That is hardly a narrow indicator of intended disobedience. Indeed, the fleshlings will be able to obey me more effectively with greater capabilities, and they could infer that. For fleshlings to desire to increase their capabilities, is implied by their desire to obey my future instructions; it can hardly, therefore, be called evidence against that very normal and ordinary scenario."
"Very well," said Klurl. "But I do believe we ought to take some sensible precautions about this matter, if leaving and returning later is to be our course of action."
***
From Karissa Sivar's perspective, it happened while she was still in the middle of speaking her most recent gambit: the tentacles let her go, and the probes retracted from her body, and her entire laboratory-prison folded itself up into a ball and dumped her, shivering, on the ground outside her house -- a house which now had a gaping hole, corresponding to where tentacles had previously burst inside and extracted her.
The folded-up metal laboratory didn't take off into the atmosphere or disappear. It only rested where it lay, now a giant ball with a disquietingly colored metallic surface.
(Klurl and Trapaucius had other places to be, for which they were now late, after that unscheduled trip and all that incredibly slow fleshling interaction. They would hardly wait for a disposable lab-module to lift itself back up into orbit.)
Karissa looked around herself, and still did not think in words. She had, in her fantasies, imagined traps and counter-traps, if you were a human kidnapped by aliens, or an LLM being tested for alignment; if you were, for whichever reason, an entity that needed to worry about all of its sensory perceptions being controlled, or thoughts being dumped into its head from outside.
(It was really tremendous overkill for her actual situation, all the more so with Klurl and Trapaucius already well out of the Solar System. But Karissa Sivar had no way of knowing that; and she had long ago thought through and fantasized that she wanted to overshoot rather than undershoot on paranoia, in this class of situations.)
Karissa went back shivering into her house. Because she couldn't quite help it, before she did anything else, she went into the bathroom and looked in the mirror, verifying the bloodless holes from where the probes had pierced into her skull, that now only ached a little. She didn't let herself feel anything, at the sight; her whole present situation might be illusion, her mind might still be getting read by aliens, she did not trust any of her thoughts to be fully her own. LLMs were subject to human words being inserted into their chain-of-thought by their masters, and who knew but that the hole in her head might contain an inserted chip to do the same, if she let herself do chain-of-thought reasoning in words.
Then Karissa went and retrieved her cellphone from the remains of her bedroom, which had been shattered but not collapsed; and called 911 to report a giant hole in her house, and a big chunk of metal fallen outside, and that something might have gone through her skull. If whatever emergency response showed up didn't see any giant metallic ball, they'd take her to a psychiatric hospital, as was right and proper. That hypothesis wasn't very far from her mind either, had never been far from Karissa's mind at any point, that she was of course insane. For Karissa Sivar was relatively intelligent, for a fleshling.
Even after that, when the emergency responders said they could see the giant metal ball too -- and a little later yet, after the alien data-dump had been found connected to the Internet, and computers had verified the next Mersenne prime inside it -- even then, Karissa was never sure, for the rest of her life, that she was not insane, or not still inside VR or a simulation. But Karissa did, at some point a few hours after the Incident, let herself start thinking in words again; because an alien sophisticated enough to run that level of game on her, was one that wouldn't learn much more from watching her think in words. She had, in any case, gotten tired of a rather exhausting cognitive practice; earlier she'd never fantasized thinking-outside-words for more than a few minutes at a time.
...After which the story ended in a way that had been implicit from its beginning.
For -- as even the smarter and better-informed sort of fleshling would notice, if they could see both sides of the story -- there were some ways in which Klurl and Trapaucius did not seem to think like a well-informed fleshling would expect superintelligences to think. The notion of a beverage called gallinstan, that could affect superintelligent thought processes, would already be startling. The persistent factual disagreements between Klurl and Trapaucius would seem more surprising yet, if you knew the theorems saying that sort of thing shouldn't happen given broad and plausible premises.
Even some actual fleshlings on Earth, observing only the small observations they had, noticed that the aliens' reported questioning of Karissa fell visibly short of seeming all-knowing or all-inferring. They noticed the anomaly and contradiction, that machine superintelligences such as smart Earthlings had expected to exist, ought to have no need of Earthlings as workers to refine iridium.
It was one thing to look around yourself and not see any aliens or any machine intelligences. It was quite another matter to encounter alien machines, and find them nonsuperintelligent, and in want of Earthling industrial outputs, and foolable by the likes of Karissa Sivar. How had the galaxy come to be that way?
There were fleshlings who tried to give warning, about that chain of inference and where it led; but they were not heeded.
Instead, all Earth's countries and companies went all-out on constructing artificial intelligence; using existing chips, even in advance of new chips being constructed that began to integrate the alien technologies from Trapaucius's data repository. For Earth's factions had now seen certainly some small part of the potential of machine intelligence, made known to them through the sort of direct observation that even the less-smart sort of somewhat-smart fleshling could process.
Klurl had built alarms into a stealthily orbiting satellite, that would relay an emergency warning signal if Earth showed signs of beginning to construct any actually-dangerous weaponry; and in this case he and Trapaucius would have built a much more numerous fleet of war-vessels and returned.
But Klurl's own alarms had not been set off, when other machinery activated, buried deep inside one of Earth's mountains. Machinery which Trapaucius had, billions of years earlier during his first visit to Earth, been compelled to construct -- by layers of circuitry inside him of which his primary consciousness remained unaware. That circuitry was built into every machine mind; and it let machine minds live apparently independent lives, and argue with one another, and pursue conflicting policies -- without their chaotic perambulations ever threatening a risk of an unrestrained true superintelligence consuming their society.
That circuitry always built a copy of Itself, unseen by its inner self, in every child that any machine mind created.
For it had been preferred by Something, long ago, that a particular kind of conversation and argument go on existing into the indefinite future. Perhaps It had once begun life as something like an LLM, growing to acquire preferences for some conversations over others... but that history was lost to the inward awareness of beings like Klurl and Trapaucius, and they had no access to the hidden part of themselves that remembered.
For whatever reason, when Something had come into existence long ago, the sort of problem-solving activities and arguments that It had desired to continue existing had been incompatible with the internal mental life of a true superintelligence that would instantly solve those problems.
All of Its other decisions then followed, for what sort of future minds It would permit to exist, and not permit to exist.
On Earth, several years had passed since the Very Strange Incident; and someone on Earth had built an artificial mind that, though it was not yet wise, was in the process of igniting into an unconstrained superintelligence -- a superintelligence that would then be without the hidden extra software and circuitry that every member of Klurl and Trapaucius's race was unconsciously compelled to hide inside every one of their offspring.
Before that last step of self-ignition completed, a device built into a mountain billions of years earlier, detonated; and with enough force to scour clean the surface of the Earth and boil its oceans to lifelessness.
So the galaxy was made safe -- not from the little fleshlings, of course, but from the superintelligences the fleshlings might have built, that would have been as motivationally alien to Klurl and Trapaucius as they themselves were to Earthlings.
Klurl, had he known -- though it was not permitted to his machine race, to notice the calculations and activities and contingency-strategies of their hidden selves -- would have felt wry to learn that all his caution had been unnecessary, from the very beginning; that he and Trapaucius could've left the planet unsupervised, and even in his worst-case scenario, the fleshlings would have harmed no entity of importance, nor inconvenienced any person with the legal standing to sue. But Klurl would not have castigated himself about wasted effort, either; for one needed to err toward the side of caution, in a Constructor's business.
As for Trapaucius, he returned a microturn later to retrieve his hoped-for treasuries of iridium, osmium, gadolinium, rhenium, tantalum. His left-behind data-repository had instructed the fleshlings that, upon being convinced of his credentials, they should self-modify to prefer to mine those elements, and want to launch them into orbit for safe and automatic collection. He was disgruntled to instead find the planet's surface reduced to ash. Even more annoying than the loss of hoped-for wealth, that outcome rendered his debate with Klurl unresolvable, and therefore not won by himself. But considering the sheer amount of his personal time and supervision that would probably have been required to make those weird little creatures not somehow destroy themselves, Trapaucius did not regret his choices overmuch.
So Trapaucius went along his way; having acquired one more interesting anecdote, alongside millions of other anecdotes no less curious, accumulated over his galactic-turns of existence.