TIL Eliezer said that he refuses to read Ted Kaczynski's (aka the Unabomber's) 1995 manifesto Industrial Society and Its Future because "audience should not be a reward for crime", referring to the former mathematician's mail bombing campaign that took the lives of 3 people and injured 23 more.
The ≈35,000-word manifesto was published by the Washington Post under the threat that he would kill more people if they refused, and the FBI encouraged its publication in the hope of producing new leads. His brother recognized the writing style, which led to Kaczynski's arrest and subsequent life imprisonment, concluding the longest and most expensive manhunt in FBI history.
Rejecting the strategy of garnering attention by means of domestic terrorism is an understandable heuristic, but it's worth noting that the consensus seems to be that Industrial Society and Its Future stands as a serious piece of political philosophy worth engaging with despite its origins.
It even had something to say about AI; he predicts phenomena that we're still discussing here, 30 years later:
...First let us postulate that the computer scientists succeed in developing intelligent machines that can do all things better than human beings can do them...
The West's effort to offset the massive strategic advantages of a Russia-India-China axis (demographics, manufacturing capacity, energy) might result in doubling down on the AI+robotics edge they currently enjoy. China not being far off in terms of capabilities might create additional pressures. I'm concerned that recent ideas surrounding global/multilateral AI governance and alignment (e.g. "Consensus-1") might be thwarted by geopolitics.
Ideologies formed from people interacting with AIs might be the beginning of "AI escaping the datacentres" via memetics.
Surprisingly, AI researchers are like Leninists in a number of important ways.
In their story, they're part of the vanguard working to bring about the utopia.
The complexity inherent to their project justifies their special status, and legitimizes their disregard of the people's concerns, which are dismissed as unenlightened.
Detractors are framed as too unsophisticated to understand how unpopular or painful measures are actually in their long-term interest, or a necessary consequence of a teleological inevitability underlying all of history.
Arguments of the form "group A is like bad group B in ways x, y, and z" seem bad. When the argument has merit, it's because x, y, or z is bad, and then you can reduce it to "group A has property x, which is bad", which is a better way of saying it.
These examples are about paternalism, which is a property of Leninists, AI researchers, global health charities, governments, strategy consultants, civil engineers, and your mom (I checked). My preference is that paternalism should require some very strong justification, especially when it's about overriding the preferences of others, as opposed to helping them get what they want in a way they don't understand. I agree that this situation looks more like the bad kind of paternalism.
Roughly this analogy was explored at length in The Possessed Machines. It seems pretty interesting, although I only looked at this summary. It was about the Bolsheviks, the Leninists' spiritual forebears. I believe it was written by an anonymous person from a major lab, so I think it might capture some of that ethos that I don't understand.
Independent researchers have a lot less of that orientation. I'd say most of us don't want to rush toward the glorious future at the current cost. That would probably get us all killed, and with an embarrassing lack of Dignity to boot (forgoing, roughly, the advantages that come with following at least some virtue ethics). Although it certainly is tempting ;)
"AI Parasitism" Leads to Enhanced Capabilities
People losing their minds after certain kinds of interactions with their chatbots leads to discussions about it on the internet, which then make their way into the training data. This paints a picture of human cognitive vulnerabilities, which could be exploited.
It looks to me like open discussions about alignment failures of this type thus indirectly feed into capabilities. This will hold so long as the alignment failures aren't catastrophic enough to outweigh the incentives to build more powerful AI systems.
Tired of making sense of exponents? Introducing: the mol FLOP!
Simply divide the size of a training run by Avogadro's constant. Some examples:
Bonus: the ballpark equivalent water volume for each, mapping 1 FLOP to 1 water molecule.
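A quick sketch of the arithmetic in Python. The training-compute numbers below are rough public estimates I'm plugging in for scale, not figures from anywhere authoritative:

```python
# mol FLOP: training compute divided by Avogadro's constant, plus the bonus
# mapping of 1 FLOP -> 1 water molecule (1 mol of water ≈ 18 g ≈ 18 mL).

AVOGADRO = 6.022e23        # molecules (or FLOP) per mol
WATER_ML_PER_MOL = 18.0    # molar mass ≈ 18 g/mol at ≈ 1 g/mL

# Rough public estimates of training compute, in FLOP (ballpark only).
training_runs = {
    "GPT-3 (175B)": 3.1e23,
    "GPT-4 (estimated)": 2e25,
}

for name, flop in training_runs.items():
    mol_flop = flop / AVOGADRO
    water_ml = mol_flop * WATER_ML_PER_MOL
    print(f"{name}: {mol_flop:.2f} mol FLOP ≈ {water_ml:.0f} mL of water")

# GPT-3 comes out to ~0.5 mol FLOP (~9 mL, a couple of teaspoons);
# GPT-4 to ~33 mol FLOP (~600 mL, a water bottle).
```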
I see that the "International Treaties on AI" idea takes heavy inspiration from nuclear arms control agreements. However, in these discussions, nuclear arms control is usually pictured as a kind of solved problem, a thing of the past.
I doubt the validity of this heroic narrative arc: that human civilization, faced with the existential threat of nuclear annihilation, came together and neatly contained the problem.
In the grand scheme of things, nuclear weapons are still young. They're still here and still very much threatening; just because we stop...
The loss function of Capital converges on something like heroin: the creation of goods that generate strong and inelastic demand by exploiting vulnerabilities in your neurology.
Consider the sociology of violence in the AI risk/doom memeplex.
It seeks to leverage the state's power, i.e. (the threat of) state violence, to accomplish its objectives (e.g. a ban on further capabilities research). Beyond that, violence is explicitly rejected.
This contrasts with other memeplexes that resorted to violence which was not legitimized by the state they operated in, including the American and Bolshevik revolutions, pro-democracy/independence movements, and religious/race riots. Furthermore, all of these examples share the apparent quality of fighting...
If you can plausibly live off your capital (especially thanks to stock/options at AI companies), then, unless you consider higher-order social and economic risks (which are uncertain), the impact of AI on the job market is probably not as concerning to you as it is to the majority of the population.
Most people have exactly one value-generating economic asset: their ability to work. To the extent that you own capital (especially in AI companies), you are partially or even completely insulated from having to reckon with the prospect of personally being forced into a permabroke underclass when your labour value soon goes to zero.
About the notion of "mildly" superintelligent AI. How about the following typology of ASI:
I surmise that the accuracy of AI filters (the kind used in schools/academia) will diminish over time, because people absorb and reuse the speech patterns of their chatbots (e.g. "This is not X. It's Y") as the fraction of their interactions with chatbots grows relative to their interactions with other people.
In fact, their interactions with other people might reinforce these speech patterns as well, since those people probably also interact with chatbots and are thus undergoing the same process.
The big picture is that AI is becoming an increasingly power...
Have the applications of AI post-2013 been a net negative for humanity? Apart from some broadly beneficial things like AlphaFold, it seems to me that much of the economic value of AI has been in aligning humans to consume more by making them stay glued to one or another platform.
Given superintelligence, what happens next depends on the success of the alignment project. The two options:
"The AI does things that I personally approve of" as an alignment target with reference to everybody and their values is actually easier to hit than one might think.
It doesn't require ethics to be solved; it can be achieved by engineering your approval.
It might be impossible for you to tell which of these two post-ASI worlds you find yourself in.
Moltbook: SubredditSimulator reloaded, or another step towards Actually Something Incomprehensible?
The idea of GPUs that don't run unless they phone home and regularly receive some cryptographic verification seems hopeless to me. It's not like the entire GPU architecture can be encrypted, and certainly not in a way that can't be decrypted with a single received key after which a rogue actor can just run away with it. Thus the only possible implementation of this idea seems to be the hardware equivalent of "if (keyNotReceived) shutDown()", which can simply be bypassed. Maybe one of the advanced open source models could even help someone do that...
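For illustration, a hypothetical sketch of what that gate amounts to in software terms. None of these names correspond to any real GPU or firmware API; the point is only that the mechanism is a patch target, not a cryptographic obstacle:

```python
import hashlib
import hmac
import os

# Hypothetical sketch of the "phone home" gate described above. If the
# mechanism boils down to a branch like "if (keyNotReceived) shutDown()",
# defeating it is a patching problem, not a cryptography problem.

LICENSE_SERVER_SECRET = b"shared-secret"  # hypothetical pre-shared key


def attestation_valid(nonce: bytes, response: bytes) -> bool:
    """Check the server's HMAC over our nonce (the 'key received' condition)."""
    expected = hmac.new(LICENSE_SERVER_SECRET, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


def run_workload() -> None:
    print("running GPU workload...")


def main(get_server_response) -> None:
    nonce = os.urandom(16)
    response = get_server_response(nonce)       # the "phone home" step
    if not attestation_valid(nonce, response):  # the gate itself
        raise SystemExit("shutDown(): no valid key received")
    run_workload()

# The crypto above can be as strong as you like; a rogue actor just needs the
# branch to never fire, e.g. by patching attestation_valid to `lambda *a: True`
# in whatever firmware or driver image ships the check.
```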
Suicide occupies a strange place in agent theory. It is the one goal whose attainment is not only impossible for the agent to observe, but whose attainment hinges on that very impossibility.
In some cases, this is resolved by a transfer of agency to the larger entity of which the agent is in fact a sub-agent, and which is itself experiencing selective pressure: e.g. the beehive, in the case of the altruistic suicide of an individual bee defending it. This behaviour disappears once the sub-agent experiences selective pressures that are independent fr...
I had a dream about an LLM that had a sufficiently powerful predictive model of me that it was able to accurately prompt itself using my own line of thinking before I could verbalize it. The self-generated prompts even factored in my surprise at the situation.
When I woke up, I wondered whether this made sense. After all, the irreducible E term in the Chinchilla scaling law implies a baseline unpredictability in language, which tracks with our warm wetware having some inherent entropy.
I posit that E is on average far lower in the h...
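For reference, a small sketch of the parametric loss I have in mind. The coefficients are the approximate published fit from Hoffmann et al. (2022); the function name is mine:

```python
# The Chinchilla parametric loss from Hoffmann et al. (2022):
#     L(N, D) = E + A / N**alpha + B / D**beta
# E is the irreducible term: the loss floor you can't train away, readable as
# a baseline entropy/unpredictability of text. Coefficients are the paper's
# reported fit, to rough precision.

E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for N parameters trained on D tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


# Chinchilla itself (~70B params, ~1.4T tokens) lands near 1.94;
# as N and D grow, the prediction approaches the floor E ≈ 1.69.
print(chinchilla_loss(70e9, 1.4e12))
```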
When I asked Claude Opus 4.5 "What was the Incan economy like?", I accidentally "encrypted" the prompt by typing it out with Ukrainian keyboard settings, resulting in Cyrillic gibberish. Claude immediately picked up on this and decoded the message in its chain of thought, dutifully answering my intended query. I can't imagine any human responding like this! It seems to me that most people would be genuinely confused, and the small minority of those who might have an idea of what's going on would presumably still ask for clarification. Even if someone were ...
To the extent that AI has been used to optimize human behaviour (for things like retention time and engagement) for just over a decade now and continues to get better at it, "gradual disempowerment" stops looking like a hypothetical future scenario and more like something we're currently living through. This tracks with mental illness and ADHD rates increasing over the same time period.
What are some reasons to believe that Rice's theorem doesn't doom the AI alignment project by virtue of making it impossible to verify alignment, independent of how it is defined/formalized?
It seems to me that Rice's theorem implies that there can be no "isAligned" function that verifies an AI's alignment, independent of how you define alignment: any definition that depends only on the system's behaviour and is non-trivial is an undecidable property of the program.
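A sketch of the standard Rice-style reduction, with hypothetical names (is_aligned, halts, aligned_example), under the assumption that "aligned" is a non-trivial property of program behaviour:

```python
# A sketch of the standard Rice-style reduction. Suppose we had a total
# decider is_aligned(p) for some property of program *behaviour* that is
# non-trivial (some program has it, some program doesn't). Then we could
# decide halting, which is impossible; so no such total decider exists.

def halts(program, inp, is_aligned, aligned_example):
    """Hypothetical halting decider built out of a hypothetical is_aligned().

    aligned_example: any program known to have the property.
    (Assumes the do-nothing/never-halting behaviour falls on the
    "not aligned" side; otherwise flip the answer.)
    """

    def stitched():
        # Run the program under test first; only if it halts do we go on to
        # behave exactly like the known-aligned program.
        program(inp)
        return aligned_example()

    # If program(inp) never halts, stitched never does anything observable,
    # so it lacks the property. If program(inp) halts, stitched behaves
    # exactly like aligned_example, so it has the property.
    return is_aligned(stitched)
```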