It's not clear that Bentham would advocate eradicating those species. There could very well be utilitarian value in keeping a species around, just at reduced population counts. In your alien example, I think you could plausibly argue that it'd be good if the aliens reduced the suffering human population to a lower number, until we were advanced enough to be on-net happy. Or if having a larger suffering population would be good because it would speed up technological progress, that would be an important disanalogy between your thought experiment and the wild animal case.
Living beings have some kind of adjustable happiness baseline. Making someone happy isn't as simple as triggering their pleasure centres all the time, and making someone not unhappy isn't as simple as preventing their pain centres from ever being triggered (even if this means destroying them).
The argument also doesn't rely on any of this? It just relies on it being possible to compare the value of two different world-states.
Beyond the question of which values would be embedded in the AGI systems, there are some additional reasons not to prefer AGI development in China that I haven't seen mentioned here:
I think it's also very rare that people are actually faced with a choice between "AGI in the US" versus "AGI in China". A more accurate but still flawed model of the choice people are sometimes faced with is "AGI in the US" versus "AGI in the US and in China", or even "AGI in the US, and in China 6-12 months later" versus "AGI in the US, and in China 3-6 months later".
The view shared by Hanania in 2024, that Trump would be reined in by others, seems less solid now: whereas during his first term Trump’s top economic adviser Gary Cohn allegedly twice stole major trade-related documents off his desk, nothing like that seems to be happening this time.
This is an aside, but yesterday the WSJ reported something like this happening:
On April 9, financial markets were going haywire. Treasury Secretary Scott Bessent and Commerce Secretary Howard Lutnick wanted President Trump to put a pause on his aggressive global tariff plan. But there was a big obstacle: Peter Navarro, Trump’s tariff-loving trade adviser, who was constantly hovering around the Oval Office.
Navarro isn’t one to back down during policy debates and had stridently urged Trump to keep tariffs in place, even as corporate chieftains and other advisers urged him to relent. And Navarro had been regularly around the Oval Office since Trump’s “Liberation Day” event.
So that morning, when Navarro was scheduled to meet with economic adviser Kevin Hassett in a different part of the White House, Bessent and Lutnick made their move, according to multiple people familiar with the intervention.
They rushed to the Oval Office to see Trump and propose a pause on some of the tariffs—without Navarro there to argue or push back. They knew they had a tight window. The meeting with Bessent and Lutnick wasn’t on Trump’s schedule.
The two men convinced Trump of the strategy to pause some of the tariffs and to announce it immediately to calm the markets. They stayed until Trump tapped out a Truth Social post, which surprised Navarro, according to one of the people familiar with the episode. Bessent and press secretary Karoline Leavitt almost immediately went to the cameras outside the White House to make a public announcement.
This is a great post, thanks for writing it. I agree that, when it comes to creative endeavours, there's just no "there" there with current AI systems. They just don't "get it". I'm reminded of this tweet:
Mark Cummins: After using Deep Research for a while, I finally get the “it’s just slop” complaint people have about AI art.
Because I don’t care much about art, most AI art seems pretty good to me. But information is something where I’m much closer to a connoisseur, and Deep Research is just nowhere near a good human output. It’s not useless, I think maybe ~20% of the time I get something I’m satisfied with. Even then, there’s this kind of hall-of-mirrors quality to the output, I can’t fully trust it, it’s subtly distorted. I feel like I’m wading through epistemic pollution.
Obviously it’s going to improve, and probably quite rapidly. If it read 10x more sources, thought 100x longer, and had 1000x lower error rate, I think that would do it. So no huge leap required, just turning some knobs, it’s definitely going to get there. But at the same time, it’s quite jarring to me that a large fraction of people already find the outputs compelling.
As someone who does care about art, and has, I think, discerning taste, I have always kind of felt this, and only when I read the above tweet did I realise that not everyone felt what I felt. When Sam Altman tweeted that story, which seemed to impress some people and inspire disgust/ridicule from others, the division became even clearer.
I think with Deep Research the slop is actually not as much of a problem -- you can just treat it as a web search on steroids and can always jump into the cited sources to verify things. And for similar reasons, it seems true that if DR "read 10x more sources, thought 100x longer, and had 1000x lower error rate", it could be as good as me at doing bounded investigations. For the hardest bits needed for AI to generate genuinely good creative fiction, it feels less obvious whether the same type of predictable progress will happen.
I think I'm less sure than you that the problem has to do with attractor basins, though. That does feel like part of, or related to, the problem, but I think a larger issue is that chatbots are not coherent enough. Good art has a sort of underlying internal logic to it, which, even if you do not notice it, contributes to making the artwork feel like a unified whole. Chatbots don't do that; they are too all over the place.
I think this article far overstates the extent to which these AI policy orgs (maybe with the exception of MIRI? but I don't think so) are working towards an AI pause, or see the goal of policy/regulation as slowing AI development. (I mean policy orgs, not advocacy orgs.) Much more common policy objectives, as I see it, are creating transparency around AI development, directing R&D towards safety research, laying the groundwork for international agreements, slowing Chinese AI development, etc. -- things that (so the hope goes) are useful on their own, not because of any effect on timelines.
On the advice of @adamShimi, I recently read Hasok Chang's Inventing Temperature. The book is terrific and full of deep ideas, many of which relate in interesting ways to AI safety. What follows are some thoughts on that relationship, from someone who is not an AI safety researcher and only somewhat follows developments there, and who probably got one or two things wrong.
(Definitions: By "operationalizing", I mean "giving a concept meaning by describing it in terms of measurable or closer-to-measurable operations", whereas "abstracting" means "removing properties in the description of an object".)
There has been discussion on LessWrong about the relative value of abstract work on AI safety (e.g., agent foundations) versus concrete work on AI safety (e.g., mechanistic interpretability, prosaic alignment). Proponents of abstract work argue roughly that general mathematical models of AI systems are useful or essential for understanding risks, especially those from not-yet-existing systems like superintelligences. Proponents of concrete work argue roughly that safety work is more relevant when empirically grounded and subjected to rapid feedback loops. (Note: The abstract-concrete distinction is similar to, but different from, the distinction between applied and basic safety research.)
As someone who has done neither, I think we need both. We need abstract work because we need to build safety mechanisms using generalizable concepts, so that we can be confident that the mechanisms apply to new AI systems and new situations. We need concrete work because we must operationalize the abstract concepts in order to measure them and apply them to actually existing systems. And finally we need work that connects the abstract concepts to the concrete concepts, to see that they are coherent and for each to justify the other.
Chang writes:
The dichotomy between the abstract and the concrete has been enormously helpful in clarifying my thinking at the earlier stages, but I can now afford to be more sophisticated. What we really have is a continuum, or at least a stepwise sequence, between the most abstract and the most concrete. This means that the operationalization of a very abstract concept can proceed step by step, and so can the building-up of a concept from concrete operations. And it may be beneficial to move only a little bit at a time up and down the ladder of abstraction.
Take for example the concept of (capacity for) corrigibility, i.e., the degree to which an AI system can be corrected or shut down. The recent alignment faking paper showed that, in experiments, Claude would sometimes "pretend" to change its behavior when it was ostensibly being trained with new alignment criteria, while not actually changing its behavior. That's an interesting and important result. But (channeling Bridgman) we can only be confident that it applies to the concrete concept of corrigibility measured by the operations used in the experiments -- we have no guarantees that it holds for some abstract corrigibility, or when corrigibility is measured using another set of operations or under other circumstances.
An interesting case study discussed in the book is the development of the abstract concept of temperature by Lord Kelvin (the artist formerly known as William Thomson) in collaboration with James Prescott Joule (of conservation of energy fame). Thomson defined his abstract temperature in terms of work and pressure (which were themselves abstract and needed to be operationalized). He based his definition on the Carnot cycle, an idealized process performed by the theoretical Carnot heat engine. The Carnot heat engine was inspired by actual heat engines, but was fully theoretical -- there was no physical Carnot heat engine that could be used in experiments. In other words, the operationalization of temperature that Thomson invented using the Carnot cycle was an intermediate step that required further operationalization before Thomson's abstract temperature could be connected with experimental data. Chang suggests that, while the Carnot engine was never necessary for developing an abstract concept of temperature, it did help Thomson achieve that feat.
Ok, back to AI safety. So above I said that, for the whole AI thing to go well, we probably need progress on both abstract and concrete AI safety concepts, as well as work to bridge the two. But where should research effort be spent on the margin?
You may think abstract work is useless because it has no error-correcting mechanism when it is not trying to, or is not close to being able to, operationalize its abstract concepts. If it is not grounded in any measurable quantities, it can't be empirically validated. On the other hand, many abstract concepts (such as corrigibility) still make sense today and are currently being studied in the concrete (though they have not yet been connected to fully abstract concepts) despite being formulated before AI systems looked much like they do today.
You may think concrete work is useless because AI changes so quickly that the operations used to measure things today will soon be irrelevant, or more pertinently perhaps, because the superintelligent systems we truly need to align are presumably vastly different from today's AI systems, in their behavior if not in their architecture. In that way, AI is quite different from temperature. The physical nature of temperature is constant in space and time -- if you measure temperature with a specific set of operations (measurement tools and procedures), you would expect the same outcomes regardless of which century or country you do it in -- whereas the properties of AI change rapidly over time and across architectures. On the other hand, timelines seem short, such that AGI may share many similarities with today's AI systems, and it is possible to build abstractions gradually on top of concrete operations.
There is in fact an example from the history of thermometry of extending concrete concepts to new environments without recourse to abstract concepts. In the 18th century, scientists realized that the mercury and air thermometers then in use behaved very differently at very low and very high temperatures, or could not be used there at all due to freezing and melting. While they had an intuitive notion that some abstract temperature ought to apply across all degrees of heat or cold, their operationalized temperatures clearly only applied to a limited range. To solve this, they eventually developed different sets of operations for measuring temperatures in the extreme ranges. For example, Josiah Wedgwood measured very high temperatures in ovens by baking standardized clay cylinders and measuring how much they'd shrunk. These different operations, which yielded temperature measurements on different scales, were then connected by taking measurements on both scales (each with its own operations) over an overlapping range and lining them up. All this was done without an abstract theory of temperature, and while the resulting scale was not on very solid theoretical ground, it was good enough to provide practical value.
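To make that scale-linking move concrete, here is a minimal sketch in Python. The readings and the assumption of a simple linear correspondence are invented for illustration; they are not Wedgwood's actual calibration data.

```python
# Illustrative only: connecting two temperature scales via an overlapping range.
# The readings are made up; they just show the shape of the procedure,
# under the (assumed) simplification that the scales relate linearly.
import numpy as np

# The same heat sources measured on both scales, in the range where both instruments work.
mercury_readings = np.array([300.0, 400.0, 500.0, 600.0])  # mercury-thermometer degrees
wedgwood_readings = np.array([1.0, 4.5, 8.0, 11.5])        # clay-shrinkage (Wedgwood) degrees

# Fit a linear map from the Wedgwood scale onto the mercury scale over the overlap.
slope, intercept = np.polyfit(wedgwood_readings, mercury_readings, deg=1)

# Extrapolate to a kiln reading far beyond what a mercury thermometer can handle.
kiln_reading_wedgwood = 120.0
kiln_on_extended_scale = slope * kiln_reading_wedgwood + intercept
print(f"Kiln temperature on the extended scale: ~{kiln_on_extended_scale:.0f} degrees")
```

The historical procedure was of course messier than a least-squares line, but the structure is the same: measure on both scales where their ranges overlap, fit a correspondence, and then extend the scale beyond the range either instrument alone could cover.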
Of course, the issue with superintelligence is that, because of, e.g., deceptive alignment and gradient hacking, we want trustworthy safety mechanisms and alignment techniques in place well before the system has finished training. That's why we want to tie those techniques to abstract concepts which we are confident will generalize well. But I have no idea what the appropriate resource allocation is across these different levels of abstraction.[1] Maybe what I want to suggest is that abstract and concrete work are complementary and should strive towards one another. But maybe that's what people have been doing all along?
Hmm, if the Taiwan tariff announcement caused the NVIDIA stock crash, then why did Apple stock (which should be similarly impacted by those tariffs) go up that day? I think DeepSeek -- as illogical as it is -- is the better explanation.
In their official financial statement, Nvidia projected that those diffusion regulations would not have a substantive impact on their bottom line.
I don't think that's true? AFAIK there's no requirement for companies to report material impact on an 8-K form. In a sense, the fact that NVIDIA even filed an 8-K form is a signal that the diffusion rule is significant for their business -- which it obviously is, though it's not clear whether the impact will rise to the level of being financially material. I think we have to wait for their 10-Q/10-K filings to see what NVIDIA signals to investors, since there I do think they'd be required to report an expected material impact.
You talk later about evolution selecting for selfishness; not only is the story for humans far more complicated (why do humans often offer an even split in the ultimatum game?), but humans also talk a nicer game than they act (see construal level theory, or social-desirability bias). Once you start looking at AI agents with affordances and incentives similar to those humans have, I think you'll see a lot of the same behaviors.
Some people have looked at this, sorta:
I think I'd guess, roughly: "Claude is probably more altruistic and cooperative than the median Western human; most other models are probably about the same, or a bit worse, in these simulated scenarios." But of course a major difference here is that the LLMs don't actually have anything on the line -- they don't stand to earn or lose any money, for example, and even if they did, they'd have no use for it. So you might expect them to be more altruistic and cooperative than they would be under the conditions in which humans are tested.
Nonsense feels too strong to me? That seems like the type of thing we should be pretty uncertain about -- it's not like we have lots of good evidence either way on meta-ethics that we can use to validate or disprove these theories. I'd be curious what your reasoning is here? Something like a person-affecting view?
This seems like a different point than the one I responded to (which is fine obviously), but though I share the general intuition that it'd make sense for life in the wild to be roughly neutral on the whole, I think there are also some reasons to be skeptical of that view.
First, I don't see any strong positive reason why evolution should make sure it isn't the case that "they experienced nothing but pain and fear and stress all the time". It's not like evolution "cares" whether animals feel a lot more pain and stress than they feel pleasure and contentment, or vice versa. And it seems like animals -- like humans -- could function just as well if their lives were 90% bad experiences and 10% good experiences, as with a 50/50 split. They'd be unhappy of course, but they'd still get all the relevant directional feedback from various stimuli.
Second, I think humans generally don't feel that intense pleasure (e.g., orgasms or early jhanas) is desirable to a greater degree than intense pain (e.g., from sudden injury or chronic disease) is undesirable. (Cf. how, when we are in truly intense pain, nothing matters other than making the pain go away.) But if we observe wild animals, they probably experience pain more often than pleasure, just based on the situations they're in. E.g., disease, predation, and starvation seem pretty common in the animal kingdom, whereas sexual pleasure seems pretty rare (and almost always tied to reproduction).
Third and relatedly, from an evolutionary perspective, bad events are typically more bad (for the animal's reproductive fitness) than good events are good. For example, being eaten alive and suffering severe injury means you're ~0% likely to carry on your genes, whereas finding food and mating doesn't make you 100% likely to carry on your genes. So there's an asymmetry. That would be a reason for evolution to make negative experiences more intense than positive experiences. And many animals are at risk of predation and disease continuously through their lives, whereas they may only have relatively few opportunities for e.g., mating or seeing the births of their offspring.
Fourth, most animals follow r-selection strategies, producing many offspring of which only a few survive. Evolution probably wouldn't optimize for those non-surviving offspring to have well-tuned valence systems, and so they could plausibly just be living very short lives of deprivation followed soon by death.
I agree.