That would be a different experiment, as it would also be testing whether people would, for example:
Those factors could go either way, but they'd disrupt a pure test of this part of the alien's predictions:
Future humans will enjoy, say, raw bear fat covered with honey, sprinkled with salt flakes.
I still expect ice cream would win a blind taste test, but I didn't predict these survey results.
I found out recently that in a multi-turn conversation on claude.ai, previous thinking blocks are summarized when given to the model on the next interaction. A summary of the start of a conversation I had when testing this:
Maybe this penalizes neuralese slightly, as it would be less likely to survive summarization.
I used to think that AI models weren't smart enough to sandbag. But less intelligent animals can sandbag: for example, an animal that apparently can't do something turns out to be able to when succeeding lets it escape, access treats, or otherwise earn outsized rewards. Presumably this occurs without an inner monologue or a strategic decision to sandbag. If so, AI models are already plausibly smart enough to sandbag in general, without it being detectable in chain-of-thought, and then to perform better when high-value opportunities arise.
Confabulations are made-up remembering, as I understand it, not made-up outputs. So I can confabulate a memory even if I never share it with anyone.
(which still seems like a good term to use for many AI hallucinations)
Relevant links:
Let's suppose that your read is exactly right, and Yudkowsky in 2021 was predicting median 2040. You have surely spent more time with him than me. Bioanchors predicted ~25% cumulative probability by 2040. A 25% vs 50% disagreement in the world of AI timeline prediction is approximately nothing. What's your read of why Yudkowsky is claiming that "median fucking 2050" is "fucking nuts in retrospect", without also admitting that his implicit prediction of median 2040 was almost as nuts?
This is the second time this year that I've read Yudkowsky attacking the Bioanchors 2050 figure without mentioning that it had crazy wide error bars.
This month I also read "If Anyone Builds It Everyone Dies" which repeats the message of "The Trick that Never Works" that forecasting timelines is really difficult and not important for the overall thesis. I preferred that Yudkowsky to this one.
EDIT: retracting because I don't actually want a response to these questions, I'm just cross.
Sorry this comment is long, I didn't have time to make it shorter. Feel free to skip to the section that you are interested in, or skip the whole thing.
I appreciate the kind advice about prescriptivism vs descriptivism. I don't want to have that debate here, but yes, in saying a word choice is "incorrect" I'm necessarily using a prescriptivist lens. With a descriptivist lens I might say "imprecise" or "misleading" or "jarring" or "warping". As well as dictionaries, I also got a second opinion from an LLM. LLMs can of course be sycophantic, but they update more frequently than dictionaries and are more aware of nuances. But perhaps they have a prescriptivist bias; I hadn't considered that until I read your comment, and it seems likely given the test-taking bias.
With hindsight I regret using a prescriptivist lens, but I don't know what the response would have been if I initially commented with a descriptivist lens, so it's hard to make a full update.
Consider this sentence from the essay:
The person claiming that there’s no obligation to respond is often color-blind to some pretty important dynamics.
With my prescriptivist lens, I defended this as "technically correct". With my descriptivist lens, I doubt such a person intends to claim that these dynamics aren't real. A recent example is Banning Said Achmiz, where various people said variants of "no obligation to respond", and they don't read to me as blind to social dynamics.
Speaking for myself, I've been writing on the internet under my real name for a while, and I've experienced the pressures the essay describes. Given that high school kids are getting sometimes brutal lessons in cyberbullying, and that people have been imprisoned for social media posts, it seems hard for anyone in 2025 to have missed the reality. I see some people who seem to be oversensitive to the audience, and (fewer) people who seem to be under-sensitive to the audience, but this seems to me a consequence of value differences and occasionally reasoning failures, rather than "color-blindness".
Another sentence from the essay:
I actually find it super frustrating when someone leaves commentary which, in one way or another, obligates me to effortfully respond, with more time and energy than I properly have to spare…
With my descriptivist lens, I read this as hyperbole, or metaphor, or a description of emotional reality. I still understand the author's meaning, but for me it's jarring and imports the wrong intuitions. When I reread the essay substituting a more precise term, such as "pressured to respond", I get a different vibe.
Basics of Rationalist Discourse has a section on "Don't weaponize equivocation/abuse categories/engage in motte-and-bailey shenanigans". I wish the section was more peaceably named, as the author isn't doing those things. But the contents are relevant here. The author is using "obligation" as a conceptual handle to describe scenarios which have some of the attributes (pressure, consequences, judgment, ...) but not the ones that loom large in my mind (moral/legal force, compulsion, promise-keeping, ...). I therefore comment that the term is prescriptively-incorrect (descriptively-warping) and discuss why.
Which brings us to:
You'd ask a question. Basics of Rationalist Discourse says the same thing.
I deliberately chose not to ask a question. This is partly because I read the author as asking me not to.
Like, it’s not your questions are bad, it’s your questions are costly, and I don’t have the spare resources to pay the costs; I’d like to not keep receiving bills and invoices from you, please.
To be clear, the author hasn't complained to me personally about sending too many bills and invoices. But I still don't want to send any invoices to him in the first place. I don't believe authors have an obligation to respond, I don't want to create obligations to respond, and if I find an author who expresses that questions create an obligation to respond, then I won't be asking that author any questions. Especially not in the place where they complain about that! I instead posted a comment with multiple cues that I didn't want or expect an author response.
The second reason is because of what habryka wrote in Banning Said Achmiz.
The critic has a pretty easy job at each step. First of all, they have little to lose. They need to make no positive statements and explain no confusing phenomena. All they need to do is to ask questions, or complain about the imprecision of some definition. ... At the end of the day, are you really going to fault someone for just asking questions? What kind of totalitarian state are you trying to create here?
So instead of asking a question, or complaining about a definition, I chose to make positive statements about (a) the meaning of "obligated", (b) the intuitions created by that word, and (c) why those intuitions cause errors.
And this totally worked as habryka said it would! By making positive statements, I had to spend a lot more time thinking about what I was saying. Also, I made myself vulnerable to disagreement and chalked up some downvotes and disagreement-votes. That seems very much working as intended.
The third reason is that as a matter of style I prefer to discuss the text rather than the author. Discussing the author brings up status issues of whether the author is good or bad. Discussing whether the text is good or bad reduces this. Whether the author intended to mislead with a word choice is a question about the author. Whether a word choice is misleading is primarily a question about the text and the reader.
Thanks for replying. I would prefer the policy you describe to the status quo of people having different ideas of what the norms are. Perhaps this could be combined with a policy statement on "Do not try to win arguments by fights of attrition".
I don't think it's a weird subject to have a policy on. Thinking of the Policy on LLM Writing:
I think a policy on responding to comments would be similarly helpful. For example, as I read through the section "But why ban someone, can't people just ignore Said?" above, it only really works as a debate in the absence of a site policy. Achmiz says:
If no response is provided to... simple requests for clarification ..., the author should be interpreted as ignorant.
That line of argument doesn't work if there is a site policy that authors are not expected to respond to comments. Firstly, the attack itself is subject to moderation. Secondly, anyone, not just the author, can defuse it by linking to the site policy, which conveniently has a space where the policy can be discussed. Certainly site policy can't stop Achmiz thinking I'm ignorant. But it can reduce the extent to which Achmiz can convince the rest of the audience that I'm ignorant.
LessWrong/Lightcone doesn't have to weakly clarify its best guess of the prevailing norms. It can state what the norms are, in a self-fulfilling statement that sets the norms to what it states. As long as the stated norms are broadly popular, this just works.
So, despite it being close to site-consensus that authors do not face obligations to respond to each and every one of Said's questions, on any given post, there is basically nothing to be done to build common knowledge of this.
Please could you write a policy regarding what obligations/duties/commitments/responsibilities people DO have, by contributing to LessWrong, regarding responding to comments? This could be a top-level post similar to Policy for LLM Writing on LessWrong.
After reading Banning Said Achmiz..., and associated comments, I thought that I understood LessWrong policy. However, the next thing I noticed on this topic was Sabien's Obligated to Respond, which was then curated. After reading this and associated comments, I am no longer confident. In any case I don't really want to read Banning Said Achmiz every time this topic arises. So I request a policy post with more clarity, less drama, and fewer words.
My suggested policy is something like:
An example of a different policy a site might have is:
I think that would be worse, but I would still appreciate the clarity. Or a hybrid policy could be maximally top-level-author-friendly:
As it stands I have a few ideas for top-level essays and I am unsure what exactly I would be signing up for in terms of reader-interaction. Conversely, if every comment is implicitly demanding an author response, I will make dramatically fewer comments, possibly none.
In "less time than average", which average? In the "create a child that they know will die of cancer at 10" thought experiment, the child is destined to die sooner than other children born that day. Whereas in the "human extinction in 10 years" thought experiment, the child is destined to die at about the same time as other children born that day, so they are not going to have "less time than average" in that sense. Those thought experiments have different answers by my intuitions.
My intuitions about what children think are also different to yours. There are many children who are angry at adults for the state of the world into which they were born. Mostly they are not angry at their parents for creating them in a fallen world. Children have many different takes on the Adam and Eve story, but I've not heard a child argue that Adam and Eve should not have had children because their children's lives would necessarily be shorter and less pleasant than their own had been.