All of MichaelStJules's Comments + Replies


I would say experiments, introspection and consideration of cases in humans have pretty convincingly established the dissociation between the types of welfare (e.g. see my section on it, although I didn't go into a lot of detail), but they are highly interrelated and often or even typically build on each other like you suggest.

I'd add that the fact that they sometimes dissociate seems morally important, because it makes it more ambiguous what's best for someone if multiple types seem to matter, and there are possible beings with some types but not others.

If someone wants to establish probabilities, they should be more systematic, and, for example, use reference classes. It seems to me that there's been little of this for AI risk arguments in the community, but more in the past few years.

Maybe reference classes are kinds of analogies, but more systematic and so less prone to motivated selection? If so, then it seems hard to forecast without "analogies" of some kind. Still, reference classes are better. On the other hand, even with reference classes, we have the problem of deciding which reference class to u... (read more)

There's also a decent amount of call option volume+interest at strike prices of $17.5, $20, $22.5 and $25 (same links as the comment I'm replying to), which suggests to me that the market is expecting lower upside on a successful merger than you are. The current price is about $15.8/share, so $17.5 is only about +11% and $25 is only +58%.

There's also of course volume+interest for call options at higher strike prices: $27.5, $30, $32.5.

I think this also suggests the market-implied odds calculations giving ~40% to successful merger are wrong, because the expected upside is overestimated.  The market-implied odds are higher.
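To make the dependence on the assumed upside concrete, here is a minimal sketch of the usual two-outcome calculation. It assumes the current price is just a probability-weighted average of a "deal succeeds" value and a "deal breaks" value, ignoring time value and any other scenarios. The function name and the candidate upside values are hypothetical and only for illustration; the $15.8 price and the $6 break price for Spirit are the figures quoted in this thread.

```python
def implied_success_probability(price: float, break_price: float, deal_price: float) -> float:
    """Solve price = p * deal_price + (1 - p) * break_price for p."""
    if not break_price < deal_price:
        raise ValueError("break_price must be below deal_price")
    return (price - break_price) / (deal_price - break_price)


PRICE = 15.8       # current share price quoted in the comment
BREAK_PRICE = 6.0  # break price for Spirit assumed in the quoted analysis

# Hypothetical values of the stock if the merger succeeds (illustrative only).
for deal_price in (30.0, 25.0, 20.0):
    p = implied_success_probability(PRICE, BREAK_PRICE, deal_price)
    print(f"upside ${deal_price:.2f} -> implied P(success) = {p:.0%}")
```

Under these assumptions, holding the price and break price fixed, a lower assumed upside mechanically gives a higher implied probability of success, which is the direction of the claim above.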

From, for calculating the market-implied odds:

Author's analysis - assumed break price of $5 for Hawaiian and $6 for Spirit.


  • Without a merger, Spirit may be financially distressed based on recent operating results. There's some risk that Spirit can't continue as a going concern without a merger.
  • Even if JetBlue prevails in court, there is some risk that the deal is recut as the offer was made in a much more favorable environment for airlines, though clauses in the merger agreement may prevent this.

So maybe you're overestimating t... (read more)

Unless I'm misreading, it looks like there's a bunch of volume+interest in put options with strike prices of around $5, but little volume+interest in options with lower strike prices (some in $2.50, but much less). $5.5 for January 5th, $5 for January 19th, $5 for February 16th. Much more volume+interest for put options in general for Feb 16th. So if we take those seriously and I'm not misunderstanding, the market expects a chance it'll drop below $5 per share, so a drop of at least ~70%.

There's more volume+interest in put options with strike prices of $7.... (read more)


Why is the downside only -60%?

Good book value. It might trade under book, but it's presumably not going to zero when it has decent book value.

Why think this is underpriced by the markets?

As you know, I don't find the EMH consistently true. The arguments for why it's more than ~40% likely to go through are linked.

I would be surprised if iguanas find things meaningful that humans don't find meaningful, but maybe they desire some things pretty alien to us. I'm also not sure they find anything meaningful at all, but that depends on how we define meaningfulness.

Still, I think focusing on meaningfulness is also too limited. Iguanas find things important to them, meaningful or not. Desires, motivation, pleasure and suffering all assign some kind of importance to things.

In my view, either

  1. capacity for welfare is something we can measure and compare based on cognitive eff
... (read more)

I think that's true, but also pretty much the same as what many or most veg or reducetarian EAs did when they decided what diet to follow (and other non-food animal products to avoid), including what exceptions to allow. If the consideration of why not to murder counts as involving math, so does veganism for many or most EAs, contrary to Zvi's claim. Maybe some considered too few options or possible exceptions ahead of time, but that doesn't mean they didn't do any math.

This is also basically how I imagine rule consequentialism to work: you decide what rul... (read more)

Well, there could be ways to distinguish, but it could be like a dream, where much of your reasoning is extremely poor, but you're very confident in it anyway. Like maybe you believe that your loved ones in your dream saying the word "pizza" is overwhelming evidence of their consciousness and love for you. But if you investigated properly, you could find out they're not conscious. You just won't, because you'll never question it. If value is totally subjective and the accuracy of beliefs doesn't matter (as would seem to be the case on experientialist accou... (read more)

I think a small share of EAs would do the math before deciding whether or not to commit fraud or murder, or otherwise cause/risk involuntary harm to other people, and instead just rule it out immediately or never consider such options in the first place. Maybe that's a low bar, because the math is too obvious to do?

What other important ways would you want (or make sense for) EAs to be more deontological? More commitment to transparency and against PR?

2 | Ben Pace | 4mo
Ah come on. I am tempted to say "You're not a true Effective Altruist unless you've at least done the math." Rigorously questioning the foundations of strong moral rules like this one is surely a central part of being an ethical person.  Countries do murder all of the time in wars and by police. Should you be pushing really hard to get to an equilibrium where that isn't okay? There are boundaries here and you actually have to figure out which ones are right and which ones are wrong. What are the results of such policies? Do they net improve or hurt people? These are important questions to ask and do factor into my decisions, at least. Many countries have very different laws around what level of violence you're allowed to use to protect yourself from someone entering your property (like a robber). You can't just defer to "no murder", I do think you have to figure out for yourself what's right and wrong in this scenario. And there's math involved, as well as deontology.

Maximizing just for expected total pleasure, as a risk neutral classical utilitarian? Maybe being okay with killing everyone or letting everyone die (from AGI, say), as long as the expected payoff in total pleasure is high enough?

I don't really see a very plausible path for SBF to have ended up with enough power to do this, though. Money only buys you so much, against the US government and military, unless you can take them over. And I doubt SBF would destroy us with AGI if others weren't already going to.

Where I agree with classical utilitarianism is that we should compute goodness as a function of experience, rather than e.g. preferences or world states

Isn't this incompatible with caring about genuine meaning and fulfillment, rather than just feelings of them? For example, it's better for you to feel like you're doing more good than to actually do good. It's better to be put into an experience machine and be systematically mistaken about everything you care about, i.e. that the people you love even exist (are conscious, etc.) at all, even against your own... (read more)

The way you describe it you make it sound awful, but actually I think simulations are great and that you shouldn't think that there's a difference between being in a simulation and being in base reality (whatever that means). Simple argument: if there's no experiment that you could ever possibly do to distinguish between two situations, then I don't think that those two situations should be morally distinct.

And if emotionally significant social bonds don't count, it seems like we could be throwing away what humans typically find most important in their lives.

Of course, I think there are potentially important differences. I suspect humans tend to be willing to sacrifice or suffer much more for those they love than (almost?) all other animals. Grief also seems to affect humans more (longer, deeper), and it's totally absent in many animals.

On the other hand, I guess some other animals will fight to the death to protect their offspring. And some die apparently gr... (read more)

Interesting topic. I think that unless we can find a specific causal relationship implying that the capacity to form social bonds increases overall well-being capacity, we should assume that attaching special importance to this capacity is merely a product of human bias. Humans typically assign an animal's capacity for wellbeing and meaningful experience based on a perceived overlap, or shared experience. It's as though humans are one circle in a Venn diagram, and the extent to which our circle overlaps with an iguana's circle is the extent to which that iguana has meaningful experience. I think this is clearly fallacious. An iguana has its own circle. Maybe the circle is smaller, but there's a huge area of non-overlap that we can't just entirely discount because we're unable to relate to it. We can't define meaningful experience by how closely it resembles human experience.

Ya, I don't think utilitarian ethics is invalidated, it's just that we don't really have much reason to be utilitarian specifically anymore (not that there are necessarily much more compelling reasons for other views). Why sum welfare and not combine them some other way? I guess there's still direct intuition: two of a good thing is twice as good as just one of them. But I don't see how we could defend that or utilitarianism in general any further in a way that isn't question-begging and doesn't depend on arguments that undermine utilitarianism when genera... (read more)

The argument can be generalized without using infinite expectations, and instead using violations of Limitedness in Russell and Isaacs, 2021 or reckless preferences in Beckstead and Thomas, 2023. However, intuitively, it involves prospects that look like they should be infinitely valuable or undefinably valuable relative to the things they're made up of.  Any violation of (the countable extension of) the Archimedean Property/continuity is going to look like you have some kind of infinity.

The issue could just be a categorization thing. I don't thi... (read more)

Also, I'd say what I'm considering here isn't really "infinite ethics", or at least not what I understand infinite ethics to be, which is concerned with actual infinities, e.g. an infinite universe, infinitely long lives or infinite value. None of the arguments here assume such infinities, only infinitely many possible outcomes with finite (but unbounded) value.
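As a small illustration of "infinitely many possible outcomes with finite (but unbounded) value", here is a sketch of the standard St Petersburg prospect. The specific payoffs (utility 2^k with probability 2^-k) are the textbook example, assumed here for illustration rather than taken from the post, and the function name is hypothetical.

```python
from fractions import Fraction


def truncated_expected_utility(n_outcomes: int) -> Fraction:
    """Expected utility contributed by the first n_outcomes outcomes.

    Outcome k (k = 1, 2, ...) has probability 2**-k and finite utility 2**k,
    so every outcome contributes exactly 1 to the expectation.
    """
    return sum(
        (Fraction(1, 2**k) * 2**k for k in range(1, n_outcomes + 1)),
        Fraction(0),
    )


for n in (10, 100, 1000):
    print(n, truncated_expected_utility(n))
```

Every individual outcome is finite, but the partial sums grow without bound as more outcomes are included, so the full expectation is infinite without any actual infinity appearing among the outcomes.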

3 | Garrett Baker | 5mo
The argument you made that I understood seemed to rest on allowing for an infinite expectation to occur, which seems pretty related to me to infinite ethics, though I'm no ethicist.

Thanks for the comment!

I don't understand this part of your argument. Can you explain how you imagine this proof working?

St Petersburg-like prospects (finite actual utility for each possible outcome, but infinite expected utility, or generalizations of them) violate extensions of each of these axioms to countably many possible outcomes:

  1. The continuity/Archimedean axiom: if A and B have finite expected utility, and A < B, there's no strict mixture of A and an infinite expected utility St Petersburg prospect, like 
... (read more)
Based on your explanation in this comment, it seems to me that St. Petersburg-like prospects don't actually invalidate utilitarian ethics as it would have been understood by e.g. Bentham, but it does contradict the existence of a real-valued utility function. It can still be true that welfare is the only thing that matters, and that the value of welfare aggregates linearly. It's not clear how to choose when a decision has multiple options with infinite expected utility (or an option that has infinite positive EV plus infinite negative EV), but I don't think these theorems imply that there cannot be any decision criterion that's consistent with the principles of utilitarianism. (At the same time, I don't know what the decision criterion would actually be.) Perhaps you could have a version of Bentham-esque utilitarianism that uses a real-valued utility function for finite values, and uses some other decision procedure for infinite values.

Possibly, but by limiting access to the arguments, you also limit the public case for it and engagement by skeptics. The views within the area will also probably further reflect self-selection for credulousness and deference over skepticism.

There must be less infohazardous arguments we can engage with. Or, maybe zero-knowledge proofs are somehow applicable. Or, we can select a mutually trusted skeptic (or set of skeptics) with relevant expertise to engage privately. Or, legally binding contracts to prevent sharing.

Eliezer's scenario uses atmospheric CHON. Also, I guess Eliezer used atmospheric CHON to allow the nanomachines to spread much more freely and aggressively.

Is 1% of the atmosphere way more than necessary to kill everything near the surface by attacking it?

Also, maybe we design scalable and efficient quantum computers with AI first, and an AGI uses those to simulate quantum chemistry more efficiently, e.g. Lloyd, 1996 and Zalka, 1996. But large quantum computers may still not be easily accessible. Hard to say.

High quality quantum chemistry simulations can take days or weeks to run, even on supercomputing clusters.

This doesn't seem very long for an AGI if they're patient and can do this undetected. Even months could be tolerable? And if the AGI keeps up with other AGI by self-improving to avoid being replaced, maybe even years. However, at years, there could be a race between the AGIs to take over, and we could see a bunch of them make attempts that are unlikely to succeed.

That's one simulation though. If you have to screen hundreds of candidate structures, and simulate every step of the process because you cannot run experiments, it becomes years of supercomputer time.

As a historical note and for further context, the diamondoid scenario is at least ~10 years old, outlined here by Eliezer, just not with the term "diamondoid bacteria":

The concrete illustration I often use is that a superintelligence asks itself what the fastest possible route is to increasing its real-world power, and then, rather than bothering with the digital counters that humans call money, the superintelligence solves the protein structure prediction problem, emails some DNA sequences to online peptide synthesis labs, and gets back a batch of protein

... (read more)

Are you thinking quantum computers specifically? IIRC, quantum computers can simulate quantum phenomena much more efficiently at scale than classical computers.

EDIT: For early proofs of efficient quantum simulation with quantum computers, see:

  1. Lloyd, 1996
  2. Zalka, 1996
I didn't think about QC. But the idea still holds: if a runaway AI needs to hack or build advanced QC to solve the diamondoid problem, that will make it more vulnerable and observable.

This is the more interesting and important claim to check to me. I think the barriers to engineering bacteria are much lower, but it’s not obvious that this will avoid detection and humans responding to the threat, or that timing and/or triggers in bacteria can be reliable enough.

Unfortunately, explaining exactly what kind of engineered bacteria could be dangerous is a rather serious infohazard.

Hmm, if A is simulating B with B's source code, couldn't the simulated B find out it's being simulated and lie about its decisions or hide what its actual preferences are? Or would its actual preferences be derivable from its weights or code directly without simulation?

I had a similar thought about "A is B" vs "B is A", but "A is the B" should reverse to "The B is A" and vice versa when the context is held constant and nothing changes the fact, because "is" implies that it's the present condition and "the" implies uniqueness. However, it might be trained on writing that is old and no longer correct, or that includes quotes about past states of affairs. Some context might still be missing, too, e.g. for "A is the president", president of what? It would still be a correct inference to say "The president is A" in the same context, a... (read more)

p.37-38 in Goodsell, 2023 gives a better proposal, which is to clip/truncate the utilities into a bounded range and compare the expected clipped utilities in the limit as the range grows. This will still suffer from St Petersburg lottery problems, though.
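Here is a rough sketch of how clipping interacts with a St Petersburg lottery. It is my own illustration, not Goodsell's construction: the comment as it appears here does not give the clipping range, so I assume utilities are capped at some bound t that is then taken to infinity. The lottery (utility 2^k with probability 2^-k), the sure payoff of 1000, and the function name are all hypothetical choices. Since every payoff is positive, only the upper cap matters.

```python
from fractions import Fraction


def expected_clipped_st_petersburg(t: int) -> Fraction:
    """Exact E[min(U, t)] for U = 2**k with probability 2**-k, k = 1, 2, ...

    Outcomes with 2**k <= t contribute 1 each. The remaining outcomes are
    clipped to t and contribute t * 2**-k, a geometric tail that sums to
    t * 2**-K, where K is the largest integer with 2**K <= t.
    """
    if t < 1:
        raise ValueError("t must be at least 1")
    K = t.bit_length() - 1
    return K + Fraction(t, 2**K)


SURE_THING = 1000  # hypothetical prospect with a certain utility of 1000

for exponent in (10, 100, 1000):
    t = 2**exponent
    clipped_sure = min(SURE_THING, t)
    clipped_lottery = expected_clipped_st_petersburg(t)
    print(exponent, clipped_lottery, clipped_lottery > clipped_sure)
```

The clipped expectation of the lottery only grows logarithmically in t, but it still grows without bound, so for a large enough cap it overtakes any fixed sure payoff. That is one way to read the remark that the proposal still has St Petersburg problems.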

Looking at Gustafsson, 2022's money pumps for completeness, the precaution principles he uses just seem pretty unintuitive to me. The idea seems to be that if you'll later face a decision situation where you can make a choice that makes you worse off but you can't make yourself better off by getting there, you should avoid the decision situation, even if it's entirely under your control to make a choice in that situation that won't leave you worse off. But, you can just make that choice that won't leave you worse off later instead of avoiding the situation... (read more)

I think a multi-step decision procedure would be better. Do what your preferences themselves tell you to do and rule out any options you can with them. If there are multiple remaining incomparable options, then apply your original policy to avoid money pumps.

Coming back to this, the policy

if I previously turned down some option X, I will not choose any option that I strictly disprefer to X

seems irrational to me if applied in general. Suppose I offer you X and Y, where both X and Y are random, and Y is ex ante preferable to X, e.g. stochastically dominates X, but has some chance of being worse than X. You pick Y. Then you evaluate Y to get y. However, suppose you get unlucky, and y is worse than X. Suppose f... (read more)

Ah yes, nice point. The policy should really be something like 'if I previously turned down some option X, then given that no uncertainty has been resolved in the meantime, I will not choose any option that I strictly disprefer to X.' An agent acting in accordance with that policy can trade y for X−. And I think that even agents acting in accordance with this restricted policy can avoid pursuing dominated strategies. As your case makes clear, these agents might end up with X− when they could have had X (because they got unlucky with Y yielding y). But although that's unfortunate for the agent, it doesn't put any pressure on the agent to revise its preferences.
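For concreteness, here is a minimal sketch of the unrestricted policy in the simplest (non-random) single-souring case. The option names A, A- and B, the class, and the deliberately permissive acceptance rule (accept any swap that is not strictly worse than the current holding) are all assumptions made for illustration. The sketch does not model uncertainty being resolved, so it does not capture the restricted version of the policy.

```python
# Strict preferences as (better, worse) pairs. Pairs listed in neither
# direction are incomparable. Hypothetical options: A, its souring A-, and B.
STRICTLY_PREFERS = {("A", "A-")}


def strictly_dispreferred(option: str, to: str) -> bool:
    """True if `option` is strictly worse than `to`."""
    return (to, option) in STRICTLY_PREFERS


class Agent:
    def __init__(self, holding: str, use_policy: bool) -> None:
        self.holding = holding
        self.use_policy = use_policy
        self.turned_down: set[str] = set()

    def offer(self, option: str) -> bool:
        """Offer to swap the current holding for `option`; True if accepted."""
        worse_than_holding = strictly_dispreferred(option, to=self.holding)
        ruled_out_by_policy = self.use_policy and any(
            strictly_dispreferred(option, to=x) for x in self.turned_down
        )
        if worse_than_holding or ruled_out_by_policy:
            self.turned_down.add(option)
            return False
        # Accepting the swap means turning down what we currently hold.
        self.turned_down.add(self.holding)
        self.holding = option
        return True


def run(use_policy: bool) -> str:
    agent = Agent("A", use_policy)
    agent.offer("B")   # A and B are incomparable, so the swap is allowed
    agent.offer("A-")  # B and A- are incomparable too
    return agent.holding


print(run(use_policy=False), run(use_policy=True))
```

Without the policy the agent can be walked from A to A-, which it strictly disprefers to where it started. With the policy it remembers having turned down A and stops at B.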

Also, the estimate of the current number of researchers probably underestimates the number of people (or person-hours) who will work on AI safety. You should probably expect further growth in the number of people working on AI safety, because the topic is getting mainstream coverage and support, Hinton and Bengio have become advocates, and it's being pushed more in EA (funding, community building, career advice).

However, the FTX collapse is reason to believe there will be less funding going forward.

Some other possibilities that may be worth considering and can further reduce impact, at least for an individual looking to work on AI safety themself:

  1. Some work is net negative and increases the risk of doom or wastes the time and attention of people who could be doing more productive things.
  2. Practical limits on the number of people working at a time, e.g. funding, management/supervision capacity. This could mean some people have a much lower probability of making a difference, if their taking a position pushes someone else who would otherwise have taken it out of the field, or into (possibly much) less useful work.

An AGI could give read and copy access to the code being run and the weights directly on the devices from which the AGI is communicating. That could still be a modified copy of the original and more powerful (or with many unmodified copies) AGI, though. So, the other side may need to track all of the copies, maybe even offline ones that would go online on some trigger or at some date.

Also, giving read and copy access could be dangerous to the AGI if it doesn't have copies elsewhere.

Some other discussion of his views on (animal) consciousness here (and in the comments).

My understanding from Eliezer's writing is that he's an illusionist (and/or a higher-order theorist) about consciousness. However, illusionism (and higher-order theories) are compatible with mammals and birds, at least, being conscious. It depends on the specifics.

I'm also an illusionist about consciousness and very sympathetic to the idea that some kinds of higher-order processes are required, but I do think mammals and birds, at least, are very probably conscious, and subject to consciousness illusions. My understanding is that Humphrey (Humphrey, 2022,&... (read more)

I think it's worth pointing out that from the POV of such ethical views, non-extinction could be an existential risk relative to extinction, or otherwise not that important (see also the asymmetric views in Thomas, 2022). If we assign some credence to those views, then we might instead focus more of our resources on avoiding harms without also (significantly) increasing extinction risks, perhaps especially reducing s-risks or the torture of sentient beings.

Furthermore, the more we reduce the risks of such harms, the less prone deontological (and other morally asymmetric) AI could be to aim for extinction.

The arguments typically require agents to make decisions independently of the parts of the decision tree in the past (or that are otherwise no longer accessible, in case they were ruled out). But an agent need not do that. An agent can always avoid getting money pumped by just following the policy of never picking an option that completes a money pump (or the policy of never making any trades, say). They can even do this with preference cycles.
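A toy version of this with a preference cycle might look like the sketch below. The cycle (B over A, C over B, A over C), the fee of 1 per trade, and the reading of "completes a money pump" as "returns to an item held earlier after paying fees" are my assumptions for illustration, not something the argument above pins down.

```python
PREFERS = {("B", "A"), ("C", "B"), ("A", "C")}  # (preferred, over): a cycle
FEE = 1  # hypothetical cost paid for each accepted trade


def run_pump(offers, avoid_completing_pump):
    """Return (final item, final money) after a sequence of trade offers."""
    holding, money = "A", 10
    held_before = {holding}
    for option in offers:
        if (option, holding) not in PREFERS:
            continue  # not preferred to the current holding, so no trade
        if avoid_completing_pump and option in held_before:
            continue  # would return to an earlier holding, minus the fees paid
        holding, money = option, money - FEE
        held_before.add(holding)
    return holding, money


offers = ["B", "C", "A"]
print(run_pump(offers, avoid_completing_pump=False))
print(run_pump(offers, avoid_completing_pump=True))
```

The agent that decides each trade independently of its history ends up back at A with less money. The agent that looks at its past holdings keeps its cyclic preferences but declines the last trade, so it never ends up with the same item and strictly less money.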

Does this mean money pump arguments don't tell us anything? Such a policy may have other costs that an agent would... (read more)

See also EJT's comment here (and the rest of the thread). You'd just pick any one of the utility functions. You can also probably drop continuity for something weaker, as I point out in my reply there.

This is cool. I don't think violations of continuity are also in general exploitable, but I'd guess you should also be able to replace continuity with something weaker from Russell and Isaacs, 2020, just enough to rule out St. Petersburg-like lotteries, specifically any one of Countable Independence (which can also replace independence), the Extended Outcome Principle (which can also replace independence) or Limitedness, and then replace the real-valued utility functions with utility functions representable by "lexicographically ordered ordinal sequences of bounded real utilities".

EDIT: Looks like a similar point made here.
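To give a feel for why lexicographically ordered utilities can stand in where continuity fails, here is a two-entry sketch. It uses finite tuples rather than the ordinal-indexed sequences in Russell and Isaacs, and it assumes expected utilities are taken coordinate-wise; the prospects A, B, C and the helper names are hypothetical.

```python
from typing import Tuple

LexUtility = Tuple[float, ...]


def lex_prefers(u: LexUtility, v: LexUtility) -> bool:
    """True if u is strictly better than v in lexicographic order."""
    if len(u) != len(v):
        raise ValueError("utility vectors must have the same length")
    return u > v  # Python compares tuples lexicographically


def mix(p: float, u: LexUtility, v: LexUtility) -> LexUtility:
    """Coordinate-wise expected utility of getting u with probability p, else v."""
    return tuple(p * a + (1 - p) * b for a, b in zip(u, v))


A, B, C = (0.0, 0.0), (0.0, 1.0), (1.0, 0.0)  # A < B < C

for p in (0.0, 1e-12, 0.5):
    m = mix(p, C, A)
    print(p, "mixture beats B:", lex_prefers(m, B), "B beats mixture:", lex_prefers(B, m))
```

B sits strictly between A and C, but any mixture of A and C that puts positive probability on C already beats B, and the mixture with probability zero on C loses to B. No mixture is exactly as good as B, which is the kind of continuity failure a single real-valued utility function cannot represent.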


I wonder if we can "extend" utility maximization representation theorems to drop Completeness. There's already an extension to drop Continuity by using an ordinal-indexed vector (sequence) of real numbers, with entries sorted lexicographically ("lexicographically ordered ordinal sequences of bounded real utilities", Russell and Isaacs, 2020). If we drop Completeness, maybe we can still represent the order with a vector of independent but incomparable dimensions across which it must respect ex ante Pareto eff... (read more)

There's a recent survey of the general public's answers to "In your own words, what is consciousness?"

Consciousness: In your own words by Michael Graziano and Isaac Ray Christian, 2022


Surprisingly little is known about how the general public understands consciousness, yet information on common intuitions is crucial to discussions and theories of consciousness. We asked 202 members of the general public, “In your own words, what is consciousness?” and analyzed the frequencies with which different perspectives on consciousness were represented. Almo

... (read more)

I think it's more illustrative than anything, and a response to Robert Miles using chess against Magnus Carlsen as an analogy for humans vs AGI. The point is that a large enough material advantage can help someone win against a far smarter opponent. Somewhat more generally, I think arguments for AI risk often put intelligence on a pedestal, without addressing its limitations, including the physical resource disadvantages AGIs will plausibly face.

I agree that the specifics of chess probably aren't that helpful for informing AI risk estimates, and that a bet... (read more)

For my 2nd paragraph, I meant that the experiment would underestimate the required resource gap. Being down exactly by a queen at the start of a game is not as bad as being down exactly by a queen later into the game when there are fewer pieces overall left, because that's a larger relative gap in resources.
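A quick back-of-the-envelope version of this, using the conventional piece values (queen 9, rook 5, bishop and knight 3, pawn 1), which are a rule of thumb I'm assuming rather than anything from the experiment. The late-game position is made up for illustration.

```python
PIECE_VALUES = {"queen": 9, "rook": 5, "bishop": 3, "knight": 3, "pawn": 1}


def material(pieces: dict) -> int:
    """Total conventional point value of a side's pieces (king excluded)."""
    return sum(PIECE_VALUES[name] * count for name, count in pieces.items())


def relative_deficit(stronger: dict, weaker: dict) -> float:
    """Share of the better-equipped side's material that the other side lacks."""
    return (material(stronger) - material(weaker)) / material(stronger)


START_FULL = {"queen": 1, "rook": 2, "bishop": 2, "knight": 2, "pawn": 8}
START_ODDS = {**START_FULL, "queen": 0}

# Hypothetical later position: same queen gap, far fewer pieces overall.
LATE_FULL = {"queen": 1, "rook": 1, "pawn": 4}
LATE_ODDS = {"rook": 1, "pawn": 4}

print(f"start: {relative_deficit(START_FULL, START_ODDS):.0%}")
print(f"late:  {relative_deficit(LATE_FULL, LATE_ODDS):.0%}")
```

The absolute gap is a queen in both positions, but as a share of the stronger side's material it is much larger in the stripped-down position than at the start.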

Would queen-odds games pass through roughly within-distribution game states, anyway, though?

Or, either way, if/when it does reach roughly within-distribution game states, the material advantage in relative terms will be much greater than just being down a queen early on, so the starting material advantage would still underestimate the real material advantage for a better trained AI.

It's clear that it was never optimized for odds games, so unless concrete evidence is presented, I doubt that @titotal actually played against a "superhuman" system, which may explain why it won. There's definitely a ceiling to how much intelligence will help: as the other guy mentioned, not even AIXI would be able to recover from an adversarially designed initial position for Tic-Tac-Toe. But I'm highly skeptical OP has reached that ceiling for chess yet.
SF's ability to generalize across that distribution shift seems unclear. My intuition is that a starting position with queen odds is very off-distribution, because in training games where both players are very strong, large material imbalances only happen very late in the game. I'm confused by your 2nd paragraph. Do you think this experiment overestimates or underestimates the resource gap required to overcome a given intelligence gap?

Why is it too late if it would take militaries to stop it? Couldn't the militaries stop it?

5 | Max H | 8mo
If an AI is smart enough that it takes a military force to stop it, the AI is probably also smart enough to avoid antagonizing that force, and/or to hide out in a way that a military can't find. Also, there are a lot of things that militaries and governments could do, if they had the will and ability to coordinate with each other effectively. What they would do is a different question. How many governments, when faced with even ironclad evidence of a rogue AI on the loose, would actually choose to intervene, and then do so in an effective way? My prediction is that many countries would find reasons or rationalizations not to take action at all, while others would get mired in disagreement and infighting, or fail to deploy their forces in an actually effective way. And that's before the AI itself has an opportunity to sow discord and/or form alliances. (Though again, I still think an AI that is at exactly the level where military power is relevant is a pretty narrow and unlikely band.)

There's also a similar interesting argument here, but I don't think you get a money pump out of it either:
