If AI agents with unusual values would for a long time be mostly interested in promoting them through means other than lying in wait and taking over the world, that is important because... AIs pursuing this strategy are much more visible than those hiding in wait deceptively. We might less expect AI scheming.

AIs showing bits of unintended motives in experiments or deployment would be a valuable piece of evidence re scheming risk, but such behavior would be trained against, pushing scheming behavior out towards the tails of takeover/escape with the power to resist modification. The tendency of human institutions to retrain or replace AIs to match human preferences pushes towards misaligned AIs having ~0 or very high power.

The catastrophic error IMO is:

Five years from when you open your account there are options for taking gains out tax-free even if you're not 59.5 yet. You can take "substantially equal periodic payments", but there are also ones for various kinds of hardship.

For Roth you mostly can't take out gains tax-free. The hardship ones are limited, and SEPP doesn't let you access much of it early. The big ones of Roth conversions and just eating the 10% penalty only work for pretax.

[As an aside Roth accounts are worse for most people vs pretax for multiple reasons, e.g. pretax comes with an option of converting or withdrawing in low income years at low tax rates. More details here.]

In #1 if you start with $100k then it's $200k at the time you convert, and you pay $48k (24%) in taxes leaving you with $152k in your Roth 401k. It grows to $198k, you withdraw $152k and you have $46k of gains in your Roth 401k.

You pay taxes on the amount you convert, either from outside funds or withdrawals to you. If you convert $X you owe taxes on that as ordinary income, so you can convert $200k and pay $48k in taxes from outside funds. This makes pretax better.
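
As a rough check on the arithmetic above, here is a minimal sketch comparing the two ways of paying the conversion tax. The function name and its parameters are made up for illustration; the $200k balance and 24% rate are the figures from this thread, and everything else about real conversions (withholding rules, state taxes) is ignored.

```python
def roth_conversion(pretax_balance, tax_rate, pay_from_outside):
    """Convert the whole pretax balance to Roth.

    Returns (roth_balance_after, outside_cash_spent).
    Hypothetical helper for illustration only, not tax software.
    """
    tax_due = pretax_balance * tax_rate
    if pay_from_outside:
        # Tax is paid with money held outside the account,
        # so the full balance lands in the Roth.
        return pretax_balance, tax_due
    # Tax is taken out of the converted money itself.
    return pretax_balance - tax_due, 0.0


inside = roth_conversion(200_000, 0.24, pay_from_outside=False)
outside = roth_conversion(200_000, 0.24, pay_from_outside=True)
print("tax paid from the conversion:", inside)
print("tax paid from outside funds: ", outside)
```

Paying from outside funds moves the whole $200k into the Roth at the cost of $48k of taxable savings, instead of leaving $152k there.
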
Re your assumptions, they are not great for an AI-pilled saver. Someone who believes in short AI timelines should probably be investing in AI if they don't have decisive deontological objections. NVDA is up 20x in the last 5 years, OpenAI even more. On the way to a singularity AI investments will probably more than 10x again unless it's a surprise in the next few years as Daniel K argues in comments. So their 401k should be ~all earnings, and they may have a hard time staying in the low tax brackets you use (more so at withdrawal time than contribution time) if they save a lot. The top federal tax rates are 37% for ordinary income and 23.8% for capital gains.

Paying the top federal income tax rate plus penalties means a 47% tax rate on early withdrawals from the Roth vs 23.8% from taxable. I.e. every dollar kept outside the Roth is worth 44% more if you won't be using the account after 59.5. That's a wild difference from the standard Roth withdrawal case where there's a 0% tax rate.
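
The 47% and 44% figures follow directly from the rates quoted above. A minimal sketch of the calculation, under these assumptions: the withdrawn dollar is entirely earnings (the "~all earnings" case), only federal tax applies, and the variable names are mine.

```python
TOP_ORDINARY_RATE = 0.37         # top federal ordinary income rate
EARLY_WITHDRAWAL_PENALTY = 0.10  # penalty on early withdrawal of earnings
TOP_LTCG_RATE = 0.238            # top federal long-term capital gains rate


def kept_per_dollar(tax_rate):
    """Fraction of each withdrawn dollar of gains left after tax."""
    return 1.0 - tax_rate


# Early Roth withdrawal of earnings: ordinary income tax plus the penalty.
roth_early_rate = TOP_ORDINARY_RATE + EARLY_WITHDRAWAL_PENALTY
roth_kept = kept_per_dollar(roth_early_rate)

# Gains realized in a taxable account at long-term rates.
taxable_kept = kept_per_dollar(TOP_LTCG_RATE)

# How much more a taxable dollar of gains is worth than an early Roth dollar.
advantage = taxable_kept / roth_kept - 1.0
print(f"early Roth rate: {roth_early_rate:.1%}, taxable advantage: {advantage:.0%}")
```

So 76.2 cents kept vs 53 cents kept per dollar of gains, which is where the roughly 44% comes from.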

A substantially larger percentage in Roth than the probability you are around to use it and care about it after 59.5 looks bad to me. From the perspective of someone expecting AI soon this advice could significantly hurt them in a way that the post obscured.


This post seems catastrophically wrong to me because of its use of a Roth 401k as an example, instead of a pre-tax account. Following it could create an annoying problem of locked-up funds.

Roth earnings become tax free at 59.5. Before that, even if you use SEPP to do withdrawals without penalties you still have to pay taxes on the withdrawn earnings (some of which are your principal because of inflation). And those taxes are ordinary income rates, which top out much higher than long term capital gains tax rates. Further, the SEPP withdrawals are spaced out to reflect your whole lifetime according to actuarial tables, so if TEOTWAWKI is in 10 years and the life tables have you space out your SEPP withdrawals over 40 years, then you can only access a minority of your money in that time.
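
To put a rough number on "a minority", here is a toy sketch of the 10-years-out-of-40 example. It assumes a simplified schedule where each year's payment is the current balance divided by the payout years remaining; that is a stand-in of my own, not the actual IRS calculation, which uses published life-expectancy tables and specific permitted methods. The function name is hypothetical.

```python
def share_accessed(horizon_years, payout_years):
    """Share of the account withdrawn after `horizon_years`, assuming each
    year's payment is balance / remaining payout years (simplified schedule).

    Investment growth scales withdrawals and the remaining balance equally,
    so the share does not depend on returns under this assumption.
    """
    years = min(horizon_years, payout_years)
    remaining_share = 1.0
    for year in range(years):
        # Withdraw 1 / (years left in the schedule) of what remains.
        remaining_share *= 1.0 - 1.0 / (payout_years - year)
    return 1.0 - remaining_share


share = share_accessed(10, 40)
print(f"share of the account reached in 10 of 40 years: {share:.0%}")
```

Under that assumption about a quarter of the account comes out in the first 10 years, and the other three quarters stay locked up.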

For a pretax 401k where you contribute when you have a high income, the situation is very different: you get an upfront ordinary income tax deduction when you contribute, so you don't get worse tax treatment by missing out on LTCG rates. And you can roll over to a Roth IRA (paying taxes on the conversion) and then access the amount converted penalty-free in 5 years (although that would trap some earnings in the Roth) or just withdraw early and pay the 10% penalty (which can be overcome by tax-free growth benefits earlier, or withdrawing in low income years).

I'm 41.5, so it's 18 years to access my Roth balances without paying ordinary taxes on the earnings (which are most of the account balances). I treat those funds as insurance against the possibility of a collapse of AI progress or blowup of other accounts, but I prefer pre-tax contributions over Roth ones now because of my expectation that probably there will be an AI capabilities explosion well before I reach 59.5. If I had all or most of my assets in Roth accounts it would be terrible.

This is pretty right for pretax individual accounts (401ks may not let you do early withdrawal until you leave), but for Roth accounts that have accumulated earnings, early withdrawal means paying ordinary taxes on the earnings, so you miss out on LTCG rates in addition to paying the 10% penalty.

(My perennial uncertainty is: AI 1 can straightforwardly send source code / model weights / whatever to AI 2, but how can AI 1 prove to AI 2 that this file is actually its real source code / model weights / whatever? There might be a good answer, I dunno.)

They can jointly and transparently construct an AI 3 from scratch motivated to further their deal, and then visibly hand over their physical resources to it, taking turns with small amounts in iterated fashion.

AI 3 can also be given access to secrets of AI 1 and AI 2 to verify their claims without handing over sensitive data.

Regarding making AIs motivated to have accurate beliefs, you can make agents that do planning and RL on organizing better predictions, e.g. AIs whose only innate drives/training signal (besides short-run data modeling, as with LLM pretraining) are doing well in comprehensive forecasting tournaments/prediction markets, or implementing reasoning that scores well on various classifiers built based on habits of reasoning that drive good performance in prediction problems, even against adversarial pressures (AIs required to follow the heuristics have a harder time believing or arguing for false beliefs even when optimized to do so under the constraints).

Even if you're an anarchist who thinks taxation is theft, to say willful nonpayment of taxes to donate is effective altruism is absurd. The consequences of this are just obviously very bad, both the idea and the advocacy. One publicized case of a person willfully refusing to pay their taxes in the name of effective altruism can do much more damage to it than many such people donating a bit more, and even if a particular case is invisible, the general practice is visible (Newcomb issues). Consider how much damage SBF and FTX have done to the causes of effective altruism, pandemic prevention, AI safety. There are billions of dollars committed to effective charity, and thousands of people trying to do good effectively, and people tying commonsense wrongdoing to it with crazy rationales has a serious damaging multiplier effect on the whole.

Any dollar donated through this method is in expectation going to cost the equivalent of multiple dollars' worth of similar donations (plausibly a huge number) through such damage. It would be much better for the world if tax scofflaws were spending their taxes due on gambling or alcohol rather than effective altruism.

I disagree, from my experience of engaging with the public debate, doubt is mostly about AI capability, not about misalignment. Most people easily believe AI to be misaligned to them, but they have trouble believing it will be powerful enough to take over the world any time soon. I don't think alignment research will do that much here.

I would say that the power of AI will continue to visibly massively expand (although underestimation of further developments will continue to be a big problem), but that will increase both 'fear AI disaster' and 'get AI first' elements. My read is that the former is in a very difficult position now when its policy recommendations conflict with the latter. I see this in the Congressional hearings and rejection of the pause letter.

Even if experts would agree that increasing the power of the aligned AI is good and necessary, and that expansion in space would be required for that, I think it will take a long time to convince the general public and/or decision makers, if it's at all possible. And in any remotely democratic alignment plan, that's a necessary step.

When that kind of AI is available, it would mean by the same token that such expansion could break down MAD in short order as such explosive growth could give the power to safely disarm international rivals if not matched or stopped. And AI systems and developers will be able to demonstrate this. So the options would be verifying/trusting deals with geopolitical and ideological rivals to hold back or doing fast AI/industrial expansion. If dealmaking fails, then all options would look scary and abrupt.

I think the assumption that safe, aligned AI can't defend against a later introduction of misaligned AI is false, or rather depends on the assumption of profound alignment failures so that the 'aligned AI' really isn't. AI that is aligned enough to do AI research and operate industry and security forces can expand its capabilities to the technological frontier and grow an industrial base claiming unclaimed resources in space. Then any later AI introduced faces an insurmountable balance of capabilities just from the gap in resources, even if it catches up technologically. That would not violate the sovereignty of any state, although it could be seen as a violation of the Outer Space Treaty if not backed by the international community with treaty revision.

Advanced AI-enabled tech and industry can block bioweapons completely through physical barriers, detection, and sterilization. Vast wealth can find with high probability any zero-days that could be discovered with tiny wealth, and produce ultra-secure systems, so cyberattacks do not produce a vulnerable world. Even nuclear weapons lose their MAD element in the face of millions of drones/interceptors/defenses for each attempted attack (and humans can move to a distance in space, back up their minds, etc).

If it turns out there is something like the ability to create a vacuum collapse that enables one small actor to destroy a much larger AI-empowered civilization, then the vast civilization will find out first, and could safely enforce a ban if a negotiated AI-enforced treaty could not be struck.

If I understand correctly, memes about pivotal acts to stop anyone from making misaligned AI stem from the view that we won't be able to make AI that could be trusted to undergo intelligence explosion and industrial expansion for a long time after AI could enable some other 'pivotal act.' I.e. the necessity for enforcing a ban even after AGI development is essentially entirely about failures of technical alignment.

Furthermore, the biggest barrier to extreme regulatory measures like a ban is doubt (both reasonable and unreasonable) about the magnitude of misalignment risk, so research that studies and demonstrates high risk (if it is present) is perhaps the most leveraged possible tool to change the regulatory/governmental situation.

No. Short version is that the prior for the combination of technologies and motives for aliens (and worse for magic, etc) is very low, and the evidence distribution is familiar from deep dives in multiple bogus fields (including parapsychology, imaginary social science phenomena, and others), with understandable data-generating processes so not much likelihood ratio.

