Logan Zoellner



>Now imagine that Musk gets in trouble with the government

Now imagine the same scenario, but Elon has not gotten in trouble with the government and multiple people (including those who fired him) have affirmed he did nothing wrong.

Does an "ordinary layman's understanding of actual history" include knowledge of how tanks are used in combined arms warfare to create breakthroughs in "blitzkrieg"-style warfare? It seems like "don't attack until you have tanks and a good idea of how to use them in coordination with infantry and artillery, and also don't antagonize America" is sufficient for near-certain victory.

In contrast, suppose you have a strong and knowledgeable multimodal predictor trained on all data humanity has available to it that can output arbitrary strings. Then apply extreme optimization pressure for never losing at chess. Now, the boundaries of the space in which the AI operates are much broader, and the kinds of behaviorist "values" the AI can have are far less constrained. It has the ability to route through the world, and with extreme optimization, it seems likely that it will.

"If we build AI in this particular way, it will be dangerous"

Okay, so maybe don't do that then.

Well, I claim that these are more-or-less the same fact. It's no surprise that the AI falls down on various long-horizon tasks and that it doesn't seem all that well-modeled as having "wants/desires"; these are two sides of the same coin.


It's weird that this sentence immediately follows you talking about AI being able to play chess. A chess-playing AI doesn't "want to win" in the behaviorist sense. If I flip over the board, swap pieces mid-game, or simply refuse to move the AI's pieces on its turn, it's not going to do anything to stop me, because it doesn't "want" to win the game. It doesn't even realize that a game is happening in the real world. And yet it is able to make excellent long-term plans about "how" to win at chess.

So either:

a) A chess-playing AI fits into your definition of "want", in which case who cares if AI wants things; this tells us nothing about its real-world behavior.
b) A chess-playing AI doesn't "want" to win (my claim), in which case AI can make long-term plans without wanting.

Construction of overhead electric lines would be much more expensive in America than other countries, making those ROI estimates inaccurate.


I think you might be seriously underestimating 1. Rail projects cost 50% more in the US (vs. e.g. France).

"reality is large" is a bad objection.

It's possible in principle to build a simulation that is literally indistinguishable from reality. Say we only run the AI in simulation for 100 million years, and there's a simulation overhead of 10x. That should cost (100e6 ly)^3 * (100e6 years) * 10 of our future lightcone. This is a minuscule fraction of our actual future lightcone, (9.4e10 ly)^3 * (10^15 years).
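As a rough sanity check on that ratio, here is a back-of-the-envelope sketch in Python. It assumes both regions are measured as (side length)^3 times duration, which means reading the lightcone figure as (9.4e10 ly)^3 * (10^15 years). The function and constant names are made up for illustration.

```python
def spacetime_volume(side_ly: float, duration_years: float) -> float:
    """Spacetime 'volume' in ly^3 * years, treating the region as a cube (an assumption)."""
    return side_ly ** 3 * duration_years


# Figures from the comment above.
SIM_SIDE_LY = 100e6          # simulated region: 100 million light-years across
SIM_DURATION_YEARS = 100e6   # simulated for 100 million years
SIM_OVERHEAD = 10            # 10x simulation overhead

LIGHTCONE_SIDE_LY = 9.4e10        # size of our future lightcone, in light-years
LIGHTCONE_DURATION_YEARS = 1e15   # usable lifetime, in years

sim_cost = spacetime_volume(SIM_SIDE_LY, SIM_DURATION_YEARS) * SIM_OVERHEAD
lightcone = spacetime_volume(LIGHTCONE_SIDE_LY, LIGHTCONE_DURATION_YEARS)
fraction = sim_cost / lightcone

print(f"simulation cost: {sim_cost:.2e} ly^3*yr")
print(f"lightcone:       {lightcone:.2e} ly^3*yr")
print(f"fraction used:   {fraction:.2e}")
```

Under these assumptions the simulation uses on the order of 1e-15 of the lightcone, so the "reality is large" objection doesn't bite even if the inputs are off by several orders of magnitude.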

A few better objections:

Simulating a universe with a paperclip maximizer in it means simulating billions of people being murdered and turned into paperclips.  If we believe computation=existence, that's hugely morally objectionable.

The AGI's prior that it is in a simulation doesn't depend on anything we do, only on the universal prior.

I don't think all AI regulation is harmful, but I think almost all "advocacy" is harmful. Increasing the salience of AI Doom is going to mean making it a partisan issue. For the moment, this means that the left is going to want to regulate AI bias and the right is going to want to build AI faster than China.

I think the correct approach is more akin to secret congress, the idea that bipartisan deals are possible by basically doing things everyone agrees on without publicly broadcasting it.

Once the economy is fully automated we end up in a Paul-Christiano-scenario where all the stuff that happens in the world is incomprehensible to humans without a large amount of AI help. But ultimately the AI, having been in control for so long, is able to subvert all the systems that human experts use to monitor what is actually going on. The stuff they see on screens is fake, just like how Stuxnet gave false information to Iranian technicians at Natanz.

This concedes the entire argument that we should regulate uses, not intelligence per se. In your story a singleton AI uses a bunch of end-effectors (robot factories, killer drones, virus manufacturing facilities) to cause the end of humanity.

If there isn't a singleton AI (i.e. my good AI will stop your bad AI), or if we just actually have human control of dangerous end-effectors, then you can never pass through to the "and then the AI kills us all" step.

Certainly you can argue that the AI will be so good at persuasion/deception that there's no way to maintain human control. Or that there's no way to identify dangerous end-effectors in advance. Or that AIs will inevitably all cooperate against humanity (due to some galaxy-brained take about how AIs can engage in acausal bargaining by revealing their source code but humans can't). But none of these things follow automatically from the mere existence somewhere of a set of numbers on a computer that happens to surpass humanity's intelligence. Under any plausible scenario without Foom, the level at which AGI becomes dangerous just by existing is well above the threshold of human-level intelligence.

Here is the list of counter-arguments I prepared beforehand

1) Digital cliff, it may not be possible to weaken a stronger model
2) Competition, the existence of a stronger model implies we live in a more dangerous world
3) Deceptive alignment, the stronger model may be more likely to deceive you into thinking it's aligned
4) Wireheading, the user may be unable to resist using the stronger model even knowing it is more dangerous
5) Passive Safety, the weaker model may be passively safe while the stronger model is not
6) Malicious actors, the stronger model may be more likely to be used by malicious actors
7) Inverse scaling, the stronger model may be weaker in some safety-critical dimensions
8) Domain of alignment, the stronger model may be more likely to be used in a safety-critical context


I think the strongest counter arguments are:

  1. There may not be a surefire way to weaken a stronger model.
  2. Saying you "can" weaken a model is useless unless you actually do it.


I would love to hear a stronger argument for what @johnswentworth describes as "subproblem 1": that the model might become dangerous during training. All of the versions of this argument that I am aware of involve some "magic" step where the AI unboxes itself (e.g. via a side channel or by talking its way out of the box), and that step seems like it either requires huge leaps in intelligence or can be easily mitigated (air-gapped network, two-person control).
