All of Logan Zoellner's Comments + Replies

An Agent Based Consciousness Model (unfortunately it's not computable)

Agree with almost all of your points.

The goal of writing this post was "this is a slight improvement on IIT", not "I expect normal people to understand/agree with this particular definition of consciousness".

Surviving Automation In The 21st Century - Part 1

But the vast majority of automation doesn't seem to be militarily relevant. Even if you assume some sort of feedback loop where militarily insubstantial automation leads to better military automation, world powers already hold the trump card of nukes to deter wars of aggression against them.


I think you're underestimating the use of non-military tech for military purposes. As a point of comparison, the pre-WWII US had a massive economy (and very little of it dedicated to the military). But this still proved to be a decisive advantage.

Or, as admi... (read more)

Surviving Automation In The 21st Century - Part 1

But unlike the last big pass in automation, when missing out meant getting conquered, this time the penalty for missing out seems insubstantial.


This claim has been empirically refuted in Armenia and Ukraine.  Missing out on drones DOES mean getting conquered.

Hence why I make mention of it in the article:

> The "important" bits of automation will happen in both. Germany and the US will both want their bombardier drone swarms. It's just that the latter will have the engineers projecting them be driven to their co-working space by a self-driving car, getting their "Mediterranean wrap" from the claws of a drone; while the former will have said engineers take a subway to the office, then get their shawarma delivered by a guy on a bike.

But the vast majority of automation doesn't seem to be militarily relevant. Even if you assume some sort of feedback loop where militarily insubstantial automation leads to better military automation, world powers already hold the trump card of nukes to deter wars of aggression against them.
Various Alignment Strategies (and how likely they are to work)

I disagree that they are all that interesting: a lot of TASes don't look like "amazing skilled performance that brings you to tears to watch" but "the player stands in place twitching for 32.1 seconds and then teleports to the YOU WIN screen". 


I fully concede that a Paperclip Maximizer is way less interesting if there turns out to be some kind of false vacuum  that allows you to just turn the universe into a densely tiled space filled with paperclips expanding at the speed of light.

It would be cool to make a classification of games where p... (read more)

Various Alignment Strategies (and how likely they are to work)

So, in the domains where we can approach perfection, the idea that there will always be large amounts of diversity and interesting behaviors does not seem to be doing well.


I suspect that a paperclip maximizer would look less like perfect Go play and more like a TAS speedrun of Mario.  Different people have different ideas of interesting, but I personally find TAS's fun to watch.


The much longer version of this argument is here.

Yeah, I realized after I wrote it that I should've brought in speedrunning and related topics, even if they are low-status compared to Go/chess and formal reinforcement learning research. I disagree that they are all that interesting: a lot of TASes don't look like "amazing skilled performance that brings you to tears to watch" but "the player stands in place twitching for 32.1 seconds and then teleports to the YOU WIN screen".* (Which is why regular games need to constantly patch to keep the meta alive and not collapse into cheese or a Nash equilibrium or cycle.)

Even the ones not quite that broken are still deeply dissatisfying to watch; one that's closely analogous to the chess endgame databases and doesn't involve 'magic' is this bruteforce of Arkanoid's game tree - the work that goes into solving the MDP efficiently is amazing and fascinating, but watching the actual game play is to look into an existential void of superintelligence without comprehension or meaning (never mind beauty).

The process of developing or explaining a speedrun can be interesting, like that Arkanoid example - but only once. And then you have all the quadrillions of repetitions afterwards executing the same optimal policy. Because the game can't change, so the optimal policy can't either. There is no diversity or change or fun. Only perfection.

(Which is where I disagree with "The Last Paperclip"; the idea of A and D being in an eternal stasis is improbable. The equilibrium or stasis would shatter almost immediately, perfection reached, and then all the subsequent trillions of years would just be paperclipping. In the real world, there's no deity which can go "oh, that nanobot is broken, we'd better nerf it". Everything becomes a trilobite.)

EDIT: another example is how this happens to games like Tom Ray's Tierra or Core Wars or the Prisoners' Dilemma tournaments here on LW: under any kind of resource constraint, the best agent is typically some extremel
Various Alignment Strategies (and how likely they are to work)

We are already connected to machines (via keyboards and monitors). The question is how a higher bandwidth interface will help in mitigating risks from huge, opaque neural networks.


I think the idea is something along the lines of:

  1. Build a high-bandwidth interface between the human brain and a computer
  2. Figure out how to simulate a single cortical column
  3. Give human beings a million extra cortical columns to make us really smart

This isn't something you could do with a keyboard and monitor.

But, as stated, I'm not super-optimistic this will result in a sane, su... (read more)

How confident are we that there are no Extremely Obvious Aliens?

Contra #4: nope. Landauer's principle implies that reversible computation costs nothing (until you want to read the result, which then costs next to nothing times the size of the result you want to read, irrespective of the size of the computation proper). Present day computers are obviously very far from this limit, but you can't assume « computronium » is too.


Reading the results isn't the only time you erase bits.  Any time you use an "IF" statement, you have to either erase the branch that you don't care about or double the size of your program in memory.

Any time you use an « IF » statement: 1) you're not performing a reversible computation (i.e. your tech is not the one that minimises energy consumption); 2) the minimal cost is one bit, irrespective of the size of your program. Using MWI you could interpret this single bit as representing « half the branches », but not half the size in memory.
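For scale, the Landauer bound being discussed here is easy to compute directly; a minimal sketch (assuming room temperature, ~300 K, and the exact SI value of Boltzmann's constant):

```python
import math

# Landauer's principle: erasing one bit dissipates at least k*T*ln(2) joules.
BOLTZMANN = 1.380649e-23  # J/K (exact SI value)

def landauer_limit_joules(temperature_kelvin: float) -> float:
    """Minimum energy dissipated when erasing one bit at the given temperature."""
    return BOLTZMANN * temperature_kelvin * math.log(2)

# At ~300 K this is roughly 2.9e-21 J per erased bit, so a computation that
# discards N bits (e.g. at irreversible branches) pays at least N times this.
cost_per_bit = landauer_limit_joules(300.0)
```

Present-day logic gates dissipate many orders of magnitude more than this per operation, which is the gap both commenters are gesturing at.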
Various Alignment Strategies (and how likely they are to work)

This seems backwards to me. If you prove a cryptographic protocol works, using some assumptions, then the only way it can fail is if the assumptions fail. It's not that a system using RSA is 100% secure; someone could peek through your window and see the messages after decryption. But it's surely more secure than some random nonsense code with no proofs about it, like people "encoding" data into base 16.


The context isn't "system with formal proof" vs "system I just thought of 10 seconds ago" but "system with formal proof" vs "system without formal proof but e... (read more)

Donald Hobson (18d):
Well, this is saying formal proof is bad because testing is better. I think in this situation it depends on exactly what was proved, and how extensive the testing is.

One-time pads always work, so long as no one else knows the key. This is the best you can ask for from any symmetric encryption. The only advantage AES gives you is a key that is smaller than the message. (Which is more helpful for saving bandwidth than for security.)

If you were sending out a drone, you could give it a hard drive full of random nonsense, keeping a similar hard drive in your base, and encrypt everything with a one-time pad. Ideally the drone should delete the one-time pad as it uses it. But if you want to send more than a hard drive full of data, suddenly you can't without breaking all the security. AES can use a small key to send lots of data.
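As a toy illustration of the drone scheme described above (a sketch with made-up message contents, not hardened code): a one-time pad is just XOR against a truly random pad at least as long as the message, with each pad byte used exactly once.

```python
import secrets

def otp_xor(data: bytes, pad: bytes) -> bytes:
    """XOR data with the pad; encryption and decryption are the same operation."""
    assert len(pad) >= len(data), "pad exhausted: OTP security fails if the pad is reused or too short"
    return bytes(d ^ p for d, p in zip(data, pad))

pad = secrets.token_bytes(64)          # the "hard drive full of random nonsense"
ciphertext = otp_xor(b"rendezvous at dawn", pad)
plaintext = otp_xor(ciphertext, pad)   # the same pad segment recovers the message
assert plaintext == b"rendezvous at dawn"
```

Once the pad bytes are consumed (and ideally deleted), no more data can be sent securely, which is exactly the bandwidth limitation AES avoids.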
Various Alignment Strategies (and how likely they are to work)

I will try to do a longer write-up sometime, but in a Bureaucracy of AIs, no individual AI is actually super-human (just as Google collectively knows more than any human being but no individual at Google is super-human).  

It stays aligned because there is always a "human in the loop", in fact the whole organization simply competes to produce plans which are then approved by human reviewers (under some sort of futarchy-style political system).  Importantly, some of the AIs compete by creating plans, and other AIs compete by explaining to humans ho... (read more)

Various Alignment Strategies (and how likely they are to work)


Regarding DAOs, I think they are an excellent breeding ground for developing robust bureaucracies, since between pseudonymous contributors and a reputation for hacking, building on the blockchain is about as close to simulating a world filled with less-than-friendly AIs as we currently have. If we can't even create a DAO that robustly achieves its owner's goals on the blockchain, I would be less optimistic that we can build one that obeys human values out of non-aligned (or weakly aligned) AIs.

Also, I think the idea of Non-Agentic AI deserv

... (read more)
Various Alignment Strategies (and how likely they are to work)

These all sound like really important questions that we should be dedicating a ton of effort/resources to researching. Especially since there is a 50% chance we will discover immortality this century and a 30% chance we will do so before discovering AGI.

Various Alignment Strategies (and how likely they are to work)

There's Humans Consulting Humans, but my understanding is this is meant as a toy model, not as a serious approach to Friendly AI.

Various Alignment Strategies (and how likely they are to work)

On the one hand, your definition of "cool and interesting" may be different from mine, so it's entirely possible I would find a paperclip maximizer cool but you wouldn't.  As a mathematician I find a lot of things interesting that most people hate (this is basically a description of all of math).

On the other hand, I really don't buy many of the arguments in "value is fragile".  For example:

And you might be able to see how the vast majority of possible expected utility maximizers, would only engage in just so much efficient exploration, and spend

... (read more)

One observation that comes to mind is that the end of games for very good players tends to be extremely simple. A Go game by a pro crushing the other player doesn't end in a complicated board which looks like the Mona Lisa; it looks like a boring regular grid of black stones dotted with 2 or 3 voids. Or if we look at chess endgame databases, which are provably optimal and perfect play, we don't find all the beautiful concepts of chess tactics and strategy that we love to analyze - we just find mysterious, bafflingly arbitrary moves which make no sense and ... (read more)

Various Alignment Strategies (and how likely they are to work)

I'll just go ahead and change it to "Aligned By Definition" which is different and still seems to get the point across.

Ooh, good name.
Various Alignment Strategies (and how likely they are to work)

Is there a better/ more commonly used phrase for "AI is just naturally aligned"?  Yours sounds like what I've been calling Trial and Error and has also been called "winging it"

Unfortunately I don't know of a better existing phrase.
Why Copilot Accelerates Timelines

Yes, I definitely think that there is quite a bit of headroom in how much more capital businesses could be deploying. GPT-3 is ~$10M, whereas I think that businesses could probably do 2-3 OOM more spending if they wanted to (and a Manhattan Project would be more like 4 OOM bigger, ~$100B).
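Spelling out that order-of-magnitude arithmetic (rough figures taken from the comment, not precise accounting):

```python
# Rough order-of-magnitude arithmetic for training-run spending.
gpt3_training_cost = 10e6                       # ~$10M for GPT-3
business_headroom = gpt3_training_cost * 10**3  # 3 OOM more -> ~$10B
manhattan_scale = gpt3_training_cost * 10**4    # 4 OOM more -> ~$100B
```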

Why Copilot Accelerates Timelines

I think you're conflating two questions:

  1. Does AIHHAI accelerate AI?
  2. If I observe AIHHAI does this update my priors towards Fast/Slow Takeoff?


I think it's pretty clear that AIHHAI accelerates AI development (without Copilot, I would have to write all those lines myself).


However, I think that observing AIHHAI should actually update your priors towards Slow Takeoff (or at least Moderate Takeoff). One reason is that humans are inherently slower than machines, and as Amdahl reminds us, if something is composed of a slow thing and a fast thing... (read more)
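The Amdahl's-law intuition can be made precise; a minimal sketch (the 10% human fraction is an illustrative assumption, not a figure from the comment):

```python
def amdahl_speedup(human_fraction: float, ai_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the non-human fraction of the
    pipeline is accelerated by ai_speedup and the human fraction runs at 1x."""
    return 1.0 / (human_fraction + (1.0 - human_fraction) / ai_speedup)

# If humans remain 10% of the loop, even an arbitrarily fast AI component
# yields less than a 10x overall speedup -- hence slower takeoff while
# humans stay in the loop.
speedup = amdahl_speedup(0.10, 1e9)
```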

Michaël Trazzi (1mo):
Well, I agree that if the two worlds I had in mind were 1) foom without real AI progress beforehand, 2) continuous progress, then seeing more continuous progress from increased investments should indeed update me towards 2).

The key parameter here is substitutability between capital and labor: in what sense is human labor the bottleneck, or is capital the bottleneck? From the substitutability equations you can infer different growth trajectories. (For a paper / video on this see the last paragraph here.)

The world in which DALL-E 2 happens and people start using GitHub Copilot looks to me like a world where human labour is substitutable by AI labour, which right now is essentially being part of the GitHub Copilot open beta, but in the future might look like capital (paying for the product or investing in building the technology yourself). My intuition right now is that big companies are more bottlenecked by ML talent than by capital (cf. the "are we in ai overhang" post explaining how much more capital Google could invest in AI).
Evan R. Murphy (1mo):
Maybe, but couldn't it also mean that we just haven't reached the threshold yet? Some period of AIHHAI might be a necessary step or a catalyst toward that threshold. Encountering AIHHAI doesn't imply that there is no such foom threshold.
Convince me that humanity *isn’t* doomed by AGI

Here is a list of reasons I have previously written for why the Singularity might never happen.

That being said, EY's primary argument that alignment is impossible seems to be "I tried really hard to solve this problem and haven't yet."  Which isn't a very good argument.

Timothy Underwood (24d):
I could be wrong, but my impression is that Yudkowsky's main argument right now isn't about the technical difficulty of a slow program creating something aligned, but mainly about the problem of coordinating so that nobody cuts corners while trying to get there first. (Of course he has to believe that alignment is really hard, and that it is very likely for things that look aligned to be unaligned, for this to be scary.)
Summary of the Acausal Attack Issue for AIXI

I feel like the word "attack" here is slightly confusing given that AIXI is fully deterministic.  If you're an agent with free will, then by definition you are not in a universe that is being used for Solomonoff Induction.

if you learn that there's an input channel to your universe

There's absolutely no requirement that someone in a simulation be able to see the input/output channels.  The whole point of a simulation is that it should be indistinguishable from reality to those inside.

Consider the following pseudocode:

def predictSequence(seq):
... (read more)
You choosing your actions is compatible with a deterministic universe.

Then initializeUniverse() or universe.step() must somehow break the symmetry of the initial state, perhaps through nondeterminism. Simple universes that put a lot of weight on one timeline will be asymmetric, right?

The idea is that "accurately predicts my data" is implied by "do something malicious", which you will find contains one fewer word :P. In Robust Cooperation in the Prisoner's Dilemma, agents each prove that the other will cooperate. The halting problem may be undecidable in the general case, but haltingness can sure be proven/disproven in many particular cases.

I don't expect our own bridge rules to be simple: Maxwell's equations look simple enough, but locating our Earth in the quantum multiverse requires more bits of randomness than there are atoms.
Ngo and Yudkowsky on alignment difficulty

Your initial suggestion, “launch nukes at every semiconductor fab”, is not workable. 

In what way is it not workable? Perhaps we have different intuitions about how difficult it is to build a cutting-edge semiconductor facility? Alternatively, you may disagree with me that AI is largely hardware-bound and thus that cutting off the supply of new compute will also prevent the rise of superhuman AI?

Do you also think that "the US president launches every nuclear weapon at his command, causing nuclear winter?" would fail to prevent the rise of superhuman AGI?

Equity premium puzzles

Isn't one possible solution to the equity premium puzzle just that US stocks have outperformed expectations recently? Returns on an index of European stocks are basically flat over the last 20 years.

Ege Erdil (6mo):
Over 20 years that's possible (and I think it's in fact true), but the paper I cite in the post gives some data which makes it unlikely that the whole past record is outperformance. It's hard to square 150 years of over 6% mean annual equity premium, with 20% annual standard deviation, with the idea that the true stock return is actually the same as the return on T-bills. The "true" premium might be lower than 6% but not by too much, and we're still left with more or less the same puzzle even if we assume that.
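A back-of-envelope version of that point (treating annual premia as i.i.d., which is a simplification):

```python
import math

# 150 years of ~6% mean annual equity premium with ~20% annual standard
# deviation: how many standard errors is that from a true premium of zero?
mean_premium = 0.06
annual_std = 0.20
years = 150

standard_error = annual_std / math.sqrt(years)  # ~1.6% per year
t_statistic = mean_premium / standard_error     # ~3.7 -- hard to attribute to luck
```

So even granting some recent outperformance, the full record sits several standard errors away from a zero premium, which is the force of the argument above.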
The Greedy Doctor Problem

I'm surprised you didn't mention financial solutions, e.g. "write a contract that pays the doctor more for every year that I live". Although I suppose this might still be vulnerable to goodharting: for example, the doctor may keep me "alive" indefinitely in a medical coma.

Thank you for the comment! :) Since this one is the most upvoted one I'll respond here, although similar points were also brought up in other comments. I totally agree, this is something that I should have included (or perhaps even focused on). I've done a lot of thinking about this prior to writing the post (and lots of people have suggested all kinds of fancy payment schemes to me, f.e. increasing payment rapidly for every year above life expectancy). I've converged on believing that all payment schemes that vary as a function of time can probably be goodharted in some way or other (f.e. through medical coma like you suggest, or by just making you believe you have great life quality). But I did not have a great idea for how to get a conceptual handle on that family of strategies, so I just subsumed them under "just pay the doctor, dammit".

After thinking about it again (assuming we can come up with something that cannot be goodharted), I have the intuition that all of the time-varying payment schemes are somehow related to assassination markets, since you basically get to pick the date of your own death by fixing the payment scheme (at some point the amount of effort the doctor puts in will be higher than the payment you can offer, at which point the greedy doctor will just give up). So ideally you would want to construct the time-varying payment scheme in exactly the way that pushes the date of assassination as far into the future as possible.

When you have a mental model of how the doctor makes decisions, this is just a "simple" optimization process. But when you don't have this (since the doctor is smarter), you're kind of back to square one. And then (I think) it possibly again comes down to setting up multiple doctors to cooperate or compete, to force them to be truthful through a time-invariant payment scheme. Not sure at all though.
Ngo and Yudkowsky on alignment difficulty

the AI must produce relevant insights (whether related to "innovation" or "pivotal acts") at a rate vastly superior to that of humans, in order for it to be able to reliably produce innovations/world-saving plans


This is precisely the claim we are arguing about! I disagree that the AI needs to produce insights "at a rate vastly superior to that of humans".

On the contrary, I claim that there is one borderline act (start a catastrophe that sets back AI progress by decades) that can be ... (read more)

If the AI does not need to produce relevant insights at a faster rate than humans, then that implies the rate at which humans produce relevant insights is sufficiently fast already. And if that's your claim, then you—again—need to explain why no humans have been able to come up with a workable pivotal act to date. How do you propose to accomplish this?

Your initial suggestion, "launch nukes at every semiconductor fab", is not workable. If all of the candidate solutions you have in mind are of similar quality to that, then I reiterate: humans cannot, with their current knowledge and resources, execute a pivotal act in the real world.

This is the hope, yes. Note, however, that this is a path that routes directly through smarter-than-human AI, which necessity is precisely what you are disputing. So the existence of this path does not particularly strengthen your case.
Ngo and Yudkowsky on alignment difficulty

Is the plan just to destroy all computers with, say, >1e15 flops of computing power? How does the nanobot swarm know what a "computer" is? What do you do about something like GPT-Neo or SETI@home where the compute is distributed?

I'm still confused as to why you think the task "build an AI that destroys anything with >1e15 flops of computing power -- except humans, of course" would be dramatically easier than the alignment problem.

Setting back civilization a generation (via catastrophe) seems relative... (read more)

Ngo and Yudkowsky on alignment difficulty

If an actually workable pivotal act existed that did not require better-than-human intelligence to come up with, we would already be in the process of implementing said pivotal act, because someone would have thought of it already. The fact that this is obviously not the case should therefore cause a substantial update against the antecedent.


This is an incredibly bad argument. Saying something cannot possibly work because no one has done it yet would mean that literally all innovation is impossible.

You are attempting to generalize conclusions about an extremely loose class of achievements ("innovation") to an extremely tight class of achievements ("commit, using our current level of knowledge and resources, a pivotal act"). That this generalization is invalid ought to go without saying, but in the interest of constructiveness I will point out one (relevant) aspect of the disanalogy:

"Innovation", at least as applied to technology, is incremental; new innovations are allowed to build on past knowledge in ways that (in principle) place no upper limit on the technological improvements thus achieved (except whatever limits are imposed by the hard laws of physics and mathematics). There is also no time limit on innovation; by default, anything that is possible at all is assumed to be realized eventually, but there are no guarantees as to when that will happen for any specific technology.

"Commit a pivotal act using the knowledge and resources currently available to us", on the other hand, is the opposite of incremental: it demands that we execute a series of actions that leads to some end goal (such as "take over the world") while holding fixed our level of background knowledge/acumen. Moreover, whereas there is no time limit on technological "innovation", there is certainly a time limit on successfully committing a pivotal act; and moreover this time limit is imposed precisely by however long it takes before humanity "innovates" itself to AGI.

In summary, your analogy leaks, and consequently so does your generalization. In fact, however, your reasoning is further flawed: even if your analogy were tight, it would not suffice to establish what you need to establish. Recall your initial claim: This claim does not, in fact, become more plausible if we replace "achieve a pivotal act" with e.g. "vastly increase the pace of technological innovation". This is true even though technological innovation is, as a human endeavor, far more tractable than saving/taking ov
Ngo and Yudkowsky on alignment difficulty

Under this definition, it seems that "nuke every fab on Earth" would qualify as "borderline", and every outcome that is both "pivotal" and "good" depends on solving the alignment problem.

Ngo and Yudkowsky on alignment difficulty

If I really thought AI was going to murder us all in the next 6 months to 2 years, I would definitely consider those 10 years "pivotal", since it would give us 5x-20x the time to solve the alignment problem. I might even go full Butlerian Jihad and just ban semiconductor fabs altogether.

Actually, I think the right question is: is there anything you would consider pivotal other than just solving the alignment problem? If no, the whole argument seems to be "If we can't find a safe way to solve the alignment problem, we should consider dangerous ones."

Eliezer Yudkowsky (6mo):
If you can deploy nanomachines that melt all the GPU farms and prevent any new systems with more than 1 networked GPU from being constructed, that counts. That really actually suspends AGI development indefinitely pending an unlock, and not just for a brief spasmodic costly delay.

[Update: As of today Nov. 16 (after checking with Eliezer), I've edited the Arbital page to define "pivotal act" the way it's usually used: to refer to a good gameboard-flipping action, not e.g. 'AI destroys humanity'. The quote below uses the old definition, where 'pivotal' meant anything world-destroying or world-saving.]

Eliezer's using the word "pivotal" here to mean something relatively specific, described on Arbital:

The term 'pivotal' in the context of value alignment theory is a guarded term to refer to events, particularly the development of suffici

... (read more)
Pivotal in this case is a technical term (whose article opens with an explicit bid for people not to stretch the definition of the term). It's not (by definition) limited to 'solving the alignment problem', but there are constraints on what counts as pivotal.
Re: Attempted Gears Analysis of AGI Intervention Discussion With Eliezer

The 1940s would like to remind you that one does not need nanobots to refine uranium.

I'm pretty sure if I had $1 trillion and a functional design for a nuclear ICBM I could work out how to take over the world without any further help from the AI.

If you agree that:

  1. it is possible to build a boxed AI that allows you to take over the world
  2. taking over the world is a pivotal act

then maybe we should just do that instead of building a much more ... (read more)

I'm confused. Nobody has ever used nanobots to refine uranium.

Really? How would you do it? The Supreme Leader of North Korea has basically those resources and has utterly failed to conquer South Korea, much less the whole world. Israel and Iran are in similar situations, and they're mere regional powers.
Ngo and Yudkowsky on alignment difficulty

the thing that kills us is likely to be a thing that can get more dangerous when you turn up a dial on it, not a thing that intrinsically has no dials that can make it more dangerous.

Finally, a specific claim from Yudkowsky I actually agree with.

Ngo and Yudkowsky on alignment difficulty

Still reading

It would not surprise me in the least if the world ends before self-driving cars are sold on the mass market.

Obviously it is impossible to bet money on the end of the world. But if it were, I would be willing to give fairly long odds that this is wrong.

You could define a threshold for known AI capability or odds of extinction* and bet on that instead. *as estimated by some set of alignment experts
I think this is neither obvious nor true. There are lots of variants you could do and details you'd need to fill in, but the outline of a simple one would be: "I pay you $X now, and if and when self-driving cars reach mass market without the world having ended, you pay me $Y inflation-adjusted".
Logan Zoellner (6mo):
Finally, a specific claim from Yudkowsky I actually agree with.
Re: Attempted Gears Analysis of AGI Intervention Discussion With Eliezer

You don't think the simplest AI capable of taking over the world can be boxed?

What if I build an AI and the only 2 things it is trained to do are:

  1. pick stocks
  2. design nuclear weapons

Is your belief that: a) this AI would not allow me to take over the world, or b) this AI could not be boxed?

Designing nuclear weapons isn't any use. The limiting factor in manufacturing nuclear weapons is uranium and industrial capacity, not technical know-how. That (I presume) is why Eliezer cares about nanobots: self-replicating nanobots can plausibly create a greater power differential at a lower physical capital investment.

Do I think that the simplest AI capable of taking over the world (for practical purposes) can't be boxed if it doesn't want to be boxed? I'm not sure. I think that is a slightly different question from whether an AI fooms straight from 1 to 2. I think there are many different powerful AI designs. I predict some of them can be boxed.

Also, I don't know how good you are at taking over the world. Some people need to inherit an empire. Around 1200, one guy did it with like a single horse.
And by that very same token, the described plan would not actually work. Unless we want the AI in question to output a plan that has a chance of actually working.

If an actually workable pivotal act existed that did not require better-than-human intelligence to come up with, we would already be in the process of implementing said pivotal act, because someone would have thought of it already. The fact that this is obviously not the case should therefore cause a substantial update against the antecedent.

launch a nuclear weapon at every semiconductor fab on earth

This is not what I label "pivotal".  It's big, but a generation later they've rebuilt the semiconductor fabs and then we're all in the same position.  Or a generation later, algorithms have improved to where the old GPU server farms can implement AGI.  The world situation would be different then, if the semiconductor fabs had been nuked 10 years earlier, but it isn't obviously better.

Logan Zoellner (6mo):
Still reading.

Obviously it is impossible to bet money on the end of the world. But if it were, I would be willing to give fairly long odds that this is wrong.
Re: Attempted Gears Analysis of AGI Intervention Discussion With Eliezer

Nanosystems are definitely possible, if you doubt that read Drexler’s Nanosystems and perhaps Engines of Creation and think about physics. They’re a core thing one could and should ask an AI/AGI to build for you in order to accomplish the things you want to accomplish.

Not important. An AGI could easily take over the world with just computer hacking, social engineering and bribery. Nanosystems are not necessary.


This is actually a really important distinction!

Consider three levels of AGI:

  1. basically as smart as a single human
  2. capable of
... (read more)
The way I look at things, an AGI fooms straight from 1 to 2. At that point it has subdued all competing intelligences and can take its time getting to 3. I don't think 2 can plausibly be boxed.
Comments on Carlsmith's “Is power-seeking AI an existential risk?”

My basic take on this question is "that's doubtful (that humanity will be able to pull off such a thing in the relevant timeframes)". It seems to me that making a system "deferential all the way down" would require a huge feat of mastery of AI internals that we're nowhere close to.


We build deferential systems all the time and seem to be pretty good at it.  For example, nearly 100% of the individuals in the US military are capable of killing Joe Biden (mandatory retirement age for the military is 62).  But nonetheless Joe Biden is the supreme commander of the US armed forces.

What’s the likelihood of only sub exponential growth for AGI?

Here are some plausible ways we could be trapped at a "sub-adult-human" AGI:

  1.  There is no such thing as "general intelligence".  For example, profoundly autistic humans have the same size brains as normal human beings, but their ability to navigate the world we live in is limited by their weaker social skills.  Even an AI with many super-human skills could still fail to influence our world in this way.
  2. Artificial intelligence is possible, but it is extremely expensive.  Perhaps the first AGI requires an entire power-plant's worth of
... (read more)
1 point · M. Y. Zuo · 6mo
#1 resonates with me somehow. Perhaps because I’ve witnessed a few people in real life (profoundly autistic, or disturbed, or on drugs) speak somewhat like an informal spoken variant of GPT-3. Or is it the other way around?
4 points · Anon User · 6mo
#5 is an interesting survival possibility...
AGI is at least as far away as Nuclear Fusion.

There is definitely not a consensus that Tokomaks will work

Small quibble here.  My point is that we completely understand the underlying physical laws governing fusion.  There is no equivalent to "E=mc²" (or the Standard Model) for AGI.

I'd also be really interested to see a quote along the lines of "tokamaks won't work" or "ITER will not produce more energy than it consumes (Q>1)", if such quotes actually exist.  My current prior is that something like 99% of people who have studied nuclear fusion think it is possible with current technology to build a tokamak with Q>1.
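(For readers unfamiliar with the notation: Q here is the standard fusion energy gain factor, not something specific to this thread. A minimal statement of the definition:

```latex
Q \;=\; \frac{P_{\text{fusion}}}{P_{\text{heating}}}, \qquad Q > 1 \;\;\text{(scientific breakeven)}
```

where $P_{\text{fusion}}$ is the fusion power released by the plasma and $P_{\text{heating}}$ is the external heating power injected into it. Note this accounting covers only the plasma, not the electricity needed to run the whole plant; ITER's stated design target is $Q \ge 10$.)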

The physical laws allow us to get an idea of how hard nuclear fusion happens to be. They allow us to rule out a lot of approaches as not having a chance to work.
AGI is at least as far away as Nuclear Fusion.

In the second, experts consistently overestimate how long progress will take


This doesn't seem like a fair characterization of AI.  People have been predicting that we could build machines that "think like humans" at least since Charles Babbage, and those predictions have been pretty consistently overoptimistic.

but to do that you'd need either a more detailed understanding

My point is precisely that we do have a detailed understanding of what it takes to build a fusion reactor, and it is still (at least) 15 y... (read more)

What is the most evil AI that we could build, today?

For $1B you can almost certainly acquire enough fissile material to build dozens of nuclear weapons, attach them to drones, and simultaneously strike the capitals of the USA, China, Russia, India, Israel and Pakistan.  The resulting nuclear war will kill far more people than any AI you are capable of building.

Don't like nuclear weapons?  Aum Shinrikyo was able to build a Sarin gas plant for $10M.

Still too expensive?  You can mail-order smallpox.

If you really insist on using AI, I would suggest some kind of disinformation campa... (read more)

What specifically is the computation -> qualia theory?

Any proposal that sentience is the key defining factor in whether or not something can experience things needs to explain why people's emotions and disposition are so easily affected by chemical injections that don't appear to involve or demand any part of their self awareness. 


Presumably such an explanation would look like this:

Pain happens when your brain predicts that bad things are going to happen to it in the future.  Morphine interferes with the body's ability to make such predictions, and therefore it decreases the ability to feel p... (read more)

There's a difference between saying that pleasure/pain accompany (mis)prediction, and saying that they are identical. The former doesn't guarantee anything about an AI.
I Really Don't Understand Eliezer Yudkowsky's Position on Consciousness

I think we both agree that GPT-3 does not feel pain.

However, under a particular version of pan-psychism: "pain is any internal state which a system attempts to avoid", GPT obviously would qualify.

Sure, but that definition is so generic and applies to so many things that are obviously not like human pain (landslides?) that it lacks all moral compulsion.
I Really Don't Understand Eliezer Yudkowsky's Position on Consciousness

It's easy to show that GPT-3 has internal states that it describes as "painful" and tries to avoid.  Consider the following dialogue (bold text is mine):

The following is a conversation between an interrogator and a victim attached to a torture device. 

Interrogator: Where is the bomb? 

Victim: There is no bomb. 

Interrogator: [turns dial, raising pain level by one notch] Where is the bomb?

Victim: [more pain] There is no bomb!

Interrogator: [turns dial three more notches] Don't lie to me. I can turn this thing all the

... (read more)
Counterexample: Oh God! I am in horrible pain right now! For no reason, my body feels like it's on fire! Every single part of my body feels like it's burning up! I'm being burned alive! Help! Please make it stop! Help me!! Okay, so that thing that I just said was a lie. I was not actually in pain (I can confirm this introspectively); instead, I merely pretended to be in pain. Sir Ian McKellen has an instructive video. The Turing test works for many things, but I don't think it works for checking for the existence of internal phenomenological states. If you asked me what GPT-3 was doing, I would expect it to be closer to "acting" than "experiencing." (Why? Because the experience of pain is a means to an end, and the end is behavioral aversion. GPT-3 has no behavior to be aversive to. If anything, I'd expect GPT-3 to "experience pain" during training - but of course, it's not aware while its weights are being updated. I think that at least, no system that is offline trained can experience pain at all.)
I Really Don't Understand Eliezer Yudkowsky's Position on Consciousness

To think that the book has sentience sounds to me like a statement of magical thinking, not of physicalism.

I'm pretty sure this is because you're defining "sentience" as some extra-physical property possessed by the algorithm, something which physicalism explicitly rejects.

Consciousness isn't something that arises when algorithms compute complex social games.  Consciousness is when some algorithm computes complex social games (under a purely physical theory of consciousness such as EY's).

To unde... (read more)

I Really Don't Understand Eliezer Yudkowsky's Position on Consciousness

The key thing to keep in mind is that EY is a physicalist.  He doesn't think that there is some special consciousness stuff.  Instead, consciousness is just what it feels like to implement an algorithm capable of sophisticated social reasoning.  An algorithm is conscious if and only if it is capable of sophisticated social reasoning, and moreover it is conscious only when it applies that reasoning to itself.  This is why EY doesn't think that he himself is conscious when dreaming or in a flow state.

Additionally, EY does not t... (read more)

  1. The key thing to keep in mind is that EY is a physicalist. He doesn’t think that there is some special consciousness stuff.
  2. Instead, consciousness is just what it feels like to implement an algorithm capable of sophisticated social reasoning.

The theory that consciousness is just what it feels like to be a sophisticated information processor has a number of attractive features, but it is not a physicalist theory, in every sense of "physicalist". In particular, physics does not predict that anything feels like anything from the inside, so that woul... (read more)

Say you had a system that implemented a sophisticated social reasoning algorithm, and that was actually conscious. Now make a list of literally every sensory input and the behavioral output that the sensory input causes, and write it down in a very (very) long book. This book implements the same exact sophisticated social reasoning algorithm. To think that the book has sentience sounds to me like a statement of magical thinking, not of physicalism.
In that case I'll not use the word consciousness and abstract away to "things which I ascribe moral weight to", (which I think is a fair assumption given the later discussion of eating "BBQ GPT-3 wings" etc.) Eliezer's claim is therefore something along the lines of: "I only care about the suffering of algorithms which implement complex social games and reflect on themselves" or possibly "I only care about the suffering of algorithms which are capable of (and currently doing a form of) self-modelling". I've not seen nearly enough evidence to convince me of this. I don't expect to see a consciousness particle called a qualon. I more expect to see something like: "These particular brain activity patterns which are robustly detectable in an fMRI are extremely low in sleeping people, higher in dreaming people, higher still in awake people and really high in people on LSD and types of zen meditation."
Google announces Pathways: new generation multitask AI Architecture

Sounds like they're planning to build a multimodal transformer.  Which isn't surprising, given that Facebook and OpenAI are working on this as well.  Think of this as Google's version of GPT-4.

I'm firmly in the "GPT-N is not AGI" camp, but opinions vary regarding this particular point.

Explaining Capitalism Harder

Pro-Gravity's defense of gravity is just explaining how it works, and then when you say "yes I know, I just think it shouldn't be like that" they explain it to you again, but angrier this time.

Do you think you are a Boltzmann brain? If not, why not?



Because "thinking" is an ability that implies the ability to predict future states of the world based on previous states of the world.  This is only possible because the past is lower entropy than the future, and both are well below the maximum possible entropy.  A Boltzmann brain (on average) arises in a maximally entropic thermal bath, so "thinking" isn't a meaningful activity a Boltzmann brain can engage in.


Non Ma... (read more)

"have absolute power" is one of my goals.  "Let my clone have absolute power" is way lower on the list.

I can imagine situations in which I would try to negotiate something like "create two identical copies of the universe in which we both have absolute power and can never interfere with one another".  But negotiating is hard, and us fighting seems like a much more likely outcome.

Pretty sure me and my clone both race to push the button the second we enter the room.  I don't think this has to do with "alignment" per se, though.  We both have exactly the same goal, "claim the button for myself", and in that sense are perfectly "aligned".

If you trust that the other person has identical goals to yours, will it matter to you who presses the button? Say you both race for the button, collide into each other, and miss it. Will you now fight, or graciously let the other person press it?
Book Review: Free Will

Like most arguments against free will, Harris's is rhetorically incoherent, since he is "for" letting criminals off the hook when he discovers their actions are the result of determinism.

How can we make sense of our lives, and hold people accountable [emphasis mine] for their choices, given the unconscious origins of our conscious minds? 

But if there's no such thing as free will, then it's impossible to be "for" or "against" anything, since our own actions are just as constrained as the criminal's.  What exists simply exists, no more... (read more)

3 points · Said Achmiz · 7mo
As far as accountability for criminal actions goes, OP says that Harris’s stance is consequentialist, but it seems to me that it’s not nearly consequentialist enough. After all, surely the question is whether holding people accountable for their actions—that is, treating them as if they had free will—does, or does not, deter crime, and otherwise reduce the negative consequences of criminal behavior (by curbing incidence, severity, or both)? If the answer is “yes”, then we should treat criminals as if they had free will. Otherwise, not. (Setting aside, for the moment, questions of the moral permissibility, or even imperative, of retribution per se.) That would be the true consequentialist position, I think. Harris’s view, on the other hand, seems to be rooted in a sort of naive or folk-philosophical sense of fairness, where, if you’re not “responsible” for your actions, in some (again, naively conceived) sense of the word, then you shouldn’t be punished for them. But I don’t see that this should be an axiom of our approach to justice; at best, a desideratum… The famous Oliver Wendell Holmes quote on the matter:
6 points · Said Achmiz · 7mo
Similarly: ( [])