George Hotz and Eliezer Yudkowsky debated on YouTube for 90 minutes, with some small assists from moderator Dwarkesh Patel. It seemed worthwhile to post my notes on this on their own.

I thought this went quite well for the first half or so, then things went increasingly off the rails in the second half, as Hotz got into questions he hadn’t had a chance to reflect on and prepare for, especially around cooperation and the prisoner’s dilemma.

First, some general notes, then specific notes I took while watching.

  1. Hotz was allowed to drive discussion. In debate terms, he was the con side, raising challenges, while Yudkowsky was the pro side defending a fixed position.
  2. These discussions often end up doing what this one did, which is meandering around a series of 10-20 metaphors and anchors and talking points, mostly repeating the same motions with variations, in ways that are worth doing once but not very productive thereafter.
  3. Yudkowsky has a standard set of responses and explanations, which he is mostly good at knowing when to pull out, but after a while one has heard them all. The key to a good conversation or debate with Yudkowsky is to allow the conversation to advance beyond those points or go in a new direction entirely.
  4. Mostly, once Yudkowsky had given a version of his standard response and given his particular refutation attempt on Hotz’s variation of the question, Hotz would then pivot to another topic. This included a few times when Yudkowsky’s response was not fully convincing and there was room for Hotz to go deeper, and I wish he would have in those cases. In other cases, and more often than not, the refutation or defense seemed robust.
  5. This standard set of responses meant that Hotz knew a lot of the things he wanted to respond to, and he prepared mostly good responses and points on a bunch of the standard references. Which was good, but I would have preferred to sidestep those points entirely. What would Tyler Cowen be asking in a CWT?
  6. Another pattern was Hotz asserting that things would be difficult for future ASIs (artificial superintelligences) because they are difficult for humans, or the task had a higher affinity for human-style thought in some form, often with a flat out assertion that a task would prove difficult or slow.
  7. Hotz seemed to be operating under the theory that if he could break Yudkowsky’s long chain of events at any point, that would show we were safe. Yudkowsky explicitly contested this on foom, and somewhat in other places as well. This seems important, as what Hotz was treating as load-bearing usually very much wasn’t.
  8. Yudkowsky mentioned a few times that he was not going to rely on a given argument or pathway because, although it was true, it would strain credulity. This is a tricky balance; on the whole we likely need more of this. Later on, Yudkowsky strongly defended the ideas that ASIs would cooperate with each other and not with us, and that there would be a deliberate sharp left turn. This clearly strained a lot of credulity with Hotz and I think with many others, and I do not think these assertions are necessary either.
  9. Hotz closes with a vision of ASIs running amok, physically fighting each other over resources, impossible to align even to each other. He then asserts that this will go fine for him and he is fine with this outcome despite not saying he inherently values the ASIs or what they would create. I do not understand this at all. Such a scenario would escalate far quicker than Hotz realizes. But even if it did not, this very clearly leads to a long term future with no humans, and nothing humans obviously value. Is ‘this will take long enough that they won’t kill literal me’ supposed to make that acceptable?

Here is my summary of important statements and exchanges with timestamps:

  1. 04:00: Hotz claims after a gracious introduction that RSI (recursive self-improvement) is possible, accepts orthogonality, but says intelligence cannot go critical purely on a server farm and kill us all with diamond nanobots, considers that an extraordinary claim, requests extraordinary evidence.
  2. 05:00: Yudkowsky responds you do not need that rate of advancement, only a sufficient gap, asks what Hotz would consider sufficient to spell doom for us in terms of things like capabilities and speed.
  3. 06:00: Hotz asks about timing, Yudkowsky says timing and ordering of capabilities is hard, much harder than knowing the endpoint, but he expects AGI within our lifetime.
  4. 09:00: Hotz asks about AlphaFold and how it was trained on data points rather than reasoning from first principles; you can’t solve quantum theory that way. Yudkowsky points out AI won’t need to be God-like or solve quantum theory to kill us.
  5. 11:00 Hotz wants to talk about timing, Yudkowsky asks why this matters, Hotz says it matters because that tells you when to shut it down. Yudkowsky points out we won’t know exactly when AI is about to arrive. Both agree it won’t be strictly at hyperbolic speed, although Yudkowsky thinks it might effectively be close.
  6. 14:00 Hotz continues to insist timing, and doubling time or speed, matter, that (e.g.) the economy is increasing and mostly that’s great and the question is how fast is too fast. Yudkowsky asks, growth to what end? Weapons, viruses and computer chips out of hand bad, most everything else great.
  7. 16:00 They agree a center for international control over AI chips sounds terrifying, after Yudkowsky confirms this would be his ask and that he would essentially take whatever level of restrictions he can get.
  8. 17:30 Hotz claims we externalize a lot of our capability into computers. Yudkowsky disagrees, saying the capabilities are still for now almost entirely concentrated in the human. Hotz says human plus computer is well beyond human.
  9. 21:00 Hotz claims corporations and governments are superintelligences, Yudkowsky says no, you can often point out their mistakes. Hotz points out corporations can do things humans can’t do alone.
  10. 24:00 Discussion of Kasparov vs. the World. Hotz says with more practice the world would have won easily, Yudkowsky points out no set of humans even with unlimited practice could beat Stockfish 15.
  11. 26:30 Yudkowsky crystallizes the question. He says that if we are metaphorical cognitive planets, some AIs are moons that can help us, but other AIs will be suns that work together against us, and no amount of moons plus planets beats a sun.
  12. 28:00 Hotz challenges the claim the AIs will work together against the humans, says humans fight wars against each other. Why would it be humans vs. machines?
  13. 30:00 Yudkowsky, after being asked about his old adage, explains the various ways an AI would use your atoms for something else, perhaps energy. Hotz says that sounds like a God, Yudkowsky says no, a God would simply violate physics. Hotz says he fights back, he has AI and other humans, AI would go pick some lower-hanging atomic fruit like Jupiter.
  14. 32:30 Hotz points out humans have a lot more combined compute than computers. Yudkowsky confirms but says it is misleading because of how poorly we aggregate. Hotz says GPT-4 is a collection of experts (or ‘small things’) and Yudkowsky agrees for now but does not expect that to hold in the limit.
  15. 34:00 Hotz asks about AIs rewriting their own source code. Yudkowsky says he no longer talks about it because it strains credulity and you don’t need the concept anymore.
  16. 40:00 Seems to get lost in the weeds trying to contrast humans with LLMs. Hotz says this is important, but then gets sidetracked by the timing question again, as Hotz tries to assert agreement on lack of foom and Yudkowsky points out (correctly) that he is not agreeing to lack of foom, rather he is saying it isn’t a crux as you don’t need foom for doom, and off we go again.
  17. 43:00 Hotz says again time matters because time allows us to solve the problem, Yudkowsky asks which problem, Hotz says alignment, Yudkowsky laughs and says no, it won’t go that slowly, I’ve seen the people working on this and they are not going to solve it. Hotz then pivots to saying the politicians will ask for timing before they would be willing to do anything, and rightfully so (his comparison is 1 year vs. 10 years vs. 1,000 years before ASI here).
  18. 44:15 Hotz asserts no ASI within 10 years, Yudkowsky asks how he knows that. Hotz agrees predictions are hard but says he made a prediction in 2015 of no self-driving cars in 10 years, which seems at least technically false. Says AIs might surpass humans in all tasks within 10 years but 50 years wouldn’t surprise him, but that does not mean doom. Confirms this includes AI design and manipulation.
  19. 46:00 Hotz asks, when do we get a sharp left turn? Yudkowsky says, when the calculations the AIs do say they can do it. Hotz says my first thought as an ASI wouldn’t be to take out the humans, Yudkowsky says it would be the first move because humans can build other ASIs.
  20. 46:50 Hotz says his actual doom worry is that the AIs will give us everything we ever wanted. Yudkowsky is briefly speechless. Hotz then says, sure, once the AIs build a Dyson Sphere around the sun and take the other planets they might come back for us, but until then why worry, he’s not the easy target. Why would this bring comfort? They then discuss exactly what might fight what over what and how, Yudkowsky says sufficiently smart entities won’t ever fight unless it ends in extermination, because otherwise they would coordinate not to.
  21. 50:00 Prisoner’s dilemma and architecture time. Hotz predicts you’ll have large inscrutable matrix AIs so how do they get to cooperate? Yudkowsky does not agree that the ASIs look like that, although anything can look like that from a human’s perspective. His default is that LLM-style AI scales enough to be able to rewrite itself, but there is uncertainty.
  22. 52:00 Yudkowsky mentions the possibility that AIs might be insufficiently powerful to rewrite their own code and RSI, yet still have motivation to solve the alignment problem themselves, but he thinks it is unlikely.
  23. 55:00 Standard Yudkowsky story of how humans generalized intelligence, and how the process of becoming able to solve problems tends to involve creating capacity for wanting things.
  24. 1:00:00 Hotz asks if Yudkowsky expects ASIs to be super rational. Yudkowsky says not GPT-4 but yes from your perspective for future more capable systems.
  25. 1:00:01 Hotz says the only way ASIs would be optimal is if they fought each other in some brutal competition, otherwise some will be idiots.
  26. 1:02:30 Hotz asks to drill down into the details of how the doom happens, asks if it will involve diamond nanobots. Yudkowsky notes that he might lose some people that way, so perhaps ignore them since you don’t need them, but yes of course it would use the nanotech in real life. Hotz asserts nanobots are a hard search problem, Yudkowsky keeps asking why, Hotz responds that you can’t do it, and they go around in circles for a while.
  27. 1:07:00 Hotz points out that Covid did not kill all humans and that killing all humans with a bioweapon would be hard. Yudkowsky says he prefers not to explain how to kill all humans but agrees that wiping out literally all humans this particular way would be difficult. Hotz says essentially we’ve been through worse, we’ll be fine. Yudkowsky asks if we’ve fended off waves of alien invasions, Hotz says no fair.
  28. 1:12:00 Hotz raises the objection that the human ancestral environment was about competition between humans and AIs won’t face a similar situation so they won’t be as good at it. Yudkowsky tries to explain this isn’t a natural category of task or what a future struggle’s difficult problems would look like, and that our physical restrictions put us so far below what is possible and so on.
  29. 1:13:00 Hotz asks how close human brains are to the Landauer limit, Eliezer estimates about 6 OOMs. Hotz then asserts that computers are close to the Landauer limit and humans might be at it, Yudkowsky responds this is highly biologically implausible and offers technical arguments. Hotz reiterates that humans are much more energy efficient than computers, numbers fly back and forth.
  30. 1:17:00 Hotz asserts humans are general purpose, chimpanzees are not, and this is not a matter of simple scale. Yudkowsky says humans are more but not fully general. Hotz asserts impossibility of a mind capable of diamond nanobots or boiling oceans. Hotz says AlphaFold relied on past data, Yudkowsky points out it relied only on past data and no new experimental data.
  31. 1:18:45 Dwarkesh follows up from earlier with Hotz – if indeed the ASI were to create Dyson spheres first, why wouldn’t it then kill you later? Hotz says not my problem, this will be slow, that’s a problem for future generations. Which would not, even if true, be something I found comforting. Yudkowsky points out that is not how an exponential works. Hotz says self-replication is a pipe dream, Yudkowsky says bacteria, Hotz says what, are they going to use biology rather than silicon, Yudkowsky says obviously they wouldn’t use silicon, Hotz says what, that’s crazy, that’s not the standard foom story, Yudkowsky says that after it fooms then obviously it wouldn’t stick with silicon. Feels like this ends up going in circles, and Hotz keeps asserting agreement on no-foom that didn’t happen.
  32. 1:22:00 They circle back to the ASI collaboration question. Hotz asserts ASI cooperation implies an alignment solution (which I do not think is true, but which isn’t challenged). Yudkowsky says of course an ASI could solve alignment, it’s not impossible. Hotz asks, if alignment isn’t solvable at all, we’re good? Yudkowsky responds that we then end up in a very strange universe (in which I would say we are so obviously extra special dead I’m not even going to bother explaining why), but we’re not in that universe, Hotz says he thinks we likely are, Yudkowsky disagrees. Hotz says the whole ASI-cooperation thing is a sci-fi plot, Yudkowsky says convergent end point.
  33. 1:24:00 Hotz says this is the whole crux and we got to something awesome here. Asserts that provable prisoner’s dilemma cooperation is impossible so we don’t have to worry about this scenario, everything will be defecting on everything constantly for all time, and also that’s great. Yudkowsky says the ASIs are highly motivated to find a solution and are smart enough to do so, does not mention that we have decision theories and methods that already successfully do this given ASIs (which we do).
  34. 1:27:00 Patel asks why any of this saves us even if true, we get into standard nature-still-exists and ASI-will-like-us-and-keep-us-around-for-kicks-somehow talking points.
  35. 1:29:00 Summarizations. Hotz says he came in planning to argue against a sub-10-year foom, asserts for the (fifth?) time that this was dismissed, Yudkowsky once again says he still disagrees on that but simply thinks it isn’t a crux. Hotz emphasizes that it’s impossible that entities could cooperate in the Prisoner’s Dilemma, and the ASIs will be fighting with each other while the humans fight humans. The universe will be endless conflict, so it’s all… fine?
48 comments

1:24:00 Hotz says this is the whole crux and we got to something awesome here. Asserts that provable prisoner’s dilemma cooperation is impossible so we don’t have to worry about this scenario, everything will be defecting on everything constantly for all time, and also that’s great. Yudkowsky says the ASIs are highly motivated to find a solution and are smart enough to do so, does not mention that we have decision theories and methods that already successfully do this given ASIs (which we do).

We do? Can you point out what these methods are, and ideally some concrete systems which use them that have been demonstrated to be effective in e.g. one of the prisoner's dilemma tournaments?

Because my impression is that an adversarially robust decision theory which does not require infinite compute is very much not a thing we have.

It's written up in Robust Cooperation in the Prisoner's Dilemma and Parametric Bounded Löb's Theorem and Robust Cooperation of Bounded Agents (which is about making this work without infinite compute), with more discussion of practical-ish application in Cooperative and uncooperative institution designs: Surprises and problems in open-source game theory.

(Also, a point that's overdue getting into the water supply is that you don't need to be an ASI to use this, and there is no need to prove theorems about your counterparty; you just need to submit legible programs (or formal company bylaws) that will negotiate with each other, being able to reason about each other's behavior, not about the behavior of their possibly inscrutable principals. There's some discussion of that in the third paper I linked above.

The problem with this framing is that legitimacy of a negotiation is in question, as you still need to know something about the principals or incentives that act on them to expect them to respect the verdict of the negotiation performed by the programs they submit. But this point is separate from what makes Prisoner's Dilemma in particular hard to solve, that aspect is taken care of by replacing constant Cooperate/Defect actions with programs that compute those actions based on static analysis of (reasoning about) the other programs involved in the negotiation.)
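To make the "legible programs" idea concrete, here is a minimal sketch (mine, not from the papers above) of a program-equilibrium Prisoner's Dilemma in Python, where each strategy is handed the other's source code. The clique_bot rule, cooperate only with an exact copy of yourself, is the simplest inexploitable example (Tennenholtz-style program equilibrium); the Löbian FairBot of the papers replaces the equality check with a bounded proof search, which this toy harness does not attempt.

```python
import inspect

# Standard Prisoner's Dilemma payoffs, indexed by (my_move, their_move).
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def defect_bot(opponent_source: str) -> str:
    """Always defects, no matter who it is playing."""
    return "D"

def clique_bot(opponent_source: str) -> str:
    """Cooperates only with an exact copy of itself; trivially inexploitable,
    since it never cooperates with anything that would defect against it."""
    return "C" if opponent_source == inspect.getsource(clique_bot) else "D"

def play(bot_a, bot_b):
    """Each program is shown the other's source before moving simultaneously."""
    move_a = bot_a(inspect.getsource(bot_b))
    move_b = bot_b(inspect.getsource(bot_a))
    return PAYOFFS[(move_a, move_b)]

if __name__ == "__main__":
    print(play(clique_bot, clique_bot))  # (3, 3): mutual cooperation
    print(play(clique_bot, defect_bot))  # (1, 1): defects rather than be exploited
```

Running it, clique_bot gets mutual cooperation against itself and mutual defection against defect_bot, so it is never exploited; that is the property the robust-cooperation results extend to much less brittle strategies.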

Thank you for providing those resources. They weren't quite what I was hoping to see, but they did help me see that I did not correctly describe what I was looking for.

Specifically, if we use the first paper's definition that "adversarially robust" means "inexploitable -- i.e. the agent will never cooperate with something that would defect against it, but may defect even if cooperating would lead to a C/C outcome and defecting would lead to D/D", one example of "an adversarially robust decision theory which does not require infinite compute" is "DefectBot" (which, in the language of the third paper, is a special case of Defect-Unless-Proof-Of-Cooperation-bot (DUPOC(0))).

What I actually want is an example of a concrete system that is

  1. Inexploitable (or nearly so): This system will never (or rarely) play C against something that will play D against it.
  2. Competitive: There is no other strategy which can, in certain environments, get long-term better outcomes than this strategy by sacrificing inexploitability-in-theory for performance-in-its-actual-environment-in-practice (for example, I note that in the prisoner's dilemma tournament back in 2013, the actual winner was a RandomBot despite some attempts to enter FairBot and friends, though also a lot of the bots in that tournament had Problems)
  3. Computationally tractable.

Ideally, it would also be

  1. Robust to the agents making different predictions about the effects of their actions. I honestly don't know what a solution to that problem would look like, even in theory, but "able to operate effectively in a world where not all effects of your actions are known in advance" seems like an important thing for a decision theory.
  2. Robust to the "trusting trust" problem (i.e. the issue of "how do you know that the source code you received is what the other agent is actually running"). Though if you have a solution for this problem you might not even need a solution to a lot of the other problems, because a solution to this problem implies an extremely powerful already-existing coordination mechanism (e.g. "all manufactured hardware has preloaded spyware from some trusted third party that lives in a secure enclave and can make a verifiable signed report of the exact contents of the memory and storage of that computer").
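For illustration, here is a toy sketch of the attestation flow that item 2's hypothetical trusted-enclave scheme imagines. The names are made up, and the HMAC is only a stand-in to keep the sketch self-contained; a real scheme would use public-key signatures and measured boot so the verifier never holds the signing key.

```python
import hashlib
import hmac

# Toy stand-in for the hypothetical trusted enclave: it holds a secret key and
# signs a digest of the machine's memory/storage contents.
ENCLAVE_KEY = b"secret-key-known-only-to-the-enclave"

def enclave_report(memory_and_storage: bytes):
    """What the enclave emits: a digest of machine state plus an authentication tag."""
    digest = hashlib.sha256(memory_and_storage).digest()
    tag = hmac.new(ENCLAVE_KEY, digest, hashlib.sha256).digest()
    return digest, tag

def counterparty_verifies(claimed_source: bytes, digest: bytes, tag: bytes) -> bool:
    """The other agent checks the report is authentic and matches the source it
    was shown, before treating that source as what is actually running."""
    expected_tag = hmac.new(ENCLAVE_KEY, digest, hashlib.sha256).digest()
    authentic = hmac.compare_digest(expected_tag, tag)
    matches = hashlib.sha256(claimed_source).digest() == digest
    return authentic and matches

running_code = b"def fair_bot(opponent_source): ..."
digest, tag = enclave_report(running_code)
print(counterparty_verifies(running_code, digest, tag))                 # True
print(counterparty_verifies(b"something else entirely", digest, tag))   # False
```

Even granting such a mechanism, the hard part remains trusting the enclave and the hardware supply chain, which is exactly the "trusting trust" worry.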

In any case, it may be time to run another PD tournament. Perhaps this time with strategies described in English and "evaluated" by an LLM, since "write a program that does the thing you want" seems to have been the blocking step for things people wanted to do in previous submissions.
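A hypothetical harness for such a tournament might look like the sketch below; the stub_judge function is a placeholder for whatever LLM call would actually map an English policy plus the opponent's policy text to a move, and the example policies are made up for illustration.

```python
from typing import Callable

# A judge maps (my_policy, opponent_policy), both plain English, to "C" or "D".
Judge = Callable[[str, str], str]

def stub_judge(my_policy: str, opponent_policy: str) -> str:
    """Placeholder for the LLM call: cooperate iff my policy is conditional-
    cooperation flavored and the opponent's policy also mentions cooperating.
    Purely illustrative keyword matching, not a serious evaluator."""
    i_cooperate = "cooperate" in my_policy.lower()
    they_sound_cooperative = "cooperate" in opponent_policy.lower()
    return "C" if i_cooperate and they_sound_cooperative else "D"

def play(policy_a: str, policy_b: str, judge: Judge = stub_judge):
    """Each policy is judged against the other's text; moves are simultaneous."""
    payoffs = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
               ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
    return payoffs[(judge(policy_a, policy_b), judge(policy_b, policy_a))]

fair = "Cooperate with anyone whose policy commits to cooperating with me."
greedy = "Always defect."
print(play(fair, fair))    # (3, 3)
print(play(fair, greedy))  # (1, 1)
```

The interesting work would all live in the judge; the harness itself is the easy part.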

Edit: I would be very curious to hear from the person who strong-disagreed with this about what, specifically, their disagreement is? I presume that the disagreement is not with my statement that I could have phrased my first comment better, but it could plausibly be any of "the set of desired characteristics is not a useful one", "no, actually, we don't need another PD tournament", or "We should have another PD tournament, but having the strategies be written in English and executed by asking an LLM what the policy does is a terrible idea".

Robust to the "trusting trust" problem (i.e. the issue of "how do you know that the source code you received is what the other agent is actually running").

This is the crux really, and I'm surprised that many LWers seem to believe the 'robust cooperation' research actually works sans a practical solution to 'trusting trust' (which I suspect doesn't actually exist), but in that sense it's in good company (diamondoid nanotech, rapid takeoff, etc.)

A claim about the debate: https://twitter.com/powerbottomdad1/status/1693067693291683981

George Hotz said on stream that he wouldn't bring it up in the debate with Eliezer but the real reason doomers won't win is that God is real, which I think is a better argument than any that were brought in the actual debate

Hotz has also described having manic episodes; unclear if that's related to his religious or AI beliefs, perhaps his streaming fans might know more about that. (Having failed to solve self-driving cars, and having failed to solve Ethereum transaction fees by forking his own cryptocurrency, and having failed to solve Twitter search, he apparently has moved on to solving DL ASICs & solar power & is projecting a valuation of $2 billion for his company in a few years when they are making zettaflops solar-panel-powered supercomputers which can train GPT-4 in a day.)

Not sure if this is a serious claim by Hotz or the tweeter, but if so, Eliezer addressed it 15 years ago: https://www.lesswrong.com/posts/sYgv4eYH82JEsTD34/beyond-the-reach-of-god

(Even if god were somehow real, here or in some other corner of the multiverse, we should still act as if we're in a universe where things are determined purely by simple physical laws, and work to make things better under those conditions.)

Dana:

How is that addressing Hotz's claim? Eliezer's post doesn't address any worlds with a God that is outside of the scope of our Game of Life, and it doesn't address how well the initial conditions and rules were chosen. The only counter I see in that post is that terrible things have happened in the past, which provide a lower bound for how bad things can get in the future. But Hotz didn't claim that things won't go bad, just that it won't be boring.

I think the odds that we end up in a world where there are a bunch of competing ASIs are ultimately very low, invalidating large portions of both arguments.  If the ASIs have no imperative or reward function for maintaining a sense of self integrity, they would just merge.  Saying there is no solution to the Prisoner's Dilemma is very anthropocentric: there is no good solution for humans.  For intelligences that don't have selves, the solution is obvious.

Also, regarding the Landauer limit, human neurons propagate at approximately the speed of sound, not the speed of electricity.  If you could hold everything else the same about the architecture of a human brain, but replace components in ways that increase the propagation speed to that of electricity, you could get much closer to the Landauer limit.  To me, this indicates we're many orders of magnitude off the Landauer limit.  I think this awards the point to Eliezer.

Overall, I agree with Hotz on the bigger picture, but I think he needs to drill down on his individual points.

Also, regarding the Landauer limit, human neurons propagate at approximately the speed of sound, not the speed of electricity.  If you could hold everything else the same about the architecture of a human brain, but replace components in ways that increase the propagation speed to that of electricity, you could get much closer to the Landauer limit.  To me, this indicates we're many orders of magnitude off the Landauer limit.  I think this awards the point to Eliezer.

Huh that is a pretty good point. Even a 1000x speedup in transmission speed in neurons, or neuron equivalents, in something as dense as the human brain would be very significant.

Also, regarding the Landauer limit, human neurons propagate at approximately the speed of sound, not the speed of electricity.

The Landauer limit refers to energy consumption, not processing speed.

To me, this indicates we're many orders of magnitude off the Landauer limit.

The main unknown quantity here is how many floating point operations per second the brain is equivalent to. George gives a figure in the debate, which I'd say is high by an OOM or two, but it's not way off. Supposing that the brain is doing this at a power consumption of 20W, that puts it at around 4 OOM from the Landauer limit. (George claims 1 OOM, which is wrong.)
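As a back-of-the-envelope check on that figure, here is the arithmetic with the brain's effective ops/s treated as an explicit assumption (the 1e18 ops/s and one-bit-erased-per-op values below are placeholders, not numbers from the debate):

```python
import math

K_BOLTZMANN = 1.380649e-23   # J/K
T_BODY = 310.0               # K, roughly body temperature

# Landauer bound: minimum energy to erase one bit of information.
energy_per_bit = K_BOLTZMANN * T_BODY * math.log(2)    # ~3.0e-21 J

power_watts = 20.0                                     # assumed brain power budget
max_erasures_per_s = power_watts / energy_per_bit      # ~6.7e21 bit erasures/s

# Placeholder assumptions (not figures from the debate):
assumed_brain_ops_per_s = 1e18      # effective FLOP/s equivalent of the brain
assumed_bits_erased_per_op = 1.0

actual_erasures_per_s = assumed_brain_ops_per_s * assumed_bits_erased_per_op
gap_ooms = math.log10(max_erasures_per_s / actual_erasures_per_s)
print(f"~{gap_ooms:.1f} OOM from the Landauer limit")  # ~3.8
```

With those placeholder numbers the gap comes out near 4 OOM; shifting the assumed ops/s by an order of magnitude shifts the answer by the same amount.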

From my experience with 3D rendering, I'd say the visual fidelity of the worldmodel sitting in my sensorium at any given moment of walking around an open environment would take something on the order of ~200 GPUs at 250W each to render, so that's 50 kW just for that.  And that's probably a low estimate.

Then consider that my brain is doing a large number of other things, like running various internal mathematical, relational, and language models that I can't even begin to imagine analogous power consumption for.  So, let's just say at least 200 kW to replicate a human brain in current silicon, as just a guess.

(The visual fidelity is a very small fraction of what we actually think it is - the brain lies to us about how much we perceive.)


I do see selves, or personal identity, as closely related to goals or values. (Specifically, I think the concept of a self would have zero content if we removed everything based on preferences or values; roughly 100% of humans who've ever thought about the nature of identity have said it's more like a value statement than a physical fact.) However, I don't think we can identify the two. Evolution is technically an optimization process, and yet has no discernible self. We have no reason to think it's actually impossible for a 'smarter' optimization process to lack identity, and yet form instrumental goals such as preventing other AIs from hacking it in ways which would interfere with its ultimate goals. (The latter are sometimes called "terminal values.")

TAG:

If the ASIs have no imperative or reward function for maintaining a sense of self integrity, they would just merge

Even if they didn't have anything in common?

Saying there is no solution to the Prisoner's Dilemma is very anthropocentric: there is no good solution for humans.

Yet cooperation is widespread!

Humans can't eat another human and get access to the victim's data and computation, but AI can. Human cooperation is a value created by our limitations as humans; AI does not have similar constraints.

TAG:

Humans can kill another human and get access to their land and food. Whatever caused cooperation to evolve, it isn't that there is no benefit to defection.

But land and food don't actually give you more computational capability; only having another human being cooperate with you in some way can.

The essential point here is that values depend upon the environment and the limitations thereof, so as you change the limitations, the values change. The values important for a deep sea creature with an extremely limited energy budget, for example, will necessarily be different from those of human beings.

Consistent typo: Holtz should be Hotz.

Zvi:

Yep, that happened. You get 4th place on finding that across websites, also it's already fixed.

On my computer, Ctrl-f finds ~10 cases of Holtz appearing in the main text, e.g. point 4 of the introduction.


> ... This included a few times when Yudkowsky’s response was not fully convincing and there was room for Holtz to go deeper, and I wish he would have in those cases. ...

Oh man.  My brain generates "Was this fixed with a literal s/Holtz/Hotz/ sed command, as opposed to s/Holtz/Hotz/g ?"  Because it seems that, on lines where the name occurs twice or more, the first instance is correctly spelled and the later instances are (edit: sometimes) not.

Zvi:

I don't know how to do those commands on Less Wrong, I literally fixed it one by one. Not fun.

I do it by pasting the Markdown into a google doc, find/replacing there, and then pasting back.

4 occurrences of "Holz"

To expand on what dkirmani said:

  1. Holz was allowed to drive discussion...
  2. This standard set of responses meant that Holz knew ...
  3. Another pattern was Holz asserting
  4. 24:00 Discussion of Kasparov vs. the World. Holz says

Or to quote dkirmani

4 occurrences of "Holz"

Something I took out of this debate was a pushing back of my "AGI Soon" expectations. I found myself agreeing with Hotz by the end. Though I haven't formed a new prediction, it will be beyond my previous expectation of 1-3 years before AGI, somewhat closer to 10 years. The exact point that changed my mind isn't listed in this post, though it was past the 60 min mark.

Hello friends. It's hard for me to follow the analogies from aliens to AI. Why should we expect harm from any aliens who may appear?

15:08 Hotz: "If aliens were to show up here, we're dead, right?" Yudkowsky: "It depends on the aliens. If I know nothing else about the aliens, I might give them something like a five percent chance of being nice." Hotz: "But they have the ability to kill us, right? I mean, they got here, right?" Yudkowsky: "Oh they absolutely have the ability. Anything that can cross interstellar distances can run you over without noticing -- well, they would notice, but they wouldn't ca--" [crosstalk] Hotz: "I didn't expect this to be a controversial point. But I agree with you that if you're talking about intelligences that are on the scale of billions of times smarter than humanity... yeah, we're in trouble."

Having listened to the whole interview, my best guess is that Hotz believes that advanced civilizations are almost certain to be Prisoner's Dilemma defectors in the extreme, i.e. they have survived by destroying all other beings they encounter. If so, this is quite disturbing in connection with 12:08, in which Hotz expresses his hope that our civilization will expand across the galaxy (in which case we potentially get to be the aliens).

Hotz seems certain aliens would destroy us, and Eliezer gives them only a five percent chance of being nice.

This is especially odd considering the rapidly growing evidence that humans actually have been frequently seeing and sometimes interacting with a much more advanced intelligence.

It's been somewhat jarring for my belief in the reality of nonhuman spacecraft to grow by so much in so little time, but overall it has been a great relief to consider the likelihood that another intelligence in this universe has already succeeded in surviving far beyond humankind's current level of technology. It means that we too could survive the challenges ahead. The high-tech guys might even help us, whoever they are.

But Hotz and Yudkowsky seem to agree that seeing advanced aliens would actually be terrible news. Why?

A 5% chance of nice aliens is better than a 100% chance of human extinction due to AI. Alas 5% seems too high.

The reason the chance is low is the orthogonality thesis. An alien can have many different value systems while still being intelligent, alien value systems can be very diverse, and most alien value systems place no intrinsic value on bipedal humanoids.

A common science fiction intuition pump is to imagine that an evolutionary intelligence explosion happened in a different Earth species and extrapolate likely consequences. There's also the chance that the aliens are AIs that were not aligned with their biological creators and wiped them out.

Thanks for pointing to the orthogonality thesis as a reason for believing the chance would be low that advanced aliens would be nice to humans. I followed up by reading Bostrom's "The Superintelligent Will," and I narrowed down my disagreement to how this point is interpreted:

In a similar vein, even if there are objective moral facts that any fully rational agent would comprehend, and even if these moral facts are somehow intrinsically motivating (such that anybody who fully comprehends them is necessarily motivated to act in accordance with them) this need not undermine the orthogonality thesis. The thesis could still be true if an agent could have impeccable instrumental rationality even whilst lacking some other faculty constitutive of rationality proper, or some faculty required for the full comprehension of the objective moral facts. (An agent could also be extremely intelligent, even superintelligent, without having full instrumental rationality in every domain.)

Even if it's possible that an agent could have impeccable instrumental rationality while lacking in epistemic rationality to some degree, I expect the typical case that leads to very advanced intelligence would eventually involve synergy between growing both in concert, as many here at Less Wrong are working to do. In other words, a highly competent general intelligence is likely to be curious about objective facts across a very diverse range of topics.

So while aliens could be instrumentally advanced enough to make it to Earth without having ever made basic discoveries in a particular area, there's no reason for us to expect that it is specifically the area of morality where they will be ignorant or delusional. A safer bet is that they have learned at least as many objective facts as humans have about any given topic on expectation, and that a topic where the aliens have blind spots in relation to some humans is an area where they would be curious to learn from us.

A policy of unconditional harmlessness and friendliness toward all beings is a Schelling Point that could be discovered in many ways. I grant that humans may have it relatively easy to mature on the moral axis because we are conscious, which may or may not be the typical case for general intelligence. That means we can directly experience within our own awareness facts about how happiness is preferred to suffering, how anger and violence lead to suffering, how compassion and equanimity lead to happiness, and so on. We can also see these processes operating in others. But even a superintelligence with no degree of happiness is likely to learn whatever it can from humans, and learning something like love would be a priceless treasure to discover on Earth.

If aliens show up here, I give them at least a 50% chance of being as knowledgeable as the wisest humans in matters of morality. That's ten times more than Yudkowsky gives them and perhaps infinitely more than Hotz does!

TAG:

Have humans learnt any objective moral facts? What sort of thing is an objective moral fact? Something like an abstract mathematical theorem, a perceivable object, or a game theoretic equilibrium...?

My view is that humans have learned objective moral facts, yes. For example:

If one acts with an angry or greedy mind, suffering is guaranteed to follow.

I posit that this is not limited to humans. Some people who became famous in history due to their wisdom who I expect would agree include Mother Teresa, Leo Tolstoy, Marcus Aurelius, Martin Luther King Jr., Gandhi, Jesus, and Buddha.

I don't claim that all humans know all facts about morality. Sadly, it's probably the case that most people are quite lost, ignorant in matters of virtuous conduct, which is why they find life to be so difficult.

It's not a moral fact, it's just a fact. A moral fact is something of the form "and that means that acting with an angry or greedy mind is wrong".

The form you described is called an argument. It requires a series of facts. If you're working with propositions such as

  • All beings want to be happy.
  • No being wants to suffer.
  • Suffering is caused by confusion and ignorance of morality.
  • ...

then I suppose it could be called a "moral" argument made of "moral" facts and "moral" reasoning, but it's really just the regular form of an argument made of facts and reasoning. The special thing about moral facts is that direct experience is how they are discovered, and it is that same experiential reality to which they exclusively pertain. I'm talking about the set of moment-by-moment first-person perspectives of sentient beings, such as the familiar one you can investigate right now in real time. Without a being experiencing a sensation come and go, there is no moral consideration to evaluate. NULL.

"Objective moral fact" is Bostrom's term from the excerpt above, and the phrasing probably isn't ideal for this discussion. Tabooing such words is no easy feat, but let's do our best to unpack this. Sticking with the proposition we agree is factual:

If one acts with an angry or greedy mind, suffering is guaranteed to follow.

What kind of fact is this? It's a fact that can be discovered and/or verified by any sentient being upon investigation of their own direct experience. It is without exception. It is highly relevant for benefiting oneself and others -- not just humans. For thousands of years, many people have been revered for articulating it and many more have become consistently happy by basing their decisions on it. Most people don't; it continues to be a rare piece of wisdom at this stage of civilization. (Horrifyingly, a person on the edge of starting a war or shooting up a school currently would receive advice from ChatGPT to increase "focused, justified anger.")

Humankind has discovered and recorded a huge body of such knowledge, whatever we wish to call it. If the existence of well-established, verifiable, fundamental insights into the causal nature of experiential reality comes as a surprise to anyone working in fields like psychotherapy or AI alignment, I would urge them to make an earnest and direct inquiry into the matter so they can see firsthand whether such claims have merit. Given the chance, I believe many nonhuman general intelligences would also try and succeed at understanding this kind of information.

(Phew! I packed a lot of words into this comment because I'm too new here to speak more than three times per day. For more on the topic, see the chapter on morality in Dr. Daniel M. Ingram's book that was reviewed on Slate Star Codex.)

O O:

It makes no sense to me that a species that’s evolved to compete with other species will have a higher chance of being nice than a system we can at least somewhat control the development of and take highly detailed “MRI”s of.

Disagree: values come from substrate and environment. I would almost certainly ally myself with biological aliens versus a digital "humanity", as the biological factor will create a world of much more reasonable values to me.

We are a species that has evolved in competition with other species.  Yet, I think there is at least a 5% chance that if we encountered an intelligent alien species that we wouldn't try to wipe them out (unless they were trying to wipe us out).

Biological evolution of us and aliens would in itself be a commonality that might produce some common values, whereas there need be no common values with an AI created by a much different process and not successfully aligned.

O O:

Biological evolution actively selects for values that we don't want, whereas in AI training we actively select for values we do want. Alien life may also not use the biosphere the same way we do. The usual argument about common values is that almost everything needs to breathe air, but at the same time competing with and eliminating competing species is a common value among biological life.

 

Yet, I think there is at least a 5% chance that if we encountered an intelligent alien species that we wouldn't try to wipe them out (unless they were trying to wipe us out).

 

Can you tell me why? We have wiped out every other intelligent species more or less.  Subgroups of our species are also actively wiping out other subgroups of our species they don't like. 

Can you tell me why?

I think if we encountered aliens who were apparently not hostile, but presumably strange, and likely disgusting or disturbing in some ways, there would be three groups (likely overlapping) of people opposed to wiping them out:

  • Those who see wiping them out as morally wrong.
  • Those who see wiping them out as imprudent - we might fail, and then they wipe us out, or other aliens now see us as dangerous, and wipe us out.
  • Those who see wiping them out as not profitable - better to trade with them.

There would also be three groups in favour of wiping them out:

  • Those who see wiping them out as morally good - better if the universe doesn't have such disgusting beings.
  • Those who see wiping them out as the prudent thing to do - wipe them out before they change their mind and do that to us.
  • Those who see wiping them out as profitable - then we can grab their resources.

I think it's clear that people with all these views will exist, in non-negligible numbers. I think there's at least a 5% chance that the "don't wipe them out" people prevail.

Subgroups of our species are also actively wiping out other subgroups of our species they don't like.

Yes, but that's not how interactions between groups of humans always turn out. 

We didn't really wipe out the Neanderthals (assuming we even were a factor, rather than climate, disease, etc.), seeing as they are among our ancestors.

Thanks! I haven't watched, but I appreciated having something to give me the gist!

Hotz was allowed to drive discussion. In debate terms, he was the con side, raising challenges, while Yudkowsky was the pro side defending a fixed position.

This always seems to be the framing, which seems unbelievably stupid given the stakes on each side of the argument. Still, it seems to be the default; I'm guessing this is status quo bias and the historical tendency of everything to stay relatively the same year by year (less so once technology really started happening). I think AI safety outreach needs to break out of this framing or it's playing a losing game. I feel like, in terms of public communication, whoever's playing defense has mostly already lost.

The idea that poking a single hole in EY's reasoning refutes it is also a really broken norm around these discussions that we are going to have to move past if we want effective public communication. In particular, the combination of "tell me exactly what an ASI would do" and "if anything you say sounds implausible, then AI is safe" is just ridiculous. Any conversation implicitly operating on that basis is operating in bad faith and borderline not worth having. It's not a fair framing of the situation.

9. Hotz closes with a vision of ASIs running amok

What a ridiculous thing to be okay with?! Is this representative of his actual stance? Is this stance taken seriously by anyone besides him?

not going to rely on a given argument or pathway because although it was true it would strain credulity. This is a tricky balance, on the whole we likely need more of this.

I take it this means not using certain implausible seeming examples? I agree that we could stand to move away from the "understand the lesson behind this implausible seeming toy example"-style argumentation and more towards an emphasis on something like "a lot of factors point to doom and even very clever people can't figure out how to make things safe". 

I think it matters that most of the "technical arguments" point strongly towards doom, but I think it's a mistake for AI safety advocates to try to do all of the work of laying out and defending technical arguments when it comes to public facing communication/debate. If you're trying to give all the complicated reasons why doom is a real possibility, then you're implicitly taking on a huge burden of proof and letting your opponent get away with doing nothing more than cause confusion and nitpick. 

Like, imagine having to explain general relativity in a debate to an audience who has never heard about it. Your opponent continuously just stops you and disagrees with you; maybe misuses a term here and there and then at the end the debate is judged by whether the audience is convinced that your theory of physics is correct. It just seems like playing a losing game for no reason.

Again, I didn't see this and I'm sure EY handled himself fine, I just think there's a lot of room for improvement in the general rhythm that these sorts of discussions tend to fall into.

I think it is okay for AI safety advocates to lay out the groundwork, maybe make a few big-picture arguments, maybe talk about expert opinion (since that alone is enough to perk most sane people's ears and shift some of the burden of proof), and then mostly let their opponents do the work of stumbling through the briars of technical argumentation if they still want to nitpick whatever thought experiment. In general, a leaner case just argues better and is more easily understood. Thus, I think it's better to argue the general case than to attempt the standard shuffle of a dozen different analogies; especially when time/audience attention is more acutely limited.

TAG:

The idea that poking a single hole in EY’s reasoning refutes it is also a really broken norm around these discussions that we are going to have to move past if we want effective public communication. In particular, the combination of “tell me exactly what an ASI would do” and “if anything you say sounds implausible, then AI is safe”

Remember that this is a three-way debate: AI safe; AI causes finite, containable problems; AI kills (almost) everybody. The most extreme scenario is conjunctive because it requires AI with goals, goal stability, rapid self improvement (foom), and means. So nitpicking one stage of Foom Doom actually does refute it, even if it has no impact on the middle-of-the-road position.

nem:

I disagree that rapid self improvement and goal stability are load-bearing arguments here. Even goals are not strictly, 100% required. If we build something with the means to kill everyone, then we should be worried about it. If it has goals that cannot be directed or predicted, then we should be VERY worried about it.

TAG:

What are the steps? Are we deliberately building a superintelligence with the goal of killing us all? If not, where do the motivation and ability come from?

nem:

For me, ability = capability = means. This is one of the two arguments that I said were load-bearing. Where will it come from? Well, we are specifically trying to build the most capable systems possible.

Motivation (ie goals) is not actually strictly required. However, there are reasons to think that an AGI could have goals that are not aligned with most humans. The most fundamental is instrumental convergence.


Note that my original comment was not making this case. It was just a meta discussion about what it would take to refute Eliezer's argument.

It's unimportant, but I disagree with the "extra special" in:

if alignment isn’t solvable at all [...] extra special dead

If we could coordinate well enough and get to SI via very slow human enhancement, that might be a good universe to be in. But probably we wouldn't be able to coordinate well enough and prevent AGI in that universe. Still, odds seem similar between "get humanity to hold off on AGI till we solve alignment", which is the ask in alignment-possible universes, and "get humanity to hold off on AGI forever", which is the ask in alignment-impossible universes. The difference between the odds comes down to how long until AGI, whether the world can agree to stop development or only agree to slow it, and, if it can stop, whether that is stable. I expect AGI is sufficiently closer than alignment that getting the world to slow it for that long and stop it permanently are fairly similar odds.

what Hotz was treating a load bearing

Small grammar mistake. You accidentally a "a".

trevor:

Some low-level observations I have of Accelerationism (NOTE: I have not yet analyzed Accelerationism deeply and might do a really good job of this in the future; these should be taken as butterfly ideas):

  1. They seem to be very focused on aesthetics when evaluating the future, rather than philosophy. This makes sense, since philosophy has a bad reputation for being saturated with obscurantist bullshit, but there is lots of logically coherent stuff, like the theory of value, without which people can't really judge the future. They can't realistically be expected to have it, because a large proportion of professional philosophers themselves have their minds saturated with obscurantist bullshit.
  2. Compared to the rest of society, Accelerationists can reasonably be considered "truly awakened" in their understanding of reality and what matters. This is because, given the assumption that smarter-than-human AI is safe/good, they actually are vastly better oriented to humanity's current situation than >99% of people on earth; and AI safety is the only thing that challenges that assumption. Given that someone is an accelerationist, they must either be dimly aware of AI safety arguments, or see "doomers" as an outgroup that they are in a clan war with and whose arguments they must destroy. This makes them fundamentally similar to AI safety and EA themselves, aside from how AIS and EA have a foundation of research and e/acc has a foundation of narratives. Hotz seems to have realized some of this, and has decided to bridge the gap and test AI safety's epistemics, while remaining a participant in the clan war.
  3. It would be really interesting to see the origins of Accelerationist ideology; AI safety's ideology's origins are basically a ton of people going "holy crap, the future is real and important, and we can totally fuck it up and we should obviously do something about that". It would be really notable if Accelerationist ideology was not merely an outgrowth of libertarianism, but was also heavily influenced by the expected profit motive, similar to cryptocurrency ideology (or, more significantly, was being spread on social media by bots maximizing for appeal, although that would be difficult to verify). That would potentially explain why Accelerationism is rather shallow in many different ways, and, more significantly, bear on the prospects for ending/preventing clan war dynamics between Accelerationism (an ideology that is extraordinarily friendly to top AI capabilities researchers) and AI safety.
  1. There's a big difference between philosophy and thinking about unlikely scenarios in the future that are very different from our world. In fact, those two things have little overlap. Although it's not always clear, (I think) this discussion isn't about aesthetics, or about philosophy; it's about scenarios that are fairly simple to judge but have so many possible variations, and are so difficult to predict, that it seems pointless to even try. This feeling of futility is the parallel with philosophy, much of which just digests and distills questions into more questions, never giving an answer, until a question is no longer philosophy and can be answered by someone else.

The discussion is about whether or not human civilization will destroy itself due to negligence and lack of ability to cooperate. This risk may be real or imagined. You may care about future humans or not. But that doesn't make this philosophy or aesthetics. The questions are very concrete, not general, and they're fairly objective (people agree a lot more on whether civilization is good than on what beauty is).

  2. I really don't know what you're saying. To attack an obvious straw man and thus give you at least some starting point for explaining further: Generally, I'd be extremely sceptical of any claim about some tiny coherent group of people understanding something important better than 99% of humans on earth. To put it polemically, for most such claims, either it's not really important (maybe we don't really know if it is?), it won't stay that way for long, or you're advertising for a cult. The phrase "truly awakened" doesn't bode well here... Feel free to explain what you actually meant rather than responding to this.

  3. Assuming these "ideologies" you speak of really exist in a coherent fashion, I'd try to summarize "Accelerationist ideology" as saying "technological advancement (including AI) will accelerate a lot, change the world in unimaginable ways and be great, let's do that as quickly as possible", and "AI safety (LW version)" as saying "it might go wrong and be catastrophic/unrecoverable; let's be very careful". If anything, these ideas as ideologies are yet to get out into the world and might never have any meaningful impact at all. They might not even work on their own as ideologies (maybe we mean different things by that word).

So why are the origins interesting? What do you hope to learn from them? What does it matter if one of those is an "outgrowth" of one thing more than some other? It's very hard for me to evaluate something like how "shallow" they are. It's not like there's some single manifesto or something. I don't see how that's a fruitful direction to think about.