All of Karl von Wendt's Comments + Replies

Thank you for being so open about your experiences. They mirror my own in many ways. Knowing that there are others feeling the same definitely helps me cope with my anxieties and doubts. Thank you also for organizing that event last June!

As a professional novelist, the best advice I can give comes from one of the greatest writers of the 20th century, Ernest Hemingway: "The first draft of anything is shit." He was known to rewrite his short stories up to 30 times. So, rewrite. It helps to let some time pass (at least a few days) before you reread and rewrite a text. This makes it easier to spot the weak parts.

For me, rewriting often means cutting things out that aren't really necessary. That hurts, because I have put some effort into putting the words there in the first place. So I use a si... (read more)

I think the term has many “valid” uses, and one is to refer to an object level belief that things will likely turn out pretty well. It doesn’t need to be irrational by definition.

Agreed. Like I said, you may have used the term in a way different from my definition. But I think in many cases, the term does reflect an attitude like I defined it. See Wikipedia.

I also think AI safety experts are self selected to be more pessimistic

This may also be true. In any case, I hope that Quintin and you are right and I'm wrong. But that doesn't make me sleep better.

From Wikipedia: "Optimism is an attitude reflecting a belief or hope that the outcome of some specific endeavor, or outcomes in general, will be positive, favorable, and desirable." I think this is close to my definition or at least includes it. It certainly isn't the same as a neutral view.

Thanks for pointing this out! I agree that my definition of "optimism" is not the only way one can use the term. However, from my experience (and like I said, I am basically an optimist), in a highly uncertain situation, the weighing of perceived benefits vs. risks heavily influences one's probability estimates. If I want to found a start-up, for example, I convince myself that it will work. I will unconsciously weigh positive evidence higher than negative. I don't know if this kind of focusing on the positive outcomes may have influenced your reasoning and yo... (read more)

People here describe themselves as "pessimistic" about a variety of aspects of AI risk on a very regular basis, so this seems like an isolated demand for rigor.  This seems like a weird bait and switch to me, where an object-level argument is only ever allowed to conclude in a neutral middle-ground conclusion. A "neutral, balanced view of possibilities" is absolutely allowed to end on a strong conclusion without a forest of caveats. You switch your reading of "optimism" partway through this paragraph in a way that seems inconsistent with your earlier comment, in such a way that smuggles in the conclusion "any purely factual argument will express a wide range of concerns and uncertainties, or else it is biased".
Nora Belrose (3mo):
I just disagree, I think the term has many “valid” uses, and one is to refer to an object level belief that things will likely turn out pretty well. It doesn’t need to be irrational by definition. I also think AI safety experts are self selected to be more pessimistic, and that my personal epistemic situation is at least as good as theirs on this issue, so I’m not bothered that I’m more optimistic than the median safety researcher. I also have a fairly good “error theory” for why many people are overly pessimistic, which will be elucidated in upcoming posts.

Defined well, dominance would be the organizing principle, the source, of an entity's behavior. 

I doubt that. Dominance is the result, not the cause, of behavior. It comes from the fact that there are conflicts in the world, and often only one side can get its way (even in a compromise, there's usually a winner and a loser). If an agent strives for dominance, it is usually an instrumental goal for something else the agent wants to achieve. There may be a "dominance drive" in some humans, but I don't think that explains much of actual dominant behav... (read more)

That "troll" runs one of the most powerful AI labs and freely distributes LLMs on the internet at the level of what was state-of-the-art half a year ago. This is not just about someone talking nonsense in public, like Melanie Mitchell or Steven Pinker. LeCun may literally be the one who contributes most to the destruction of humanity. I would give everything I have to convince him that what he's doing is dangerous. But I have no idea how to do that if even his former colleagues Geoffrey Hinton and Yoshua Bengio can't.

I think even most humans don't have a "dominance" instinct. The reasons we want to gain money and power are also mostly instrumental: we want to achieve other goals (e.g., as a CEO, getting ahead of a competitor to increase shareholder value and do a "good job"), impress our neighbors, generally want to be admired and loved by others, live in luxury, distract ourselves from other problems like getting older, etc. There are certainly people who want to dominate just for the feeling of it, but I think that explains only a small part of the actual dominant... (read more)

Thanks for pointing this out! I should have made it clearer that I did not use ChatGPT to come up with a criticism, then write about it. Instead, I wanted to see if even ChatGPT was able to point out the flaws in LeCun's argument, which seemed obvious to me. I'll edit the text accordingly.

Like I wrote in my reply to dr_s, I think a proof would be helpful, but probably not a game changer.

Mr. CEO: "Senator X, the assumptions in that proof you mention are not applicable in our case, so it is not relevant for us. Of course we make sure that assumption Y is not given when we build our AGI, and assumption Z is pure science-fiction."

What the AI expert says to Xi Jinping and to the US general in your example doesn't rely on an impossibility proof in my view. 

Yes. Valid. The question is how to avoid reducing to a toy problem, or to such narrowing assumptions (in order to achieve a proof) that they allow Mr. CEO to dismiss it. When I revise, I'm going to work backwards with the CEO/Senator dialog in mind.

I agree that a proof would be helpful, but probably not as impactful as one might hope. A proof of impossibility would have to rely on certain assumptions, like "superintelligence" or whatever, that could also be doubted or called sci-fi.

I have strong-upvoted this post because I think that a discussion about the possibility of alignment is necessary. However, I don't think an impossibility proof would change very much about our current situation.

To stick with the nuclear bomb analogy, we already KNOW that the first uncontrolled nuclear chain reaction will definitely ignite the atmosphere and destroy all life on earth UNLESS we find a mechanism to somehow contain that reaction (solve alignment/controllability). As long as we don't know how to build that mechanism, we must not start an uncon... (read more)

Like dr_s stated, I'm contending that a proof would be qualitatively different from "very hard" and powerful ammunition for advocating a pause... Senator X: "Mr. CEO, your company continues to push the envelope and yet we now have proof that neither you nor anyone else will ever be able to guarantee that humans remain in control. You talk about safety and call for regulation, but we seem to now have the answer. Human control will ultimately end. I repeat my question: Are you consciously working to replace humanity? Do you have children, sir?" AI expert to Xi Jinping: "General Secretary, what this means is that we will not control it. It will control us. In the end, Party leadership will cede to artificial agents. They may or may not adhere to communist principles. They may or may not believe in the primacy of China. Population advantage will become nothing because artificial minds can be copied 10 billion times. Our own unification of mind, purpose, and action will pale in comparison. Our chief advantages of unity and population will no longer exist." AI expert to US General: "General, think of this as building an extremely effective infantry soldier who will become CJCS then POTUS in a matter of weeks or months."
Lots of people when confronted with various reasons why AGI would be dangerous object that it's all speculative, or just some sci-fi scenarios concocted by people with overactive imaginations. I think a rigorous, peer reviewed, authoritative proof would strengthen the position against these sort of objections.

That's a good point, which is supported by the high share of 92% prepared to change their minds.

I've received my fair share of downvotes, see for example this post, which got 15 karma out of 24 votes. :) It's a signal, but not more than that. As long as you remain respectful, you shouldn't be discouraged from posting your opinion in comments even if people downvote it. I'm always for open discussions as they help me understand how and why I'm not understood.


I agree with that, and I also agree with Yann LeCun's intention to "not being stupid enough to create something that we couldn't control". I even think not creating an uncontrollable AI is our only hope. I'm just not sure whether I trust humanity (including Meta) to be "not stupid".

I don't see your examples contradicting my claim. Killing all humans may not increase future choices, so it isn't an instrumental convergent goal in itself. But in any real-world scenario, self-preservation certainly is, and power-seeking - in the sense of expanding one's ability to make decisions by taking control of as many decision-relevant resources as possible - is also a logical necessity.  The Russian roulette example is misleading in my view because the "safe" option is de facto suicide - if "the game ends" and the AI can't make any decisions anymore, it is already dead for all practical purposes. If that were the stakes, I'd vote for the gun as well.

Even assuming you are right on that inference, once we consider how many choices there are, it still isn't much evidence at all, and given that there are usually lots of choices, this inference is essentially not holding up the thesis that AI is an existential risk very much, without prior commitments to AI as being an existential risk. Also, this part of your comment, as well as my hopefully final quotes below, explains why you can't get from self-preservation and power-seeking, even if they happen, into an existential risk without more assumptions. That's the problem, as we have just as plausible, if not more plausible reasons to believe that there isn't an instrumental convergence towards existential risk, for reasons related to future choices. These quotes below also explains why instrumental convergence and self-preservation doesn't imply AI risk, without more assumptions.

To reply in Stuart Russell's words: "One of the most common patterns involves omitting something from the objective that you do actually care about. In such cases … the AI system will often find an optimal solution that sets the thing you do care about, but forgot to mention, to an extreme value."

There are vastly more possible worlds that we humans can't survive in than those we can, let alone live comfortably in. Agreed, "we don't want to make a random potshot", but making an agent that transforms our world into one of these rare ones where we want to liv... (read more)

It's difficult to create an aligned Sovereign, but easy not to create a Sovereign at all.

I'm not sure if I understand your point correctly. An AGI may be able to infer what we mean when we give it a goal, for instance from its understanding of the human psyche, its world model, and so on. But that has no direct implications for its goal, which it has acquired either through training or in some other way, e.g. by us specifying a reward function. 

This is not about "genie-like misunderstandings". It's not the AI (the genie, so to speak), that's misunderstanding anything - it's us. We're the ones who give the AI a goal or train it in some way... (read more)

Which is to say, it won't necessarily follow a goal correctly that it is capable of understanding correctly. On the other hand, it won't necessarily fail to. Both possibilities are open. Remember, the title of this argument is misleading: there's no proof that the genie will not care. Not all AIs have goals, not all have goal stability, not all are incorrigible. Mindspace is big.

the orthogonality thesis is compatible with ludicrously many worlds, including ones where AI safety in the sense of preventing rogue AI is effectively a non-problem for one reason or another. In essence, it only states that bad AI from our perspective is possible, not that it's likely or that it's worth addressing the problem due to it being a tail risk.

Agreed. The orthogonality thesis alone doesn't say anything about x-risks. However, it is a strong counterargument against the claim, made both by LeCun and Mitchell if I remember correctly, that a sufficie... (read more)

Yep, the orthogonality thesis is a pretty good defeater to the claims that AI intelligence alone would be sufficient to gain the right values for us, unlike where capabilities alone can be generated by say a simplicity prior. This is where I indeed disagree with Mitchell and LeCun. Not really, and this is important. Also, even if this was true, remember that given the world has many, many choices, it's probably not enough evidence to believe AI risk claims unless you already started with a high prior on AI risk, which I don't. Even at 1000 choices, the evidence is thin but not effectively useless, but by the time we reach millions or billions of choices this claim, even if true isn't very much evidence at all. Quote below to explain it in full why your statement isn't true:
That could be either of two arguments: that it would be capable of figuring out what we want from first principles; or that it would not commit genie-like misunderstandings.

Thanks for pointing this out - I may have been sloppy in my writing. To be more precise, I did not expect that I would change my mind, given my prior knowledge of the stances of the four candidates, and would have given this expectation a high confidence. For this reason, I would have voted with "no". Had LeCun or Mitchell presented an astonishing, verifiable insight previously unknown to me, I may well have changed my mind. 

Ah, okay. That makes much more sense.

Thank you for your reply and the clarifications! To briefly comment on your points concerning the examples for blind spots:

superintelligence does not magically solve physical problems

I and everyone I know on LessWrong agree.

evolution don’t believe in instrumental convergence

I disagree. Evolution is all about instrumental convergence IMO. The "goal" of evolution, or rather the driving force behind it, is reproduction. This leads to all kinds of instrumental goals, like developing methods for food acquisition, attack and defense, impressing the opposite sex,... (read more)

That’s all important points and I’d glad to discuss them. However I’m also noticing a wave of downvotes, so maybe we should go half private with whoever signal they want to read more? Or you think I should just ignore that and go forward with my answers? Both are ok but I’d like to follow your lead as you know the house better.

That’s the kind of sentence that I see as arguments for believing your assessment is biased. 

Yes, my assessment is certainly biased, I admitted as much in the post. However, I was referring to your claim that LW (in this case, me) was "a failure in rational thinking", which sounds a lot like Mitchell's "ungrounded speculations" in my ears.

Of course she gave supporting arguments, you just refuse to hear them

Could you name one? Not one of Mitchell's arguments, but support for the claim that AI x-risk is just "ungrounded speculation" despite decades of ... (read more)


Is the orthogonality thesis correct? (The term wasn't mentioned directly in the debate.) Yes, in the limit and probably in practice, but it is too weak to be useful for the purposes of AI risk without more evidence.

Also, orthogonality is expensive at runtime, so this consideration matters, which is detailed in the post below

I think the post you mention misunderstands what the "orthogonality thesis" actually says. The post argues that an AGI would not want to arbitrarily change its goal during runtime. That is not what the orthogonality thesis is about. It jus... (read more)

I accept the orthogonality thesis, in the sense that a paperclip maximizer can exist, at least in theory, in the sense of being logically and physically possible. The reason I view it as too weak evidence is that the orthogonality thesis is compatible with ludicrously many worlds, including ones where AI safety in the sense of preventing rogue AI is effectively a non-problem for one reason or another. In essence, it only states that bad AI from our perspective is possible, not that it's likely or that it's worth addressing the problem due to it being a tail risk. Imagine if someone wrote an article in the New York Times claiming that halting oracles are possible, and that this would be very bad news for us, amounting to extinction, solely because it's possible for us to go extinct via this way. The correct response here is that you should ignore the evidence and go with your priors. I see the orthogonality thesis a lot like this: It's right, but the implied actions require way more evidence than it presents. Given that the orthogonality thesis, even if true, shouldn't shift our priors much, due to it being very, very weak evidence, the fact that the orthogonality thesis is true doesn't mean that LeCun is wrong, without something else assumed. IMO, this also characterizes why I don't find AI risk that depends on instrumental convergence compelling, due to the fact that even if it's true, contra Bostrom, without more assumptions that need to be tested empirically, this is still very compatible with a world where instrumental convergence is a non-problem in any number of ways, and means that without more assumptions, LeCun could still be right that in practice instrumental convergence does not lead to existential risk or even leave us in a bad future. Post below: Some choice quotes to illustrate why instrumental convergence doesn't buy us much evidence at all: So my point is ev

but your own post make me update toward LW being a failure of rational thinking, e.g. it’s an echo chamber that makes your ability to evaluate reality weaker, at least on this topic.

I don't see you giving strong arguments for this. It reminds me of the way Melanie Mitchell argued: "This is all ungrounded speculation", without giving any supporting arguments for this strong claim. 

Concerning the "strong arguments" of LeCun/Mitchell you cite:

AIs will likely help with other existential risks

Yes, but that's irrelevant to the question of whether AI may pos... (read more)


That's really nice, thank you very much!

We added a few lines to the dialog in "Takeover from within". Thanks again for the suggestion!

Thank you for pointing this out. By "turning Earth into a giant computer" I did indeed mean "the surface of the Earth". The consequences for biological life are the same, of course. As for heat dissipation, I'm no expert but I guess there would be ways to radiate it into space, using Earth's internal heat (instead of sunlight) as the main energy source. A Dyson sphere may be optimal in the long run, but I think that turning Earth's surface into computronium would be a step on the way.

The way to kill everyone isn’t necessarily gruesome, hard to imagine, or even that complicated. I understand it’s a good tactic at making your story more ominous, but I think it’s worth stating it to make it seem more realistic.

See my comment above. We didn't intend to make the story ominous, but didn't want to put off readers by going into too much detail of what would happen after an AI takeover.

Lastly, it seems unlikely alignment research won’t scale with capabilities. Although this isn’t enough to align the ASI alone and the scenario can still happen,

... (read more)
O O (8mo):
Don’t see why not. It would be far easier than automating neuroscience research. There is an implicit assumption that we won’t wait for alignment research to catch up, but people will naturally get more suspicious of models as they get more powerful. We are already seeing the framework for a pause at this step. Additionally, to align ASI correctly, you only need to align weakly super/human level/below human level intelligent agents who will align superintelligent agents stronger than themselves or convince you to stop progress. Recursive self-alignment is parallel to recursive self-improvement.

As I've argued here, it seems very likely that a superintelligent AI with a random goal will turn Earth and most of the rest of the universe into computronium, because increasing its intelligence is the dominant instrumental subgoal for whatever goal it has. This would mean inadvertent extinction of humanity and (almost) all biological life. One of the reasons for this is the potential threat of grabby aliens/a grabby alien superintelligence.

However, this is a hypothesis which we didn't thoroughly discuss during the AI Safety Project, so we didn't fe... (read more)

O O (8mo):
I have a lot of issues with the disassembling atoms line of thought, but I won’t argue it here. I think it’s been argued enough against in popular posts. But I think the gist of it is the Earth is a tiny fraction of the solar system/near solar systems’ resources (even smaller out of the light cone), and one of the worst places to host a computer vs say Pluto, because of heat, so ultimately it doesn’t take much to avoid using Earth for all of its resources. Grabby aliens don’t really limit us from using solar system/near solar system. And some of my own thoughts: the speed of light probably limits how useful that large of computers are (say planet size), while a legion of AI systems is probably slow to coordinate. They will still be very powerful but a planet sized computer just doesn’t sound realistic in the literal sense. A planet sized compute cluster? Sure, maybe heat makes that impractical, but sure.

Thank you very much for the feedback! I'll discuss this with the team, maybe we'll edit it in the next days.

Thank you! Very interesting and a little disturbing, especially the way the AI performance expands in all directions simultaneously. This is of course not surprising, but still concerning to see it depicted in this way. It's all too obvious how this diagram will look in one or two years. Would also be interesting to have an even broader diagram including all kinds of different skills, like playing games, steering a car, manipulating people, etc.

Thank you very much! I agree. We chose this scenario out of many possibilities because so far it hasn't been described in much detail and because we wanted to point out that open source can also lead to dangerous outcomes, not because it is the most likely scenario. Our next story will be more "mainstream".

Good point! Satirical reactions are not appropriate in comments, I apologize. However, I don't think that arguing why alignment is difficult would fit into this post. I clearly stated this assumption in the introduction as a basis for my argument, assuming that LW readers were familiar with the problem. Here are some resources to explain why I don't think that we can solve alignment in the next 5-10 years: ... (read more)

Yes, thanks for the clarification! I was indeed oversimplifying a bit.

This is an interesting thought. I think even without AGI, we'll have total transparency of human minds soon - already AI can read thoughts in a limited way. Still, as you write, there's an instinctive aversion against this scenario, which sounds very much like an Orwellian dystopia. But if some people have machines that can read minds, which I don't think we can prevent, it may indeed be better if everyone could do it - deception by autocrats and bad actors would be much harder that way. On the other hand, it is hard to imagine that the people in power wou... (read more)

I'm obviously all for "slowing down capabilities". I'm not for "stopping capabilities altogether", but for selecting which capabilities we want to develop, and which to avoid (e.g. strategic awareness). I'm totally for "solving alignment before AGI" if that's possible.

I'm very pessimistic about technical alignment in the near term, but not "optimistic" about governance. "Death with dignity" is not really a strategy, though. If anything, my favorite strategy in the table is "improve competence, institutions, norms, trust, and tools, to set the stage for right... (read more)

Mo Putera (9mo):
Sure, I mostly agree. To repeat part of my earlier comment, you would probably be more persuasive if you addressed e.g. why my intuition that #1 is more feasible than #2 is wrong. In other words, I'm giving you feedback on how to make your post more persuasive to the LW audience. This sort of response ("Well, yes, of course! Why didn't I think of it myself? /s") doesn't really persuade readers; bridging inferential gaps would.

Well, yes, of course! Why didn't I think of it myself? /s

Honestly, "aligned benevolent AI" is not a "better alternative" for the problem I'm writing about in this post, which is that we'll be able to develop an AGI before we have solved alignment. I'm totally fine with someone building an aligned AGI (assuming that it is really aligned, not just seemingly aligned). The problem is, this is very hard to do, and timelines are likely very short.

Mo Putera (10mo):
At least 2 options to develop aligned AGI, in the context of this discussion:
1. Slow down capabilities and speed up alignment just enough that we solve alignment before developing AGI
   - e.g. the MTAIR project, in this paper, models the effect of a fire alarm for HLMI as "extra time" speeding up safety research, leading to a higher chance that it is successful before the timeline for HLMI
   - this seems intuitively more feasible, hence more likely
2. Stop capabilities altogether -- this is what you're recommending in the OP
   - this seems intuitively far less feasible, hence ~unlikely (I interpret e.g. HarrisonDurland's comment as elaborating on this intuition)

What I don't yet understand is why you're pushing for #2 over #1. You would probably be more persuasive if you addressed e.g. why my intuition that #1 is more feasible than #2 is wrong.

Edited to add: Matthijs Maas' Strategic Perspectives on Transformative AI Governance: Introduction has this (oversimplified) mapping of strategic perspectives. I think you'd probably fall under (technical: pessimistic or very; governance: very optimistic), while my sense is most LWers (me included) are either pessimistic or uncertain on both axes, so there's that inferential gap to address in the OP.

You may be right about that. Still, I don't see any better alternative. We're apes with too much power already, and we're getting more powerful by the minute. Even without AGI, there are plenty of ways to end humanity (e.g. bioweapons, nanobots, nuclear war, bio lab accidents ...) Either we learn to overcome our ape-brain impulses and restrict ourselves, or we'll kill ourselves. As long as we haven't killed ourselves, I'll push towards the first option.

I do! Aligned benevolent AI!

We're not as far apart as you probably think. I'd agree with most of your decisions. I'd even vote for you to become king! :) Like I wrote, I think we must also be cautious with narrow AI as well, and I agree with your points about opaqueness and the potential of narrow AI turning into AGI. Again, the purpose of my post was not to argue how we could make AI safe, but to point out that we could have a great future without AGI. And I still see a lot of beneficial potential in narrow AI, IF we're cautious enough.

1000 years is still just a delay.

Fine. I'll take it.

But I didn't see you as presenting preventing fully general, self-improving AGI as a delaying tactic. I saw you as presenting it as a solution.

Actually, my point in this post is that we don't NEED AGI for a great future, because often people equate Not AGI = Not amazing future (or even a terrible one) and I think this is wrong. The point of this post is not to argue that preventing AGI is easy.

However, it's actually very simple: If we build a misaligned AGI, we're dead. So there are only two options: A) s... (read more)

I don't have so much of a problem with that part. It would prevent my personal favorite application for fully generally strongly superhuman AGI... which is to have it take over the world and keep humans from screwing things up more. I'm not sure I'd want humans to have access to some of the stuff non-AGI could do... but I don't think there's any way to prevent that.

C) Give up. You're not going to like it...

Personally, if made king of the world, I would try to discourage at least large-scale efforts to develop either generalized agents or "narrow AI", especially out of opaque technology like ML. That's because narrow AI could easily become parts or tools for a generalized agent, because many kinds of narrow AI are too dangerous in human hands, and because the tools and expertise for narrow AI are too close to those for generalized AGI. It would be extremely difficult to suppress one in practice without suppressing the other.

I'd probably start by making it as unprofitable as I could by banning likely applications. That's relatively easy to enforce because many applications are visible. A lot of the current narrow AI applications need bannin' anyhow. Then I'd start working on a list of straight-up prohibitions. Then I'd dump a bunch of resources into research on assuring behavior in general and on more transparent architectures. I would not actually expect it to work, but it has enough of a chance to be worth a try. That work would be a lot more public than most people on Less Wrong would be comfortable with, because I'm afraid of nasty knock-on effects from trying to make it secret. And I'd be a little looser about capability work in service of that goal than in service of any other.

I would think very hard about banning large aggregations of vector compute hardware, and putting various controls on smaller ones, and would almost certainly end up doing it for some size thresholds. I'm not sure what the thresholds would be, nor exactly what the controls would

One of the reasons I wrote this post is that I don't believe in regulation to solve this kind of problem (I'm still pro regulation). I believe that we need to get a common understanding of what are stupid things no one in their right mind would ever do (see my reply to jbash). To use your space colonization example: we certainly can't regulate what people do somewhere in outer space. But if we survive long enough to get there, then we have either solved alignment or we have finally realized that it's not possible, which will hopefully be common knowledge b... (read more)

I am pretty sure you are just wrong about this and there are people who will gladly pay $10M to end the world, or they would use it as a blackmail threat and a threat ignorer would meet a credible threatener, etc. People are a bunch of unruly, dumb barely-evolved monkeys and the left tail of human stupidity and evil would blow your mind.

The point of this comment is less to say “this definitely can’t be done” (although I do think such a future is fairly implausible/unsustainable), and more to say “why did you not address this objection?” You probably ought to have a dedicated section that very clearly addresses this objection in detail.

That's a valid point, thank you for making it. I have given some explanation of my point of view in my reply to jbash, but I agree that this should have been in the post in the first place.

You cannot permanently stop self-improving AGI from being created or run. Not without literally destroying all humans.

You can't stop it for a "civilizationally significant" amount of time. Not without destroying civilization.

I'm not sure what this is supposed to mean. Are you saying that I'd have to kill everyone so no one can build AGI? Maybe, but I don't think so. Or are you saying that not building an AGI will destroy all humans? This I strongly disagree with. I don't know what a "civilizationally significant" amount of time is. For me, the next 10 years... (read more)

Yup. Anything short of that is just a delaying tactic. From the last part of your comment, you seem to agree with that, actually. 1000 years is still just a delay. But I didn't see you as presenting the prevention of fully general, self-improving AGI as a delaying tactic. I saw you as presenting it as a solution.

Also, isn't suppressing fully general AGI actually a separate question from building narrow AI? You could try to suppress both fully general AGI and narrow AI. Or you could build narrow AI while still also trying to do fully general AGI. You can do either with or without the other.

I don't know if it's distracting any individuals from finding any way to guarantee good AGI behavior[1]. But it definitely tends to distract social attention from that. Finding one "solution" to a problem tends to make it hard to continue any negotiated process, including government policy development, for pursuing another "solution". The attitude is "We've solved that (or solved it for now), so on to the next crisis". And the suppression regime could itself make it harder to work on guaranteeing behavior.

True, I don't know whether the good-behavior problem can be solved, and I am very unsure that it can be solved in time, regardless. But at the very least, even if we're totally doomed, the idea of total, permanent suppression distracts people from getting whatever value they can out of whatever time they have left, and may lead them to actions that make it harder for others to get that value.

Oh, no, I don't think that at all. Given the trends we seem to be on, things aren't looking remotely good. I do think there's some hope for solving the good-behavior problem, but honestly I pin more of my hope for the future on limitations of the amount of intelligence that's physically possible, and even more on limitations of what you can do with intelligence no matter how much of it you have. And another, smaller, chunk on it possibly turning out that a random self-improving intelligence simply w

The first rule is that ASI is inevitable, and within that there are good or bad paths.


I don't agree with this. ASI is not inevitable; we can always decide not to develop it. Nobody will even lose any money! As long as we haven't solved alignment, there is no "good" path involving ASI, and no positive ROI. Thinking it is better that player X (say, Google) develops ASI first, rather than player Y (say, the Chinese), is a fallacy IMO, because if the ASI is not aligned with our values, both outcomes have the same consequence.

I'm not saying focusing on narro... (read more)

If you really think that through, in the long term it means a permanent ban on compute that takes computers back to the level they were at in the 1970s, and a global enforcement system to keep it that way. Furthermore, there are implications for space colonization: if any group can blast away from the Sol system and colonize a different solar system, they can make an ASI there, and it can come back and kill us all. Similarly, it means any development on other planets must be monitored for violations of the compute limit, and we must also monitor technology that would indirectly lead to a violation of the limits on compute, i.e. any technology that would allow people to build an advanced modern chip fab without the enforcers detecting it.

In short, you are trying to impose an equilibrium which is not incentive-compatible: every agent with sufficient power to build ASI has an incentive to do so for purely instrumental reasons. So, in the long term, the only way to not build misaligned, dangerous ASI is to build an aligned ASI. However, that is a long-term problem. In the short term there are a very limited number of groups who can build it, so they can probably coordinate.

And if that function is simple (such as "exist as long as possible"), it can pretty soon research virtually everything that matters, and then will just go through the motions, devouring the universe to prolong its own existence to near-infinity.

I think that even with such a very simple goal, the problem of a possible rival AI somewhere out there in the universe remains. Until the AI can rule that out with 100% certainty, it can still gain extra expected utility from increasing its intelligence.

Also, the more computronium there is, the bigger is the chan

... (read more)

I agree that with temporal discounting, my argument may not be valid in all cases. However, depending on the discount rate, increasing computing power/intelligence may still raise the expected value enough to justify the increase for a long time. In the case of the squiggle maximizer, turning the whole visible universe into squiggles beats turning Earth into squiggles by such a huge factor that even a high discount rate would justify postponing actually making any squiggles, at least for a while. So in cases with high discount rates, it ... (read more)
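The arithmetic behind this can be sketched in a few lines. All the numbers here are hypothetical placeholders (payoff sizes, a 5% annual discount rate, a 100-year expansion phase); the point is only that a sufficiently large later payoff can dominate a small immediate one even under steep exponential discounting:

```python
def discounted(value, annual_discount_rate, years):
    """Present value of a payoff received `years` from now,
    under simple exponential discounting."""
    return value * (1 - annual_discount_rate) ** years

# Hypothetical comparison for the squiggle maximizer:
# converting Earth yields 1 unit of "squiggle utility" today,
# while expanding first and converting the visible universe
# yields 10**20 units, but only after 100 years of preparation.
earth_now = discounted(1, 0.05, 0)
universe_later = discounted(1e20, 0.05, 100)

# Even at a 5% annual discount rate, 0.95**100 is only about 6e-3,
# so the later payoff still dwarfs the immediate one.
print(universe_later > earth_now)
```

With a long enough delay or a high enough discount rate the comparison flips, which is exactly why the argument holds only "for a while" rather than indefinitely.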
