

What 2026 looks like

I never meant to claim that my position was “clever people don’t seem worried so I shouldn’t be”. If that’s what you got from me, then that’s my mistake. I’m incredibly worried as a matter of fact, and much more importantly, everyone I mentioned also is to some extent or another, as you already pointed out. What I meant to say but failed to was that there’s enough disagreement in these circles that near-absolute confidence in doom seems to be jumping the gun. That argument also very much holds against people who are so certain that everything will go just fine.

I guess most of my disagreement comes from 4. Or rather, the implication that having an exact formal specification of human values ready to be encoded is necessarily the only way that things could possibly go well. I already tried to verbalize as much earlier, but maybe I didn’t do a good job of that either.

All AGI safety questions welcome (especially basic ones) [monthly thread]

I apologize for my ignorance, but are these things what people are actually trying in their own ways? Or are they really trying the thing that seems much, much crazier to me?

What 2026 looks like

Well, we can agree that the default outcome is probably death.

So, in my previous comment, I explained why I tend to not think Complexity of Value necessarily dooms us. I doubt you find the aforementioned reasoning remotely reassuring, but I’d be interested in finding out why you think that it shouldn’t be. Would you be willing to try and explain that to me?

What 2026 looks like

So, and please correct me if I’m wrong, would you say that the main source of your hopelessness is the idea that human values are too complex to correctly program into anything? Because I basically agree with that idea, but it doesn’t really inspire much doomerism in me. I had already believed that trying to “solve” ethics was pretty much futile before I was introduced to LW, but I never gave that much weight in terms of how much it affects alignment, for the following reason:

I just don’t expect that any of the clever people who I tend to defer to are actually trying to do exactly this; “this” being trying to actually, literally reverse-engineer human values and then encode them.

The idea seems obviously wrong enough that I honestly don’t believe that anyone working in the alignment field who thinks that the problem is solvable from at least a technical standpoint (Paul Christiano, Richard Ngo, Nate Soares, etc.) hasn’t already considered this.

However, our conversation here has inspired me to ask a question regarding this in the latest monthly AGI safety questions thread.

All AGI safety questions welcome (especially basic ones) [monthly thread]

So, something I am now wondering is: Why don’t Complexity of Value and Fragility of Value make alignment obviously impossible?

Maybe I’m misunderstanding the two theories, but don’t they very basically boil down to “Human values are too complex to program”? Because that just seems like something that’s objectively correct. Like, trying to do exactly that seems like attempting to “solve” ethics which looks pretty blatantly futile to me.

I (hopefully) suspect that I have the exact shape of the issue wrong, and that (most) people aren’t actually literally trying to reverse engineer human morality and then encode it.

If that actually is what everyone is trying to do, then why is it only considered “difficult” and not outright undoable?

What 2026 looks like

I read the blogspot post, and in the comments you said that even if every mind on the planet were to work on the problem we would still have almost no chance. Unless you know something nobody else does, this seems, and please forgive my bluntness, batshit crazy.

I understand the argument about how accurately judging difficulty is something that’s usually only doable when you’re already in a place where you can kind of see a solution, even if this argument doesn’t remotely register to me as the total knockdown I intuit you think it is. Even if I did totally agree that it was as bad a sign as you believe it is, I still don’t see how it warrants that level of pessimism.

You’ve claimed before that your hopelessness when it comes to alignment is based only on a strong intuition, and that you don’t believe you “know” anything that others don’t. I find this claim to be increasingly hard to believe given the near-total confidence in your continuous predictions about our odds.

Maybe asking you if you think you know something others don’t about alignment is the wrong question, so instead I’ll make a (hopefully) different attempt and ask the following: Do you believe that you are grasping something that seems objective to you on an intellectual and/or conceptual level that others (“others” being people doing research that is at least remotely relevant to alignment or knowledgeable people in the EA/LW/Rat-and-rat-adjacent communities who are more optimistic than you) are failing to grasp, and that this keeps them from seeing the “truth” that alignment is so inhumanly difficult? (If so, but you worry that your answer might result in you stepping on someone else’s toes, then feel free to message me your honest answer in DMs. You can ignore this offer if you have no such worries.)

If no, then I find it extremely difficult to sympathize with your efforts elsewhere to sell the idea to people that we are so hopelessly, irrevocably, inevitably doomed that the most “dignified” thing to do would be to spend our remaining lives enjoying the “sunshine” rather than actually trying to do anything about the problem.

Your behavior surprises me further when I realize that this is something even Yudkowsky, one of the most pessimistic people in the community, explicitly advises not to do in his “Death With Dignity” post, which seems to suggest, IMO, that your position is even more doomer-ific. Again, this honestly seems crazy to me when proclamations to that effect are coming from someone who claims that they don’t know anything. (Disclaimer: No, I’m not saying that you need to have some special knowledge/permission to be allowed to disagree with Yudkowsky about things without seeming crazy, nor could I plausibly see myself believing such a thing.)

I’d like to try and dissuade anyone from conceiving any notions that I think that I am privy to any information that makes doom or not-doom look inescapable, nor do I think that I’m grasping something that pessimistic people aren’t also grasping that avails me the “truth”. I don’t know much aside from the more basic arguments, and I’m not especially optimistic or pessimistic about our chances, just admittedly highly uncertain.

A Quick Guide to Confronting Doom

When you say that you know more of Yudkowsky's reasoning and find it compelling, is that meant to imply that he has a more explicit, stronger argument for P(doom) which he hasn't shared elsewhere? Or is the information publicly accessible?

Can someone explain to me why MIRI is so pessimistic of our chances of survival?

“Define "doomed". Assuming Murphy's law, they will eventually fail. Yet some "prosaic" approaches may be on average very helpful.”

I’m defining “doomed” here as not having a chance of actually working in the real world before the world ends. So yes, I mean that they will eventually fail, in one way or another.

“Human values aren't a static thing to be "aligned" with. They can be loved, traded with, etc.”

My understanding is that human values don’t have to be static for alignment to work. Values are constantly changing and vary across the world, but why is it so difficult to align an AI with some version of human values that doesn’t result in everyone dying?

Can someone explain to me why MIRI is so pessimistic of our chances of survival?

Yeah, I was aware of that post before posting this question. I posted it anyway in hopes of drawing in a different range of answers that feel more compelling to me personally.

A Quick Guide to Confronting Doom

Sorry if this is a silly question, but what exactly are “log odds” and what do they mean in this context?
