Karl von Wendt

German writer of science-fiction novels and children's books (pen name Karl Olsberg). I blog and create videos about AI risks in German at www.ki-risiken.de and youtube.com/karlolsbergautor.

Thank you very much for your input!

Defining intelligence as goal-directedness doesn't do anything for your argument. It just kicks the can down the road. Why will less intelligent (under your definition, goal directed) always lose in competition?

Admittedly, my reply to A was a bit short. I only wanted to point out that intelligence is closely linked to goal-directedness, not that they're the same thing (a heat-seeking missile, for example, is stupid but very goal-directed). A very intelligent system without a goal would just sit around doing nothing. It might have the potential to act intelligently, but without a goal it would behave like an unintelligent system. "Always" may be too strong a word, but if system X is more intelligent than system Y and wants much more strongly to reach a conflicting goal, chances are that system X will get what it wants.

Romance is a canonical example of where you really don't want to be all powerful (if real romance is what you want). Romance could not exist if your romantic partner always predictably did everything you ever wanted. 

I disagree. Being all-powerful does not mean always doing everything you want, or everything your partner wants. It means being able to do whatever you want, or maybe more importantly, whatever you feel you need to do. If, for example, I needed the magic wand to prevent the untimely death of someone I love, I would use it without a second thought.

The whole point is they are a different person, with different wishes, and you have to figure out how to navigate that and its unpredictabilities. That is the "fun" of romance. 

I tend to agree, but I guess there are many people who have been less lucky in their relationships than I have, being happily together with my wife for more than 44 years. :)

So no, I don't think everyone would really use that magic wand.

Maybe not everyone and certainly not all the time, but I'm quite sure that most people would use it at least once in a while.

Thank you for posting this, as I find it helpful for practicing my own skills of argumentation. Here are my brief counterarguments to your counterarguments; I'd appreciate it if anyone could point out any flaws in my logic:

A. Contra "superhuman AI systems will be goal-directed"
As far as I understand it, "intelligence" is the ability to achieve one's goals through reasoning and making plans, so a highly intelligent system is goal-directed by definition. Less goal-directed AIs are certainly possible, but they must necessarily be considered less intelligent - the thermometer example illustrates this. Therefore, a less goal-directed AI will always lose in competition against a more goal-directed one.

B. Contra "goal-directed AI systems' goals will be bad"
The supposed counterexample of artificially generated human faces is in fact a case in point in my opinion. These faces aren't like humans at all. They're not three-dimensional. They're not moving. They don't talk. They don't smell. They're not soft and don't radiate warmth. Oh, we didn't mention that was important, right? We just gave the AI a reward function that enabled it to learn how to generate pictures that look like photographs of real people. If that's what we want, then little differences on the pixel level probably don't matter much. The differences between the paperclips Bostrom's paperclip maximizer makes and a perfect paperclip probably won't matter much, either. To put it another way, these fake humans are only "good" if we lower our expectations to the point where they're already met.

C. Contra “superhuman AI would be sufficiently superior to humans to overpower humanity”
Even if "human success isn't from individual intelligence", this doesn't mean that human intelligence is not the decisive factor making us the dominant species. Individual intelligence is what enables collective intelligence in the first place. I agree that humans shouldn't be seen as a universal benchmark for intelligence, but that only means that the bar for developing an uncontrollable AI may be even lower. It took us humans more than 2,000 years to collectively master Go. It took AlphaGo Zero three days from scratch to beat us. AI may one day be sufficiently good at manipulating and controlling humans to take over the world even without being "superintelligent" in all aspects. It could be far more intelligent in the relevant ways, like AlphaGo Zero compared to a child learning to play Go. I believe there is no upper bound on manipulation skills and other forms of gaining power. So whether intelligence is an overwhelming advantage is probably a matter of scale.

However AI systems have one serious disadvantage as employees of humans: they are intrinsically untrustworthy, while we don’t understand them well enough to be clear on what their values are or how they will behave in any given case. Even if they did perform as well as humans at some task, if humans can’t be certain of that, then there is reason to disprefer using them. 

Really? Look at how we use AI today, e.g. in letting it decide what we see, hear and believe, who gets parole, and who gets a loan. It seems to me that humans already tend to trust AI more than other humans, particularly when they don't understand how it works.

I have some goals. For instance, I want some good romance. My guess is that trying to take over the universe isn’t the best way to achieve this goal. The same goes for a lot of my goals, it seems to me. Possibly I’m in error, but I spend a lot of time pursuing goals, and very little of it trying to take over the universe. 

Imagine you had a magic wand or a genie in a bottle that would fulfill every wish you could dream of. Would you use it? If so, you're incentivized to take over the world, because the only possible way of making every wish come true is absolute power over the universe. The fact that you normally don't try to achieve that may have to do with the realization that you have no chance. If you had, I bet you'd try it. I certainly would, if only so I could stop Putin. But would me being all-powerful be a good thing for the rest of the world? I doubt it.

D. Contra the whole argument
No, AI is not like a corporation run by humans. AI is more like an alien life form. It does not have intrinsic human motives and values. We may be able to tame it or to give it a beneficial goal, but unless we do, if it can, it will transform the world in very weird and probably unforeseen ways. Apart from that, corporations are currently wreaking a lot of havoc on the world (e.g. climate change), which is a good example of how difficult it is to give a powerful entity a beneficial goal.

Yes, that's also true: there is always a lonely hero who in the end puts the AGI back in the box or destroys it. Nothing would be more boring than writing a novel about how in reality the AGI just kills everyone and wins. :( I think both are possible - that people imagine the wrong future and at the same time don't take it seriously.

Thank you very much for sharing this - it is very helpful to me! I agree that academics, in particular within the EU, but probably also everywhere else, are an underutilized and potentially very valuable resource, especially with respect to AI governance. Your post seems to support my own view that we should be talking about "uncontrollable AI" instead of "misaligned AGI/superintelligence", which I have explained here: https://www.lesswrong.com/posts/6JhjHJ2rdiXcSe7tp/let-s-talk-about-uncontrollable-ai

most "normies" find AI scary & would prefer it not be developed, but for whatever reason the argument for a singularity or intelligence explosion in which human-level artificial intelligence is expected to rapidly yield superhuman AGI is unconvincing or silly-seeming to most people outside this bubble, including technical people. I'm not really sure why.

That's what I have experienced as well. I think one reason is that people find it difficult to imagine exponential growth - it's not something our brains are made for. When we think about the future, we intuitively look at the past and project a linear trend we seem to recognize.
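As a toy illustration of this intuition gap (all numbers made up for the example - the doubling time of two years is an arbitrary assumption, not a claim about any real trend), the sketch below compares an exponential trend with the linear trend someone would extrapolate after watching only the first two years:

```python
# Toy sketch: why linear intuition underestimates exponential growth.
# Assumes a capability that doubles every 2 years (purely illustrative).

def exponential(t, doubling_time=2.0, start=1.0):
    """True exponential trend at year t."""
    return start * 2 ** (t / doubling_time)

def linear_projection(t, t0=2.0, doubling_time=2.0, start=1.0):
    """Linear extrapolation of the average growth observed up to year t0."""
    slope = (exponential(t0, doubling_time, start) - start) / t0
    return start + slope * t

for year in (2, 10, 20):
    print(f"year {year:2d}: exponential {exponential(year):8.1f}, "
          f"linear guess {linear_projection(year):6.1f}")
```

After 20 years the exponential curve is roughly two orders of magnitude above the linear guess, even though both fit the first two years perfectly - which is exactly the mismatch between intuitive projection and exponential reality described above.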

I also think that if something is a frequent topic in science fiction books and movies, people see it as less likely to become real, so we SF writers may actually make it more difficult to think clearly about the future, even though sometimes developers are inspired by SF. Most of the time, people realize only in hindsight that some SF scenarios may actually come true.

I think it's amazing how fast we go from "I don't believe that will ever be possible" to "that's just normal". I remember buying my first laptop computer with a color display in the nineties. If someone had told me that not much more than ten years later there would be an iPhone with the computing power of a supercomputer in my pocket, I'd have shaken my head in disbelief.

Thank you for your comments, which I totally agree with.

I don't think that any current AIs are strategically aware of themselves. I guess the closest analogy is an AI playing Atari games: it will see the sprite it controls as an important element of the "world" of the game and will try to protect it from harm. But of course, AIs like MuZero have no concept of themselves as being an AI that plays a game. I think the only examples of agents with strategic awareness that currently exist are we humans ourselves, and maybe some animals.

The above statement appears to assume that dangerous transformative AI has already been created,

Not at all. I'm just saying that if any AI with external access would be considered dangerous, then the same AI without access should be considered dangerous as well.

The dynamite analogy was of course not meant to be a model for AI, I just wanted to point out that even an inert mass that in principle any child could play with without coming to harm is still considered dangerous, because under certain circumstances it will be harmful. Dynamite + fire = damage, dynamite w/o fire = still dangerous.

Your third argument seems to prove my point: An AI that seems aligned in the training environment turns out to be misaligned if applied outside of the training distribution. If that can happen, the AI should be considered dangerous, even if within the training distribution it shows no signs of it.

While writing, track an estimate of the mental state of a future reader - confusion, excitement, eyes glossing over, etc.

This may be true if you write a scientific paper, an essay or a non-fiction book. As a professional writer, when I write a novel, I usually don't think about the reader at all (maybe because, in a way, I am the reader). Instead, I track the mental state of the character I'm writing about. This leads to interesting situations when a character "decides" to do something completely different from what I intended her to do, as if she had a will of her own. I have heard other writers describe the same thing, so it seems to be a common phenomenon. In this situation, I have two options: I can follow the lead of the character (my tracking of her mental state) and change my outline or even ditch it completely, or I can force her to do what the outline says she's supposed to do. The second choice inevitably leads to a bad story, so tracking the mental state of your characters indeed seems to be essential to writing good fiction.

I assume that readers do a similar thing, so if a character in a book does something that doesn't fit the mental model they have in mind, they often find it "unbelievable" or "unrealistic", which is one of the reasons why "listen to your characters" seems to be good advice for writing.
