
Jeremy Vonderfecht




Alignment will happen by default. What’s next?

Jeremy Vonderfecht · 7mo

Gotcha, that is indeed meaningfully different. I still don't agree with this alignment-by-induction story. I want to say that the standard Yudkowskian model is watertight enough that your story must implicitly violate one of its assumptions, but I am still trying to pinpoint which assumption that is.

Edit: also, this whole alignment-by-induction idea then looks like a big, load-bearing implicit assumption in this post.


Alignment will happen by default. What’s next?

Jeremy Vonderfecht · 7mo

Why would they suddenly start having thoughts of taking over, if they never have yet, even if it is in the training data?

This is the crux of the disagreement with the Bostrom/Yudkowsky crowd. Your syllogism seems to be:

1. AGI will be an LLM.

2. Current LLMs don't exhibit power-seeking tendencies.

3. The current training paradigm seems unlikely to instill power-seeking.

4. Therefore, AGI won't be power-seeking.

I basically agree with this until step 4. Where I (and I think I can speak for the Bostrom/Yud crowd on this) diverge is that all of this...
