Disposable Identity
Disposable Identity has not written any posts yet.

But 'alignment is tractable when you actually work on it' doesn't imply 'the only reason capabilities outgeneralized alignment in our evolutionary history was that evolution was myopic and therefore not able to do long-term planning aimed at alignment desiderata'.
I am not claiming evolution is 'not able to do long-term planning aimed at alignment desiderata'.
I am claiming it did not even try.
If you're myopically optimizing for two things ('make the agent want to pursue the intended goal' and 'make the agent capable at pursuing the intended goal') and one generalizes vastly better than the other, this points toward a difference between the two myopically-optimized targets.
This looks like a strong steelman of the post, ...
Many comparisons are made with Natural Selection (NS) optimizing for IGF, on the grounds that this is our only example of an optimization process yielding intelligence.
I would suggest considering one very relevant fact: NS has not optimized for alignment, but only for a myopic version of IGF. I would also suggest considering that humans have not optimized for alignment either.
Let's look at some quotes, with those considerations in mind:
And in the same stroke that its capabilities leap forward, its alignment properties are revealed to be shallow, and to fail to generalize. The central analogy here is that optimizing apes for inclusive genetic fitness (IGF) doesn't make the resulting humans optimize mentally for ...
I do not often comment on Less Wrong. (Although I am starting to; this is one of my first comments!)
Hopefully, my thoughts will become clearer as I write more and become better acquainted with the local assumptions and cultural codes.
In the meantime, let me expand: