[Before I begin: if you don't like this post, please let me know why. Even just a few words like boring/off-topic/poorly-written may give me something to work with.] I wish to share a struggle I have been having lately with rationality, because I think it touches some interesting points. But more importantly...
One feature of natural language that seems to be essential is the (blurry) distinction between three layers: 1. Semantics of basic units. 2. Syntactic rules for how to combine the meanings of basic units. 3. Pragmatic considerations that modify the literal meaning by taking into account things like context, shared...
There is a line of alignment-related thinking that looks for ways in which agents will tend to be similar. An early example is convergent instrumental goals; a later one is natural abstractions. Those two ideas share an important attribute - trying to think of something "mental" (values, abstractions) as at least...
I've been thinking about AI corrigibility lately and have come up with a potential solution that has probably been refuted already, but I'm not aware of a refutation. The solution I'm proposing is to condition both the actor and the critic on a goal-representing vector g, change it multiple times during...
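To make the conditioning concrete, here is a minimal numpy sketch of what "condition both the actor and the critic on a goal-representing vector g" could look like. This is my own illustration, not the post's actual setup: I assume the conditioning is done by concatenating g to the network inputs (one common choice), use toy linear "networks", and only sketch where the goal gets resampled; the dimensions and names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for the sketch.
STATE_DIM, GOAL_DIM, ACTION_DIM = 4, 3, 2

# Toy linear actor and critic; real ones would be neural networks.
# Both take the goal vector g as part of their input (assumption:
# conditioning-by-concatenation, since the post doesn't specify).
W_actor = rng.normal(size=(ACTION_DIM, STATE_DIM + GOAL_DIM)) * 0.1
W_critic = rng.normal(size=(STATE_DIM + GOAL_DIM + ACTION_DIM,)) * 0.1

def actor(state, g):
    # Policy output depends on both the state and the current goal.
    return W_actor @ np.concatenate([state, g])

def critic(state, action, g):
    # Scalar value estimate, also goal-conditioned.
    return W_critic @ np.concatenate([state, g, action])

# The goal g is resampled repeatedly during training, so the policy
# must stay responsive to whatever goal it is currently handed
# rather than internalizing any single fixed objective.
for episode in range(5):
    g = rng.normal(size=GOAL_DIM)       # new goal this episode
    state = rng.normal(size=STATE_DIM)
    action = actor(state, g)
    q = critic(state, action, g)
    # ... gradient updates on W_actor / W_critic would go here ...
```

The point of the sketch is only the data flow: g enters every forward pass, and nothing in the weights is tied to one particular g.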
[Edit: after reading the comments and thinking more about in-context learning, I don't endorse most of what's written here. I explain why at the end of the post.] I notice a common confusion when people talk about deep learning. Before I try to describe it in general, let’s start with...
A Preface to Intro to Humans I used to understand humans. We were God's avatars on earth, sent to witness His Glory and enjoy His Grace - maybe up to a measurement error. Then God died, leaving some huge holes in my worldview. To address one of these holes I...