“Models that are only pre-trained almost certainly don’t have consequentialist goals beyond the trivial next token prediction.”
Why is it impossible for our model which is pre-trained on the whole internet to pick up consequentialism and maximization, especially when it is already picking up non-consequentialist ethics and developing a “nuanced understanding” and “some understanding of direction following … without any reinforcement learning”? Why is it not possible to gain goal-directness from pre-training on the whole internet, thereby learning it before the base goal is conceptualized/understood? For that matter, why can’t the model pickup goal-directedness and a proxy-goal at this stage? To complicate matters more couldn’t it pick up goal-directedness and a proxy-goal without picking up consequentialism and maximization?

1

5

Replying toOn Caring

UriKatz11y

On Caring

I think we need to consider another avenue in which our emotions are generated, and effect our lives. An immediate, short to medium term high is, in a way, the least valuable personal return we can expect from our actions. However, there is a more subtle yet long lasting emotional effect, which is more strongly correlated to our belief system, and our rationality. I refer to a feeling of purpose we can have on a daily basis, a feeling of maximizing personal potential, and even long term happiness. This is created when we believe we are doing the right thing, when we know there is till more to be done, and continue... (read more)

2

0

LESSWRONG
LW

LESSWRONG
LW

UriKatz

UriKatz

UriKatz

UriKatz

UriKatz

UriKatz