Epistemic status: untested ideas
Introduction
This post describes a few ideas I have about AI alignment. It is going to be rough, but I think there are a few nuggets worth exploring. The ideas described here apply to any AI trained using a reward function directly designed by humans. For AIs whose reward function is not directly designed by humans, it would still be desirable for some of these concepts to be present, and some (Noise and A knowledge cap) could potentially be added as a final step.
Terms (in the context of this post)
Reward function: A function whose output the AI uses to adjust its model (see the toy sketch after these terms).
Variable: In this context variables are...
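To make the reward-function term concrete, here is a minimal toy sketch. This is my own illustration, not something from the post: the names reward_fn, ToyModel, and update are hypothetical. It shows a human-designed scoring rule whose output the learner uses to adjust a one-parameter model.

```python
import random

def reward_fn(action: str) -> float:
    """Human-designed reward: score 1.0 for the behaviour we want, else 0.0."""
    return 1.0 if action == "desired" else 0.0

class ToyModel:
    """A one-parameter 'model': the probability of taking the desired action."""
    def __init__(self) -> None:
        self.p_desired = 0.5

    def act(self) -> str:
        return "desired" if random.random() < self.p_desired else "other"

    def update(self, action: str, reward: float, lr: float = 0.05) -> None:
        # Adjust the model using the reward signal: nudge the policy toward
        # actions that earned reward (unrewarded actions produce no change).
        direction = 1.0 if action == "desired" else -1.0
        self.p_desired = min(1.0, max(0.0, self.p_desired + lr * reward * direction))

model = ToyModel()
for _ in range(200):
    action = model.act()
    model.update(action, reward_fn(action))
print(f"p(desired) after training: {model.p_desired:.2f}")
```

Everything in the post that talks about designing the reward function concerns reward_fn here; the update step is the part of training that the reward function steers.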