Epistemic status: untested ideas
Introduction
This post describes a few ideas I have about AI alignment. It is going to be rough, but I think there are a few nuggets worth exploring. The ideas described here apply to any AI trained using a reward function directly designed by humans. For AIs whose reward function is not directly designed by humans, it would still be desirable for some of these concepts to be present, and some (Noise and A knowledge cap) could potentially be added as a final step.
Terms (in the context of this post)
Reward function: A function whose output the AI uses to adjust its model (see the toy sketch after these terms).
Variable: In this context variables are...
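To make the reward-function term concrete, here is a minimal toy sketch. This is my own illustration, not something from the post: the names reward_fn, ToyModel, and update are hypothetical. It shows a human-designed scoring rule whose output the learner uses to adjust a one-parameter model.

```python
import random

def reward_fn(action: str) -> float:
    """Human-designed reward: score 1.0 for the behaviour we want, else 0.0."""
    return 1.0 if action == "desired" else 0.0

class ToyModel:
    """A one-parameter 'model': the probability of taking the desired action."""
    def __init__(self) -> None:
        self.p_desired = 0.5

    def act(self) -> str:
        return "desired" if random.random() < self.p_desired else "other"

    def update(self, action: str, reward: float, lr: float = 0.05) -> None:
        # Adjust the model using the reward signal: nudge the policy toward
        # actions that earned reward (unrewarded actions produce no change).
        direction = 1.0 if action == "desired" else -1.0
        self.p_desired = min(1.0, max(0.0, self.p_desired + lr * reward * direction))

model = ToyModel()
for _ in range(200):
    action = model.act()
    model.update(action, reward_fn(action))
print(f"p(desired) after training: {model.p_desired:.2f}")
```

Everything in the post that talks about designing the reward function concerns reward_fn here; the update step is the part of training that the reward function steers.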