Optimization, loss set at variance in RL
This post suggests a reinforcement learning model with an additional, conflicting dynamic between the optimized data and the loss function. The conflict is intended to reduce RL’s intrinsic Omohundro x-risk dynamic. It is thus also an attempt to offer, though without details, a good idea for AI safety, and so as...
Jul 22, 2023