Shard Theory

Changed first instance of "RL" to "Reinforcement Learning (RL)" because if I didn't immediately realize what it meant, someone who is learning this for the first time won't think of it either.

2 · Raemon · 4mo
Yeah I think this is good practice.

Shard theory is an alignment research program about the relationship between training variables and learned values in trained Reinforcement Learning (RL) agents. It is thus an approach to progressively fleshing out a mechanistic account of human values, learned values in RL agents, and (to a lesser extent) the learned algorithms in ML generally.