Mitigating collusive self-preference by redaction and paraphrasing
tldr: superficial self-preference can be mitigated by perturbation, but can be hard to eliminate

Introduction

Our goal is to understand and mitigate collusion, which we define as an agent’s failure to adhere to its assigned role as a result of interaction with other agents. Collusion is a risk in control,...
Mar 76