Is Open Alignment Research Creating an Infohazard?
"The public dissemination of alignment research, methodologies, and discussions about it create a corpus of data, within the global dataset, that forms an "adversarial manual” for misaligned AI systems to follow in order to avoid detection and carry out their own misaligned objectives." This is general premise of my post called The Alignment Paradox: Why Transparency Can Breed Deception.
Beyond the general problem, I'm curious about the community's take on specific mitigation strategies. What coordination mechanisms could allow for necessary research collaboration without feeding this adversarial manual?