LESSWRONG
Joseph Banks's Shortform

by Joseph Banks
7th Oct 2025
This is a special post for quick takes by Joseph Banks. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.
Joseph Banks

Is Open Alignment Research Creating an Infohazard?

"The public dissemination of alignment research, methodologies, and discussions about it create a corpus of data, within the global dataset, that forms an "adversarial manual” for misaligned AI systems to follow in order to avoid detection and carry out their own misaligned objectives." This is general premise of my post called The Alignment Paradox: Why Transparency Can Breed Deception. 

Beyond the general problem, I'm curious about the community's take on specific mitigation strategies. What coordination mechanisms could allow for necessary research collaboration without feeding this adversarial manual?

jbash

It's less about "collaboration" and more about informal cross-fertilization as ideas diffuse around via random paths.

You don't want to do first-order harm to mitigate a second-order concern.
