Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I believe Anthropic has said they won't publish capabilities research?
OpenAI seems to be doing something similar (though I'm not aware of a formal policy).
I heard FHI was developing one a while back...
I think MIRI does as well (defaulting to not publishing, IIRC?)


According to Chris Olah:

We don't consider any research area to be blanket safe to publish. Instead, we consider all releases on a case by case basis, weighing expected safety benefit against capabilities/acceleratory risk. In the case of difficult scenarios, we [Anthropic] have a formal infohazard review procedure.

This policy doesn't seem to be very public, though, unlike aspects of Conjecture's policy.