Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
We don't consider any research area to be blanket safe to publish. Instead, we consider all releases on a case by case basis, weighing expected safety benefit against capabilities/acceleratory risk. In the case of difficult scenarios, we [Anthropic] have a formal infohazard review procedure.
That review procedure doesn't seem to be very public, though, unlike aspects of Conjecture's policy.
I believe Anthropic has said they won't publish capabilities research?
OpenAI seems to be doing something similar (although they have no stated policy, AFAIK).
I heard FHI was developing one way back when...
I think MIRI does something similar as well (they default to not publishing, IIRC?).