ShayBenMoshe

I recently finished my PhD in math (specifically, in homotopy theory), and have significant background in software engineering and cybersecurity research. More information can be found on my website.

On excluding dangerous information from training

Introduction: In this short post, I would like to argue that it might be a good idea to exclude certain information – such as cybersecurity and biorisk-enabling knowledge – from frontier model training. I argue that this 1. is feasible, both technically and socially; 2. reduces significant misalignment and misuse...

Nov 17, 2023

LESSWRONG