elspood — LessWrong

535

2

57

15y

Security Mindset - Fire Alarms and Trigger Signatures

Series Overview and Goals: This is the second in a series of articles about applying traditional security mindset to the problems of alignment and AI research in general. As much as possible, we should try to mine the lessons from the history of security and apply them to the alignment...

Feb 9, 2023 • 24

Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment

Background: I have been doing red team, blue team (offensive, defensive) computer security for a living since September 2000. The goal of this post is to compile a list of general principles I've learned during this time that are likely relevant to the field of AGI Alignment. If this is...

Jun 21, 2022 • 370