LESSWRONG

649
Diogo de Lucena
662000

Chief Scientist at AE Studio

Posts

Mistral Large 2 (123B) seems to exhibit alignment faking (Ω; karma 81; 6mo; 4 comments)
Reducing LLM deception at scale with self-other overlap fine-tuning (Ω; karma 162; 6mo; 46 comments)
Science advances one funeral at a time (karma 100; 11mo; 9 comments)
Self-prediction acts as an emergent regularizer (Ω; karma 91; 11mo; 9 comments)
The case for a negative alignment tax (karma 77; 1y; 20 comments)
Self-Other Overlap: A Neglected Approach to AI Alignment (Ω; karma 226; 1y; 51 comments)
Video Intro to Guaranteed Safe AI (karma 27; 1y; 0 comments)
AE Studio @ SXSW: We need more AI consciousness research (and further resources) (karma 67; 1y; 8 comments)