LESSWRONG

Diogo de Lucena

Chief Scientist at AE Studio

Posts

Mistral Large 2 (123B) seems to exhibit alignment faking (81 karma, 7mo, 4 comments)
Reducing LLM deception at scale with self-other overlap fine-tuning (162 karma, 8mo, 46 comments)
Science advances one funeral at a time (100 karma, 1y, 9 comments)
Self-prediction acts as an emergent regularizer (91 karma, 1y, 9 comments)
The case for a negative alignment tax (77 karma, 1y, 20 comments)
Self-Other Overlap: A Neglected Approach to AI Alignment (227 karma, 1y, 51 comments)
Video Intro to Guaranteed Safe AI (27 karma, 1y, 0 comments)
AE Studio @ SXSW: We need more AI consciousness research (and further resources) (67 karma, 2y, 8 comments)