649
Diogo de Lucena
Chief Scientist at AE Studio
Posts
Mistral Large 2 (123B) seems to exhibit alignment faking (Ω, 6mo): 81 karma, 4 comments
Reducing LLM deception at scale with self-other overlap fine-tuning (Ω, 6mo): 162 karma, 46 comments
Science advances one funeral at a time (11mo): 100 karma, 9 comments
Self-prediction acts as an emergent regularizer (Ω, 11mo): 91 karma, 9 comments
The case for a negative alignment tax (1y): 77 karma, 20 comments
Self-Other Overlap: A Neglected Approach to AI Alignment (Ω, 1y): 226 karma, 51 comments
Video Intro to Guaranteed Safe AI (1y): 27 karma, 0 comments
AE Studio @ SXSW: We need more AI consciousness research (and further resources) (1y): 67 karma, 8 comments