2873
Diogo de Lucena — LessWrong
Chief Scientist at AE Studio
Posts
Mistral Large 2 (123B) seems to exhibit alignment faking [Ω], 81 karma, 7mo, 4 comments
Reducing LLM deception at scale with self-other overlap fine-tuning [Ω], 162 karma, 8mo, 46 comments
Science advances one funeral at a time, 100 karma, 1y, 9 comments
Self-prediction acts as an emergent regularizer [Ω], 91 karma, 1y, 9 comments
The case for a negative alignment tax, 77 karma, 1y, 20 comments
Self-Other Overlap: A Neglected Approach to AI Alignment [Ω], 226 karma, 1y, 51 comments
Video Intro to Guaranteed Safe AI, 27 karma, 1y, 0 comments
AE Studio @ SXSW: We need more AI consciousness research (and further resources), 67 karma, 2y, 8 comments