Diogo de Lucena — LessWrong
Chief Scientist at AE Studio
Posts
81
Mistral Large 2 (123B) seems to exhibit alignment faking
Ω
9mo
Ω
4
162
Reducing LLM deception at scale with self-other overlap fine-tuning
Ω
9mo
Ω
46
100
Science advances one funeral at a time
1y
9
92
Self-prediction acts as an emergent regularizer
Ω
1y
Ω
9
79
The case for a negative alignment tax
1y
20
Review
238
Self-Other Overlap: A Neglected Approach to AI Alignment
Ω
1y
Ω
53
Review
27
Video Intro to Guaranteed Safe AI
1y
0
67
AE Studio @ SXSW: We need more AI consciousness research (and further resources)
2y
8