Puria Radmard — LessWrong
I’m helping build geodesicresearch.org
Posts
184 karma · Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment · Ω · 1mo · 23 comments
20 karma · Architectures for Increased Externalisation of Reasoning · 2mo · 2 comments
44 karma · Generalisation Hacking: a first look at adversarial generalisation failures in deliberative alignment · Ω · 3mo · 2 comments
33 karma · I Am Large, I Contain Multitudes: Persona Transmission via Contextual Inference in LLMs · 5mo · 0 comments