Language Models
• Applied to Looking beyond Everett in multiversal views of LLMs by kromem 4d ago
• Applied to Testing for parallel reasoning in LLMs by Vanessa Kosoy 14d ago
• Applied to Language Models Model Us by eggsyntax 16d ago
• Applied to If language is for communication, what does that imply about LLMs? by Bill Benzon 21d ago
• Applied to Applying refusal-vector ablation to a Llama 3 70B agent by Vanessa Kosoy 22d ago
• Applied to Navigating LLM embedding spaces using archetype-based directions by mwatkins 25d ago
• Applied to Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant by Olli Järviniemi 1mo ago
• Applied to On precise out-of-context steering by Olli Järviniemi 1mo ago
• Applied to Mechanistically Eliciting Latent Behaviors in Language Models by Vanessa Kosoy 1mo ago
• Applied to LLMs could be as conscious as human emulations, potentially by weightt an 1mo ago
• Applied to An interesting mathematical model of how LLMs work by Bill Benzon 1mo ago
• Applied to LLMs seem (relatively) safe by JustisMills 1mo ago
• Applied to At last! ChatGPT does, shall we say, interesting imitations of “Kubla Khan” by Bill Benzon 1mo ago
• Applied to How LLMs Work, in the Style of The Economist by Rocket 1mo ago
• Applied to What's up with all the non-Mormons? Weirdly specific universalities across LLMs by mwatkins 1mo ago
• Applied to Inducing Unprompted Misalignment in LLMs by Sam Svenningsen 1mo ago
• Applied to An examination of GPT-2's boring yet effective glitch by niplav 1mo ago