Language Models

• Applied to Looking beyond Everett in multiversal views of LLMs by kromem 4d ago
• Applied to Testing for parallel reasoning in LLMs by Vanessa Kosoy 14d ago
• Applied to Language Models Model Us by eggsyntax 16d ago
• Applied to If language is for communication, what does that imply about LLMs? by Bill Benzon 21d ago
• Applied to Applying refusal-vector ablation to a Llama 3 70B agent by Vanessa Kosoy 22d ago
• Applied to Navigating LLM embedding spaces using archetype-based directions by mwatkins 25d ago
• Applied to Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant by Olli Järviniemi 1mo ago
• Applied to On precise out-of-context steering by Olli Järviniemi 1mo ago
• Applied to Mechanistically Eliciting Latent Behaviors in Language Models by Vanessa Kosoy 1mo ago
• Applied to LLMs could be as conscious as human emulations, potentially by weightt an 1mo ago
• Applied to An interesting mathematical model of how LLMs work by Bill Benzon 1mo ago
• Applied to LLMs seem (relatively) safe by JustisMills 1mo ago
• Applied to At last! ChatGPT does, shall we say, interesting imitations of “Kubla Khan” by Bill Benzon 1mo ago
• Applied to How LLMs Work, in the Style of The Economist by Rocket 1mo ago
• Applied to What's up with all the non-Mormons? Weirdly specific universalities across LLMs by mwatkins 1mo ago
• Applied to Inducing Unprompted Misalignment in LLMs by Sam Svenningsen 1mo ago
• Applied to An examination of GPT-2's boring yet effective glitch by niplav 1mo ago