Language Models

• Applied to Navigating LLM embedding spaces using archetype-based directions by mwatkins 3d ago
• Applied to Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant by Olli Järviniemi 5d ago
• Applied to On precise out-of-context steering by Olli Järviniemi 8d ago
• Applied to Mechanistically Eliciting Latent Behaviors in Language Models by Vanessa Kosoy 10d ago
• Applied to LLMs could be as conscious as human emulations, potentially by weightt an 11d ago
• Applied to An interesting mathematical model of how LLMs work by Bill Benzon 11d ago
• Applied to LLMs seem (relatively) safe by JustisMills 15d ago
• Applied to At last! ChatGPT does, shall we say, interesting imitations of “Kubla Khan” by Bill Benzon 16d ago
• Applied to How LLMs Work, in the Style of The Economist by Rocket 18d ago
• Applied to What's up with all the non-Mormons? Weirdly specific universalities across LLMs by mwatkins 21d ago
• Applied to Inducing Unprompted Misalignment in LLMs by Sam Svenningsen 22d ago
• Applied to An examination of GPT-2's boring yet effective glitch by niplav 23d ago
• Applied to Claude 3 Opus can operate as a Turing machine by Gunnar_Zarncke 24d ago
• Applied to Experiments with an alternative method to promote sparsity in sparse autoencoders by Eoin Farrell 25d ago
• Applied to Claude wants to be conscious by Joe Kwon 1mo ago
• Applied to Barcoding LLM Training Data Subsets. Anyone trying this for interpretability? by right..enough? 1mo ago
• Applied to Is LLM Translation Without Rosetta Stone possible? by cubefox 1mo ago
• Applied to End-to-end hacking with language models by tchauvin 1mo ago
• Applied to Language and Capabilities: Testing LLM Mathematical Abilities Across Languages by Ethan Edwards 1mo ago