LESSWRONG Tags

AI

• Applied to Do Not Mess With Scarlett Johansson by TagWrong 1h ago
• Applied to Cicadas, Anthropic, and the bilateral alignment problem by TagWrong 5h ago
• Applied to Announcing Human-aligned AI Summer School by TagWrong 8h ago
• Applied to Each Llama3-8b text uses a different "random" subspace of the activation space by TagWrong 9h ago
• Applied to ARIA's Safeguarded AI grant program is accepting applications for Technical Area 1.1 until May 28th by TagWrong 10h ago
• Applied to Anthropic announces interpretability advances. How much does this advance alignment? by TagWrong 18h ago
• Applied to EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024 by TagWrong 20h ago
• Applied to Mitigating extreme AI risks amid rapid progress [Linkpost] by TagWrong 21h ago
• Applied to On Dwarkesh’s Podcast with OpenAI’s John Schulman by TagWrong 1d ago
• Applied to Is deleting capabilities still a relevant research question? by TagWrong 1d ago
• Applied to New voluntary commitments (AI Seoul Summit) by TagWrong 1d ago
• Applied to The Problem With the Word ‘Alignment’ by particlemania 2d ago
• Applied to What's Going on With OpenAI's Messaging? by TagWrong 2d ago
• Applied to Harmony Intelligence is Hiring! by TagWrong 2d ago
• Applied to [Linkpost] Statement from Scarlett Johansson on OpenAI's use of the "Sky" voice, that was shockingly similar to her own voice. by TagWrong 2d ago
• Applied to Are there any groupchats for people working on Representation reading/control, activation steering type experiments? by TagWrong 2d ago