Honesty

• Applied to Deep Honesty by Aletheophile 1mo ago
• Applied to Glomarization FAQ by Zane 5mo ago
• Applied to Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models by Felix Hofstätter 7mo ago
• Applied to Lying is Cowardice, not Strategy by RobertM 8mo ago
• Applied to Discovering Latent Knowledge in the Human Brain: Part 1 – Clarifying the concepts of belief and knowledge by Joseph Emerson 8mo ago
• Applied to Uncovering Latent Human Wellbeing in LLM Embeddings by ChengCheng 9mo ago
• Applied to Assume Bad Faith by Zack_M_Davis 10mo ago
• Applied to Ground-Truth Label Imbalance Impairs the Performance of Contrast-Consistent Search (and Other Contrast-Pair-Based Unsupervised Methods) by Tom Angsten 10mo ago
• Applied to “Desperate Honesty” by Agnes Callard by David Gross 10mo ago
• Applied to Contrast Pairs Drive the Empirical Performance of Contrast Consistent Search (CCS) by Scott Emmons 1y ago
• Applied to [RFC] Possible ways to expand on "Discovering Latent Knowledge in Language Models Without Supervision". by gekaklam 1y ago
• Applied to How to find cool things in a new place by Sam F. Brown 1y ago
• Applied to "Status" can be corrosive; here's how I handle it by Akash 1y ago
• Applied to Five Reasons to Lie by Dzoldzaya 1y ago
• Applied to Honesty, Openness, Trustworthiness, and Secrets by NormanPerlmutter 1y ago
• Applied to How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme by Collin 2y ago
• Applied to Discussion: Was SBF a naive utilitarian, or a sociopath? by NicholasKross 2y ago