Reply to: Meta-Honesty: Firming Up Honesty Around Its Edge-Cases

Eliezer Yudkowsky, listing advantages of a "wizard's oath" ethical code of "Don't say things that are literally false", writes—

Repeatedly asking yourself of every sentence you say aloud to another person, "Is this statement actually and literally true?", helps you build a skill for navigating out of your internal smog of not-quite-truths.

I mean, that's one hypothesis about the psychological effects of adopting the wizard's code.

A potential problem with this is that human natural language contains a lot of ambiguity. Words can be used in many ways depending on context. Even the specification "literally" in "literally false" is less useful than it initially appears: the way people ordinarily speak when they're being truthful is actually pretty dense with metaphors that we typically don't notice as metaphors, because they're common enough to be recognized as legitimate uses that all fluent speakers will understand.

For example, if I want to convey the meaning that our study group has covered a lot of material in today's session, and I say, "Look how far we've come today!" it would be pretty weird if you were to object, "Liar! We've been in this room the whole time and haven't physically moved at all!" because in this case, it really is obvious to all ordinary English speakers that that's not what I meant by "how far we've come."

Other times, the "intended"[1] interpretation of a statement is not only not obvious, but speakers can even mislead by motivatedly equivocating between different definitions of words: the immortal Scott Alexander has written a lot about this phenomenon under the labels "motte-and-bailey doctrine" (as coined by Nicholas Shackel) and "the noncentral fallacy".

For example, Zvi Mowshowitz has written about how the claim that "everybody knows" something[2] is often used to establish fictitious social proof, or silence those attempting to tell the thing to people who really don't know, but it feels weird (to my intuition, at least) to call it a "lie", because the speaker can just say, "Okay, you're right that not literally[3] everyone knows; I meant that most people know but was using a common hyperbolic turn-of-phrase and I reasonably expected you to figure that out."

So the question "Is this statement actually and literally true?" is itself potentially ambiguous. It could mean either—

  • "Is this statement actually and literally true as the audience will interpret it?"; or,
  • "Does this statement permit an interpretation under which it is actually and literally true?"

But while the former is complicated and hard to establish, the latter is ... not necessarily that strict of a constraint in most circumstances?

Think about it. When's the last time you needed to consciously tell a bald-faced, unambiguous lie?—something that could realistically be outright proven false in front of your peers, rather than dismissed with a "reasonable" amount of language-lawyering. (Whether "Fine" is a lie in response to "How are you?" depends on exactly what "Fine" is understood to mean in this context. "Being acceptable, adequate, passable, or satisfactory"—to what standard?)

Maybe I'm unusually honest—or possibly unusually bad at remembering when I've lied!?—but I'm not sure I even remember the last time I told an outright unambiguous lie. The kind of situation where I would need to do that just doesn't come up that often.

Now ask yourself how often your speech has been partially optimized for any function other than providing listeners with information that will help them better anticipate their experiences. The answer is, "Every time you open your mouth"[4]—and if you disagree, then you're lying. (Even if you only say true things, you're more likely to pick true things that make you look good, rather than your most embarrassing secrets. That's optimization.)

In the study of AI alignment, it's a truism that failures of alignment can't be fixed by deontological "patches". If your AI is exhibiting weird and extreme behavior (with respect to what you really wanted, if not what you actually programmed), then adding a penalty term to exclude that specific behavior will just result in the AI executing the "nearest unblocked" strategy, which will probably also be undesirable: if you prevent your happiness-maximizing AI from administering heroin to humans, it'll start administering cocaine; if you hardcode a list of banned happiness-producing drugs, it'll start researching new drugs, or just pay humans to take heroin, &c.
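The patching dynamic can be sketched as a toy program. This is only an illustration under made-up assumptions: the action names and the numbers in `PROXY_SCORES` are invented for the sketch, and the helper names `best_allowed_action` and `patch_history` are hypothetical, not taken from any alignment library.

```python
# Toy illustration of the "nearest unblocked strategy" problem.
# All action names and scores below are invented for this sketch.

# Hypothetical proxy ("measured happiness") that the optimizer maximizes.
PROXY_SCORES = {
    "administer heroin": 100,
    "administer cocaine": 95,
    "research a new euphoric drug": 90,
    "pay humans to take heroin": 85,
    "help humans build good lives": 40,
}


def best_allowed_action(proxy_scores, banned):
    """Return the highest-scoring action that is not on the banned list."""
    allowed = {a: s for a, s in proxy_scores.items() if a not in banned}
    return max(allowed, key=allowed.get)


def patch_history(proxy_scores, rounds):
    """Repeatedly ban whatever the optimizer just chose and record its next choice."""
    banned = set()
    chosen = []
    for _ in range(rounds):
        action = best_allowed_action(proxy_scores, banned)
        chosen.append(action)
        banned.add(action)  # the "patch": forbid only this specific behavior
    return chosen


if __name__ == "__main__":
    for step, action in enumerate(patch_history(PROXY_SCORES, 4), start=1):
        print(f"after {step - 1} patches the optimizer picks: {action}")
```

Each patch removes exactly one behavior and leaves the objective untouched, so the optimizer just moves to the next-highest-scoring action. In this toy, the outcome we actually wanted only gets chosen after every higher-scoring alternative has been individually banned.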

Humans are also intelligent agents. (Um, sort of.) If you don't genuinely have the intent to inform your audience, but consider yourself ethically bound to be honest, but your conception of honesty is simply "not lying", you'll naturally gravitate towards the nearest unblocked cognitive algorithm of deception.[5]

So another hypothesis about the psychological effects of adopting the wizard's code is that—however noble your initial conscious intent was—in the face of sufficiently strong incentives to deceive, you just end up accidentally training yourself to get really good at misleading people with a variety of not-technically-lying rhetorical tactics (motte-and-baileys, false implicatures, stonewalling, selective reporting, clever rationalized arguments, gerrymandered category boundaries, &c.), all the while congratulating yourself on how "honest" you are for never, ever emitting any "literally" "false" individual sentences.
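To make one of these tactics concrete, here is a minimal sketch of selective reporting, under invented assumptions: the test results are made up, and `selective_report` and `naive_estimate` are hypothetical helper names. Every reported item is a true observation, yet a listener who trusts the sample ends up with a badly wrong estimate.

```python
# Toy illustration of selective reporting: no individual report is false,
# but the listener's picture of the world is still distorted.
# The data and function names are invented for this sketch.

def selective_report(observations, favored):
    """Report only those true observations that support the favored conclusion."""
    return [obs for obs in observations if obs == favored]


def naive_estimate(reports, outcome):
    """Fraction of reports showing `outcome`, as computed by a listener
    who assumes the reports are a representative sample."""
    return sum(1 for r in reports if r == outcome) / len(reports)


# Hypothetical lab results: nine passes and one failure.
observations = ["pass"] * 9 + ["fail"]

honest_failure_rate = naive_estimate(observations, "fail")
reported = selective_report(observations, "fail")
spun_failure_rate = naive_estimate(reported, "fail")

if __name__ == "__main__":
    print("failure rate from all observations:", honest_failure_rate)
    print("failure rate from selected reports:", spun_failure_rate)
```

A sentence-by-sentence "is this literally true?" check passes every element of `reported`. The distortion lives in the selection step, which that check never examines.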

Ayn Rand's novel Atlas Shrugged[6] portrays a world of crony capitalism in which politicians and businessmen claiming to act for the "common good" (and not consciously lying) are actually using force and fraud to temporarily enrich themselves while destroying the credit-assignment mechanisms Society needs to coordinate production.[7]

In one scene, Eddie Willers (right-hand man to our railroad executive heroine Dagny Taggart) expresses horror that the government's official scientific authority, the State Science Institute, has issued a hit piece denouncing the new alloy, Rearden Metal, which our protagonists have been planning to use to build a critical railroad line. (In actuality, we later find out, the Institute leaders want to spare themselves the embarrassment, and therefore the potential loss of legislative funding, of the innovative new alloy having been invented by private industry rather than the Institute's own metallurgy department.)

"The State Science Institute," he said quietly, when they were alone in her office, "has issued a statement warning people against the use of Rearden Metal." He added, "It was on the radio. It's in the afternoon papers."

"What did they say?"

"Dagny, they didn't say it! ... They haven't really said it, yet it's there—and it—isn't. That's what's monstrous about it."

[...] He pointed to the newspaper he had left on her desk. "They haven't said that Rearden Metal is bad. They haven't said it's unsafe. What they've done is ..." His hands spread and dropped in a gesture of futility.

She saw at a glance what they had done. She saw the sentences: "It may be possible that after a period of heavy usage, a sudden fissure may appear, though the length of this period cannot be predicted. ... The possibility of a molecular reaction, at present unknown, cannot be entirely discounted. ... Although the tensile strength of the metal is obviously demonstrable, certain questions in regard to its behavior under unusual stress are not to be ruled out. ... Although there is no evidence to support the contention that the use of the metal should be prohibited, a further study of its properties would be of value."

"We can't fight it. It can't be answered," Eddie was saying slowly. "We can't demand a retraction. We can't show them our tests or prove anything. They've said nothing. They haven't said a thing that could be refuted and embarrass them professionally. It's the job of a coward. You'd expect it from some con-man or blackmailer. But, Dagny! It's the State Science Institute!"

I think Eddie is right to feel horrified and betrayed here. At the same time, it's notable that with respect to the wizard's code, no lying has taken place.

I like to imagine the statement having been drafted by an idealistic young scientist in the moral maze of Dr. Floyd Ferris's office at the State Science Institute. Our scientist knows that his boss, Dr. Ferris, expects a statement that will make Rearden Metal look bad; the negative consequences to the scientist's career for failing to produce such a statement will be severe. (Dr. Ferris didn't say that, but he didn't have to.) But the lab results on Rearden Metal came back with flying colors—by every available test, the alloy is superior to steel along every dimension.

Pity the dilemma of our poor scientist! On the one hand, scientific integrity. On the other hand, the incentives.

He decides to follow a rule that he thinks will preserve his "inner agreement with truth which allows ready recognition": after every sentence he types into his report, he will ask himself, "Is this statement actually and literally true?" For that is his mastery.

Thus, his writing process goes like this—

"It may be possible that after a period of heavy usage, a sudden fissure may appear." Is this statement actually and literally true? Yes! It may be possible!

"The possibility of a molecular reaction, at present unknown, cannot be entirely discounted." Is this statement actually and literally true? Yes! The possibility of a molecular reaction, at present unknown, cannot be entirely discounted. Okay, so there's not enough evidence to single out that possibility as worth paying attention to. But there's still a chance, right?

"Although the tensile strength of the metal is obviously demonstrable, certain questions in regard to its behavior under unusual stress are not to be ruled out." Is this statement actually and literally true? Yes! The lab tests demonstrated the metal's unprecedented tensile strength. But certain questions in regard to its behavior under unusual stress are not to be ruled out—the probability isn't zero.

And so on. You see the problem. Perhaps a member of the general public who knew about the corruption at the State Science Institute could read the report and infer the existence of hidden evidence: "Wow, even when trying their hardest to trash Rearden Metal, this is the worst they could come up with? Rearden Metal must be pretty great!"

But they won't. An institution that claims to be dedicated to "science" is asking for a very high level of trust, and in the absence of a trustworthy auditor, they might get it. Science is complicated enough, and natural language is ambiguous enough, that that kind of trust can be betrayed without lying.

I want to emphasize that I'm not saying the report-drafting scientist in the scenario I've been discussing is a "bad person." (As it is written, almost no one is evil; almost everything is broken.) Under more favorable conditions—in a world where metallurgists had the academic freedom to speak the truth as they see it (even if their voice trembles) without being threatened with ostracism and starvation—the sort of person who finds the wizard's oath appealing, wouldn't even be tempted to engage in these kinds of not-technically-lying shenanigans. But the point of the wizard's oath is to constrain you, to have a simple bright-line rule to force you to be truthful, even when other people are making that genuinely difficult. Yudkowsky's meta-honesty proposal is a clever attempt to strengthen the foundations of this ethic by formulating a more complicated theory that can account for the edge-cases under which even unusually honest people typically agree that lying is okay, usually due to extraordinary coercion by an adversary, as with the proverbial murderer or Gestapo officer at the door.

And yet it's precisely in adversarial situations that the wizard's oath is most constraining (and thus, arguably, most useful). You probably don't need special ethical inhibitions to tell the truth to your friends, because you should expect to benefit from friendly agents having more accurate beliefs.

But an enemy who wants to use information to hurt you is more constrained if the worst they can do is selectively report harmful-to-you true things, rather than just making things up. By symmetry, if you want to use information to hurt an enemy, you are more constrained if the worst you can do is selectively report harmful-to-the-enemy true things, rather than just making things up.

Thus, while the study of how to minimize information transfer to an adversary under the constraint of not lying is certainly interesting, I argue that this "firming up" is of limited practical utility given the ubiquity of other kinds of deception. A theory of under what conditions conscious explicit unambiguous outright lies are acceptable doesn't help very much with combating intellectual dishonesty—and I fear that intellectual dishonesty, plus sufficient intelligence, is enough to destroy the world all on its own, without the help of conscious explicit unambiguous outright lies.

Unfortunately, I do not, at present, have a superior alternative ethical theory of honesty to offer. I don't know how to unravel the web of deceit, rationalization, excuses, disinformation, bad faith, fake news, phoniness, gaslighting, and fraud that threatens to consume us all. But one thing I'm pretty sure won't help much is clever logic puzzles about implausibly sophisticated Nazis.

(Thanks to Michael Vassar for feedback on an earlier draft.)

  1. I'm scare-quoting "intended" because this process isn't necessarily conscious, and probably usually isn't. Internal distortions of reality in imperfectly deceptive social organisms can be adaptive for the function of deceiving conspecifics. ↩︎

  2. If I had written this post, I would have titled it "Fake Common Knowledge" (following in the tradition of "Fake Explanations", "Fake Optimization Criteria", "Fake Causality", &c.) ↩︎

  3. But it's worth noting that the "Is this statement actually and literally true?" test, taken literally, should have caught this, even if my intuition still doesn't want to call it a "lie." ↩︎

  4. Actually, that's not literally true! You often open your mouth to breathe or eat without saying anything at all! Is the referent of this footnote then a blatant lie on my part?—or can I expect you to know what I meant? ↩︎

  5. A similar phenomenon may occur with other attempts at ethical bindings: for example, confidentiality promises. Suppose Open Opal tends to wear her heart on her sleeve and more specifically, believes in lies of omission: if she's talking with someone she trusts, and she has information relevant to that conversation, she finds it incredibly psychologically painful to pretend not to know that information. If Paranoid Paris has much stronger privacy intuitions than Opal and wants to message her about a sensitive subject, Paris might demand a promise of secrecy from Opal ("Don't share the content of this conversation")—only to spark conflict later when Opal construes the literal text of the promise more narrowly than Paris might have hoped ("'Don't share the content' means don't share the verbatim text, right? I'm still allowed to paraphrase things Paris said and attribute them to an anonymous correspondent when I think that's relevant to whatever conversation I'm in, even though that hypothetically leaks entropy if Paris has implausibly determined enemies, right?"). ↩︎

  6. I know, fictional evidence, but I claim that the kind of deception illustrated in the quoted passage to follow is entirely realistic. ↩︎

  7. Okay, that's probably not exactly how Rand or her acolytes would put it, but that's how I'm interpreting it. ↩︎

43 comments

It seems to me like 'intent to inform' is worth thinking about in the context of its siblings: 'intent to misinform' and 'intent to conceal.' Cousins, like 'intent to aggrandize' or 'intent to seduce' or so on, I'll leave to another time, tho you're right to point out they're almost always present, if only by being replaced by their opposite reaction (like self-deprecation, to be sure of avoiding self-aggrandizement).

Quakers were long renowned for following four virtues: peace, equality, simplicity, and truth. Unlike wizards, they have the benefit of being real, and so we can get more out of their experience of having to actually implement those virtues in a sometimes hostile world that pushes for compromises. So I pulled out my copy of Some Fruits of Solitude by William Penn, and the sections on Truth and Secrecy are short enough to quote in full (including Justice, which is in the middle):


144. When you speak, be sure to speak the truth, for misleading is halfway to lying, and lying is the whole way to Hell.


145. Don't believe anything against another unless you have good grounds. And don't share anything that might hurt another, unless it could cause greater hurt to others to keep it secret.


146. It is wise not to try to find out a secret, and honest not to reveal one.

147. Only trust yourself, and no one will betray you.

148. Excessive openness has the mischief of treachery, though not the malice.

One of the bits that fascinated me when I first read it was 146, which pretty firmly separates 'intent to inform' from 'honesty'; there is such a thing as ownership of information, and honesty doesn't involve giving up on that, or giving up on having secrets yourself.

What's also interesting to me is that several of them embody bits of information about the social context. 145, for example, is good advice in general, but especially important if you have a reputation for telling the truth; then you become a target for rumor-starters, as people would take seriously stories you repeat even if they wouldn't take them seriously from the original source. It also covers situations like Viliam's, drawing a line that determines which negative beliefs should be broadcast. And in 146 again, being able to respond to questions about secrets with "I'd rather not say" relies on a social context where people think it wise to not press for further details (because otherwise you encourage a lie, instead of a straightforward "please direct your attention elsewhere.").

This is personally quite helpful, thanks for posting it.

I recommend the whole book; it's quite short. (Tho looking at the linked copy, I see it's the original form, with all the archaisms; the physical one I have is edited into modern English by Eric K. Taylor.)

But I'm not trying to conceal information. I just want to ensure that the information is only engaged with in the context that doesn't lead people to draw the obviously wrong conclusions.

That feels like straightforwardly part of 'intent to inform', where if you expect someone to misunderstand you if you say X, then you don't say X. (And it seems like 148 applies.)

It's a form of writing the bottom line first and grants wide enough latitude to justify maintaining information asymmetries for personal benefit. It's even true a decent amount of the time, which is what makes it so dangerous.

At the risk of being self-aggrandizing, I think the idea of axiology vs. morality vs. law is helpful here.

"Don't be misleading" is an axiological commandment - it's about how to make the world a better place, and what you should hypothetically be aiming for absent other considerations.

"Don't tell lies" is a moral commandment. It's about how to implement a pale shadow of the axiological commandment on a system run by duty and reputation, where you have to contend with stupid people, exploitative people, etc.

(so for example, I agree with you that the Rearden Metal paragraph is misleading and bad. But it sounds a lot like the speech I give patients who ask for the newest experimental medication. "It passed a few small FDA trials without any catastrophic side effects, but it's pretty common that this happens and then people discover dangerous problems in the first year or two of postmarketing surveillance. So unless there's some strong reason to think the new drug is better, it's better to stick with the old one that's been used for decades and is proven safe." I know and you know that there's a subtle difference here and the Institute is being bad while I'm being good, but any system that tries to implement reputation loss for the Institute at scale, implemented on a mob of dumb people, is pretty likely to hurt me also. So morality sticks to bright-line cases, at the expense of not being able to capture the full axiological imperative.)

This is part of what you mean when you say the report-drafting scientist is "not a bad person" - they've followed the letter of the moral law as best they can in a situation where there are lots of other considerations, and where they're an ordinary person as opposed to a saint laser-focused on doing the right thing at any cost. This is the situation that morality (as opposed to axiology) is designed for, your judgment ("I guess they're not a bad person") is the judgment that morality encourages you to give, and this shows the system working as designed, ie meeting its own low standards.

And then the legal commandment is merely "don't outright lie under oath or during formal police interrogations" - which (impressively) is probably *still* too strong, in that we all hear about the police being able to imprison basically whoever they want by noticing small lies committed by accident or under stress.

The "wizard's oath" feels like an attempt to subject one's self to a stricter moral law than usual, while still falling far short of the demands of axiology.

(Thanks for your patience.)

This is part of what you mean when you say the report-drafting scientist is "not a bad person"—they've followed the letter of the moral law as best they can [...] your judgment ("I guess they're not a bad person") is the judgment that morality encourages you to give

So, from my perspective as an author (which, you know, could be wrong), that line was mostly a strategic political concession: there's this persistent problem where when you try to talk about harms from people being complicit with systems of deception (not even to do anything about it, but just to talk about the problem), the discussion immediately gets derailed on, "What?! Are you saying I'm a bad person!? How dare you!" ... which is a much less interesting topic.

The first line of defense against this kind of derailing is to be very clear about what is being claimed (which is just good intellectual practice that you should be doing anyway): "By systems of deception, I mean processes that systematically result in less accurate beliefs—the English word 'deception' is often used with moralizing connotations, but I'm talking about a technical concept that I can implement as literal executable Python programs. Similarly, while I don't yet have an elegant reduction of the underlying game theory corresponding to the word 'complicity' ..."

The second line of defense is to throw the potential-derailer a bone in the form of an exculpatory disclaimer: "I'm not trying to blame anyone, I'm just saying that ..." Even if (all other things being equal) you would prefer to socially punish complicity with systems of deception, by precommitting to relinquish the option to punish, you can buy a better chance of actually having a real discussion about the problem. (Making the precommitment credible is tough, though.)

Ironically, this is an instance of the same problem it's trying to combat ("distorting communication to appease authority" and "distorting communication in order to appease people who are afraid you're trying to scapegoat them on the pretext of them distorting communication to appease authority" are both instances of "distorting communication because The Incentives"), but hopefully a less severe one, whose severity is further reduced by explaining that I'm doing it in the comments.

You can also think of the "I'm not blaming you, but seriously, this is harmful" maneuver as an interaction between levels: an axiological attempt to push for a higher moral standard in a given community, while acknowledging that the community does not yet uphold the higher standard (analogous to a moral attempt to institute tougher laws, while acknowledging that the sin in question is not a crime under current law).

noticing small lies committed by accident or under stress.

Lies committed "by accident"? What, like unconsciously? (Maybe the part of your brain that generated this sentence doesn't disagree with Jessica about the meaning of the word lie as much as the part of your brain that argues about intensional definitions??)

Maybe I’m unusually honest—or possibly unusually bad at remembering when I’ve lied!?—but I’m not sure I even remember the last time I told an outright unambiguous lie. The kind of situation where I would need to do that just doesn’t come up that often.

I would say that you should consider yourself fortunate then, that you are living in a situation where most of the people surrounding you have your best interests in mind (or, at worst, are neutral towards your interests). For others in more adversarial situations, telling lies (or at least shading the truth to the extent that would be considered lying by the standards of this post) is a necessary survival skill.

In situations where others can hurt you, clever solutions like "no comment - because this is the situation where in some counterfactual world I would prefer to be silent" result in you getting hurt.

(A few weeks ago, everyone in the company I am working for got a questionnaire from management where they were asked to list the strengths and weaknesses of their colleagues. Cleverly refusing to answer, beyond plausible excuses such as "this guy works on a different project so I haven't really interacted with him much", would probably cost me my job, which would be inconvenient in multiple ways. At the same time, I consider this type of request deeply repulsive -- on Monday I am supposed to be a good team member who enjoys cooperation and teambuilding, and on Tuesday I am asked to snitch on my coworkers -- from my perspective this would hurt my personal integrity much more than mere lying. Sorry, I am officially a dummy who never notices a non-trivial weakness in anyone, now go ahead and try proving that I do.)

Also, it seems to me that in the real world, building the prestige of a person who never lies is trickier than just never lying and cleverly glomarizing. For example, the prestige you keep building for years can be ruined overnight by a third party lying about you having lied to them. (And conversely, you could actually have a strategy of never lying... except to a designated set of "victims", in situations when there is no record of what you said, and who are sufficiently lower-status than you, so if they choose to accuse you publicly, they will be perceived as liars.)

First, some quick comments:

  1. Good post; I mostly agree with all specific points therein.

  2. I appreciate that this post has introduced me (via appropriate use of ‘Yudkowskian’ hyperlinking) to several interesting Arbital articles I’d never seen.

  3. Relevant old post by Paul Christiano: “If we can’t lie to others, we will lie to ourselves”.

All that having been said, I’d like to note that this entire project of “literal truth”, “wizard’s code”, “not technically lying”, etc., etc., seems to me to be quite wrongheaded. This is because I don’t think that any such approach is ethical in the first place. To the contrary: I think that there are some important categories of situations where lying is entirely permissible (i.e., ethically neutral at worst), and others where lying is, in fact, ethically mandatory (and where it is wrong not to lie). In my view, the virtue of honesty (which I take to be quite important indeed), and any commitment to any supposed “literal truth” or similar policy, are incompatible.

Clearly, this view is neither obvious nor likely to be uncontroversial. However, in lieu of (though also in the service of) further elaboration, let me present this ethical question or, if you like, puzzle:

Is it ethically mandatory always to behave as if you know all information which you do, in fact, know?

Is it ethically mandatory always to behave as if you know all information which you do, in fact, know?

Maybe I am missing the point, but since you do know all information which you do in fact, know, wouldn't behaving as if you do just mean behaving... the way in which you behave? In which case, isn't the puzzle meaningless?

On the other hand, if we understand the meaning of the puzzle to be illuminated by elriggs' first reply to it, we could rephrase it (or rather its negation) as follows:

Is it ever ethically acceptable to play a role, other than one of our common "game" roles like poker player, surprise party thrower, etc.?

I would answer this question as "yes", but with a further appeal to honesty in my reasoning: I think that sometimes the inferential distance between you and the people around you is so great that the only way you can try to bridge it is by putting yourself into a role that they can understand. I can give more details over PM but am reluctant to share publicly.

Maybe I am missing the point, but since you do know all information which you do in fact, know, wouldn’t behaving as if you do just mean behaving… the way in which you behave? In which case, isn’t the puzzle meaningless?

This is true in the same technically-correct-but-useless sense that it’s true to say something like “choosing what to do is impossible, since you will in fact do whatever you end up doing”. Unless we believe in substance dualism, or magic, or what have you, we have to conclude that our actions are determined, right? So when you do something, it’s impossible for you to have done something different! Well, ok, but having declared that, we do still have to figure out what to have for dinner, and which outfit to wear to the party, and whether to accept that job offer or not.

Neither do I think that talk of “playing roles” is very illuminating here.

For a better treatment of the topic, see this recent comment by Viliam.

OK, fair enough. So you are asking something like "is it ever ethical to keep a secret?" I would argue yes, because different people are entitled to different parts of your psyche. E.g. what I am willing to share on the internet is different from what I am willing to share in real life. Or am I missing something again?

Or am I missing something again?

Perhaps. Consider this scenario:

Your best friend Carol owns a pastry shop. One day you learn that her store manager, Dave, is embezzling large sums of money from the business. What do you do?

Silly question, obvious answer: tell Carol at once! Indeed, failing to do so would be a betrayal—later, when Carol has to close the shop and file for bankruptcy, her beloved business ruined, and she learns that you knew of Dave’s treachery and said nothing—how can you face her? The friendship is over. It’s quite clear: if you know that the guy is stealing, you will tell Carol, period.

Now suppose that Dave, the pastry shop manager, is actually also a friend of yours, and your knowledge of his crime is not accidental but comes because he confides in you, having first sworn you to secrecy. Foolishly, you agreed—a poor decision in retrospect, but such is life.

And now: (a) you have information (Dave is stealing from Carol); (b) ordinarily, having such information dictates your behavior in a clear way (you must take it at once to Carol); (c) yet you have sworn to keep said information secret.

Thus the question: are you obligated to behave as if you know this information (i.e., to inform Carol of Dave’s treachery)? Or, is it morally permissible for you to behave as if you know nothing (and thus to do nothing—and not only that, but to lie if Carol asks “do you know if Dave is stealing from the shop?”, etc.)?

If I understand your puzzle right, then examples would include poker, surprise parties/engagements, and those lying games you play with your friends where some people are “murderers” but are trying to hide the fact.

Is your puzzle different than that?

No, that is not the sort of thing I have in mind. Exclude games, performances, and similar things where you are (by consent of all involved) playing a role or otherwise “bracketing” your utterances within an artificial context, and consider the question again.

“Ethical” is undefined here, but if it were a defined standard, you’d just pick the available action that scores well on that standard, even if it doesn’t satisfy the constraint “behave as if you know all information you in fact know” (of which I think hiding Jews from a Nazi is the classic example).

If the point of solving the puzzle is to better understand the concept “ethics in relation to truth-acting” then I don’t think I’ve added much by the Nazi example or the games & performances ones.

What do you believe the point of the puzzle is? What would a good solution entail or imply?

The point of me positing the puzzle is for you (or anyone who cares to tackle this) to say what your chosen / preferred / ideal ethics answers to this; or, alternatively or additionally, what your moral intuitions say about this (and if your ethics, or your moral intuitions, or both, say nothing to this, then that too is an interesting and important fact). And the point of the puzzle itself is that the answer isn’t necessarily obvious.

“Do what your ethics says you should do” is therefore a non-answer.

If the point of solving the puzzle is to better understand the concept “ethics in relation to truth-acting” then I don’t think I’ve added much by the Nazi example or the games & performances ones.

I agree.

Thanks for the clarification.

For me the answer is no, I don’t believe it’s ethically mandatory to share all information I know to everyone if they happen to ask the right question. I can’t give a complete formalization of why, but three specific situations are 1) keeping someone else’s information secret, 2) when I predict the other person will assume harmful implications that aren’t true, and 3) when the other person isn’t in the right mind to hear the true information.

An example for #3: you would like your husband to change more diapers and help clean up a little more before he leaves for work every day, but you just thought of it right when he came home from a long work day. It would be better to wait to give the criticism until you’re sure he’s in a good mood.

An example for #2: I had a friend who had positive thoughts towards a girl that wasn’t his girlfriend. He was confused about this and TOLD HIS GIRLFRIEND WHEN THEY WERE DATING LONG DISTANCE. The two girls have had an estranged relationship for years since.

If I were my friend, I would understand that positive thoughts towards a pretty girl my age don’t imply that I am required to romantically engage with her. Telling my girlfriend about these thoughts might be truthful and honest, but it would likely cause her to feel insecure and jealous, even though she has nothing to worry about.

For me the answer is no, I don’t believe it’s ethically mandatory to share all information I know to everyone if they happen to ask the right question.

Note that this is an answer to a considerably narrower question than the one I asked.

That having been said, I think at least some of what you mentioned / described was relevant. In any case, given your answer, the answer to the broader question must necessarily also be “no”.

So, what I wonder now is whether anyone is willing to take, and defend, the opposite view: that it is ethically mandatory at all times to behave as if you know all the information which, in fact, you know. (It is, I know, an odd—or, at least, oddly formulated—ethical principle. And yet it seems to me that it is directly connected to the subject of the OP…)

I realized afterwards that only “not sharing others’ secrets” is an example of “it’s ethical to lie if someone asks a direct question”. The other two were more “don’t go out of your way to tell the whole truth in this situation (but wait for a better situation)”.

I do believe my ethics is composed of wanting what’s “best” for others and truthful communication is just an instrumental goal.

If I had to blatantly lie every day, so that all my loved ones could be perfectly healthy and feel great, I would lie every day.

I don’t think anyone would terminally value honesty (in any of its forms).

Cade Metz hadn't had this much trouble with a story in years. Professional journalists don't get writer's block! Ms. Tam had rejected his original draft focused on the subject's early warnings of the pandemic. Her feedback hadn't been very specific ... but then, it didn't need to be.

For contingent reasons, the reporting for this piece had stretched out over months. He had tons of notes. It shouldn't be hard to come up with a story that would meet Ms. Tam's approval.

The deadline loomed. Alright, well, one sentence at a time. He wrote:

In one post, he aligned himself with Charles Murray, who proposed a link between race and I.Q. in "The Bell Curve."

Metz asked himself: Is this statement actually and literally true?

Yes! The subject had aligned himself with Charles Murray in one post: "The only public figure I can think of in the southeast quadrant with me is Charles Murray."

In another, he pointed out that Mr. Murray believes Black people "are genetically less intelligent than white people."

Metz asked himself: Is this statement actually and literally true?

Yes! The subject had pointed that out in another post: "Consider Charles Murray saying that he believes black people are genetically less intelligent than white people."

Having gotten started, the rest of the story came out easily. Why had he been so reluctant to write the new draft, as if in fear of some state of sin? This was his profession—to seek out all the news that's fit to print, and bring it to the light of the world!

For that was his mastery.

Good point.

In Eliezer’s defense I’ll note that the original proposal took pains to say “At least as honest as an unusually honest person AND THEN also truthful in communicating about your meta-level principles about when you’ll lie”, so the above isn’t a literal following of what Eliezer said (because I don’t think an unusually honest person would write that). But I think that was not a very natural idea, and I mostly think of meta-honesty as about being honest on the meta level, and that it’s important, but I don’t think of it as really tied up with the object level being super honest.

New jargon term: SuperMetaHonest

And if I’m honest about being SuperMetaHonest, then I’m: SuperDuperMetaHonest.

If I wrote a sequence about it it'd be my SuperDuperMetaHonestEpistemicOpus


“If you try to Glomarize you will be too verbocious”

To effectively extend on Raemon's commentary:

I think this post is quite good, overall, and adequately elaborates on the disadvantages and insufficiencies of the Wizard's Code of Honesty beyond the irritatingly pedantic idiomatic example. However, I find the implicit thesis of the post deeply confusing (that EY's post is less "broadly useful" than it initially appears). As I understand them, the two posts are saying basically identical things, but are focused in slightly different areas, and draw very different conclusions. EY's post notes the issues with the wizard's code briefly, and proceeds to go into a discussion of a potential replacement, meta-honesty, that can be summarized as "be unusually honest, and be absolutely honest about when you'd lie or refuse to answer questions." This post goes into detail about why literal honesty is insufficient in adversarial scenarios, and an excessive burden in friendly scenarios. This post then claims to "argue that this "firming up" is of limited practical utility given the ubiquity of other kinds of deception," which I think is unsupported by the arguments given in the post.

As I read the original essay, the entirety of the friendly scenarios mentioned in this post are dealt with extremely neatly by meta-honesty: be unusually honest does not preclude you from making jokes, using English, etc. Indeed, as this post argues, you don't need fancy ethics to figure out the right level of honesty with friends in the vast majority of scenarios. There are a few scenarios mentioned in the original essay where this is false, but they are also well-handled by meta-honesty applied correctly. 

The more interesting objection is what occurs in adversarial situations, where the wizard's code is hopelessly underspecified. I'd really like to see more engagement about how meta-honesty interacts with the Rearden situation, for example, from the author of this essay, because as I understand it, meta-honesty is designed to enable the exact kind of Bayesian inference that this essay dismisses as impossible. If you are in a room with our hypothetical scientist-intern, you can ask "Would you produce statements that you feel could be misleading in order to follow the wishes of your employer?" or some similar questions, allowing you to acquire all of the information necessary to Bayes out what this statement actually means. I think the bigger issue with this isn't anything about honesty, it's about what the State Science Institute is, and the effects that has on citizens. 
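To make the "Bayes out" step concrete, here is a minimal sketch of the update a listener might run after hearing the intern's meta-honest answer. All of the numbers, and the names `posterior`, `candid`, and `compliant`, are illustrative assumptions and do not come from either essay. The idea is only that the answer to the meta-level question changes how likely the reassuring statement is when the underlying claim is false.

```python
from fractions import Fraction

def posterior(prior, p_stmt_if_true, p_stmt_if_false):
    """P(claim is true | statement was made), by Bayes' rule."""
    joint_true = prior * p_stmt_if_true
    joint_false = (1 - prior) * p_stmt_if_false
    return joint_true / (joint_true + joint_false)

# Hypothetical prior that the underlying claim is true.
prior = Fraction(1, 2)

# Assumed speaker who says they would NOT mislead for their employer:
# the statement is rarely made when the claim is false.
candid = posterior(prior, Fraction(9, 10), Fraction(1, 10))

# Assumed speaker who admits they WOULD mislead for their employer:
# the statement is made almost as often when the claim is false.
compliant = posterior(prior, Fraction(9, 10), Fraction(8, 10))

print(f"candid speaker:    {float(candid):.3f}")
print(f"compliant speaker: {float(compliant):.3f}")
```

Under these made-up likelihoods, the same literal statement is strong evidence coming from the first speaker and nearly uninformative coming from the second, which is the sense in which the meta-level answer supplies the information needed to interpret the object-level one.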

Another potential objection to meta-honesty based in this essay is that the type of deception involved in this essay could occur on the meta-honest level. I think that this is resolved by a difference in assumptions: EY assumes that at least the two conversational agents are roughly Bayesian, and specifies that no literal falsehoods can be provided, meaning meta-honest conversation should be strictly informative. 

Finally, as this essay points out, EY's original essay seems somewhat narrow, especially with the Bayesian stipulations. However, I think this is also addressed by the previous argument about how friendly situations are almost certainly not edge cases of honesty, meaning that definitionally, this type of work is only useful in extreme hypotheticals. The importance of this approach beyond these extreme scenarios, and the added importance in them, is discussed in the original essay.

Overall, I think this is a quite-good piece of rationalist writing, but I think it is not in opposition to the post it purports to respond to. Given that, I'm not sure what makes this post unique, and I suspect that there are quite a lot of high-quality posts that have other distinguishing factors above and beyond this post.

When's the last time you needed to consciously tell a bald-faced, unambiguous lie?—something that could realistically be outright proven false in front of your peers, rather than dismissed with a "reasonable" amount of language-lawyering.

I don't know about the last time I needed to do so, but the last time I did so was two days ago (Christmas Eve), when (IIRC) one of my grandparents asked me if I had brought board games to my aunt and uncle's house while in the presence of my aunt, uncle, and/or cousins. In fact I had, but didn't want to say that, because I had brought them as Christmas gifts for my aunt and uncle's family, and didn't want to reveal that fact, and didn't think I could get away with being evasive, so (again, IIRC) I lied about bringing them.

I have a pretty strong preference against literal/unambiguous lying, and usually I can get away with evasion when I want to conceal things, and I don't remember unambiguously lying previously, but I'm bad at remembering things and wouldn't be all that surprised if somebody showed me a recording of me telling a bald-faced lie at some other point during December.

I think one could argue that "have you brought board games?" isn't really intended to include the insides of yet-unopened presents in its scope, in which case you weren't really lying.

(I'm not sure whether I would argue that. It might depend on whether Christmas Day was nearer the start or the end of your stay...)

I think people quite frequently tell unambiguous lies of the form "I have read these terms and conditions".

'IIRC' because I remember being asked this question multiple times and lying once as an answer, but don't remember exactly who was around or who asked the time I remember lying, and am not certain that I actually lied as opposed to being very evasive or murmuring nonsensical syllables or something.

Lots of great comments already so not sure if this will get seen, but a couple possibly useful points —

Metaphors We Live By by George Lakoff is worth a skim —

Then I think Wittgenstein's Tractatus is good, but his war diaries are even better

"[Wittgenstein] sketches two people, A and B, swordfighting, and explains how this sketch might assert ‘A is fencing with B’ by virtue of one stick-figure representing A and the other representing B. In this picture-writing form, the proposition can be true or false, and its sense is independent of its truth or falsehood. LW declares that ‘It must be possible to demonstrate everything essential by considering this case’."

Lakoff illuminates some common metaphors — for example, a positive-valence mood in American English is often "up" and a negative-valence mood in American English is often "down."

If you combine Lakoff and Wittgenstein, using an accepted metaphor from your culture ("How are you?" "I'm flying today") makes the picture you paint for the other person correspond to your mood (they hear the emphasized "flying" and don't imagine you literally flying, but rather in a high-positive valence mood) — then you're in the realm of true.

There's independently some value in investigating your metaphors, but if someone asks me "Hey how'd the custom building project your neighbor was doing go?" and I answer "Man, it was a fuckin' trainwreck" — you know what I'm saying: not only did the project fail, but it failed in a way that caused damage and hassle and was unaesthetic, even over and beyond what a normal "mere project failure" would be.

The value in metaphors, I think, is that you can get high information density with them. "Fuckin' trainwreck" conveys a lot of information. The only denser formulation might be "Disaster" — but that's also a metaphor if it wasn't literally a disaster. Metaphors are sneaky in that way, we often don't notice them — but they seem like a valid high-accuracy usage of language if deployed carefully.

(Tangentially: Is "deployed" there a metaphor? Thinking... thinking... yup. Lakoff's book is worth skimming, we use a lot more metaphors than we realize...)


This post crystallized what I now think of as one of the major open problems in rationality, and in the (related but distinct) domain of intellectual integrity. While it doesn't propose solutions, I think clearly articulating a problem, and becoming deconfused about it, is often a good first step for tackling hard problems.

Two criticisms I'd make of this post are:

  • It'd be slightly nicer if it actually had a crisp summary of the problem at the end. I felt like I understood the "open problem of 'real' honesty" by the end of the post, but there wasn't a succinct paragraph I could copy into another thread to explain it. (I think this was somewhat complicated by the final paragraphs aiming more to tie this into a critique of Meta-Honesty than to spell out the open problem.)
  • Relatedly... I found this underwhelming as a critique of Meta-Honesty. The fact that Meta-Honesty does not solve the most important open problem in honesty (which, notably, neither does this post!) doesn't say much about whether Meta-Honesty is still useful for other reasons. I think Zack underestimates how important clear norms around Not-Lying are. And meanwhile, when you're in a confusing domain without a way forward, hacking away at the edges is an important tool to have in your toolbox.

I wouldn't mind removing hyperboles from socially accepted language. Don't say "everyone" if you don't mean literally everyone, duh. (I suppose that many General Semantics fans would agree with this.)

For me a complicated question is one that compares against an unspecified standard, such as "is this cake sweet?" I don't know what kind of cakes you are used to eating, so maybe what's "quite sweet" to me is "only a bit sweet" for you. Telling literal truths, such as "yes, it has a nonzero amount of sugar, but also a nonzero amount of other things" will not help here. I don't know exactly how much sugar it contains. So, "it tastes quite sweet to me" is the best I can do here. Maybe that should be the norm.

I agree about the "nearest unblocked strategy". You make the rules; people maximize within the rules (or break them when you are not watching). People wanting to do X will do the thing closest to X that doesn't break the most literal interpretation of the anti-X rules (or break the rules in a deniable way). -- On the other hand, even trivial inconveniences can make a difference. We are not discussing superhuman AI trying to get out of the box, but humans with limited willpower who may at some level of difficulty simply give up.

The linked article "telling truth is social aggression" ignores the fact that even in competition, people make coalitions. And if you have a large number of players, the math favors cooperation, at least on a relatively small scale. If your school grades on a curve, it discourages helping your classmate without getting anything in return. But mutual cooperation with one classmate still helps you both against the rest of the class. The same is true about helping people create better models of the world, when the size of your group is tiny compared to the rest of the population.

The real danger these days usually isn't Gestapo, but thousands of Twitter celebrities trying to convert parts of your writing taken out of context into polarizing tweets, and journalists trying to convert those tweets into clickbait, where the damage caused to you and your family is just an externality no one cares about. This is the elephant in the room: "I personally don't disagree with X; or I disagree with X but I think there is no great harm in discussing it per se... but the social consequences of me being publicly known as 'person who talks about X' are huge, and I need to pick my battles. I have things to protect that are more important to me than my mere academic interest in X." Faced by: "But if you lie about X, how can I trust that you are not lying about Y, too?"

(Self-review.) I oppose including this post in a Best-of-2019 collection. I stand by what I wrote, but, as with "Relevance Norms", this was a "defensive" post; it exists as a reaction to "Meta-Honesty"'s candidacy in the 2018 Review, rather than trying to advance new material on its own terms.

The analogy between patch-resistance in AI alignment and humans finding ways to dodge the spirit of deontological rules is very important, but not enough to carry the entire post.

A standalone canon-potential explanation of why I think we need a broader conception of honesty than avoiding individually false statements would look more like "Algorithms of Deception" (although that post didn't do so great karma-wise; I'm not sure whether that's because people don't want to read code, because it was slow to get Frontpaged (as I recall), or because it's bad for some other reason).

I intend to reply to Fiddler's review, but likely not in a timely manner.

Let me argue for intentionality in communication. If your intent is to inform and communicate a fact, do so. If your intent is to convince someone to undertake an action, do so. If your intent is to impress people with your knowledge and savvy, do so. If your intent is to elicit ideas and models to see where you differ, do so.

One size does not fit all situations or all people. Talking is an act. Choose the mechanisms that fit your goals, like you do in all actions.

Humans aren't perfect, and most humans are actually pretty bad at both giving and receiving "honest" communication. Attempting to communicate with them on the level you prefer, rather than the level they're ready for, is arrogant and unhelpful.

Humans are not naturally nor necessarily aligned with your goals (in fact, nobody is fully aligned, though many of us are compatible if you zoom out far enough). It's an important social fiction to pretend they are, in order to cooperate with them, but you don't have to actually believe this falsehood.

The underlying purpose of any sort of "don't lie" moral code is to limit what you can say when "you intend to convince someone to undertake an action" or "impress people with ..."

Nominated, for the reasons outlined in my curation post. 

A well-written post on an important topic.

On the one hand this post does a great job of connecting to previous work, leaving breadcrumbs and shortening the inferential distance. On the other hand what is this at the end?

But one thing I'm pretty sure won't help much is clever logic puzzles about implausibly sophisticated Nazis.

I have no idea what this is talking about.