The Onion Test for Personal and Institutional Honesty

[Co-written by Chana Messinger and Andrew Critch; Andrew is the originator of the idea.]

You (or your organization, your mission, your family, etc.) pass the “onion test” for honesty if each layer hides, but does not mislead about, the information hidden within.

When people get to know you better, or rise higher in your organization, they may find out new things, but they should not be shocked by the types of information that were hidden. If they are, you messed up in creating outer layers that appropriately describe the kind of thing that might be inside.

Examples

Positive Example: 

Outer layer says "I usually treat my health information as private."

Next layer in says: "Here are the specific health problems I have: Gout, diabetes."
 

Negative example:

Outer layer says: "I usually treat my health info as private."

Next layer in: "I operate a cocaine dealership.  Sorry I didn't warn you that I was also private about my illegal activities."

Negative example:

Outer layer says: "Is it ok if I take notes on our conversation?"

Next layer in: "Here’s the group chat where I mocked each point you made to 12 people, some of whom know you”


Positive Example: 

Outer layer says "Is it ok if I take notes on our conversation?  Also, I’d like to share my unfiltered thoughts about it with some colleagues later."

Next layer in says: "Jake thinks the new emphasis on wood-built buildings won’t last. Seems overconfident."

------------------------------------------------------------------------------------------------

Passing the test is a function both of what you conveyed (explicitly and implicitly) and of the expectations of others. If it’s normal in your circles to start mocking group chats, then that doesn’t need to be said to avoid shock and surprise. The illusion of transparency can bite you here.

Andrew:

Social friction minimization is the default trend that shapes the outer layers of a person or institution, by eroding away the bits of information that might cause offence, leaving layers of more pungent information underneath.  The “onion model” of honesty or integrity is that each layer of your personality or institution should hide but not mislead about the layer underneath it.   This usually involves each layer sharing something about the kinds of information that are in the next layer in, like “I generally keep my health information private”, so people won’t assume that a lack of info about your health means you’re doing just fine health-wise.

It takes a bit of work to put sign-posts on your outer layer about what kinds of information are inside, and it takes more work to present those sign-posts in a socially smooth way that doesn’t raise unnecessary fears or alarms.  However, if you put in that work, you can safely get to know people without them starting to wonder, “What else is this person or institution hiding from me?”  And, if everyone puts in that work, society in general becomes more trustworthy and navigable.

I started using the onion model in 2008, and since then, I’ve never told a lie.  It’s surprisingly workable once you get the hang of it.  Some people think privacy is worse than lies, but I believe the opposite is true, and I think it's worth putting in the effort to quit lying entirely if you’re up to the challenge.  Going a bit further, you can add an outer layer of communications that basically tells people what kinds of things you’re keeping private, so not only have you not lied, you’ve also avoided misleading them.  That’s the whole onion model.

Chana:

I have found this model extremely useful in the last few months of talking about organizational strategy. It carves out the space between “not everyone gets to know everything” and “actively pointing people in the wrong direction about what’s true lacks integrity,” and it helps avoid “I didn’t lie, but I knowingly misled.”

So far I have thought about it as a backwards-looking reflective device (what on the inside would people be shocked to find out, and how can I make sure they are not shocked?) rather than as a forward-looking one where I signpost all the things I might want to signpost, but I could imagine that changing. (I.e., right now I’m taking this as a useful quick tool, rather than a full orientation to honesty as Andrew does, but that could definitely change.)

In general, over the last few years I have shifted pretty far towards “transparency, honesty, earnestness are extremely powerful and fix a lot of things that can otherwise go wrong.”

On a different note, for me, virtue ethics is attractive, but not real, and tests for integrity are important and useful pointers at things that frequently go wrong and can go better, rather than referenda on your soul. I would guess there are situations in which glomarizing is insufficient, and focusing too much on integrity will reveal the existence of secrets you have no interest in revealing, at least if you are not massively skilled at it.

[Some small edits made, including to the title, for clarification purposes]

------------------------------------------------------------------------------------------------

Comments

I have thoughts, but they are contingent on better understanding what you mean by "types" of hidden information. For example, you used "operating a cocaine dealership" as a "type" of information hidden by the statement about health information. Operating a cocaine dealership is not a health matter, except perhaps indirectly if you get high on your own supply.

To further illustrate this ambiguity, a person might be having gay sex in a country where gay sex is criminalized, morally condemned, and viewed as a shocking aberration from normal human behavior. It does not seem to me to be out of integrity for a gay person to refrain from telling other people that they have gay sex in this (or any other) context.

Where this becomes problematic is when the two people have different views about what constitutes reasonable expectations and moral behavior. If we give free moral license to choose what to keep private, then it seems to me there is little difference between this onion model and an algorithm for "defining away dishonesty." One can always justify oneself by saying "other people were foolish for having had such unreasonable expectations as to have been misled or upset by my nondisclosure." If "outsiders" are expected to be sufficiently cynical, then the onion model would even justify outright lying and withholding outrageous misbehaviors as within the bounds of "integrity," as long as other people expected such infractions to be occurring.

In short, it seems that this standard of integrity reduces to "they should have known what they were getting into."

As such, the onion model seems to rely on a pre-existing consensus on both what is epistemically normal and what is morally right. It is useful as a recipe for producing a summary in this context, but not for dealing with disagreement over what is behaviorally normal or right.


I agree.  For me it's more of a characterization of honesty, not integrity (even though I consider honesty an aspect of integrity).  Perhaps we should change the name.

I think part of my answer here is "The more a person is a longterm trade partner, the more I invest in them knowing about my inner layers. If it seems like they have different expectations than I do, I'm proactive in sharing information about that."

Yes, I agree the onion model can be a way to think about summarization and how to prioritize information sharing. I just don't see it as particularly helpful for integrity.

Figuring out the edge cases about honesty and truth seems important to me, both as a matter of personal aesthetics and as a matter for LessWrong to pay attention to. One of the things people have used to describe what makes LessWrong special is that it's a community focused on truth-seeking, which makes "what is truth anyway and how do we talk about it" a worthwhile topic of conversation. This article talks about it in a way that's clear. (The positive-example/negative-example pattern is a good approach to a topic that can really suffer from illusion of transparency.)

Like Eliezer's Meta-Honesty post, the approach suggested does rely on some fast verbal footwork, though the footwork need not be as fast as Meta-Honesty. Passing the Onion Test consistently requires the same kind of comparison to alternate worlds as glomarization, which is a bit of a strike against it but that's hardly unique to the Onion Test.

I don't know if people still wind up feeling misled? For instance, I can imagine someone saying "I usually keep my financial state private" and having their conversation partners walk away with wildly different ideas of how they're doing. Is it so bad they don't want to talk about it? Is it so good they don't want to brag? If I thought it was the former and offered to cover their share of dinner repeatedly, I might be annoyed if it turns out to be the latter.

I don't particularly hold myself to the Onion Test, but it did provide another angle on the subject that I appreciated. Nobody has yet used it this way around me, but I could also see Onion Test declared in a similar manner to Crocker's Rules, an opt-in social norm that might be recognized by others if it got popular enough. I'm not sure it's worth the limited conceptual slots a community can have for those, but I wouldn't feel the slot was wasted if Onion Tests made it that far.

This might be weird, but I really appreciate people having the conversations about what they think is honest and in what ways they think we should be honest out loud on the internet where I can read them. One can't assume that everyone has read your article on how you use truth and is thus fairly warned, but it is at least a start. Good social thing to do, A+. I don't know if more people thinking about this means we'd actually find a real consensus solution, and it's probably not actually the priority, but I would like a real consensus solution, and at some point before that someone's going to have to write down the prototype that leads to it.

Ultimately I don't actually want this in the Best of 2022, not because it isn't good, but because I'd worry a little about someone reading through the Best Of collections and thinking this was more settled or established than it is. The crux here is that I don't think it's settled, established, or widely read enough that people will know what you mean if you declare Onion Test. If I knew everyone on LessWrong would read everything in the Best Of 2022, then I'd change my mind and want this included so as to add the Test to our collective lexicon.

I don't have much to add, but I basically endorse this, and it's similar to what I try to do myself.

Deciding when and how to proactively share information about my inner layers with people is a balancing act I'm still figuring out, but I've found it generally valuable for building trust.

It's not that clear to me exactly what test/principle/model is being proposed here.

A lot of it is written in terms of not being "misleading", which I interpret as 'intentionally causing others to update in the wrong direction'. But the goal of having people not be shocked by the inner layers suggests that there's a duty to actively inform people about (some aspects of) what's inside; leaving them with their priors isn't good enough. (But what exactly does "shocked" mean, and how does it compare with other possible targets like "upset" or "betrayed"?) And the parts about "signposting" suggest that there's an aim of helping people build explicit models about the inner layers, which is not just a matter of what probabilities/anticipations they have.

I meant signposting to indicate things like saying "here's a place where I have more to say, but not in this context" during, for instance, a conversation, so that I'm truthfully saying that there's more to the story.

Yeah, I think "intentionally causing others to update in the wrong direction" and "leaving them with their priors" end up pretty similar (if you don't make strong distinctions between action and omission, which I think this test at least partially rests on) if you have a good model of their priors (which I think is potentially the hardest part here).

This is an example of a clear textual writeup of a principle of integrity. I think it's a pretty good principle, and one that I refer to a lot in my own thinking about integrity.

But even if I thought it was importantly flawed, I think integrity is super important, and therefore I really want to reward and support people thinking explicitly about it. That allows us to notice that our notions are flawed, and improve them, and it also allows us to declare to each other what norms we hold ourselves to, instead of sort of typical minding and assuming that our notion of integrity matches others' notion, and then being shocked when they behave badly on our terms.

I think about this framing quite a lot. Is what I say going to lead to people assuming roughly the thing I think, even if I'm not precise? So the concept is pretty valuable to me.

I don't know if it was the post that did it, but maybe!

Isn't this more like an onion test for... honesty?

Integrity is broader.

I agree with this.

Title changed!

Curated. I'm a big fan of people thinking about how to be honest/meta-honest/not mislead/etc.; it feels like doing so is important for cooperation, but also important for one's own epistemics. An easy way to lie to others is to lie to yourself, and by corollary, if you can make it easy to not lie to others, you can not lie to yourself.

I feel like there is a difference between a Secret That You Must Protect, and information that is status-restricted. 

Say you are preparing a secret birthday party for Alice, and they ask you if you have plans on their birthday. If the birthday is a Secret You Must Protect, then you would be Meta-Honest, and tell Alice you don't have plans. If it's just status-restricted, then you could tell Alice that you have something planned, but you can't say more or ruin the surprise. Thereby passing the Onion test. 

I think the danger is in confusing the two types of information. If you make a secret smelly, so that people know what kind of thing it is, then it loses a lot of the protection of being a secret in the first place. Half of a secret is the fact there is one, right? On the other hand, if you make everything a Secret You Must Protect then people may be surprised and feel betrayed when information was not sign-posted. 

"...birthday is a Secret You Must Protect, then you would be Meta-Honest..."

Calling this "being meta-honest" feels wrong to me. (I'm not sure if it's technically wrong, but I think meta-honesty is something easy to get confused about, and I think it's worth putting extra effort into making sure the term doesn't get diluted by casual readers picking it up from context.)

Eliezer's definition of meta-honesty is:

"Don't lie when a normal highly honest person wouldn't, and furthermore, be honest when somebody asks you which hypothetical circumstances would cause you to lie or mislead—absolutely honest, if they ask under this code. However, questions about meta-honesty should be careful not to probe object-level information."

(Not sure if you're intending to use Eliezer's definition here.)

Independent of Eliezer's definition... I do think lying when keeping a secret about someone's birthday is the type of thing a meta-honest person might do... but, I wouldn't call the part where you're explicitly lying "being meta-honest." (The part where, at an ordinary time, someone asks you "would you lie to protect a surprise party?" and you say "yes" is the part where you're being meta-honest). There are a few reasonable definitions of meta-honest, but it seems wrong to call any instance of lying "being meta-honest." 

[edited]

Ray, I have several times seen you trying to defend terms that will not — and in my opinion, should not be expected to — retain the specific meaning that you or the originator hope the term will retain.  In particular,  I predict that "meta-honesty" will not stably mean the specific thing you are quoting it as being supposed to mean, even amongst the crowd you-specifically are trying to define it to.

The reason is that your linguistic prescriptions are not respecting the compositional structure of language.  "Meta" already has a meaning and "honesty" already has a meaning.  In the equilibrium of (even LessWrong readers) using the term "meta-honesty", it will end up meaning something about as general as "meta" and "honesty" suggest, such as 
a) "honesty about what you're honest about" or 
b) "honesty about meta-level questions", or
c) something else I haven't thought of.

Your quoted definition is a lot like (a), except the quote has these extra details: "Don't lie when a normal highly honest person wouldn't".  These details are not carried by the "meta" modifier in any way that seems normal to me.  For reference, here is a broadly agreed upon definition of "meta":
https://www.dictionary.com/browse/meta

I'm not sure exactly what "meta-honesty" will settle to meaning amongst your target audience.  I'm just predicting it won't be as specific as the thing you just quoted.  You can probably push towards (a) if you like that more than (b) or (c) (as I do).  But, you probably can't get all the way to a situation where "meta-honesty" stably means the full quoted definition.  When you push for it to mean the full definition, your readers have to choose between
1) trusting their own internal sense of how to compose concepts clearly and usefully, versus
2) memorizing a particular definition without a cited source and binding the definition to some otherwise-fairly-general-concept-words,
... and I think your readers who are best at clear individual thinking will reliably choose (1) over (2).

I.e.,  many of your readers will trust their own conceptual metacognition more than your assertion that they should be memorizing and awkwardly re-binding their concept-words.

Generally, I think they're right to do that, in that I think doing (1) more than (2) will improve their general ability to think clearly with themselves and with others.

Notwithstanding these points about how to go about defending terms, I think it's not unreasonable to want there to be a short term that captures "honesty that is also closed under reflection", i.e. a high level of honesty that is also fully consistent when making statements about itself, e.g. "I am honest/dishonest in situation X".

Phrases like "I'm honest-under-reflection" or "I'm reflectively-honest" or "I'm meta-consistently honest" seem... more cumbersome and likely-to-cause-confusion to me, than the current attempt of "I'm meta-honest". 

"I claim to have the property 'meta-honesty', where I strive to be honest and to be reflectively consistent about it."

We can't always get what we want, and English doesn't allow all important ideas to be said succinctly, but I want to defend the attempt to say important things in short phrases.

Huh, weird.  I read Eliezer's definition of meta-honesty as not the same thing as your definition of «honesty that is closed under reflection».  Specifically, in Eliezer-meta-honesty, his honesty at the meta-level is stronger (i.e., zero tolerance for lies) than his honesty at the object level (some tolerance for lies), whereas your notion sounds like it has no such strengthening-as-you-go-up-in-meta pattern to it.  Am I misunderstanding you?

No, but I think you're misunderstanding Eliezer. Let me explain.

When I ask myself "Should I be dishonest at all in a particular situation?" I have pretty similar standards for lots of domains. The primary reason to ask is when there are genuine questions about whether an extremely powerful force is attempting to extract a specific lie from me, or whether an extremely powerful immoral force is leaving me no control over what it does except via deception. For domains where this is not the case, I want to speak plainly and honestly.

When I list domains and ask how honest one ought to be in them (things like being honest about your work history to the government, honest about your relationship history to prospective partners, honest about your criminal record to anyone, honest about how your work is going to your boss, honest in conversations about your honesty to anyone, and so on), the standard is to be truthful except in a small number of situations where incredibly powerful entities or forces have broken the game board badly enough that the moral thing to do is to lie.

I say this because I don't think that being honest about your honesty is fundamentally different than being honest about other things: for all of them there's a standard of no-lying, and an extremely high bar for a powerful entity to be threatening you and everything you care about for you to have to lie.

Eliezer writes this reasoning about honesty:

And I think it's reasonable to expect that over the course of a human lifetime you will literally never end up in a situation where a Gestapo officer who has read this essay is pointing a gun at you and asking overly-object-level-probing meta-honesty questions, and will shoot you if you try to glomarize but will believe you if you lie outright, given that we all know that everyone, innocent or guilty, is supposed to glomarize in situations like that. Up until today I don't think I've ever seen any questions like this being asked in real life at all, even hanging out with a number of people who are heavily into recursion.

So if one is declaring the meta-honesty code at all, then one shouldn't meta-lie, period; I think the rules have been set up to allow that to be absolute.

I don't believe that Eliezer applies different standards of honesty to normal situations and to meta-sentences about honesty. I think he applies the same standards, and finds that you are more under threat on the object level than you are on the (explicitly-discussed) meta level.

Eliezer is very explicit, and repeats many times in that essay, including in the very segment you quote, that his code of meta-honesty does in fact compel you to never lie in a meta-honesty discussion. The first 4 paragraphs of your comment are not elaborating on what Eliezer really meant; they are disagreeing with him. Reasonable disagreements too, in my opinion, but conflating them with Eliezer's proposal is corrosive to the norms that allow people to propose and test new norms.

Re-reading the post, I see I was mistaken. Eliezer is undeniably proposing an absolute rule on the meta-level, not one where dishonesty should be "held to an extremely high bar" as I discussed. 

I'll try to compress the difference between our proposals: I was proposing "Be highly honest, and be consistent when you talk about it on the meta-level", whereas Eliezer is proposing "Be highly honest, and be absolutely honest when you talk about it on the meta-level". The part I quoted was his consequentialist argument that the absolute rule would not be that costly, not a consequentialist account of when to be honest on the meta-level.

Nod. I do generally agree with this (fyi I think I more frequently complain to jargon-coiners that they are trying to coin jargon that won't actually survive memetic drift, than I complain to people about using words wrong). 

And reflecting on both this most recent example and on Pivotal Acts Means Something Specific (not sure if you had a third example in mind), I also think the way I went about arguing the case wasn't that great (I was doing it in a way that sounded like "speak authoritatively about what the term means" as opposed to a clarifying "so-and-so defined the word this way, for these reasons").

I've updated my previous comment here to say "Eliezer defines it such-and-such way (not sure if you mean to be it as Eliezer defines it)", and made a similar update to the pivotal act post.

I have more thoughts about meta-honesty and how it should be defined but it's probably getting off topic. 

Cool!  This was very much in line with the kind of update I was aiming for here, cheers :)

I maybe want to add: 

The reason I made a big deal about these particular jargon-terms was that they were both places where Eliezer noted "this is a concept that there will be a lot of pressure to distort or drift the term, and this concept is really important, so I'm going to add BIG PROMINENT WARNINGS about how important it is not to distort the concept." (AFAICT he's only done this twice, for meta-honesty and pivotal acts.)

I think I agree with you in both cases that Eliezer didn't actually name the concept very well, but I think it was true that the concepts were important, and likely to get distorted, and probably still would have gotten distorted even if he had named them better. So I endorse people having the move available of "put a giant warning that attempts to fight against linguistic entropy, when you have a technical term you think is important to preserve its meaning, which the surrounding intellectual community helps reinforce." 

In this case I think there were some competing principles (protect technical terms vs. avoid cluttering the nomenclature-commons with bad terms). I was trying to do the former. My main update here is that I can do the former without imposing as much cost from the latter, and that I should think more about the tradeoffs.

Here are some possible solutions to this problem that I think will work better than trying to get people to bind "meta-honesty" to the full definition:

- You could use a name that references the source, like "meta-honesty in the sense of [source]" or "[source-name]-meta-honesty".

- You could use a longer name that captures more of the nuances you bolded, like "meta-honesty with object-level normalcy", or something else that resonates better for you.

Thinking a bit more, I think meta-honesty is most useful if it means "honest about when you're honest" ("honest about meta-level questions" doesn't actually seem that important a concept at first glance, although open to counterarguments)

I think the thing Eliezer was aiming at in Meta-Honesty should probably just be called "Eliezer's code of honesty" or something (which features meta-honesty as one of its building blocks). I agree "meta-honest" doesn't sound like a term meaning the thing Eliezer was aiming at (which was more of a code to follow than a property of a type of statement).

Yep, I agree with this wholesale.

I agree. I'll try to be more careful and clear about the wording in the future. 

This seems interesting to me but I can't yet latch onto it. Can you give examples of secrets being one or the other?

Are you distinguishing between "secrets where the existence of the secret is a big part of the secret" and "secrets where it's not"?

I think that's the gist of it. I categorize them as Secret and Private, where Secret information is something I deny knowing (and therefore fails to pass the onion test), and Private information is something that people can know exists, even if I won't tell them what it is (thereby passing the onion test).

Also, see this which I found relevant.