How is it possible to tell the truth?

I mean, sure, you can use your larynx to make sound waves in the air, or you can draw a sequence of symbols on paper, but sound waves and paper-markings can't be true, any more than a leaf or a rock can be "true". Why do you think you can tell the truth?

This is a pretty easy question. Words don't have intrinsic ontologically-basic meanings, but intelligent systems can learn associations between a symbol and things in the world. If I say "dog" and point to a dog a bunch of times, a child who didn't already know what the word "dog" meant, would soon get the idea and learn that the sound "dog" meant this-and-such kind of furry four-legged animal.

As a formal model of how this AI trick works, we can study sender–receiver games. Two agents, a "sender" and a "receiver", play a simple game: the sender observes one of several possible states of the world, and sends one of several possible signals—something that the sender can vary (like sound waves or paper-markings) in a way that the receiver can detect. The receiver observes the signal, and makes a prediction about the state of the world. If the agents both get rewarded when the receiver's prediction matches the sender's observation, a convention evolves that assigns common-usage meanings to the previously and otherwise arbitrary signals. True information is communicated; the signals become a shared map that reflects the territory.
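To see the convention-forming dynamic in miniature, here is a rough simulation sketch. The post doesn't specify a learning rule, so this assumes a simple urn-style reinforcement scheme (each successful choice gets one more "ball" in its urn), two equiprobable states, and two signals; the function name `simulate` and all parameters are illustrative.

```python
import random

def simulate(n_states=2, rounds=20000, seed=0):
    """Urn-style reinforcement learning in a common-interest sender-receiver game."""
    rng = random.Random(seed)
    options = list(range(n_states))
    # sender_w[state][signal] and receiver_w[signal][guess] start out uniform,
    # so the signals are initially meaningless.
    sender_w = [[1.0] * n_states for _ in options]
    receiver_w = [[1.0] * n_states for _ in options]
    hits = []
    for _ in range(rounds):
        state = rng.choice(options)                                  # Nature picks a state
        signal = rng.choices(options, weights=sender_w[state])[0]    # sender picks a signal
        guess = rng.choices(options, weights=receiver_w[signal])[0]  # receiver predicts
        success = guess == state
        if success:
            # Both agents are rewarded together: reinforce the choices just made.
            sender_w[state][signal] += 1.0
            receiver_w[signal][guess] += 1.0
        hits.append(success)
    tail = hits[-1000:]
    return sum(tail) / len(tail), sender_w

rate, sender_w = simulate()
print(f"success rate over the last 1000 rounds: {rate:.2f}")
print("sender weights (rows are states, columns are signals):", sender_w)
```

Nothing in the code says which signal should go with which state; whichever pairing wins out is settled by the history of rewarded rounds.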

This works because the sender and receiver have a common interest in getting the same, correct answer—in coordinating for the signals to mean something. If instead the sender got rewarded when the receiver made bad predictions, then if the receiver could use some correlation between the state of the world and the sender's signals in order to make better predictions, then the sender would have an incentive to change its signaling choices to destroy that correlation. No convention evolves, no information gets transferred. This case is not a matter of a map failing to reflect the territory. Rather, there just is no map.

How is it possible to lie?

This is ... a surprisingly less-easy question. The problem is that, in the formal framework of the sender–receiver game, the meaning of a signal is simply how it makes a receiver update its probabilities, which is determined by the conditions under which the signal is sent. If I say "dog" and four-fifths of the time I point to a dog, but one-fifth of the time I point to a tree, what should a child conclude? Does "dog" mean dog-with-probability-0.8-and-tree-with-probability-0.2, or does "dog" mean dog, and I'm just lying one time out of five? (Or does "dog" mean tree, and I'm lying four times out of five?!) Our sender–receiver game model would seem to favor the first interpretation.

Signals convey information. What could make a signal, information, deceptive?

Traditionally, deception has been regarded as intentionally causing someone to have a false belief. As Bayesians and reductionists, however, we endeavor to pry open anthropomorphic black boxes like "intent" and "belief." As a first attempt at making sense of deceptive signaling, let's generalize "causing someone to have a false belief" to "causing the receiver to update its probability distribution to be less accurate (operationalized as the logarithm of the probability it assigns to the true state)", and generalize "intentionally" to "benefiting the sender (operationalized by the rewards in the sender–receiver game)".
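As a minimal sketch of this first-attempt definition, with hypothetical helper names (`log_score`, `is_deceptive`) and a made-up two-state example, the two conditions can be written as:

```python
import math

def log_score(dist, true_state):
    """Log of the probability the receiver assigns to the true state."""
    p = dist.get(true_state, 0.0)
    return math.log(p) if p > 0 else float("-inf")

def is_deceptive(prior, posterior, true_state, sender_gain):
    """First-attempt criterion: the receiver's accuracy drops AND the sender benefits."""
    misled = log_score(posterior, true_state) < log_score(prior, true_state)
    return misled and sender_gain > 0

# Illustrative numbers only: a signal that moves the receiver toward state "A".
prior = {"A": 0.5, "B": 0.5}
posterior = {"A": 0.9, "B": 0.1}

print(is_deceptive(prior, posterior, true_state="B", sender_gain=1.0))  # True
print(is_deceptive(prior, posterior, true_state="B", sender_gain=0.0))  # False: no benefit
print(is_deceptive(prior, posterior, true_state="A", sender_gain=1.0))  # False: update was accurate
```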

One might ask: why require the sender to benefit in order for a signal to count as deceptive? Why isn't "made the receiver update in the wrong direction" enough?

The answer is that we're seeking an account of communication that systematically makes receivers update in the wrong direction—signals that we can think of as having been optimized for making the receiver make wrong predictions, rather than accidentally happening to mislead on this particular occasion. The "rewards" in this model should be interpreted mechanistically, not necessarily mentalistically: it's just that things that get "rewarded" more, happen more often. That's all—and that's enough to shape the evolution of how the system processes information. There need not be any conscious mind that "feels happy" about getting rewarded (although that would do the trick).

Let's test out our proposed definition of deception on a concrete example. Consider a firefly of the fictional species P. rey exploring a new area in the forest. Suppose there are three possibilities for what this area could contain. With probability 1/3, the area contains another P. rey firefly of the opposite sex, available for mating. With probability 1/6, the area contains a firefly of a different species, P. redator, which eats P. rey fireflies. With probability 1/2, the area contains nothing of interest.

A potential mate in the area can flash the P. rey mating signal to let the approaching P. rey know it's there. Fireflies evolved their eponymous ability to emit light specifically for this kind of sexual communication: potential mates have a common interest in making their presence known to each other. Upon receiving the mating signal, the approaching P. rey can eliminate the predator-here and nothing-here states, and update its what's-in-this-area probability distribution from {1/3 mate, 1/6 predator, 1/2 nothing} to {1 mate}. True information is communicated.

Until "one day" (in evolutionary time), a mutant P. redator emits flashes that imitate the P. rey mating signal, thereby luring an approaching P. rey, who becomes an easy meal for the P. redator. This meets our criteria for deceptive signaling: the P. rey receiver updates in the wrong direction (revising its probability of a P. redator being present downwards from 1/6 to 0, even though a P. redator is in fact present), and the P. redator sender benefits (becoming more likely to survive and reproduce, thereby spreading the mutant alleles that predisposed it to emit P. rey-mating-signal-like flashes, thereby ensuring that this scenario will systematically recur in future generations, even if the first time was an accident because fireflies aren't that smart).

Or rather, this meets our criteria for deceptive signaling at first. If the P. rey population counteradapts to make correct Bayesian updates in the new world containing deceptive P. redators, then in the new equilibrium, seeing the mating signal causes a P. rey to update its what's-in-this-area probability distribution from {1/3 mate, 1/6 predator, 1/2 nothing} to {2/3 mate, 1/3 predator}. But now the counteradapted P. rey is not updating in the wrong direction. If both mates and predators send the same signal, then the likelihood ratio between them is one; the observation doesn't favor one hypothesis more than the other.
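The two updates in the firefly story can be checked with a few lines of exact arithmetic. This sketch uses the prior from the story (1/3 mate, 1/6 predator, 1/2 nothing) and assumes, for simplicity, that every mate flashes and, in the second case, that every predator does too; the function name `posterior` is just illustrative.

```python
from fractions import Fraction as F

PRIOR = {"mate": F(1, 3), "predator": F(1, 6), "nothing": F(1, 2)}

def posterior(prior, likelihood):
    """Bayes' rule: P(state | flash) is proportional to P(flash | state) * P(state)."""
    joint = {s: prior[s] * likelihood[s] for s in prior}
    total = sum(joint.values())
    return {s: p / total for s, p in joint.items()}

# Before the mutant appears: only mates flash.
honest = posterior(PRIOR, {"mate": 1, "predator": 0, "nothing": 0})
# New equilibrium: mates and predators both flash (likelihood ratio of one).
mimicry = posterior(PRIOR, {"mate": 1, "predator": 1, "nothing": 0})

print(honest)   # mate: 1, predator: 0, nothing: 0
print(mimicry)  # mate: 2/3, predator: 1/3, nothing: 0

# A likelihood ratio of one leaves the mate:predator odds where they started.
print(PRIOR["mate"] / PRIOR["predator"], mimicry["mate"] / mimicry["predator"])  # 2 2
```

A receiver still using the `honest` posterior when a predator is flashing assigns probability 0 to the true state, which is the wrong-direction update; the counteradapted receiver instead raises its probability of a predator from 1/6 to 1/3.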

So ... is the P. redator's use of the mating signal no longer deceptive after it's been "priced in" to the new equilibrium? Should we stop calling the flashes the "P. rey mating signal" and start calling it the "P. rey mating and/or P. redator prey-luring signal"? Do we agree with the executive in Moral Mazes who said, "We lie all the time, but if everyone knows that we're lying, is a lie really a lie?"

Some authors are willing to bite this bullet in order to preserve our tidy formal definition of deception. (Don Fallis and Peter J. Lewis write: "Although we agree [...] that it seems deceptive, we contend that the mating signal sent by a [predator] is not actually misleading or deceptive [...] not all sneaky behavior (such as failing to reveal the whole truth) counts as deception".)

Personally, I don't care much about having tidy formal definitions of English words; I want to understand the general laws governing the construction and perversion of shared maps, even if a detailed understanding requires revising or splitting some of our intuitive concepts. (Cailin O'Connor writes: "In the case of deception, though, part of the issue seems to be that we generally ground judgments of what is deceptive in terms of human behavior. It may be that there is no neat, unitary concept underlying these judgments.")

Whether you choose to describe it with the signal/word "deceptive", "sneaky", Täuschung, הונאה, 欺瞞, or something else, something about P. redator's signal usage has the optimizing-for-the-inaccuracy-of-shared-maps property. There is a fundamental asymmetry underlying why we want to talk about a mating signal rather than a 2/3-mating-1/3-prey-luring signal, even if the latter is a better description of the information it conveys.

Brian Skyrms and Jeffrey A. Barrett have an explanation in light of the observation that our sender–receiver framework is a sequential game: first, the sender makes an observation (or equivalently, Nature chooses the type of sender—mate, predator, or null in the story about fireflies), then the sender chooses a signal, then the receiver chooses an action. We can separate out the propositional content of signals from their informational content by taking the propositional meaning to be defined in the subgame where the sender and receiver have a common interest—the branches of the game tree where the players are trying to communicate.
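One crude way to picture the split, and it is only my own toy encoding rather than Skyrms and Barrett's formalism, is to tag each sender type in the firefly story with whether it flashes and whether it shares the receiver's interests. Treating the "nothing" type as a non-flashing, common-interest branch is an assumption made purely for illustration.

```python
from fractions import Fraction as F

# Hypothetical encoding: type -> (prior, flashes?, common interest with receiver?)
SENDERS = {
    "mate":     (F(1, 3), True,  True),
    "predator": (F(1, 6), True,  False),
    "nothing":  (F(1, 2), False, True),
}

def informational_content(senders):
    """Posterior over ALL sender types, given that a flash was seen."""
    joint = {t: p for t, (p, flashes, _) in senders.items() if flashes}
    total = sum(joint.values())
    return {t: p / total for t, p in joint.items()}

def propositional_content(senders):
    """Types that flash within the common-interest branches of the game tree."""
    return {t for t, (_, flashes, common) in senders.items() if flashes and common}

info = informational_content(SENDERS)
prop = propositional_content(SENDERS)
print(info)  # mate: 2/3, predator: 1/3
print(prop)  # {'mate'}

# On this reading, a flash "says" mate, so a flash sent by a predator is false.
print("predator" in prop)  # False
```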

Thus, we see that deception is "ontologically parasitic" in the sense that holes are. You can't have a hole without some material for it to be a hole in; you can't have a lie without some shared map for it to be a lie in. And a sufficiently deceptive map, like a sufficiently holey material, collapses into noise and dust.


I changed the species names in the standard story about fireflies because I can never remember which of Photuris and Photinus is which.

Fallis, Don and Lewis, Peter J., "Toward a Formal Analysis of Deceptive Signaling"

O'Connor, Cailin, Games in the Philosophy of Biology, §5.5, "Deception"

Skyrms, Brian, Signals: Evolution, Learning, and Information, Ch. 6, "Deception"

Skyrms, Brian and Barrett, Jeffrey A., "Propositional Content in Signals"


16 comments

This just seems like a Wittgensteinian Language Game crossed with the Symbol Grounding Problem. It's not so much that "lying can't exist" as "it is impossible to distinguish intentional deception from using different symbols". A person can confidently and truthfully state "two plus two equals duck" - all we need is for "duck" to be their symbol for "four". They're not "lying", or even "wrong" or "incoherent", their symbols are just different. Those symbols are incompatible with our own, but we don't "really" disagree. A different person could, alternatively, say "two plus two equals duck" and be intentionally deceiving - but there's nothing that can be observed about the situation to prove it, just by looking at a transcription of the text. There's also no way, exclusively through textual conversation, to "prove" that another person is using their symbols in the same way as you! Even kiki-bouba effects aren't universal - symbols can be arbitrary, once pulled out of their original context. If everyone's playing their own Language Game, shared maps are illusory - How Can Maps Be Real If Our Words Aren't Real?

P. rey consistently and unambiguously uses the symbol to mean "mating time". P. redator consistently and unambiguously uses the symbol to mean "I would like to eat you, please". Neither, in this language game, is lying to each other, or violating their own norms - but the same behaviour as above happens. Lying is just dependent on reference frame; just because there's a hole in one map doesn't open a hole in another. In any given example of "deception", we can (however artificially) construct a language game where everyone acted honestly. Lying isn't a part of what we can check on the maps here - it's an element of territory, in so far as we could only tell if someone was "really lying" if we could make direct neurological observations. Maybe not even then, if that understanding's some privileged qualia. The only time you can observe a lie with certainty is if you're the liar, as the only beliefs you can directly observe with confidence are your own.

The territory only contains signals, consequences, and benefits. Lying postulates about intention, which is unverifiable from the outside. That doesn't make "lying" meaningless, though - we can absolutely lie, and be certain that we lied - so it has meaning, but it's dependent on reference frame. When two people observe a relativistic object moving at different speeds, they can both provide truthful yet contradictory claims. When each claims the other "lied", in so far as they have their own evidence and certainty, it's a consequence of reference frame. Lying is centrifugal force, signals are centripetal. Both can be treated as real when useful for analysis. Hooray for compatibilism!

Wait, this doesn't seem right. Say 49% of people are good and truthful, 49% are evil and truthful, and 2% are evil liars. You meet a random person and are deciding whether to be friends with them. A priori they're about equally likely to be good or evil. You ask "are you good?" They say "yeah". Now they are much more likely to be good than evil. So if the person is in fact an evil liar, their lie had the intended effect on you. It wasn't "priced into the equilibrium" or anything.

The technical explanation is still correct in the narrow sense - the message can be interpreted as "I'm either good or an evil liar", and it does increase the probability of "evil liar". But at the same time it increases the probability of "good" relative to "evil" overall, and often that's what matters.
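The numbers in this example can be worked through exactly. The sketch below assumes truthful people answer "are you good?" honestly and evil liars always answer "yeah"; the type labels are illustrative names.

```python
from fractions import Fraction as F

prior = {
    "good_truthful": F(49, 100),
    "evil_truthful": F(49, 100),
    "evil_liar":     F(2, 100),
}
# P(answers "yeah" | type), under the assumptions stated above.
says_yes = {"good_truthful": 1, "evil_truthful": 0, "evil_liar": 1}

joint = {t: prior[t] * says_yes[t] for t in prior}
total = sum(joint.values())
post = {t: p / total for t, p in joint.items()}

p_evil_before = prior["evil_truthful"] + prior["evil_liar"]
p_evil_after = post["evil_truthful"] + post["evil_liar"]

print(post["good_truthful"])         # 49/51
print(post["evil_liar"])             # 2/51, up from 2/100
print(p_evil_before, p_evil_after)   # 51/100 2/51
```

So the probability of "evil liar" does go up, as the narrow technical reading says, while the probability of "evil" overall drops from 51/100 to 2/51.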

Agreed. In this post, lying is reinterpreted as an honest signal that there are liars in our midst. The response to lying happens through a passive process of evolution.

In the human world, to accuse somebody of lying not only means that we’ve updated our probabilities on whether they’re a liar, but is also a signal to others that we should fight back, to the liar that they’ve been discovered, and to ourselves that we should protect ourselves against the threat. You could say we have an active cultural immune system against lying. “Lying” is a reference to deceptive human behavior that takes place within this context.

What would a collective of AI agents do? Hard to say. Maybe something akin to what we do, or maybe something entirely different due to its greater speed, intelligence, and different construction.

Under typical game-theoretic assumptions, we would assume all players to be strategic. In that context, it seems much more natural to suppose that all evil people would also be liars.

For me, the main point of the post is that you can't simultaneously (1) buy the picture in which the meaning of a signal is what it probabilistically implies, and (2) have a sensible definition of "lying". So when you say "2% are evil liars", Zack's response could be something like, "what do you mean by that?" -- you're already assuming that words mean things independent of what they signal, which is the assumption Zack is calling out here (on my reading).

"Under typical game-theoretic assumptions, we would assume all players to be strategic. In that context, it seems much more natural to suppose that all evil people would also be liars."

Why? Maybe some evil people are ok with kicking puppies but not with lying - that's part of their utility function. (If such differences in utility functions can't exist, then there's no such thing as "good" or "evil" anyway.)

There are simpler examples where identifying deception seems more straightforward. For example, if a non-venomous snake takes on the same coloration as a venomous snake, this is intended to increase others' estimates of p(venomous) and reduce their estimates of p(not venomous), which is a straightforward update in the wrong direction.

In the first attempt at a definition of deceptive signalling, it seems like a mistake to only look at the probability assigned to the true state ("causing the receiver to update its probability distribution to be less accurate (operationalized as the logarithm of the probability it assigns to the true state)"). Actions are based on the receiver's full probability distribution, not just the probability assigned to the true state. In the firefly example, P. rey is updating in the right direction on p(predator) (and on p(nothing)), but in the wrong direction on p(mate). And their upward update on p(mate) seems to be what's driving the predator's choice of signal. Some signs of this:

The predator mimicked the signal that the mates were using, when it could have caused a larger correct update to p(predator) and reversed the incorrect update to p(mate) by choosing any other signal. Also, P. redator chose the option that maximized the prey's chances of approaching it, and the prey avoids locations when p(predator) is sufficiently high. If we model the prey as acting according to a utility function, the signal caused the prey to update its expected utility estimate in the wrong direction by causing it to update one of its probabilities in the wrong direction (equivalently: the prey updated the weighted average of its probabilities in the wrong direction, where the weights are based on the relevant utilities). We could also imagine hypothetical scenarios, like if the predator was magically capable of directly altering the prey's probability estimates rather than being limited to changing its own behavior and allowing the prey to update. 

Here's a toy example which should make it clearer that the probability assigned to the true state is not the only relevant update.

Let's say that a seeker is searching for something, and doesn't know whether it is in the north, east, south, or west. If the object is in the north, then it is best for the seeker to go towards it (north), worst for the seeker to go directly away from it (south), and intermediate for them to go perpendicular to it (east or west). The seeker meets a witness who knows where the thing is. The majority (2/3) of witnesses want to help the seeker find it and the rest (1/3) want to hinder the seeker's search. And they have common knowledge of all of this.

In this case, the witness can essentially just direct the seeker's search - if the witness says "it's north" then the seeker goes north, since 2/3 of witnesses are honest. So if it's north and the witness wants to hinder the seeker, they can just say "it's south". This seems clearly deceptive - it's hindering the seeker's search as much as possible by messing up their beliefs. But pointing them south does actually lead to a right-direction update on the true state of affairs, with p(north) increasing from 1/4 (the base rate) to 1/3 (the proportion of witnesses who aim to hinder). It's still a successful deception because it increases p(south) from 1/4 to 2/3, and that dominates the seeker's choice.
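The seeker's update can be checked directly. This sketch assumes the object is equally likely to be in each direction, that helpful witnesses (2/3) name the true direction, and that hindering witnesses (1/3) always name the opposite one; the helper names are illustrative.

```python
from fractions import Fraction as F

DIRECTIONS = ["north", "east", "south", "west"]
OPPOSITE = {"north": "south", "south": "north", "east": "west", "west": "east"}
P_HELPFUL = F(2, 3)

def p_report(report, location):
    """P(witness says `report` | object is at `location`)."""
    p = F(0)
    if report == location:            # helpful witness tells the truth
        p += P_HELPFUL
    if report == OPPOSITE[location]:  # hindering witness points directly away
        p += 1 - P_HELPFUL
    return p

def posterior(report):
    """Seeker's beliefs after hearing `report`, starting from a uniform prior."""
    joint = {loc: F(1, 4) * p_report(report, loc) for loc in DIRECTIONS}
    total = sum(joint.values())
    return {loc: p / total for loc, p in joint.items()}

post = posterior("south")
print(post["north"])  # 1/3, up from the 1/4 base rate
print(post["south"])  # 2/3, up from 1/4
```

If the object is actually north, hearing "it's south" raises p(north) from 1/4 to 1/3, so the log score on the true state improves, yet the seeker still heads south because p(south) rises to 2/3.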

I'm glad you're bringing sender-receiver lit into this discussion! It's been useful for me to ground parts of my thinking. What follows is almost-a-post's worth of, "Yes, and also..."

Stable "Deception" Equilibrium

The firefly example showed how an existing signalling equilibrium can be hijacked by a predator. What once was a reliable signal becomes unreliable. As you let things settle into equilibrium, the signal of seeing a light should lose all informational content (or at least, it should not give any new information about whether the signal is coming from a mate or a predator).

Part of what ensures this result is the totally opposed payoffs of P.rey and P.redator. In any signalling game where the payouts are zero-sum, there isn't going to be an equilibrium where the signals convey information.

More complex varied payouts can have more interesting results:

[Figure omitted: an example from one of Skyrms' books]

Again, at the level of the sender-receiver game this is deception, but it still feels a good bit different from what I intuitively track as deception. This might be best stated as an example of "equilibrium of ambiguous communication as a result of semi-adversarial payouts."


I would not speculate on the mental life of bees; to talk of the mental life of bacteria seems absurd; and yet signalling plays a vital biological role in both cases.

I want to emphasize that the sender-receiver model and Skyrms' use of "informational content" are not meant to provide an explanation of intent. Information is meant to be more basic than intent, and present in cases (like bacteria) where there seems to be no intent. Skyrms seems to be responding to some scholars who want to say "intent is what defines communication!", and like Skyrms, I'm happy to say that communication and signals seem to cover a broad class of phenomena, of which intent would be a super-specialized subset.

For my two cents, I think that intent in human communication involves both goal-directedness and having a model of the signalling equilibrium that can be plugged into an abstract reasoning system.

In sender-receiver games, the learning of a signalling strategy often happens either through replicator dynamics or a very simple Roth-Erev reinforcement learning. These are simple mechanisms that act quite directly and don't afford any reflection on the mechanism itself. Humans can not only reliably send a signal in the presence of a certain stimulus, but can also do "I'm bored, I know that if I shout 'FIRE!' Sarah is gonna jump out of her skin, and then I'll laugh at her being surprised." Another fun example that seems to rely on being able to reason about the signalling equilibrium itself is "what would I have to text you to covertly convey I've been kidnapped?"

I think human communication is always a mix of intentional and non-intentional communication, as I explore in another post. When it comes to deception, while a lot of people seem to want to use intention to draw the boundary between "should punish" and "shouldn't punish", I see it more as a question of "what sort of optimization system is working against me?" I'm tempted to say "intentional deception is more dangerous because that means the full force of their intellect is being used to deceive you, as opposed to just their unconscious" but that wouldn't be quite right. I'm still developing thoughts on this.

Far from equilibrium

I expect it's most fruitful to think of human communication as an open system that's far from equilibrium, most of the time. Thinking of equilibrium helps me think of directions things might move, but I don't expect everyone's behavior to be "priced into" most environments.

I still think this is one of the best recent posts. Well-researched and well-presented, and calls into question some of my tacit assumptions about how words work.

Going back to your plain English definition of deception:

intentionally causing someone to have a false belief

notice that it is the liar's intention for the victim to have a false belief. That requires the liar to know the victim's map!

So I would distinguish between intentionally lying and intentionlessly misleading.

P. redator is merely intentionlessly misleading P. rey. The decision to mislead P. rey was made by evolution, not by P. redator. On the other hand, if I were hungry and wanted to eat a P. rey, and made mating sounds, I would be intentionally lying. My map contains a map of P. rey's map, and it is my decision, not evolution's, to exploit the signal.

"causing the receiver to update its probability distribution to be less accurate"

This is an undesired consequence of deception (undesired by the liar, that is), so it seems strange to use it as part of the definition of deception. An ideal deceiver leaves its victim's map intact, so that it can exploit it again in the future.

"This is an undesired consequence of deception (undesired by the liar, that is), so it seems strange to use it as part of the definition of deception. An ideal deceiver leaves its victim's map intact, so that it can exploit it again in the future."

Yes. But I think the question this post is trying to pose is "can lying actually exist, in practice, in equilibrium?" or something similar. (I'm guessing the goal here is for Zack to wrap his brain around those executives who say "everyone knows we're lying so we're not lying" and have a crisp understanding of what's going on).

One thing that sticks out with the Plausibly Deniable Executives is that they are creating noise where there previously wasn't any, where they are probably "actually lying" about at least some things (i.e., CEO Alice says words that customer Bob definitely interprets in a way that leaves Bob misled), but it's harder to call them on it because Alice says so many god damn things with varying degrees of plausible deniability that it's hard to notice and reason about.

Alice might be less like P. redator and more like a squid clouding the waters with an inkjet.

I like the attempt to separate intent from effect. I don't think you've quite succeeded in this, though - you probably need new words - "deceptive", "lie", and the like are VERY entangled with common social judgement and signaling (which themselves are often map-manipulating uses).

It may also help to separate intent/causality of behavior at different times. P.redator is, presumably, evolved rather than using a cognitive model, but the lesson applies the same. It's adopted a behavior (mimicking P.rey's mating signal) BECAUSE it misleads P.rey. This adoption can be termed "deception", and once P.rey has adapted, continuing the behavior is less effective (but not zero, or P.redator would evolve not to pay the cost of the signal).

The impact side of deception is of course not binary. P.rey has a strong mating signal before P.redator takes advantage of it, but even after adaptation, P.rey now only has a weaker signal available. P.redator's behavior continues to add noise, even after P.rey "knows" about the lie.

Is the common usage of "deception" equivalent to "injected noise with a causal tie from conflicting beliefs"? Perhaps - I haven't deeply considered counter-examples, and I'd like to add in the concept that if the deceiver is more powerful than the victim (can model the victim, and/or adapt faster), the deception is more than just noise, it's actually negative information.

In the scenario where both benefit from honest communication, the tasks from which they get their rewards don't need to be the same; they can differ. If the task is different, the action is probably different, so they are probably also using different representations to process the information. It is an interoperable meaning, but it doesn't need to be shared.

I was expecting a different concept to be formed. What I here now dub "exploitative communication" is when you could have sent a signal that would have resulted in more success for the receiver, but instead you send a different signal that results in more success for you. This doesn't refer to beliefs. And it is clearer that defection can be separate from deception.

I wouldn't be that surprised if the main use case for deception would be exploitation. However, in magic, deception can be cooperative. And I think it would be hard to think that camouflage is some sort of lie, but it is easier to think of it as exploiting a visual system for quiet. In a kind of hypnosis way, the prey is asking the eye not to see it, and it is complying because of how expertly the ask was phrased. It is not that data is not received or processed, but just that the outcome is favourable.

It wouldn't be that hard to call the twinkle a luring signal, whether in the allure sense or the fish-lure sense. In that way, both the mate and the predator use it to move the target to their proximity for their own advantage. The difference is that for the predator it is to the target's disadvantage.

What the selection is focused on is action rather than beliefs, although if our agent manages actions via beliefs there might be a coupling. But the coupling need not exist.

(I haven't read any of the literature on deception you cite, so this is my uninformed opinion.)

I don't think there's any propositional content at all in these sender-receiver games. As far as the P.redator is concerned, the signal means "I want to eat you" and the P.rey wants to be eaten.

If the environment were somewhat richer, the agents would model each other as agents, and they'd have a shared understanding of the meaning of the signals, and then I'd think we'd have a better shot of understanding deception.
