Dishonest Update Reporting

Zvi

Related to: Asymmetric Justice, Privacy, Blackmail

Previously (Paul Christiano): Epistemic Incentives and Sluggish Updating

The starting context here is the problem of what Paul calls sluggish updating. Bob is asked to predict the probability of a recession this summer. He said 75% in January, and now believes 50% in February. What to do? Paul sees Bob as thinking roughly this:

If I stick to my guns with 75%, then I still have a 50-50 chance of looking smarter than Alice when a recession occurs. If I waffle and say 50%, then I won’t get any credit even if my initial prediction was good. Of course if I stick with 75% now and only go down to 50% later then I’ll get dinged for making a bad prediction right now—but that’s little worse than what people will think of me immediately if I waffle.

Paul concludes that this is likely:

Bob’s optimal strategy depends on exactly how people are evaluating him. If they care exclusively about evaluating his performance in January then he should always stick with his original guess of 75%. If they care exclusively about evaluating his performance in February then he should go straight to 50%. In the more realistic case where they care about both, his optimal strategy is somewhere in between. He might update to 70% this week.

This results in a pattern of “sluggish” updating in a predictable direction: once I see Bob adjust his probability from 75% down to 70%, I expect that his “real” estimate is lower still. In expectation, his probability is going to keep going down in subsequent months. (Though it’s not a sure thing—the whole point of Bob’s behavior is to hold out hope that his original estimate will turn out to be reasonable and he can save face.)
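A toy model may make the predictability point concrete. It is a sketch under assumptions that are not in Paul's post. Bob's honest belief updates by Bayes' rule on binary signals, with hypothetical likelihoods of 0.7 and 0.3. His public number is a fixed blend of his old public number and his honest belief, weighted 0.8 toward the old one; that weight happens to reproduce the move from 75% to 70%. An honest Bayesian's expected next belief equals his current belief (conservation of expected evidence), so no one can predict which way it will move. The blended report, by contrast, is expected to keep falling. All function names are illustrative.

```python
def posterior(prior: float, like_h: float, like_not_h: float) -> float:
    """Bayes' rule: P(H | signal) from P(H), P(signal | H) and P(signal | not H)."""
    num = prior * like_h
    return num / (num + (1 - prior) * like_not_h)


def expected_next_belief(p: float, q_h: float = 0.7, q_not_h: float = 0.3) -> float:
    """Expected honest belief after one more binary signal.

    q_h and q_not_h are the (hypothetical) probabilities that the signal
    comes out positive if H is true and if H is false, respectively.
    """
    p_pos = p * q_h + (1 - p) * q_not_h             # P(positive signal)
    if_pos = posterior(p, q_h, q_not_h)              # belief after a positive signal
    if_neg = posterior(p, 1 - q_h, 1 - q_not_h)      # belief after a negative signal
    return p_pos * if_pos + (1 - p_pos) * if_neg


def sluggish_report(old_report: float, honest: float, weight_old: float = 0.8) -> float:
    """Hypothetical reporting rule: blend the old public number with the honest belief."""
    return weight_old * old_report + (1 - weight_old) * honest


honest = 0.5                             # Bob's real February estimate
report = sluggish_report(0.75, honest)   # 0.8 * 0.75 + 0.2 * 0.5
# The rule is linear, so the expected next report blends in the expected next belief.
next_report = sluggish_report(report, expected_next_belief(honest))

print(round(expected_next_belief(honest), 10))  # 0.5  (no predictable drift)
print(round(report, 10))                         # 0.7
print(round(next_report - report, 10))           # -0.04 (predictably downward)
```

Anyone who knows the blending rule can bet on the direction of Bob's next report. Nobody can do that against an honest Bayesian.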

This isn’t ‘sluggish’ updating, of the type we talk about when we discuss the Aumann Agreement Theorem and its claim that rational parties can’t agree to disagree. It’s dishonest update reporting. As Paul says, explicitly.

I think this kind of sluggish updating is quite common—if I see Bob assign 70% probability to something and Alice assign 50% probability, I expect their probabilities to gradually inch towards one another rather than making a big jump. (If Alice and Bob were epistemically rational and honest, their probabilities would immediately take big enough jumps that we wouldn’t be able to predict in advance who will end up with the higher number. Needless to say, this is not what happens!)

Unfortunately, I think that sluggish updating isn’t even the worst case for humans. It’s quite common for Bob to double down with his 75%, only changing his mind at the last defensible moment. This is less easily noticed, but is even more epistemically costly.

When Paul speaks of Bob’s ‘optimal strategy’ he does not include a cost to lying, or a cost to others getting inaccurate information.

This is a world where all one cares about is how one is evaluated, and lying and deceiving others is free as long as you’re not caught. You’ll get exactly what you incentivize.

What that definitely won’t get you is a lot more than just accurate probability estimates.

The only way to get accurate probability estimates from Bob-who-is-happy-to-strategically-lie is to use a mathematical formula to reward Bob based on his log likelihood score, or to have Bob bet in a prediction market, or some other similarly robust method. And then to use that as the entirety of how one evaluates Bob. If human judgment is allowed in the process, the value of playing to that judgment will overwhelm any desire on Bob’s part to be precise or properly update.
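Here is a minimal sketch of why a pure log-score reward works. It assumes Bob is paid only his log score and privately believes 50%. His expected score is then highest when he reports exactly his belief, so shading toward 70% strictly costs him. This is the standard sense in which the log score is a "proper" scoring rule. The function names are illustrative.

```python
import math


def log_score(p: float, happened: bool) -> float:
    """Log scoring rule: the log of the probability Bob gave to what actually happened."""
    return math.log(p if happened else 1 - p)


def expected_log_score(report: float, belief: float) -> float:
    """Bob's expected score from announcing `report` while privately believing `belief`."""
    return belief * log_score(report, True) + (1 - belief) * log_score(report, False)


belief = 0.5
candidates = [i / 100 for i in range(1, 100)]    # reports 0.01 .. 0.99
best = max(candidates, key=lambda r: expected_log_score(r, belief))
print(best)                                                                 # 0.5
print(expected_log_score(0.70, belief) < expected_log_score(0.50, belief))  # True
```

The guarantee only holds if this score is the whole of Bob's evaluation. Any extra reward for looking consistent reintroduces the incentive to shade.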

Since Bob is almost certainly in a human context where humans are evaluating him based on human judgments, that means all is mostly lost.

As Paul notes, consistency is crucial in how one is evaluated. Even bigger is avoiding mistakes.

Given the asymmetric justice of punishing mistakes and inconsistency that can be proven and identified, the strategic actor must seek cognitive privacy. The more others know about the path of your beliefs, the easier it will be for them to spot an inconsistency or a mistake. It’s hard enough to give a reasonable answer once, but updating in a way that can never be shown to have made a mistake or been inconsistent? Impossible.

Mistakes and inconsistencies are the bad things one must avoid getting docked points for.

Thus, Bob’s full strategy, in addition to choosing probabilities that sound best and give the best cost/benefit payoffs in human intuitive evaluations of performance, is to avoid making any clear statements of any kind. When he must do so, he will do his best to be able to deny having done so. Bob will seek to destroy the historical record of his predictions and statements, and their path. And also prevent the creation of any common knowledge, at all. Any knowledge of the past situation, or the present outcome, could be shown to not be consistent with what Bob said, or what we believe Bob said, or what we think Bob implied. And so on.

Bob’s optimal strategy is full anti-epistemology. He is opposed to knowledge.

In that context, Paul’s suggested solutions seem highly unlikely to work.

His first suggestion is to exclude information: to judge Bob only by the aggregation of all of Bob’s predictions, and ignore any changes. Not only does this throw away vital information, it also isn’t realistic. Even if it were realistic for some people, others would still punish Bob for updating.

Paul’s second suggestion is to make predictions about others’ belief changes, which he himself notes ‘literally wouldn’t work.’ And that it is ‘a recipe for epistemic catastrophe.’ The whole thing is convoluted and unnatural at best.

Paul’s third and final suggestion is social disapproval of sluggish updating. As he notes, this twists social incentives potentially in good ways but likely in ways that make things worse:

Having noticed that sluggish updating is a thing, it’s tempting to respond by just penalizing people when they seem to update sluggishly. I think that’s a problematic response:

I think the rational reaction to norms against sluggish updating may often be no updating at all, which is much worse.
In general combating non-epistemic incentives with other non-epistemic incentives seems like digging yourself into a hole, and can only work if you balance everything perfectly. It feels much safer to just try to remove the non-epistemic incentives that were causing the problem in the first place.
Sluggish updating isn’t easy to detect in any given case. For example, suppose that Bob expects an event to happen, and if it does he expects to get a positive sign on any given day with 1% probability. Then if the event doesn’t happen his probability will decay exponentially towards zero, falling in half every ~70 days. This will look like sluggish updating.
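Paul's 1%-per-day example can be checked with a few lines of Bayes' rule. This minimal sketch assumes, as in his example, that if the event is coming a positive sign appears on each day independently with probability 1%, and that no sign ever appears otherwise. Every quiet day then multiplies Bob's odds on the event by 0.99. The odds therefore halve every ln 2 / -ln 0.99 ≈ 69 days, and once the probability is small it halves at about the same rate. The function name is illustrative.

```python
import math


def belief_after_quiet_days(prior: float, days: int, daily_sign_prob: float = 0.01) -> float:
    """P(event) after `days` days with no positive sign.

    Assumes a sign appears each day independently with probability
    `daily_sign_prob` if the event is coming, and never appears otherwise.
    """
    likelihood_if_event = (1 - daily_sign_prob) ** days   # P(no sign yet | event)
    num = prior * likelihood_if_event
    return num / (num + (1 - prior))                      # P(no sign | no event) = 1


half_life = math.log(2) / -math.log(1 - 0.01)   # days for the odds to halve
print(round(half_life, 1))                       # 69.0
print(belief_after_quiet_days(0.01, 69))         # about 0.005, roughly half the 1% prior
```

This is perfectly honest Bayesian updating, yet from outside it looks exactly like a slow, grudging retreat.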

Bob already isn’t excited about updating. He’d prefer to not update at all. He’s upset about having had to give that 75% answer, because now if there’s new information (including others’ opinions) he can’t keep saying ‘probably’ and has to give a new number, again giving others information to use as ammunition against him.

The reason he updated visibly, at all, was that not updating would have been inconsistent or otherwise punished. Punish updates for being too small on top of already looking bad for changing at all, and the chance you get the incentives right here is almost zero. Bob will game the system, one way or another. And now, you won’t know how Bob is doing it. Before, you could know that Bob moving from 75% to 70% meant going to something lower, perhaps 50%. Predictable bad calibration is much easier to fix. Twist things into knots and there’s no way to tell.

Meanwhile, Bob is going to reliably get evaluated as smarter and more capable than Alice, who for reasons of principle is going around reporting her probability estimates accurately. Those observing might even punish Alice further, as someone who does not know how the game is played, and would be a poor ally.

The best we can do, under such circumstances, if we want insight from Bob, is to do our best to make Bob believe we will reward him for updating correctly and reporting that update honestly, then consider Bob’s incentives, biases and instincts, and attempt as best we can to back out what Bob actually believes.

As Paul notes, we can try to combat non-epistemic incentives with equal and opposite other non-epistemic incentives, but going deep on that generally only makes things more complex and rewards more attention to our procedures and how to trick us, giving Bob an even bigger advantage over Alice.

A last-ditch effort would be to give Bob sufficient skin in the game. If Bob directly benefits enough from us having accurate models, Bob might report more accurately. But outside of very small groups, there isn’t enough skin in the game to go around. And that still assumes Bob thinks the way for the group to succeed is to be honest and create accurate maps. Whereas most people like Bob do not think that is how winners behave. Certainly not with vague things that don’t have direct physical consequences, like probability estimates.

What can be done about this?

Unless we care enough, very little. We lost early. We lost on the meta level. We didn’t Play in Hard Mode.

We accepted that Bob was optimizing for how Bob was evaluated, rather than Bob optimizing for accuracy. But we didn’t evaluate Bob on that basis. We didn’t place the virtues of honesty and truth-seeking above the virtue of looking good sufficiently to make Bob’s ‘look good’ procedure evolve into ‘be honest and seek truth.’ We didn’t work to instill epistemic virtues in Bob, or select for Bobs with or seeking those virtues.

We didn’t reform the local culture.

And we didn’t fire Bob the moment we noticed.

Game over.

I once worked for a financial firm that made this priority clear. On the very first day. You need to always be ready to explain and work to improve your reasoning. If we catch you lying, about anything at all, ever, including a probability estimate, that’s it. You’re fired. Period.

It didn’t solve all our problems. More subtle distortionary dynamics remained, and some evolved as reactions to the local virtues, as they always do. For these and other reasons, that I will not be getting into here or in the comments, it ended up not being a good place for me. Those topics are for another day.

But they sure as hell didn’t have to worry about the likes of Bob.

There is a strategy that is almost mentioned here, but not pursued, that I think is near-optimal - explaining your reasoning as a norm. This is the norm I have experienced in the epistemic community around forecasting. (I am involved in both Good Judgment, where I was an original participant and have resumed work, and Metaculus's AI instance. The two are very similar in that regard.)

If such explanation is a norm, or even a possibility, the social credit for updated predictions will normally be apportioned based on the reasoning as much as the accuracy. And while individual Brier scores are useful, forecasters who provide mediocre calibration but excellent public reasoning and evidence, which others then use, are more valuable for an aggregate forecast than excellent forecasters who explain little or nothing.
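For reference, here is a minimal sketch of the Brier score in its common single-outcome form: the mean squared difference between each forecast probability and the 0/1 outcome, where lower is better. Brier's original formulation sums over both outcomes, which doubles these values for yes/no questions. The function name is illustrative.

```python
def brier_score(forecasts):
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better).

    `forecasts` is a list of (probability, happened) pairs.
    """
    return sum((p - (1.0 if happened else 0.0)) ** 2 for p, happened in forecasts) / len(forecasts)


print(brier_score([(0.75, True), (0.5, False)]))  # 0.15625
```

A score like this captures calibration and accuracy only. It gives no credit for the public reasoning that other forecasters reuse.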

If Bob wants social credit for his estimate in this type of community, he needs to publicly explain his model - at least in general. (This includes using intuition as an input - there are superforecasters who I update towards based purely on claims that the probability seems too low / high.) Similarly, if Bob wants credit for updating, he needs to explain his updated reasoning - including why he isn't updating based on evidence that prompted Alice's estimate, which would usually have been specified, or updated based on Alice's stated model and her estimate itself. If Bob said 75% initially, but now internally updates to think 50%, it will often be easier to justify a sudden change based on an influential datapoint, rather than a smaller one using an excuse.

Right. I kinda implied it was part of the solution but didn't say it explicitly enough, and may edit.

The problem for implementation, of course, is that explaining your reasoning is toxic in worlds with the models we describe. It's the opposite of not taking positions, staying hidden and destroying records. It opens you up to being blamed for any aspect of your reasoning. That's pretty terrible. It's doubly terrible if you're in any sort of double-think equilibrium (see SSC here). Because now, you can't explain your reasoning.

Political contexts are poisonous, of course, in this and so many other ways, so politics should be kept as small as possible. In most contexts, however, including political ones, the solution is to give no credit for those that don't explain, or even to assign negative credit for punditry that isn't demonstrably more accurate than the crowd - which leads to a wonderful incentive to shut up unless you can say something more than "I think X will happen."
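One hypothetical way to implement negative credit for punditry that isn't more accurate than the crowd is to score each forecast relative to the consensus: your log score minus the crowd's log score on the same outcome. This is a sketch of the idea, not any particular site's scoring system. Under it, simply repeating the crowd's number earns exactly zero, so there is nothing to gain from speaking unless you know something the crowd doesn't.

```python
import math


def relative_log_score(p_you: float, p_crowd: float, happened: bool) -> float:
    """Hypothetical crowd-relative score: your log score minus the crowd's.

    Repeating the crowd's number earns exactly 0; beating it earns positive
    credit, and doing worse than it earns negative credit.
    """
    def log_score(p: float) -> float:
        return math.log(p if happened else 1 - p)

    return log_score(p_you) - log_score(p_crowd)


print(relative_log_score(0.6, 0.6, True))       # 0.0
print(relative_log_score(0.8, 0.6, True) > 0)   # True
print(relative_log_score(0.8, 0.6, False) < 0)  # True
```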

And in collaborative contexts, people are happy to give credit for mostly correct thinking that assist their own, rather than attack for mistakes. We should stay in those contexts and build them out where possible - positive sum thinking is good, and destroying, or at least ignoring, negative sum contexts is often good as well.

The ideal thing is to judge Bob as if he were making the same prediction every day until he makes a new one, and log-score all of them when the event is revealed. (That is, if Bob says 75% on January 1st and 60% on February 1st, and then on March 1st the event is revealed to have happened, Bob's score equals 31*log(.75) + 28*log(.6).) Then Bob's best strategy is to update his prediction to his actual current estimate as often as possible; past predictions are sunk costs.
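The daily-scoring idea is easy to state in code. This minimal sketch assumes forecasts arrive as (date, probability) pairs. It uses dates in a non-leap year (2019, chosen only for illustration) so that January contributes 31 days and February 28. For an event that happened, each day a forecast stands contributes the log of the probability it assigned to the event. The function name is hypothetical.

```python
import math
from datetime import date


def time_weighted_log_score(forecasts, resolution_date, happened):
    """Score a forecast as if it were re-issued every day until replaced.

    `forecasts` is a list of (date, probability) pairs sorted by date. Each
    probability is log-scored once per day it stood, up to the next forecast
    or, for the last one, up to the resolution date.
    """
    total = 0.0
    for i, (start, p) in enumerate(forecasts):
        end = forecasts[i + 1][0] if i + 1 < len(forecasts) else resolution_date
        days = (end - start).days
        total += days * math.log(p if happened else 1 - p)
    return total


bob = [(date(2019, 1, 1), 0.75), (date(2019, 2, 1), 0.60)]
score = time_weighted_log_score(bob, resolution_date=date(2019, 3, 1), happened=True)
print(math.isclose(score, 31 * math.log(0.75) + 28 * math.log(0.60)))  # True
```

Every day a stale number stands is scored again. Leaving an outdated forecast in place therefore costs Bob for as long as he delays.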

The real-world version is remembering to dock people's bad predictions more, the longer they persisted in them. But of course this is hard.

538 did do this with their self-evaluation, which is a good way to try and establish a norm in the domain of model-driven reporting.

Yes, that seems right, if it can be used as the sole criterion, and be properly normalized for the time frames and questions involved. There are big second-level Goodhart traps lying in wait if people care about this metric.

In a prediction market your belief is not shared, but contributes to the consensus (the market price of a futures contract). Many traders become agnostic about a question (close their position) before the underlying fact of the matter is revealed (delivery), perhaps shortly after stating the direction in which they expect the consensus to move (opening the position), to contribute (profit from) their rare knowledge while it remains rare. Requiring traders to own up to a prediction (hold to delivery) interferes with efficient communication of rare information into common knowledge (market price).

So consider declaring that the consensus is shifting in a particular direction, without explaining your reasoning, and then bowing out of the discussion shortly after (taking note of how the consensus shifted in the interim). This seems very strange when compared to common norms, but I think something in this direction could work.

A key active ingredient here seems to be that exact ability to disguise your true position. Even if someone knows your trades, they don't know why you did them. You could have a different fair value (probability estimate), you could be hedging risk, you could expect the price to move in a direction without thinking that move is going to be accurate, and so on.

By not requiring the trader to be pinned down to anything (except profit and loss) we potentially extract more information.

And all of that applies to non-prediction markets, too.

Note that most markets don't have any transparency about who buys or sells, and external factors are often more plausible reasons than a naive outsider expects. A drop in the share price of a retailer could reflect lower confidence in its future earnings. It could also result from a margin call on a firm that had made a big bet on the retailer and needed to unwind it. Or it could even be because a firm that was optimistic about the retailer decided to double down, and moved a large call options position out 6 months, so that their counterparty sold to hedge their delta. There is no way to tell the difference. (Which is why almost all market punditry is not only dishonest, but laughable once you've been on the inside.)

In a (deep enough, which is an unsolved problem) prediction market, there is a clear mechanism to be rewarded for indicating that your private beliefs differ from the consensus. When they no longer differ, it doesn't matter whether you close out your position or not.
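Here is a minimal sketch of the last point. It assumes a binary contract that pays 1 if the event happens, no fees or interest, and a risk-neutral trader. Once the market price has moved all the way to the trader's belief, selling now and holding to delivery have the same expected value. At that point the trader's private information has already been paid for and absorbed into the price. Names and numbers are illustrative.

```python
import math


def profit_closing_early(shares: float, entry_price: float, exit_price: float) -> float:
    """Profit from buying `shares` binary contracts and selling before resolution."""
    return shares * (exit_price - entry_price)


def expected_profit_holding(shares: float, entry_price: float, belief: float) -> float:
    """Expected profit from holding to delivery for a trader whose P(event) is `belief`.

    Each contract is assumed to pay 1 if the event happens and 0 otherwise.
    """
    return shares * (belief - entry_price)


# The trader believes 0.7 while the market says 0.5, buys 100 contracts,
# and the market later moves all the way to the trader's belief.
early = profit_closing_early(100, 0.5, 0.7)
hold = expected_profit_holding(100, 0.5, 0.7)
print(math.isclose(early, hold))  # True
```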

In fact, you're right that you're really publishing a difference between current consensus and your private beliefs about future consensus, which may differ from truth, but that difference is an opportunity for future participants who will get paid when the prediction resolves.

Holding to delivery is already familiar for informal communication. But short-term speculation is a different mode of contributing rare knowledge into consensus that doesn't seem to exist for discussions of beliefs that are not on prediction markets, and breaks many assumptions about how communication should proceed. In particular it puts into question the virtues of owning up to your predictions and of regularly publishing updated beliefs.

I'm confused whether we're talking about informal communication, where holding to delivery is the norm because nobody actually cares about the results, or about endorsed public predictions that we want to make decisions based on. I don't think the problems nor their solutions are the same for these different kinds of predictions.

By "informal" I meant that the belief is not on a prediction market, so you can influence consensus only by talking, without carefully keeping track of transactions. (I disagree with it being appropriate not to care about results in informal communication, so it's not a distinction I was making.)

exploring here, not sure where it'll go.

What is the value, to whom, of the predictions being correct? The interesting cases are ones where there is something performing the function of a prediction market in feeding back some value for correct and surprising predictions. All else is "informal" and mostly about signaling rather than truth.

The value of caring about informal reasoning is in training the same skills that apply for knowably important questions, and in seemingly unimportant details adding up in ways you couldn't plan for. Existence of a credible consensus lets you use a belief without understanding its origin (i.e. without becoming a world-class expert on it), so doesn't interact with those skills.

When correct disagreement of your own beliefs with consensus is useful at scale, it eventually shifts the consensus, or else you have a source of infinite value. So almost any method of deriving significant value from private predictions being better than consensus is a method of contributing knowledge to consensus.

(Not sure what you were pointing at, mostly guessing the topic.)

For oneself, caring about reasoning and correct predictions is well worthwhile. And it requires some acknowledgement that your beliefs are private, and that they are separate from your public claims. Forgetting that this applies to others as well as yourself seems a bit strange.

I may be a bit too far on the cynicism scale, but I start with the assumption that informal predictions are both oversimplified to fit the claimant's model of their audience, and adjusted in direction (from the true belief) to have a bigger impact on their audience.

That is, I think most public predictions are of the form "you should have a higher credence in X than you seem to", but for greater impact STATED as "you should believe X".

I don't like reifying this as dishonesty when the outside view on taking ideas seriously says that it's pretty reasonable to update slowly as you gather more kinds of evidence than just logical argument.

I think it's definitely not dishonest to actually update too slowly versus what would be ideal. As you say, almost everyone does it.

What's dishonest is for Bob to think 50% and say 70% (or 75%) because it will look better.

Agree. In this situation he should state that he feels incentivized to say 70%, and that that's a problem.

This post seems to me to be misunderstanding a major piece of Paul's "sluggish updating" post, and clashing with Paul's post in ways that aren't explicit.

The core of Paul's post, as I understood it, is that incentive landscapes often reward people for changing their stated views too gradually in response to new arguments/evidence, and Paul thinks he has often observed this behavioral pattern which he called "sluggish updating." Paul illustrated this incentive landscape through a story involving Alice and Bob, where Bob is thinking through his optimal strategy, since that's a convenient way to describe incentive landscapes. But that kind of intentional strategic thinking isn't how the incentives typically manifest themselves in behavior, in Paul's view (e.g., "I expect this to result in unconscious bias rather than conscious misrepresentation. I suspect this incentive significantly distorts the beliefs of many reasonable people on important questions"). This post by Zvi misunderstands this as Paul describing the processes that go on inside the heads of actual Bobs. This loses track of the important distinction (which is the subject of multiple other LW Review nominees) between the rewards that shape an agent's behavior and the agent's intentions. It also sweeps much of the disagreement between Paul & Zvi's posts under the rug.

A few related ways the views in the two posts clash:

This post by Zvi focuses on dishonesty, while Paul suggests that unconsciously distorted beliefs are the typical case. This could be because Zvi disagrees with Paul and thinks that dishonesty is the typical case. Or it could be that Zvi is using the word "dishonest" broadly - he mostly agrees with Paul about what happens in people's heads, but applies the "dishonesty" frame in places where Paul wouldn't. Or maybe Zvi is just choosing to focus on the dishonest subset of cases. Or some combination of these.

Zvi focuses on cases where Bob is going to the extreme in following these incentives, optimizing heavily for it and propagating it into his thinking. "This is a world where all one cares about is how one is evaluated, and lying and deceiving others is free as long as you’re not caught." "Bob’s optimal strategy is full anti-epistemology." Paul seems especially interested in cases where pretty reasonable people (with some pretty good features in their epistemics, motivations, and incentives) still sometimes succumb to these incentives for sluggishness. Again, it's unclear how much of this is due to Zvi & Paul having different beliefs about the typical case and how much is about choosing to focus on different subsets of cases (or which cases to treat as central for model-building).

Paul's post is written from a perspective of 'Good epistemics don't happen by default', where thinking well as an individual involves noticing places where your mental processes haven't been aimed towards accurate beliefs and trying to do better, and social epistemics are an extension of that at the group level. Zvi's post is written from a perspective of 'catching cheaters', where good social epistemics is about noticing ways that people are something-like-lying to you, and trying to stop that from happening.

Zvi treats Bob as an adversary. Paul treats him as a potential ally (or as a state that you or I or anyone could find oneself in), and mentions "gaining awareness" of the sluggishness as one way for an individual to counter it.

Related to all of this, the terminology clashes (as I mentioned in a comment). I'd like to say a simple sentence like "Paul sees [?sluggishness?] as mainly due to [?unconscious processes?], Zvi as mainly due to [?dishonest update reporting?]" but I'm not sure what terms go in the blanks.

The "fire Bob" recommendation depends a lot on how you're looking at the problem space / which part of the problem space you're looking at. If it's just a recommendation for a narrow set of cases then I think it wouldn't apply to most of the cases that Paul was talking about in his "Observations in the wild", but if it's meant to apply more widely then that could get messy in ways that interact with the clashes I've described.

The other proposed solutions seem less central to these two posts, and to the clash between Paul & Zvi's perspectives.

I think there is something interesting in the contrast between Paul & Zvi's perspectives, but this post didn't work as a way to shine light on that contrast. It focuses on a different part of the problem space, while bringing in bits from Paul's post in ways that make it seem like it's engaging with Paul's perspective more than it actually does and make it confusing to look at both perspectives side by side.

This is an important line of thought, but I find myself very distracted by use of the word "updating" when you actually mean "publishing". In my mind, "updating a belief" strongly implies an internal state change, which may or may not be externally visible. It's a completely separate question of whether publishing or communicating a partial set of beliefs (because we can't yet publish our entire belief state) is helpful or harmful to one's goals.

All human interaction is a mix of cooperative and adversarial motives. Looking for mechanisms to increase cooperation and limit competitive motives is excellent, but we need to be clear that this isn't about updating beliefs, it's about broader human goal alignment.

Agreed. Changed to dishonest update reporting.

Seems like the terminology is still not settled well.

There's a general thing which can be divided into two more specific things.

General Thing: The information points to 50%, the incentive landscape points to 70%, Bob says "70%".

Specific Thing 1: The information points to 50%, the incentive landscape points to 70%, Bob believes 50% and says "70%".

Specific Thing 2: The information points to 50%, the incentive landscape points to 70%, Bob believes and says "70%".

There are three Things and just two names, so the terminology is at least incomplete.

"Dishonest update reporting" sounds like the name of Specific Thing 1.

In Paul's post "sluggish updating" referred to the General Thing, but Dagon's argument here is that "sluggish updating" should only refer to Specific Thing 2. So there's ambiguity.

It seems most important to have a good name for the General Thing. And that's maybe the one that's nameless? Perhaps "sluggish update reporting", which can happen either because the updating is sluggish or because the reporting is sluggish/dishonest. Or "sluggish social updating"? Or something related to lightness? Or maybe "sluggish updating" is ok despite Dagon's concerns (e.g. a meteorologist updating their forecast could refer to changes that they make to the forecast that they present to the world).

This is a true engagement with the ideas in Paul's original post. It actively changed my mind – at first I thought Paul was making a good recommendation, but now I think it was a bad one. It helped me step back from a very detailed argument and notice what rationalist virtues were in play. I think it's a great example of what a rebuttal of someone else's post looks like. I'd like to see it in the review, and I will vote on it somewhere between +3 and +7.

Mostly seconding Ben's nomination.

But also, additionally, a bit more flavor from me: I really like the double-punch of both Paul's ideas about sluggish updating, together with Zvi's great elaboration on the topic in this post. Very dense in insights.

This gave me a further perspective on a topic I'd gotten from Paul, and I really value the new perspective. Changed my mind on the overall question.

My experience has been that everyone is Bob, at least some of the time in some contexts, and that leads to many situations being composed mostly of Bobs. Bob is simply correct - he has a more accurate map than you seem to - on the topic of whether sharing his true predictions will improve or harm his future experiences.

I don't even know how to formulate the problem statement that describes this - it feels like "humans are barely-evolved apes and consistently optimize for local/individual benefit at the expense of cooperative potential outcomes" is a bit too big to take on, but any narrower definition is missing an important root cause.

Designing mechanisms to align individual reward with the designers' goals is one way to approach this, and prediction markets are the best suggestion I've heard on the topic. And they fall prey to the same underlying problem: most people aren't seeking to improve group consensus of truth, so don't really want to participate in activities where they don't have some comparative advantage.