Review

Do some of you keep a record of personal predictions?

I.e. either predictions of personal events, such as

likelihood of getting at least a 10% raise in the next 12 months

or events conditional on decisions, actions, such as

likelihood of getting at least 10% higher income in the next 12 months given that I search for other jobs instead of not

If yes, how useful do you find this? Have you validated it in some way? Do you keep yourself honest this way by looking at your track record? Are you trying to track and improve your calibration and priors this way?

I’m quite interested in this. However, after a naive try in a spreadsheet file, which was easy to set up, I find myself at some stumbling blocks. I find it difficult to make the questions sufficiently detailed and specific to allow unambiguous resolution (after which Brier scores are easy to compute), and sufficiently conditional to avoid interfering feedback loops, all without spending too much time writing the scenarios themselves.
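For what it’s worth, once resolutions are unambiguous, the scoring itself really is the easy part. A minimal sketch in Python, with made-up example entries:

```python
# A minimal sketch of scoring a personal prediction log (hypothetical data).
# Each entry: (forecast probability, outcome as 1 = happened, 0 = didn't).
log = [
    (0.90, 1),  # e.g. "at least 10% raise within 12 months" -- happened
    (0.30, 0),  # e.g. "move cities this year" -- didn't happen
    (0.60, 1),
]

# Brier score: mean squared difference between forecast and outcome.
# 0.0 is perfect; always guessing 50% earns 0.25.
brier = sum((p - o) ** 2 for p, o in log) / len(log)
print(brier)
```

The hard part, as the rest of this post argues, is getting clean 0/1 outcomes to feed into it in the first place.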

Let’s take the above sentence as an example:

likelihood of getting at least 10% higher income in the next 12 months given that I search for other jobs instead of not

At first glance, having such a prediction sounds very useful to me: it could help me assess whether I should look for a different job, whether it’s worth it in expectation, and how much effort might be worth expending on it.

I.e. if the answer is 90%, that’s a good signal that I might need to take a shot at it.

If the answer is 1%, that’s a good signal that maybe I shouldn’t bother, or very little.

But let’s look closer: at second glance it seems woefully underspecified:

Just how much effort does this mean? Literally spending a single second on a job board? How can I avoid gaming myself? Let’s try to specify this better:

likelihood of getting at least 10% higher income in the next 12 months given that I apply to 10 job postings

This is better, but again what do we mean here? Should I just apply to the first 10 that I see?

I can think of a few general ways to try to solve this, i.e. to bring the actions and the predictions in line:

I could implicitly or explicitly append “reasonable effort” or something similar to these questions, i.e.

likelihood of getting at least 10% higher income in the next 12 months given that I apply to 10 job postings with reasonable effort

Ok, so I should be somewhat discerning in this case, but I quite dislike how fuzzy the resolution becomes this way: did I really undertake a reasonable effort? Did I undershoot or overshoot it?

Perilous feedback loops can also creep in: my reasonable effort for a 90% prediction might mean being more relaxed than otherwise: it’s a done deal, I might think. A lower prediction might motivate me more, since I’d feel I have to improve my chances, but one that’s too low might de-motivate me, and “reasonable effort” could in fact be very low effort in that case. All too fuzzy!

The other way would be to change what I am estimating instead, and assume in advance that I’ll minimize effort. If the question is written like:

likelihood of getting at least 10% higher income in the next 12 months given that I apply to 10 job postings

Then I should estimate literally just sending my CV to the first 10 companies I come across, meaning my prediction should not be much higher than for not doing anything at all. Notice how sneaky this assumption is: I did not specify that I’ll have to subject myself to being interviewed too, so I must assume I’ll ignore all 10 even if all of them want to interview me. Or something even more unreasonable, like sending an application letter with no CV in sight.

Now we venture into more unknown territory: could I grade the resolutions somehow? Traditional forecasting wisdom, as I understand it, would say “Never!”: a resolution either fully happened or fully didn’t, and if it’s in any way ambiguous, then it doesn’t count at all, as if the prediction had never been made in the first place. Not ideal. But what if I could estimate reasonable job-hunting effort in advance, and then later grade myself on how much of it I actually did? I’m not sure how the math would work out here.
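Purely as speculation on how the math could work (this is not standard forecasting practice): score against a graded, fractional outcome between 0 and 1 instead of a strict yes/no. The Brier formula accepts that without modification, though self-grading the outcome of course reopens the fuzziness door:

```python
# Speculative sketch: Brier scoring against a graded (fractional) resolution.
# Here the "outcome" is a later self-grade of how much the event happened,
# e.g. 0.6 = "mostly undertook reasonable job-hunting effort".
def brier(forecast: float, outcome: float) -> float:
    """Squared error between forecast probability and a (possibly graded) outcome."""
    return (forecast - outcome) ** 2

clean_yes = brier(0.9, 1.0)  # unambiguous resolution
graded = brier(0.9, 0.6)     # partially-resolved, self-graded
```

Whether such self-graded scores stay honest over time is exactly the open question here.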

Or take this simpler version:

If I switch to another job in the next 12 months, how likely is it that I’ll be more satisfied with it in the first two months than I’m now? 

Hoo boy, where do we even start with this one, even though lots of people make major life decisions on exactly these kinds of hinges! What if I am just a little bit happier afterward, and it’s hard to say? Can I grade this as 60% passed (and 40% failed)?

There could be another kind of prediction to the rescue: estimating a value instead of a probability:

If I switch to another job in the next 12 months, what’s my expected satisfaction with it in the first 2 months, on a scale of 1-10?

This is better. I can say 8, with 7-9 being the 95% confidence interval! This can be calculated with! At evaluation, I need only concern myself with how sure I am that I’m below 9 and above 7 (“or am I at only 6.8?”).
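With value estimates plus intervals, checking calibration becomes a coverage count: over many such predictions, roughly 95% of realized values should land inside the stated 95% intervals. A sketch with made-up entries:

```python
# Hypothetical log of interval predictions: (low, high, realized value).
intervals = [
    (7, 9, 8.0),   # job satisfaction: 95% interval 7-9, landed at 8
    (7, 9, 6.8),   # ...or maybe it came in just below the interval
    (2, 5, 3.5),
]

hits = sum(low <= x <= high for low, high, x in intervals)
coverage = hits / len(intervals)
# For well-calibrated 95% intervals, coverage should approach 0.95
# as the log grows; much lower means overconfidence.
print(coverage)
```

This sidesteps the binary pass/fail question for the score, though the realized value itself can still be subjective.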

As you can see, *gestures at own confusion*, I’m a bit lost with all this. I’m familiar with superforecasting as a concept, and prediction markets, but both of these seem to be for multi-person bets that have at least some wider appeal and relevance. Calibration as a concept still seems applicable, but it’s usually in the service of the above.

But as I tried to indicate, some questions can be very important for a single person and not to others, so what can be done in this case, if one wants to improve how rational they are? Can this game be played solo? I tried searching Google, using metaphor.systems, asking LLMs, and I haven’t found satisfactory answers, so I turn to you.

And maybe the answer to all this is that one has to bite the bullet, and really go into the nitty gritty when writing questions, drill down into acceptance criteria, and then resolution is straightforward, and prediction can take all that into account.

I could try to specify further this way:

likelihood of getting at least 10% higher income in the next 12 months given that I apply to 10 job postings, go through their interview process, and if offered accept at least one such offer that is a net-improvement all things considered in expectation to my current job

(I tried to be more specific above, but I might just have exchanged one problem for a harder one: will I be able to aggregate and assess “net-improvement all things considered”? If the difference is large, sure, but otherwise it’s unclear.)

Or maybe for some people the “reasonable effort” or similar condition works, but I’d be curious how you avoid falling into all kinds of problems here: fuzziness of resolutions, and perilous negative and positive feedback loops, e.g. potentially divergent predictions.

And maybe the moral of this story is that this tool should only be reached for if a question is important enough that one already knows that spending 5, 10, 30 or even more minutes really pinning it down is worth it in expectation; otherwise it becomes an exercise in futility as soon as a resolution needs to be chosen.

I'd like to hear from all of you who have experience with this or have relevant insights, or can point me towards those who do. Also feel free to recommend me any other fora where I could post this and it may be more relevant; e.g. I looked for a general conversational or Q&A forum for Metaculus but I did not find one.


3 Answers

omark

70

I have no data and all I'll talk about is my experience and my gut feelings on this topic.

The first question I ask myself is what problem am I trying to solve, or what am I trying to improve? The answer for me is that I suspect that I am vastly overconfident in my predictions and that I selectively forget my worst forecasts. For example, I remember being skeptical about someone after interviewing them for a job, writing as much to my supervisor, and the applicant getting the job anyway. A few years later the person was doing an excellent job, and I was surprised when I stumbled upon my own e-mail. I had forgotten about it. On the other hand, I believe I have almost never forgotten cases where I made a good call about something.

So the first problem I want to solve is to become more humble in my predictions by making my failures more visible to myself.

The second improvement I would like to achieve is determining whether the probability numbers I attach to future events are reliable indicators or completely useless. That is calibration, e.g. via the Brier score. I suspect these values have to be interpreted by “category” (i.e. you might have good calibration in politics but bad calibration in personal relationships), and that you only start getting useful results after one or two years and a few hundred forecasts.
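The per-category idea is easy to mechanize: tag each forecast with a category and compute a separate Brier score for each. A sketch with invented data:

```python
from collections import defaultdict

# Hypothetical tagged forecasts: (category, probability, outcome 0/1).
forecasts = [
    ("politics", 0.7, 1),
    ("politics", 0.8, 1),
    ("personal", 0.9, 0),
    ("personal", 0.6, 1),
]

# Collect squared errors per category, then average within each.
by_cat = defaultdict(list)
for cat, p, o in forecasts:
    by_cat[cat].append((p - o) ** 2)

scores = {cat: sum(errs) / len(errs) for cat, errs in by_cat.items()}
print(scores)
```

With a few hundred forecasts per category, the per-category scores start to show where the calibration problems actually live.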

I find it difficult to make the questions sufficiently detailed and specific to allow unambiguous resolution

Future-you is presumably not maliciously trying to deceive you, right? So the only case you need to worry about is future-you misunderstanding what you meant when you wrote the forecast.

I quite dislike how fuzzy the resolution becomes this way: did I really undertake a reasonable effort? Did I undershoot or overshoot it?

Do you think it very likely that present-you and future-you will have a very different perspective on what "reasonable effort" means? I would only clarify things up to the point where you trust future-you to do the right thing.

Perilous feedback loops can also creep in: my reasonable effort for a 90% prediction might mean being more relaxed than otherwise: it’s a done deal, I might think.

I agree these feedback loops exist. My perspective is that you should not strive for perfection but try to improve on the status quo. Even without writing down explicit predictions you will have such feedback loops. Do you think they become worse when you write the predictions down, i.e. worse than when you just have a gut feeling that something is a “done deal”?

You are right that making predictions and asking yourself questions that you might not have asked yourself otherwise might change your behavior. I would even say that it's not uncommon to use predictions as a motivational tool because you don't want to be proven wrong in front of yourself or others. The feedback loop is then purposefully built in.

One way of minimizing this might be to make predictions that are farther in the future and then try to forget about them. For example, make a lot of predictions so that you forget the particulars, and then only look at the file a year later. This trades off against updating the predictions regularly with new information, which to me is more important.

Another potential solution is to ask other people (friends) to make predictions about you without telling you the details. They could give you a happiness questionnaire once every 3 months and not tell you until after resolution what they do with the data. In this case they are the ones working on their calibration. If you want to work on your own, you can make predictions about them.

Or take this simpler version:

If I switch to another job in the next 12 months, how likely is it that I’ll be more satisfied with it in the first two months than I’m now?

Hoo boy, where do we even start with this one, even though lots of people make major life decisions on exactly these kinds of hinges! What if I am just a little bit happier afterward, and it’s hard to say? Can I grade this as 60% passed (and 40% failed)?

No, I don't think you should grade it as 60% passed. It was a yes/no question. As long as you are even a little bit happier, the answer is yes.

At evaluation, I need only concern myself with how sure I am that I’m below 9 and above 7 (“or am I at only 6.8?”).

When making the prediction you already knew that your judgement at resolution was going to be subjective. If you dislike that, maybe it's not a useful prediction to make.

One way around this might be to make “job satisfaction” something you derive from multiple variables (e.g. take the mean of “how nice is the office”, “how nice are the colleagues”, “how challenging are the tasks”, ...). Then it won’t be obvious at resolution time how to get the result you wanted; rather, you aggregate and roll with the result.
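A sketch of that idea: fix the component questions at forecast time (the component names below are invented examples), and at resolution just fill in the numbers and take the mean, leaving no room to pick the aggregate you wanted:

```python
# Pre-registered components of "job satisfaction", fixed when the
# forecast is written; only the 1-10 values get filled in at resolution.
components = {
    "office": 7,
    "colleagues": 9,
    "task_challenge": 6,
}

satisfaction = sum(components.values()) / len(components)
print(satisfaction)
```

The aggregate is still built from subjective ratings, but committing to the components in advance removes one degree of freedom at resolution time.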

I am really interested in forecasting and getting better at it so I am developing Cleodora, a free and open-source tool to track such forecasts. I encourage you and other readers to have a look, leave me your thoughts and help me improve it!

Dumbledore's Army

30

Metaculus lets you write private questions. Once you have an account, it’s as simple as selecting ‘write a question’ from the menu bar and then setting the question to private rather than public, via a droplist in the settings as you write it. You can resolve your own questions, i.e. mark them as yes/no or whatever, and then it’s easy to use Metaculus’ tools for examining your track record, including your Brier score.

Dalmert

10

I expect that until I find a satisfactory resolution to this topic, I might come back to it a few times, and potentially keep a bit of a log here of what I find in case it does add up to something. So far this is one of the things I found:

https://www.lesswrong.com/posts/JnDEAmNhSpBRpjD8L/resolutions-to-the-challenge-of-resolving-forecasts

This seems very relevant to part of what I was pondering, but I’m not sure yet how actionable the takeaways are.

1 comment

I think it should be more atomized, like:

A. Within the next 12 months, I will find at least N attractive offers. (Maybe insert predictions about applying, being interviewed, getting the position.)

B. If I do get a new job, I will (outcome(s)).

The (outcome(s)) should be as atomic as possible, too. Like "I will solve the washer issue", "I will have unresolved problems with my current job", etc. Still hard to quantify, but you don't have to cover all the territory.