I've set up a prediction tracking system for personal use. I'm assigning confidence levels to each prediction so I can check for areas of under- or over-confidence.

My question: If I predicted X, and my confidence in X changes, will it distort the assessment of my overall calibration curve if I make a new prediction about X at the new confidence level, keep the old prediction, and score both predictions later? Is that the "right" way to do this?

More generally, if my confidence in X fluctuates over time, does it matter at all what criterion I use for deciding when and how many predictions to make about X, if my purpose is to see if my confidence levels are well calibrated? (Assuming I've predetermined which X's I want to eventually make predictions about)

My thinking is that a confidence level properly considers its own future volatility, and so it shouldn't matter when I "sample" by making a prediction. But if I imagine a rule like: "Whenever your confidence level about X is greater than 90%, make two identical predictions instead of one", it feels like I'm making some mistake.
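One way to probe the doubling rule is a small simulation. The sketch below is only an illustration under a strong assumption: the forecaster is idealized so that each stated confidence equals the true probability of the event. The function names (`simulate_forecasts`, `duplicate_high_confidence`, `hit_rates`) and the confidence levels are made up for this example. It compares the per-level hit rate with and without the "record it twice above 90%" rule.

```python
import random
from collections import defaultdict

# Hypothetical confidence levels a forecaster might use (an assumption).
LEVELS = (0.55, 0.65, 0.75, 0.85, 0.95)

def simulate_forecasts(n=2000, seed=0):
    """Idealized forecaster: the stated confidence IS the true probability."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        confidence = rng.choice(LEVELS)
        came_true = rng.random() < confidence
        records.append((confidence, came_true))
    return records

def duplicate_high_confidence(records, threshold=0.9):
    """Apply the rule from the question: log a prediction twice if confidence > threshold."""
    out = []
    for confidence, came_true in records:
        out.append((confidence, came_true))
        if confidence > threshold:
            out.append((confidence, came_true))
    return out

def hit_rates(records):
    """Map each stated confidence level to (number of predictions, fraction that came true)."""
    tally = defaultdict(lambda: [0, 0])  # level -> [count, hits]
    for confidence, came_true in records:
        tally[confidence][0] += 1
        tally[confidence][1] += int(came_true)
    return {level: (count, hits / count) for level, (count, hits) in sorted(tally.items())}

if __name__ == "__main__":
    records = simulate_forecasts()
    print("one prediction each: ", hit_rates(records))
    print("with doubling rule:  ", hit_rates(duplicate_high_confidence(records)))
```

In this sketch the doubling rule leaves the hit rate at every confidence level unchanged, because each duplicate is scored identically to its original. What changes is the count at the 95% level, which doubles without adding any independent evidence, so that point on the curve looks better supported than it is, and any score averaged over all predictions gets weighted toward the high-confidence ones. The simulation says nothing about a forecaster who is not already calibrated, or about whether the act of re-predicting changes the confidence you report.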

If you ask "Does it matter?" the answer is probably: Yes.

How and when you query yourself has effects. Those effects are likely to be complicated, and you are unlikely to be fully aware of all of them.
In polling, the way a question is asked frequently affects the answers.
