# Nisan

Vanessa Kosoy's Shortform

My takeaway from this is that if we're doing policy selection in an environment that contains predictors, instead of applying the counterfactual belief that the predictor is always right, we can assume that we get rewarded if the predictor is wrong, and then take maximin.
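As a minimal sketch of that maximin idea, assume a Newcomb-style problem with the usual illustrative payoffs (1,000,000 for one-boxing when predicted, 1,000 for two-boxing when predicted). The payoff numbers, the names `reward` and `worst_case`, and the choice of an infinite reward when the predictor is wrong are all assumptions made for this example, not part of the original proposal.

```python
import math

ACTIONS = ("one-box", "two-box")

def reward(action, prediction):
    """Hypothetical payoff table: a wrong prediction is rewarded maximally."""
    if action != prediction:
        return math.inf  # assumption: "rewarded if the predictor is wrong"
    return 1_000_000 if action == "one-box" else 1_000

def worst_case(action):
    """Minimum reward over every prediction the environment could make."""
    return min(reward(action, prediction) for prediction in ACTIONS)

# Maximin: pick the policy whose worst case is best.
best = max(ACTIONS, key=worst_case)
print(best, worst_case(best))
```

In this toy setup the worst case for each policy is the branch where the predictor is right, so maximin ends up comparing the same payoffs as the "predictor is always right" counterfactual would.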

How would you handle Agent Simulates Predictor? Is that what TRL is for?

"Rationalizing" and "Sitting Bolt Upright in Alarm."

It sounds like you want a word for "Alice is wrong, and that's terrible". In that case, you can say "Alice is fucking wrong", or similar.

Why it took so long to do the Fermi calculation right?

Good point. In that case the Drake equation must be modified to include panspermia probabilities and the variance in time-to-civilization among our sister lineages. I'm curious what kind of Bayesian update we get on those...

An environment for studying counterfactuals

The observation can provide all sorts of information about the universe, including whether exploration occurs. The exact set of possible observations depends on the decision problem.

and can have any relationship, but the most interesting case is when one can infer from with certainty.

Beliefs at different timescales

Thanks, I made this change to the post.

Beliefs at different timescales

Yeah, I think the fact that Elo only models the macrostate makes this an imperfect analogy. A better analogy would involve a hybrid model that assigns a probability to a chess game based on whether each move is plausible (using a policy network) and whether the higher-rated player won.

I don't think the distinction between near-exact and nonexact models is essential here. I bet we could introduce extra entropy into the short-term gas model and the rollout would still be superior to the Boltzmann distribution for predicting the microstate.

Beliefs at different timescales

Sure: If we can predict the next move in the chess game, we can predict the next move, then the next, then the next. By iterating, we can predict the whole game. If we have a probability for each next move, we multiply them to get the probability of the game.

Conversely, if we have a probability for an entire game, then we can get a probability for just the next move by adding up all the probabilities of all games that can follow from that move.
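Both directions can be sketched on a toy "game" with two possible moves and a fixed length of three. Everything here is hypothetical: the move set, the game length, and the `next_move_prob` model are stand-ins chosen only to make the arithmetic checkable. Note that the sum over games is divided by the probability of the moves played so far, which is what turns it into a probability for just the next move.

```python
from itertools import product

MOVES = ("a", "b")   # hypothetical two-move game
GAME_LENGTH = 3      # hypothetical fixed game length

def next_move_prob(history, move):
    """Toy next-move model: slightly prefers repeating the previous move."""
    if not history:
        return 0.5
    return 0.7 if move == history[-1] else 0.3

def game_prob(game):
    """Multiply the next-move probabilities along the whole game."""
    p = 1.0
    for i, move in enumerate(game):
        p *= next_move_prob(game[:i], move)
    return p

def next_move_from_games(history, move):
    """Recover a next-move probability by summing over complete games."""
    all_games = list(product(MOVES, repeat=GAME_LENGTH))
    k = len(history)
    prefix = history + (move,)
    # Total probability of games that continue with `move` after `history`.
    with_move = sum(game_prob(g) for g in all_games if g[:k + 1] == prefix)
    # Total probability of games consistent with `history` at all.
    with_history = sum(game_prob(g) for g in all_games if g[:k] == history)
    return with_move / with_history

print(game_prob(("a", "a", "b")))
print(next_move_from_games(("a",), "a"))
```

The second printed value should agree with `next_move_prob(("a",), "a")` up to floating-point error, which is the sense in which the two descriptions carry the same information.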

Beliefs at different timescales

Thanks, I didn't know that about the partition function.

In the post I was thinking about a situation where we know the microstate to some precision, so the simulation is accurate. I realize this isn't realistic.

Beliefs at different timescales

The sum isn't over , though, it's over all possible tuples of length . Any ideas for how to make that more clear?

EDT solves 5 and 10 with conditional oracles

I'm having trouble following this step of the proof of Theorem 4: "Obviously, the first conditional probability is 1". Since the COD isn't necessarily reflective, couldn't the conditional be anything?