A simpler way to say all this is "Pick a depth where you will stop recursing (due to growing uncertainty or computational limits) and at that depth assume your opponent acts randomly." Is my first attempt needlessly verbose?
Agents A & B are two TDT agents playing some prisoner's dilemma scenario. A can reason:
u(c(A)) = P(c(B))u(C,C) + P(d(B))u(C,D)
u(d(A)) = P(c(B))u(D,C) + P(d(B))u(D,D)
(u(X) is the utility of X, P() is probability, and c() & d() are the cooperate & defect predicates.)
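To make the two equations concrete, here is a minimal sketch in Python. The payoff numbers are my own illustrative choice (any values with the usual prisoner's dilemma ordering would do), and the names PAYOFF and expected_utility are hypothetical.

```python
# Illustrative payoffs for A, indexed by (A's move, B's move).
# These numbers are an assumption; only their ordering matters.
PAYOFF = {
    ("C", "C"): 3,  # u(C,C)
    ("C", "D"): 0,  # u(C,D)
    ("D", "C"): 5,  # u(D,C)
    ("D", "D"): 1,  # u(D,D)
}


def expected_utility(my_move, p_b_cooperates):
    """u(my_move) = P(c(B)) * u(my_move, C) + P(d(B)) * u(my_move, D)."""
    p_b_defects = 1.0 - p_b_cooperates
    return (p_b_cooperates * PAYOFF[(my_move, "C")]
            + p_b_defects * PAYOFF[(my_move, "D")])


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        print(p, expected_utility("C", p), expected_utility("D", p))
```

Note that this treats P(c(B)) as a fixed number that does not depend on A's own choice, which is exactly the assumption the rest of the argument goes on to question.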
A will always pick the option with higher utility, so it reasons B will do the same:
u'(c(B)) > u'(d(B)) --> c(B)
(u'() is A's estimate of B's utility function)
But A can't perfectly predict B (even though it may be quite good at it), so A can represent this uncertainty as a random variable e:
u'(c(B)) + e > u'(d(B)) - e --> c(B)
In fact, we can give e a parameter, N, which is given by the depth of recursion, like a game of telephone:
u'(c(B)) + e(N) > u'(d(B)) - e(N) --> c(B)
Intuitively, it seems e(N) will tend to overwhelm u'() for high enough N (since utilities don't increase as you recurse). At that recursion depth:
P(c(B)) = P(d(B)) = 1/2
u(c(A)) = (u(C,C) + u(C,D))/2
u(d(A)) = (u(D,C) + u(D,D))/2
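The prisoner's dilemma payoffs satisfy
u(D,C) > u(C,C) > u(D,D) > u(C,D)
Each term of u(d(A)) therefore exceeds the corresponding term of u(c(A)), so u(d(A)) > u(c(A)), meaning defection at the recursive depth where uncertainty overwhelms other considerations.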
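The step from e(N) to equal probabilities can be sketched numerically. Everything specific here is my assumption, not part of the argument above: I take e(N) to be Gaussian with mean zero and a standard deviation that grows linearly with depth N, and the name prob_b_cooperates is hypothetical. Since the same e appears on both sides, the condition u'(c(B)) + e > u'(d(B)) - e is equivalent to e > (u'(d(B)) - u'(c(B)))/2.

```python
import math


def prob_b_cooperates(u_coop, u_defect, depth, sigma_per_level=1.0):
    """A's probability that B cooperates at recursion depth N.

    Assumes e(N) ~ Normal(0, sigma_per_level * depth); both the Gaussian
    shape and the linear growth in depth are illustrative assumptions.
    B cooperates when e > (u_defect - u_coop) / 2.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    sigma = sigma_per_level * depth
    threshold = (u_defect - u_coop) / 2.0
    # Standard normal CDF evaluated at threshold / sigma, via math.erf.
    cdf = 0.5 * (1.0 + math.erf(threshold / (sigma * math.sqrt(2.0))))
    return 1.0 - cdf


if __name__ == "__main__":
    # A estimates that B prefers cooperating: u'(c(B)) = 3, u'(d(B)) = 1.
    for depth in (1, 10, 100, 1000):
        print(depth, prob_b_cooperates(3.0, 1.0, depth))
```

Under these assumptions the probability starts well above 1/2 at shallow depth and drifts toward 1/2 as N grows, which is the "acts randomly" condition used above. A different noise model could approach 1/2 at a different rate, or not at all if the noise stayed bounded.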
Does this mean a TDT agent must revert to CDT if it is not smart enough (or does not believe its opponent is smart enough) to transform the recursion to a closed-form solution?