JoshBurroughs


A simpler way to say all this is "Pick a depth where you will stop recursing (due to growing uncertainty or computational limits) and at that depth assume your opponent acts randomly." Is my first attempt needlessly verbose?

Agents A & B are two TDT agents playing some prisoner's dilemma scenario. A can reason:

u(c(A)) = P(c(B))u(C,C) + P(d(B))u(C,D)

u(d(A)) = P(c(B))u(D,C) + P(d(B))u(D,D)

( u(X) is utility of X, P() is probability, c() & d() are cooperate & defect predicates )
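As a rough illustration, the two expected-utility formulas above can be written out in Python. The payoff numbers here are hypothetical, picked only so that they have the usual prisoner's dilemma ordering; nothing in the argument depends on these particular values.

```python
# Hypothetical payoffs to A, keyed by (A's move, B's move).
# Assumed values; they only need u(D,C) > u(C,C) > u(D,D) > u(C,D).
PAYOFF = {
    ("C", "C"): 3.0,
    ("C", "D"): 0.0,
    ("D", "C"): 5.0,
    ("D", "D"): 1.0,
}


def expected_utilities(p_cooperate):
    """Return (u(c(A)), u(d(A))) given P(c(B)) = p_cooperate."""
    if not 0.0 <= p_cooperate <= 1.0:
        raise ValueError("p_cooperate must be a probability")
    p_defect = 1.0 - p_cooperate  # P(d(B))
    u_c = p_cooperate * PAYOFF[("C", "C")] + p_defect * PAYOFF[("C", "D")]
    u_d = p_cooperate * PAYOFF[("D", "C")] + p_defect * PAYOFF[("D", "D")]
    return u_c, u_d


if __name__ == "__main__":
    print(expected_utilities(0.75))
```

This treats P(c(B)) as a plain input, which is exactly the simplification the rest of the comment goes on to question.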

A will always pick the option with higher utility, so it reasons B will do the same:

u'(c(B)) > u'(d(B)) --> c(B)

(u'() is A's estimate of B's utility function)

But A can't perfectly predict B (even though it may be quite good at it), so A can represent this uncertainty as a random variable e:

u'(c(B)) + e > u'(d(B)) - e --> c(B)

In fact, we can give e a parameter, N, which is given by the depth of recursion, like a game of telephone:

u'(c(B)) + e(N) > u'(d(B)) - e(N) --> c(B)

Intuitively, it seems e(N) will tend to overwhelm u'() for high enough N (since the utilities don't increase as you recurse, while the uncertainty does). At that recursion depth, B's choice looks like a coin flip to A:

P(c(B)) = P(d(B)) = 1/2
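One way to see the drift toward equal probabilities is a small sketch under a loudly stated assumption: suppose e(N) is normally distributed with mean 0 and a standard deviation that grows linearly with depth N. The comment only claims that e(N) eventually dominates, so the linear growth, the `sigma` parameter, and the function name are all illustrative choices.

```python
import math


def p_cooperate(gap, depth, sigma=1.0):
    """P(c(B)) under the rule u'(c(B)) + e > u'(d(B)) - e.

    gap   = u'(c(B)) - u'(d(B)), A's estimated utility gap for B.
    depth = recursion depth N (must be positive).
    Assumption: e ~ Normal(0, (sigma * depth)**2).
    """
    if depth <= 0 or sigma <= 0:
        raise ValueError("depth and sigma must be positive")
    # The rule holds when gap + 2e > 0, i.e. e > -gap / 2.
    z = gap / (2.0 * sigma * depth)
    # Standard normal CDF evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, p_cooperate(gap=2.0, depth=n))
```

With a fixed gap, the printed probability moves toward 1/2 as the depth grows, whichever sign the gap has.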

so:

u(c(A)) = (u(C,C) + u(C,D)) / 2

u(d(A)) = (u(D,C) + u(D,D)) / 2

In a prisoner's dilemma the payoffs are ordered

u(D,C) > u(C,C) > u(D,D) > u(C,D)

so u(d(A)) > u(c(A)), because u(D,C) > u(C,C) and u(D,D) > u(C,D). A therefore defects at the recursion depth where uncertainty overwhelms other considerations.

Does this mean a TDT agent must revert to CDT if it is not smart enough (or does not believe its opponent is smart enough) to transform the recursion to a closed-form solution?