So, a softmax can never emit a probability of 0 or 1, maybe they were implicitly assuming the model ends in a softmax (as is the common case)? Regardless, the proof is still wrong if a model is allowed unbounded context, as an infinite product of positive numbers less than 1 can still be nonzero. For example, if the probability of emitting another " 0" is even just as high as $1 - \frac1{n^{1.001}}$ after already having emitted $n$ copies of " 0", then the limiting probability is still nonzero.

But if the model has a finite context and ends in a softmax then I think there is some minimum probability of transitioning to a given token, and then the proposition is true. Maybe that was implicitly assumed?

So, a softmax can never emit a probability of 0 or 1, maybe they were implicitly assuming the model ends in a softmax (as is the common case)? Regardless, the proof is still wrong if a model is allowed unbounded context, as an infinite product of positive numbers less than 1 can still be nonzero. For example, if the probability of emitting another " 0" is even just as high as $1 - \frac1{n^{1.001}}$ after already having emitted $n$ copies of " 0", then the limiting probability is still nonzero.

But if the model has a finite context and ends in a softmax then I think there is some minimum probability of transitioning to a given token, and then the proposition is true. Maybe that was implicitly assumed?