Boris Kashirin's Shortform
Jul 7, 2025
I was thinking about how a prompt differs from training data in terms of tokenization. If I prompt with "solution:" as opposed to "solution: ", it seems like it can influence the result, since in training data the last token contains some information about the next token. If there is token...
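To make the "solution:" vs "solution: " difference concrete, here is a toy sketch. The vocabulary and the greedy longest-match rule are assumptions made up for illustration; real tokenizers (BPE and friends) learn their vocabulary and merge rules from data and may split text differently.

```python
# Hypothetical toy vocabulary; note that " 42" carries its leading space,
# as is common for word-initial tokens in learned vocabularies.
VOCAB = ["solution", ":", " ", " 42", "42"]


def tokenize(text, vocab=VOCAB):
    """Greedy longest-match tokenization over a fixed toy vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        # Pick the longest vocabulary entry that matches at position i.
        match = max((t for t in vocab if text.startswith(t, i)),
                    key=len, default=None)
        if match is None:
            raise ValueError(f"no token matches at position {i}")
        tokens.append(match)
        i += len(match)
    return tokens


print(tokenize("solution: 42"))  # training-style text: ['solution', ':', ' 42']
print(tokenize("solution:"))     # prompt without space: ['solution', ':']
print(tokenize("solution: "))    # prompt with space:    ['solution', ':', ' ']
```

In this toy setup the prompt without the trailing space is a token-level prefix of the training-style text, while the prompt with the trailing space ends in a standalone " " token that never preceded the answer in that text.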
Apparently, with the reflection technique (answer-critique-improve), GPT-4 is capable of giving much better answers. But that implies it should be capable of doing essentially AlphaGo Zero type of learning! It can't do complete self-play from zero, as there is no ground truth for it to learn from, but that basically...
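For reference, the answer-critique-improve loop can be sketched roughly as below. This is only a minimal sketch: `model` is a hypothetical callable mapping a prompt string to a completion string (not any particular API), and the prompt wording and the `rounds` parameter are assumptions for illustration.

```python
from typing import Callable


def reflect(model: Callable[[str], str], question: str, rounds: int = 2) -> str:
    """Answer, then repeatedly critique and improve the answer.

    `model` is a hypothetical stand-in for a language model call.
    """
    # Initial answer.
    answer = model(f"Question: {question}\nAnswer:")
    for _ in range(rounds):
        # Ask the model to critique its own current answer.
        critique = model(
            f"Question: {question}\nAnswer: {answer}\nCritique:"
        )
        # Ask for an improved answer that takes the critique into account.
        answer = model(
            f"Question: {question}\nAnswer: {answer}\n"
            f"Critique: {critique}\nImproved answer:"
        )
    return answer
```

Each round costs two model calls, so `rounds=2` makes five calls in total, counting the initial answer.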