Comments

That's an interesting idea; I may test that out at some point. I'm assuming the softmax would be for kings / queens, where there is typically only one on the board, rather than for e.g. blank squares or pawns?

The model trained on the all-Stockfish dataset played at a level that was 100-200 Elo higher in my tests, with a couple of caveats. First, I benchmarked the LLMs against Stockfish, so an all-Stockfish dataset seems helpful for this benchmark. Second, the Stockfish LLM would probably have an advantage in robustness, because I included a small percentage of Stockfish vs. random move generator games in the Stockfish dataset in the hope that it would improve the model's ability.
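
For a rough sense of what a 100-200 Elo gap would mean head-to-head, here is a minimal sketch using the standard Elo expected-score formula. The function name `expected_score` is just illustrative, and this is not how the benchmark itself was scored (both models were measured against Stockfish, not against each other).

```python
def expected_score(elo_diff: float) -> float:
    """Expected score (win = 1, draw = 0.5, loss = 0) for the side rated
    `elo_diff` points higher, under the standard Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# Illustrative only: the two ends of the 100-200 Elo range mentioned above.
for diff in (100, 200):
    print(f"+{diff} Elo -> expected score {expected_score(diff):.2f}")
```

Under that formula, a 100 point edge corresponds to an expected score of roughly 0.64 and a 200 point edge to roughly 0.76.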

Unfortunately, I haven't done an in-depth qualitative assessment of their abilities, so I can't give a more detailed answer.

Yes, in this recent OpenAI superalignment paper they said that GPT-4's training dataset included a dataset of chess games filtered for players with greater than 1800 Elo. Given gpt-3.5-turbo-instruct's ability, I'm guessing that its dataset included a similar collection.