I'm curious whether LLMs would do better on later-gen games. As far as I know, though, those games don't have emulation tools that are as robust.
I've been testing on FireRed, and the improvements are marginal at best (though admittedly the tileset is very similar). I wouldn't predict a significant vision difference in DPPt or B/W, but maybe the 3D graphics would help? There's the advantage of less Gen 1 jank, like the tiny inventory, the lack of "press A to cut" (this might've saved Claude), and stuff like that, but those are slight factors.
A lot (~90%) of my experimentation has been with Gemini 2.5 Flash, which I think has been instructive; reminds me that the intelligence-frontier models are...