Thanks for the good thoughts and questions on this! We're taking a closer look at the behavioral statistics modeling, and here are some heatmaps that visualize the "cheese Euclidean distance to top-right corner" metric's relationship with the chance of successful cheese-finding.
These plots show the frequency of cheese-finding over 10k random mazes (sampled from the "maze has a decision square" distribution) vs the x/y offset from the top-right corner to the cheese location. The raw data is shown, plus a version binned into 5x5 patches to get more samples in each bin. The bin counts are also plotted for reference. (The unequal sampling is expected, as all maze sizes can have small cheese-corner offsets, but only large mazes can have large offsets. The smallest 5x5 bin by count has 35 data points).
We can see a pretty clear relationship between cheese-corner offset and probability of finding the cheese, with the expected perfect performance in the top-right 5x5 patch that was the only allowed cheese location during the training of this particular agent. But the relationship is non-linear, and of cause doesn't provide direct evidence of causality.