I am quite surprised that this happened 3 years ago! This seems really impressive for the GPT series of 3 years ago, and I'd expect the models to have gotten better since. Yes, it might be a fluke, but wouldn't we expect current models to have a higher chance of producing a fluke this good?
Though others have essentially made this point, I feel like a simple answer deserves a simple explanation: just imagine that your hypothesis Turing machines output (approximate) probabilistic predictions. For example, imagine they output probabilities that are fractions, over a finite portion of the input space, so that you don't have to worry about the messy infinite, continuous stuff.
Note: I'm not sure whether the exact form has the same nice properties, but I think the approximate form should be workable.
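A minimal sketch of the approximate form, assuming the setting is a Solomonoff-style Bayesian mixture with a 2^-length prior. The three "hypotheses" and their description lengths are invented for illustration and stand in for real Turing machines. Because every prediction is an exact `Fraction` over a finite history of bits, the posterior and the mixture prediction are exact rationals, with no continuous or infinite machinery involved.

```python
from fractions import Fraction

# Hypothetical hypotheses standing in for Turing machines:
# name -> (description length in bits, predictor). Each predictor maps the
# bits seen so far to P(next bit = 1) as an exact fraction.
HYPOTHESES = {
    "mostly_ones": (2, lambda history: Fraction(9, 10)),
    "fair_coin": (3, lambda history: Fraction(1, 2)),
    # Repeats the previous bit with probability 4/5 (predicts 1/5 on an empty history).
    "sticky": (4, lambda history: Fraction(4, 5) if history and history[-1] == 1 else Fraction(1, 5)),
}


def prior(length_bits: int) -> Fraction:
    # Shorter hypotheses get more prior weight: 2 ** -length.
    return Fraction(1, 2 ** length_bits)


def posterior(observed: tuple) -> dict:
    # Weight = prior * likelihood of the observed bits, then normalize.
    weights = {}
    for name, (length, predict) in HYPOTHESES.items():
        weight = prior(length)
        for i, bit in enumerate(observed):
            p_one = predict(observed[:i])
            weight *= p_one if bit == 1 else 1 - p_one
        weights[name] = weight
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def predict_next(observed: tuple) -> Fraction:
    # Mixture prediction: posterior-weighted average of each hypothesis's prediction.
    post = posterior(observed)
    return sum(post[name] * predict(observed) for name, (_, predict) in HYPOTHESES.items())


print(predict_next((1, 1, 0, 1)))
```

Since no hypothesis ever assigns probability 0 to a bit, the normalizing total is always positive and the result stays an exact fraction.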
But the clockwise rule only tells you anything when more than two people are there at the same time, because with just two people, A < B < A in clockwise order.
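Assuming the "clockwise rule" is about the cyclic order of people around a table (the surrounding context isn't shown here), a quick enumeration illustrates why two people carry no information: up to rotation, there is exactly one way to seat two people, so "B is clockwise of A" is always true. The helper names below are hypothetical.

```python
from itertools import permutations


def canonical_rotation(seating: tuple) -> tuple:
    # Rotate the seating so the alphabetically first person comes first;
    # two seatings that differ only by rotation then compare equal.
    start = seating.index(min(seating))
    return seating[start:] + seating[:start]


def distinct_cyclic_orders(people) -> set:
    # Every possible seating around the table, counted up to rotation.
    return {canonical_rotation(p) for p in permutations(people)}


for group in (("A", "B"), ("A", "B", "C"), ("A", "B", "C", "D")):
    print(len(group), "people:", len(distinct_cyclic_orders(group)), "distinct clockwise order(s)")
```

In general there are (n-1)! cyclic orders for n people: 1 for two people, 2 for three, 6 for four, so the rule only starts distinguishing situations at three.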
Possible example: Laennec's invention of the stethoscope in 1816. Of course we would've come up with it eventually. But note that Laennec got his inspiration from kids playing with sticks and from his prudishness about putting his ear to a woman's chest.
Consider that people have been using sound in diagnosis for millennia. Yet even something as simple as tapping a finger on another (to feel and hear, e.g., fluid in your lungs, which you don't want there) was only introduced in the mid-1700s by Auenbrugger (some medieval guy apparently had it too, but I'm not counting that, since it seems not to have been developed further), and the method also influenced Laennec. Auenbrugger was inspired by his father's wine business: you tap the barrel to see how much fluid is in it!
So consider: anyone "could have" come up with either of these for literal millennia, but nobody did? And the main inspirations were things most medical practitioners weren't looking at? Note also that Laennec had some experience in flute making, which helped him make his stethoscopes.
Lastly, [Corvisart](https://en.wikipedia.org/wiki/Jean-Nicolas_Corvisart) appears to have helped keep Auenbrugger's percussion technique alive. Laennec learned of percussion from Corvisart's translation of Auenbrugger, and Corvisart expanded on Auenbrugger's findings about how to use the sound information. This isn't a fundamental discovery, but it looks like he did have significant impact.
Semmelweis, Lister, and Pasteur are great examples: early adopters of germ theory and related practices like sanitation and antisepsis, disbelieved by everyone around them. But you can't say they lacked impact because of the disbelief. Lister explicitly built his antiseptic methods on Pasteur's work, and Pasteur really got purposefully made vaccines down (whereas with smallpox we lucked out, since cowpox happened to already exist). So unlike others whose ideas were strange enough to be rejected (which gives good evidence on counterfactual discovery; e.g. Mendel, whose work was only rediscovered around the time its actual content was being independently worked out again), they managed to create huge counterfactual impact.
So I guess, if you can't convince most people, at least manage to convince a handful of early adopters well positioned to reap the rewards of your ideas?
Additionally, as far as I know, the neural nets are used to evaluate positions, not for stateful "strategy". That is, the overall algorithm has a heuristic evaluation function (potentially incorporating a neural network), and then chooses a move by doing the sort of "future calculation" that humans do, except with clever techniques that make it fast.
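A minimal sketch of the "evaluation function plus look-ahead" structure, using depth-limited negamax with alpha-beta pruning. This is one classical way engines search; AlphaZero-style engines use Monte Carlo tree search instead, so this is not a description of any particular engine. The game is a toy Nim variant (take 1 to 3 stones; whoever takes the last stone wins), and `evaluate` is a hypothetical placeholder standing in for a hand-written or neural-network evaluator.

```python
import math


def legal_moves(stones: int) -> list:
    # A move removes 1, 2, or 3 stones (never more than remain).
    return [m for m in (1, 2, 3) if m <= stones]


def evaluate(stones: int) -> float:
    # Hypothetical heuristic: a real engine would score the position here
    # (possibly with a neural net). 0.0 means "no opinion".
    return 0.0


def negamax(stones: int, depth: int, alpha: float = -math.inf, beta: float = math.inf) -> float:
    # Score from the point of view of the player to move.
    if stones == 0:
        return -1.0  # the opponent just took the last stone, so the side to move lost
    if depth == 0:
        return evaluate(stones)  # out of look-ahead budget: fall back on the heuristic
    best = -math.inf
    for m in legal_moves(stones):
        score = -negamax(stones - m, depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # alpha-beta cutoff: the opponent will never allow this line
    return best


def best_move(stones: int, depth: int) -> tuple:
    # Try each move, search the resulting position, and keep the best one.
    best_m, best_score = None, -math.inf
    for m in legal_moves(stones):
        score = -negamax(stones - m, depth - 1)
        if score > best_score:
            best_m, best_score = m, score
    return best_m, best_score


print(best_move(5, depth=4))
```

With enough depth the search finds that taking 1 stone from 5 (leaving the opponent 4) wins, even though the evaluation function itself knows nothing; the "strategy" emerges from the look-ahead.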
An example showing that this isn't necessarily a market failure caused by imperfect information or biases: fiction. Something new has a lower bar to clear than something old. An old work can't surprise me with the same plot twists again or give me the same novel speculation (especially for the most important parts of the work, which I forget less).
Likewise, if I have a way of detecting errors in, e.g., code, I may want a tester from a completely different paradigm even if it's worse on average, in the hope of catching the places where my first tester failed. The same goes for emergency preparedness and backup techniques generally, where you want to minimize positive correlation in errors so that something is very likely to work at all.
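A toy calculation with made-up numbers makes the point concrete (the bug IDs and miss sets are purely hypothetical): what matters for a second tester is not how many bugs it misses on its own, but how many of the first tester's misses it also misses.

```python
# Hypothetical bug IDs 0..99; each set lists the bugs a tester MISSES.
TOTAL_BUGS = 100
missed_by_a = set(range(0, 10))  # current tester: misses 10 bugs
missed_by_b = set(range(0, 5))   # same paradigm: better alone (misses 5), but only misses bugs A also misses
missed_by_c = set(range(8, 28))  # different paradigm: worse alone (misses 20), mostly different bugs


def escaped(*missed_sets: set) -> set:
    # A bug slips through only if every tester in the combination misses it.
    return set.intersection(*missed_sets)


for label, missed in [("B (better, correlated)", missed_by_b),
                      ("C (worse, decorrelated)", missed_by_c)]:
    combined = escaped(missed_by_a, missed)
    print(f"A + {label}: {len(combined)} of {TOTAL_BUGS} bugs escape")
```

Here A + B lets 5 bugs escape, while A + C lets only 2 escape, even though C alone misses four times as many bugs as B.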
More generally, if you are willing to take a hit to the mean in exchange for more variance (because you care about the positive heavy tail more than the negative one, e.g. if you get to take the max of your attempts, or if you need a Hail Mary in football to win), you have an example of wanting worse but different.
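A worked toy example of the "take the max of your attempts" case, with hypothetical numbers: a "safe" option that always scores 5, and a "risky" option that scores 9 or 0 with probability 1/2 each (mean 4.5, so worse on average). Exact fractions avoid any simulation noise.

```python
from fractions import Fraction


def expected_max_safe(k: int) -> Fraction:
    # Every attempt scores 5, so the best of k attempts is always 5.
    return Fraction(5)


def expected_max_risky(k: int) -> Fraction:
    # The best of k attempts is 9 unless all k attempts score 0,
    # which happens with probability (1/2) ** k.
    return 9 * (1 - Fraction(1, 2) ** k)


for k in (1, 2, 3):
    print(k, expected_max_safe(k), expected_max_risky(k))
```

With a single attempt the safe option wins (5 vs 4.5), but as soon as you get to keep the best of two attempts the risky option wins (6.75 vs 5), and its advantage grows with more attempts.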
Can a computer do this? That is, take in the footage and output a drawing or a 3D model that is this accurate? I don't know what the SOTA is for that sort of image processing (and of course, nowadays we have better ML models).
Noether's theorem is an interesting one. The evidence was there, but it's the sort of discovery that's incredibly nonobvious even with a pile of evidence staring right at you. Perhaps Einstein would have gotten it. That she figured it out while working with Hilbert and Einstein on relativity suggests that the ideas that led to relativity help you think of the ideas behind Noether's theorem. But I think it's pretty likely that her contribution here was highly counterfactual.
Well, since you quite literally begged the question: what are your sexual kinks?