You really want your computing devices to be small, fast, cheap, and reliable, and transistors score well on all four counts. Binary also has a lot of advantages, and transistors can implement arbitrary logic gates.
Also, special purpose components have a lot of limitations.
Your gravity sort only works if the computer is the right way up.
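For concreteness, here is what a gravity (bead) sort actually computes, simulated in a few lines of Python. This is just an illustrative sketch of the bead-falling process, not a claim about any real hardware:

```python
def bead_sort(values):
    """Simulate gravity sort: each value is a column of beads on vertical rods.
    Beads 'fall' under gravity, and the sorted order is read off the rows.
    Only works for non-negative integers (beads can't be negative)."""
    if not values:
        return []
    if any(v < 0 for v in values):
        raise ValueError("bead sort only handles non-negative integers")
    max_v = max(values)
    # After falling, level i holds one bead for each value greater than i.
    levels = [sum(1 for v in values if v > i) for i in range(max_v)]
    # The k-th largest value equals the number of levels still holding more
    # than k beads.
    return [sum(1 for count in levels if count > k) for k in range(len(values))]

print(bead_sort([3, 1, 4, 1, 5]))  # → [5, 4, 3, 1, 1]
```

Note the restriction in the code: negative numbers (and non-integers) don't fit the physical metaphor at all, which is another instance of the special-purpose-hardware limitations above.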
Analogue processes in general are hard to run at reasonable precision. Optics-based Fourier transform hardware would not only be low precision and probably nondeterministic, it would also be fixed size. If you want to do bigger or smaller Fourier transforms, tough.
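By contrast, a Fourier transform in software handles any input length. A minimal pure-Python DFT sketch (naive O(n²), just to show the size flexibility, not a production FFT):

```python
import cmath

def dft(signal):
    """Naive discrete Fourier transform. Works for any input length n,
    unlike an optical setup built for one fixed transform size."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

# Any size works: 4 samples, 7 samples, whatever you have.
impulse = dft([1, 0, 0, 0])          # an impulse has a flat spectrum
print([round(abs(x), 6) for x in impulse])  # → [1.0, 1.0, 1.0, 1.0]
print(len(dft([1, 2, 3, 4, 5, 6, 7])))      # → 7
```

It is also deterministic and as precise as floating point allows, which is exactly what the analogue version struggles with.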
It's hard to replace general-purpose components with special-purpose ones because there are so many different things you might want to compute. Modern computers do loads of tasks well enough. A device that could do all sorting magically and instantly, but made your computer 10% more expensive, would still probably not be worth it. How much of your computer's time is actually spent on sorting?
A lot of the code currently run isn't a neat maths thing like sorting or a Fourier transform. It's full of thousands of messy details (e.g. the Linux kernel, Firefox, most other packages). It would be possible to make specialized hardware with a specific version of all those details built into it. But other than that, what you want is a general instruction-following machine.
What you're describing is not really different in principle from using specialized hardware like GPUs for rendering polygons instead of running everything on the same general CPUs. There are ASICs for hashing (used for Bitcoin mining), FPGAs (real-time signal processing, I think), and of course TPUs for AI inference. And with cloud computing, would you even know if your computation was actually being done with different physics than you thought?
Yes, those are importantly different and do in fact add diversity. They are still made from transistors though.
Computers now are mostly semiconductors doing logic operations. There are, of course, other parts, but they are mostly structural, not doing actual computation.
But imagine computer history took a different route: you could buy different units with different physics doing different calculations. You could buy a module with a laser and a liquid crystal screen doing a Fourier transform. You could buy a module with tiny beads doing gravity sort. I could think of more examples, but I think you get the idea.
Maybe it's not going to work because it's much easier economically to set up a unified manufacturing pipeline and focus on speeding up those general-purpose computers than to set up many specialized pipelines for specialized computations? Am I just describing the pre-digital era with mechanical integrators and various radio schemes? Maybe what I'm trying to describe took the form of much more easily shareable libraries?
And, of course, there are examples of different physics used to build computers (quantum computers being the most famous example), but my intuition suggests that giant amounts of transistors shouldn't be the fastest way to compute almost everything, and I don't observe the variety that this intuition would suggest.
Those, of course, are silly examples which wouldn't actually work; they are here just to point out that different possibilities exist. The general idea is that you could outsource your math to physics in more than one way.