I agree that the results are legit; I'm just taking issue with the authors presenting them without prior-work context (e.g. setting the wrong reference class so that the improvement over baselines appears larger than it is). RNNs getting outsized performance on maze/Sudoku is to be expected, and the main ARC result looks more like a strong data-augmentation + SGD baseline than something unique to the architecture; ARC-1 was pretty susceptible to this (e.g. ARC-AGI Without Pretraining).
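To make the "data augmentation + SGD baseline" point concrete, here's a minimal sketch of the kind of ARC-style augmentation I have in mind: dihedral transforms plus color permutations blow one training grid up into many variants. This is illustrative Python, not the paper's actual pipeline; `augment_grid` is a hypothetical helper.

```python
import numpy as np

def augment_grid(grid: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply a random dihedral transform plus a color permutation to an ARC grid."""
    # Dihedral group D4: rotate by 0/90/180/270 degrees, optionally flip.
    grid = np.rot90(grid, k=int(rng.integers(4)))
    if rng.integers(2):
        grid = np.fliplr(grid)
    # Permute the 10 ARC color indices (0-9), applied consistently across the grid.
    perm = rng.permutation(10)
    return perm[grid]

# Example: one 2x3 grid expanded into several augmented training variants.
rng = np.random.default_rng(0)
g = np.array([[1, 2, 0], [0, 3, 3]])
variants = [augment_grid(g, rng) for _ in range(8)]
```

Applied per task at train and test time, this sort of augmentation alone can move simple baselines substantially on ARC-1, which is the reference-class point above.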
This being said, I think it's a big deal that various RNN architectures have such di...
Flagging that the HRM paper strongly reads as low-substance. After seeing this post I revisited it for a deeper read to fully understand their method, and this confirmed my initial impressions. I used to get very excited about every novel architecture published, and over time I think you build up some amount of cognitive immunity: e.g. spending most of the paper rehashing vague "inspirations" tends to be a dark pattern employed when you want to make your use of a standard method seem more novel than it is.
I don't really have the time to...
Update: ARC has published a blog post analyzing this: https://arcprize.org/blog/hrm-analysis. As expected, swapping in a transformer performs approximately the same.