In HCH, the human user does a little work then delegates subquestions/subproblems to a few AIs, which in turn do a little work then delegate their subquestions/subproblems to a few AIs, and so on until the leaf-nodes of the tree receive tiny subquestions/subproblems which they can immediately solve.
This does not agree with my understanding of what HCH is at all. HCH is a definition of an abstract process for thought experiments, much like AIXI is. It's defined as the fixed point of an iterative process of delegation expanding out into a tree. It's not something you could actually implement; it's a platonic form, like "circle" or "integral".
This has nothing to do with the way an HCH-like process would be implemented. You could easily have something that's designed to mimic HCH but is implemented as a single monolithic AI system.
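To make the "fixed point" framing concrete, here is a toy sketch, not a claim about how anything HCH-like would really be built. The names `human` and `hch` and the list-summing task are made up for illustration. The point is only that `hch` is defined as whatever function satisfies `hch(q) = human(q, ask=hch)`.

```python
from typing import Callable, List


def human(question: List[int], ask: Callable[[List[int]], int]) -> int:
    """Hypothetical 'human' policy: does a little work, delegates the rest.

    The toy task is summing a list. Tiny questions are answered directly;
    anything larger is split in two and each half is passed to `ask`.
    """
    if len(question) <= 1:
        return sum(question)  # leaf node: small enough to solve immediately
    mid = len(question) // 2
    return ask(question[:mid]) + ask(question[mid:])


def hch(question: List[int]) -> int:
    """Toy fixed point: a human whose helpers are copies of this same process."""
    return human(question, ask=hch)


if __name__ == "__main__":
    print(hch([1, 2, 3, 4, 5]))
```

Nothing here says the tree has to exist as separate agents. Any system that computes the same input-output function as `hch` would count as mimicking it, monolithic or not.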
Okay so where do most of your hopes route through then?
SAME
I'm just here to note that the "canonical example" thing mentioned here is very similar to the "nothing-up-my-sleeve numbers" used in the definition of some cryptographic protocols.
I think you may have meant this as a top-level comment rather than a reply to my comment?
Actually I would still really appreciate the training hyperparameters like batch size, learning rate schedule...
Ah, never mind, I believe I found the relevant hyperparameters here: https://github.com/adamimos/epsilon-transformers/blob/main/examples/msp_analysis.ipynb
In particular, the stuff I needed was that it has only a single attention head per layer, and 4 layers.
No, the actual hidden Markov process used to generate the awesome triangle fractal image is not the {0,1,random} model but a different one, which is called "Mess3" and has a symmetry between the 3 hidden states.
Also, they're not claiming the transformer merely learns the hidden states of the HMM, but a more complicated object called the "mixed state presentation": not the states the HMM can be in, but the (usually much larger number of) belief states that an ideal predictor trying to "sync" to it might pass through.
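For anyone who wants the distinction spelled out, here is a minimal sketch of the Bayesian belief update that generates those belief states. The 3-state HMM below uses made-up symmetric numbers for illustration; these are NOT the actual Mess3 parameters, and `belief_trajectory` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical 3-state, 3-symbol HMM (illustrative numbers, not Mess3).
# A[i, j] = P(next state j | current state i)
A = np.full((3, 3), 0.1) + 0.7 * np.eye(3)
# E[i, x] = P(emit symbol x | current state i)
E = np.full((3, 3), 0.15) + 0.55 * np.eye(3)

# T[x][i, j] = P(emit x and move to j | current state i)
T = np.stack([E[:, x][:, None] * A for x in range(3)])


def belief_trajectory(tokens, prior=None):
    """Return the belief over hidden states after each observed token."""
    belief = np.full(3, 1 / 3) if prior is None else np.asarray(prior, float)
    beliefs = []
    for x in tokens:
        unnormalized = belief @ T[x]  # joint P(token x, next state j)
        belief = unnormalized / unnormalized.sum()  # condition on seeing x
        beliefs.append(belief)
    return beliefs


if __name__ == "__main__":
    for b in belief_trajectory([0, 0, 1, 2]):
        print(np.round(b, 3))
```

Each distinct token history can land on a different point in the probability simplex, so the set of reachable beliefs is typically far larger than the 3 hidden states.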
Can we expect to see code for this on https://github.com/agencyenterprise sometime soon? I'm excited to fiddle with this.