Have you considered dropping backprop entirely and using black-box methods like evolutionary computing?
This is a really neat idea that I'd love to explore more. I've tried some brief experiments in that area in the past, using Z3 to find valid parameter combinations for different logic gates built on that custom activation function. I didn't have any luck, though; the solver ran for hours without finding any solutions, and I fell back to a brute-force search instead.
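For a rough idea of what that kind of search looked like, here's a minimal sketch along the same lines. The clamp() activation and the single-neuron AND setup are stand-ins I'm using for illustration, not the actual custom activation or circuits from the post:

```python
# Minimal sketch of using Z3 to solve for logic-gate parameters.
# clamp() below is only a stand-in for the post's custom activation,
# and the single-neuron AND gate is an assumption for illustration.
from z3 import Real, Solver, If, sat

def clamp(x):
    """Piecewise-linear activation that Z3 can reason about exactly."""
    return If(x < 0, 0, If(x > 1, 1, x))

w1, w2, b = Real("w1"), Real("w2"), Real("b")
solver = Solver()

# Constrain the neuron to implement AND over {0, 1} inputs.
truth_table = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
for (a, bit), target in truth_table:
    solver.add(clamp(w1 * a + w2 * bit + b) == target)

if solver.check() == sat:
    print(solver.model())  # e.g. [w1 = 1, w2 = 1, b = -1]
else:
    print("no valid parameter combination found")
```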
A big part of the issue for me is that I'm just very unfamiliar with the whole domain. There's probably a lot I did wrong in that experiment that caused it...
This post contains an overview of my research and experiments on growing sparse computational graphs, which I'm calling "Bonsai Networks," by training small RNNs. It describes the architecture, training process, and pruning methods used to create the graphs, and then examines some of the learned solutions to a variety of objectives.
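As a simplified illustration of the pruning idea, here's a generic magnitude-threshold sketch that turns a small RNN's weight matrices into an explicit edge list; the actual pruning procedure in the post is more involved than this:

```python
# Generic sketch: prune near-zero weights from a small RNN and keep the
# survivors as graph edges. The magnitude threshold is an assumption here,
# not the exact procedure described in the post.
import numpy as np

def prune_to_graph(w_in, w_rec, w_out, threshold=0.05):
    """Drop near-zero weights and emit the surviving connections as edges."""
    edges = []
    for name, w in [("in->hid", w_in), ("hid->hid", w_rec), ("hid->out", w_out)]:
        src, dst = np.nonzero(np.abs(w) > threshold)
        for s, d in zip(src, dst):
            edges.append((name, int(s), int(d), float(w[s, d])))
    return edges

# Toy example: a 2-input, 4-unit, 1-output RNN with mostly tiny weights.
rng = np.random.default_rng(0)
w_in = rng.normal(0, 0.02, (2, 4))
w_rec = rng.normal(0, 0.02, (4, 4))
w_out = rng.normal(0, 0.02, (4, 1))
w_in[0, 1] = 0.9    # pretend training left a few strong connections behind
w_rec[1, 2] = -0.7
print(prune_to_graph(w_in, w_rec, w_out))  # only the strong weights survive
```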
Its main theme is mechanistic interpretability, but it also goes into significant detail on the technical side of the implementation: the training stack, a custom activation function, a bespoke sparsity-promoting regularizer, and more.
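To give a flavor of what a sparsity-promoting regularizer can look like in practice, here's a plain L1-style sketch on a toy RNN; the bespoke regularizer described in the post differs from this, so treat it only as an illustration of the general idea:

```python
# Simplified sketch of a sparsity-promoting penalty added to the training
# loss. Plain L1 on the weights is a stand-in for the bespoke regularizer
# described in the post; the toy task loss below is just a placeholder.
import torch

def sparsity_penalty(rnn: torch.nn.RNN, strength: float = 1e-3) -> torch.Tensor:
    """Sum of absolute weight values, pushing unneeded connections toward zero."""
    penalty = sum(p.abs().sum() for name, p in rnn.named_parameters() if "weight" in name)
    return strength * penalty

rnn = torch.nn.RNN(input_size=2, hidden_size=8, batch_first=True)
x = torch.randn(1, 16, 2)                         # (batch, time, features)
out, _ = rnn(x)
loss = out.pow(2).mean() + sparsity_penalty(rnn)  # placeholder task loss + penalty
loss.backward()
```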
The site contains a variety of interactive visualizations and other embeds that are important to its content. That's...
Wow, I appreciate this list! I've heard of a few of the things you mention, like the weight-agnostic NNs, but most of it is entirely new to me.
Tyvm for taking the time to put it together.