The most important thing is approaching other points of view with an open mind, with epistemic humility, that is, knowing that some of what you believe may be wrong, even if, from the inside, everything feels right.
On the object level:
That's when I discovered more effective ways to approach reading, including what I'll call "Guess-and-Check," the technique of scanning and making predictions. Instead of trying to read every word in a textbook, in Guess-and-Check you scan the material and make predictions about what you think the text is saying. This active reading process can help you better engage with the material and activate your prior knowledge. After making your prediction, be sure to confirm or correct it by checking it against the text.
This is similar to the way GPT-3 was trained! Pretty cool that you also found it effective!
Yes, the tone of my comment could be improved. I appreciate him publishing his lessons for the community, and I wanted to give some suggestions for improving (eventual) future ones, if he feels the higher quality is worth the higher effort, and with no obligation. "Al caval donato non si guarda in bocca" ("Don't look a gift horse in the mouth", literally: don't check a gift horse's teeth to learn its age)
In Italy we have this: during the Ferragosto week (around August 15th), a huge percentage of people are on vacation. In general, many people take vacations in August, and schools of all levels (including universities) are closed the whole month.
In the simplest possible way to participate, yes, but a hackathon is meant to elicit imaginative and novel ways to approach the problem (how? I do not know; it is the participants' job to find out).
Yes of course:
Also, in the performance metric, the sum of the per-layer performances should probably be weighted to give less importance to the initial layers; otherwise we encourage models to do as much of the work as possible at the start instead of improving gradually.
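A minimal sketch of what such a weighting could look like (the function name, the polynomial weighting scheme, and the `gamma` parameter are my own assumptions, not anything from the original proposal):

```python
import numpy as np

def weighted_layer_score(layer_accuracies, gamma=2.0):
    """Combine per-layer probe accuracies into one score, down-weighting
    early layers so models aren't rewarded for front-loading the work.

    layer_accuracies: accuracy measured after each layer, in depth order.
    gamma: how strongly later layers dominate (gamma=0 -> uniform average).
    """
    accs = np.asarray(layer_accuracies, dtype=float)
    depths = np.arange(1, len(accs) + 1)
    weights = depths ** gamma          # later layers count more
    weights = weights / weights.sum()  # normalize for comparability
    return float(np.dot(weights, accs))
```

Under this scoring, a model that reaches high accuracy only in its last layers beats one that front-loads the same accuracies, which is the incentive described above.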
Probably, even if not completely by hand, MNIST is simple enough that hybrid human-machine optimization could be possible, maybe with a UI where you can see, in (almost) real time, the effect on validation loss of changing a particular weight with a slider. I do not know whether it would be possible to improve the final score by changing the weights one by one. Or maybe a human could use instinctive visual knowledge to improve the convolutional filters.
On CIFAR this looks very hard to do manually, given that the dataset is much harder than MNIST.
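The slider idea could be backed by something as simple as the following sketch: sweep one chosen weight over candidate values and report the validation loss at each, which is exactly the curve a slider UI would display. All names here (`sweep_weight`, `val_loss_fn`) are hypothetical:

```python
import numpy as np

def sweep_weight(model_weights, idx, values, val_loss_fn):
    """Report validation loss at several candidate values for one weight,
    i.e. the data a 'weight slider' UI would plot for the human.

    model_weights: flat 1-D array of parameters (temporarily modified).
    idx: index of the weight being adjusted.
    values: candidate settings for that weight.
    val_loss_fn: callable mapping the full weight vector to validation loss.
    """
    original = model_weights[idx]
    losses = []
    for v in values:
        model_weights[idx] = v
        losses.append(float(val_loss_fn(model_weights)))
    model_weights[idx] = original  # restore before returning
    return losses
```

The human then moves the slider to the value with the lowest reported loss, one weight at a time: a coordinate-descent loop with a person in the inner step.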
I think a too-large choice of lambda is better than a too-small one: if lambda is too big, the results will still be interesting (which model architecture is best under extremely strong regularization?), while if it is too small you will just get a normal architecture that is slightly more regularized.
So what about an ensemble of the top 20 linear probes? Is it substantially better than using just the best one alone? I would expect so, given that they are orthogonal, so they are using ~uncorrelated information.
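The ensemble I have in mind could be as simple as averaging the probes' logits before taking the argmax; this is a sketch under my own assumptions (logit averaging rather than vote counting, and invented function/parameter names), not the setup from the original post:

```python
import numpy as np

def ensemble_probe_predict(probe_weights, probe_biases, x):
    """Average the logits of several linear probes and predict the
    argmax class. If the probes are ~orthogonal, their errors should
    be ~uncorrelated, so averaging can beat the single best probe.

    probe_weights: list of (n_classes, n_features) matrices, one per probe.
    probe_biases:  list of (n_classes,) bias vectors, one per probe.
    x: (n_features,) activation vector to classify.
    """
    logits = np.mean(
        [W @ x + b for W, b in zip(probe_weights, probe_biases)], axis=0
    )
    return int(np.argmax(logits))
```

Comparing this ensemble's accuracy against the best single probe on held-out data would answer the question directly.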