This is probably obvious, but maybe still worth mentioning:
It’s important to take into account the ROI per unit time. In the amount of time it would take for me to grok transformers (let’s say 100 hours), I could read ~1 million tokens, which is ~0.0002% of the training set of GPT-3.
The curves aren’t clear to me, but I would bet that grokking transformers would be more effective than a 0.0002% increase in training-set knowledge.
This might change if you only want to predict GPT’s output in certain scenarios.
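For anyone who wants to check the back-of-the-envelope numbers, here is a rough sketch of the arithmetic. The 100 hours and ~1 million tokens come from the estimate above; the ~500 billion token training-set size is my own assumption (the comment does not state which figure it used), so treat the result as an order-of-magnitude check rather than anything precise.

```python
# Rough check of the "tokens read vs. training set" estimate.
HOURS = 100                                # from the comment: time to grok transformers
TOKENS_READ = 1_000_000                    # from the comment: tokens readable in that time
ASSUMED_TRAINING_TOKENS = 500_000_000_000  # assumption: roughly 500B tokens, not stated in the comment


def fraction_percent(tokens_read: int, training_tokens: int) -> float:
    """Return tokens_read as a percentage of training_tokens."""
    return 100 * tokens_read / training_tokens


def implied_tokens_per_hour(tokens_read: int, hours: int) -> float:
    """Return the reading rate implied by the two estimates."""
    return tokens_read / hours


pct = fraction_percent(TOKENS_READ, ASSUMED_TRAINING_TOKENS)
rate = implied_tokens_per_hour(TOKENS_READ, HOURS)

print(f"{pct:.4f}% of the assumed training set")
print(f"{rate:,.0f} tokens per hour implied")
```

Changing `ASSUMED_TRAINING_TOKENS` shifts the percentage but not the conclusion: under any plausible dataset size, 100 hours of reading covers a vanishingly small slice of the data.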
I agree that recursive self-improvement can be very very bad; in this post I meant to show that we can get less-bad-but-still-bad behavior from only (LLM, REPL) combinations.
Yeah, this isn't something I have an ugh field around, but having portable versions of travel stuff like shampoo, skincare, and chargers ready to go is nice.
Awesome! What GitHub integration are you talking about?
I think the Hacker News comment section, though still somewhat emotionally charged, is of substantially better quality.
Also, I responded to some comments/questions there.
This is awesome! I highly encourage you to write up your experience; I think this should be more normalized!
I didn’t feel a difference; I guess because it was my iron reserves that were low, not my actual iron levels.
But yeah, if it does continue to be a problem I will do something like that.
Sure, but my point was that I don't know what dose to take per unit time.
You might want to clarify that, because in the post you explicitly say things like “if your goal is to predict the logits layer, then you should probably learn about Shakespearean dramas, Early Modern English, and the politics of the Late Roman Republic.”