The population growth problem should be somewhat addressed by healthspan extension. A big reason why people aren't having kids now is that they lack the resources - be it housing, money, or time. If we could extend the average healthspan by a few decades, then older people who have spent enough time working to accumulate those resources, but are currently too old to raise children, would be able to have kids. Moreover, people who already have many kids but have simply become too old would be able to have more. For those reasons, I don't think a future birth limit of 30 billion is particularly reasonable.

However, I don't think it will make a difference, at least for addressing AI. Once computing reaches a certain level of advancement, it will simply be infeasible for something the size of a human brain, no matter how enhanced, to compete with a superintelligence running on a supercomputer the size of a basketball court. And that level of computing/AI advancement will almost certainly be achieved before the discussed genetic enhancement ever bears fruit, probably even before it's made legal. Moreover, it's doubtful we'll see any significant healthspan extension much before achieving ASI, which makes it even less relevant - though I don't think any of these concerns were particularly significant in the first place, as it also seems like we'll see ASI long before global population decline.

Sorry for the late reply, but yeah, it was mostly vibes based on what I'd seen before. I've been looking over the benchmarks in the Technical Report again, though, and I'm starting to feel like 500B+10T isn't too far off. Although the language benchmarks are fairly similar, the improvement in mathematical capabilities over the previous SOTA is much larger than I first realised, and seems to match a model of that size, judging by the performance of the conventionally trained PaLM and its derivatives.

Apparently all OPT models were trained with a 2k token context length. Attention scales as O(n^2) per sequence, but everything else scales linearly with the number of tokens, so the attention share of compute only grows linearly with context length. Under the standard accounting (~4·n_layers·n_ctx·d_model attention FLOPs per token against ~2·N_params for everything else), attention is only a few percent of the 175B model's FLOPs at 2k tokens, rises to roughly 10% at 8k, and reaches about 30% at 32k - meaning a 32k token version costs around 40% more compute per token than a 2k token one. 8k tokens is basically a rounding error, but the 32k overhead is still significant even with a 175B parameter model. That share will probably only drop back to a reasonable level at around the 10T parameter model level, provided O(n^2) attention at least. And that's all assuming the other aspects of the model don't scale at all with the larger context length... A new approach is definitely going to be needed soon. Maybe H3?
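A minimal sketch of that estimate, assuming OPT-175B uses the GPT-3 175B configuration (96 layers, d_model = 12288) and counting only the dominant matmul terms; the exact percentages shift a bit under a different FLOP accounting:

```python
# Rough sketch of the attention-share estimate above. Assumptions (not from the
# OPT paper itself): OPT-175B mirrors the GPT-3 175B config (96 layers,
# d_model = 12288), non-attention forward cost ~= 2 * N_params FLOPs per token,
# and attention (QK^T plus AV) ~= 4 * n_layers * n_ctx * d_model FLOPs per token.
# Softmax and other small terms are ignored.

N_PARAMS = 175e9
N_LAYERS = 96
D_MODEL = 12288

def flops_per_token(n_ctx: int) -> tuple[float, float]:
    """Return (attention FLOPs, non-attention FLOPs) per token."""
    attn = 4 * N_LAYERS * n_ctx * D_MODEL
    rest = 2 * N_PARAMS
    return attn, rest

base_attn, base_rest = flops_per_token(2048)
for n_ctx in (2048, 8192, 32768):
    attn, rest = flops_per_token(n_ctx)
    share = attn / (attn + rest)
    overhead = (attn + rest) / (base_attn + base_rest) - 1
    print(f"{n_ctx:>6} tokens: attention ~{share:.0%} of FLOPs, "
          f"~{overhead:.0%} more per-token compute than 2k")
```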

I always get annoyed when people use this as an example of 'lacking intelligence'. Though it certainly is in part an issue with the model, the primary reason for this failure is much more likely the tokenization process than anything else. A GPT-4, or probably even a GPT-3, trained with character-level tokenization would likely have zero issues answering these questions. It's for the same reason that the base GPT-3 struggled so much with rhyming, for instance.
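As a quick illustration of what the model actually "sees" (a sketch assuming the tiktoken package is installed, with cl100k_base as a stand-in for the GPT-4 tokenizer): a byte-pair tokenizer hands the model multi-character chunks rather than individual letters, which is why counting letters or finding rhymes is awkward for it.

```python
# Show how a BPE tokenizer splits words into multi-character chunks, so the
# model never directly observes the individual letters it is asked about.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-style BPE vocabulary

for word in ["strawberry", "rhyme", "antidisestablishmentarianism"]:
    token_ids = enc.encode(word)
    pieces = [enc.decode([t]) for t in token_ids]  # the chunks the model sees
    print(f"{word!r} -> {pieces}")
```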

According to the Chinchilla paper, a compute-optimal model of that size should have ~500B parameters and use ~10T tokens. Based on GPT-4's demonstrated capabilities though, that's probably an overestimate.
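For reference, a back-of-the-envelope sketch of where those numbers come from, using the C ≈ 6·N·D approximation and the rough ~20 tokens-per-parameter ratio from the Chinchilla paper; the ~3e25 FLOP budget below is simply the value implied by the 500B/10T pair, not a figure from the thread:

```python
# Back-of-the-envelope Chinchilla-style sizing. Assumptions: training compute
# C ~= 6 * N * D (N = parameters, D = training tokens) and a compute-optimal
# ratio of roughly 20 tokens per parameter.
import math

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Return (params, tokens) that are roughly compute-optimal for the budget."""
    n_params = math.sqrt(compute_flops / (6 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# ~3e25 FLOPs is the budget implied by the 500B-parameter / 10T-token pair above.
n, d = chinchilla_optimal(3e25)
print(f"~{n / 1e9:.0f}B parameters, ~{d / 1e12:.0f}T tokens")  # ~500B, ~10T
```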