## LESSWRONG

George

Old man 1: Life is one trouble after another. I'd be better off dead. Better yet, I wish I had never been born.

Old man 2: True, true, but who has such luck? Maybe one in a thousand.

My blog: https://blog.cerebralab.com

I'm also building an open source generic ML library: https://github.com/mindsdb/mindsdb & https://github.com/mindsdb/lightwood .... which I guess might be of interest to some people here

George's Shortform

I've been thinking a lot about replacing statistics with machine learning and how one could go about that. I previously tried arguing that the "roots" of a lot of classical statistical approaches are flawed, i.e. they make too many assumptions about the world and thus lead to faulty conclusions and overly complex models with no real insight.

I kind of abandoned that avenue once I realized people back in the late 60s and early 70s were making that point and proposing what are now considered machine learning techniques as a replacement.

So instead I've decided to channel any further anger at bad research, and at people using nonsensical constructs like p-values, into trying to popularize better approaches based on predictive modeling.

Predictive Coding has been Unified with Backpropagation

There is no relationship between neurons and the "neurons" of an ANN. It's just a naming mishap at this point.

MIRI comments on Cotra's "Case for Aligning Narrowly Superhuman Models"

Sometimes, those tokens represent words and sometimes they represent single characters.

Hmh, ok, quick update to my knowledge that I should have done before: https://huggingface.co/transformers/tokenizer_summary.html

Seems to indicate that GPT-2 uses a byte-level BPE (though maybe the impl here is wrong), where I'd have expected it to use something closer to a word-by-word tokenizer with exceptions for rare words (i.e. a sub-word tokenizer that's basically acting as a word tokenizer 90% of the time). And maybe GPT-3 uses the same?

Also, it seems that sub-word tokenizers split much more aggressively than I'd have assumed before.

Complaint retracted.
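For anyone else who had the same wrong picture, here is a minimal sketch of what "byte-level BPE" means. It is a toy and not GPT-2's actual tokenizer: the function names (`train_byte_bpe`, `apply_merge`, `encode`) are made up for illustration, and it skips the regex pre-splitting step that real implementations apply before merging. The idea it shows is just that the vocabulary starts from raw UTF-8 bytes and grows by repeatedly merging the most frequent adjacent pair, so frequent words end up as single tokens while rare strings fall back to smaller pieces.

```python
from collections import Counter


def apply_merge(tokens, left, right):
    """Replace every adjacent (left, right) pair in `tokens` with left + right."""
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
            out.append(left + right)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def train_byte_bpe(text, num_merges):
    """Learn up to `num_merges` merge rules over the UTF-8 bytes of `text`.

    Toy assumption: merges may cross word boundaries, because there is
    no pre-tokenization step here.
    """
    # Start from single bytes, so any string is representable without an
    # "unknown" token.
    tokens = [bytes([b]) for b in text.encode("utf-8")]
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing repeats any more, so further merges gain nothing
        merges.append((left, right))
        tokens = apply_merge(tokens, left, right)
    return merges, tokens


def encode(text, merges):
    """Tokenize `text` by replaying the learned merges in training order."""
    tokens = [bytes([b]) for b in text.encode("utf-8")]
    for left, right in merges:
        tokens = apply_merge(tokens, left, right)
    return tokens
```

Training this on a small corpus and then calling `encode` on a string with unseen words shows the behaviour from the linked summary: common substrings come out as long tokens, everything else degrades toward individual bytes.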

If you've learned from the best, you're doing it wrong

Wasn't Feynman basically known for:

1. His contribution to computing, formalizing problems into code, parallelizing, etc
2. His mathematical contributions (Feynman diagrams, Feynman integrals)
3. His contributions to teaching/reasoning methods in general.

I agree that I'd want to learn physics from him, I'm just not sure he was an exceptional physicist. Good, but not von Neumann. He says as much in his biographies (e.g. pointing out that one of his big contributions came from randomly pointing at a valve on a schematic and getting people to think about the schematic).

He seems to be good at "getting people to think reasonably and having an unabashedly open, friendly, mischievous and perseverant personality", which seems to be what he's famous for and the only thing he thinks of himself as being somewhat good at. Though you could always argue it's due to modesty.

To give a specific example, this is him "explaining magnets", except that I'm left knowing nothing extra about magnets, but I do gain a new understanding of concepts like "level of abstraction", various "A Human's Guide to Words"-ish insights about language use, and some phenomenology around what it means to "understand".

If you've learned from the best, you're doing it wrong

But the use-case for learning from the best is completely different: you study the best when there are no other options. You study the best when the best is doing something completely different, so they're the only one to learn it from.

I feel like I do mention this when I say one ought to learn from similar people.

If you spent 10 years learning how to <sport> and you are nr 10 in <sport> and someone else is nr 1 in <sport>, the heuristic of learning from someone similar to you applies.

For instance, back in college I spent a semester on a project with the strongest programmer in my class, and I picked up various small things which turned out to be really important (like "choose a good IDE").

What you are describing here, though, is simply a category error: "the best in class" is not "the best programmer". There were probably hundreds of thousands of programmers better than him on all possible metrics.

So I'm not sure how it's relevant.

It might pay to hang out with him, again, based on the similarity criterion I point out: he's someone very much like you who is somewhat better at the thing you want to learn (programming).

If you've learned from the best, you're doing it wrong

Maybe weird writing on my end. The working-out example that I'm referring to is the section on professional athletes (i.e. them never necessarily having learnt how to do casual health-focused workouts). A physics teacher, by contrast, might have forgotten what it's like not to know physics 101, but she still did learn physics 101 at some point.

Hopefully that makes it more clear?

Fun with +12 OOMs of Compute

The aliens seem to have also included with their boon:

• Cheap and fast ECC GPU RAM with minute electricity consumption
• A space-time disruptor that allows you to have CMOS transistors smaller than electrons to serve as the L1 & L2
• A way of getting rid of electron tunneling at a very small scale
• 12 OOMs better SSDs and fiber optic connections, and cures for the host of physical limitations plaguing the mere possibility of those two.