Comments

Trying to learn a language from scratch, just from text is a fun exercise for humans also. I recently tried this with Hindi after I had an disagreement with someone about the exact question of this post. I didn't get very far in 2 hours though.

Trydactyl is amazing. You can disable the mode on specific websites by running the blacklistadd command. If you have configured that already, these settings can also be saved in your config file. Here's my config (though careful before copying my config. It has fixamo_quiet enabled, a command that got Tridactyl almost removed when it was enabled by default. You should read what it does before you disable it.)

Here are my ignore settings:

autocmd DocStart https://youtube.com mode ignore
autocmd DocStart https://todoist.com mode ignore
autocmd DocStart mail.google.com mode ignore
autocmd DocStart calendar.google.com mode ignore
autocmd DocStart keyma.sh mode ignore
autocmd DocStart monkeytype.com mode ignore
autocmd DocStart https://www.youtube.com mode ignore
autocmd DocStart https://ilias.studium.kit.edu/ mode ignore
autocmd DocStart localhost:888 mode ignore
autocmd DocStart getguestimate.com mode ignore
autocmd DocStart localhost:8888 mode ignore

Juggling: Anthony Gatto's juggling routine from 2000. Anthony Gatto holds several juggling world records. This routine is infamous in the juggling world (here's a decent juggler commenting on it). As well as the fact that he gave up juggling to work with concrete instead (because it pays the bills). Here's more context on Gatto and his routine (the guy picking up the balls for him in the video is his father, for example):

Morpheus1mo-21

Agreed. Especially the “electoral college is good actually” part is where I started laughing. If you don't want tyranny by the majority, perhaps just not crippling your system by not using first-past-the-post voting would be a first step to a more sane system.

Absolutely love this essay! The green from the perspective of non-green thoughts really resonated with things I thought in the past and made me notice how I have been confused by green. Helpfull for AGI or not, this is is giving me a bunch of fresh thoughts about problems/confusing areas in my own life, so thanks!

A quick intuitive check for whether something is a natural latent over some parts of a system consists of two questions:

  • Are the parts (approximately) independent given the candidate natural latent?

I first had some trouble checking this condition intuitively. I might still not have got it correctly. I think one of the main things that got me confused first, is that if I want to reason about natural latents for “a” dog, I need to think about a group of dogs. Even though there are also natural latents for the individual dog (like fur color is a natural latent across the dog's fur). Say I check the independence condition for a set of sets of either cats or dogs. So if I look at a single animal's shoulder height in those sorted cluster, it tells me which of the two clusters it's in, but once I updated on that information, my guesses for the dog height's will not be able to improve.

An important example for something that is not a natural latent is the empirical mean in fat tailed distributions for real world sample sizes, while it is in thin-tailed ones. This doesn't mean that they don't have natural latents. This fact is what Nassim Taleb is harping on. For Pareto distributions (think: pandemics, earthquakes, wealth), one still has natural latents like the tail index (estimated from plotting the data on a log-log plot by dilettantes like me and more sophisticatedly by real professionals).

If I had more time I would have written a shorter letter.

TLDR: I looked into how much it would take to fine-tune gpt-4 to do Fermi estimates better. If you liked the post/paper on fine-tuning Language models to make predictions you might like reading this. I evaluated gpt-4 on the first dataset I found, but gpt-4 was already making better fermi estimates than the examples in the dataset, so I stopped there (my code).

First problem I encountered: there is no public access to fine-tuning gpt-4 so far. Ok, we might as well just do gpt-3.5 I guess.

First, I found this Fermi estimate dataset. (While doing this, I was thinking I should perhaps search more widely what kind of different AI benchmarks exist, since probably a dataset that is evaluating a similar capability is already out there, but I don't know its name.)

Next I looked at this paper, where people used among other gpt-3.5 and gpt-4 on this benchmark. Clearly these people weren't even trying, though, because gpt-4 does worse than gpt-3.5. One of the main issues I saw was that they were trying to make the LLM output the answer as a program in the domain specific language used in that dataset. They couldn't even get the LLM to output valid programs more than 60% of the time (their metric compares on a log scale, if the answer by the LLM is within 3 orders of magnitude of the real answer. 1 is best 0 is more than 3 orders of magnitude away: fp-score(x) = max(0,1-1/3 * | log_10(prediction/answer)|)).

image

My conjecture was that just using python instead should give you better results.(This turned out to be true). I get a mean score of ~0.57 on 100 sample problems, so as good results with gpt-4-turbo as they get when they first provide “context” by giving the llm the values for the key variables needed to compute the answer (why would this task even still be hard at all?).

When gpt-4 turned out to get a worse fp-score than gpt-4-turbo on my 10 samples. I got suspicious and after looking at samples gpt-4 got a bad score, it was clear this was mostly to blame on bad quality of the dataset. 2 answers were flat-out not using the correct variables/confused, while gpt-4 was answering correctly. Once, the question didn't make clear what unit to use. 2 of the samples gpt-4 gave a better answer. Once, using a better approach (using geometry instead of wrong figures of how much energy the earth gets from the sun, to determine the fraction of sun energy that the earth receives). Once, by having better numbers, input estimates like how many car miles are driven in total in the US.

So on this dataset, gpt-4 seems to be already at the point of data-saturation. I was actually quite impressed how well it was doing. When I had tried using gpt-4 for this task, I had always felt like it was doing quite badly. One guess I have is this is because when I ask gpt-4 for an estimate, it is often a practical question, which is actually harder than these artificial questions. In addition, the reason I ask gpt-4 is that the question is hard, and I expect to need to employ a lot of cognitive labor to do it myself.

Another data point with respect to this was the “Thinking physics exercises”. Which I tried with some of my friends. For that task, gpt-4 was better than people who were bad at this, but worse than people who were good at this (and given 5–10 minutes of thinking time) (although I did not rigorously evaluate that). GPT-4 is probably better than most humans at doing Fermi estimates given 10 minutes of time. Especially in domains one is unfamiliar with, since it has so much more breadth.

I would be interested to see what one would get out of actually making a high quality dataset by taking Fermi estimates from people I deem to produce high quality work in that area. 

Not exactly what you were looking for, but recently I noticed that there were a bunch of John Wentworth's posts that I had been missing out on that he wrote over the past 6 years. So if you get a lot out of them too, I recommend just sorting by 'old'. I really liked don't get distracted by the boilerplate (The first example made something click about math for me that hadn't clicked before, which would have helped me to engage with some “boilerplate” in a more productive way.). I also liked constraints and slackness, but I didn't go beyond the first exercise yet. There's also more technical posts that I didn't have the time to dig into yet.

bhauth doesn't have as long a track record, but I got some interesting ideas from his blog which aren't on his lesswrong account. I really liked proposed future economies and the legibility bottleneck.

This post warms my heart. Thank you.

The pdf linked by @CstineSublime is definitely towards the textbook. I’ve started reading it and it has been an excellent read so far. Will probably write a review later.

Load More