# crabman's Shortform

New Comment

A competition on solving math problems via AI is coming. https://imo-grand-challenge.github.io/

• The problems are from the International math olympiad (IMO)
• They want to formalize all the problems in Lean (theorem prover) language. They haven't figured out how to do that, e.g. how to formalize problems of form "determine the set of objects satisfying the given property", as can be seen in https://github.com/IMO-grand-challenge/formal-encoding/blob/master/design/determine.lean
• A contestant must submit a computer program that will take a problem's description as input and output a solution and its proof in Lean.

I would guess that this is partly a way to promote Lean. I think it would be interesting to pose questions about this on metaculus.

Any guesses at the difficulty? My first impression was that this is not going to be solved any time soon. I just don't think current techniques are good enough to write flawless lean code given a difficult objective.

I think grand challenges in AI are are sometimes useful, but when they are at this level I am a bit pessimistic. I don't think this is necessarily AI-complete, but it's perhaps close.

Can you quantify soon :) ? For example, I'd be willing to bet at 1/3 odds that this will be solved in the next 10 years conditional on a certain amount of effort being put in and more like 1/1 odds for the next 20 years. It's hard to quantify the conditional piece but I'd cash it out as something like if researchers put in the same amount of effort into this that they put into NLP/image recognition benchmarks. I don't think that'll happen, so this is purely a counterfactual claim, but maybe it will help ground any subsequent discussion with some sort of concrete claim?

By soon I mean 5 years. Interestingly, I have a slightly higher probability that it will be solved within 20 years, which highlights the difficulty of saying ambiguous things like "soon."

That is interesting! I should be clear that my odds ratios are pretty tentative given the uncertainty around the challenge. For example, I literally woke up this morning and thought that my 1/3 odds might be too conservative given recent progress on 8th grade science tests and theorem proving.

I created three PredictionBook predictions to track this if anyone's interested (5 years, 10 years, 20 years).

In my MSc courses the lecturer gives proofs of important theorems, while unimportant problems are given as homework. This is bad for me, because it makes me focus on actually figuring out not too important stuff. I think it works like this because the course instructors want to • make the student do at least something and • check whether the student has learned the course material.

Ideally I would like to study using interactive textbooks where everything is a problem to solve on my own. Such a textbook wouldn't show an important theorem's proof right away. Instead it would show me the theorem's statement and ask me to prove it. There should be hints available and, obviously, I should be able to see the author's proof when I want to see it.

Also, for textbooks about Turing machines, recursive functions, and stuff like that: having an interpreter of Turing machines would be very nice. (googling Turing machine interpreter and using whatever you find is a bad idea, because they all have different flavors)

I found this to vary by field. When I studied topology and combinatorics we proved all the big important things as homework. When I studied automata theory and measure theory, we did what your teacher is doing.

In my understanding, here are the main features of deep convolutional neural networks (DCNN) that make them work really well. (Disclaimer: I am not a specialist in CNNs, I have done one masters level deep learning course, and I have worked on accelerating DCNNs for 3 months.) For each feature, I give my probability, that having this feature is an important component of DCNN success, compared to having this feature to the extent that an average non-DCNN machine learning model has it (e.g. DCNN has weight sharing, an average model doesn't have weight sharing).

1. DCNNs heavily use transformations, which are the same for each window of the input - 95%
2. For any set of pixels of the input, large distances between pixels in the set make the DCNN model interactions between these pixels less accurately - 90% (perhaps usage of dilution in some DCNNs is a counterargument to this)
3. Large depth (together with the use of activation functions) lets us model complicated features, interactions, logic - 82%
4. Having a lot of parameters lets us model complicated features, interactions, logic - 60%
5. Given 3 and 4, SGD-like optimization works unexpectedly fast for some reason - 40%
6. Given 3 and 4, SGD-like optimization with early stopping doesn't overfit too much for some reason - 87% (I am not sure if S in SGD is important, and how important is early stopping)
7. Given 3 and 4, ReLu-like activation function works really well (compared to, for example, sigmoid).
8. Modern deep neural network libraries are easy to use compared to the baseline of not having specific well-developed libraries - 60%
9. Deep neural networks work really fast, when using modern deep neural network libraries and modern hardware - 33%
10. DCNNs find such features in photos, which are invisible to the human eye and to most ML algorithms - 20%
11. Dropout helps reducing overfitting a lot - 25%
12. Batch normalization improve quality of the model a lot for some reason - 15%
13. Batch normalization makes the optimization much faster - 32%
14. Skip connections (or residual connections, I am not sure if there's a difference) help a lot - 20%

Let me make it more clear how I was assigning the probabilities and why I created this list. I am trying to come up with a tensor network based machine learning model, which would have the main advantages of DCNNs, but which would not, itself, be a deep relu neural network. So I decided to make this list to see which important components my model has.

Often in psychology articles I see phrases like "X is associated with Y". These articles' sections often read like the author thinks that X causes Y. But if they had evidence that X causes Y, surely they would've written exactly that. And in such cases I feel that I want to punish them, so in my mind I instead read it as "Y causes X", just for contrarianism's sake. Or, sometimes, I imagine what variable Z can exist which causes both X and Y. I think the latter is a useful exercise.

Examples:

It appears that some types of humor are more effective than others in reducing stress. Chen and Martin (2007) found that humor that is affiliative (used to engage or amuse others) or self-enhancing (maintaining a humorous perspective in the face of adversity) is related to better mental health. In contrast, coping through humor that is self-defeating (used at one’s own expense) or aggressive (criticizing or ridiculing others) is related to poorer mental health.

The author says that non-self-defeating non-agressive humor helps reduce stress. But notice the words "related". For the first "related", it seems plausible that not having a good mental health causes you to lose humor. For the second "related", I think it's very probable that poor mental health, such as depression and low self esteem, causes self-defeating humor.

How does humor help reduce the effects of stress and promote wellness? Several explanations have been proposed (see Figure 4.7). One possibility is that humor affects appraisals of stressful events. Jokes can help people put a less threatening spin on their trials and tribulations. Kuiper, Martin, and Olinger (1993) demonstrated that students who used coping humor were able to appraise a stressful exam as a positive challenge, which in turn lowered their perceived stress levels.

Or it could be that students, who are well prepared for the exams or simply tend to not be afraid of them, will obviously have lower perceived stress levels, and maybe will be able to think about the exams as a positive challenge, hence they'' able to joke about them in this way.

It's possible in this example, that the original paper Kuiper, Martin, and Olinger (1993) actually did an intervention making students use humor, in which case the causality must go from humor to stress reduction. But I don't want to look at every source, so screw you author of Psychology Applied to Modern Life (both quotes are from it) for not making it clear whether that study found causation or only correlation.

A few examples would help - the academic papers I see often call out this problem, and suggest possible Zs themselves. Generally, X and Y are more easily or precisely measured than the likely Zs, so make for better publications.

I definitely see the problem in popular articles and policy justification.

How to download the documentation of a programming library for offline use.

1. On the documentation website, look for "downloads" section. Preferrably choose HTML format, because then it will be nicely searchable - I can even create a krunner web shortcut for searching it. Example: Numpy - find "HTML+zip".
2. If you need pytorch, torchvision, or sklearn - simply download https://github.com/unknownue/PyTorch.docs.
5. Use httrack to mirror the documentation website. In my experience it doesn't take long. Do it like $httrack https://click.palletsprojects.com/en/7.x/. This will download everything hosted in https://click.palletsprojects.com/en/7.x/ and will not go outside of this server directory. In this case the search field won't work. It turns out, Pytorch's pseudorandom number generator generates different numbers on different GPUs even if I set the same random seed. Consider the following file do_different_gpus_randn_the_same.py: import torchseed = 0 torch.manual_seed(seed) torch.backends.cudnn.deterministic = True torch.backends.cudnn.benchmark = False foo = torch.randn(500, 500, device="cuda") print(f"{foo.min():.30f}") print(f"{foo.max():.30f}") print(f"{foo.min() / foo.max()=:.30f}")  On my system, I get the following for two runs on two different GPUs: $ CUDA_VISIBLE_DEVICES=0 python do_different_gpus_randn_the_same.py
-4.230118274688720703125000000000
4.457311630249023437500000000000
foo.min() / foo.max()=-0.949029088020324707031250000000
\$ CUDA_VISIBLE_DEVICES=1 python do_different_gpus_randn_the_same.py
-4.230118751525878906250000000000
4.377007007598876953125000000000
foo.min() / foo.max()=-0.966440916061401367187500000000


Due to this, I am going to generate all pseudorandom numbers on my CPU and then transfer them to GPU for reproducibility's sake like foo = torch.randn(500, 500, device="cpu").to("cuda").

You're going to need to do more than that if you want full reproducibility, because GPUs aren't even deterministic in the first place, and that is big enough to affect DRL/DL results.

Tbh what I want right now is a very weak form of reproducibility. I want the experiments I am doing nowadays to work the same way on my own computer every time. That works for me so far.

A new piece of math notation I've invented which I plan to use whenever I am writing proofs for myself (rather than for other people).

Sometimes when writing a proof, for some long property P(x) I want to write:

It follows from foo, that there exists x such that P(x). Let x be such that P(x). Then ...

I don't like that I need to write P(x) twice here. And the whole construction is too long for my liking, especially when the reason foo why such x exists is obvious. And if I omit the first sentence It follows from foo, that there exists x such that P(x). and just write

Let x be such that P(x). Then ...

then it's not clear what I mean. It could be that I want to show that such x exists and that from its existence some statement of interest follows. Or it could be that I want to prove some statement of form

For each x such that P(x), it holds that Q(x).

Or it could even be that I want to show that something follows from existence of such x, but I am not asserting that such x exists.

The new notation I came up with is to write L∃t in cases when I want to assert that such x exists and to bound the variable x in the same place. An example (an excerpt, not a complete proof):

• Suppose is a countably infinite set, suppose is a set of subsets of , suppose .
• L∃t be a bijection from onto .
• L∃t such that , if is in , otherwise .
• By recursion theorem, L∃t be such that , .
• L∃t .
• ...

Many biohacking guides suggest using melatonin. Does liquid melatonin spoil under high temperature if put in tea (95 degree Celcius)?

More general question: how do I even find answers to questions like this one?

When I had a quick go-ogle search I started with:

"melatonin stability temperature"

then

"N-Acetyl-5-methoxytryptamine"

A quick flick through a few abstracts I can't see anything involving temperatures higher than 37 C i.e. body temperature.

Melatonin is a protein, many proteins denature at temperatures above 41 C.

My (jumped to) conclusion:

No specific data found.

Melatonin may not be stable at high temperatures, so avoid putting it in hot tea.