crabman's Comments

The Zettelkasten Method

A question about your "don't sort often" advice. How do you deal with linking unsorted cards?

  1. At first, you create a card and put it in the unsorted pile of cards, and you don't give it an index. Is this correct? Or do you give the card an index, add some links, and then put it back into the unsorted pile of cards?
  2. At some point (which per your suggestion should not be too soon) you give it an index and put it in the sorted part. Do you only think of links at this point?
Mark Xu's Shortform

There are a bunch of explanations of logarithm as length on Arbital.

Small Data

“big data” refers to situations with so much training data you can get away with weak priors. The most powerful recent advances in machine learning, such as neural networks, all use big data.

This is only partially true. Consider some image classification dataset, say MNIST, CIFAR10, or ImageNet. Consider some convolutional relu network architecture, say conv2d -> relu -> conv2d -> relu -> conv2d -> relu -> conv2d -> relu -> fullyconnected, with some chosen kernel sizes and numbers of channels, and some configuration of its weights. Now consider the multilayer perceptron architecture fullyconnected -> relu -> fullyconnected -> relu -> fullyconnected -> relu -> fullyconnected -> relu -> fullyconnected. Clearly, there exist hyperparameters of the multilayer perceptron (numbers of neurons in the hidden layers) and a configuration of its weights such that the multilayer perceptron implements the same function as the convolutional architecture with the chosen weights: a convolution is a linear map, so it can be written as a fully connected layer in which many weights are tied or set to zero. Therefore, the space of functions which can be implemented by the convolutional neural network (with fixed kernel sizes and channel counts) is a subset of the space of functions which can be implemented by the multilayer perceptron (with correctly chosen numbers of neurons). Therefore, training the convolutional relu network amounts to updating on evidence with a relatively strong prior, while training the multilayer perceptron amounts to updating on evidence with a relatively weak prior.
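
The subset claim can be checked on a tiny case. Below is a minimal sketch in plain Python, assuming a 1D convolution with valid padding, stride 1, and no bias. It writes the convolution as a dense weight matrix whose rows repeat the kernel, then applies that matrix as a fully connected layer. The function names and the toy numbers are made up for illustration and do not come from any library.

```python
def conv1d(signal, kernel):
    """Valid, stride-1 cross-correlation, as used in conv layers (no bias)."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def conv_as_dense_matrix(kernel, input_len):
    """Build the fully connected weight matrix that reproduces conv1d.

    Row i holds the kernel shifted i positions to the right, zeros elsewhere.
    """
    k = len(kernel)
    out_len = input_len - k + 1
    matrix = [[0.0] * input_len for _ in range(out_len)]
    for i in range(out_len):
        for j in range(k):
            matrix[i][i + j] = kernel[j]
    return matrix


def dense(matrix, signal):
    """Fully connected layer without bias: one dot product per output unit."""
    return [sum(w * x for w, x in zip(row, signal)) for row in matrix]


def relu(values):
    return [max(0.0, v) for v in values]


signal = [1.0, -2.0, 3.0, 0.5, -1.0]  # toy input, chosen arbitrarily
kernel = [0.5, -1.0, 2.0]             # toy kernel, chosen arbitrarily

conv_out = relu(conv1d(signal, kernel))
dense_out = relu(dense(conv_as_dense_matrix(kernel, len(signal)), signal))
print(conv_out == dense_out)
```

The dense matrix has 15 entries but only 3 free parameters, which is one way to see the convolutional architecture as the same function family under a much stronger prior.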

Experimentally, if you train the networks described above, the convolutional relu network will learn to classify images well, or at least okay-ish. The multilayer perceptron will not: its accuracy will be much worse. Therefore, the data is not enough to wash away the multilayer perceptron's prior, hence by your definition it can't be called big data. Here I must note that ImageNet is the biggest publicly available dataset for training image classification, so if anything is big data, it should be.


Big data uses weak priors. Correcting for bias is a prior. Big data approaches to machine learning therefore have no built-in method of correcting for bias.

This looks like a formal argument, a demonstration or dialectics as Bacon would call it, which uses shabby definitions. I disagree with the conclusion, i.e. with the statement "modern machine learning approaches have no built-in method of correcting for bias". I think in modern machine learning people are experimenting with various inductive biases and various ad-hoc fixes or techniques which help correct for all kinds of biases.


In your example with a non-converging sequence, I think you have a typo - there should be rather than .

Legends of Runeterra: Early Review

Nice review. I like CCGs in general, but I hadn't heard of Legends of Runeterra, and thanks to your review I have decided not to play it.

Regarding Emergents, what platforms will it be on and can I be an alpha/beta tester?

crabman's Shortform

How to download the documentation of a programming library for offline use.

  1. On the documentation website, look for a "Downloads" section. Preferably choose the HTML format, because then it will be nicely searchable - I can even create a krunner web shortcut for searching it. Example: NumPy - find "HTML+zip".
  2. If you need pytorch, torchvision, or sklearn - simply download
  3. If you need documentation hosted on ReadTheDocs, press "Read the docs" in the bottom left and choose a download type from "Downloads". The search field won't work in the HTML version, so feel free to download whatever format you like. Example: Elpy. Warning: for some libraries (e.g. more-itertools) the downloaded version is basically broken, so you should check that what you've downloaded is complete.
  4. In some weird cases the ReadTheDocs documentation for the latest version of a library might be unlisted in the downloads section of ReadTheDocs. For example, if you click the readthedocs icon in the bottom right of, you won't find a download link for version 8.0. In this case copy the hyperlink or and replace pallets-click with the name of the project you want. It doesn't work for all projects, but it works for some.
  5. Use httrack to mirror the documentation website. In my experience it doesn't take long. Do it like $ httrack This will download everything hosted in and will not go outside of this server directory. In this case the search field won't work.
Michaël Trazzi's Shortform

Do you have any tips on how to make the downloaded documentation of programming languages and libraries searchable?

Btw here's my shortform on how to download the documentation of various libraries:

The Zettelkasten Method

It turns out Staples index-cards-on-a-ring are not a thing in Russia. That might be the case in other countries as well, so here is my solution, in the spirit of Abram's suggestions: a small A6 binder and pages for it from Aliexpress (archived version). In my opinion it looks nice and feels nice, although now I think A6 is too small and I would prefer A5.

Named Distributions as Artifacts

Let’s start with the application of the central limit theorem to champagne drinkers. First, there’s the distinction between “liver weights are normally distributed” and “mean of a sample of liver weights is normally distributed”. The latter is much better-justified, since we compute the mean by adding a bunch of (presumably independent) random variables together. And the latter is usually what we actually use in basic analysis of experimental data, e.g. to decide whether there’s a significant difference between the champagne-drinking group and the non-champagne-drinking group. That does not require that liver weights themselves be normally distributed.

I think your statement in bold font is wrong. In cases such as champagne drinkers vs non-champagne-drinkers, people are likely to use Student's two-sample t-test or Welch's two-sample unequal variances t-test. Both tests assume that, within each group, the individual observations are normally distributed, not merely that the means are normally distributed.
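
For concreteness, here is a minimal sketch of the Welch statistic using only the Python standard library. The liver weights are invented for illustration, and `welch_t` is a hypothetical helper name, not a library function. The normality assumption matters at the last step, which this sketch leaves out: the statistic is compared against a t distribution with the computed degrees of freedom, and that reference distribution is derived for normally distributed observations.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # unbiased sample variance
    var_b = statistics.variance(sample_b)
    se_a, se_b = var_a / n_a, var_b / n_b  # squared standard errors of the means
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical liver weights in kg, invented for illustration only.
drinkers = [1.6, 1.7, 1.5, 1.8, 1.9]
non_drinkers = [1.4, 1.5, 1.3, 1.6, 1.5, 1.4]

t, df = welch_t(drinkers, non_drinkers)
print(f"t = {t:.3f}, df = {df:.2f}")
```

Turning `t` and `df` into a p-value needs the t distribution's CDF, which the standard library does not provide, so that step is omitted here.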

crabman's Shortform

Tbh what I want right now is a very weak form of reproducibility. I want the experiments I am doing nowadays to work the same way on my own computer every time. That works for me so far.

crabman's Shortform

It turns out that PyTorch's pseudorandom number generator generates different numbers on different GPUs even if I set the same random seed. Consider the following file:

import torch

seed = 0
torch.manual_seed(seed)  # set the random seed, as described above
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

foo = torch.randn(500, 500, device="cuda")
print(f"{foo.min() / foo.max()=:.30f}")

On my system, I get the following for two runs on two different GPUs:

foo.min() / foo.max()=-0.949029088020324707031250000000
foo.min() / foo.max()=-0.966440916061401367187500000000

Due to this, I am going to generate all pseudorandom numbers on my CPU and then transfer them to GPU for reproducibility's sake like foo = torch.randn(500, 500, device="cpu").to("cuda").
