If you go slower, you have more time to find desirable mechanisms. That's pretty much it I guess.

I'm convinced by the mainstream view on COVID origins and medicine.

I'm ambivalent on education. I guess that if done well it'd consistently have good effects, and that currently it on average has good effects, but also that the effect varies a lot from person to person, so simplistic quantitative reviews don't tell you much. When I did an epistemic spot check on Caplan's book, it failed terribly (it cited a supposedly-ingenious experiment purporting to show that university didn't improve critical thinking, but IMO the experiment had terrible psychometrics).

I don't know enough about sleep research to disagree with Guzey on the basis of anything but priors. In general, I wouldn't update much on someone writing a big review, because often reviews include a lot of crap information.

I might have to read Jayman's rebuttal of B-W genetic IQ differences in more detail, but at first glance I'm not really convinced by it, because it seems to focus on small sample sizes in unusual groups, so it's unclear how much study noise, publication bias, and sampling bias affect things. At this point I think indirect studies are getting obsolete and it's becoming more and more feasible to just directly measure the racial genetic differences in IQ.

However I also think HBDers have a fractal of bad takes surrounding this, because they deny the phenotypic null hypothesis and center non-existent abstract personality traits like "impulsivity" or "conformity" in their models.

  • The RLCT = first-order term for in-distribution generalization error and also Bayesian learning (technically the 'Bayesian free energy').  This justifies the name of 'learning coefficient' for lambda. I emphasize that these are mathematically precise statements that have complete proofs, not conjectures or intuitions. 
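For reference, my reading of the precise statements (Watanabe's asymptotics; I'm paraphrasing from memory, so check the regularity conditions in his book):

$$F_n = nL_n(w_0) + \lambda \log n - (m-1)\log\log n + O_p(1), \qquad \mathbb{E}[G_n] = L_0 + \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right)$$

where $F_n$ is the Bayesian free energy, $G_n$ is the Bayes generalization error, $L_n$ is the empirical loss at the optimum $w_0$, $L_0$ is its infimum, $\lambda$ is the RLCT, and $m$ is its multiplicity.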

Link(s) to your favorite proof(s)?

Also, do these match up with empirical results?

  • Knowing a little SLT will inoculate you against many wrong theories of deep learning that abound in the literature. I won't be going into it, but suffice to say that any paper assuming that the Fisher information metric is regular for deep neural networks or any kind of hierarchical structure is fundamentally flawed. And you can be sure this assumption is sneaked in all over the place. For instance, this is almost always the case when people talk about the Laplace approximation.
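For concreteness, a minimal example of the degeneracy as I understand it (my toy model, not one from the post): take the regression model $y = w_1 w_2 x + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0,1)$. The Fisher information metric is

$$I(w) = \mathbb{E}[x^2]\begin{pmatrix} w_2^2 & w_1 w_2 \\ w_1 w_2 & w_1^2 \end{pmatrix}, \qquad \det I(w) = 0 \text{ for every } w,$$

so the metric is rank-deficient everywhere, not just at isolated points, and a Laplace/Gaussian approximation around a minimum has no well-defined covariance along the flat directions.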

I have a cached belief that the Laplace approximation is also disproven by ensemble studies, so I don't really need SLT to inoculate me against that. I'd mainly be interested if SLT shows something beyond that.

it can be estimated at scale.

As I read the empirical formulas in this paper, they're roughly saying that a network has a high empirical learning coefficient if an ensemble of models that are slightly less trained has a worse loss, on average, than the network.

But then, so that they don't have to retrain the models from scratch, they basically take the trained model and wiggle it around using Gaussian noise while continuing to train it.

This seems like a reasonable way to estimate how locally flat the loss landscape is. I guess there's a question of how much the devil is in the details; like whether you need SLT to derive an exact formula that works.
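Concretely, here's a minimal sketch of that recipe as I read it, in the style of the paper's SGLD-based estimator (the toy loss, constants, and hyperparameters are my own illustrative choices, not the paper's):

```python
import numpy as np

# Sketch of a local learning coefficient estimator: run an SGLD chain
# tethered to the trained weights w_star ("wiggle it around using Gaussian
# noise while retraining"), then read off
#   lambda_hat = n * beta * (E[L_n(w)] - L_n(w_star)),  with beta = 1/log(n).

def estimate_llc(loss, grad, w_star, n, steps=5000, eps=1e-4, gamma=100.0, seed=0):
    rng = np.random.default_rng(seed)
    beta = 1.0 / np.log(n)
    w = w_star.copy()
    samples = []
    for t in range(steps):
        # Drift: tempered loss gradient plus a localizing pull toward w_star.
        drift = -(eps / 2) * (n * beta * grad(w) + gamma * (w - w_star))
        w = w + drift + np.sqrt(eps) * rng.standard_normal(w.shape)
        if t > steps // 2:  # discard burn-in
            samples.append(loss(w))
    return n * beta * (np.mean(samples) - loss(w_star))

# Toy singular loss L(w) = w1^2 * w2^2, whose learning coefficient is 1/2
# (a regular 2-parameter model would give d/2 = 1). With these untuned
# constants the estimate is only expected to land in the right vicinity.
loss = lambda w: (w[0] ** 2) * (w[1] ** 2)
grad = lambda w: np.array([2 * w[0] * w[1] ** 2, 2 * w[1] * w[0] ** 2])
print(estimate_llc(loss, grad, w_star=np.zeros(2), n=10_000))
```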


I guess I'm still not super sold on it, but on reflection that's probably partly because I don't have any immediate need for computing basin broadness. Like I find the basin broadness theory nice to have as a model, but now that I know about it, I'm not sure why I'd want/need to study it further.

There was a period where I spent a lot of time thinking about basin broadness. I guess I eventually abandoned it because I realized the basin was built out of a bunch of sigmoid functions layered on top of each other, but the generalization was really driven by the neural tangent kernel, which in turn is mostly driven by the Jacobian of the network outputs on the dataset with respect to the weights, which in turn is mostly driven by the network activations. I guess it's plausible that SLT has the best quantities if you stay within the basin broadness paradigm. 🤔
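A minimal sketch of the chain I mean, with a tiny tanh network and a finite-difference Jacobian for brevity (autodiff would be the real choice; all names and sizes here are illustrative):

```python
import numpy as np

# Empirical NTK entries are inner products of per-example Jacobians of the
# network output with respect to the weights: K[i, j] = <J(x_i), J(x_j)>.

def f(w, x, h=8):
    # One-hidden-layer tanh net on scalar inputs; w packs (W1, b1, W2).
    W1 = w[:h].reshape(h, 1)
    b1 = w[h:2 * h]
    W2 = w[2 * h:3 * h]
    a = np.tanh(W1 @ x[None, :] + b1[:, None])  # activations, shape (h, n)
    return W2 @ a  # scalar output per example, shape (n,)

def jacobian(w, x, eps=1e-5):
    # J[i, k] = d f(x_i) / d w_k, via central differences.
    J = np.zeros((x.size, w.size))
    for k in range(w.size):
        dw = np.zeros_like(w)
        dw[k] = eps
        J[:, k] = (f(w + dw, x) - f(w - dw, x)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
w = rng.standard_normal(3 * 8) * 0.5
x = rng.standard_normal(16)
J = jacobian(w, x)
K = J @ J.T  # empirical NTK; by the chain rule the rows of J are built
             # from the hidden activations, which is the dependence I mean.
print(K.shape, np.linalg.matrix_rank(K))
```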

Newton's Universal Law of Gravitation was the first highly accurate model of things falling down that generalized beyond the earth, and it is also the second-most computationally applicable model of things falling down that we have today.

Are you saying that singular learning theory was the first highly accurate model of breadth of optima, and that it's one of the most computationally applicable ones we have?

I would say "the thing that contains the inheritance particles" rather than "the inheritance particle". "Particulate inheritance" is a technical term within genetics, and it refers to how children don't end up precisely with the mean of their parents' traits (blending inheritance), but rather with some noise around that mean, which particulate inheritance asserts is due to the genetic influence being separated into discrete particles, with the children receiving random subsets of their parents' genes. The significance of this is that under blending inheritance, the genetic variation between organisms within a species would be averaged away in a small number of generations, which would make evolution by natural selection ~impossible (as natural selection doesn't work without genetic variation).
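A toy simulation of that variance argument (a haploid cartoon with one particle per locus, nothing calibrated to real genetics):

```python
import numpy as np

# Under blending inheritance the trait variance halves every generation;
# under particulate inheritance the segregating particles keep it roughly
# constant (modulo drift), which is what natural selection needs.

rng = np.random.default_rng(0)
n_pop, n_loci, n_gen = 10_000, 100, 20

def next_gen_blending(traits):
    mothers = rng.permutation(traits)
    fathers = rng.permutation(traits)
    return (mothers + fathers) / 2  # child = parental mean

def next_gen_particulate(genomes):
    mothers = genomes[rng.permutation(n_pop)]
    fathers = genomes[rng.permutation(n_pop)]
    from_mother = rng.integers(0, 2, genomes.shape).astype(bool)
    # Each child gets a random subset of its parents' particles.
    return np.where(from_mother, mothers, fathers)

traits = rng.standard_normal(n_pop)
genomes = rng.integers(0, 2, (n_pop, n_loci)).astype(float)
for _ in range(n_gen):
    traits = next_gen_blending(traits)
    genomes = next_gen_particulate(genomes)

print(traits.var())          # ~ 0.5**n_gen of the original variance: gone
print(genomes.sum(1).var())  # stays near n_loci/4: variation preserved
```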

Isn't singular learning theory basically just another way of talking about the breadth of optima?

The tricky part is, on the margin I would probably use various shortcuts, and it's not clear where those shortcuts end short of just getting knowledge beamed into my head.

I already use LLMs to tell me facts, explain things I'm unfamiliar with, handle tedious calculations/coding, generate simulated data/brainstorming and summarize things. Not much, because LLMs are pretty bad, but I do use them for this and I would use them more on the margin.

$I$ isn't a reference frame; rather, if $w \in W$ is a world then the elements of the fiber $f^{-1}(w)$, aka $\{i \in I \mid f(i) = w\}$, are the reference frames for $w$.

Essentially when dealing with generalized reference frames that contain answers to questions such as "who are you?", the possible reference frames are going to depend on the world (because you can only be a real person, and which real people there are depends on what the world is). As such, "reference frames" don't make sense in isolation, rather one needs a (world, reference frame) pair, which is what I call an "interpretation".

An idea I've been playing with recently:

Suppose you have some "objective world" space $W$. Then in order to talk about subjective questions, you need a reference frame, which we could think of as the members of a fiber of some function $f : I \to W$, for some "interpretation space" $I$.

The interpretations themselves might abstract to some "latent space" $L$ according to a function $g : I \to L$. Functions of $L$ would then be "subjective" (depending on the interpretation they arise from), yet still potentially meaningfully constrained, based on $W$. In particular if some structure in $W$ lifts homomorphically up through $f$ and down through $g$, you get exactly the same structure in $L$. (And these obviously compose nicely since they're just spans, so far.)
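To spell out the lifting condition as I imagine it (the names $a$, $\tilde{a}$, $a_L$ are mine): say a map $a : W \to W$ is a piece of structure on the world space. If it lifts to some $\tilde{a} : I \to I$ with $f \circ \tilde{a} = a \circ f$, and $\tilde{a}$ also descends along $g$, i.e. there is an $a_L : L \to L$ with $g \circ \tilde{a} = a_L \circ g$, then $a_L$ reproduces the same structure on $L$:

$$\begin{array}{ccccc} W & \xleftarrow{\;f\;} & I & \xrightarrow{\;g\;} & L \\ {\scriptstyle a}\downarrow & & {\scriptstyle \tilde{a}}\downarrow & & \downarrow{\scriptstyle a_L} \\ W & \xleftarrow{\;f\;} & I & \xrightarrow{\;g\;} & L \end{array}$$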

The key question is what kind of space/algebra to preserve. I can find lots of structures that work well for particular abstractions, but it seems like the theory would have to be developed separately for each type of structure, as I don't see any overarching one.
