Adam Scherlis

Comments
What can we learn about childrearing from J. S. Mill?
Adam Scherlis · 5d
  1. This is an April Fool's post.
  2. J. S. Mill's father was much less influential than either Mill or Bentham, and the excerpt in the post does not describe him as having any particular intellectual abilities. (In fact, he was a fairly successful writer in his day, but not remotely comparable to his son.)
  3. Utilitarianism is not passed down genetically.
How much progress actually happens in theoretical physics?
Adam Scherlis · 5mo

I think you unfortunately can't really verify the recent epistemic health of theoretical physics, without knowing much theoretical physics, by tracing theorems back to axioms. This is impossible to do even in math (can I, as a relative layperson, formalize and check the recent Langlands Program breakthrough in Lean?) and physics is even less based on axioms than math is.

("Even less" because even math is not really based on mutually-agreed-upon axioms in a naive sense; cf. Proofs and Refutations or the endless squabbling over foundations.)

Possibly you can't externally verify the epistemic health of theoretical physics at all, post-70s, given the "out of low hanging empirical fruit" issue and the level of prerequisites needed to remotely begin to learn anything beyond QFT.

Speaking as a (former) theoretical physicist: trust us. We know what we're talking about ;)

How much progress actually happens in theoretical physics?
Adam Scherlis · 5mo

I'm not Mitchell, but I think I agree with him here enough to guess: He probably means to say that production of new plausible theories has increased, production of experimentally verified theories has stalled, and the latter is not string theory's fault.

(And of course this whole discussion, including your question, is interpreting "physics" to mean "fundamental physics", since theoretical and empirical work on e.g. condensed matter physics has been doing just fine.)

How much progress actually happens in theoretical physics?
Adam Scherlis · 5mo

I am not going to spend more than a few minutes here or there to give "speaking as a physicist" takes on random LW posts; I think convincing people that my views are correct in full detail would require teaching them the same things that convinced me of those views, which includes e.g. multiple years of study of QFT.

Instead, I tend to summarize what I think and invite people to ask specific questions about e.g. "why do you believe X" if they want to go further down the tree or test my beliefs more aggressively.

"That doesn't answer the question because I am not convinced by everything you said" is not really a helpful way to do that imo.

How much progress actually happens in theoretical physics?
Adam Scherlis · 5mo

To spell out my views: there has been a bit of a real slow-down in theoretical physics, because exploring the tree of possible theories without experiment as a pruning mechanism is slower than if you do get to prune. I think the theory slowdown also looks worse to outsiders than it is, because the ongoing progress that does happen is also harder to explain due to increasing mathematical sophistication and a lack of experimental correlates to point to. This makes e.g. string theory very hard to defend to laypeople without saying "sorry, go learn the theory first".

This is downstream of a more severe slowdown in unexplained empirical results, which (imo) follows from pretty general considerations of precision and energy scales, per the modern understanding of renormalization. These suggest that "low-hanging fruit gets picked and it becomes extremely expensive to find new fruit" is a priori pretty much how you should expect experimental physics to work. And indeed this seems to have happened in the mid-20th century, when lots of money got spent on experimental physics and the remaining fruit now hangs very high indeed.
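
(To make the scale argument concrete, here is my gloss in standard effective-field-theory terms, not anything stated in the comment itself. New physics at a heavy scale Λ shifts an observable measured at energy E only by terms suppressed by powers of E/Λ:

\delta\mathcal{O}(E) \sim c \, (E/\Lambda)^k, \qquad k \geq 1,

so probing larger Λ requires either collision energies near Λ or measurement precision of order (E/Λ)^k, and the cost of both climbs steeply as Λ grows.)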

And then there's the 90s/2000s LHC supersymmetry hype problem, which is a whole nother (related) story.

How much progress actually happens in theoretical physics?
Adam Scherlis · 5mo

The main thing I'd add is that string theory is not the problem. All the experimental low-hanging fruit was picked decades ago. There are very general considerations suggesting that any theory of quantum gravity, stringy or otherwise, will only really be testable at the Planck scale. What this means in practice is that theoretical high-energy physics doesn't get to check its answers anymore.

I think there's still progress, and still hope for new empirical results (especially in astrophysics and cosmology), but it's much harder without a barrage of unexplained observations.

Estimating the Probability of Sampling a Trained Neural Network at Random
Adam Scherlis · 5mo

Great questions :)

The approach here is much faster than the SGLD approach; it only takes tens or hundreds of forward passes to get a decent estimate. Maybe that's achievable in principle with SGLD, but we haven't managed it.

I like KFAC but I don't think estimating the Hessian spectrum better is a bottleneck; in our experiments on tiny models, the true Hessian didn't even always outperform the Adam moment estimates. I like the ideas here, though!

The big downside of our approach, compared to Timaeus's, is that it underestimates basin size (overestimates complexity) for two reasons:

1) Jensen bias: the "pancake" issue (sketched below), which we can alleviate a bit with preconditioners.
2) The "star domain" constraint we impose (requiring line-of-sight between the anchor point and the rest of the basin) is arguably pretty strict, although we think it holds by default for the "KL neighborhood" variant.

It's not clear that this is an obstacle in practice, though, in settings where you just want a metric of complexity that runs fast and has approximately the right theoretical and empirical properties to do practical work with.
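
To unpack both points, here is a minimal ray-based sketch (my illustration, not the paper's code; `kl_at`, `anchor`, and the cutoff `eps` are assumed given). For a star-shaped region seen from an anchor, the exact volume is V = V_unit(d) · E_u[r(u)^d] over uniform directions u, and swapping E[r^d] for exp(d · E[log r]) is precisely a Jensen lower bound, hence the underestimate:

```python
import numpy as np
from scipy.special import gammaln

def log_star_volume_lower_bound(kl_at, anchor, eps, n_rays=100, r_max=1e3, seed=0):
    """Jensen lower bound on the log-volume of the star-shaped region
    {w : kl_at(w) < eps} as seen from `anchor`.

    Assumes kl_at(anchor) < eps and that each ray crosses the cutoff once,
    so bisection finds the boundary. Sketch only, not the paper's code.
    """
    rng = np.random.default_rng(seed)
    d = anchor.size
    log_rs = []
    for _ in range(n_rays):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                 # uniform random direction
        lo, hi = 0.0, r_max
        for _ in range(30):                    # bisect for the cutoff radius r(u)
            mid = 0.5 * (lo + hi)
            if kl_at(anchor + mid * u) < eps:
                lo = mid
            else:
                hi = mid
        log_rs.append(np.log(lo))
    # Exact: log V = log V_unit_ball(d) + log E_u[r(u)^d].
    # Jensen: log E[r^d] >= d * E[log r], so this underestimates log V,
    # badly so for "pancake" basins where r(u) varies a lot across directions.
    log_unit_ball = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    return log_unit_ball + d * np.mean(log_rs)
```

A real estimator would reuse evaluations and precondition the directions; this is only meant to show where the Jensen gap and the line-of-sight assumption enter.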

We've been working on using SGLD and thermodynamic integration to get a more-trusted measurement of basin size, but we suspect the most naive version of our estimator (or the Adam-preconditioned version) will be most practical for downstream applications.

We use average KL divergence over a test set as our behavioral loss, and (for small models where it's tractable) we use the Hessian of KL, i.e. the Fisher.
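
For concreteness, a minimal PyTorch sketch of such a behavioral loss (the classification setting, names, and loader are my assumptions, not the paper's code); a function like this could serve as the `kl_at` oracle in the ray sketch above, with `model_pert` built from the anchor plus a ray offset:

```python
import torch
import torch.nn.functional as F

def behavioral_kl(model_ref, model_pert, test_loader):
    """Average KL(ref || perturbed) over a test set: one way to realize
    the behavioral loss described above."""
    total, count = 0.0, 0
    with torch.no_grad():
        for x, _ in test_loader:
            logp_ref = F.log_softmax(model_ref(x), dim=-1)
            logp_pert = F.log_softmax(model_pert(x), dim=-1)
            # KL(p_ref || p_pert) = sum_y p_ref(y) (log p_ref(y) - log p_pert(y))
            kl = (logp_ref.exp() * (logp_ref - logp_pert)).sum(dim=-1)
            total += kl.sum().item()
            count += kl.numel()
    return total / count
```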

Estimating the Probability of Sampling a Trained Neural Network at Random
Adam Scherlis · 5mo

I am not sure I agree :)

It is unimportant in the limit (of infinite data), but away from that limit it is only suppressed by a factor of 1/log(data), which seems small enough to be beatable in practice in some circumstances.

The spectra of things like Hessians tend to be singular, yes, but also sort of power-law. This makes the dimensionality a bit fuzzy and (imo) makes it possible for the absolute volume scale of basins to compete with dimensionality.

Essentially: it's not clear that a 301-dimensional sphere really is "bigger" than a 300-dimensional sphere, if the 300-dimensional sphere has a much larger radius. (Obviously it's true in a strict sense, but hopefully you know what I'm gesturing at here.)
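
To put numbers on that gesture (my toy calculation, not from the thread, with radii invented for illustration):

```python
import numpy as np
from scipy.special import gammaln

def log_ball_volume(d, r):
    # log of V_d(r) = pi^(d/2) / Gamma(d/2 + 1) * r^d
    return (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1) + d * np.log(r)

# A 300-dimensional ball of radius 10 vs a 301-dimensional ball of radius 1:
print(log_ball_volume(300, 10.0) - log_ball_volume(301, 1.0))  # ~ +693
```

Strictly, comparing volumes across dimensions requires a thickness scale for the missing direction, but the radius-10 300-ball wins here unless that thickness is below roughly e^-693.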

Estimating the Probability of Sampling a Trained Neural Network at Random
Adam Scherlis · 5mo

I think this is correct, but we're working on paper rebuttals/revisions, so I'll take a closer look very soon! I think we're working along parallel lines.

In particular, I have been thinking of "measure volumes at varying cutoffs" as being more or less equivalent to "measure LLC at varying ε".
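
(For reference, the correspondence I have in mind, stated in standard SLT asymptotics rather than taken from this thread: if V(\varepsilon) is the volume of \{w : L(w) - L(w^*) \le \varepsilon\}, then

V(\varepsilon) \sim c \, \varepsilon^{\lambda} (-\log \varepsilon)^{m-1} \quad \text{as } \varepsilon \to 0^{+},

with m the multiplicity, so the learning coefficient \lambda is roughly the slope of \log V(\varepsilon) against \log \varepsilon. That is why varying the volume cutoff and varying \varepsilon in the LLC should probe the same thing.)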

We choose expected KL divergence as a cost function because it gives a behavioral loss, just like your behavioral LLC, yes.

I can give more precise statements once I look at my notes.

Estimating the Probability of Sampling a Trained Neural Network at Random
Adam Scherlis · 6mo

If you're wondering if this has a connection to Singular Learning Theory: Yup!

In SLT terms, we've developed a method for measuring the constant (with respect to n) term in the free energy, whereas LLC measures the log(n) term. Or if you like the thermodynamic analogy, LLC is the heat capacity and log(local volume) is the Gibbs entropy.
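
(In symbols, the decomposition I mean, as I understand the standard SLT expansion of the Bayes free energy:

F_n = n L_n + \lambda \log n - (m - 1) \log \log n + O(1),

where \lambda is the learning coefficient that the LLC estimates and the O(1) piece carries the log(local volume) contribution measured here.)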

We're now working on better methods for measuring these sorts of quantities, and on interpretability applications of them.

Posts

- Estimating the Probability of Sampling a Trained Neural Network at Random (32 karma, Ω, 6mo, 10 comments)
- What can we learn about childrearing from J. S. Mill? (10 karma, 1y, 2 comments)
- Two Percolation Puzzles (43 karma, 2y, 14 comments)
- GPT-175bee (123 karma, 3y, 14 comments)
- How to export Android Chrome tabs to an HTML file in Linux (as of February 2023) (7 karma, 3y, 3 comments)
- Inner Misalignment in "Simulator" LLMs (84 karma, Ω, 3y, 12 comments)
- Fun math facts about 2023 (9 karma, 3y, 6 comments)
- A hundredth of a bit of extra entropy (83 karma, 3y, 4 comments)
- An exploration of GPT-2's embedding weights (44 karma, Ω, 3y, 4 comments)
- A brainteaser for language models (47 karma, 3y, 3 comments)
- Adam Scherlis's Shortform (4 karma, Ω, 3y, 16 comments)