(...) the term technical is a red flag for me, as it is many times used not for the routine business of implementing ideas but for the parts, ideas and all, which are just hard to understand and many times contain the main novelties.
- Saharon Shelah
As a true-born Dutchman I endorse Crocker's rules.
For most of my writing, see my shortforms (new shortform, old shortform)
Twitter: @FellowHominid
Personal website: https://sites.google.com/view/afdago/home
Pockets of Deep Expertise
Why am I so bullish on academic outreach? Why do I keep hammering on 'getting the adults in the room'?
It's not that I think academics are all Super Smart.
I think rationalists/alignment people correctly ascertain that most professors don't have much useful to say about alignment & deep learning and often say silly things. They correctly see that much of AI progress is fueled by labs and scale, not ML academia. Still, I am bullish on non-ML academia: especially mathematics and physics, and to a lesser extent theoretical CS, neuroscience, and some parts of ML/AI academia. This is because, while I think 95% of academia is bad and/or useless, there are Pockets of Deep Expertise. Most questions in alignment are close to existing work in academia in some sense - but we have to make the connection!
Good examples are 'sparse coding' and 'compressed sensing'. Much of mech interp has been rediscovering the basic ideas of sparse coding, yet there is vast expertise in academia on these topics. We should leverage it!
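To make the connection concrete, here is a minimal numpy sketch of sparse coding via ISTA (iterative soft-thresholding), one of the basic algorithms from that literature. The dictionary, sizes, and sparsity level are all illustrative choices, not anything from a particular paper:

```python
import numpy as np

def ista(D, x, lam=0.05, n_iters=200):
    """Recover a sparse code s with x ~ D @ s by proximal gradient
    descent on the lasso objective 0.5*||D s - x||^2 + lam*||s||_1."""
    L = np.linalg.norm(D, ord=2) ** 2      # Lipschitz constant of the gradient
    s = np.zeros(D.shape[1])
    for _ in range(n_iters):
        g = D.T @ (D @ s - x)              # gradient of the quadratic term
        z = s - g / L
        s = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return s

rng = np.random.default_rng(0)
D = rng.normal(size=(32, 128))             # overcomplete dictionary: 128 atoms in R^32
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
s_true = np.zeros(128)
s_true[[3, 40, 77]] = [1.5, -2.0, 1.0]     # 3-sparse ground truth
x = D @ s_true
s_hat = ista(D, x)
print(np.flatnonzero(np.abs(s_hat) > 0.1)) # recovers the support [3, 40, 77]
```

This is exactly the setting compressed sensing analyzes: when and why a sparse signal is recoverable from far fewer measurements (32) than ambient dimensions (128).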
Other examples are singular learning theory, computational mechanics, etc.
re: 1. I agree these are very difficult conceptual puzzles and we're running out of time.
On the other hand, from my point of view, progress on these questions from within the LW community (and MIRI-adjacent researchers specifically) has been remarkable. Personally, the breakthrough of Logical Induction first convinced me that these people were actually doing serious, interesting things.
I also feel that the number of serious researchers working on these questions is currently small and could be scaled up substantially.
re: metacognition I am mildly excited about Vanessa's metacognitive agent framework & the work following from Payor's lemma. The theory-practice gap is still huge, but real progress is being made rapidly. On the question of metacognition, the alignment community could really benefit from trying to engage with academia more - similar questions have been investigated, and there are likely Pockets of Deep Expertise to be found.
The linked post you wrote about classical learning theory states that the bounds PAC gives are far looser than what we see in practice for neural networks. In the post you sketch some directions in which tighter bounds might be proven. My understanding is that these directions have not been pursued further.
Given all that "Fully adequate account of generalization" seems like an overstatement, wouldn't you agree?
At best we can say that PAC gives a nice toy model for thinking about notions like generalization and learnability, as far as I can tell. Maybe I'm wrong (I'm not familiar with the literature), and I'd love to know more about what PAC & classical learning theory can tell us about neural networks.
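To illustrate why the classical bounds look loose at neural-network scale, here is a back-of-the-envelope sketch using one standard form of the VC generalization bound. The exact constants vary by textbook, and the VC-dimension figure below is a crude illustrative stand-in (known results for ReLU nets only bound VC dimension in terms of parameter count up to log factors):

```python
import math

def vc_gap_bound(d_vc, n, delta=0.05):
    """One standard VC bound on |train error - test error|, holding with
    probability 1 - delta:
        gap <= sqrt((d_vc * (ln(2n/d_vc) + 1) + ln(4/delta)) / n)
    Only informative when n >= d_vc; otherwise the class can shatter
    the sample and the bound is trivial."""
    if n < d_vc:
        return float("inf")
    return math.sqrt(
        (d_vc * (math.log(2 * n / d_vc) + 1) + math.log(4 / delta)) / n
    )

# Treat a modest 10^6-parameter net as having VC dimension ~10^6 (a crude
# stand-in). Even at 10^8 training samples the guaranteed gap stays large,
# while observed generalization gaps in practice are often a few percent.
for n in [10**6, 10**7, 10**8]:
    print(n, vc_gap_bound(10**6, n))
```

Even with a hundred million samples the bound stays above 0.2, which is the sense in which the classical guarantees are vacuous compared to observed behaviour.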
Abnormalised sampling?
Probability theory talks about sampling from probability distributions, i.e. normalized measures. However, non-normalized measures abound: weighted automata, infra-stuff, uniform priors on noncompact spaces, wealth in logical-inductor-esque math, quantum stuff?? etc.
Most probability-theory constructions go through for arbitrary measures; they don't need the normalization assumption. Except, crucially, sampling.
What does it even mean to sample from a non-normalized measure? What is unnormalized abnormal sampling?
I don't know.
Infra-sampling has an interpretation of sampling from a distribution made by a demonic choice. I don't have good interpretations for other unnormalized measures.
Concrete question: is there a law of large numbers for unnormalized measures?
Let f be a measurable function and m a measure. Then the expectation value is defined as $\mathbb{E}_m[f] = \int f \, dm$. A law of large numbers for unnormalized measures would have to say something about repeated abnormal sampling.
I have no real ideas. Curious to learn more.
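One pragmatic partial answer from Monte Carlo practice is self-normalized importance sampling: expectations under an unnormalized measure can be estimated because the unknown normalizer cancels in a ratio of weighted sums. This sidesteps, rather than answers, what a single *sample* from an unnormalized measure would be. A numpy sketch, where the unnormalized density and proposal are illustrative (an unnormalized Gaussian, so the true second moment is 1):

```python
import numpy as np

rng = np.random.default_rng(0)

def w(x):
    """Unnormalized density: a standard Gaussian scaled by an
    'unknown' constant 3.0 that the estimator never needs."""
    return 3.0 * np.exp(-0.5 * x**2)

def f(x):
    return x**2   # estimate the second moment

# Self-normalized importance sampling: draw from a proposal q,
# weight by w/q, and divide by the empirical normalizer Sum(weights).
# The unknown constant cancels in the ratio.
xs = rng.uniform(-10.0, 10.0, size=200_000)  # proposal q = Uniform(-10, 10)
q = 1.0 / 20.0                               # proposal density (constant)
weights = w(xs) / q
est = np.sum(weights * f(xs)) / np.sum(weights)
print(est)  # ~ 1.0, the Gaussian second moment
```

Note this only recovers expectations under the *normalization* of m, so it says nothing about settings (infra-stuff, wealth) where the total mass itself carries meaning.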
From reading your post it seems that classical VC theory gives vacuous bounds for NN learning behaviour. Correct me if I'm wrong. You say the PAC formalism can be improved to be more realistic and suggest that non-vacuous bounds may be proved. Do you have a reference where non-vacuous bounds are proved?
"My point there was mainly that theorems about generalisation in the infinite-data limit are likely to end up being weaker versions of more general results from statistical and computational learning theory."
What general results from statistical and computational learning theory are you referring to here exactly?
I notice I am confused by this. It seems implausible that an LLM can execute a devious x-risk plan in a single forward pass based on a wrong prompt.
Alignment work mostly looks like standard academic science in practice. Young people in regular academia are paid a PhD stipend salary not a Bay Area programmer salary...
A comment by Patrick Foré, professor from University of Amsterdam