I can do you one better. I think that the people who are good at learning math don't necessarily overlap much with the people who are good at using math.
More broadly, I think the big issue here is the reality of the 80/20 principle, where 80% of the alignment value is created by 20% of the alignment researchers. However, recent events have revealed that what we actually need is lots of minds pointed at random directions, and some of them will randomly get lucky and end up pointed in the right place at the right time. Some people will still have larger search spaces than others, but it's still vastly more egalitarian than what anyone expected ~4 years ago.
The problem with the 80/20 model is that it pushes a person (even those in the top 20%) towards the assumption that they themselves are several times more likely to be in the bottom 80% than not. It also makes them think that any given person they meet is several times more likely to be in the bottom 80% than not, which means that a single big mistake can make the evaluator feel like the person they're talking to is in the bottom 30%. Heuristics like that conserve cognitive resources/energy, and they remain standard practice in networking and hiring.
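To make the base-rate mechanism concrete, here is a rough Bayesian sketch; the likelihood ratio below is a made-up number, purely for illustration. Under the 80/20 prior, the odds that a stranger is in the top 20% are

$$\frac{P(\text{top }20\%)}{P(\text{bottom }80\%)} = \frac{0.2}{0.8} = 1:4.$$

If one visible big mistake is judged to be, say, twice as likely to come from someone in the bottom 80%, the posterior odds become $1:8$, i.e. $P(\text{top }20\% \mid \text{mistake}) \approx 11\%$. A single bad impression drags the estimate down fast, which is exactly how the heuristic conserves the evaluator's effort.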
However, recent events have revealed that what we actually need is lots of minds pointed at random directions, and some of them will randomly get lucky and end up pointed in the right place at the right time. Some people will still have larger search spaces than others, but it's still vastly more egalitarian than what anyone expected ~4 years ago.
What events?
The idea that we need many different people to "poke at the alignment problem in random directions" implicitly presupposes that alignment (technical, at least) is a sort of mathematical problem that could be "solved" with one theory of intelligence and rationality, which, in principle, someone could develop within a small group or even single-handedly, as Einstein developed general relativity.
I think this assumption is wrong.
Just one example: to really understand the characteristics and predict the implications of actual alignment proposals, like Open Agency Architecture, with any degree of confidence, requires integrating the perspectives of cognitive science (including epistemology, rationality, and ethics), social and political science, legal theory, theories of collective intelligence/cognition, game theory, mechanism design, network theory, dynamical systems theory, theories of evolution and development, distributed systems and control theories, computer science, machine learning, physics, and more.
This cannot possibly be pulled off by a sole researcher or even a small group. Alignment research should therefore be highly collaborative and multi-disciplinary, rather than a matter of isolated researchers developing "their own" theories of cognitive science.
"This cannot possibly be pulled off by a sole researcher or even a small group."
Maybe you're right about this. Or maybe you're not. When tackling a difficult problem with no clear idea how to find a solution, it's not a good idea to narrow the search space for no compelling reason.
The idea that alignment research constitutes some kind of "search for a solution" within the discipline of cognitive science is wrong.
Let's stick with the Open Agency Architecture (OAA), not because I particularly endorse it (I have nothing to say about it actually), but because it suits my purposes.
We need to predict the characteristics that civilisational intelligence architectures like OAA will have: robustness of alignment with humans; robustness/resilience in a more general sense (for example, in the face of external shocks, such as a supernova explosion close to the Solar system); scalability beyond Earth (and, perhaps, the Solar system); the flexibility of values (i.e., the characteristic opposite of value lock-in, Riedel (2021)); the presence and the characteristics of global consciousness; and, perhaps, some other ethical desiderata. And we need to do this not just for OAA, but for many different proposed architectures.
Analysing all these candidate architectures requires a lot of multi-disciplinary work, most of which could be done right now, applying the current state-of-the-art theories in the respective disciplines (the disciplines I enumerated in the comment above, and more). Furthermore, we shouldn't apply only a single theory of, for example, cognitive science or ethics or robust control in our analysis: we absolutely have to apply different "competing" theories in the respective disciplines, e.g., competing theories of intelligence/agency, to see what predictions these different theories yield, and thereby increase our down-the-line chances of survival. (There are at least five promising general theories of intelligence/agency[1] apart from Infra-Bayesianism and other frameworks developed by AI alignment researchers. Of course, there are even more theories of ethics, and several competing theories of robust control.)
It doesn't matter that these theories are currently all "half-baked" and sometimes "off": all naturalistic theories are "wrong", and will remain wrong (Deutsch even suggested calling them "misconceptions" instead of "theories"). Ultimately, building aligned AGI is an engineering, and hence naturalistic, endeavour, so creating any single mathematical theory, no matter how self-consistent, could be just a small part of the story. You must also have a naturalistic theory (or, realistically, a patchwork of many theories from many disciplines, again, see above) of how the mathematical construct is implemented in real life, whether in computers, people, their groups and interactions, etc. (Yes, I oppose Vanessa Kosoy's "cryptography" analogy here; I think it's methodologically confused.)
So, how much of alignment researchers' effort should go into building a multi-disciplinary (and cross-theoretic, within some particularly "important" disciplines, such as cognitive science!) understanding of various alignment proposals/paradigms (a.k.a. civilisational intelligence architectures; I maintain that any alignment proposal that aims lower than that is confused about the real subject of what we are trying to do), and how much should go towards developing new theories of intelligence/cognition/agency, in addition to the at least five academic ones (and I count only serious and well-developed ones) and maybe five more that have already been proposed within the alignment community[2]? What is the ROI of each type of research? Which type of research attacks (buys down) larger chunks of risk? Which type of research is more likely to be superseded later, and which will likely remain useful regardless?
If you consider all these questions, I think you will arrive at the conclusion that much more effort should go into multi-disciplinary R&D on alignment proposals/paradigms/plans/architectures than into attempts at creating new cognitive science theories. Especially considering that academics already do the second type of work (after all, academics came up with those "five theories"[1], and there is a lot of work behind all of them), but don't do the first kind of research. It falls to alignment researchers alone to do it.
To sum up: it's not me who suggests "narrowing the search". The kind of search you were hinting at (within the disciplines of cognitive science and rationality) is already narrow by design. Rather, I suggest widening the perspective and the range of disciplines that AI alignment researchers seriously engage with.
These five theories, for reference, are Active Inference (Fields et al. (2022), Friston et al. (2022)), MCR^2 (Ma et al. (2022)), thermodynamic ML (Boyd et al. (2022)), "Bengio's views on intelligence/agency" (see, for example, Goyal & Bengio (2022)), and "LeCun's views on intelligence/agency" (LeCun (2022)). Maybe I'm still missing important theories; please let me know.
Note that in this question, I don't consider how the total effort that should go into either of these research types compares with the effort that goes into mechanistic interpretability research. I don't know.
The insights here generalize, and now I desire a post discussing this phenomenon in highly quotable, general terms.
Thanks for writing that.
Three thoughts that come to mind:
If I try to think about someone's IQ (which I don't normally do, except for the sake of this message above where I tried to think about a specific number to make my claim precise) I feel like I can have an ordering where I'm not too uncertain on a scale that includes me, some common reference classes (e.g. the median student of school X has IQ Y), and a few people who did IQ tests around me. By the way, if you think my claim is wrong, I'd be happy to bet on anyone who agreed to reveal their IQ (e.g. from the list of SERI MATS's mentors).
Also, I think that it's fine to have a lower chance of being an excellent alignment researcher for that reason. What matters is having impact, not being an excellent alignment researcher. E.g. I don't go all-in on a technical career myself, essentially for that reason, combined with the fact that I have other features that might allow me to go further into the impact tail in other relevant subareas.
If I try to think about someone's IQ (which I don't normally do, except for the sake of this message above where I tried to think about a specific number to make my claim precise)
Thanks for clarifying that.
I feel like I can have an ordering where I'm not too uncertain on a scale that includes me, some common reference classes (e.g. the median student of school X has IQ Y), and a few people who did IQ tests around me.
I'm not very familiar with IQ scores and testing, but it seems reasonable that you could get rough estimates that way.
Also, I think that it's fine to have a lower chance of being an excellent alignment researcher for that reason. What matters is having impact, not being an excellent alignment researcher. E.g. I don't go all-in on a technical career myself, essentially for that reason, combined with the fact that I have other features that might allow me to go further into the impact tail in other relevant subareas.
Good point, there are lots of ways to contribute to reducing AI risk besides just doing technical alignment research.
We can't even reliably produce Gausses and von Neumanns two thousand years after we've learned that humans can achieve such heights of mathematical sophistication.
"Two thousand years"? Is that a typo? Did you mean "two hundred"?
When people talk about research ability, a common meme I keep hearing goes something like this:
Are you able to give examples? I don't immediately remember hearing anything like this. It's possible I just don't remember, or that I haven't looked in the right places. But another hypothesis I have in mind here is that what you've been hearing is different from what people have been saying. (Or, of course, that you've been hearing people correctly and I haven't been.)
When people talk about research ability, a common meme I keep hearing goes something like this:
What is the point of telling anyone any of these? If I were being particularly uncharitable, I'd guess the most obvious explanation is that it's some kind of barely-acceptable status play, kind of like the budget version of saying "Are you smarter than Paul Christiano? I didn't think so." Or maybe I'm feeling a bit more generous today, so I'll think that it's Wittgenstein's Ruler: a convoluted call for help, pointing out the insecurities that said person cannot admit to themselves.
But this is LessWrong and it's not customary to be so suspicious of people's motivations, so let's assume that it's just an honest and pithy way of communicating the boundaries of hard-to-articulate internal models.
First of all, what model?
Most people here believe some form of biodeterminism. That we are not born tabula rasa, that our genes influence the way we are, that the conditions in our mother's womb can and do often snowball into observable differences when we grow up.
But the thing is, these facts do not constitute a useful causal model of reality. IQ, aka (a proxy for) the most important psychometric construct ever discovered and most often the single biggest predictor of outcomes in a vast number of human endeavours, is not a gears-level model.
Huh? Suppose it were, and suppose it were the sole determinant of performance in any mentally taxing field. Take two mathematicians with the exact same IQ. Can you tell me who would go on to become a number theorist vs an algebraic topologist? Can you tell me who would over-rely on forcing when disentangling certain logic problems? Can you tell me why Terence Tao hasn't solved the Riemann hypothesis yet?
There is so much more that goes into becoming a successful scientist than can be distilled into a single number that it's not even funny. Even if said number means a 180-IQ scientist is more likely to win a Nobel than a 140-IQ nobody. Even if said number means it's a safer bet to just skim the top 10 of whatever the hell the modern equivalent of the SMPY is than to take a chance on some rando on the other side of the world who is losing sleep over the problem.
But okay, sure. Maybe approximately no one says that it's just IQ. Nobody on LessWrong is so naïve as to have a simple model with no caveats, so let's say it's not just IQ but some other combo of secret sauces. Maybe there's like eight variables that together form a lognormal distribution.
Then the question becomes: how the hell is your human-behaviour-predicting machine so precise that you're able to say with utter confidence what excludes someone from doing important work? Do you actually have a set of values in mind for each of these sliders, for your internal model of which kinds of people alone can do useful research? Did you actually go out there and measure which of the people on this page have which combinations of factors?
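For concreteness, here is a toy simulation of the sort of multi-factor model gestured at above. Everything in it, the eight factors, their spreads, the multiplicative combination, is invented for illustration; the only real point is that multiplying several independent factors yields a heavy-tailed, roughly lognormal distribution of "research output" while saying nothing about which particular person lands in the tail.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model: "research output" is the product of eight independent
# positive factors (intelligence, taste, persistence, luck, ...). The number
# of factors and their spreads are made up purely for illustration.
n_people, n_factors = 100_000, 8

# Each factor is lognormal around 1; multiplying factors is the same as
# summing their logs, so the product is itself (approximately) lognormal.
factors = rng.lognormal(mean=0.0, sigma=0.5, size=(n_people, n_factors))
output = factors.prod(axis=1)

# Heavy tail: a small fraction of people accounts for most of the total output.
top_20_share = np.sort(output)[-n_people // 5:].sum() / output.sum()
print(f"Share of total output produced by the top 20%: {top_20_share:.0%}")

# Note what the model does NOT give you: any way to say, from the outside,
# which individual ends up in the tail, or which single factor excluded them.
```

The qualitative picture survives any reasonable choice of made-up spreads: a fat tail at the population level, and near-total ignorance at the level of individuals, which is rather the point.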
I think the biggest harm that comes from making this kind of claim is that, like small penis discourse (WARNING: CW), there's a ton of collateral damage done when you say it out loud, damage that I think far outweighs whatever clarity the listeners gain. I mean, what's the chain of thought gonna be for the other people in the room[1]?
True, some wiseass can probably respond at this point, "but wouldn't a Great Alignment Researcher have enough metacognition to ignore such statements?" But could your archetype of alignment-Einstein have predicted that someone who has had persistent problems with fatigue for several decades, and has a tested IQ of only 143, would go on to write a fanfic that would serve as a gateway drug for thousands of people into rationality, and by extension, the entire quagmire we're in? Hell, if you have such a crystal-clear model of what makes for great alignment researchers, why haven't you become one?
Thankfully, I don't think a lot of the top brass believe the notion that we can tell at a glance who is destined to join their ranks. Because if they did, we could replace those two-hour-long alignment-camp Google Forms with the following three-and-a-half-item questionnaire:
If we could tell at a glance who the destined ones are, if it were just a matter of exposing them to the problem and then giving them the resources to work on it, then SERI MATS and AGI Safety Fundamentals should just close up shop. Provided such a super-Einstein exists in this sea of eight billion poor souls, we could just replace all such upskilling programmes with a three-minute spot at the next Super Bowl.
But the fact of the matter is, we don't know what makes for a good alignment researcher. Human interpretability is barely better than that of the machine models we worry about so much, and our ways of communicating such interpretations are even more suspect. It is a testament to how utterly inefficient we are as a species at using our sense-data that the thousands of petabytes we produce each day barely affect the granularity of our communicable scientific models. We can't even reliably produce Gausses and von Neumanns two thousand years after we've learned that humans can achieve such heights of mathematical sophistication.
When we set a social norm that it is okay and expected to exclude people who do not fit certain molds, we risk ruling out the really great ones[2]. We often chide academia for having been co-opted by bureaucratic slowness and perverse incentives, but at least they have a reproducible way of producing experts! We do not! By choosing to shortcut actually having a gears-level understanding of research performance[3], we are sweeping under the rug the fact that we don't have good supportive infrastructure that reliably produces good researchers.
And, well, another contender explanation might be that this is all just cope and that we really are just fodder for the giants that will loom over us. Hell, maybe the kind of person who would write a rant like this isn't the sort of person who would go on to solve the Hard Parts of the Problem, who would win all the Nobels, and siphon all the grant moneys.
Fair enough.
But if this rant saves the future Chosen One from doubting themselves because some reckless schmuck who probably won't go on to do great things told them a bunch of nonsense, then it will have been all worth it.
Sure, some people probably need to hear it because it's true and it's better to save them the heartbreak, but is that really your intention when you utter those words? Are you really doing the EV calculation for the other person? ↩︎
Another response to this could be that our resources aren't infinite, and so we should optimise for those who can contribute meaningfully on the margin. But again, are you really doing the EV calculation when you say such things in your mind's voice? ↩︎
A point I wanted to make that I couldn't fit anywhere else is that it's plausible we aren't yet seeing a lognormal distribution when it comes to alignment research because we haven't optimised away all the inessential complexities of doing said research. Academia is old. They've had several dozen cycles of Goodharting and counter-Goodharting what makes for good science, and, well, that's not even what we care about. Who knows what a fully optimised notkilleveryone-ist programme would look like, if it can even be measured in things like h-indices? ↩︎