Commitments are hard. Based on my Research Principles for these 6 months of self-study, I'm writing up all my ideas regardless of level of polish. So here is the rough sketch of the idea I'm currently working on. Forgive the slightly over-confident style -- it's closer to the raw form of my cognition and not necessarily the level of nuance I reflectively endorse.
Will alignment be solved by a handful of geniuses or by the collective efforts of hundreds of the merely gifted? My understanding is that we are already trying our best to draw every genius out of the woodwork. Yet how much optimization pressure have we exerted on the collective efforts of the remaining 99% of alignment researchers?
Not much, I'm guessing?
So this is a short essay on what optimizing human collective intelligence may look like.
Known and Unknown Knowledge Space
First, some framing.
If you presume the solution to the alignment problem exists somewhere in knowledge space, then what we need is a method to correctly and quickly navigate toward that solution. If we conceptualize knowledge space as a graph, where each node represents a modular piece of knowledge and edges are the relationships between these pieces of knowledge, then we could theoretically encode all of known and unknown knowledge in this form. Next, there is the individual skill of navigating across knowledge space as a researcher, as well as the collective skill of efficiently and completely searching knowledge space as a group of researchers. My proposal is that we need to improve both to find a reliable solution to the alignment problem before the development of AGI.
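To make the graph framing concrete, here is a minimal sketch in Python. The node names, the `edges` mapping, and the `frontier` helper are all hypothetical, made up for illustration. It also assumes we can already list the unknown nodes, which in reality is exactly the hard part.

```python
# Hypothetical toy knowledge graph: node -> set of related nodes.
# Edges are undirected, so each relationship appears in both directions.
edges = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}

# Assumption: the nodes we collectively know so far.
known = {"A", "B", "C"}


def frontier(edges, known):
    """Return unknown nodes directly connected to at least one known node."""
    return {n for k in known for n in edges[k] if n not in known}


# One step of navigation: the nodes reachable from what we already know.
reachable_now = frontier(edges, known)

# After discovering those nodes, the frontier moves outward.
reachable_next = frontier(edges, known | reachable_now)
```

In this toy picture, navigating knowledge space means repeatedly picking a node on the frontier and moving it into the known set.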
Discovering New Nodes
The search for new knowledge is what we refer to as 'learning'. You can discover a new knowledge node through three different methods:
Logic yields knowledge through interpolation and extrapolation from observations or assumptions. As such, it can be practiced entirely in theory. Mathematics is the foremost field that relies completely on logic, where I'm loosely using logic to refer to any form of theoretical reasoning.
Empiricism consists of observation. Experiments are controlled observations you perform to gather specific, generalizable knowledge from reality. It is challenging to run good experiments that tease out complex or precise relationships in reality, but it's generally easy to run small-scale experiments that generalize well within the limited scope of your daily life. For instance, it is reasonably straightforward to figure out the average properties of males in your small hamlet of 100 people, but it is immensely challenging to figure out the average properties of males in our species. Similarly, a child quickly learns through experimentation how gravity works as relevant to its daily life, but most humans don't end up with the ability to rediscover that the gravitational acceleration on Earth is roughly 9.81 m/s².
Communication is the pathway we use to accrue collective knowledge. Through language we can encode all we learn and access all anyone else has ever learned. Much of the knowledge you acquire comes from communication because it is efficient: it is far faster and less error-prone to be told about photosynthesis than to derive it from scratch. Our human lives are too short to derive all collectively known knowledge. And thus we create knowledge speedruns for our children, and call them 'schools'. Education optimizes paths through collectively known knowledge space for individuals who do not yet have those knowledge nodes in their personally known knowledge space.
Now when it comes to research, we are at the frontier of our collectively known knowledge space and from there we attempt to forge out into the unknown. Discovering new nodes in knowledge space is achieved through logic and empiricism. And the only people who will be good at this process will be those that practiced their skill at developing proofs and designing experiments back in known knowledge space, where they could check their answers against the collective. Thus, research is a skill you practice against known facts.
Individual Search Algorithm
The scientific method is the current algorithm individual researchers run to discover new knowledge in unknown knowledge space. Researchers are traditionally trained in the scientific method during PhDs, which are basically apprenticeships slathered in tradition and sealed with a protected title. For any real world problem, the scientific method relies on iteration across empirical experiments, designed and analyzed using logic.
Now here comes the rub.
The alignment problem is a real world problem we need to solve in one-shot, on a deadline.
Real world problem - Alignment cannot be reliably reduced to a mathematical equation without testing that the equation holds up in the real world. Reality is cognitively uncontainable, and additionally the alignment problem specifically refers to entities whose intelligence will also be uncontainable to us. Thus we need some form of iteration in order to run experiments to gather data on which alignment solutions work and which don't.
One-Shot - Once an AGI exists it had better be aligned, or we're dead. We currently have no way to turn off an AGI, pause an AGI, or redirect an AGI. The alignment problem is by its very nature a one-shot problem. We need an empirical solution, but cannot currently iterate.
And to make matters worse...
A Deadline - Alignment needs to be solved before an AGI is created, but the alignment and capabilities branches of research and development are independent. It's like we're crawling along the axes of the Orthogonality thesis, where we are getting better and better at shaping intelligence, while hardly moving an inch in shaping motivations.
This is not how science is normally done!
Normally you run endless experiments, have all the time in the world, and only financial and prestige incentives to win or lose. Sometimes there is a problem where many lives are at stake like the Manhattan Project, COVID, or climate change. But still, that is not all of humanity, and being slower can mean more humans dying, yet that is not nearly as bad as all humans dying.
So what do we do?
We need alignment research to be faster and of higher predictive value. We need to navigate the unknown knowledge graph more quickly and with fewer missteps. And maybe that will still not be enough, but I suspect it's the direction in which "enough" can be found.
My proposal is thus that we need an expanded form of the scientific method and more rigorous methods to test and teach it. Currently, I'm guessing that this expansion consists of adding the properties "security mindset", "superforecaster", and "speed-learning".
Collective Search Algorithm
Once we have every individual researcher running a more optimal search algorithm in their minds, we can further optimize our collective search algorithm by improving the coordination of the researchers across unknown knowledge space. What properties would such an optimal coordination have?
- Encoding - Finding an encoding for unknown knowledge space allows us to map out where new knowledge nodes may be found or solution paths may be discovered.
- Distribution - Tracking the distribution of researchers across unknown knowledge space will allow us to see what areas are overpopulated and which may be comparatively neglected.
- Query-able - Ensuring the knowledge space encoding and the researcher distribution are query-able across multiple dimensions will allow researchers to easily connect with each other based on similarity of research topics and properties of researchers. This may lead to more effective research collaborations.
Achieving the Distribution and Query-able properties will presumably hinge on improving coordination tools between researchers (e.g., journals, search engines, research databases, etc.). Encoding knowledge space in a searchable format would be closer to a paradigm shift in how research is done. It may not be achievable at all, as it relies on discovering the underlying structure of knowledge space in such a way that we can predict where solution paths to problems may be found. However, I'd like to explore the question nonetheless, and I have one naive baseline proposal that I think is better than nothing, but far from optimal:
New knowledge nodes are discovered through expansion of existing nodes or recombining two or more existing nodes. Thus we can naively list all knowledge nodes related to alignment (proposed solutions, existing concepts, and well-defined problems) and invite researchers to explore the combination of any two nodes. The outcomes can be visualized in a table listing all current knowledge nodes as both row and column headers. Every cell in the table denotes one possible combination. Some combinations will be clearly dead ends, while other combinations may already be heavily researched. It's a naive and imperfect encoding of unknown knowledge space, but it does give us some limited access to the desired properties mentioned above:
- Encoding - It offers an overview of where new research directions may be found, though imperfectly and incompletely.
- Distribution - You can clearly visualize the number of researchers in each cell of the table.
- Query-able - If you link this knowledge space encoding to a database of researchers then researchers and knowledge nodes can be queried for potentially high value collaborations.
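As a rough sketch of what this baseline table could look like in practice, here is a small Python example. Everything in it is an assumption made for illustration: the node names, the researcher names, and the idea that each researcher is assigned to exactly one pair of nodes. It only shows the three properties on toy data, not a real tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical knowledge nodes, for illustration only.
nodes = ["interpretability", "reward modeling", "corrigibility", "deception"]

# Hypothetical researcher database: name -> the pair of nodes they combine.
researchers = {
    "alice": ("interpretability", "deception"),
    "bob": ("reward modeling", "corrigibility"),
    "carol": ("deception", "interpretability"),
}


def cell(a, b):
    """Order-independent key for the table cell that combines nodes a and b."""
    return tuple(sorted((a, b)))


# Encoding: every unordered pair of nodes is one cell of the table.
cells = [cell(a, b) for a, b in combinations(nodes, 2)]

# Distribution: number of researchers working in each cell.
counts = Counter(cell(a, b) for a, b in researchers.values())

# Query-able: which cells have nobody working on them?
neglected = [c for c in cells if counts[c] == 0]


def collaborators(name):
    """Query-able: other researchers working in the same cell as `name`."""
    target = cell(*researchers[name])
    return sorted(
        other
        for other, pair in researchers.items()
        if other != name and cell(*pair) == target
    )
```

Sorting each pair means (row, column) and (column, row) land in the same cell, so only half of the table needs to be stored. The number of cells still grows quadratically with the number of nodes, which is part of the upkeep cost of this encoding.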
Many senior (AIS) researchers probably already have an implicit map of unknown knowledge space in their mind (though it is likely trimmed to exclude low value areas). Formalizing this encoding in a tool that people can use to navigate unknown knowledge space mostly benefits junior and mid-level researchers. First, it allows new researchers to get an overview of where most of the work is happening and what areas may be promising. Second, it offers a structure to organize researcher databases around. Third, the two-dimensional table could potentially be used to organize researcher publications, so the known knowledge base is coordinated more rigorously.
Now the encoding is too simplistic -- it will take a lot of upkeep, doesn't encode more complex relationships between knowledge nodes, and the dimensionality is far too low. Yet, it illustrates the type of coordination mechanics I think we should be exploring.
So what now?
But before I explore that, I think I need to determine if my native research algorithm is anywhere near optimal (and improve it if it is not). Specifically, I want to test myself on the three properties that I suspect are necessary for predictive and fast AIS research: security mindset, superforecasting, and speed-learning. I will start by testing myself on security mindset and speed-learning by writing a critique of the OpenAI alignment plan. If I write a good critique then I will hopefully get feedback from OpenAI and MIRI, who seem to be relatively far apart in 'alignment paradigm space', and thus offer high value calibration data for my development. If I write a bad critique, then that's probably data in itself too -- and a trip back to the drawing board.
I'm not sure what this property looks like, and maybe it's already subsumed in the current scientific method. Either way, it's the property of navigating known knowledge space more quickly and efficiently, and is thus a meta-learning property. Conceptually it would be a subskill of autodidactism.
True in some sense, but misleading, insofar as "empiricism" usually brings to mind experiments. In practice, people get far, far more bits from observation of the day-to-day world around them than from experiments. Even in experiments, I think most of the value is usually from observing lots of stuff, more than from carefully controlling things.
Careful here. There's one thing usually called "the scientific method", which is basically iteratively coming up with models, testing them experimentally, then updating the models based on the results. If you actually go look at how science is practiced, i.e. the things successful researchers actually pick up during PhDs, there are multiple load-bearing pieces besides just that. Some examples include:
Note that a much simpler first-pass on all these is just "spend a lot more time reading others' work, and writing up and distilling our own". Key background idea here is that the internet has already dramatically reduced the cost of finding or sharing information, and the internet is relatively new, so most people probably have not yet increased their information consumption and production as much as would be optimal given the much-reduced cost.
I think I mostly agree with you but have the "observing lots of stuff" categorized as "exploratory studies" which are badly controlled affairs where you just try to collect more observations to inform your actual eventual experiment. If you want to pin down a fact about reality, you'd still need to devise a well-controlled experiment that actually shows the effect you hypothesize to exist from your observations so far.
I agree, but if people were both good at finding necessary info as individuals and we had better tools for coordinating (e.g., finding each other and relevant material faster) then that would speed up research even further. And I'd argue that any gain in speed of research is as valuable as the same proportional delay in developing AGI.
This reminded me of a technique I occasionally use to explore a new topic area via some version of “graph search”. I ask LLMs (or previously Google) “what are topics/concepts adjacent to (/related to/similar to) X”. Recursing, and reading up on connected topics for a while, can be an effective way of getting a broad overview of a new knowledge space.
Optimising the process for AIS research topics seems like it could be valuable. I wonder whether a tool like Elicit solves this (haven’t actually tried it though).
That makes a lot of sense! And I was indeed also thinking of Elicit.