Great post! Regarding institutional design, would you have any advice on making it less abstract and increasing the value of such a proposal?
I can't shake the feeling that just about anyone can whip up a structure/design that considers a couple of the stakeholders. What would a design that moves the needle have, or be able to do?
Good question. The big barriers here are coming up with a design that is better than what would be implemented by default, and getting it adopted. This likely requires a ton of knowledge about existing frameworks like this (e.g., FDA drug approval, construction permitting, export licensing, etc.). Without this knowledge, I expect whatever somebody whips up to be mostly useless. I lack this knowledge myself, so I don't have particularly good ideas about how to improve on the default plan.
Sorry I don't have more details for you; I think doing this job well is probably extremely hard. It's like asking somebody to do better than the whole risk-assessment industry on this specific dimension. However, a lot of that industry doesn't have strong incentives to improve these systems, so this may be an inadequacy that is relatively tractable.
Help information aggregators scale during takeoff
This doesn't appear that neglected to me: most mainstream media seem to do a fine job of covering AI.
On this topic, I think it's more important now to innovate in (social) media itself using the newly available power of LLMs: build new kinds of media, personal recommendation engines, etc.
Education, Review, and Information Material
The obligatory reference here should be http://quantumleap.education/ (described in the last minutes of this podcast with Michael Webb).
Just today I wrote a comment that approaches the question from a very different angle but overlaps somewhat with the areas listed in this post:
- Politics: see https://cip.org/, Audrey Tang's projects
- Social systems: innovative LLM/AI-first social networks that solve the social dilemma? (I don't have good existing examples of such projects, though)
- Psychotherapy, coaching: see Inflection
- Economics: see Verses, One Project, the Gaia Consortium
- Epistemic infrastructure: see Subconscious Network, Ought, the Cyborgism agenda, Quantum Leap (AI safety edtech)
- Authenticity infrastructure: see Optic, proof-of-personhood projects
- Cybersec/infosec: see various AI startups for cybersecurity, trustoverip.org
A common response is that “evaluation may be easier than generation”. However, this doesn't mean evaluation will be easy in absolute terms, or relative to one’s resources for doing it, or that it will depend on the same resources as generation.
I wonder to what degree this is true for the human-generated alignment ideas that are being submitted to LessWrong/the Alignment Forum.
For mathematical proofs, evaluation is (imo) usually easier than generation: often, a well-written proof can be evaluated by reading it once, whereas the person who wrote it up often had to consider different approaches and discard a lot of them.
To what degree does this also hold for alignment research?
There is an argument that evaluating AI models should be formalised, i.e., turned into verification: see https://arxiv.org/abs/2309.01933 (and the discussion on Twitter with Yudkowsky and Davidad).
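Tldr: We identify five areas of work that should be further investigated:
- Help information aggregators scale during takeoff
- Research and Consulting on AI Deployment in Governments
- Institutional Design for Evaluating Alignment Plans
- Evaluating automated alignment research
- Education, Review, and Information Material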
Summary
We outline and discuss five areas of work that seem neglected. We only shallowly review and discuss these areas. Our proposals should be seen as a recommendation to further investigate these questions and figure out whether they are actually worth pursuing and how to pursue them.
For each of the five work areas, we discuss what they are, what one could do here, why they are important, and what our key uncertainties are.
Help information aggregators scale during takeoff
What is it: In a world of widespread AI deployment, many things may move much faster, and existing information aggregators and collectors (e.g., journalists, Twitter/X, think tanks) will likely be overwhelmed and less successful at doing their jobs. Figuring out how to help them, or stepping in with a novel organization aimed at tracking important developments in the world, seems important.
What to do: Build AI tools for better information filtering/aggregation, build expertise in relevant areas (e.g., AI hardware, ML subfields, AI safety, biosecurity), and gain experience and connections by doing the version of this that currently exists (e.g., Epoch AI, journalism, intelligence agencies, AGI labs). Build AI journalism as a subfield (like environmental journalism). Figure out whether existing institutions can be adapted to do this sufficiently well or whether a new institution is needed. We’re not too sure what else to do here.
Why is it impactful: This is currently extremely neglected. Some people casually talk about how deepfakes and LLM-generated content/spam could be a big deal for the information landscape, but nobody seems to be taking this seriously enough and preparing, let alone planning for explosive growth scenarios: the current world is wholly unprepared.
Key uncertainties: It’s unclear whether we’re going to get explosive growth before or after we face most of the misalignment risk; if it’s after, then this is probably much less important to work on. If takeoff is sufficiently slow, existing institutions will adapt.
Additional discussion: Good information is important for pretty much all decision-making. Information provision might be much worse in a world with advanced AI systems. It is currently very difficult to track all of the important things happening in the world. This will get even more difficult as the pace of tech R&D accelerates due to automated researchers and AI tools speeding up human researchers. Solving this problem might include building a new institute for understanding what’s going on and relaying this to relevant decision-makers. This institute would broadly be trying to understand and keep track of what is happening in the world with regard to AI development. Some things that would be in its purview: reading current ML and AI safety research and trying to predict what is coming next in those fields; tracking the location of GPU clusters; keeping a database of major AI developers and key x-risk-related information about them (the quality of their models, large training runs they do, internal sentiment around AI risk, etc.); keeping tabs on the use of AI in particularly high-stakes domains like warfare and biology; and understanding how various countries are reacting to AI development and governing its risks.
The general idea is that things could be moving super fast once we get substantial research speedups, and there are currently no institutions that are prepared for this speedup and able to help improve the world through it. Such institutes would serve as points of information aggregation and distillation, able to support AI labs, governments, and other key decision-makers by having already developed methods for keeping track of what’s true and important in this wacky landscape. If the world is able to develop an international agency on AI as suggested by Karnofsky, such an agency might absorb or closely partner with such institutes.
This institute might benefit from hiring people with experience doing complex information aggregation in difficult domains, such as people with a traditional intelligence background, financial analysts who specialize in AI-related fields, or journalists. There would also likely be a need to hire experts with an excellent understanding of various parts of the AI and AI safety research ecosystems, from chip design to fine-tuning.
Research and Consulting on AI Deployment in Governments
What is it: Decision-making bodies such as governments are facing the increasingly important question of how to deploy advanced AI systems internally. This decision can make societal decision-making much better but also much worse. Research, and advising policymakers, can help avert various failure modes of internal AI deployment.
What to do: Individuals could advise governments on integrating large language models (LLMs) into their operations, or research AI integration in government (e.g., if a team starts using LLMs to write its daily briefs, how does this change how often false claims make their way into the brief? Do people trust LLMs to be more factual than they actually are?). We’re not sure what to do in this area, but we expect the work to range from field experiments in the workplace to experimental testing of new AI products. While the short-term impact might not be that big, this work would help immensely in building the relevant knowledge, trust, and networks needed to feed into such processes in the future.
Why is it impactful: We expect that the success of advanced AI development and deployment partly depends on how well-informed government decisions will be. Outsourcing work to AI systems when it comes to such decisions will be a hard balancing act with many possible failure modes: one might outsource too much, too little, or the wrong work. Work to improve this process will likely be underprovided because
Therefore, we expect there to be a lot of potential to improve AI deployment policies for high-stakes decision-making.
Key uncertainties: Tractability: Anecdotal evidence from a former consultant suggests that consulting for government digitization efforts mostly fails. If digitizing government is such a hard problem, we may expect 1) governments to err on the side of not using LLMs or 2) the tractability of government advising to be low.
Crowdedness: Public policy researchers, government-digitization researchers, and organizational psychology researchers may already be working on this. However, those researchers may not be motivated to move from basic to applied science fast enough.
The magnitude of failure from improperly integrating AI into one’s workflow scales with the available AI capabilities: failing to integrate GPT-4, or integrating it too much, likely results in a difference of a couple of percentage points in productivity. For future AI systems, failing to integrate them could mean losing huge amounts of potential value, and integrating them in a bad way could result in major losses.
Focusing on this problem is particularly important if government decision-making matters a lot, e.g., in short-timeline slow-takeoff worlds or if governments are needed to make the post-AGI transition go well. On the other hand, if the major hurdles to existential security are near-term technical research, this would be less important to work on.
Additional Discussion: As AIs improve at solving long-horizon tasks, there will be strong incentives to delegate more of our human workflows to these AIs. For some domains, this will be perfectly safe, but for high-stakes domains it could cause major harms from AIs making mistakes or pursuing misaligned goals. We want to track how people in key societal roles are outsourcing tasks to AI systems. This could involve studying how they are using these systems and what the effects are: Is there too much or too little outsourcing? Are they outsourcing the wrong things, or losing human expertise in the process? Are they doing the kind of outsourcing where there is still meaningful human control? These questions could be studied by social scientists and, e.g., public sector consultancies. Researchers could interview people, do more anthropological studies, study the effects of various automation tools, and share best practices.
If one understands the key failure modes, one would be able to make the technology safer, enhance human decision-making, and avoid enfeeblement, i.e., society willingly giving up control. The research could inform the decisions of AI labs and product teams, or AI regulation; alternatively, organizations such as the public sector or the executive could simply implement the guidelines and best practices internally. To contribute to this, one could do research on AI workflow integration or outsourcing in the public sector, provide literature summaries, or consult and inform key decision-making organizations directly on responsible AI usage.
Institutional Design for Evaluating Alignment Plans
What is it: Many AI governance plans involve a public agency evaluating the alignment plan for an advanced AI project. The agency then approves or rejects the training run proposal. Designing such an evaluation process seems hard. Current institutional designs are likely insufficient. Somebody should figure out what this process should even look like.
What to do: Study the benefits and drawbacks of existing safety/risk assessment frameworks, scope the problem of evaluating alignment plans, and build relevant connections and credibility. This is about designing an institutional process for evaluating alignment research, not doing the evaluation itself. Learn from best practices for aggregating information, evaluations, and criticisms from various stakeholders and experts. Study the existing expert consultation processes of big institutions (e.g., the European Commission), including their successes and pitfalls.
Why is it impactful: The success of establishing such institutions relies on the process effectively distinguishing between better and worse alignment plans.
Key uncertainties: Work on designing this institution may be intractable, too abstract, or too novel, especially given the strong precedent of existing risk assessment frameworks. Perhaps interested individuals should just work on improving current AI regulatory processes on the margin.
Additional discussion: Most governance plans for transformative artificial intelligence involve a common procedure: the AGI development team proposes its alignment plan to a governing body (e.g., its plan for the deployment of a newly trained model, its plan for whether to automate research, its plan for training a bigger model), which then approves the plan, requests revisions, or denies it. But what should happen procedurally in this agency before it makes such a decision?
While we might not know right now which alignment plans will be effective, we can probably already think about designing the procedure. Existing risk and safety assessment frameworks, or public consultations such as those done for legal reviews, seem insufficient. What can we learn from them, and what would the ideal procedure look like? We think most processes are either easily gameable or do not aggregate most of the relevant information. An exploration could include: How public should the procedure be? Should open comments be invited? How should they be incorporated? Who assesses them? How should disagreements about the safety of the alignment plan be handled? What would such a back-and-forth look like? How would the final decision be made? Who holds that group accountable? How should conflicts of interest (e.g., relevant experts working at AGI labs) be handled?
Evaluating automated alignment research
What is it: Much of the alignment work might be conducted by AI systems. OpenAI, for instance, has plans to develop automated alignment researchers. However, the efficacy of this hinges on a robust framework for evaluating alignment work, which is currently lacking. This has been discussed before, but we want to signal-boost it.
What to do: We don’t really know what needs to be done here. Perhaps preparing for this role just looks like doing alignment research oneself and doing lots of peer review.
Why is it impactful: Evaluating certain kinds of work, especially hard-to-measure ones, will remain a human job, potentially causing bottlenecks. Certain types of alignment work are likely difficult to evaluate, such as conceptual and framing work, as evidenced by strong disagreement among researchers about the value of different research directions.
While some evaluations might be straightforward (e.g., using language models for neuron interpretability or improving self-critique strategies), and AIs could sometimes support human evaluators (by doing replications or coming up with critiques), actual progress towards alignment remains a hard-to-measure evaluation target.
A common response is that “evaluation may be easier than generation”. However, this doesn't mean evaluation will be easy in absolute terms, or relative to one’s resources for doing it, or that it will depend on the same resources as generation.
This means it will be important to know i) what exactly should be done by humans and ii) how this can be tracked so that firms don't cut corners here.
Key uncertainties: Is this action-relevant now? Is there any way one can feasibly prepare now?
Education, Review, and Information Material
What is it: If timelines are short and the world becomes much more aware of the possibility of transformative AI, then many key decision-makers will need to quickly familiarize themselves with alignment and AI existential safety questions and develop a really good understanding of them. For that, there need to be really good sources of summaries and literature reviews.
What to do: In contrast to the other work areas we outlined in this post, more work already exists in this area: YouTube videos on the risks, policy reports for governments, explainers, intro talks for various audiences, and literature reviews, e.g., on AI timelines.
Such work could be expanded, improved, and professionalized. For instance, reviews on topics such as takeoff dynamics, alignment challenges, development timelines, and alignment strategies, ranging from a quick read to an in-depth analysis, would be useful.
Why is it impactful: Such resources play a crucial role in shaping key decision-making processes and public opinion. Given the multiplicity of threat models, the speculativeness and inherent uncertainty in AI development, and the political incentives for simplification and polarisation, good information material and education material might be even more important than for other problem areas.
Key uncertainties: For videos, reports, and other high-quality or wide-reaching media, we usually see winner-take-all dynamics, where only the best material is useful. This has implications for who should work on this, how they should do it, and what should be done. Even if winner-take-all dynamics exist, it may be unclear ex ante who the "winners" will be, so investing in many projects might still be useful.
Crowdedness: This work seems to have expanded a lot in the last six months. It’s not clear how much low-hanging fruit will remain in the future.