Great post! It's been almost a year since this was posted, so I was curious if anyone has worked on these questions:
- Do you get any weird results from the pre-training data not being IID? Does this compromise capabilities in practice? Or does it lead to increased capabilities because the model cannot lean as much on memorization when it’s constantly getting trained on a previously-unseen future?
What if you want to run multiple epochs?[21] Then you have a conflict between wanting to fully update on the old data before you see new data vs. wanting to maximally spread out the points in time at which you repeat training data. How severe is this conflict? Are there any clever methods that could reduce it?
I did a quick lit review and didn't find much. Here's what I did find (not perfectly related to the above questions, though).
So, has anyone pursued the two quoted questions above? Super curious if anyone has good results!
I don't know of any work on these unfortunately. Your two finds look useful, though, especially the paper — thanks for linking!
This post gives my personal take on “AI for epistemics” and how important it might be to work on.
Some background context:
AI capabilities are advancing rapidly and I think it’s important to think ahead and prepare for the possible development of AI that could automate almost all economically relevant tasks that humans can do.[1]
So: How can we affect AI to contribute to better epistemic processes? When looking at concrete projects here, I find it helpful to distinguish between two different categories of work:
Working to increase AIs’ epistemic capabilities, and in particular, differentially advancing them compared to other AI capabilities. Here, I also include technical work to measure AIs’ epistemic capabilities.[2]
I’d be very excited about thoughtful and competent efforts in this second category. However, in this post I talk significantly more about efforts in the first category. This is just an artifact of how this post came to be, historically — it’s not because I think work on the second category of projects is less important.[3]
For the first category of projects: Technical projects to differentially advance epistemic capabilities seem somewhat more “shovel-ready”. Here, I’m especially excited about projects that differentially boost AI epistemic capabilities in a manner that’s durable and/or especially good at demonstrating those capabilities to key actors.
Durable means that projects should (i) take the bitter lesson into account by working on problems that won’t be solved-by-default when more compute is available, and (ii) work on problems that industry isn’t already incentivized to put huge efforts into (such as “making AIs into generally better agents”). (More on these criteria here.)
Two example projects that I think fulfill these criteria (I discuss a lot more projects here):
Separately, I think there’s value in demonstrating the potential of AI epistemic advice to key actors — especially frontier AI companies and governments. When transformative AI (TAI)[4] is first developed, it seems likely that (i) these actors will have a big advantage in their ability to accelerate AI-for-epistemics via their access to frontier models and algorithms, and (ii) I will especially care about their decisions being well-informed. Thus, I’d like these actors to be impressed by the potential of AI-for-epistemics as soon as possible, so that they start investing and preparing appropriately.
If you wondered, above, why I group “measuring epistemic capabilities” into the same category of project as “differentially advancing AI capabilities”, this is now easier to explain. I think good benchmarks can be both a relatively durable intervention for increasing capabilities (by inspiring work to beat the benchmark for a long time) and a good way of demonstrating capabilities.
In the rest of this post, I:
Here is an incomplete list of previous work on this topic:
I think there are very solid grounds to believe that AI’s influence on epistemics is important. Having good epistemics is super valuable, and human-level AI would clearly have a huge impact on our epistemic landscape. (See here for more on importance.)
I also think there are decent plausibility arguments for why epistemics may be importantly path-dependent: Today, we are substantially less epistemically capable than our technology allows for, due to various political and social dynamics which don’t all seem inevitable. And I think there are plausible ways in which poor epistemics can be self-reinforcing (because it makes it harder to clearly see which direction leads towards better epistemics), and, vice versa, in which good epistemics can be self-reinforcing. (See here for more on path-dependence.)
That’s not very concrete, though. To be more specific, I will go through some goals that I think are both important and plausibly path-dependent:
Let’s go through all of this in more detail.
I think there are very solid grounds to believe that AI’s influence on epistemics is important.
Most AI takeover risk comes from “unforced errors”. A vast majority of powerful people don’t want AI to take over, but I think that many underestimate the risk. If I thought that people were going to have reasonable, well-calibrated beliefs about AI takeover risk, my subjective probability of AI takeover would more than halve.[5]
Most extinction risk comes from “unforced errors”. Just as above: A vast majority of powerful people don’t want extinction, and (I strongly suspect) would be capable of preventing the exceptions from being able to cause extinction.[6]
While less solid than the arguments for importance, I think there are decent plausibility arguments for why AI’s role in societal epistemics may be importantly path-dependent.
Feedback loops. People often choose to learn the truth when the choice is presented sufficiently clearly and unambiguously to them.[7] But with poor enough epistemic starting abilities, it won’t be clear what methods are more or less truth-seeking. So poor epistemic capabilities can be self-reinforcing, and vice versa.
Veil of ignorance. Conversely, people may be more enthusiastic about investing in novel, strong epistemic methods while they think that those methods will come to support their current beliefs (which would be the default, if they actually believe their current beliefs[8]). Whereas if they first learn that the methods are going to contradict their current beliefs, they may oppose them.
Early investment. I can easily imagine both a future where frontier AI projects (i) spend continuous effort on making their AIs strong forecasters and strategic analysts, and distribute those capabilities to other key institutions, and a future where they (ii) almost exclusively focus on using their AI systems for other tasks, such as technical R&D.[9] My being able to imagine both might just reflect my own ignorance — but it’s at least suggestive that both futures are plausible, and could come about depending on our actions.
Distribution of epistemic capabilities. Even without changing the pace at which powerful AI epistemics are developed, the question of whether important decisions are made with or without AI epistemic assistance may depend on how quickly different actors get access to those capabilities. It seems probably great for those epistemic capabilities to quickly be made widely available,[10] and if they’re powerful enough, it could be essential for multiple key players (such as AI companies, governments, and opposition parties) to get access to them at a similar time, so they can provide checks on each other’s new capabilities.
Now, let’s be more specific about what goals could be important to achieve in this area. I think these are the 3 most important instrumental goals to be working towards:
Let’s go through these in order.
Let’s talk about norms and practices for AIs as knowledge-producers. By this, I mean AIs doing original research, rather than just reporting claims discovered elsewhere. (I.e., AIs doing the sort of work that you wouldn’t get to publish on Wikipedia.)
Here are some norms/institutions/practices that I think would contribute to good usage of AI-as-knowledge-producers:
A government agency that is non-partisan (in practice and not only in name) and charged with using AI to inform government decision-making or to transparently review whether other knowledge-producing AIs in government are operating in a truth-seeking manner.[11]
Now let’s talk about norms for AIs as communicators. This is the other side of the coin from “AI as knowledge producers”. I’m centrally thinking about AIs talking with people and answering their questions.
Here are some norms/institutions/practices that I think would enable good usage of AI-as-communicators:
You’re not allowed to pay other actors to program their AIs to be more positively inclined towards you.[12]
Finally: I want AIs to have high epistemic capabilities compared to their other capabilities. (Especially dangerous ones.) Here are three metrics of “epistemic capabilities” that I care about (and what “other capabilities” to contrast them with):
Asymmetric persuasion: How capable is AI at persuading people of true things vs. how capable is AI at persuading people of anything?[13]
It’s good for the former to be high relative to the latter, because I think it’s typically better for people to be convinced of true things than false things.
(The web of lies eval in Evaluating Frontier Models for Dangerous Capabilities tests for one version of this, where current models seem significantly better at persuading people of true things.[14])
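As a toy illustration of what measuring this asymmetry could look like, here is a minimal sketch in Python. It assumes a hypothetical experiment where a model argues for claims known to be true or known to be false, and we record whether each participant was persuaded; all names and numbers are made up for illustration, not drawn from any real evaluation.

```python
# Toy sketch of an "asymmetric persuasion" metric.
# The outcome data below is hypothetical, not from any real evaluation.

def persuasion_rate(outcomes: list[bool]) -> float:
    """Fraction of persuasion attempts that succeeded."""
    return sum(outcomes) / len(outcomes)

# Did each participant update toward the claim the model argued for?
true_claim_outcomes = [True, True, False, True, True]     # model argued for true claims
false_claim_outcomes = [True, False, False, False, True]  # model argued for false claims

rate_true = persuasion_rate(true_claim_outcomes)
rate_false = persuasion_rate(false_claim_outcomes)

# A ratio above 1 means the model is better at persuading people of true things.
print(f"true: {rate_true:.2f}, false: {rate_false:.2f}, ratio: {rate_true / rate_false:.2f}")
```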
I suspect (but am not confident) that it’s good for the former to be high relative to the latter, because I am scared of new technologies causing accidents (mainly AI takeover[15]) or being misused by the wrong people (mainly bioweapons), and think that better understanding could help reduce this risk.
There are exceptions to all four of these. But they hold often enough that I think they induce some important differences in what epistemic methods are most useful for “understanding” vs. “building”, which may lead to some opportunities to differentially advance one over the other.[16]
In order for these distinctions to be decision-relevant, there need to be ways of differentially accelerating one side of the comparison compared to the other. Here are two broad categories of interventions that I think have a good shot at doing so:
Having spelled out what we want in the way of epistemic capabilities, practices for AI-as-knowledge-producers, and AI-as-communicators: Let’s talk about how we can achieve these goals. This section will talk about broad guidelines and heuristics, while the next section will talk about concrete interventions. I discuss:
One useful distinction is between direct and indirect strategies. While direct strategies aim to directly push for the above goals, indirect strategies instead focus on producing demos, evals, and/or arguments indicating that epistemically powerful AI will soon be possible, in order to motivate further investment & preparation pushing toward the above goals.
My current take is that:
Direct, competent efforts on Good practices for AI-as-knowledge-producers[17] and Good practices for AI-as-communicators[18] seem great.
One possible path-to-impact from building and iteratively improving capabilities on “understanding”-loaded tasks is that this gives everyone an earlier glimpse of a future where AIs are very epistemically capable. This could then motivate:
The core advantage of the indirect approach is that it seems way easier to pursue than the direct approach.
Core questions about the indirect approach: Are there really any domain-specific demos/evals that would be convincing to people here, on the margin? Or will people’s impressions be dominated by “gut impression of how smart the model is” or “benchmark performance on other tasks” or “impression of how fast the model is affecting the world-at-large”? I feel unsure about this, because I don’t have a great sense of what drives people’s expectations here.
A more specific concern: Judgmental forecasting hasn’t “taken off” among humans. Maybe that indicates that people won’t be interested in AI forecasting? I’m more skeptical of this one, though. My best guess is that AI forecasting will have an easier time becoming widely adopted. Here’s my argument.
I don’t know a lot about why forecasting hasn’t been more widely adopted. But my guess would be that the story is something like:
For AIs, these problems seem smaller:
Overall, I feel somewhat into “indirect” approaches as a path-to-impact, but only somewhat. It at least seems worth pursuing the most leveraged efforts here, such as making sure that we always have great forecasting benchmarks and getting AI forecasting services to work with important actors as soon as (or even before) they start working well.
It seems straightforward and scalable to boost epistemic capabilities in the short run. But I expect a lot of work that leads to short-run improvements won’t matter after a couple of years. (This completely ruins your path-to-impact if you’re trying to directly improve long-term capabilities — but even if you’re pursuing an indirect strategy, it’s worse for improvements to last for months than for them to last for years.)
So ideally, we want to avoid pouring effort into projects that aren’t relevant in the long run. I think there are two primary reasons why projects may become irrelevant in the long run: either the bitter lesson, or other people doing them better with more resources.
That said, even if we mess this one up, there’s still some value in projects that temporarily boost epistemic capabilities, even if the technological discoveries don’t last long: The people who work on the project may have developed skills that let them improve future models faster, and we may get some of the indirect sources of value mentioned above.
Ultimately, key guidelines that I think are useful for this work are:
To better understand which of today’s innovations will be more/less helpful for boosting future epistemics, it’s helpful to try to envision what the systems of the future will look like. In particular: It’s useful to think about the systems that we especially care about being well-designed. For me, these are the systems that can first provide a very significant boost on top of what humans can do alone, and that get used during the most high-stakes period around TAI-development.
Let’s talk about forecasting in particular. Here’s what I imagine such future forecasting systems will look like:
Medium-horizon forecasters can reference all kinds of evidence, including the forecasts of short-horizon forecasters (we should have good statistics on how reliable these are; see the sketch after this list).[19]
For long-horizon forecasting (e.g. what will happen in >1 year) we won’t have any ground-truth data to train on, so we’ll have to rely on human feedback.[20] In order to know what kind of feedback to give, here, we’ll want to use medium-horizon forecasting as a “lab” to test different hypotheses about what sort of AI-human interactions tend to lead to accurate forecasts, and what types of arguments tend to work well in practice.
In order to generalize sensibly from the medium-horizon-forecasting case, we’ll want the learnings from this to be as human-interpretable as possible. E.g. “arguments from analogy tend to work well/poorly”, not “this 100k-word-long prompt tends to give good results, and no human can understand why”.
Long-horizon forecasters can reference all kinds of evidence, including the forecasts of short- and medium-horizon forecasters, insofar as they’re relevant.
When using medium-horizon forecasting as a “lab”: We’ll want to run both (i) studies where we try to get the best forecasting abilities we can, including by relying substantially on good generalization from AIs, and (ii) studies where a red-team tries to make the AIs maximally subtly misleading, to see whether humans who are getting AI advice can notice this, or whether they get tricked into believing terrible forecasts.
If the latter tests lead to humans making terrible forecasts, then we should assume that scheming AIs would be able to mislead us about both medium-term and long-term forecasts. (And probably also short-term forecasts in recognizably rare, high-stakes situations.)
Cf. control evaluations.
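To make the earlier point about keeping “good statistics on how reliable these are” concrete: here is a minimal sketch of the kind of track-record bookkeeping one could do for resolved short-horizon forecasts. The Brier score and calibration buckets are standard tools; the forecast history and bucketing details are hypothetical.

```python
# Minimal sketch of reliability statistics for resolved forecasts.
# The forecast history below is hypothetical.
from collections import defaultdict

def brier_score(forecasts: list[tuple[float, bool]]) -> float:
    """Mean squared error between stated probabilities and binary outcomes."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

def calibration_table(forecasts: list[tuple[float, bool]], n_bins: int = 10) -> dict:
    """Group forecasts by stated probability and report observed frequency per bin."""
    bins = defaultdict(list)
    for p, outcome in forecasts:
        bins[min(int(p * n_bins), n_bins - 1)].append(outcome)
    return {
        f"{b / n_bins:.1f}-{(b + 1) / n_bins:.1f}": sum(outs) / len(outs)
        for b, outs in sorted(bins.items())
    }

# Hypothetical resolved forecasts: (stated probability, did it happen?)
history = [(0.9, True), (0.8, True), (0.7, False), (0.3, False), (0.2, True)]
print(f"Brier score: {brier_score(history):.3f}")
print(calibration_table(history))
```

A lower Brier score is better, and a well-calibrated forecaster’s observed frequencies should roughly match each bucket’s stated probabilities.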
Now, let’s talk about concrete projects for differentially advancing epistemic capabilities, and how well they do according to the above criteria and vision.
Here’s a summary/table-of-contents of projects that I feel excited about (no particular order). More discussion below.
Efforts to provide AI forecasting assistance (or other ambitious epistemic assistance) to governments are another category of work that I’d really like to happen eventually. But I’m worried that there will be more friction in working with governments, so it may be better to iterate outside them first and then try to provide services to them once those services work better. This is only a weakly held guess, though. If someone who was more familiar with governments thought they had a good chance of usefully working with them, I would be excited for them to try it.
In the above paragraph, and the above project titles, I refer to AI forecasting or “other ambitious epistemic assistance”. What do I mean by this?
Now for more detail on the projects I’m most excited about.
What if you want to run multiple epochs?[21] Then you have a conflict between wanting to fully update on the old data before you see new data vs. wanting to maximally spread out the points in time at which you repeat training data. How severe is this conflict? Are there any clever methods that could reduce it?
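To make the tension concrete, here is a toy sketch (my own illustration, not an established recipe) of the two extremes for multi-epoch training on temporally ordered data: repeating each time-chunk back-to-back fully respects temporal order but bunches repetitions together, while replaying the whole history once per epoch spreads repetitions out but revisits old data after newer data has been seen.

```python
# Toy illustration of the multi-epoch scheduling conflict for
# temporally ordered pretraining data. Chunk names are placeholders.

def blocked_schedule(chunks: list[str], epochs: int) -> list[str]:
    """Fully update on each time period before moving on:
    all repetitions of a chunk happen back-to-back."""
    return [c for c in chunks for _ in range(epochs)]

def spread_schedule(chunks: list[str], epochs: int) -> list[str]:
    """Maximally spread out repetitions: replay the full history once
    per epoch, at the cost of revisiting old data after newer data."""
    return [c for _ in range(epochs) for c in chunks]

chunks = ["2019-data", "2020-data", "2021-data"]
print(blocked_schedule(chunks, 2))
# ['2019-data', '2019-data', '2020-data', '2020-data', '2021-data', '2021-data']
print(spread_schedule(chunks, 2))
# ['2019-data', '2020-data', '2021-data', '2019-data', '2020-data', '2021-data']
```

Any schedule in between (e.g. replaying only a sliding window of recent chunks) trades off one desideratum against the other, which is one way to start quantifying how severe the conflict is.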
For really long-range experiments (where we avoid spoiling AIs on the past 100+ years), we would need to be able to do pretraining with mostly synthetic data. “How to usefully pre-train models on synthetic data” is not something I recommend working on, because I think it would be very useful for AI capabilities, so I expect capabilities researchers to be good at exploring it on their own.[22]
However, it might be useful to consider how you would prevent leaking information from the present if you could usefully pre-train models on synthetic data.
In particular, the synthetic data would probably be constructed by models that have a lot of knowledge about the present. So you would have to prevent that knowledge from leaking into the synthetic data.
(This research project may be easier to do once we understand more about good methods of training on synthetic data. I’m personally not sure what the SOTA is, here.)
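If pre-training on synthetic data did become viable, here is a deliberately crude sketch of the simplest possible leakage check: rejecting generated documents that mention terms from after the intended knowledge cutoff. The term list and documents are hypothetical placeholders, and a keyword filter would obviously miss subtler leakage (implied facts, anachronistic style, statistical regularities), so this only illustrates the shape of the problem.

```python
# Crude sketch of an anachronism filter for synthetic "historical" data.
# The cutoff term list and documents are hypothetical placeholders.
import re

POST_CUTOFF_TERMS = {"smartphone", "internet", "covid-19"}  # e.g. for a ~1900 cutoff

def leaks_present_knowledge(document: str) -> bool:
    """Flag documents containing any term that should not exist before the cutoff."""
    words = set(re.findall(r"[a-z0-9\-]+", document.lower()))
    return bool(words & POST_CUTOFF_TERMS)

synthetic_docs = [
    "The telegraph lines reached the coast in the spring.",
    "Everyone checked the news on their smartphone.",
]
kept = [d for d in synthetic_docs if not leaks_present_knowledge(d)]
print(kept)  # only the telegraph document survives
```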
The development of AI systems with powerful epistemic capabilities presents both opportunities and significant challenges for society. Transformative AI will have a big impact on our epistemic processes, and how good or bad this impact is may depend on what we do today.
I started out this post by distinguishing between efforts to differentially increase AI epistemic capabilities and efforts to enable the diffusion and appropriate trust of AI-discovered information. While I wrote a bit about this second category (characterizing it as good norms & practices for AI as knowledge producers and communicators), I will again note that the relative lack of content on it doesn’t mean that I think it’s any less important than the first category.
On the topic of differentially increasing epistemic AI capabilities, I’ve argued that work on this today should (i) focus on methods that will complement rather than substitute for greater compute budgets, (ii) prioritize problems that industry isn’t already trying hard to solve, and (iii) be especially interested in showing people what the future has in store by demonstrating what’s currently possible and prototyping what’s yet to come. I think that all the project ideas I listed do well according to these criteria, and I’d be excited to see more work on them.
Personally, I focus a lot on the possibility of this happening within the next 10 years, because I think that’s plausible, and that our society would be woefully underprepared for it. But I think this blog post is relevant even if you’re planning for longer timelines.
I explain why below.
Feel free to reach out to me via DM or email at [my last name].[my first name]@gmail.com if you’re considering working on this and would be interested in my takes on what good versions could look like.
Here, I’m using a definition of “transformative AI” that’s similar to the one discussed in this note.
Other than underestimates of AI takeover risk, another significant reason I’m worried about AI takeover is AI races where participants think that the difference in stakes between “winning the race” and “losing the race” is on the same scale as the difference in stakes between “losing the race” and “AI takeover”. Assuming that no important player underestimated the probability of AI takeover, I expect this sort of race to happen between nation states, because if a state thought there was a significant probability of AI takeover, I would expect them to stop domestic races. On the international scene, it’s somewhat less obvious how a race would be stopped, but I’m decently optimistic that it would happen if everyone involved estimated, say, ≥20% probability of AI takeover.
Even for extinction-risk that comes from “rational” brinkmanship, I suspect that the world offers enough affordances that countries could find a better way if there was common knowledge that the brinkmanship route would lead to a high probability of doom. It’s plausible that optimal play could risk a small probability of extinction, but I don’t think this is where most extinction-risk comes from.
I think there’s two mutually reinforcing effects, here. One is that people may try to learn the truth, but make genuine mistakes along the way. The other is that people may (consciously or sub-consciously) prefer to believe X over Y, and the ambiguity in what’s true gives them enough cover to claim to (and often actually) believe X without compromising their identity as a truth-seeker. Note that there’s a spectrum, here: Some people may be totally insensitive to what evidence is presented to them while some people are good at finding the truth even in murky areas. I think most people are somewhere in the middle.
Though this has exceptions. For example, Alex may already be skeptical of an existing epistemic method M’s ability to answer certain types of questions, perhaps because M contradicts Alex’s existing beliefs on the topic. If a new epistemic method is similar to M, then Alex may suspect that this method, too, will give unsatisfying answers on those questions — even if it looks good on the merits, and perhaps even if Alex would be inclined to trust it on other topics.
I don’t think this would permanently preclude companies from using their AIs for epistemic tasks, because when general capabilities are high enough, I expect it to be easy to use them for super-epistemics. (Except for some caveats about the alignment problem.) But it could impose delays, which could be costly if it leads to mistakes around the time when TAI is first developed.
If necessary: After being separated from any dangerous AI capabilities, such as instructions for how to cheaply construct weapons.
One analogy here is the Congressional Budget Office (CBO). The CBO was set up in the 1970s as a non-partisan source of information for Congress and to reduce Congress’ reliance on the Office of Management and Budget (which resides in the executive branch and has a director that is appointed by the currently sitting president). My impression is that the CBO is fairly successful, though this is only based on reading the Wikipedia page and this survey which has >30 economists “Agree” or “Strongly agree” (and 0 respondents disagree) with “Adjusting for legal restrictions on what the CBO can assume about future legislation and events, the CBO has historically issued credible forecasts of the effects of both Democratic and Republican legislative proposals.”
I.e.: It would be illegal for Walmart to pay OpenAI to make ChatGPT occasionally promote/be-more-positive on Walmart. But it would be legal for Walmart to offer their own chatbot (that told people about why they should use Walmart) and to buy API access from OpenAI to run that chatbot.
Cf. the discussion of “asymmetric” vs “symmetric” tools in Guided By The Beauty Of Our Weapons.
I was uncertain about whether this might have been confounded by the AIs having been fine-tuned to be honest, so I asked about this, and Rohin Shah says “I don't know the exact details but to my knowledge we didn't have trouble getting the model to lie (e.g. for web of lies).”
Which is an accident in the sense that it’s not intended by any human, though it’s also not an accident in the sense that it is intended by the AI systems themselves.
I think the most important differences here are 1 & 2, because they have big implications for what your main epistemic strategies are. If you have good feedback loops, you can follow strategies that look more like "generate lots of plausible ideas until one of them works" (or maybe: train an opaque neural network to solve your problem). If your problem can be boiled down to math, then it's probably not too hard to verify a theory once it's been produced, and you can iterate pretty quickly in pure theory-land. But without these, you need to rely more on imprecise reasoning and intuition trained on few data points (or maybe just in other domains). And you need these to be not only good enough to generate plausible ideas, but good enough that you can trust the results.
Such as:
Write-ups on what type of transparency is sufficient for outsiders to trust AI-as-knowledge-producers, and arguments for why AI companies should provide it.
Write-ups or lobbying pushing for governments (and sub-parts of governments, such as the legislative branch and opposition parties) to acquire AI expertise. To either verify or be directly involved in the production of key future AI advice.
Evaluations testing AI trustworthiness on e.g. forecasting.
Such as:
Write-ups on what type of transparency is sufficient to trust AI-as-communicators, and arguments for why AI companies should provide it.
Setting up an independent organization for evaluating AI truthfulness.
Developing and advocating for possible laws (or counter-arguments to laws) about AI speech.
This could include asking short-horizon forecasters about hypothetical scenarios, insofar as we have short-horizon forecasters that have been trained in ways that make it hard for them to distinguish real and hypothetical scenarios. (For example: Even when trained on real scenarios, it might be important to not give these AIs too much background knowledge or too many details, because that might be hard to generate for hypothetical scenarios.)
Significantly implemented via AIs imitating human feedback.
I.e., use each data point several times.
Indeed, it seems useful enough for capabilities that it might be net-negative to advance, due to shorter timelines and less time to prepare for TAI.