AI will be able to think much more cheaply and quickly than humans. Partly this will mean that we can reach many more insights with much less effort. Partly this will make it possible to understand things that are currently infeasible for us to understand (because it would take too many humans too long to figure it out).
AI can ‘know’ much more than any human. Right now, a lot of information is siloed in specific expert communities, and it’s slow to filter out to other places even when it would be very useful there. AI will be able to port and apply knowledge much more quickly to the relevant places.
I think 5 or 10 years ago it was reasonable to hope for this, and some people explicitly did: e.g. Paul Christiano's IDA (iterated distillation and amplification) assumed that you could train a base model to reason like a human and reach arbitrary intelligence/capability levels by iterating it enough times. But today we already have AIs that can think much more cheaply and quickly than humans, and that know a lot more than any human, yet outside of certain domains like coding and math (where RLVR, i.e. reinforcement learning with verifiable rewards, is possible) this is proving surprisingly unhelpful. For example, I can't take a post or paper in a field that I'm not familiar with, ask an AI to think a lot about it and tell me whether its conclusions are reasonable, and get a trustworthy answer back. (And it seems like this has been improving at an imperceptible rate, if at all, while math/coding advances by leaps and bounds.)
From my perspective the world is already following my more pessimistic scenario, in which AI differentially accelerates fields with fast/cheap feedback signals, like math, coding, and manipulating humans (e.g. via sycophancy and plausible-sounding arguments), while not helping much with fields that lack such signals, like philosophy, long-horizon strategy, and even soft sciences like economics. The default AI development trajectory at this point probably involves expanding RLVR training to all the fields where it's feasible, like hard sciences, technology R&D, and short-horizon agency, while other areas lag behind.
Do you agree with this view, or perhaps have your own explanation of why AI reasoning/knowledge has been of rather limited use so far, and why that might change before it's too late? (Or perhaps think AI reasoning is generally more useful/helpful than it appears to me?)
I guess I do still have some hope that the current AI trajectory is actually a dead end. I.e., maybe RLVR won't work so well outside of math/coding, and when we go back to the drawing board and come up with actual AGI, it will be able to reason in these low-feedback domains at least as well as humans. Are you mostly thinking along these lines?
I have specific reasons to expect LLM general reasoning to improve. LLMs are behind on metacognition, but they have been improving, and people are trying multiple routes to close this gap.
Human-like metacognitive skills will reduce LLM slop and aid alignment and capabilities.
I appreciate this discussion a lot. Two things stand out to me as deserving more emphasis.
First, though, a quick framing: 'good epistemic outcomes' are something like a product of 'people trying to understand clearly' and 'people being able to do that effectively'. (Of course these are interrelated, because people's willingness is obviously affected by the practicalities - more on that in point 2.)
OK, the things:
1. It looks to me like most of the object-level task of collective epistemics is the checking and general piecing together of good 'secondary research' (broadly construed), i.e. looking at provenance, tracking the evidence and reasoning dependencies for a claim, proactively gathering the best arguments for and against, noting reasons to downweight certain testimony, etc.
2. Most of the overall task of collective epistemics may be in the motivating, i.e. getting more people, more of the time, to actually try to understand things accurately, rather than retreating into one or another alternative cognitive mode.
Intro
For better or worse, AI could reshape the way that people work out what to believe and what to do. What are the prospects here?
In this piece, we’re going to map out the trajectory space as we see it. First, we’ll lay out three sets of dynamics that could shape how AI impacts epistemics (how we make sense of the world and figure out what’s true): the good, the bad, and the ugly.
Then we’ll argue that feedback loops could easily push towards much better or worse epistemics than we’ve seen historically, making near-term work on AI for epistemics unusually important.
The stakes here are potentially very high. As AI advances, we’ll be faced with a whole raft of civilisational-level decisions to make. How well we’re able to understand and reason about what’s happening could make the difference between a future that we’ve chosen soberly and wisely, and a catastrophe we stumble into unawares.
The good
There are lots of ways that AI could help improve epistemics. Many kinds of AI tools could directly improve our ability to think and reason. We’ve written more about these in our design sketches, but here are some illustrations:
Structurally, AI progress might also enable better reasoning and understanding, for example by automating labour such that people have more time and attention, or by making people wealthier and healthier.
These changes might enable us to approach something like epistemic flourishing, where it’s easier to find out what’s true than it is to lie, and the world in most people’s heads is pretty similar to the world as it actually is. This could radically improve our prospects of safely navigating the transition to advanced AI, by:
[Image: A Philosopher Lecturing on the Orrery, by Joseph Wright of Derby (1766)]
What’s driving these potential improvements?
The bad
AI could also make epistemics worse without anyone intending it, by making the world more confusing and degrading our information and processing.
There are a few different ways that AI could unintentionally weaken our epistemics:
The ugly
We’ve just talked about ways that AI could make epistemics worse without anyone intending that. But we might also see actors using AI to actively interfere with societal epistemics. (In reality these things are a spectrum, and the dynamics we discussed in the preceding section could also be actively exploited.)
What might this look like?
But maybe this is all a bit paranoid. Why expect this to happen?
There’s a long history of powerful actors trying to distort epistemics,[1] so we should expect that some people will be trying to do this. And AI will probably give them better opportunities to manipulate other people’s epistemics than have existed historically:
It’s also worth noting that many of these abuses of epistemic tech don’t require people to have some Machiavellian scheme to disrupt epistemics or seek power for themselves (though these might arise later). Motivated reasoning could get you a long way:
So what should we expect to happen?
With all these dynamics pulling in different directions, should we expect that it’s going to get easier or harder for people to make sense of the world?
We think it could go either way, and that how this plays out is extremely consequential.
The main reason we think this is that the dynamics above are self-reinforcing, so the direction we set off in initially could have large compounding effects. In general, the better your reasoning tools and information, the easier it is for you to recognise what is good for your own reasoning, and therefore to improve your reasoning tools and information. The worse they are, the harder it is to improve them (particularly if malicious actors are actively trying to prevent that).
We already see this empirically. The Scientific Revolution and the Enlightenment can be seen as examples of good epistemics reinforcing themselves. Distorted epistemic environments often also have self-perpetuating properties. Cults often require members to move into communal housing and cut contact with family and friends who question the group. Scientology frames psychiatry’s rejection of its claims as evidence of a conspiracy against it.
And on top of historical patterns, there are AI-specific feedback loops that reinforce initial epistemic conditions:
There are self-correcting dynamics too, so these self-reinforcing loops won’t go on forever. But we think it’s decently likely that epistemics get much better or much worse than they’ve been historically:
Given the real chance that we end up stuck in an extremely positive or negative epistemic equilibrium, our initial trajectory seems very important. The kinds of AI tools we build, the order we build them in, and who adopts them when could make the difference between a world of epistemic flourishing and a world where everyone’s understanding is importantly distorted. To give a sense of the difference this makes, here’s a sketch of each world (among myriad possible sketches):
The world we end up in is the world from which we have to navigate the intelligence explosion, making decisions like how to manage misaligned AI systems, whether to grant AI systems rights, and how to divide up the resources of the cosmos. How AI impacts our epistemics between now and then could be one of the biggest levers we have on navigating this well.
Things we didn’t cover
Whose epistemics?
We mostly talked about AI impacts on epistemics in general terms. But AI could impact different groups’ epistemics differently — and different groups’ epistemics could matter more or less for getting to good outcomes. It would be cool to see further work which distinguishes between scenarios where good outcomes require:
‘Weird’ dynamics
We focused on how AI could impact human epistemics, in a world where human reasoning still matters. But eventually, we expect that more and more of what matters for the outcomes we get will come down to the epistemics of AI systems themselves.
The dynamics which affect these AI-internal epistemics could therefore be enormously important. But they could look quite different from the human-epistemics dynamics that have been our focus here, and we didn’t think it made sense to expand the remit of the piece to cover these.
Thanks to everyone who gave comments on drafts, and to Oly Sourbut and Lizka Vaintrob for a workshop which crystallised some of the ideas.
This article was created by Forethought. Read the original on our website.
Think of things like:
Though it’s possible that this dynamic will be more pronounced for epistemics getting extremely bad than for them getting extremely good. Consider these two very simplistic sketches:
In the first sketch, it’s straightforwardly the case that adaptive mechanisms are too slow. In the second, it’s more that the tech is inherently defence-favoured.
We haven’t explored this area deeply, and think more work on this would be valuable.
Alternatively, these elites might retain very good epistemics for themselves, and choose to indefinitely maintain a situation where everyone else has a very distorted understanding, to further their own ends. It’s unclear to us which of these scenarios is more likely or concerning.