AI for epistemics: the good, the bad and the ugly

owencb; rosehadshar

Intro

For better or worse, AI could reshape the way that people work out what to believe and what to do. What are the prospects here?

In this piece, we’re going to map out the trajectory space as we see it. First, we’ll lay out three sets of dynamics that could shape how AI impacts epistemics (how we make sense of the world and figure out what’s true):

The good: there’s huge potential for AI to uplift our ability to track what’s true and make good decisions
The bad: AI could also make the world harder for us to understand, without anyone intending for that to happen
The ugly: malicious actors could use AI to actively disrupt epistemics

Then we’ll argue that feedback loops could easily push towards much better or worse epistemics than we’ve seen historically, making near-term work on AI for epistemics unusually important.

The stakes here are potentially very high. As AI advances, we’ll be faced with a whole raft of civilisational-level decisions to make. How well we’re able to understand and reason about what’s happening could make the difference between a future that we’ve chosen soberly and wisely, and a catastrophe we stumble into unawares.

The good

“If I have seen further, it is by standing on the shoulders of giants.” (Isaac Newton)

There are lots of ways that AI could help improve epistemics. Many kinds of AI tools could directly improve our ability to think and reason. We’ve written more about these in our design sketches, but here are some illustrations:

Tools for collective epistemics could make it easy to know what’s trustworthy and reward honesty, making it harder for actors to hide risky actions or concentrate power by manipulating others’ views.
- Imagine that when you go online, “community notes for everything” flag content that other users have found misleading, and “rhetoric highlighting” automatically flags persuasive but potentially misleading language. With a few clicks, you can see the epistemic track record of any actor, or access the full provenance of a given claim. Anyone who wants can compare state-of-the-art AI systems using epistemic virtue evals, which also exert pressure at the AI development stage.
Tools for strategic awareness could deepen people’s understanding of what’s actually going on around them, making it easier to make good decisions, keep up with the pace of progress, and steer away from failure modes like gradual disempowerment.
- Imagine that superforecaster-level forecasting and scenario planning are available on tap, and automated OSINT gives people access to much higher quality information about the state of the world.
Technological analogues to angels-on-the-shoulder, like personalised learning systems and reflection tools, could make decision-makers better informed, more situationally aware, and more in touch with their own values.
- Imagine that everyone has access to high-quality personalised learning, automated deep briefings for high-stakes decisions, and reflection tools to help them understand themselves better. In the background, aligned recommender systems promote long-term user endorsement, and some users enable a guardian coach system which flags any actions the person might regret taking in real time.

Structurally, AI progress might also enable better reasoning and understanding, for example by automating labour such that people have more time and attention, or by making people wealthier and healthier.

These changes might enable us to approach something like epistemic flourishing, where it’s easier to find out what’s true than it is to lie, and the world in most people’s heads is pretty similar to the world as it actually is. This could radically improve our prospects of safely navigating the transition to advanced AI, by:

Helping us to keep pace with the increasing speed and complexity of the situation, so we’re able to make informed and timely decisions.
Ensuring that key decision-makers don’t make catastrophic unforced errors through lack of information or understanding.
Making it harder for malicious actors to manipulate the information environment in their favour to increase their own influence.

A Philosopher Lecturing on the Orrery, a painting by Joseph Wright of Derby. It depicts a lecturer giving a demonstration of an orrery – a mechanical model of the Solar System – to a small audience.

What’s driving these potential improvements?

AI will be able to think much more cheaply and quickly than humans. Partly this will mean that we can reach many more insights with much less effort. Partly this will make it possible to understand things that are currently infeasible for us to understand (because it would take too many humans too long to figure it out).
AI can ‘know’ much more than any human. Right now, a lot of information is siloed in specific expert communities, and it’s slow to filter out to other places even when it would be very useful there. AI will be able to port and apply knowledge much more quickly to the relevant places.

The bad

“A wealth of information creates a poverty of attention.” (Herbert Simon)

AI could also make epistemics worse without anyone intending it, by making the world more confusing and by degrading both the information we receive and our ability to process it.

There are a few different ways that AI could unintentionally weaken our epistemics:

The world gets faster and more complex. As AI progresses, our information-processing capabilities are going to go up, but so is the complexity of the world. Technological progress could become dramatically faster than it is today, making the world more disorienting and harder to understand. If tech progress becomes fast enough, it’s possible that we won’t be able to keep up, and even the best AI tools available won’t help us to see through the fog.
The quality of the information we’re interacting with gets worse, because of:
- Faster memetic evolution. As more and more content is generated by and mediated through AI systems working at machine speeds, the pace of memetic and cultural change will probably get a lot faster than it is today. As the pace quickens, memes which are attention-grabbing could increasingly outcompete those which are truthful.
- More difficult verification. This could happen through a combination of:
  - AI slop. In hard-to-verify domains, AI could massively increase the quantity of plausible-looking but wrong information, without also being able to help us to verify which bits are right.
  - AI-generated ‘evidence’. As the quality of AI-generated video, audio, images, and text continues to improve, it may become pretty difficult to tell which bits of evidence are real and which are spurious.
We get worse at processing the information we get, because:
- Our emotions get in the way. AI progress could be very disorienting, generate serious crises, and cause people a lot of worry and fear. This could get in the way of clear thinking.
- Using AI to help us with information processing degrades our thinking, via:
  - Adoption of low-quality AI tools for epistemics: In many areas of epistemics, it’s hard to say what counts as ‘good’. This makes epistemic tools harder to assess, and could lead to people trusting these tools either too much or too little. Inappropriately high levels of trust in epistemic tools could take various forms, including:
    - First mover advantages for early but imperfect systems, which are then hard to replace with better systems because people trust the earlier systems more.
    - The use of epistemically misaligned systems, which aren’t actually truth-tracking, in ways that we aren’t able to discern.
  - Fragmentation of the information environment: AI will make it easier to create content (potentially interactive content) that pulls people in and monopolises their attention. This could reduce the attention available for important truth-tracking mechanisms, and make it harder to coordinate groups of people around important actions. In the extreme, some people might end up in effectively closed information bubbles, where all of their information is heavily filtered through the AI systems they interact with directly. The more fragmented the information environment becomes, the harder it could get for people to make sense of what’s happening in the world around them, and to engage with other people and other information bubbles.
  - Epistemic dependence: if people increasingly outsource their thinking to AI systems, they may lose the ability to think critically for themselves.

Allegory of Error by Stefano Bianchetti (1801). An engraving depicting a blindfolded figure with donkey ears staggering forward holding a staff.

The ugly

“The ideal subject of totalitarian rule is not the convinced Nazi or the convinced Communist, but people for whom the distinction between fact and fiction (i.e., the reality of experience) and the distinction between true and false (i.e., the standards of thought) no longer exist.” (Hannah Arendt, The Origins of Totalitarianism)

We’ve just talked about ways that AI could make epistemics worse without anyone intending that. But we might also see actors using AI to actively interfere with societal epistemics. (In reality these things are a spectrum, and the dynamics we discussed in the preceding section could also be actively exploited.)

What might this look like?

Automated propaganda and persuasion: AI could be used to generate high-quality persuasive content at scale. This could take the form of highly tailored, well-written propaganda. If this content were then used as training data for next generation models, biases could get even more entrenched. Additionally, AI persuasion could come in the form of models which are subtly biased in a particular direction. Particularly if many users are spending large amounts of time talking to AI (e.g. AI companions), the persuasive effects could be much larger than is scalable today via human-to-human persuasion.
Using AI to undermine sense-making: AI could be used to generate high-quality content which casts doubt on institutions, individuals, and tools that would help people understand what’s going on, or to directly sabotage such tools. More indirectly, actors could also use AI to generate content which adds to complexity, for example by wrapping important information in complex abstractions and technicalities, and generating large quantities of very readable reports and news stories which distract attention.
Surveillance: AI surveillance could monitor people’s communications in much more fine-grained ways, and punish them when they appear to be thinking along undesirable lines. This could be abused by states, or could become a tool that private actors can wield against their enemies. In either case, the chilling effect on people’s thinking and behaviour could be significant.

The Card Sharp with the Ace of Diamonds, an oil-on-canvas painting by Georges de La Tour (~1636-1638). It depicts a card game in which a young man is being fleeced of his money by the other players, including a card sharp who is retrieving the ace of diamonds from behind his back.

But maybe this is all a bit paranoid. Why expect this to happen?

There’s a long history of powerful actors trying to distort epistemics,^[1] so we should expect that some people will be trying to do this. And AI will probably give them better opportunities to manipulate other people’s epistemics than have existed historically:

It’s likely that access to the best AI systems and compute will be unequal, which favours abuse.
If people end up primarily interfacing with the world via AI systems, this will create a big lever for epistemic influence that doesn’t exist currently. It could be much easier to influence the behaviour of lots of AI systems at once than lots of people or organisations.

It’s also worth noting that many of these abuses of epistemic tech don’t require people to have some Machiavellian scheme to disrupt epistemics or seek power for themselves (though these might arise later). Motivated reasoning could get you a long way:

Legitimate communications and advertising blur into propaganda, and microtargeting is already a common strategy.
It’s easy to imagine that in training an AI system, a company might want to use something like its own profits as a training signal, without explicitly recognising the potential epistemic effects of this in terms of bias.

So what should we expect to happen?

With all these dynamics pulling in different directions, should we expect that it’s going to get easier or harder for people to make sense of the world?

We think it could go either way, and that how this plays out is extremely consequential.

The main reason we think this is that the dynamics above are self-reinforcing, so the direction we set off in initially could have large compounding effects. In general, the better your reasoning tools and information, the easier it is for you to recognise what is good for your own reasoning, and therefore to improve your reasoning tools and information. The worse they are, the harder it is to improve them (particularly if malicious actors are actively trying to prevent that).

We already see this empirically. The Scientific Revolution and the Enlightenment can be seen as examples of good epistemics reinforcing themselves. Distorted epistemic environments often also have self-perpetuating properties. Cults often require members to move into communal housing and cut contact with family and friends who question the group. Scientology frames psychiatry’s rejection of its claims as evidence of a conspiracy against it.

And on top of historical patterns, there are AI-specific feedback loops that reinforce initial epistemic conditions:

Unlike previous information tech, AI has a tight feedback loop between the content it generates and the data used to train future models. So if models generate accurate content, future models are more likely to do so too, and likewise for inaccurate content.
How early AI systems behave epistemically will shape user expectations and what kinds of future AI behaviour there’s a market for.

There are self-correcting dynamics too, so these self-reinforcing loops won’t go on forever. But we think it’s decently likely that epistemics get much better or much worse than they’ve been historically:

One self-correcting mechanism historically has just been that it takes (human) effort to sustain or degrade epistemics. Continuing to improve epistemics requires paying attention to ways that epistemics could be eroded, and this isn’t incentivised in an environment that’s currently working well. Continuing to degrade epistemics requires willing accomplices — but the more an actor distorts things, the more that can galvanise opposition, and the fewer people may be willing to assist. By augmenting or replacing human labour with automated labour, AI could make it much cheaper to keep pushing in the same direction.
Another self-correcting mechanism is just that people and institutions adapt to new epistemic tech: as epistemics improve, deception becomes more sophisticated; and if epistemics worsen, people lose trust and create new mechanisms for assessing truth. But this adaptation happens at human speed, and AI will increasingly be changing the epistemic environment at a much faster pace. This creates the potential for self-reinforcing dynamics to drive to much more extreme places before adaptation has time to kick in.^[2]
There’s a limit to how good epistemics can get before hitting fundamental problems like complexity and irreducible uncertainty. But there seems to be a lot of room for improvement from where we’re currently standing (especially as good AI tools could help to handle greater amounts of complexity), and it would be a priori very surprising if we’d already reached the ceiling.
There’s also a limit to how bad epistemics can get: people aren’t infinitely suggestible, and often there are external sources of truth that limit how distorted beliefs can get (ground truth, or what gets said in other countries or communities). But as we discussed above, access to ground truth and to other epistemic communities might get harder because of AI, so the floor here may lower.

Given the real chance that we end up stuck in an extremely positive or negative epistemic equilibrium, our initial trajectory seems very important. The kinds of AI tools we build, the order we build them in, and who adopts them when could make the difference between a world of epistemic flourishing and a world where everyone’s understanding is importantly distorted. To give a sense of the difference this makes, here’s a sketch of each world (among myriad possible sketches):

In the first world, we basically understand what’s going on around us. It’s not like we can now forecast the future with perfect accuracy or anything: there’s still irreducible uncertainty, and some people have better epistemic tools than others. But it’s gotten much cheaper to access and verify information. Public discourse is serious and well-calibrated, because epistemic infrastructure has made it quite hard to deceive or manipulate people, which in turn incentivises honesty. AI-assisted research and synthesis mean that knowledge which used to be siloed in specialist communities is now accessible and usable by anyone who needs it. And governments are able to make much more nuanced decisions far faster than they are today.
In the second, it’s no longer really possible to figure out what’s going on. There’s an awful lot of persuasive but low-quality AI content around, some of it generated with malicious intent. In response to this, people withdraw into their own AI-mediated epistemic bubbles — and unlike today’s filter bubbles, these can be comprehensive enough that people rarely encounter friction with outside perspectives at all. Meanwhile, companies and nations with a lot of compute find it pretty easy to distract the public’s attention from anything that would be inconvenient, and to outmaneuver the many actors who are trying to hold them to account. But their own reasoning also gets degraded by all this information pollution, as their AI systems are trained on the same corrupted public information.^[3] Even the people who think they’re shaping the narrative are increasingly unable to see clearly.

The world we end up in is the world from which we have to navigate the intelligence explosion, making decisions like how to manage misaligned AI systems, whether to grant AI systems rights, and how to divide up the resources of the cosmos. How AI impacts our epistemics between now and then could be one of the biggest levers we have on navigating this well.

Things we didn’t cover

Whose epistemics?

We mostly talked about AI impacts on epistemics in general terms. But AI could impact different groups’ epistemics differently — and different groups’ epistemics could matter more or less for getting to good outcomes. It would be cool to see further work which distinguishes between scenarios where good outcomes require:

Interventions that raise the epistemic floor by improving everyone’s epistemics.
Interventions that raise the ceiling by improving the epistemics of the very clearest thinkers.

‘Weird’ dynamics

We focused on how AI could impact human epistemics, in a world where human reasoning still matters. But eventually, we expect more and more of what matters for the outcomes we get will come down to the epistemics of AI systems themselves.

The dynamics which affect these AI-internal epistemics could therefore be enormously important. But they could look quite different from the human-epistemics dynamics that have been our focus here, and we didn’t think it made sense to expand the remit of the piece to cover these.

Thanks to everyone who gave comments on drafts, and to Oly Sourbut and Lizka Vaintrob for a workshop which crystallised some of the ideas.

This article was created by Forethought. Read the original on our website.

^[1]
Think of things like:
- Propaganda states like Nazi Germany and the USSR.
- Corporate lobbying like the tobacco and sugar lobbies and climate science doubt campaigns.
- CIA operations to spread doubt and confusion.
^[2]
Though it’s possible that this dynamic will be more pronounced for epistemics getting extremely bad than for them getting extremely good. Consider these two very simplistic sketches:
1. People start living in increasingly closed AI filter bubbles. Institutions are slow to adopt similar bubbles at a corporate level, but they also don’t have a mandate to change what their employees are doing. People’s filter bubbles tend to be pretty correlated with the people they work and interact with, so institutions end up with pretty distorted pictures of what’s going on even though they don’t actively start using harmful tech. Government regulation is too slow and reactive to stop this from happening.
2. People start to use provenance tracing and rhetoric highlighting by default when browsing, in response to an increasingly polarised memetic environment. There is adaptation to this — politicians start using subtler language and so on. But the net effect is still strongly positive: it’s hard to fake provenance, and removing overt rhetoric is already a big win, even if it means that more slippery language proliferates.
In the first sketch, it’s straightforwardly the case that adaptive mechanisms are too slow. In the latter, it’s more that the tech is inherently defence-favoured.
We haven’t explored this area deeply, and think more work on this would be valuable.
^[3]
Alternatively, these elites might retain very good epistemics for themselves, and choose to indefinitely maintain a situation where everyone else has a very distorted understanding, to further their own ends. It’s unclear to us which of these scenarios is more likely or concerning.

Wei Dai (2mo)

AI will be able to think much more cheaply and quickly than humans. Partly this will mean that we can reach many more insights with much less effort. Partly this will make it possible to understand things that are currently infeasible for us to understand (because it would take too many humans too long to figure it out).

AI can ‘know’ much more than any human. Right now, a lot of information is siloed in specific expert communities, and it’s slow to filter out to other places even when it would be very useful there. AI will be able to port and apply knowledge much more quickly to the relevant places.

I think 5 or 10 years ago it was reasonable to hope for this (and some people explicitly did, e.g. Paul Christiano's IDA assumed that you could train a base model to reason like a human and reach arbitrary intelligence/capabilities levels by iterating it enough times), but today we already have AIs that can think much more cheaply and quickly than humans, and know a lot more than any human, but outside of certain domains like coding and math (where RLVR is possible), this is proving surprisingly unhelpful. For example, I can't take a post or paper in a field that I'm not familiar with, ask an AI to think a lot about it and tell me if its conclusions are reasonable or not, and get a trustworthy answer back. (And it seems like this has been improving at an imperceptible rate, if at all, while math/coding makes leaps and bounds.)

From my perspective the world is already going down my more pessimistic scenario, of AI differentially accelerating fields with fast/cheap feedback signals, like math, coding, manipulating humans (e.g. via sycophancy and plausible-sounding arguments), while not helping much with fields like philosophy, long-horizon strategy, even soft science fields like economics, that lack such signals. The default AI development trajectory at this point probably involves expanding RLVR training to all the fields where it's feasible, like hard sciences, technology R&D, short-horizon agency, while other areas lag behind.

Do you agree with this view, or perhaps have your own explanation of why AI reasoning/knowledge has been of rather limited use so far, and why that might change before it's too late? (Or perhaps think AI reasoning is generally more useful/helpful than it appears to me?)

I guess I do still have some hope that the current AI trajectory is actually a dead end. I.e., maybe RLVR won't work so well outside of math/coding, and when we go back to the drawing board and come up with actual AGI, it will be able to reason in these low-feedback domains at least as well as humans. Are you mostly thinking along these lines?

Seth Herd (1mo)

I have specific reasons to expect LLM general reasoning to improve. They are behind on metacognition, but they have been improving, and people are trying multiple routes to close this gap.

Human-like metacognitive skills will reduce LLM slop and aid alignment and capabilities.

Oliver Sourbut (2mo)

I appreciate this discussion a lot. Two things which stand out to me as deserving more emphasis.

First though, quickly framing 'good epistemic outcomes' as something like a product of 'people trying to understand clearly' and 'people can do that effectively'. (Of course these are interrelated, because people's willingness is obviously affected by the practicalities - more on that in point 2.)

OK, the things:

It looks to me like most of the object-level task of collective epistemics is checking and generally piecing together good 'secondary research' (broadly construed), i.e. looking at provenance, tracking the evidence and reasoning dependencies for a claim, proactively gathering the best arguments for and against, reasons to downweight certain testimony etc.
- Why? Almost all our information about our environment beyond our direct sensory access is mediated through highly iterated message passing, reinterpretation, aggregation, and so on - especially in the heights of science and the depths (!) of political/influence goings-on
- AI enables this (The Good) not so much (directly) by 'knowing' more or having 'more insights', but rather by hugely expanding the availability of clerical checking, tracing, and knowledge mapping work!
- You kind of talk about this in the collective epistemics discussion, but I think it warrants more
Most of the overall task of collective epistemics may be in the motivating, i.e. having more people, more of the time, actually trying to understand things with accuracy, rather than retreating into one or other alternative cognitive mode
- The usual label I use for alternative cognitive modes is 'tribal cognition', where most of what's said and recounted (and even believed), especially (but not even only) about what's outside of the immediate sensory environment, is in service of building and maintaining allegiances and coalitions
- When is 'tribal cognition' incentivised? I don't fully know, but it has to do with
  - When people are/feel threatened, they reach for affiliations which offer (perhaps passing or merely apparent) security
    - Abusers can play on this by a combination of bigging up threats and presenting as effective and sympathetic
  - When the epistemic environment is difficult, true perception is harder and less rewarded
    - Abusers can push this. In politics: flood the zone, firehose of falsehoods, FUD. In science: p-hacking, importance-hacking, conflating/obscuring methodologies.
  - Generally adding noise and more convincing fake content undermines The Good above, the ability to check and trace, not by making people believe the fake stuff but by making them correctly recognise that it's hard to tell at all (thus 'retreat')
  - Certain coalition norms can encourage epistemic insularity and discourage (genuine) scrutiny
- I think you're touching on this in The Ugly, 'undermine sense-making'. To me it's possibly 'most of the problem'! Or at least, understanding under what conditions people mobilise one or other cognitive intents in sensemaking, and how those conditions can be influenced is a really big part of the picture here.
