I sometimes (since 2022[1]) say that I'd prefer it if widespread LM products 'spoke in wikipedia voice' rather than being specifically trained to put forward a facade of personhood (first-person pronouns, etc.), which is especially geared toward bypassing rational engagement. I don't think people mostly thought of LMs as 'the AIs' until chat post-training hit it big.
Sadly I think that's by default a losing battle as the chat interface is both especially navigable (as you mention) and intuitively appealing to a mass market.
It's another draft post that has been sitting in my queue for embarrassingly long. ↩︎
At FLF, one of the initiatives we're recruiting for is an 'epistemic stack', which I think fits the bill as a backend/foundation for many of the desiderata you're describing. An LLM chat interface would be one UX form factor on top.
The epistemic stack would be a (probably distributed) cache of annotations and metadata connecting claims to supporting sources, constructable and expandable dynamically. The cheapest, widest-coverage construction would use LM-based agents over webtext, inferring support from existing links and citations, and sometimes proactively searching the web for supporting (or contradicting) sources. Human participants (authors and readers) could provide various annotations, including endorsements of (or alterations to) inferred epistemic-support links. Something git-like (versioned, signed, annotated DAGs) would then be available to downstream epistemic applications (including via RAG for LM consumption, but also in many other imaginable formats).
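As a rough sketch of the kind of data model this implies (all names here are illustrative placeholders, not a committed schema), in Python:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Claim:
    """A single claim extracted from (or asserted about) some document."""
    claim_id: str                     # content-addressed hash, git-style
    text: str                         # the claim as stated
    source_url: Optional[str] = None  # where the claim appears, if anywhere

@dataclass
class SupportEdge:
    """An inferred or human-asserted epistemic link between claims/sources."""
    parent_id: str      # the claim being supported (or contradicted)
    child_id: str       # the supporting/contradicting claim or source
    relation: str       # "supports" | "contradicts"
    inferred_by: str    # e.g. an LM-agent version string, or a human identity
    endorsements: list[str] = field(default_factory=list)  # human signatures vouching for the edge
    version: int = 1    # edges are versioned and annotated rather than overwritten
```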
An MVP of on-demand tree construction for a given claim is already practical, though it's unreliable and more expensive than a system with caching would be.
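For concreteness, here's a minimal sketch of what on-demand construction could look like, with `propose_support` standing in for the LM-agent/web-search step (a hypothetical stub, not an existing API) and a trivial cache:

```python
def propose_support(claim: str) -> list[tuple[str, str, str]]:
    """Placeholder for the LM-agent step: given a claim, return candidate
    (relation, supporting_claim_text, source_url) tuples. A real version
    would call an LM with web-search tools."""
    return []  # dummy value so the sketch runs end to end

_cache: dict[str, list] = {}

def build_support_tree(claim: str, depth: int = 2) -> dict:
    """Recursively expand a claim into a tree of candidate supporting sources."""
    if depth == 0:
        return {"claim": claim, "support": []}
    if claim not in _cache:
        _cache[claim] = propose_support(claim)  # the expensive LM/search step
    children = [
        {"relation": rel, "source": url,
         "subtree": build_support_tree(text, depth - 1)}
        for rel, text, url in _cache[claim]
    ]
    return {"claim": claim, "support": children}
```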
Down the line, if more verifiable sources of ground data (signed cameras, etc.) become widespread, such data would readily integrate as leaves.
Compare also the Society Library, which has some similar prior work (mostly manual) and may be moving in a similar direction.
There has also been related discussion of 'wikipedia for LLMs', and though I haven't heard much technical refinement from its proponents, the term might be intended to expand into a similar overall concept.
Note that 'wikipedia by LLM', like grokipedia, does not currently have any meaningful claim to epistemic grounding, correctability, or transparency/legibility, though its form factor would at least inherit the navigability of wikipedia.
Note that we're aware of the cautionary tales of Cyc, Xanadu, and Arbital!
We're hoping a combination of:
means that 'this time is different'.
A thought that has been bouncing around in my head for the last couple of years is Wikipedia as a motivating metaphor for what LLMs should become. This is particularly in contrast to social media. I owe this idea to Tom Everitt, who conveyed it to me at an Agent Foundations conference at Wytham Abbey.
LLMs are """the new Wikipedia""" in some ways: it is a new tech-enabled knowledge source, which we can consult for anything from quick questions to long information-binge sessions. It has questions of reliability, similar to those Wikipedia faced at first. The advice teachers give about it is similar: "It can be a useful resource, but you should use it to find primary sources, rather than citing it directly."
However, there is a reasonable fear that LLMs will become (or are becoming) """the new social media""" instead: maximizing engagement, spreading misinformation, addicting users, etc. X and Meta are certainly trying to figure out how to do this. But this looks like an extremely-not-good overall trajectory for machine intelligence.[1]
How might LLMs be steered in a desirable direction? I'm going to take a somewhat cargo-cult strategy, imagining LLMs being closer to wikis in as many dimensions as possible.
Co-Agentic
One aspect is co-agency, as discussed in What, if not agency? This is closest to what Tom Everitt had in mind: Wikipedia empowers people, while social media addicts people. This intuition, formalized, may lead to better alignment/corrigibility targets. If you've got to aim superintelligence at a goal, then human empowerment seems safer than trying to directly specify human values; it keeps humans in control.[2] More optimistically, the co-agency line of thinking could articulate a better alternative (to agentic systems oriented towards goals).
Navigable
A related dimension that might be relevant is paternalistic vs. navigable formats. "The Algorithm" running social media chooses what you get to see in an opaque way, while Wikipedia focuses on easy navigation to the content you want. I'm not entirely clear on how to analogize this to LLMs, however. On the one hand, LLMs are very much an opaque algorithm deciding what to show you. On the other hand, in some sense they're intensely navigable; they show you whatever you ask for.
One idea is moving more in the direction of Microscope AI. What if, when you ask a question, instead of concocting a summary for you, the AI showed you statistics about what percentages and types of people would answer in particular ways? Base models are extremely good statistical models of human-written text -- they have that sort of information. A human-navigable view of raw statistics like this feels a lot more informative and trustworthy than something fine-tuned for """truth""" by an opaque corporation.
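A crude sketch of the idea: rather than one synthesized answer, sample a base model many times and report the distribution of answers. `sample_continuation` below is a placeholder for whatever base-model sampling API is available, and the prompt framing is only illustrative:

```python
from collections import Counter

def sample_continuation(prompt: str) -> str:
    """Placeholder: one sampled continuation from a *base* (not chat-tuned)
    language model; a real version would call an actual base-model API."""
    return "it depends"  # dummy value so the sketch runs end to end

def answer_distribution(question: str, n_samples: int = 200) -> Counter:
    """Report how sampled 'people' answer, instead of one synthesized answer.

    The prompt frames the question as something a person is answering, so the
    base model's statistics over human-written text do the work."""
    prompt = f"Q: {question}\nA person answers:"
    answers = Counter()
    for _ in range(n_samples):
        answers[sample_continuation(prompt).strip().lower()] += 1
    return answers
```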
Correctable
One thing LLMs lack, when contrasted with Wikipedia, is a good story for why they might become increasingly trustworthy over time. Wikipedia's problems can be corrected by anyone who notices them. Malicious """corrections""" can be filtered out.
I've fantasized for some time about an LLM interface with a heavy-duty correction tool:
This certainly isn't a perfect vision, but it gestures in a direction of what Wiki AI might look like. This sort of interface might be used for personal Wiki AIs (maybe without the expensive fine-tuning component), and also for a collective project (similar to Wikipedia).
Collective
Maybe a Wikipedia-like AI has to be managed in a Wikipedia-like way, down to the detailed governance mechanisms Wikipedia uses for moderators (appropriately adapted).
Wikis came before Wikipedia, however. Maybe similarly, we have to first invent Wiki AI, and the -pedia comes later?
Academic
Here, I mean the ordinary academic virtues, chief among them being clear citation trails. Modern LLMs can search the web and provide citations that way, but it would be even nicer if relevant parts of the training data could be referenced in order to give some idea of how an LLM generated a particular response. (This might be too hard and not important enough, but I'm fantasizing here. The idea does seem to fit somewhat with my earlier comment about microscope AI.)
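Exact training-data attribution is an open research problem, but a cheap approximation is nearest-neighbor retrieval over whatever corpus one can index. A minimal sketch, with a placeholder `embed` function standing in for a real embedding model:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding function; a real version would call any
    sentence-embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)  # dummy vector so the sketch runs

def nearest_sources(claim: str, corpus: list[str], k: int = 3) -> list[str]:
    """Return the k corpus passages most similar to a generated claim,
    as rough 'citation candidates' for where an idea may have come from."""
    q = embed(claim)
    q = q / np.linalg.norm(q)
    scored = []
    for passage in corpus:
        v = embed(passage)
        scored.append((float(q @ (v / np.linalg.norm(v))), passage))
    scored.sort(reverse=True)
    return [passage for _, passage in scored[:k]]
```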
Is there any possibility of LLMs becoming a respectable academic publishing model? (At least in the way a well-written wiki could be?)
To put it a different way: can you envision someone taking responsibility for an LLM as an intellectual product (in a field other than machine learning)? A "live theory"?
One mental image I had: suppose I have my personal LLM running on my personal academic webpage. I've fine-tuned this LLM to represent my personal ideas (a supplement to my published writing).
Editable UI?
Maybe the best UI for what I'm envisioning is not a chat interface. Wikis were innovative UIs: websites with the editor built-in. I'm just spitballing here, but perhaps a Wiki AI is more directly like that: a website with vibe-coding built in? A "live interface"?
Neither the "social" nor the "media" parts of social media are bad in themselves. The bad part is how these algorithms are out to get you, justifying paranoia.
If, as many people expect, we never solve the problem of how to ensure that superintelligent AI systems don't kill everyone, then this is a rearranging-chairs-on-Titanic type concern. However, if humans do indeed solve a version of the alignment problem adequate to deliberately point superintelligence in a desired direction, then things could go terribly anyway if superintelligence is pointed at grabbing human attention or something similar.
There are things that could go wrong. A clever human-empowerment-maximizer would first modify human values to want power more, then empower the humans. So, closer to a good idea would be to maximize human-empowerment-while-not-changing-human-values... but this could also go wrong...
One thing that would really help here would be micropayments for compute. Imagine if someone has an OpenRouter account, for example, and when they visit my webpage there's somehow an easy way to connect their OpenRouter account and use the credits they have there to pay for talking to my personal LLM. This doesn't make sense with the way OpenRouter currently works; I'm just saying, something like this would help support the sort of ecosystem I'm imagining here. (This is an example of a concrete thing one could try to build in order to help support this vision, although there might not be enough demand for it to work as a business!)
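For illustration only: OpenRouter's chat endpoint is OpenAI-compatible, but the smooth "connect your account" flow imagined above doesn't exist; this sketch just shows the bring-your-own-key version, with a placeholder model id.

```python
import requests

def ask_my_personal_llm(visitor_openrouter_key: str, question: str) -> str:
    """Route a visitor's question to 'my' model, billed to *their* OpenRouter credits."""
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {visitor_openrouter_key}"},
        json={
            "model": "some-org/my-personal-model",  # placeholder model id
            "messages": [{"role": "user", "content": question}],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```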