Open Source Search (Summary)
Below post may be an older version of this document. Click link for latest version.

2025-06-20

Disclaimer
* Quick note
  * I support a complete ban on AI R&D. This app requiring AI doesn't change that.

Summary
* This document describes how to build an open source search engine for the entire internet that runs on a residential server.
* As of 2025 it'll cost between $100k and $1M to build and host this server. This cost will fall with every passing year, as GPU, RAM and disk prices fall.
* The most expensive step is GPU capex to generate embeddings for the entire internet.
* Most steps can be done using low-complexity software such as bash scripts (curl --multi, htmlq -tw, curl -X "$LLM_URL", etc).

Main

Why?
* I realised my posts on this topic are sprawled all over the place, without one post to summarise it all. Hence this post.
* If someone donates $1M to me I might consider building this. I've written code for more than half the steps, and no step here seems impossibly hard.

Use cases of open source search
* Censorship-resistant backups
  * aka internet with no delete button aka Liu Cixin's dark forest
  * Any data that reaches any server may end up backed up by people across multiple countries forever.
  * You can read my other posts for more on the implications of censorship-resistant backups and discovery.
* Censorship-resistant discovery
  * Any data that reaches any server may end up searchable by everyone forever.
  * Currently each country's govt bans channels and websites that it finds threatening. It is harder to block a torrent of a Qdrant snapshot than to block a static list of IP addresses and domains. This will reduce the cost of entry/exit for a new youtuber.
  * Since youtubers can potentially run for govt, subscribing to a youtuber is a (weak) vote for their govt.
* Privacy-preserving search
  * In theory, it will become possible to run searches on an airgapped Tails machine. Search indices can be stored o
Can you give an example in the real world? (Prefer historical examples if you don't wanna be too controversial.) Both your comments are abstract, so I'm unclear what you have in mind.