samuelshadrach

Open Source Search (Summary)

Below post may be an older version of this document. Click link for latest version.

2025-06-20

Open Source Search (Summary)

Disclaimer
* Quick note
* I support a complete ban on AI R&D. This app requiring AI doesn't change that.

Summary
* This document describes how to build an open source search engine for the entire internet, that runs on a residential server
* As of 2025 it'll cost between $100k-$1M to build and host this server. This cost will reduce with every passing year, as GPU, RAM and disk prices reduce.
* Most expensive step is GPU capex to generate embeddings for the entire internet.
* Most steps can be done using low-complexity software such as bash scripts (curl --multi, htmlq -tw, curl -X "$LLM_URL", etc)

Main

Why?
* I realised my posts on this topic are sprawling all over the place, without one post to summarise it all. Hence this post.
* If someone donates me $1M I might consider building this. I've written code for more than half the steps, and no step here seems impossibly hard.

Use cases of open source search
* Censorship-resistant backups
* aka internet with no delete button aka Liu Cixin's dark forest
* Any data that reaches any server may end up backed up by people across multiple countries forever.
* You can read my other posts for more on the implications of censorship-resistant backups and discovery.
* Censorship-resistant discovery
* Any data that reaches any server may end up searchable by everyone forever.
* Currently each country's govt bans channels and websites that they find threatening. It is harder to block a torrent of a qdrant snapshot, than to block a static list of IP addresses and domains. Will reduce cost-of-entry/exit for a new youtuber.
* Since youtubers can potentially run for govt, subscribing to a youtuber is a (weak) vote for their govt.
* Privacy-preserving search
* In theory, it will become possible to run searches on an airgapped tails machine. Search indices can be stored o

21

Jun 18, 2025

DMs open

My website

Support the movement against AI extinction risk

My views on Lesswrong

Donate Monero

Most of my documents are living documents, so the version posted on lesswrong may be an older version. Latest version is on my website.

257

380

Support the Movement against AI extinction risk

2025-11-16 Disclaimer * Contains politically sensitive info Summary * Start social media channel around AI risk * Cyberattack AI companies * Run for US or UK election with AI pause as an agenda Background assumption * Assuming I and people like me do nothing, the most likely scenario I forecast...

Nov 16, 2025

2

MtG Colour Wheel applied to Politics

2025-11-10 Disclaimer * Quick Note * Target audience - Anyone curious about this topic. I've tried making myself more comprehensible here. * I have never spoken to Duncan Sabien, and I might end up reinterpreting his concepts in ways he doesn't like or agree with. If he's reading this, he...

Nov 10, 2025

-5

Theory of Change for US Govt Whistleblower Database and Guide

2025-11-05 Disclaimer * I have quickly copy-pasted sections from an older document. Minor errors possible. Existing projects and weaknesses * EA projects * Most EA funding currently goes to: * technical AI research * I am supportive of technical AI alignment research but bearish on most of it working out...

Nov 5, 2025

2

Retrospective on US govt whistleblower guide and DB

2025-11-05 Incomplete resource This resource is incomplete because I haven't studied opsec mistakes of every single previous whistleblower in detail. You may have noticed some sections marked as "to do" in the DB. I don't think studying those past cases changes the advice for future whistleblowers much, hence I didn't...

Nov 4, 2025

4

US Govt Whistleblower Guide

2025-10-28 Disclaimer * Incomplete. Work-in-progress. Why this guide? * I continue to think there isn't a single whistleblower guide on the internet that's good enough for this scenario. Some guides avoid talking about important details due to chilling effects. Other guides prioritise interests of journalists or lawyers. Summary of the...

Nov 4, 2025

1

US Govt Whistleblower Database

2025-10-17 Disclaimer * Incomplete * All information here is based on public record What? * Collecting (mostly) fact-checked information on previous US govt whistleblowers. Who? * Primary target audience: Potential whistleblowers in future. Especially focussed on those working at AI companies. Why? * Might aid future whistleblowers, or people directly working...

Nov 4, 2025

6

Samuel x Bhishma - Superintelligence by 2030?

Did an AI timelines debate with a friend who works at Google. Link to debate Preview "In general, the trend has been once we see something happening, we as human researchers think - how do we put a human expert into this problem but the way it actually ends up...

Oct 21, 2025

6


Advice for tech nerds in India in their 20s

Do you consider perfect surveillance inevitable?

Day #8 Hunger Strike, Protest Against Superintelligent AI
