Huh, I want to talk to you about this - I've been working on something similar, seems like our taste is overlapping but a bit different. Would love to merge efforts a bit.
You and your agent can now query this rich dataset with the full expressive power of SQL + vector algebra.
Some example usage:
> what is Eliezer's most Eliezer post?
> find the 4 posts over 200 karma that are most distant from each other in every way (not the average of them). we want to create 4 quadrants.
> I need posts with the seriousness and quality of list of lethalities, but that's maybe not AI AND doom pilled (one or the other is okay).
As you can see, this is a very powerful search paradigm. Structured Query Language is a real OG, embeddings and arbitrary vector composition take it to the next level, and agents are very good at working with this stuff.
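To make "arbitrary vector composition" concrete, here is a minimal sketch of the idea behind a request like the third example: start from a reference post's embedding, subtract a concept vector, and rank by cosine similarity. The 3-d vectors, post names, and helper functions are all made up for illustration; real embeddings have far more dimensions, and this is not necessarily how Scry computes anything internally.

```python
import math

def combine(terms):
    """Weighted sum of vectors; terms is a list of (weight, vector) pairs."""
    dim = len(terms[0][1])
    return [sum(w * v[i] for w, v in terms) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank(query, posts):
    """Post names ordered from most to least similar to the query vector."""
    return sorted(posts, key=lambda name: cosine(query, posts[name]), reverse=True)

# Toy "embeddings" (hypothetical axes: seriousness, AI-ness, doom-ness).
posts = {
    "serious_ai_doom": [0.9, 0.9, 0.9],
    "serious_econ": [0.9, 0.1, 0.1],
    "casual_ai": [0.1, 0.9, 0.1],
}
reference = [1.0, 1.0, 1.0]  # stands in for the reference post's embedding
ai_doom = [0.0, 1.0, 1.0]    # stands in for an embedded "AI and doom" concept

# "Like the reference, minus the AI-and-doom direction."
query = combine([(1.0, reference), (-1.0, ai_doom)])
ranking = rank(query, posts)
print(ranking)
```

In this toy setup the serious but off-topic post ranks first, which is the shape of result the third example query is asking for.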
Some cool aspects of this project:
hardening up a SQL database enough to let the public run queries. There's so much collective trauma about SQL injection attacks that most people have forgotten that this is possible.
I've built in syntactic sugar for using custom vectors. Agents can embed arbitrary queries and refer to them with @vector_handle syntax. This compactness helps agents reason efficiently, and lets us avoid passing around 8 KB vectors.
Opus 4.5 and GPT-5.2 allowed me to ship this in a couple weeks. The software intelligence explosion is here.
product-as-a-prompt, agent-copilot-targeted UX as a paradigm. It was pretty cool realizing I could e.g. just describe my /feedback API endpoint in the prompt, opening up an easy communication channel with users that helps me iterate on the project faster.
The affordability of Hetzner dedicated machines is worth mentioning. I was really feeling constrained by my very limited budget while trying to build something real on DigitalOcean. I discovered Hetzner in late November and just bought (started renting) a monster machine before I knew what to do with it, knowing something had to happen. The breathing room in the machine specs has really allowed the project to expand in scope: it currently has over 400 GB of indexes (for query performance) and can ingest and embed basically every interesting source I've been able to think of. If I were a VC, I would be using a tool like Scry and visiting universities to find cash-strapped neurodivergent builders, and maybe offer them a Hetzner machine plus Claude Max and GPT Pro subscriptions just to see what happens.
There's also alerts functionality. We're ingesting thousands of papers, posts, articles, and comments a day, so you can specify an arbitrary SQL query that we'll run daily or more often, and get an email when the output changes. Google Alerts on steroids.
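For the alerts item, the core loop can be sketched roughly like this: run the saved query, fingerprint the rows, and notify only when the fingerprint differs from the previous run. This is a sketch under assumptions, not the actual implementation: SQLite stands in for whatever database backs the real service, and `check_alert`, `result_fingerprint`, and the `notify` callback are hypothetical names (a real version would send an email instead of appending to a list).

```python
import hashlib
import json
import sqlite3

def result_fingerprint(conn, sql):
    """Run the saved query and hash its rows so runs can be compared cheaply."""
    rows = conn.execute(sql).fetchall()
    payload = json.dumps(rows, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def check_alert(conn, sql, last_fingerprint, notify):
    """Return the new fingerprint; call notify(sql) only if the output changed."""
    current = result_fingerprint(conn, sql)
    if last_fingerprint is not None and current != last_fingerprint:
        notify(sql)
    return current

# In-memory stand-in database with a hypothetical posts table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (title TEXT, karma INTEGER)")
conn.execute("INSERT INTO posts VALUES ('old post', 250)")
alert_sql = "SELECT title FROM posts WHERE karma > 200 ORDER BY title"

sent = []  # stands in for outgoing emails
state = check_alert(conn, alert_sql, None, sent.append)   # first run: baseline only
state = check_alert(conn, alert_sql, state, sent.append)  # unchanged: no email
conn.execute("INSERT INTO posts VALUES ('new post', 300)")
state = check_alert(conn, alert_sql, state, sent.append)  # changed: one email
print(len(sent))
```

An `ORDER BY` in the saved query matters here; without a stable row order, the fingerprint could change even when the result set has not.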
Happy to take any feedback! I'll likely be releasing a Mac app in the next few days to provide a smoother sandboxed experience.