With great pleasure! The experience was so revealing that it led me to codify the process and prepare the Stream Coding Manifesto (available on GitHub here: Stream Coding), which I launched just last month, by the way.
I've also created the corresponding Claude Skill to make it immediately actionable; it's downloadable via GitHub as well.
The "attack dog" metaphor is definitely sticky, but I've personally found that you can build the fence directly INTO the task definition, so the dog runs free but along a "directed" and "constrained" path.
When the context is ambiguous and the objective (the "why") isn't clear enough, it doesn't matter how smart the model is: problems will arise and technical debt is certain. If you invest in complete, crystal-clear specs, the chances that the plans come out right increase significantly (so-called Context Engineering).
I ran this as an experiment: I invested about 80% of my time in building the specs, and code generation then became literally automatic. As a...
Hi @Dhruv Trehan, thanks for the honest breakdown. I've personally experienced how complex and far from straightforward this topic can be. For example, even the supposedly trivial parts (e.g. author metadata) have changed substantially over time and are sometimes full of peculiarities or outright errors (due to sloppy formatting or conversion from LaTeX).
Over the last few weeks I've built an online platform that parses ar5iv HTML (and arXiv HTML for recent papers) to create a "meaningful skeleton" for each paper and cross-reference the context (the distilled paper text + claims + equations) against the figures. It's still in beta, but the first outputs are interesting. The idea is to...
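To make the "meaningful skeleton" idea concrete, here's a rough sketch of the parsing step in Python. It assumes BeautifulSoup and the LaTeXML-style class names that ar5iv pages typically use (ltx_title_section, ltx_equation, ltx_figure, ltx_caption); it illustrates the approach, not the platform's actual code.

```python
# Rough sketch: extract a section/equation/figure skeleton from an ar5iv HTML page.
# Assumes LaTeXML-style class names (ltx_title_section, ltx_equation, ltx_figure,
# ltx_caption), which ar5iv typically emits; adjust the selectors if the markup differs.
import requests
from bs4 import BeautifulSoup

def paper_skeleton(url: str) -> dict:
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")

    sections = [h.get_text(" ", strip=True)
                for h in soup.select(".ltx_title_section, .ltx_title_subsection")]
    equations = [eq.get_text(" ", strip=True)
                 for eq in soup.select(".ltx_equation")]
    figure_captions = [cap.get_text(" ", strip=True)
                       for cap in soup.select("figure.ltx_figure .ltx_caption")]

    return {"sections": sections,
            "equations": equations,
            "figure_captions": figure_captions}

if __name__ == "__main__":
    skeleton = paper_skeleton("https://ar5iv.labs.arxiv.org/html/2403.00001")
    for key, items in skeleton.items():
        print(f"{key}: {len(items)} found")
```

Once the skeleton exists, each figure caption can be matched against the claims and equations from the same section, which is the cross-referencing step mentioned above.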
TL;DR: We keep optimizing retrieval, but are the documents we feed to LLMs safe to chunk without losing crucial qualifying context?
When users bulldoze files blindly into a RAG pipeline, hallucinations can start even before retrieval runs.
Concrete example: in this biology paper, the parameter β for prokaryotes appears as 0.33, 0.73, and 1.68. All three are correct, but for different scaling regimes. Some explanations are nearby; others are fourteen pages away in the supplementary material.[1] Give this to a RAG pipeline and the reconciliation context disappears. A user asks "What's β for prokaryotes?" Your RAG answers with confidence. Too bad it's wrong.
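To see how the qualifier gets lost, here's a minimal sketch of naive fixed-size chunking; the text is a paraphrase for illustration, not a quote from the paper. The value and the sentence that scopes it land in different chunks, so retrieval can return the number without its regime.

```python
# Minimal sketch: naive fixed-size chunking separates a value from the sentence
# that qualifies it. The text below is a paraphrase for illustration, not a quote.
def chunk(text: str, size: int = 55) -> list[str]:
    words, chunks, current = text.split(), [], ""
    for w in words:
        if len(current) + len(w) + 1 > size:   # chunk is full, start a new one
            chunks.append(current.strip())
            current = ""
        current += w + " "
    if current.strip():
        chunks.append(current.strip())
    return chunks

text = ("For prokaryotes the scaling exponent is beta = 0.33. "
        "That value holds only in the smallest-cell regime; in other regimes "
        "the exponent rises to 0.73 and then 1.68, as reconciled in the "
        "supplementary material.")

for i, c in enumerate(chunk(text)):
    print(f"chunk {i}: {c}")
# A retriever that surfaces only chunk 0 returns "beta = 0.33" with no hint
# that the value is regime-specific.
```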
I built Clarity Gate, an open-source pre-ingestion system that verifies documents and adds clear epistemic markers...
TL;DR: A published biology paper reports β=0.33, β=0.73, and β=1.68 for the same parameter: all correct, different scaling regimes, properly explained fourteen pages later. But when this paper gets chunked for a RAG knowledge base, the reconciliation context disappears. A user asking "What's β for prokaryotes?" gets confidently wrong information. I built Clarity Gate: an open-source system that verifies epistemic quality before documents enter RAG knowledge bases. Early experiments show promise; I'm looking for collaborators to validate it.
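To give a flavour of the kind of check involved, here's a minimal sketch (my illustration, not Clarity Gate's actual code): scan a document for numeric assignments to the same symbol and flag any disagreement as a tension that needs its qualifying context attached before chunking.

```python
# Minimal sketch of a pre-ingestion check: collect every numeric value assigned to a
# symbol and flag the symbol when the values disagree. Illustration only; a real
# system would also capture the qualifying context around each occurrence.
import re
from collections import defaultdict

ASSIGNMENT = re.compile(r"(?:β|beta)\s*(?:=|≈)\s*([0-9]+(?:\.[0-9]+)?)", re.IGNORECASE)

def numerical_tensions(text: str) -> dict[str, set[float]]:
    values: dict[str, set[float]] = defaultdict(set)
    for value in ASSIGNMENT.findall(text):
        values["β"].add(float(value))          # normalise "beta" and "β" to one key
    return {sym: vals for sym, vals in values.items() if len(vals) > 1}

doc = ("... scaling follows β = 0.33 for the smallest cells ... "
       "... a transition toward β = 0.73 ... and ultimately β ≈ 1.68 ...")
print(numerical_tensions(doc))
# β is assigned 0.33, 0.73 and 1.68 in the same document: flag it, so the
# regime qualifiers travel with the values into the knowledge base.
```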
The Problem
I tested Clarity Gate on a randomly selected scientific paper: "Metabolic scaling in small life forms" by Ritchie & Kempes (arXiv:2403.00001).
In under a minute, it flagged numerical tensions for the scaling exponent β:
A quick practical question: you said the agent behind idea generation processed more than 135 papers, and I presume most of them were equation-heavy. How did your agents handle the math?
I've been working on the same problem: when papers are rendered as PDF, equations aren't always available as structured math. If extraction treats them as images, they become hard for LLMs to understand and to put in context.
Did you use the LaTeX source or ar5iv HTML, or did you just hope that the surrounding text carried enough signal?
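For context on my side of this: ar5iv pages usually carry the original LaTeX for each formula inside a MathML annotation element, so when that HTML is available the math can be pulled out as text instead of being treated as an image. A rough sketch, assuming that markup (selectors may need adjusting for some papers):

```python
# Rough sketch: recover the source LaTeX of each formula from an ar5iv page.
# ar5iv (via LaTeXML) usually emits <math> elements whose
# <annotation encoding="application/x-tex"> child holds the original LaTeX;
# this assumes that markup and may not hold for every paper.
import requests
from bs4 import BeautifulSoup

def extract_latex(ar5iv_url: str) -> list[str]:
    soup = BeautifulSoup(requests.get(ar5iv_url, timeout=30).text, "html.parser")
    return [ann.get_text(strip=True)
            for ann in soup.find_all("annotation", attrs={"encoding": "application/x-tex"})
            if ann.get_text(strip=True)]

# Example (URL shown for illustration):
# for tex in extract_latex("https://ar5iv.labs.arxiv.org/html/2403.00001")[:5]:
#     print(tex)
```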