My understanding is that the major chatbots have been tuned to be very conservative with direct quotes, to avoid copyright lawsuits. If true, that would mean this can basically only be a third-party feature/scaffold and is never going to be the default or get upstreamed.
GPT-5.5-Thinking on ChatGPT.com
- You may not quote more than 25 words verbatim from any single non-lyrical source, unless the source is reddit.
[...]
- Each webpage source in the sources has a word limit label formatted like "[wordlim N]", in which N is the maximum number of words in the whole response that are attributed to that source. If omitted, the word limit is 200 words.
- Non-contiguous words derived from a given source must be counted to the word limit.
- The summarization limit N is a maximum for each source. The assistant must not exceed it
**COPYRIGHT HARD LIMITS - APPLY TO EVERY RESPONSE:**
- Paraphrasing-first. Claude avoids direct quotes except for rare exceptions
- Reproducing fifteen or more words from any single source is a SEVERE VIOLATION
- ONE quote per source MAXIMUM—after one quote, that source is CLOSED
These limits are NON-NEGOTIABLE. See [CRITICAL_COPYRIGHT_COMPLIANCE] for full rules.
Enforce a strict 125-character maximum for quotes from any source document.
(In Claude Code, your main model, e.g. Opus 4.7, calls the WebFetch tool with a URL and a prompt; the tool makes an HTTP request and passes the response + prompt + the WebFetch tool system prompt to a smaller model, e.g. Haiku 4.5. The above is an excerpt from the system prompt passed to Haiku 4.5, as of the Claude Code source code leak from 1.5 months ago.)
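Roughly, the flow looks like this. This is a sketch based only on the description above, not Anthropic's actual code; all names here (WEBFETCH_SYSTEM_PROMPT, callSmallModel, webFetch) are invented placeholders.

// Illustrative sketch only; not from the Claude Code source.
const WEBFETCH_SYSTEM_PROMPT = '...the excerpt quoted above, with its quote-length limits...'

// placeholder standing in for a call to the smaller model (e.g. Haiku)
async function callSmallModel({ system, prompt, document }) {
    return `summary of ${document.length} chars, written under the limits in the system prompt`
}

// the main model invokes the tool with a URL and a prompt
async function webFetch(url, prompt) {
    const page = await (await fetch(url)).text() // the HTTP request
    // the fetched response, the caller's prompt, and the WebFetch system prompt
    // all go to the smaller model; only its summary comes back to the main model
    return callSmallModel({ system: WEBFETCH_SYSTEM_PROMPT, prompt, document: page })
}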
Indeed, Claude (on claude.ai, not on Claude Code, as of a week ago) has a system prompt forbidding it from including quotes longer than 15 words. I asked:
Do you have copyright limits that cap quotes at 15 words per source?
Yes. When I'm working with content from web search results or other copyrighted sources, I follow some hard limits:
Oh huh, that explains why it took a fair amount of annoying prompt iteration to get this working.
And for the same copyright reasons, the labs will never allow users to see the pretraining data in any way.
Not 100% sure I parsed that – do you have experience with the excerpt-only approach (if so, curious to hear more about how it's gone), or did you just set it up based on this post?
Yes, I've also used it on occasion. They have very bad summarization skills (many people disagree, but I think they have poor taste); they make things vague and slippery. Asking for excerpts helps counteract that.
E.g. "Can you summarize this text by pulling excerpts out of it with crucial information and dropping some passages etc?"
This seems like a neat idea, but I'd like to flag that this strategy only seems applicable when the fact you are looking for already exists explicitly and at an appropriate explainer level. I'm not sure you can do anything equivalent if you want the LLM to explain, synthesize, summarize, or do original reasoning.
Yeah. I'm imagining that when you need that sort of thing (which, to be clear, I acknowledge is "most of what people want LLMs to do", not like an edge case), the citations are retrieved, the LLM does a synthesis step, but then checks the results of that against the original source.
(and, the source is still provided in a place that's optimized for "easy to look at without requiring more than 1 and preferably 0 extra steps")
I just added this question to the post. I rattled this post off quickly, but it seems good to amend it to answer more detailed FAQs in response to comments.
I think I missed the point. Why do the various 'deep research' features in chat UIs not implement something sufficiently similar?
I'm not actually sure what the title here should be. A few variations:
"AI claims could [should?] come bundled with a responsible human"
"Verification-Centric AI"
"Citation-Centric AI"
Right now, some AI chatbots provide little citation-links. That's better than not having them. But those are a pain to open and investigate, and you probably very rarely do so.
One small UI annoyance is the friction introduced by these popups:

On Claude.ai and the Claude Desktop app this can be circumvented by pressing Ctrl when clicking on the link.
Or by injecting a userscript that claims that the Ctrl key was pressed when this click event happened:
// bypass "Open external window" popup
// MouseEvent::ctrlKey indicates whether the ctrl key was pressed at the time of the event
// We overwrite it in specific cases
const MouseEvent_prototype_ctrlKey = Object.getOwnPropertyDescriptor(MouseEvent.prototype, 'ctrlKey')
Object.defineProperty(PointerEvent.prototype, 'ctrlKey', {
    get() {
        // if (click event on an external link in the chat) claim Ctrl key was pressed
        if (this.type === 'click' && this.srcElement?.tagName === 'A' && (this.srcElement.href.startsWith('https://') || this.srcElement.href.startsWith('http://')) && this.srcElement.closest('.contents'))
            return true
        // fallback to the default
        return MouseEvent_prototype_ctrlKey.get.apply(this)
    },
    enumerable: true,
    configurable: true,
})
"Sometimes the AI just makes stuff up" is a problem I don't really expect to go away. In the nearterm, AI is going to keep occasionally hallucinating, or misinterpreting information. Eventually, AI will be powerful enough we need to be worried if it's presenting misleading information on purpose.
There might be a nice window where the AI is powerful enough to not make things up but non-agentic enough that we don't have to worry about deliberate manipulation. But, even then, interpreting data is tricky.
I'm worried about this for my own use, but, I'm more worried about this on the global scale. I'm worried about people trusting things AI made up, and I'm worried about the internet proliferating with slop that makes it harder to even find original statements that are a human's real testimony.
An approach that might help is to make AI reports more "verification-centric", i.e. designed from the ground up to be as easy and frictionless to verify as possible.
Right now, some AI chatbots provide little citation-links. That's better than not having them. But those are a pain to open and investigate, and you probably very rarely do so.
So, imagine a world where, when it answers a question, the AI doesn't guess. Instead:
"Quote" here can mean both "a paragraph of prose" or "a table of data." The AI is allowed to truncate text to make it easier to read, but not to make up text.
In this world, "hunt down primary sources" would be one of the main skills AIs are trained on. (Maybe pretraining data would be labeled so that primary sources are more directly "available" to the model's fast-response intuitions.)
I've built myself a Claude skill aimed at this, but part of the point here is, as we deploy AIs at scale, to end up in a world where AI claims always come bundled with a human who's responsible for them.
(In worlds where the source is one human quoting another human, it'd say "By Alice Foo, according to Bob Bar.")
The AI scaffolding and UI would be designed to make it hard to end up accidentally using confabulated info. The automatic fast/dumb checker AI can verify the links in the chain that require no judgment. The UI makes it as frictionless as possible for you to sanity check that the results make sense in context.
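As a rough illustration of the "no judgment required" check, here is a minimal sketch, not an existing tool: it verifies that each fragment of a claimed quote appears verbatim, in order, in the fetched source, assuming omissions inside a quote are marked with [...]. The function names (normalize, quoteMatchesSource) are made up.

// Minimal sketch of the fast/dumb checker: does the claimed quote appear
// verbatim in the cited source? Fragments around "[...]" gaps must appear in order.
const normalize = (s) => s.replace(/\s+/g, ' ').trim().toLowerCase()

async function quoteMatchesSource(quote, sourceUrl) {
    const html = await (await fetch(sourceUrl)).text()
    // crude tag stripping; a real checker would extract text from the DOM or a PDF
    const sourceText = normalize(html.replace(/<[^>]*>/g, ' '))
    let cursor = 0
    for (const fragment of quote.split('[...]').map(normalize).filter(Boolean)) {
        const i = sourceText.indexOf(fragment, cursor)
        if (i === -1) return false   // quote not found verbatim: flag for human review
        cursor = i + fragment.length
    }
    return true
}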
"Okay Ray, but often I want it to summarize or explain things in a way that's never been explained before."
Often the original source for a fact is not written in a way that's easy for most people to interpret. For this sort of case, I'm imagining:
Doing this at scale efficiently is a legit engineering problem. I think pieces of this could be done by dumb Python scripts without an LLM in the loop. But this feels like a better world to be aiming for.
Bonus points for cross-checking data
When stakes are higher (anyone trying to use AI to do novel intellectual work), you'd also want to combine the "fast dumb AI that checks the quote is literally accurate" with a "smart AI that then looks for corroborating or disproving evidence (which also comes as verified exact quotes)."
For now...
Meanwhile, if you're interested in using the Direct Quotes skill, here's the current version I'm using:
"Direct Quotes" complete skill file
Direct Quotes
This skill changes how you present information when the user wants direct evidence from original sources. Instead of summarizing or paraphrasing, you lead with the source material itself so the user can read the evidence firsthand.
Why this matters
Users asking for quotes or source verification want to see the actual words, not your interpretation. They're trying to evaluate evidence themselves — your job is to find the best source and present it cleanly, not to editorialize. Think of yourself as a research librarian pulling the exact passage someone needs.
Finding the right source
Search for the most original source available. This means:
Use web search to find sources. Don't rely on memory for quote text — search and verify. If you find the original source URL, use web_fetch to pull the full text so you can quote accurately.
Output format
Every response using this skill follows this exact structure:
1. Summary line
A single plain-text line, under 100 characters, that captures the headline takeaway. No markdown formatting on this line. Think of it as a tweet-length thesis.
2. Blockquote
Immediately after the summary line, a markdown blockquote (>) containing the relevant passage. Guidelines for the quote:
[...] to indicate the gap. But prefer finding a continuous passage over stitching fragments.
Use > on each line, with > (angle bracket + space on an empty line) between paragraphs to maintain the blockquote.
Finding sources
Do not use paywalled sources. If you can only find an abstract, look for a public source, potentially on sci-hub or Anna's Archive, or find a pdf of the source material.
Try to search and find original sources.
3. Source attribution
Below the blockquote, a single line with:
[DD Mon YYYY — Source Name](URL)
Dates use the DD Mon YYYY format (e.g., 23 Aug 2021). If only a year is known, use just the year. If no date is findable, omit it but keep the dash.
Where possible, link with a text fragment (#:~:text=...) so the link jumps to the relevant section. Encode spaces as %20 in the fragment.
When you can't find a reliable source
If you search and can't find a trustworthy original source for the claim:
Multiple claims in one request
If the user asks about several claims or quotes at once, repeat the format (summary → blockquote → source) for each one, separated by a horizontal rule (
---).
What NOT to do
Example
User prompt: "Read this and summarize the key desiderata for what McCulloch was looking for" (with a link to McCulloch's 1960 paper "What Is a Number, that a Man May Know It, and a Man, that He May Know a Number?")
Correct output:
McCulloch sought a minimal mental event ("psychon") with four properties
1960 — Warren S. McCulloch, "What Is a Number, that a Man May Know It, and a Man, that He May Know a Number?", General Semantics Bulletin No. 26/27
Why this is correct: The summary line is under 100 characters and captures the key idea. The blockquote is verbatim from the source, with the four desiderata bolded as the most important content. Context sentences are included so the reader understands what a "psychon" is and where it led (neurons). The source attribution has the year, full title, and a direct link to the PDF.