It's fascinating how my default stance on the accuracy (or lack thereof) of LLM responses has changed over time, and how much it depends on my own expertise in a subject.
In 2023 and 2024 I developed a sort of cognitive muscle memory that led me to second-guess almost any LLM output I lacked the expertise to verify myself.
At some point in 2025 that shifted, especially as hallucination rates noticeably declined and my use of LLMs evolved from tinkering with their capabilities to getting real work done. Specifically, as I use LLMs more and more for tasks in which I have little expertise, I find myself checking their sources much less often and trusting their output unless something has a sort of smell that I can only describe as intuition derived from extended use.
Even in somewhat high-stakes situations, I take more of a mixture-of-experts model ensemble approach by asking several different models the same question rather than independently verifying information.
It reminds me of the early days of Wikipedia, when it took me a while to trust that what was there was real; eventually I shifted from frequently double-checking citations to assuming it's usually correct.
(edit to correct mischaracterization of MoE)
Despite what the name seems to suggest, MoE doesn't actually work that way: the "experts" are just feed-forward sub-networks inside individual layers of a transformer, with a router picking a few of them per token, not separate models you can consult. (The term predates deep learning by a couple of decades, so you can't really blame its originators.)
A better wording would be LLM ensembling, as in https://en.wikipedia.org/wiki/Ensemble_learning
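To make the distinction concrete, here is a minimal sketch of answer-level LLM ensembling by majority vote. It assumes you have already collected one short answer per model; the model names and answers are hypothetical placeholders, and the whitespace-and-case normalization is a deliberately crude assumption (real free-text answers would need much more careful comparison).

```python
from collections import Counter


def normalize(answer: str) -> str:
    """Crude normalization: lowercase and collapse whitespace so trivially
    different phrasings of the same answer compare equal."""
    return " ".join(answer.lower().split())


def majority_vote(answers: dict[str, str]) -> tuple[str, float]:
    """Return the most common normalized answer and the fraction of models
    that gave it.

    `answers` maps a (hypothetical) model name to that model's answer.
    Ties go to the answer encountered first, because Counter.most_common
    orders equal counts by first occurrence.
    """
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(normalize(a) for a in answers.values())
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(answers)


# Hypothetical answers from three different models to the same question.
answers = {
    "model_a": "Yes, a second 1880 treaty",
    "model_b": "yes,  a second 1880 treaty",
    "model_c": "No",
}
best, agreement = majority_vote(answers)
print(best, round(agreement, 2))
```

Unlike mixture-of-experts routing inside a single network, each model here answers independently and only the final answers are combined; a low agreement fraction is the cue to go verify by hand.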
I tested this, and if you push Grok 4.1 to explain its claim in more detail, it'll admit that a second treaty, signed by the same delegation on the same day, includes the opium ban:
Grok's response
There was a separate treaty signed in 1880 between the United States and China that did address opium, but it is not the Angell Treaty.
Key article (Article II of the 1880 opium treaty):
“The Governments of the United States and China mutually agree that no citizen or vessel of the United States shall be permitted to import opium into any of the open ports of China, or to transport it from one open port to another open port of China, or to buy and sell opium in China…”
Full title: “Agreement between the United States and China relative to the opium trade, concluded November 17, 1880”
This is published separately in the Statutes at Large (22 Stat. 829) and in Bevans Vol. 6, pp. 685–686.
Prompt (written by Claude Sonnet 4.5)
I need you to verify a specific historical claim for me. I've been told that the Angell Treaty of 1880 between the US and China included provisions that banned or regulated opium trade between the two countries.
I'm skeptical because when I look at the 1880 Angell Treaty, it appears to only deal with Chinese immigration - it's the precursor to the Chinese Exclusion Act.
Here's what I need from you:
Please be very precise - I'm specifically asking about treaties signed in 1880, not 1888, 1903, or other years.
Note that Grok confabulates the name of the treaty, some of the wording, the title, and the places where it is published.
This is also mentioned in passing in the Wikipedia article on the 1880 Angell Treaty:
The treaty was concluded along with another treaty, also negotiated by the Angell commission, that imposed restrictions on trade in opium.
Interesting test, thanks. Also, hm, I checked that Wikipedia page but apparently missed that line. It seems the auction site I found first took that exact line from Wikipedia, too.
To some degree my confusion on the matter was my own fault (there are sources about this if you dig enough or use better queries), but that's the point: the AIs knew better than me. That said, my mistakes do lower the bar the AIs had to clear...
I would be shocked if an English-speaking American got that right, unless they were an expert digging into the details on their own for a class or a PhD thesis or something like that.
If LLMs are bringing "the level an imperfect expert would" on EVERY topic, then AI is now weakly superhuman.
When people push back on my use of the term "weakly superhuman" these days, they argue that AI was only ever going to be called superhuman once it could do EVERY job better than us. That is, they aren't comparing "an AI" to "a human" but "AI as a concept" to "the entire human economy and culture"?
It's like asking "which is weaker and which is stronger, the Weak Efficient Markets Hypothesis or the Weakly Superintelligent AI?"
I think the markets are still smarter than the AI for now, but that might stay true forever-ish, since AIs are being deployed very fast into trading contexts, so "the market" itself is changing too (taking their intelligence into account when approximating the Correct prices).
It's interesting that the LLMs somehow had accurate knowledge of obscure facts like that yet were unable to name the exact source… but for anyone who had to dig it up manually, isn't "Treaties and Other International Agreements of the United States of America: Volume 6 (Bilateral treaties, 1776-1949: Canada-Czechoslovakia)" the most obvious place to check first?
Which other places could be more comprehensive or authoritative?
I didn't know it existed beforehand, and it took a bit of googling to find. I actually found a different list of US treaties first, which didn't include the opium treaty.
Recently I've been accumulating stories where I think an LLM is mistaken, only to discover that I'm the one who's wrong. My favorite recent case came while researching the 19th-century US-China opium trade.
It's a somewhat convoluted history: opium was smuggled both when it was legal to sell and when it wasn't, and the US waffled between banning and legalizing the trade. I wanted to find out how it was banned the second time, and both Claude Research and Grokipedia told me it was by the Angell Treaty of 1880 between the US and China. Problem is, I've read that treaty, and it only deals with immigration; it's a notable prelude to the infamous Chinese Exclusion Act of 1882. Claude didn't cite a source specifically for its claim, Grok strangely cited "[internal knowledge]", and googling didn't turn up anything, so I figured the factoid was confabulated.
However, while researching the Angell mission to China later, I came across an offhand mention of a second treaty negotiated by James Angell with Qing China in 1880 (on an auction website, of all places[1]). Eventually I managed to find a good University of Michigan source on the matter, as well as the actual text of the second treaty in the State Department's "Treaties and Other International Agreements of the United States of America: Volume 6 (Bilateral treaties, 1776-1949: Canada-Czechoslovakia)".
Anyway, Claude and Grok were right. Even though opium wasn't part of the Angell mission's remit, when Li Hongzhang surprised the American delegation by proposing a second treaty banning it, James Angell agreed on the spot. It was later ratified alongside the main immigration treaty. The opium treaty doesn't appear to have a name distinct from its more famous sibling; the State Department simply lists the immigration treaty under the title "Immigration" and the opium treaty under "Commercial Relations and Judicial Procedure", so I can't entirely fault the LLMs for not specifying, though they ought to have done so for clarity. I suspect they were confused by the gap between the US government records they were trained on and the lack of sources they could find online?
(An aside: by 1880 the US opium trade was in decline, while the British opium trade was peaking, just about to be overtaken by the growth of domestic Chinese production. Angell judged correctly that the moral case outweighed the limited remaining Bostonian business interests and made the ban good politics in the US, particularly because it was reciprocal: he could claim to be protecting Americans from the drug as well. That's a harsh way of putting it, though; Angell personally stuck his neck out, mostly on his own convictions, and both he and the US deserve credit for that.[2])
If all that doesn't convince you to double-check your own assumptions when dealing with LLMs, well, there have been more boring cases too. I asked Claude to perform a tedious calculation similar to one I had done myself a month earlier; Claude got a very different answer, and I assumed it had made a mistake, but it turned out I had done it wrong the first time! Claude made a change to my code; I reverted it, thinking it was wrong, but it had actually caught a subtle bug! By now we're all aware that LLMs are quite capable at math and coding, of course, but I list these examples to complete the argument: when an LLM contradicts you, the correct update is not zero, and it's getting bigger.
[1] Apparently there's a decent market for notable presidential signatures? They managed to sell President Garfield's signature ratifying the Angell Treaty of 1880 for ten grand, partly on the strength of the treaty's infamy and partly because Garfield's presidential signature is rare, since he died from an assassin's bullet about six months into the job.
[2] A fun bit of color from the UMich source: