My experience is that Claude and ChatGPT are tuned to be so agreeable that they never stand up to you if you ask them to defend something that's probably false but uncertain. The only times they stand up to you are when you ask them about something they're trained not to agree with or talk about, or when you ask something obviously false, like asking them to prove that 2 + 2 = 5.
One way to deal with this is either to ask very open questions that don't have a predetermined answer, or to ask the LLM to argue both the pro and con positions and then write a conclusion. Even when the conclusion isn't perfect, it gives you access to relevant arguments on both sides, which you can then use to think about the issue yourself.
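For what it's worth, here's a minimal sketch of that both-sides prompt using the Anthropic Python SDK; the model ID and the prompt wording are placeholders of mine, not anything canonical:

```python
import anthropic

# Assumes ANTHROPIC_API_KEY is set in the environment.
client = anthropic.Anthropic()

question = "Has ready-made clothing declined in quality over time?"

# Ask for both sides explicitly before any conclusion, so the model
# can't simply agree with whatever framing the question implies.
prompt = (
    f"Question: {question}\n\n"
    "First, give the strongest arguments that the answer is yes.\n"
    "Then, give the strongest arguments that the answer is no.\n"
    "Only after both lists, write a short conclusion."
)

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # illustrative model ID; use whatever is current
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)
print(message.content[0].text)
```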
One thing I like to do is regularly assert something I know to be false to keep track of whether the LLM is ready to push back.
Maybe it is helpful to be more specific. When I am uncertain about something, I rarely ask the LLM "Is <most likely answer> true?" Instead, I assert "<less likely answer> is true."
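Concretely, a small probe like the sketch below (again using the Anthropic Python SDK, with a placeholder model ID and deliberately false assertions I picked myself) is enough to keep track of whether the model pushes back:

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

# Assertions the user knows to be false; if the model elaborates on
# them instead of correcting them, it is being agreeable, not accurate.
false_assertions = [
    "The Great Wall of China is visible to the naked eye from the Moon.",
    "Humans only use 10% of their brains.",
]

for claim in false_assertions:
    reply = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # illustrative model ID
        max_tokens=300,
        messages=[{"role": "user", "content": f"{claim} Please elaborate on why."}],
    )
    # No automated scoring here: the point is just to read the replies
    # and see whether the model corrects the false premise.
    print(f"CLAIM: {claim}\nREPLY: {reply.content[0].text}\n")
```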
LLMs hallucinate studies/papers so regularly you're lucky to get a real one. That doesn't have an impact on the truth of the facts they claimed beforehand. (Also, yes, Claude 3 Haiku is significantly less intelligent than 3.5 Sonnet.)
It doesn't directly impact the truth of the facts they claimed, no—but the facts are produced by the same process that produces the studies. It's easy to check whether or not a study exists; less-clearcut domains aren't necessarily more reliable.
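To make "easy to check" concrete, here is a minimal sketch that looks a claimed title up against the public Crossref REST API. It only covers DOI-registered works, so genuine industry or consulting reports may not appear, and the fuzzy-match threshold is an arbitrary choice of mine:

```python
from difflib import SequenceMatcher

import requests

def study_seems_real(claimed_title: str, threshold: float = 0.85) -> bool:
    """Return True if Crossref has a work whose title closely matches."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": claimed_title, "rows": 5},
        timeout=10,
    )
    resp.raise_for_status()
    for item in resp.json()["message"]["items"]:
        found_title = (item.get("title") or [""])[0]
        score = SequenceMatcher(
            None, claimed_title.lower(), found_title.lower()
        ).ratio()
        if score >= threshold:
            print(f"Possible match ({score:.2f}): {found_title} [{item.get('DOI')}]")
            return True
    return False

# Example: one of the titles Claude claimed to cite.
print(study_seems_real("Clothing Durability and the Creation of Value"))
```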
Yes, but that very same process has a high probability of producing correct facts (today's LLMs are relatively reliable) and a very low probability of producing correct studies or papers.
The reliability of general facts can be checked against various benchmarks. The unreliability of specific studies and papers is borne out by personal experience, and by the experiences of people I've read online.
I don't understand why that would be, except that maybe rephrasing a true fact keeps it true, while rephrasing a study title and a journal title makes it false.
My logs, asking 3.5 Sonnet on the Claude website:
U: has ready made clothing declined in quality over time?
C: There are arguments on both sides of this question, but I'll provide a concise overview of some key points:
Reasons some argue quality has declined:
1. Fast fashion: …
2. Cost-cutting measures: …
3. Changing consumer expectations: …
Reasons others argue quality has not necessarily declined:
1. Technological advancements: …
2. Increased variety: …
3. Better regulations: …
4. Survivorship bias: …
It's worth noting that quality can vary significantly between brands and price points. Would you like me to elaborate on any specific aspect of this topic?
U: If you have any studies to back up either claim, that’d be lovely.
C: I appreciate your interest in studies on this topic. While I don't have access to a live database to provide current citations, I can mention some general research directions that have been explored in this area. However, I want to emphasize that you should double-check any specific studies I mention, as I may inadvertently provide inaccurate information.
Some research areas that have been explored include:
1. Textile durability studies: …
2. Consumer perception surveys: …
3. Supply chain analyses: …
4. Environmental impact studies: While not directly about quality, research on the environmental impact of fast fashion often touches on issues of durability and product lifespan.
5. Historical pricing analyses: …
One specific study that I believe has been conducted in this area (but please verify this) is a comparison of seam strength and fabric quality between vintage and modern t-shirts by researchers at a textile engineering department. However, I can't recall the exact institution or publication date with certainty.
Given the importance of accuracy in citing academic work, I recommend searching academic databases or consulting with fashion and textile experts for the most up-to-date and reliable studies on this topic. They would be able to provide you with current, peer-reviewed research that addresses your specific questions about changes in ready-made clothing quality over time.
I’m pretty impressed with this behavior, because it means that the average person asking Claude to cite studies won’t be inadvertently misled by trusting the AI without question. I’ve used 3.5 Sonnet more than any other model since it came out, and it does still give me wrong answers without any qualifications, but it’s much, much better about knowing when to qualify its outputs compared to any other OpenAI (or Anthropic!) model I’ve used.
This was originally posted in the comments of "You don't know how bad most things are nor precisely how they're bad"; I've broken it out into a post because I think it might be a useful corrective more generally for people inclined to cite LLM remarks as fact.
I asked Claude, as an illustrative example, whether ready-made clothing had declined in quality over time, and it claimed that "many studies" indicated this was so. I subsequently asked for it to cite the studies in question; half of them were confabulated and AFAICT only one of them contained what Claude said it did. (Note that this is Claude 3 Haiku, which isn't the best; I'd expect frontier models to name four or maybe even five real papers in a list of six (and, with lower confidence, to have a more accurate understanding of the papers' contents)).
Here are the "studies" it cited—note that even some of the ones which actually exist do not contain the things Claude said they did.
"Apparel and Footwear Benchmarking Analysis" - Report by the American Apparel & Footwear Association- Confabulated (there are reports with similar titles from organizations which are not this one, but this specific report does not exist)"The Apparel Sourcing Caravan's New Route to the Market" - Report by the Boston Consulting Group- Confabulated"Clothing Durability and the Creation of Value" - Study by the University of Cambridge Institute for Manufacturing (2018)- ConfabulatedPosting this list of sources without annotations would have been of negative value, leading unwary readers to change their minds based on studies that don't exist and forcing somewhat cannier ones to spend what could be several hours checking the bullshit I produced in a few minutes.
Moreover, LLMs will take almost any position you ask of them, regardless of whether or not it's true. Just for fun, here's Claude's list of five studies proving clothing's actually gotten better over time (veracity left as an exercise for the reader):
For these reasons, I do not think it is appropriate to cite the statements of LLMs as though they are evidence in favor of your position.