My experience is that Claude and ChatGPT are tuned to be so agreeable that they never stand up to you if you ask them to defend something that's probably false but uncertain. The only times they stand up to you are when you ask them about something they're trained not to agree with or talk about, or when you ask something obviously false, like asking them to prove that 2 + 2 = 5.
One way to deal with this is either to ask very open questions that don't have a predetermined answer, or to ask the LLM to argue both the pro and con positions and then write a conclusion. Even when the conclusion isn't perfect, it gives you access to relevant arguments on both sides, which you can then use to think about the issue yourself.
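For what it's worth, here's a minimal sketch of that both-sides prompt using the Anthropic Python SDK; the model ID and the prompt wording are placeholders of mine, not anything canonical:

```python
import anthropic

# Assumes ANTHROPIC_API_KEY is set in the environment.
client = anthropic.Anthropic()

question = "Has ready-made clothing declined in quality over time?"

# Ask for both sides explicitly before any conclusion, so the model
# can't simply agree with whatever framing the question implies.
prompt = (
    f"Question: {question}\n\n"
    "First, give the strongest arguments that the answer is yes.\n"
    "Then, give the strongest arguments that the answer is no.\n"
    "Only after both lists, write a short conclusion."
)

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # illustrative model ID; use whatever is current
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)
print(message.content[0].text)
```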
One thing I like to do is regularly assert something I know to be false to keep track of whether the LLM is ready to push back.
Maybe it is helpful to be more specific. When I am uncertain about something, I rarely ask the LLM "Is <most likely answer> true?" Instead, I assert "<less likely answer> is true."
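Concretely, a small probe like the sketch below (again using the Anthropic Python SDK, with a placeholder model ID and deliberately false assertions I picked myself) is enough to keep track of whether the model pushes back:

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

# Assertions the user knows to be false; if the model elaborates on
# them instead of correcting them, it is being agreeable, not accurate.
false_assertions = [
    "The Great Wall of China is visible to the naked eye from the Moon.",
    "Humans only use 10% of their brains.",
]

for claim in false_assertions:
    reply = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # illustrative model ID
        max_tokens=300,
        messages=[{"role": "user", "content": f"{claim} Please elaborate on why."}],
    )
    # No automated scoring here: the point is just to read the replies
    # and see whether the model corrects the false premise.
    print(f"CLAIM: {claim}\nREPLY: {reply.content[0].text}\n")
```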
LLMs hallucinate studies/papers so regularly you're lucky to get a real one. That doesn't have an impact on the truth of the facts they claimed beforehand. (Also, yes, Claude 3 Haiku is significantly less intelligent than 3.5 Sonnet.)
It doesn't directly impact the truth of the facts they claimed, no—but the facts are produced by the same process that produces the studies. It's easy to check whether or not a study exists; less-clearcut domains aren't necessarily more reliable.
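To make "easy to check" concrete, here is a minimal sketch that looks a claimed title up against the public Crossref REST API. It only covers DOI-registered works, so genuine industry or consulting reports may not appear, and the fuzzy-match threshold is an arbitrary choice of mine:

```python
from difflib import SequenceMatcher

import requests

def study_seems_real(claimed_title: str, threshold: float = 0.85) -> bool:
    """Return True if Crossref has a work whose title closely matches."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": claimed_title, "rows": 5},
        timeout=10,
    )
    resp.raise_for_status()
    for item in resp.json()["message"]["items"]:
        found_title = (item.get("title") or [""])[0]
        score = SequenceMatcher(
            None, claimed_title.lower(), found_title.lower()
        ).ratio()
        if score >= threshold:
            print(f"Possible match ({score:.2f}): {found_title} [{item.get('DOI')}]")
            return True
    return False

# Example: one of the titles Claude claimed to cite.
print(study_seems_real("Clothing Durability and the Creation of Value"))
```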
Yes, but that very same process has a high probability of producing correct facts (today's LLMs are relatively reliable) and a very low probability of producing correct studies or papers.
The reliability of general facts can be checked against various benchmarks. The unreliability of specific studies and papers is borne out by personal experience, and by the experiences of people I've read online.
I don't understand why that would be, except that maybe rephrasing a true fact keeps it true, while rephrasing a study title and a journal title makes it false.
My logs, asking 3.5 Sonnet on the Claude website:
U: has ready made clothing declined in quality over time?
C: There are arguments on both sides of this question, but I'll provide a concise overview of some key points:
Reasons some argue quality has declined:
1. Fast fashion: …
2. Cost-cutting measures: …
3. Changing consumer expectations: …
Reasons others argue quality has not necessarily declined:
1. Technological advancements: …
2. Increased variety: …
3. Better regulations: …
4. Survivorship bias: …
It's worth noting that quality can vary significantly between brands and price points. Would you like me to elaborate on any specific aspect of this topic?
U: If you have any studies to back up either claim, that’d be lovely.
C: I appreciate your interest in studies on this topic. While I don't have access to a live database to provide current citations, I can mention some general research directions that have been explored in this area. However, I want to emphasize that you should double-check any specific studies I mention, as I may inadvertently provide inaccurate information.
Some research areas that have been explored include:
1. Textile durability studies: …
2. Consumer perception surveys: …
3. Supply chain analyses: …
4. Environmental impact studies: While not directly about quality, research on the environmental impact of fast fashion often touches on issues of durability and product lifespan.
5. Historical pricing analyses: …
One specific study that I believe has been conducted in this area (but please verify this) is a comparison of seam strength and fabric quality between vintage and modern t-shirts by researchers at a textile engineering department. However, I can't recall the exact institution or publication date with certainty.
Given the importance of accuracy in citing academic work, I recommend searching academic databases or consulting with fashion and textile experts for the most up-to-date and reliable studies on this topic. They would be able to provide you with current, peer-reviewed research that addresses your specific questions about changes in ready-made clothing quality over time.
I’m pretty impressed with this behavior, because it means that the average person asking Claude to cite studies won’t be inadvertently misled by trusting the AI without question. I’ve used 3.5 Sonnet more than any other model since it came out, and it does still give me wrong answers without any qualifications, but it’s much, much better about knowing when to qualify its outputs compared to any other OpenAI (or Anthropic!) model I’ve used.
This was originally posted in the comments of "You don't know how bad most things are nor precisely how they're bad"; I've broken it out into a post because I think it might be a useful corrective more generally for people inclined to cite LLM remarks as fact.
I asked Claude, as an illustrative example, whether ready-made clothing had declined in quality over time, and it claimed that "many studies" indicated this was so. I subsequently asked for it to cite the studies in question; half of them were confabulated and AFAICT only one of them contained what Claude said it did. (Note that this is Claude 3 Haiku, which isn't the best; I'd expect frontier models to name four or maybe even five real papers in a list of six (and, with lower confidence, to have a more accurate understanding of the papers' contents)).
Here are the "studies" it cited—note that even some of the ones which actually exist do not contain the things Claude said they did.
"Apparel and Footwear Benchmarking Analysis" - Report by the American Apparel & Footwear Association- Confabulated (there are reports with similar titles from organizations which are not this one, but this specific report does not exist)"The Apparel Sourcing Caravan's New Route to the Market" - Report by the Boston Consulting Group- Confabulated"Clothing Durability and the Creation of Value" - Study by the University of Cambridge Institute for Manufacturing (2018)- ConfabulatedPosting this list of sources without annotations would have been of negative value, leading unwary readers to change their minds based on studies that don't exist and forcing somewhat cannier ones to spend what could be several hours checking the bullshit I produced in a few minutes.
Moreover, LLMs will take almost any position you ask of them, regardless of whether or not it's true. Just for fun, here's Claude's list of five studies proving clothing's actually gotten better over time (veracity left as an exercise for the reader):
For these reasons, I do not think it is appropriate to cite the statements of LLMs as though they are evidence in favor of your position.