artkpv's Shortform

by artkpv
12th Oct 2025

artkpv (7d)

The question of whether LLMs are a dead end, as discussed by R. Sutton, Y. LeCun, and T. Ord, among many others, rests on questions that are hundreds of years old. Currently, we see that the chance of an LLM agent failing a task rises with the number of steps taken. This was observed even before the era of LLMs, when agents were trained with imitation learning. The crux is whether further training of LLMs leads to the completion of longer tasks or whether these agents hit a wall. Do LLMs indeed build a real-world model that allows them to take the right actions in long-horizon tasks? Or do they only build a model of what humans would say as the next token? If the latter, the question becomes whether humans possess the necessary knowledge.
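As a rough illustration of why failure compounds with task length (assuming, purely hypothetically, a constant and independent per-step success rate, which real agents need not have):

```python
# Sketch: if an agent succeeds at each step independently with probability p,
# the chance of completing an n-step task without any error is p**n.
# The per-step rate p = 0.99 below is an illustrative assumption, not a measured value.

def task_success_prob(p_step: float, n_steps: int) -> float:
    """Probability of finishing an n-step task when every step must succeed."""
    return p_step ** n_steps

if __name__ == "__main__":
    p = 0.99
    for n in (10, 50, 100, 500):
        print(f"{n:>4} steps: success probability = {task_success_prob(p, n):.3f}")
    # Even a 1% per-step error rate drives long-horizon success toward zero,
    # which is one way to read the rising failure rate with task length.
```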

Questions like "what is knowledge, and do we have it?" are hundreds of years old. Aristotle wrote that the basis for every statement, and thus for reasoning and thinking (Metaphysics, 1005b), is that something cannot both be and not be what it is at the same time, "not (A and not A)" (the law of non-contradiction). This is the beginning of all reasoning. This law opposes the view of sophists like Protagoras, who claimed that what we sense, or our opinions, constitutes knowledge. Sophists held that something can both be and not be what it is at the same time, or, as Heraclitus put it, "everything flows" (panta rhei). Plato and Aristotle opposed this view. The law of non-contradiction suggests that ground truth is essential for correct reasoning and action. And it's not about mathematical problems, where LLMs show impressive results; it's about reasoning and acting in the real world. So far, LLMs are taught mainly by predicting what people would say next: opinions rather than real-world experience.

Why are LLMs trained on opinions? Their pre-training corpus is over 99% people's opinions rather than real-world experience. The entire history of knowledge is a struggle to find the truth and overcome falsehoods and fallacies, and the artifacts remaining from this struggle are filled with false beliefs. Even our best thinkers were wrong in some respects, like Aristotle, who believed slavery wasn't a bad thing (see his Politics). We train LLMs not only on the artifacts from our best thinkers but, in 99.95% of cases, on web crawls, social media, and code. The largest bulk of compute is spent on pre-training, not on post-training for real-world tasks. Whether this data is mostly false or can serve as a good foundation for training on real-world tasks remains an open question. Can a model trained to predict opinions, without real experience, behave correctly? This is what reinforcement learning addresses.
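As a back-of-the-envelope illustration of how such an estimate could be made (the token counts below are invented placeholders, not the actual composition of Dolma or of any commercial corpus; the resulting percentage depends entirely on these assumptions):

```python
# Back-of-the-envelope estimate of what share of a pre-training corpus
# comes from curated sources vs. web crawls, social media, and code.
# All token counts are illustrative placeholders, not real statistics.

corpus_tokens_billions = {
    "web_crawl":    2500.0,  # hypothetical
    "code":          400.0,  # hypothetical
    "social_media":  100.0,  # hypothetical
    "papers":         70.0,  # hypothetical
    "books":           6.0,  # hypothetical
    "encyclopedia":    4.0,  # hypothetical
}

total = sum(corpus_tokens_billions.values())
curated = sum(corpus_tokens_billions[k] for k in ("papers", "books", "encyclopedia"))

print(f"curated share:           {curated / total:.2%}")
print(f"crawl/social/code share: {1 - curated / total:.2%}")
```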

Reinforcement learning involves learning from experience in search of something good. Plato depicted this beautifully in his allegory of the cave, where a seeker finds truth on the path to the Sun. A real-world model is built by seeking something good. The current standard model of an intelligent agent reflects what Aristotle described about human nature: conscious decisions, behavior, and reasoning aimed at achieving good (Nicomachean Ethics, 1139a30). LLMs, however, are mostly trained to predict the next token, not to achieve something good. Perhaps Moravec's paradox results from this training: the models don't possess general knowledge or general reasoning. General reasoning, thinking that applies real-world knowledge in novel situations, might be required to build economically impactful agents. Will models learn it someday?
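A minimal sketch of that standard agent model, learning from acting toward a reward rather than from next-token prediction (the Agent and Environment interfaces here are generic placeholders, not any particular RL library):

```python
# A minimal agent-environment loop: the agent acts to achieve something good
# (reward), observes the outcome, and learns from that experience.

from typing import Any, Protocol, Tuple

class Environment(Protocol):
    def reset(self) -> Any: ...
    def step(self, action: Any) -> Tuple[Any, float, bool]: ...  # observation, reward, done

class Agent(Protocol):
    def act(self, observation: Any) -> Any: ...
    def learn(self, obs: Any, action: Any, reward: float, next_obs: Any) -> None: ...

def run_episode(agent: Agent, env: Environment) -> float:
    """Run one episode of learning from experience."""
    obs = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = agent.act(obs)
        next_obs, reward, done = env.step(action)
        agent.learn(obs, action, reward, next_obs)
        obs, total_reward = next_obs, total_reward + reward
    return total_reward
```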

ChristianKl (6d)

> We train LLMs not only on the artifacts from our best thinkers but, in 99.95% of cases, on web crawls, social media, and code.

Concluding, from a paper that says in its abstract that "commercial models rarely detail their data", that you know what makes up 99.95% of the training data is a huge reasoning mistake.

Given public communication, it's also pretty clear that synthetic data is more than 0.05% of the data. Elon Musk already speaks about training a model on 100% synthetic data.

artkpv (6d)

I agree that commercial models don't detail their data; the point is to have an estimate. I guess Soldaini et al. ('Dolma') did their best to collect the data, and we can assume commercial models have similar sources.

ChristianKl (6d)

> I agree that commercial models don't detail their data; the point is to have an estimate.

That's searching for the keys under the streetlight. The keys are not under the streetlight.

> I guess Soldaini et al. ('Dolma') did their best to collect the data, and we can assume commercial models have similar sources.

Soldaini et al. have far less capital to collect data than the big companies building models. The big model companies can pay billions for their data, which means they can license data sources that Soldaini et al. can't. It also means they can spend a lot of capital on synthetic data.

Soldaini et al. do not include LibGen/Anna's Archive, but it's likely that all of the big companies use it, besides Google, which has its own scans of all the books it uses. Anthropic paid out over a billion dollars in the settlement for that copyright violation.

Even outside of paying for data or just using pirated data, the big companies have a lot of usage data. The most common explanation for sycophancy in AI models is that it's due to the models optimizing for users clicking thumbs-up.

StanislavKrym (7d)

Except that LLMs do possess at least the general knowledge that any model would obtain from all the texts it has read. And their CoT is already likely a way to do some reasoning. In order to assess whether SOTA models have actually learned reasoning, it might be useful to consider benchmarks like ARC-AGI. As I detailed in my quick take, nearly every big model has been tested on the benchmark, and it looks as if there exist scaling laws that significantly slow progress down as the cost increases beyond a threshold.

artkpv (6d)

Not sure I understand whether you agree or disagree with something. The point of the post above was that LLMs might stop showing the growth we see now (Kwa et al., 'Measuring AI Ability to Complete Long Tasks'), not that there is no LLM reasoning at all, general or not.

StanislavKrym (6d)

I guess that you should have done something like my attempt to analyse three benchmarks and their potential scaling laws. The SOTA Pareto frontier of LLMs on the ARC-AGI-1 benchmark and the straight line observed on the ARC-AGI-2 benchmark in high-cost mode could imply that the CoT architecture has limitations which make using CoT-based LLMs for working with large codebases or generating novel ideas likely impractical.

I made a conjecture that this is due to problems with the attention span, into which ideas are hard to fit. Another conjecture is that researchers might find out that it's easy to give neuralese models a big internal memory.

See also Tomas B.'s experiment with asking Claude Sonnet 4.5 to write fiction and Random Developer's comment about Claude's unoriginal plot and ideas. 
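As a rough sketch of what such a cost scaling law would look like if benchmark score grew roughly linearly in the log of the per-task cost (the cost/score pairs below are made up for illustration, not actual ARC-AGI results):

```python
# Fit score ≈ a + b * log10(cost): the shape of scaling law suggested above.
# The (cost, score) pairs are invented for illustration, not ARC-AGI measurements.

import numpy as np

cost_usd = np.array([0.1, 1.0, 10.0, 100.0, 1000.0])  # hypothetical $ per task
score = np.array([5.0, 12.0, 20.0, 27.0, 33.0])       # hypothetical % solved

b, a = np.polyfit(np.log10(cost_usd), score, deg=1)   # slope, intercept
print(f"score ≈ {a:.1f} + {b:.1f} * log10(cost)")
# A linear-in-log-cost fit means each additional point of score costs
# multiplicatively more, i.e. progress slows as cost rises.
```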
