I agree, but I think it's out of scope for what I'm doing here — the validity and novelty of an attempted contribution can at least in principle be analyzed fairly objectively, but the importance seems much fuzzier and more subjective.
The idea of seeking objectivity here is not helpful if you want to contribute to the scientific project. I think that Larry McEnerney is good at explaining why that's the case, but you can also read plenty of Philosophy and History of Science on why that is.
If you want to contribute to the scientific project, thinking about how what you are doing relates to that project is essential.
I'm not sure what you mean by "validity" and whether it's a sensible thing to talk about. If you try to optimize for some notion of validity instead of optimizing for doing something that's valuable to scientists, you are doing something like trying to guess the teacher's password. You are optimizing for form instead of optimizing for actually creating something valuable.
If you innovate in the method you are using in a way that violates some idea of conventional "validity" but you are providing value, you are doing well. Against Method wasn't accidentally chosen as a title. When Feynman was doing his drawings, the first reaction of his fellow scientists was that they weren't "real science". He ended up getting his Nobel Prize for them.
As far as novelty goes, the query you are proposing isn't really a good way to determine it. A better way to check novelty is not to ask "Is this novel?" but "Is there prior art here?" Today, a good way to check that is to run deep research reports. If your deep research request comes back with "I didn't find anything", that's a better signal for novelty than a "yes" to a question about whether something is novel. LLMs don't like to answer "I didn't find anything" when you let them run a deep research request, but they are much more willing to say something is novel when you ask them directly whether it's novel.
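As a minimal sketch of the framing difference (assuming the OpenAI Python client; the model name, prompt wording, and the "NO PRIOR ART FOUND" sentinel are illustrative placeholders, and a plain chat query stands in here for an actual deep research run):

```python
# Sketch: ask for prior art with an explicit "nothing found" escape hatch,
# instead of asking "Is this novel?". Model name and prompt are placeholders.
from openai import OpenAI

client = OpenAI()

idea = "..."  # short description of the claimed contribution

prior_art_prompt = (
    "Search for prior art on the following idea. List the closest existing "
    "papers, blog posts, or tools, with authors and years. If you genuinely "
    "cannot find anything close, answer exactly: NO PRIOR ART FOUND.\n\n"
    f"Idea: {idea}"
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use whatever deep-research-capable model you have
    messages=[{"role": "user", "content": prior_art_prompt}],
)

# "NO PRIOR ART FOUND" is a stronger novelty signal than a cheap "yes"
# to the direct question "Is this novel?".
print(response.choices[0].message.content)
```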
It's no good if you change your mind about the meaning of the experiment after you run it :)
Actually, a lot of scientific progress happens that way. You run experiments that have results that surprise you. You think about how to explain the results you got, and that brings you a better understanding of the problem domain you are interacting with.
If you want to create something intellectually valuable, you need to go through the intellectual work of engaging with counterarguments to what you are doing. If an LLM provides a criticism of your work, that criticism might be valid or it might not be. If what you are doing is highly complex, the LLM might not understand what you are doing, and that doesn't mean your idea is doomed. Maybe you can flesh out your idea more clearly. Even if you can't, and the idea provides value, it's still a good idea.
It's unclear to me why the numbers you report are only absolute numbers. I would expect that there are a bunch of factors that can make the wastewater more or less concentrated, and thus have an effect on the reads, that you would want to filter out.
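For illustration, a minimal sketch of the kind of normalization I have in mind, assuming per-sample raw read counts plus some concentration proxy such as PMMoV reads; the column names are hypothetical:

```python
# Sketch: correct raw read counts for sequencing depth and for how
# concentrated the wastewater sample is. Column names are hypothetical.
import pandas as pd

samples = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-08"],
    "influenza_reads": [120, 90],
    "total_reads": [1_000_000, 800_000],
    "pmmov_reads": [50_000, 20_000],  # fecal-concentration proxy
})

# Relative abundance corrects for sequencing depth.
samples["relative_abundance"] = samples["influenza_reads"] / samples["total_reads"]

# Dividing by a concentration proxy corrects for dilution of the wastewater itself
# (rainfall, industrial inflow, etc.).
samples["normalized_signal"] = samples["influenza_reads"] / samples["pmmov_reads"]

print(samples[["date", "relative_abundance", "normalized_signal"]])
```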
It's my understanding that when formulating influenza vaccines for a season, the specific strain of the influenza virus matters a lot. Currently, the dashboard doesn't seem to answer questions like "How does the dominant Influenza B strain differ today from last year?" and "Did the predictions made at the beginning of this year, when the vaccine for this season was formulated, actually pick the strains that are now dominant?"
I think this essay leaves out an important factor. To contribute to a scientific discourse, you not only need to say something that's correct and novel, you also need to tackle problems that the scientific discourse finds important.
If you are working on a problem that nobody finds important, it's a lot easier to make correct and novel findings than if you are working on a problem where an existing scientific field invests a lot into solving it. As a result, I would expect cases where someone feels like they made a breakthrough, found something novel and correct, but nobody is interested, to happen frequently.
If I go through the rejected-posts list, plenty of those posts try to present an idea that the author thinks is clever without trying to establish that the problem they are trying to solve is actually considered a problem by other people.
I like Larry McEnerney's talks about scientific writing. Instead of asking the LLM "To what extent is this project scientifically valid?" it's probably better to ask something like "Is this project solving problems any scientific field considers useful to solve?" Further queries: What field? Who are the experts in the field working on this problem? What would those experts say about my project? (one query per expert)
One key aspect of LLMs is that instead of mailing famous scientists with your ideas and asking them for opinions, the LLM can simulate the scientists. While that doesn't give you perfect results, you can get a lot of straightforward objections to your project that way.
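A sketch of what that query sequence could look like in practice; the prompt wording and the placeholder expert names are only assumptions about how one might phrase it:

```python
# Sketch: the query sequence described above, with one simulated-expert
# query per expert. Prompt wording and expert names are placeholders.
project = "..."  # one-paragraph description of your project

field_queries = [
    f"Is this project solving problems any scientific field considers useful to solve?\n\n{project}",
    f"Which field is that, and who are the experts currently working on this problem?\n\n{project}",
]

def simulated_expert_query(expert_name: str) -> str:
    """Run each of these in a fresh conversation so earlier answers don't leak in."""
    return (
        f"Based on their published work, what would {expert_name} most likely say "
        f"about the following project, including their strongest objections?\n\n{project}"
    )

for expert in ["Expert A", "Expert B"]:  # fill in the names the previous query returned
    print(simulated_expert_query(expert))
```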
Visit a frontier LLM that you haven't talked to about this breakthrough (at present: GPT-5-Thinking, Claude Opus, Gemini-2.5-Pro).
It's unclear to me why you don't list Grok in there. It's at the top of the benchmarks and it's less focused on sugarcoating people's feelings. Grok4 gives you two queries every two hours for free.
Instead of precommitting to how to react to any LLM answer, I would expect it's better to engage with the actual arguments the LLM makes. If an LLM criticizes a part of a project, thinking about how to fix that aspect is a better idea than just trying to take the outside view.
If you ask such a question, asking GPT-5-Thinking, Claude Opus, Gemini-2.5-Pro, and Grok4 is probably better than asking just one of them.
A central example: on my current models, yes-man psychosis was why Putin thought it was a good idea to invade Ukraine. Before the invasion, I remember reading e.g. RAND’s commentary (among others), which was basically “invading Ukraine would be peak idiocy, so presumably this is all sabre-rattling for diplomatic purposes”. Alas, in hindsight it seems Putin legitimately thought the invasion would be over in a few days, and Western powers wouldn’t do much about it. After all, the bureaucratic and advisory structures around him told him what he wanted to hear: that Russia’s military advantage would be utterly conclusive, Western leaders had no willingness to help, presumably nobody even mentioned the issues of endgame/exit strategy, etc.
This seems wrong. Putin likely also thinks in retrospect, with the information available to him now, that invading Ukraine was a good idea. While he might have thought it would be easier, from his perspective this is likely still a good outcome. He managed to overcome historically low approval ratings. He managed to consolidate power in many other ways.
While the war is more expensive than initially assumed, Russia seems to be advancing. The Russian public largely doesn't believe that Putin made a mistake in deciding to start the war.
On the flip side, this means that if we do know people who are AI experts but not from the US/EU/China, forwarding this information to them so that they can apply with a higher chance of being accepted might be valuable.
Man, posting on LessWrong seems really unrewarding. You show up, you put a ton of effort into a post, and at the end the comment section will tear apart some random thing that isn't load bearing for your argument, isn't something you consider particularly important, and whose discussion doesn't illuminate what you are trying to communicate, all the while implying that they are superior in their dismissal of your irrational and dumb ideas.
You could run an LLM every time someone tries to post a comment. If a top-level reply tries to nitpick something that isn't key to the post, the LLM could say "It seems like you are trying to nitpick a point that's not central? Do you really want to post this comment?"
While I hope it's gotten less frequent, I do think I have written some comments myself in the past criticizing posts for minor issues that aren't central to them. For me, a gentle nudge from an LLM asking "It seems like you are nitpicking something minor, do you really want to do that?" seems like it would reduce the comments that fall into that bucket when I'm in the mode of "someone said something wrong on the internet, it's not central to their post but it's wrong, so let's write a comment pointing out that it's wrong".
The same mechanism could also be used for other classes of comments that you want to have fewer of. An LLM can easily analyze whether a comment falls into such a bucket and then ask the user whether they really want to post the comment.
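A rough sketch of how such a pre-submit check could look, assuming the OpenAI Python client is available server-side; the model name, prompt, and YES/NO convention are placeholders, not a claim about how LessWrong would actually implement it:

```python
# Sketch: classify a draft top-level comment as a non-central nitpick
# and decide whether to show a gentle nudge before posting.
from openai import OpenAI

client = OpenAI()

def is_noncentral_nitpick(post_text: str, comment_text: str) -> bool:
    """Ask an LLM whether the comment mainly nitpicks a point that isn't central to the post."""
    prompt = (
        "Here is a post and a draft top-level comment on it.\n"
        "Answer YES if the comment mainly nitpicks a point that is not central "
        "to the post's argument, otherwise answer NO.\n\n"
        f"POST:\n{post_text}\n\nCOMMENT:\n{comment_text}"
    )
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content.strip().upper().startswith("YES")

# On submit: if is_noncentral_nitpick(post, draft) is True, show the
# "Do you really want to post this comment?" nudge instead of publishing immediately.
```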
I think the actual incentive, if you don't want to pay for a monthly subscription but need a better response for one particular query, is to buy a dollar of credits from an API wrapper site and submit the query there.
I think only highly technical users would do that. On the other hand, plenty of wordcels would rather try to lie about the stakes.
Startups quite often pay less than the person might make working elsewhere and justify that with the promise of equity. The founder then tells the employees a story about the likely value of that equity which oversells the chance that it will be worth a lot.
For GPT-5, a smarter model means the model uses more time to answer the query. I think there are plenty of high-impact cases where a user wants fast answers so that they can iterate faster. When authoring real legislation, the user is likely going to run many queries, and it's desirable for some of those queries to run fast.
On the other hand, the question of whether to go to the ER would probably benefit from running on GPT-5 Pro every time, as the user might take action based on a single answer in a way that's unlikely when authoring legislation.
There's a saying that the key to a successful startup is to find an idea that looks stupid but isn't. A startup is successful when it pursues a path that other people refuse to pursue but that turns out to be valuable.
In many cases it's probably the same for scientific breakthroughs. The ideas behind them were not pursued earlier because the experts in the field believed, on a surface evaluation, that the ideas were not promising.
A lot of the posts you find on r/LLMPhysics and among rejected LW posts sound smart on the surface to some laypeople (the person interacting with the LLM) but don't work. LLMs might give the kind of idea that sounds smart to laypeople on the surface the benefit of the doubt, while giving the kind of idea that sounds stupid to everyone on a surface evaluation no benefit of the doubt.
You can get a PhD in theoretical physics without developing ideas that allow you to make falsifiable predictions.
Making falsifiable predictions is one way to create value for other scientists, but it's not the only one. Larry brings up the example of "There are 20 people in this classroom" as a theory that can be novel (nobody in the literature has said anything about the number of people in this classroom) and makes falsifiable predictions (everyone who counts will count 20 people) but is completely worthless.
Your standard has both the problem that people to whom the physics community gives PhDs don't meet it, and the problem that plenty of work that does meet it is worthless.
I think the general principle should be that before you try to contact a researcher with your idea of a breakthrough, you should let an LLM simulate that researcher's answer and iterate based on the objections the LLM predicts the researcher would raise.