eggsyntax

AI safety & alignment researcher

In Rob Bensinger's typology: AGI-alarmed, tentative welfarist, and eventualist (or variabilist for sufficiently long values of 'soon').

I have signed no contracts or agreements whose existence I cannot mention.

Sequences

General Reasoning in LLMs

Comments (sorted by newest)
The Eldritch in the 21st century
eggsyntax · 2d

Even if the Evil People do not make all the bad things go away, it would still be reassuring.

Do you mean something like, 'Even if defeating the evil people does not make all the bad things go away'?

MAGA speakers at NatCon were mostly against AI
eggsyntax · 4d

I wholeheartedly agree that it can be worth allying with groups that you don't personally like. That said, I think there's still hope that AI safety can avoid being a strongly partisan-coded issue. Some critical safety issues manage to stay nonpartisan for the long term — eg opposition to the use of chemical weapons and bioweapons is not very partisan-coded in the US (in general, at least; I'm sure certain aspects of it have been partisan-coded at one time or another).

So while I agree that it's worth allying with partisan groups in some ways (eg when advocating for specific legislation), it seems important to consistently emphasize that this is an issue that transcends partisan politics, and that we're just as happy to ally with AI-skeptical elements of the left (eg AI ethics folks) as we are with AI-skeptical elements of the right.

Of course, some individual people may be strongly partisan themselves and only care about building alliances with one side or the other. That's fine! There's no reason why the AI safety community needs to be monolithic on anything but the single issue we're pushing for: that humanity needs to steer clear of catastrophic and existential outcomes from AI.

MAGA speakers at NatCon were mostly against AI
eggsyntax · 4d

It's not paywalled for me. The version I see is this one.

Your LLM-assisted scientific breakthrough probably isn't real
eggsyntax · 5d

Thanks for the reply!

LLMs can't produce or verify novel ideas? 

I think your view here is too strong. For example, there have been papers showing that LLMs come up with ideas that human judges rate as human-level or above in blind testing. I've led a team doing empirical research (described here, results forthcoming) showing that current LLMs can propose and experimentally test hypotheses in novel toy scientific domains.

So while the typical claimed breakthrough isn't real, I don't think we can rule out real ones a priori.

If the target demographic is only LW, I worry that it's trying to have too many audience.

I'm not sure what that means; can you clarify?

Someone coming to this for advice would see the comments from people like me who were critiquing the piece itself, and that would certainly make it less effective.

Maybe? I would guess that people who feel they have a breakthrough are usually already aware that they're going to encounter a lot of skepticism. That's just my intuition, though; I could be wrong.

I'm certainly open to posting it elsewhere. I posted a link to it on Reddit (in r/agi), but people who see it there have to come back here to read it. Suggestions are welcome, and I'm fine with you or anyone else posting it elsewhere with attribution (I'd appreciate getting a link to versions posted elsewhere).

Your LLM-assisted scientific breakthrough probably isn't real
eggsyntax · 6d

this phenomenon has been around for a while

I think that's true, but the addition of LLMs at their current level of capability has added some new dynamics, resulting in a lot of people believing they have a breakthrough who previously wouldn't. For people who aren't intimately familiar with the failure modes of LLMs, it's easy to believe them when they say your work is correct and important — after all, they're clearly very knowledgeable about science. And of course, confirmation bias makes that much easier to fall for. Add to that the tendency for LLMs to be sycophantic, and it's a recipe for a greatly increased number of people (wild guess: maybe an order of magnitude more?) believing they've got a breakthrough.

Your LLM-assisted scientific breakthrough probably isn't real
eggsyntax · 6d

They come up with a clever idea that they can't tell how it fails. So it seems amazing... And intuitively it really really feels like it should work. You can't see any flaw.

I do think this is an important aspect. Turing Award winner Tony Hoare once said,

There are two methods in software design. One is to make the program so simple, there are obviously no errors. The other is to make it so complicated, there are no obvious errors.

and I think there's a similar dynamic when people try to develop scientific theories.

Bogdan Ionut Cirstea's Shortform
eggsyntax · 9d

Got it! I misunderstood you.

Bogdan Ionut Cirstea's Shortform
eggsyntax · 9d (edited)

And signaling that they are trying to become the biggest new hyperscaler would be risky for their existing sales: big tech and frontier labs will go even harder for custom chips than they are now.

In the world that I see Bogdan as pointing to, 'when AI becomes much more valuable because it can replace larger portions of human workers', I'd expect revenue from supplying AGIs to strongly outweigh revenue from chip sales.

EDIT: I misunderstood Josh's point.

Your LLM-assisted scientific breakthrough probably isn't real
eggsyntax · 9d

Thanks! I agree with most of what you're saying to one extent or another, but relative to the fairly narrow thing I'm trying to do, I still maintain it's out of scope.

It seems possible that we're imagining very different typical readers. When I look at rejected LW posts that were co-written with LLMs, or posts on r/LLMPhysics, I see problems like values of totally different units being added together ('to the current level of meaningness we add the number of seconds since the Big Bang'). While it's difficult to settle on a fully satisfying notion of validity, I think most people who have done any work in the sciences are likely to agree that something like that is invalid. My main goal here is to provide a first-pass way of helping people identify whether they're doing something that just doesn't qualify as science under any reasonable notion of the term. The idea of discouraging a future Feynman is horrifying, but my experience has been that with my suggested prompt, LLMs still do their best to give projects the benefit of the doubt.

Similarly, while my step 2 uses a simplified and limited sense of the scientific method, I think it's really important that people who feel they've made a breakthrough should be thinking hard about whether their ideas can make falsifiable predictions that existing theories don't. While there may be some cases around the edges where that's not exactly true — eg as Charlie Steiner suggests, developing a simpler theory that makes the same predictions — the author ought to have at least given the issue serious consideration, whereas in many of the instances I've seen that's not the case.

I do strongly encourage people to write better posts on this topic and/or better prompts, and I'll gladly replace this post with a pointer to those when they exist. But currently there's nothing (that I could find), and researchers are flooded with claimed breakthroughs, so this is my time-bounded effort to improve on the situation as it stands.

How To Become A Mechanistic Interpretability Researcher
eggsyntax · 10d

It was very striking to me how much better my math scholars were six months ago than 12 months ago

s/math/mats/ I presume.

Posts

eggsyntax's Shortform (2y)
Your LLM-assisted scientific breakthrough probably isn't real (12d)
On the functional self of LLMs (2mo)
Show, not tell: GPT-4o is more opinionated in images than in text (5mo)
Numberwang: LLMs Doing Autonomous Research, and a Call for Input (8mo)
LLMs Look Increasingly Like General Reasoners (10mo)
AIS terminology proposal: standardize terms for probability ranges (1y)
LLM Generality is a Timeline Crux (1y)
Language Models Model Us (1y)
Useful starting code for interpretability (2y)
Wikitag Contributions

Logical decision theories (3 years ago, +5/-3)