Gabriel Alfour (@gabe_cc)

Comments
The Eldritch in the 21st century
Gabriel Alfour · 3d

Nice comment.

This deals with a lot of the themes from the follow-up essay, which I expect you'll find interesting.

We are likely in an AI overhang, and this is bad.
Gabriel Alfour · 14d

> Curious what makes you think this.

Because there is a reason why Cursor and Claude Code exist. I'd suggest looking at what they do for more details.


> METR is not in the business of building code agents. Why is their work informing so much of your views on the usefulness of Cursor or Claude Code?

This is literally the point I make above.

Either you fail to capture the relevant capabilities and build unwarranted confidence that things are ok, or you are doing public competitive elicitation & amplification work.

We are likely in an AI overhang, and this is bad.
Gabriel Alfour · 15d

> (I can't really tell if this post is trying to argue the overhang is increasing or just that there is some moderately sized overhang ongoingly.)

It has increased on some axes (companies are racing as fast as they can, and both capital and research are overwhelmingly long on scaling), and decreased on others (low-hanging fruit gets plucked first).


The main point is that the overhang is there and is consistently underestimated.

For instance, there are still massive returns to spending an hour learning and experimenting with prompt engineering techniques, let alone more advanced approaches.

This leads to a bias toward overestimating the safety of our systems, unless you expect our evaluators to be better elicitors not only than today's AI research engineers, but than those of the next two, five, or ten years.
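
To make this concrete, here is a minimal sketch of the kind of one-hour experiment I mean: the same model queried with a bare prompt versus a lightly engineered one. `call_model` is a hypothetical stand-in for whatever LLM API you use, and the prompt wording is invented; the point is only that variations like these are cheap to try.

```python
# Sketch of a cheap prompt-engineering experiment: same model, same
# question, two prompts. `call_model` is a hypothetical stand-in for
# whatever LLM API you use; it is not a real library function.

def call_model(prompt: str) -> str:
    raise NotImplementedError("wire up your model provider here")

def compare(question: str) -> tuple[str, str]:
    """Return (baseline answer, answer under an engineered prompt)."""
    # Baseline: ask the question directly, zero-shot.
    baseline = call_model(question)

    # Engineered: add a role, a required output format, and an
    # instruction to reason step by step before answering.
    engineered = call_model(
        "You are a careful problem solver.\n"
        "Reason step by step, then give the final answer on its own "
        "line, prefixed with 'ANSWER:'.\n\n"
        f"Question: {question}"
    )
    return baseline, engineered
```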

We are likely in an AI overhang, and this is bad.
Gabriel Alfour · 15d

Code agents (Cursor or Claude Code) are much better at performing code tasks than their fine-tuned equivalents, mainly because of the scaffolding.
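
To make "scaffolding" concrete, here is a minimal sketch of the kind of loop such an agent runs around the base model. Every name in it (`call_model`, `apply_patch`) is a hypothetical placeholder, not any product's actual internals; the point is that the wrapper, not the weights, supplies iteration against ground truth.

```python
# Minimal sketch of code-agent scaffolding: the same base model wrapped
# in a propose-edit -> run-tests -> feed-errors-back loop.
# `call_model` and `apply_patch` are hypothetical placeholders; this is
# not how Cursor or Claude Code actually work internally.

import subprocess

def call_model(prompt: str) -> str:
    raise NotImplementedError("wire up your model provider here")

def apply_patch(patch: str) -> None:
    """Stand-in for writing the model's proposed edits to disk."""
    raise NotImplementedError

def run_tests() -> tuple[bool, str]:
    """Run the project's test suite; return (passed, combined output)."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def agent_loop(task: str, max_iters: int = 5) -> bool:
    """Iterate until the tests pass or the budget runs out."""
    feedback = ""
    for _ in range(max_iters):
        patch = call_model(f"Task: {task}\nLatest test output:\n{feedback}")
        apply_patch(patch)
        passed, feedback = run_tests()
        if passed:
            return True  # the loop, not the model, decides when to stop
    return False
```

A bare fine-tune answers once and stops; the loop gets to see its own test failures and retry.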


When I told you that we should not put 4% of the global alignment budget into AI Village, you asked me whether I thought METR should also not get as much funding as it does.

It should now be more legible why.

From my point of view, both AI Village and METR, on top of not doing the straightforward thing of advocating for a pause, are bad on their own terms.

Either you fail to capture the relevant capabilities and build unwarranted confidence that things are ok, or you are doing public competitive elicitation & amplification work.

We are likely in an AI overhang, and this is bad.
Gabriel Alfour · 15d

> What is an example of something useful you think could in theory be done with current models but isn't being elicited in favor of training larger models?

Better prompt engineering, fine-tuning, interpretability, scaffolding, sampling.

Fast-forward button
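
To make the "sampling" item concrete, here is a minimal best-of-n sketch, with `call_model` and `score` as hypothetical placeholders: draw several samples at nonzero temperature and keep the best one under some verifier or reward model. This is standard elicitation, with no retraining involved.

```python
# Sketch of best-of-n sampling: extra inference-time compute, no
# retraining, often noticeably better outputs.
# `call_model` and `score` are hypothetical placeholders.

def call_model(prompt: str, temperature: float = 0.8) -> str:
    raise NotImplementedError("wire up your model provider here")

def score(candidate: str) -> float:
    """Stand-in for a verifier, reward model, or unit-test pass rate."""
    raise NotImplementedError

def best_of_n(prompt: str, n: int = 16) -> str:
    """Sample n candidates at nonzero temperature; keep the best one."""
    candidates = [call_model(prompt) for _ in range(n)]
    return max(candidates, key=score)
```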

I think you may be making an implicit argument along the lines of "But people are already working on this! Markets are efficient!"

To which my response is: "And thus, 3 years from now, we'll know much more about what to do with models than we do now, even if you personally can't come up with an example today. The same way we now know much more about what to do with models than we did 3 years ago."

Unless you expect this not to be the case (say, because you have directly worked on juicing models, or have followed people who did so and failed), you shouldn't really expect your failure to come up with such examples to be informative.

A Lens on the Sharp Left Turn: Optimization Slack
Gabriel Alfour · 20d

If you are on the Paul/Quentin side, "lots of slack" would be enough to concede, but they do not think there is lots of slack.

If you are on the Eliezer/Nate side, "little slack" is far from enough to concede: it is about whether humanity can and will do something with that slack.

So this is not a crux.

Nevertheless, this concept could help prevent a very common failure mode in the debate.

Namely, at any point in the debate, either side could ask: "Are you arguing that there is lots of/little slack, that we are willing/unwilling to use that slack, or that we are able/unable to use that slack?" I expect this could clear up some amount of talking past each other.

The Eldritch in the 21st century
Gabriel Alfour · 20d

> A simple example where understanding an underlying problem doesn't solve the problem: I understand fairly well why I'm tempted to eat too many potato chips, and why this is bad for me, and what I could do instead. And yet, sometimes I still eat more potato chips than I intend.

This is a great example.

Some people, specifically thanks to their better understanding of themselves, do not find themselves eating more potato chips than they intend.

There is more.

The Eldritch in the 21st century
Gabriel Alfour · 1mo

I believe...

  • People and society are largely well calibrated. People who are deemed (by themselves or society) to be bad at maths, at sports, at arts, etc. are usually bad at them.
  • People and society are not perfectly calibrated.
  • People are sometimes under-confident in their abilities. This is often downstream of them lacking confidence.
  • People are sometimes over-confident in their abilities. This is often downstream of them being too confident.

> Our society does seem to inculcate in its members the idea that certain things are only for super-smart people to do, and whoever you are, you are not smart enough to do an impactful thing.

Most people would fail the bar exam or the USMLE. This is why most people do not attempt them, and why our society tells them not to.

I believe it is load-bearing, but in the straightforward way: it would be catastrophic if everyone tried to study things far beyond their abilities and wasted their time.

The Eldritch in the 21st century
Gabriel Alfour · 1mo

> > Most people have no hope of understanding complex topics.
>
> No. Strongly disagree. "Most people don't understand X" is a thing I could accept, but "most people can't understand X" is usually false, with only rare exceptions.


You are confusing "Most people can't understand X" with "Most people have no hope of understanding X". Only the latter matters for the psychological toll it has on people.

Hopelessness might be warranted or not, but it's there.

---

Separately, I believe that quite often, their hopelessness is warranted.

Everyone hits their ceilings.

I know many mathematically talented people who struggle to express themselves in ways that are legible to others, or to move their body in a natural way. They will get better if they train, but it's pretty clear to them and everyone else that their ceiling is low.

In general, I know many people talented [at a field] with clear limitations in [some other field]. Arts, maths, oral expression, style, empathy, physical strength, body awareness, and so on.

Over time, they learn to acknowledge their talents and their limitations.

The Eldritch in the 21st century
Gabriel Alfour · 1mo

Nope.

I meant that even if the Evil People are Evil, and thus decide not to make all the bad things go away, the fact that they could make them go away is reassuring in itself.

I should have been clearer.

(I have edited it with the hope of making it clearer. Thanks.)

Posts

  • We are likely in an AI overhang, and this is bad. (55 karma · 16d · 16 comments)
  • How people politically confront the Modern Eldritch (5 karma · 20d · 0 comments)
  • The Eldritch in the 21st century (161 karma · 1mo · 38 comments)
  • Three main views on the future of AI (47 karma · 1mo · 1 comment)
  • The Gabian History of Mathematics (21 karma · 1mo · 9 comments)
  • When Money Becomes Power (61 karma · 2mo · 16 comments)
  • Morality, Values and Trade-Offs (8 karma · 2mo · 2 comments)
  • Mind Conditioning (-1 karma · 2mo · 0 comments)
  • About Stress (25 karma · 2mo · 0 comments)
  • The Ideological Spiral (11 karma · 3mo · 1 comment)