MichaelDickens

Comments

Mikhail Samin's Shortform
MichaelDickens · 3d · 40

Importantly, AFAICT some Horizon fellows are actively working against x-risk reduction (pulling the rope backwards, not sideways). So Horizon's sign of impact is unclear to me. For a lot of people, "tech policy going well" means "regulations that don't impede tech companies' growth".

Mikhail Samin's Shortform
MichaelDickens · 3d · 20

As in, Horizon fellows / people who work at Horizon?

Mikhail Samin's Shortform
MichaelDickens · 3d · 20

What leads you to believe this?

FWIW this is also my impression, but I'm going off weak evidence (I wrote about my evidence here), and Horizon is pretty opaque, so it's hard to tell. A couple of weeks ago I tried reaching out to them to talk about it, but they haven't responded.

MichaelDickens's Shortform
MichaelDickens · 5d · 20

I think so, yeah. I think my probability of the next model being catastrophically dangerous is a bit higher than it was a year ago, mainly because of the IMO gold medal result and similar improvements in models' ability to reason about hard problems. An argument in the other direction is that the more data points you have along a capabilities curve, the more confident you can be that your model of the curve is accurate; but on balance I think this is probably outweighed by the fact that we are now closer to AGI than we were a year ago.

MichaelDickens's Shortform
MichaelDickens · 5d · 30

The next-gen LLM might pose an existential threat

I'm pretty sure that the next generation of LLMs will be safe. But the risk is still high enough to make me uncomfortable.

How sure are we that scaling laws are correct? Researchers have drawn curves predicting how AI capabilities scale based on how much compute and data goes into training them. If you extrapolate those curves, it looks like the next generation of LLMs won't be wildly more powerful than the current generation. But maybe there's a weird bump in the curve between GPT-5 and GPT-6 (or between Claude 4.5 and Claude 5), and LLMs suddenly become much more capable in a way that scaling laws didn't predict. I don't think we can be more than 99.9% confident that there's no such bump.
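
To make "extrapolate those curves" concrete, here is a minimal sketch of the kind of extrapolation I have in mind, with made-up numbers rather than any lab's real measurements; the worry is exactly that the true curve might stop following a fit like this between generations:

```python
import numpy as np

# Minimal illustration with hypothetical numbers, not real measurements.
compute = np.array([1e21, 1e22, 1e23, 1e24])  # training FLOPs per generation
loss = np.array([2.8, 2.4, 2.05, 1.75])       # hypothetical eval loss

# Fit a power law (loss ~ a * compute**slope) by linear regression in log-log space.
slope, log_a = np.polyfit(np.log(compute), np.log(loss), 1)
a = np.exp(log_a)

# Extrapolate one "generation" ahead (10x the compute of the last point).
next_gen_compute = compute[-1] * 10
predicted_loss = a * next_gen_compute ** slope
print(f"Predicted next-gen loss: {predicted_loss:.2f}")
```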

How sure are we that current-gen LLMs aren't sandbagging (that is, deliberately hiding their true skill level)? I think they're still dumb enough that their sandbagging can be caught, and indeed they have been caught sandbagging on some tests. I don't think LLMs are hiding their true capabilities in general, and our understanding of AI capabilities is probably pretty accurate. But I don't think we can be more than 99.9% confident about that.

How sure are we that the extrapolated capability level of the next-gen LLM isn't enough to take over the world? It probably isn't, but we don't really know what level of capability is required for something like that. I don't think we can be more than 99.9% confident.

Perhaps we can be >99.99% confident that, at its extrapolated capability level, the next-gen LLM will still not be as smart as the smartest human. But an LLM has certain advantages over humans: it can work faster (at least on many sorts of tasks), it can copy itself, and it can operate computers in a way that humans can't.

Alternatively, GPT-6/Claude 5 might not be able to take over the world itself, but it might be smart enough to recursively self-improve, and that might happen too quickly for us to do anything about it.

How sure are we that we aren't wrong about something else? I thought of three ways we could be disastrously wrong:

  1. We could be wrong about scaling laws;
  2. We could be wrong that LLMs aren't sandbagging;
  3. We could be wrong about what capabilities are required for AI to take over.

But we could be wrong about some entirely different thing that I didn't even think of. I'm not more than 99.9% confident that my list is comprehensive.

On the whole, I don't think we can say there's less than a 0.4% chance that the next-gen LLM forces us down a path that inevitably ends in everyone dying.
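
(For what it's worth, that 0.4% is roughly what you get by treating the four doubts above as independent chances of being disastrously wrong, at about 0.1% each; a quick back-of-the-envelope version, assuming exactly 0.1% per item:)

```python
# Back-of-the-envelope combination of the four ~0.1% doubts above: scaling laws,
# sandbagging, the capability threshold, and "something I didn't think of".
p_each = 0.001  # assumed chance of being disastrously wrong on each item
n_items = 4

p_at_least_one = 1 - (1 - p_each) ** n_items
print(f"{p_at_least_one:.4%}")  # ~0.3994%, i.e. about 0.4%
```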

Reasons to sign a statement to ban superintelligence (+ FAQ for those on the fence)
MichaelDickens · 7d · 40

As I understand it, this is how scientific bodies' position statements get written. Scientists do not universally agree about the facts in their field, but they iterate on the statement until none of the signatories have any major objections.

Sublinear Utility in Population and other Uncommon Utilitarianism
MichaelDickens · 7d · 40

I have a strong intuition that this isn't how it works:

When I have a positive experience, it is readily apparent to me that the experience is positive, and no amount of argument can convince me that actually I didn't enjoy myself.

Suppose I did something that I quite enjoyed, and then Omega came up to me and said "actually, somebody else last week (or a simulation of you, or whatever) already experienced those exact same qualia, so your qualia weren't that valuable." I'd say: sorry, Omega, that is wrong; my experience was good regardless of whether it had already happened before. I know it was good because I directly experienced its goodness.

davekasten's Shortform
MichaelDickens · 9d · 212

Perhaps I'm misunderstanding your objection, but I think the issue is that Claude is hosted on AWS servers (among other places), which means Amazon could steal Claude's model weights if it wanted to. And ASL-3 states that Claude needs to be secure against theft by other companies, including Amazon.

I don't know for sure that Zach's assertion is true, but I'm reasonably confident that a dedicated Amazon security team could steal the contents of any AWS server if they really wanted to.

Patience and Willingness to Be Slow
MichaelDickens · 10d · 84

The great thing is outside university nobody cares about how fast I can apply the gauss algorithm. It's just important that I know when to use it.

This particular fact sounds right but I think the generalization is often wrong. At my first software development job, I learned more slowly than my peers, and I took longer than usual to get promoted from entry-level to mid-level. This had a real material impact on my earnings and therefore how much money I could donate. It would have been better for the world if I had been able to learn faster.

But I still basically agree with the lesson, as I understand it: trying to go fast is overrated. I don't think "try to go fast" would've helped me at all. (In fact, I was often trying to go fast, and it didn't help.)

Kabir Kumar's Shortform
MichaelDickens · 10d · 30

Yeah, I agree that it's not a good plan. I just think that if you're proposing your own plan, your plan should at least mention the "standard" plan and explain why you prefer to do something different, e.g. give some commentary on why you don't think alignment bootstrapping is a solution. (And I would probably agree with your commentary.)

Posts

Outlive: A Critical Review · 64 karma · 4mo · 4 comments
How concerned are you about a fast takeoff due to a leap in hardware usage? [Question] · 9 karma · 4mo · 7 comments
Why would AI companies use human-level AI to do alignment research? · 29 karma · 6mo · 8 comments
What AI safety plans are there? · 16 karma · 6mo · 3 comments
Retroactive If-Then Commitments · 7 karma · 9mo · 0 comments
A "slow takeoff" might still look fast · 5 karma · 3y · 3 comments
How much should I update on the fact that my dentist is named Dennis? [Question] · 2 karma · 3y · 3 comments
Why does gradient descent always work on neural networks? [Question] · 15 karma · 3y · 11 comments
MichaelDickens's Shortform · 2 karma · 4y · 139 comments
How can we increase the frequency of rare insights? · 19 karma · 5y · 10 comments