yams — LessWrong

LESSWRONG
LW

yams — LessWrong

Unless its governance changes, Anthropic is untrustworthy

Thanks as always to Zac for continuing to engage on things like this.

Tiny nit for my employer: should probably read “including some* MIRI employees”

like any org, MIRI is made up of people that have significant disagreements with one another on a wide variety of important matters.

More than once I’ve had it repeated to me that ‘MIRI endorses y’, and tracked the root of the claim to a lack of this kind of qualifier. I know you mean the soft version and don’t take you to be over-claiming; unfortunately, experience has shown it’s worth clarifying, even though for most claims in most contexts I’d take your framing to be sufficiently clear.

New Report: An International Agreement to Prevent the Premature Creation of Artificial Superintelligence

yams15d40

I'm struck by how many of your cruxes seem like things that it would actually just be in the hands of the international governing body to control. My guess is, if DARPA has a team of safety researchers, and they go to the international body, and they're like 'we're blocked by this set of experiments* that takes a large amount of compute; can we please have more compute?', and then the international body gets some panel of independent researchers to confirm that this is true, and the only solution is more compute for that particular group of researchers, they commission a datacenter or something so that the research can continue.

Like, it seems obviously true to me that people (especially in government/military) will continue working on the problem at all, and that access to larger amounts of resources for doing that work is a matter of petitioning the body. It feels like your plan is built around facilitating this kind of carveout, and the MIRI plan is built around treating it as the exception that it is (and prioritizing gaining some centralized control over AI as a field over guaranteeing to-me-implausible rapid progress toward the best possible outcomes).

*which maybe is 'building automated alignment researchers', but better specified and less terrifying

How likely is dangerous AI in the short term?

yams24d20

AIs with reliable 1-month time horizons will basically not be time-horizon-limited in any way that humans aren't

In this statement, are you thinking about time horizons as operationalized / investigated in the METR paper, or are you thinking about the True Time Horizon?

yams's Shortform

yams24d50

A group of researchers has released the Longitudinal Expert AI Panel, soliciting and collating forecasts regarding AI progress, adoption, and regulation from a large pool of both experts and non-experts.

Heroic Responsibility

yams1mo40

Didn’t disagree vote myself, but I think there’s a linguistic pattern of ‘just asking questions’ that is used to signal disagreement while also evading interrogation yourself. At first glance, your comment may be reading that way to others, who then hastily smash the disagree button to signal disagreement with the position they think you’re implying (even though you were really genuinely just asking questions).

I see this happen a lot, where folks mismodel someone’s epistemic state or tacking when, really, the person is just confused and trying to explicate the conditions of their confusion. In the broader world, claiming to be confused about something is a common tactic for trying to covertly convince someone of your position.

Noah Birnbaum's Shortform

yams1mo90

Duncan Sabien once ran the inverse experiment. He made a separate account to see how his posts would do without his reputation. The account only has one post still up, but iirc there used to be many more (tens). They performed similarly well to posts under his own name. Cool idea!

[plausibly I'm getting parts of the story wrong and someone who was around then will correct me]

Guys I might be an e/acc

yams1mo30

I think that I’d do this math by net QUALYs and not net deaths. My guess is doing it that way may actually change your result.

I’m not trying to avoid dying; I’m trying to steer toward living.

Which side of the AI safety community are you in?

yams1mo412

Yup! I just think there’s an unbounded way that a reader could view his comment: “oh! There are no current or future consequences at OAI for those who sign this statement!”

…and I wanted to make the bound explicit: real protections, into the future, can’t plausibly be offered, by anyone. Surely most OAI researchers are thinking ahead enough to feel the pressure of this bound (whether or not it keeps them from signing).

I’m still glad he made this comment, but the Strong Version is obviously beyond his reach to assure.

Which side of the AI safety community are you in?

yams1mo32

This is good!

My guess is that their hesitance is also linked to potential future climates, though, and not just the current climate, so I don’t expect additional signees to come forward in response to your assurances.

The IABIED statement is not literally true

yams2mo40

I think my crux is ‘how much does David’s plan resemble the plans labs actually plan to pursue?’

I read Nate and Eliezer as baking in ‘if the labs do what they say they plan to do, and update as they will predictably update based on their past behavior and declared beliefs’ to all their language about ‘the current trajectory’ etc etc.

I don’t think this resolves ‘is the tittle literally true’ in a different direction if it’s the only crux, and agree that this should have been spelled out more explicitly in the book (e.g. ‘in detail, why are the authors pessimistic about current safety plans’) from a pure epistemic standpoint (although think it was reasonable to omit from a rhetorical standpoint, given the target audience) and in various Headline Sentences throughout the book, and The Problem.

One generous way to read Nate and Eliezer here is to say ‘current techniques’ is itself intending to bake in ‘plans the labs currently plan to pursue’. I was definitely reading it this way, but think it’s reasonable for others not to. If we read it that way, and take David’s plan above to be sufficiently dissimilar from real lab plans, then I think the title’s literal interpretation goes through.

[your post has updated me from ‘the title is literally true’ to ‘the title is basically reasonable but may not be literally true depending on how broadly we construe various things’, which is a significantly less comfortable position!]

LESSWRONG
LW

LESSWRONG
LW

Posts

Wikitag Contributions

Comments