habryka

Running Lightcone Infrastructure, which runs LessWrong. You can reach me at habryka@lesswrong.com

Sequences

A Moderate Update to your Artificial Priors
A Moderate Update to your Organic Priors
Concepts in formal epistemology


Comments

habryka

I have indeed been publicly advocating against the inside game strategy at labs for many years (going all the way back to 2018), predicting it would fail due to incentive problems and have large negative externalities due to conflicts of interest. I could dig up my comments, but I am confident almost anyone I've interfaced with at the labs, or talked to about any adjacent topic in leadership, would be happy to confirm.

habryka

Oh, weird. I always thought "ETA" means "Edited To Add". 

Sure, I'll try to post here if I know of a clear opportunity to donate to either. 

habryka

I would be happy to defend roughly the position above (I don't agree with all of it, but agree with something like "the strategy of trying to play the inside game at labs was really bad, failed in predictable ways, and has deeply eroded trust in community leadership due to the adversarial dynamics present in such a strategy, and many people involved should be let go").

I do think most people who disagree with me here are under substantial confidentiality obligations and de-facto non-disparagement obligations (such as really not wanting to imply anything bad about Anthropic, or wanting to maintain a cultivated image for policy purposes), so it will be hard to find a good public debate partner, but it isn't impossible.

habryka

The document doesn't specify whether "deployment" includes internal deployment. (This is important because maybe lots of risk comes from the lab using AIs internally to do AI development.)

This seems like such an obvious and crucial distinction that I felt very surprised when the framework didn't disambiguate between the two. 

habryka

Yeah, at the time I didn't know how shady some of the contracts here were. I do think funding a legal defense is a marginally better use of funds (though my guess is funding both is worth it).

Answer by habryka

We don't have a live count, but we have a one-time analysis from late 2023: https://www.lesswrong.com/posts/WYqixmisE6dQjHPT8/2022-and-all-time-posts-by-pingback-count 

My guess is not much has changed since then, so I think that's basically the answer.

What do you mean by "cited"? Do you mean "articles referenced in other articles on LW" or "articles cited in academic journals" or some other definition?

I am quite interested in takes from various people in alignment on this agenda. I've engaged with both Davidad's and Bengio's stuff a bunch in the last few months, and I feel pretty confused (and skeptical) about a bunch of it, and would be interested in reading more of what other people have to say.

This indicates that our scaling lab mentors were more discerning of value alignment on average than non-scaling lab mentors, or had a higher base rate of low-value alignment scholars (probably both).

The second hypothesis here seems much more likely (and my guess is your mentors would agree). My guess is that after properly controlling for that, you would find a mild to moderate negative correlation here.

But also, more importantly, the set of scholars from which MATS is drawing is heavily skewed towards the kind of person who would work at scaling labs (especially since funding has increasingly gone towards the kind of research that can occur at scaling labs).
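To gesture at why the base-rate explanation alone can produce this pattern, here is a minimal sketch with made-up numbers (not MATS data, and a hypothetical `flagged_fraction` helper): if mentors in both groups are equally discerning, but the pool drawn to scaling-lab mentors has a higher base rate of low value alignment, the scaling-lab mentors will still flag more of their scholars.

```python
# Illustrative sketch with made-up numbers (not MATS data): equally discerning
# mentors, different applicant base rates, same surface pattern.
import random

random.seed(0)

def flagged_fraction(base_rate_low, discernment=0.5, n=100_000):
    """Fraction of a mentor's scholars flagged as low value alignment,
    when each genuinely low-alignment scholar is flagged with
    probability `discernment` and everyone else is never flagged."""
    flagged = 0
    for _ in range(n):
        is_low = random.random() < base_rate_low
        if is_low and random.random() < discernment:
            flagged += 1
    return flagged / n

# Same discernment in both groups; only the applicant pool differs.
print("scaling-lab mentors:    ", round(flagged_fraction(base_rate_low=0.4), 3))  # ~0.20
print("non-scaling-lab mentors:", round(flagged_fraction(base_rate_low=0.1), 3))  # ~0.05
```

In this toy setup the difference in flag rates comes entirely from the applicant pool, not from any difference in mentor discernment, which is the confound that controlling for the base rate would remove.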
