Dan H — LessWrong

ai-frontiers.org

newsletter.safe.ai

newsletter.mlsafety.org

Thank you to Neel for writing this. Most people pivot quietly.

I've been among the most skeptical of mechanistic interpretability for years. I excluded interpretability from Unsolved Problems in ML Safety for this reason. Other fields like d/acc (Systemic Safety) were included though, all the way back in 2021.

Here are some earlier criticisms: https://www.lesswrong.com/posts/5HtDzRAk7ePWsiL2L/open-problems-in-ai-x-risk-pais-5#Transparency

More recent commentary: https://ai-frontiers.org/articles/the-misguided-quest-for-mechanistic-ai-interpretability

I think the community should reflect on its genius worship culture (in the case of Olah, a close friend of the inner circle) and epistemics: the approach was so dominant for years, and I think this outcome was entirely foreseeable.

This dynamic is captured in IABIED's story and this paper from 2023: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4445706

just rehearsed variations on the arguments Eliezer/MIRI already deployed

I think they're improved and simplified.

My favorite chapter is "Chapter 5: Its Favorite Things."

It's a great book: it's simple, memorable, and unusually convincing.

If a strategy is likely to be outdated quickly, it's not robust and not a good strategy. Strategies should be able to withstand lots of variation.

capability thresholds be vague or extremely high

xAI's thresholds are entirely concrete and not extremely high.

evaluation be unspecified or low-quality

They are specified and as high-quality as you can get. (If there are better datasets let me know.)

I'm not saying it's perfect, but I wouldn't put them all in the same bucket. Meta's is very different from DeepMind's or xAI's.

though I don't think xAI took an official position one way or the other

I assumed almost everybody assumed xAI supported it since Elon did. I didn't bother pushing for an additional xAI endorsement given that Elon endorsed it.

It's probably worth mentioning for completeness that Nat Friedman funded an earlier version of the dataset too. (I was advising at that time, and my main recommendation was that it needed to be research-level, because they were focusing on Olympiad level.)

Also, I can confirm they aren't giving access to the mathematicians' questions to AI companies other than OpenAI, such as xAI.

and have clearly been read a non-trivial amount by Elon Musk

Nit: He heard this idea in conversation with an employee AFAICT.

Relevant: Natural Selection Favors AIs over Humans

universal optimization algorithm

Evolution is not an optimization algorithm (this is a common misconception discussed in Okasha, Agents and Goals in Evolution).
