Neel Nanda


My Overview of the AI Alignment Landscape


It’s Probably Not Lithium

I pretty strongly disagree with your takes in that Twitter thread (though agree on the object level that you offered <$1000 too). The core point of "we want to find the truth here, and creating incentives against finding the truth seems not worth it if it could damage truth seeking" seems obviously reasonable to me. This incentive exists even at smaller amounts of money.

I don't think things like pre-registration are good solutions to this. A clinical trial can maybe register clear rules to follow and success criteria, in a way that "try to do research + literature reviews on a hard question and understand it better" can't. (Also, clinical trials obviously have major biases due to publication incentives, which make their research much lower quality.)

In order for not taking the bet to be a meaningful downwards update, you need to be able to fully decouple "making an epistemic bet given your best guess about reality" from all of the second-order effects of how the bet affects other relevant things. I think the response "I am not capable of this decoupling, and so refusing this bet is not a strong statement about my epistemic beliefs or confidence in them" is extremely reasonable.

Announcing the LessWrong Curated Podcast

I'm super excited about this! I find it much lower effort to consume audio content than text, and am a big fan of the SSC podcast. I expect this to significantly increase the number of curated LW posts I read.

Where I agree and disagree with Eliezer

I broadly agree with this much more than Eliezer's and think this did a good job of articulating a bunch of my fuzzy "this seems off". Most notably, Eliezer underrates the importance and tractability of interpretability, and overrates the discontinuity of AI progress.

Intentionally Making Close Friends

Thanks a lot for the kind words! Good luck :)

Is AI Progress Impossible To Predict?

How much does this change when you remove the big outlier in the top left?

A Longlist of Theories of Impact for Interpretability

Note that for interpretability to give you information on where you are relative to your competitors, you need both for the tools to exist and for AI companies to use the tools and publicly release the results. It's pretty plausible to me that we get the first but not the second!

A Longlist of Theories of Impact for Interpretability

Honestly, I don't understand ELK well enough (yet!) to meaningfully comment. That one came from Tao Lin, who's a better person to ask.

Beyond micromarriages

Great point! (Though I may just enjoy this post because of the beautiful pun of Quality Adjusted Wife Years and microwives... Sadly I can't think of a similarly good pun with flipped gender roles!)

[Beta Feature] Google-Docs-like editing for LessWrong posts
(c) Google Docs does not support LaTeX

I'd make this point much more prominent; I almost missed it! This point changes the feature from 'likely much worse than Google Docs + copying later' to 'actually probably worth using' for me.
