This post has been written in collaboration with Iliad in service of one of Iliad's longer-term goals of understanding the simplicity bias of learning machines. Solomonoff induction is a general theoretical ideal for how to predict sequences that were generated by a computable process. It posits that in order to...
This post has been written in collaboration with Iliad in service of one of Iliad's longer-term goals of understanding the simplicity bias of learning machines. In this post, I give a self-contained treatment, including a proof, of the coding theorem.[1] Let x be a finite binary string and U a...
Recently, in a group chat with friends, someone posted this LessWrong post and quoted: > The group consensus on somebody's attractiveness accounted for roughly 60% of the variance in people's perceptions of the person's relative attractiveness. I answered that, embarrassingly, even after reading Spencer Greenberg's tweets for years, I don't...
I've recently completed the in-person ARENA program, which is a 5-week bootcamp teaching the basics of safety research engineering (with the 5th week being a capstone project). Sometimes, I talk to people who want to work through the program independently and who ask for advice. Even though I didn't attempt...
TL;DR There has been a lot of discussion on LessWrong about deceptive AI, much of which has been philosophical. We have now written a paper that proves that deception is one of two failure modes when using RLHF improperly. It's called “When Your AIs Deceive You: Challenges with...
Epistemic Status: I had the idea for the post a few days ago and quickly wrote it down while on a train. I'm very curious about other perspectives. TL;DR: The recent increased public interest in AI Safety will likely lead to more funding for and more researchers from academia. I...
Andrew Ng writes: > I'd like to have a real conversation about whether AI is a risk for human extinction. Honestly, I don't get how AI poses this risk. What are your thoughts? And, who do you think has a thoughtful perspective on how AI poses this risk that I...