Raemon

I've been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I've been interested in improving my own epistemic standards and helping others to do so as well.

Sequences

The Coordination Frontier
Privacy Practices
The LessWrong Review
Keep your beliefs cruxy and your frames explicit
LW Open Source Guide
Tensions in Truthseeking
Project Hufflepuff
Rational Ritual
Drawing Less Wrong

Comments

ShowMeTheProbability's Shortform

Becoming capable of building such a test is essentially the entire field of AI alignment. (Yes, we don't have the ability to build such a test, and that's bad, but the difficulty lives in the territory. MIRI's previously stated goals were specifically to become less confused.)

chinchilla's wild implications

Yeah a few people have also brought up this concern recently. Will think about it.

chinchilla's wild implications

Something I'm unsure about (commenting from my mod-perspective but not making a mod pronouncement) is how LW should relate to posts that lay out ideas that may advance AI capabilities. 

My current understanding is that all major AI labs have already figured out the Chinchilla results on their own, but that younger or less in-the-loop AI orgs may have needed to run experiments that took a couple months of staff time. This post was one of the most-read posts on LW this month, and was shared heavily around Twitter. It's plausible to me that spreading these arguments speeds up AI timelines by 1-4 weeks on average.

It seems important to be able to talk about that and model the world, but I'm wondering if posts like this should live behind a "need to log-in" filter, maybe with a slight karma-gate, so that the people who end up reading it are at least more likely to be plugged into the LW ecosystem and are also going to get exposed to arguments about AI risk.

nostalgebraist, I'm curious how you would feel about that.

Humans provide an untapped wealth of evidence about alignment

Curated. I'm not sure I endorse all the specific examples, but the general principles make sense to me as considerations to help guide alignment research directions.

Abstracting The Hardness of Alignment: Unbounded Atomic Optimization

FYI, I've found this concept useful in thinking, but I think "atomic" is a worse word than just saying "non-interruptible". When I'm explaining this to people I just say "unbounded, uninterruptible optimization". The word "atomic" only seems to serve to make people say "what's that?", and then I say "uninterruptible".

What is an agent in reductionist materialism?
Answer by Raemon, Aug 13, 2022

My impression is that a ton of work at MIRI (and some related research lines in other places) went into answering this question, and indeed, no one knows the answer very crisply right now and yup that's alarming. 

See John Wentworth's post on Why Agent Foundations? An Overly Abstract Explanation, which discusses the need to find the True Name of agents.

(Also, while I agree agents are "more mysterious than rocks or The Odyssey", I'm actually confused about why circularity is the particular problem here. Why doesn't The Odyssey also run into the Abstraction for Whom problem?)

DeepMind alignment team opinions on AGI ruin arguments

Not sure if this was intentional, but the post title and opening paragraph both say "DM" instead of "DeepMind", which seemed kinda confusing.

Seriously, what goes wrong with "reward the agent when it makes you smile"?

I also found this a good exercise in deliberate questioning/boggling.

From 2010 to 2014, when I was first forming my opinions on AI, it was really frustrating that anyone who objected to the basic AI arguments just... clearly hadn't been paying attention at all and didn't understand the basic arguments.

Perfect Predictors
Answer by Raemon, Aug 12, 2022

My understanding is that perfect-predictor decision-theory thought experiments are just simplified versions of "pretty good predictor" thought experiments. Like, if you know that Omega is 90% accurate, it's probably still better to one-box.
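
As a minimal sketch of that arithmetic, assuming the standard Newcomb payoffs ($1,000,000 in the opaque box if Omega predicts one-boxing, $1,000 in the transparent box) and reading "90% accurate" as the probability that Omega predicted your actual choice:

```python
# Illustrative expected-value comparison for Newcomb's problem with an
# imperfect predictor. Payoffs and the evidential reading of "90% accurate"
# are assumptions for the sketch, not part of the original comment.

ACCURACY = 0.9          # probability Omega predicted your actual choice
BIG_BOX = 1_000_000     # opaque box payoff if one-boxing was predicted
SMALL_BOX = 1_000       # transparent box payoff, always available

# One-boxing: you get the big box iff Omega (correctly) predicted one-boxing.
ev_one_box = ACCURACY * BIG_BOX

# Two-boxing: you always get the small box, plus the big box only in the
# cases where Omega wrongly predicted one-boxing.
ev_two_box = SMALL_BOX + (1 - ACCURACY) * BIG_BOX

print(f"EV(one-box) = ${ev_one_box:,.0f}")   # $900,000
print(f"EV(two-box) = ${ev_two_box:,.0f}")   # $101,000
```

Under those assumptions, one-boxing has an expected value of $900,000 versus $101,000 for two-boxing, so the "pretty good predictor" version preserves the perfect-predictor verdict.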

Introducing Pastcasting: A tool for forecasting practice

Minor bug/sadness report: I did the first question, then it prompted me to log in before revealing the answer, and AFAICT the question-answer didn't get recorded and now I can't find that question again and am curious and sad. 
