MichaelDickens

Comments

Thanks, that's useful info!

I thought you could post images by dragging and dropping files into the comment box (I seem to recall doing that in the past), but now it doesn't seem to work for me. Maybe that only works for top-level posts?

Is Claude "more aligned" than Llama?

Anthropic seems to be the AI company that cares the most about AI risk, and Meta cares the least. If Anthropic is doing more alignment research than Meta, do the results of that research visibly show up in the behavior of Claude vs. Llama?

I am not sure how you would test this. The first thing that comes to mind is to test how easily different LLMs can be tricked into doing things they were trained not to do, but I don't know if that's a great example of an "alignment failure". You could test model deception, but you'd need some objective standard on which to compare different models.

And I am not sure how much you should even expect the results of alignment research to show up in present-day LLMs.

Hmm I wonder if this is why so many April Fools posts have >200 upvotes. April Fools Day in cahoots with itself?

isn't your squiggle model talking about whether racing is good, rather than whether unilaterally pausing is good?

Yes, the model is more about racing than about pausing, but I thought it was applicable here. My thinking was that there is a spectrum of development speed with "completely pause" on one end and "race as fast as possible" on the other. Pushing more toward the "pause" side of the spectrum has the ~opposite effect as pushing toward the "race" side.

I wish you'd try modeling this with more granularity than "is alignment hard" or whatever

  1. I've never seen anyone else try to quantitatively model it. As far as I know, my model is the most granular quantitative model ever made. Which isn't to say it's particularly granular (I spent less than an hour on it) but this feels like an unfair criticism.
  2. In general I am not a fan of criticisms of the form "this model is too simple". All models are too simple. What, specifically, is wrong with it?

I had a quick look at the linked post and it seems to be making some implicit assumptions, such as

  1. the plan of "use AI to make AI safe" has a ~100% chance of working (the post explicitly says this is false, but then proceeds as if it's true)
  2. there is a ~100% chance of slow takeoff
  3. if you unilaterally pause, this doesn't increase the probability that anyone else pauses, doesn't make it easier to get regulations passed, etc.

I would like to see some quantification of the form "we think there is a 30% chance that we can bootstrap AI alignment using AI; a unilateral pause will only increase the probability of a global pause by 3 percentage points; and there's only a 50% chance that the 2nd-leading company will attempt to align AI in a way we'd find satisfactory; therefore we think the least-risky plan is to stay at the front of the race and then bootstrap AI alignment." (Or a more detailed version of that.)
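
To illustrate the kind of quantification I mean, here is a minimal Python sketch. The three input numbers are the hypothetical ones from the paragraph above; everything else (the variable names, treating a global pause as fully safe, and giving the 2nd-place lab the same 30% bootstrap odds) is an assumption I'm adding purely for illustration, not anything from the linked post.

```python
# Hypothetical sketch of the kind of quantification described above.
# The three input numbers come from the example in the comment; how they
# combine is made up for illustration.

p_bootstrap = 0.30        # chance "use AI to align AI" works for a safety-focused leader
p_pause_spreads = 0.03    # added chance a unilateral pause leads to a global pause
p_runner_up_tries = 0.50  # chance the 2nd-leading company attempts alignment satisfactorily

p_good_if_race = p_bootstrap

p_good_if_pause = (
    p_pause_spreads * 1.0                                      # global pause: assume safe
    + (1 - p_pause_spreads) * p_runner_up_tries * p_bootstrap  # runner-up leads instead
)

print(f"P(good outcome | race):  {p_good_if_race:.2f}")   # 0.30
print(f"P(good outcome | pause): {p_good_if_pause:.2f}")  # ~0.18
```

Under that (very crude) way of combining the numbers, racing comes out ahead, which is the shape of argument I'd like to see made explicit.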

I think it would probably be bad for the US to unilaterally force all US AI developers to pause if they didn't simultaneously somehow slow down non-US development.

It seems to me that to believe this, you have to believe that all four of the following are true:

  1. Solving AI alignment is basically easy
  2. Non-US frontier AI developers are not interested in safety
  3. Non-US frontier AI developers will quickly catch up to the US
  4. If US developers slow down, then non-US developers are very unlikely to also slow down—either voluntarily, or because the US strong-arms them into signing a non-proliferation treaty, or whatever

I think #3 is sort-of true and the others are probably false, so the probability of all four being simultaneously true is quite low.

(Statements I've seen from Chinese developers lead me to believe that they are less interested in racing and more concerned about safety.)
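
As a toy illustration of why the conjunction matters: assign each claim an individually plausible credence (the numbers below are made up, and the claims surely aren't fully independent), and the probability that all four hold at once gets small quickly.

```python
# Illustrative only: made-up credences for the four claims above, treated as
# independent, to show how a conjunction of individually plausible claims
# shrinks when multiplied together.
p_alignment_easy   = 0.3
p_others_unsafe    = 0.4
p_others_catch_up  = 0.7   # the "sort-of true" one
p_no_slowdown_else = 0.4

p_all_four = (p_alignment_easy * p_others_unsafe
              * p_others_catch_up * p_no_slowdown_else)
print(f"P(all four hold): {p_all_four:.2f}")  # ~0.03
```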

I made a quick Squiggle model on racing vs. slowing down. Based on my first-guess parameters, it suggests that racing to build AI destroys ~half the expected value of the future compared to not racing. Parameter values are rough, of course.
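
For anyone who doesn't want to click through, here is a stripped-down Python analogue of the shape of that model. The actual model is written in Squiggle with different (and uncertain) parameters; the placeholder values below are chosen only so the output matches the "~half the expected value" headline, and the variable names are mine.

```python
# A minimal analogue of the racing-vs-not-racing expected-value comparison.
# Placeholder point estimates, not the parameters of the actual Squiggle model.

p_doom_if_race = 0.65   # placeholder: chance of existential catastrophe if labs race
p_doom_if_slow = 0.30   # placeholder: chance of catastrophe if labs slow down

ev_race = 1 - p_doom_if_race   # expected value of the future, good future normalized to 1
ev_slow = 1 - p_doom_if_slow

print(f"Fraction of expected value lost by racing: {1 - ev_race / ev_slow:.0%}")  # 50%
```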

That's kind-of what happened with the anti-nuclear movement, but it ended up doing lots of harm because the things that could be stopped were the good ones!

The global stockpile of nuclear weapons is down 6x since its peak in 1986. It's hard to attribute causality, but if the anti-nuclear movement played a part in that, then I'd say it was net positive.

(My guess is that it's more attributable to the collapse of the Soviet Union than to anything else, but the anti-nuclear movement probably still played some nonzero role.)

Yeah, I actually agree with that. I don't think it was sufficient, just that it was pretty good. I wrote the comment too quickly without thinking about my wording.

Answer by MichaelDickens

I feel kind of silly about supporting PauseAI. Doing ML research or writing long, fancy policy reports feels high-status. Public protests feel low-status. I would rather not be seen publicly advocating for doing something low-status. I suspect a good number of other people feel the same way.

(I do in fact support PauseAI US, and I have defended it publicly because I think it's important to do so, but it makes me feel silly whenever I do.)

That's not the only reason why people don't endorse PauseAI, but I think it's an important reason that should be mentioned.

Well -- I'm gonna speak broadly -- if you look at the history of PauseAI, they are marked by belief that the measures proposed by others are insufficient for Actually Stopping AI -- for instance the kind of policy measures proposed by people working at AI companies isn't enough; that the kind of measures proposed by people funded by OpenPhil are often not enough; and so on.

They are correct as far as I can tell. Can you identify a policy measure proposed by an AI company or an OpenPhil-funded org that you think would be sufficient to stop unsafe AI development?

I think there is indeed exactly one such policy measure: SB 1047, which was supported by the Center for AI Safety (an OpenPhil-funded org, IIRC). Most big AI companies lobbied against it, and Anthropic opposed the original, stronger version and got it reduced to a weaker and probably less safe one.

When I wrote about where I was donating in 2024, I went through a bunch of orgs' policy proposals and explained why they appeared deeply inadequate. Some specific relevant parts: 1, 2, 3, 4

Edit: Adding some color so you don't have to click through: when I say the proposals I reviewed were inadequate, I mean they said things like (paraphrasing) "safety should be done on a completely voluntary basis with no government regulations" and "companies should have safety officers, but those officers should not have final say on anything"; or they simply did not address x-risk at all; or they made harmful proposals like "the US Department of Defense should integrate more AI into its weapon systems" or "we need to stop worrying about x-risk because it's distracting from the real issues".

If you look at the kind of claims that PauseAI makes on their risks page, you might believe that some of them seem exaggerated, or that PauseAI is simply throwing every negative thing they can find about AI into a big list to make it seem bad. If you think that credibility is important to the effort to pause AI, then PauseAI might seem very careless about truth in a way that could backfire.

A couple notes on this:

  • AFAICT PauseAI US does not do the thing you describe.
  • I've looked at a good amount of research on protest effectiveness. There are many observational studies showing that nonviolent protests are associated with preferred policy changes / voting patterns, and ~four natural experiments. If protests backfired for fairly minor reasons like "their website makes some hard-to-defend claims" (contrasted with major reasons like "the protesters are setting buildings on fire"), I think that would show up in the literature, and it doesn't.