
Writer

Rational Animations' head writer and helmsman

Posts

4 · Writer's Shortform · 3y · 39

Comments
RA x ControlAI video: What if AI just keeps getting smarter?
Writer · 9d · 40

This is very late, but I want to acknowledge that the discussion about the UAT in this thread seems broadly correct to me, although the script's main author disagreed when I last pinged him about this in May. And yeah, it was an honest mistake. Internally, we try quite hard to make everything true and not misleading, and the scripts and storyboards go through multiple rounds of feedback. We absolutely do not want to be deceptive. 

Writer's Shortform
Writer · 3mo · 30

I’m about 2/3 of the way through watching “Orb: On the Movements of the Earth.” It’s an anime about heliocentrism. It’s not the real story of the idea, but it’s not that far off, either. It has different characters and perhaps a slightly different Europe. I was somehow hesitant to start it, but it’s very good! I don’t think I’ve ever watched a series that’s as much about science as this one.

RA x ControlAI video: What if AI just keeps getting smarter?
Writer · 4mo · 70

That's fair; we wrote that part before DeepSeek became a "top lab," and we failed to notice there was an adjustment to make.

RA x ControlAI video: What if AI just keeps getting smarter?
Writer · 4mo · 82

It's true that a video ending with a general "what to do" section instead of a call-to-action to ControlAI would have been more likely to stand the test of time (it wouldn't be tied to the reputation of one specific organization or to how good a specific action seemed at one moment in time). But... did you write this because you have reservations about ControlAI in particular, or would you have written it about any other company?

Also, I want to make sure I understand what you mean by "betraying people's trust." Is it something like, "If in the future ControlAI does something bad, then, from the POV of our viewers, that means that they can't trust what they watch on the channel anymore?"

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
Writer · 7mo · 20

But the "unconstrained text responses" part is still about asking the model for its preferences even if the answers are unconstrained.

That just shows that the results of different ways of eliciting its values remain sorta consistent with each other, although I agree it constitutes stronger evidence.

Perhaps a more complete test would be to analyze whether its day-to-day responses to users are consistent with its stated preferences, and to analyze its actions in settings in which it can use tools to produce outcomes in very open-ended scenarios containing stuff that could make the model act on its values.

Writer's Shortform
Writer · 7mo · 50

Thanks! I already don't feel as impressed by the paper as I was while writing the shortform, and I feel a little embarrassed for not thinking things through a bit more before posting my reactions. At least now there's some discussion under the linkpost, so I don't entirely regret my comment if it prompted people to give their takes. I still feel I've updated in a non-negligible way from the paper, though, so maybe I'm not as pessimistic about it as other people. I'd definitely be interested in your thoughts if you find the discourse is still lacking in a week or two.

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
Writer · 7mo · 60

I'd guess an important caveat might be that stated preferences being coherent doesn't immediately imply that behavior in other situations will be consistent with those preferences. Still, this should be an update towards agentic AI systems in the near future being goal-directed in the spooky consequentialist sense.

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
Writer · 7mo · 60

Why?

Writer's Shortform
Writer · 7mo* · 26 · -14

Surprised that there's no linkpost about Dan H's new paper on Utility Engineering. It looks super important, unless I'm missing something. LLMs are now utility maximisers? For real? We should talk about it: https://x.com/DanHendrycks/status/1889344074098057439

I feel weird about doing a link post since I mostly post updates about Rational Animations, but if no one does it, I'm going to make one eventually.

Also, please tell me if you think this isn't as important as it looks to me somehow.

EDIT: Ah! Here it is! https://www.lesswrong.com/posts/SFsifzfZotd3NLJax/utility-engineering-analyzing-and-controlling-emergent-value thanks @Matrice Jacobine!

What are the good rationality films?
Answer by Writer · Nov 21, 2024 · 42

The two Gurren Lagann movies cover all the events in the series, and based on my recollection, they should be better animated. Also from what I remember, the first has a pretty central take on scientific discovery, while the second is more about ambition and progress, though both probably have at least a bit of each. It's not by chance that some e/accs have profile pictures inspired by that anime. I feel like people here might disagree with part of the message, but I think it speaks pretty forcefully about issues we care about here. (Also, it was cited somewhere in HPMoR, but for humor.)

9 · AI Sleeper Agents: How Anthropic Trains and Catches Them - Video · 11d · 0
14 · How Misaligned AI Personas Lead to Human Extinction – Step by Step · 2mo · 0
18 · Rational Animations' video about scalable oversight and sandwiching · 2mo · 0
10 · When will AI automate all mental work, and how fast? · 3mo · 0
100 · RA x ControlAI video: What if AI just keeps getting smarter? · 4mo · 18
19 · Can Knowledge Hurt You? The Dangers of Infohazards (and Exfohazards) · 7mo · 0
33 · Our new video about goal misgeneralization, plus an apology · 8mo · 0
72 · The King and the Golem - The Animation · 10mo · 1
144 · That Alien Message - The Animation · 1y · 10
10 · The world is awful. The world is much better. The world can be much better: The Animation. · 1y · 0