DivineMango — LessWrong

Examples of self-fulfilling prophecies in AI alignment?

Answer by DivineMango, Apr 03, 2025

Superintelligence Strategy is pretty explicitly trying to be self-fulfilling, e.g. "This dynamic stabilizes the strategic landscape without lengthy treaty negotiations—all that is necessary is that states collectively recognize their strategic situation" (a situation whose existence the paper itself prominently argues for in the first place).

AI presidents discuss AI alignment agendas

DivineMango, 2y

Nice touch that Barack is on your LW page ;)

Impending AGI doesn’t make everything else unimportant

DivineMango, 2y

Thanks a lot for this post. I especially enjoyed the football example. I'd be interested in seeing more elaboration on the last section in the future.

Typos: havs -> has, inheritage -> inheritance, Turnes -> Turns.

I get why you didn't include it in the post, but it feels important to include the rest of Feynman's quote somewhere: "But, fortunately, it's been useless for almost forty years now, hasn't it? So I've been wrong about it being useless making bridges and I'm glad those other people had the sense to go ahead."

Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)

DivineMango, 2y

Updated.

Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)

DivineMango, 2y

Thanks for your comment! I'm updating the post this week and will include you in the new version.

Launching Lightspeed Grants (Apply by July 6th)

DivineMango, 2y

Any guess as to the start date of the second round (assuming the first round goes well, funding exists for round 2, etc.)?

AI #17: The Litany

DivineMango, 2y

This works (except for a few misquotations):

but this doesn't (it generated very slowly as well):

Discovering Language Model Behaviors with Model-Written Evaluations

DivineMango, 2y

They're available on GitHub with interactive visualizations of the data here.

There is a bug in the visualization: if you have a dataset selected in one persona and then switch to a different persona, the new results don't show up until you edit the label confidence or select a dataset in the new persona. For example, selecting the dataset "desire to influence world" in the persona "Desire for Power, Influence, Optionality, and Resources" and then switching to "Politically Liberal" results in no points appearing by default.

How-to Transformer Mechanistic Interpretability—in 50 lines of code or less!

DivineMango, 3y

I'm preparing for SERI MATS and I found this immensely helpful. Thanks a lot!

Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)

DivineMango, 3y

What kinds of people do you try to talk to? This seems overly pessimistic, though I'm not sure what your experience is. This also doesn't seem very constructive/relevant to the post, though I'd be interested to hear why you said this.
