LESSWRONG

DivineMango

Comments

Examples of self-fulfilling prophecies in AI alignment?
Answer by DivineMango · Apr 03, 2025 · 30

Superintelligence Strategy is pretty explicitly trying to be self-fulfilling, e.g. "This dynamic stabilizes the strategic landscape without lengthy treaty negotiations—all that is necessary is that states collectively recognize their strategic situation" (which this paper popularly argues exists in the first place)

AI presidents discuss AI alignment agendas
DivineMango · 2y · 10

Nice touch that Barack is on your LW page ;)

Impending AGI doesn’t make everything else unimportant
DivineMango · 2y · 10

Thanks a lot for this post. I especially enjoyed the football example. I'd be interested in seeing more elaboration on the last section in the future.

Typos: havs -> has, inheritage -> inheritance, Turnes -> Turns.

I get why you didn't include it in the post, but it feels important to include the rest of Feynman's quote somewhere: "But, fortunately, it's been useless for almost forty years now, hasn't it? So I've been wrong about it being useless making bridges and I'm glad those other people had the sense to go ahead."

Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)
DivineMango · 2y · 10

Updated.

Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)
DivineMango · 2y · 11

Thanks for your comment! I'm updating the post this week and will include you in the new version.

Launching Lightspeed Grants (Apply by July 6th)
DivineMango · 2y · 30

Any guess as to the start date of the second round (assuming the first round goes well, funding exists for round 2, etc.)?

AI #17: The Litany
DivineMango · 2y · 10

This works (except for a few misquotations):

but this doesn't (it generated very slowly as well):

Discovering Language Model Behaviors with Model-Written Evaluations
DivineMango · 2y · 10

They're available on GitHub with interactive visualizations of the data here.

There is a bug in the visualization where if you have a dataset selected in one persona, then switch to a different persona, the new results don't show up until you edit the label confidence or select a dataset in the new persona. For example, selecting dataset "desire to influence world" in persona "Desire for Power, Influence, Optionality, and Resources" then switching to "Politically Liberal" results in no points appearing by default.

How-to Transformer Mechanistic Interpretability—in 50 lines of code or less!
DivineMango · 2y · 10

I'm preparing for SERI MATS and I found this immensely helpful. Thanks a lot!

Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)
DivineMango · 2y · 10

What kinds of people do you try to talk to? This seems overly pessimistic, though I'm not sure what your experience is. This also doesn't seem very constructive/relevant to the post, though I'd be interested to hear why you said this.

Posts

264 · Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023) · 2y · 54
89 · Bruce Wayne and the Cost of Inaction · 3y · 5
8 · Approaches for collecting and analyzing data about yourself? · Q · 6y · 4
9 · How can I reframe my study motivation? · Q · 6y · 6