Limerence Messes Up Your Rationality Real Bad, Yo

What was that supplement? Seems like a useful thing to have known if reproducible.

Conversation with Eliezer: What do you want the system to do?

Stampy's QA format might be a reasonable fit, given that we're aiming to become a single point of access for alignment.

FYI: I’m working on a book about the threat of AGI/ASI for a general audience. I hope it will be of value to the cause and the community

Nice! Feel free to use Stampy content. I hereby release you from the BY part of our copyright so you can use it without direct attribution. We're a project by Rob Miles's team working to create a single point of access for learning about alignment (basics, technical details, ecosystem, how to help, etc). By the time you publish we might be a good enough resource that you'll want to point your readers at us at the end of your book for further reading.

Slow motion videos as AI risk intuition pumps

That one requires login to view, which seems like a trivial inconvenience worth avoiding? 

Resources I send to AI researchers about AI safety

Amazing! Would you be happy for some of the content here to be used as a basis for Stampy answers?

AGI Ruin: A List of Lethalities

It's not just non-hand-codable, it is unteachable on-the-first-try because the thing you are trying to teach is too weird and complicated.

I have a terrifying hack, called the Universal Alignment Test, which seems like it might make it possible to extract an AI which would act in a CEV-like way, using only True Names which might plausibly be within human reach. I'm currently working on it with a small team of independent alignment researchers; feel free to book a call with me if you'd like to have your questions answered in real time. I have had "this seems interesting"-style reviews from the highest level people I've spoken to about it.

I failed to raise the idea with EY in 2015 at a conference, because I was afraid to be judged as a crackpot / seeming like I was making an inappropriate status claim by trying to solve too large a part of the problem. In retrospect this is a deeply ironic mistake to have made.

AGI Safety FAQ / all-dumb-questions-allowed thread

This post by the director of OpenPhil argues that even a human-level AI could achieve a decisive strategic advantage (DSA), with coordination.

AGI Safety FAQ / all-dumb-questions-allowed thread

Show a genuine keen interest in the things they have deep models of[1] first before bringing up alignment, unless they invite you to talk first. Steer towards deep conversation with some well-chosen questions[2], but be very open to having it about whatever they know most about rather than AI immediately. At some point, they are likely to ask about what you're interested in[3].

Then you have an A/B tested elevator pitch prepared, and adapted for your specific audience (ideally as few people as possible, which helps to lower the amount of in-the-moment status you need to spend on a brief weird-sounding monologue). Mine usually goes something like:

At some point, we will build artificial systems[4] which are more generally capable[5] than humans. At that point, they will tend to be the main drivers of their own development, resulting in a feedback cycle of recursive self-improvement called the intelligence explosion. What happens in the future will likely be determined by the values of the systems that emerge from this, and there is a global research effort to figure out how to make sure those values include humanity's well-being. I'm trying to contribute to that effort by x.

Then you let them steer and answer their questions as honestly and accurately as you can. If there's a lull in the conversation, bringing their attention to the arc of civilization, with accelerating change viewable in their lifetimes, as alluded to in The Most Important Century and Sapiens, is a good filler, but let them lead the conversation and politely (praising them for good questions!) give them quickfire answers. Be sure to flag any question you struggle to answer well for further research, and thank them for giving you a question which you don't have an answer to yet. Feel free to post it to Stampy if we're missing it from our canonical questions.

This approach has a very good rate of the other person walking away seeming to take the concerns seriously, and a fairly good rate of people later joining the effort. It does depend on you actually having good answers to hand to their first few "but why don't we just" questions, which means being fairly well read (or watching all of Rob Miles).

  1. Almost everyone has deep models of something. For example, a supermarket worker recently taught me about logistics and the changes to training needs and autonomy brought on by automation. And by learning the 5-10 minute version of everyone's deep models you become more intellectually awesome, which helps for all sorts of things.

  2. e.g. "What interests you?", or ideally just getting very curious about some aspect of a thing they've spent a lot of time on, usually work, studies, or a hobby.

  3. This means they're in a receptive state, having been heard and having explicitly opened the door to you sharing, so the normal memetic filters are lowered. And if they don't ask you, they're probably not someone who it would help to sell on alignment.

  4. Intentionally not using the word AI here, so they create a new mental category for the thing I'm describing rather than using their existing sci-fi bucket.

  5. Intentionally not using the word intelligence here, as that brings up associations which are generally unhelpful (elitism, inadequacy, etc).

AGI Safety FAQ / all-dumb-questions-allowed thread

One of the first priorities of an AI in a takeoff would be to disable other projects which might generate AGIs. A weakly superintelligent hacker AGI might be able to pull this off before it could destroy the world. Also, a fast takeoff could take less than months by some people's estimates.

And what do you think happens when the second AGI wins, then maximizes the universe for "the other AI was defeated"? Some serious unintended consequences, even if you could specify it well.
