Staying Sane While Taking Ideas Seriously


I can confirm that Nate is not backdating memories—he and Eliezer were pretty clear within MIRI at the time that they thought Sam and Elon were making a tremendous mistake and that they were trying to figure out how to use MIRI's small influence within a worsened strategic landscape.

You were paying more attention than me (I don't follow anyone who engages with him a lot, so I maybe saw one of his tweets a week). I knew of him as someone who had been right early about COVID, and I also saw him criticizing the media for some of the correct reasons, so I didn't write him off just because he was obnoxious and a crypto fanatic.

The interest rate thing was therefore my Igon Value moment.

Balaji treating the ratio between 0.1% interest and 4.75% interest as deeply meaningful is so preposterous that I'm going to stop paying attention to anything he says from here on out.

I can imagine this coming from the equivalent of "adapt someone else's StackOverflow code" level capability, which is still pretty impressive. 

In my opinion, the scariest thing I've seen so far is coding Game Of Life Pong, which doesn't seem to resemble any code GPT-4 would have had in its training data. Stitching those things together means coding for real for real.

Sam's real plan for OpenAI has never changed, and has been clear from the beginning if you knew about his and Elon's deep distrust of DeepMind:

  1. Move fast, making only token efforts at incorporating our safety team's work into our capabilities work, in order to get way ahead of DeepMind. (If that frustration makes our original safety team leave en masse, no worries, we can always hire another one.)
  2. Maybe once we have a big lead, we can figure out safety.

Kudos for talking about learning empathy in a way that seems meaningfully different and less immediately broken than adjacent proposals.

I think what you should expect from this approach, should it in fact succeed, is not nothing- but still something more alien than the way we empathize with lower animals, let alone higher animals. Consider the empathy we have towards cats... and the way it is complicated by their desire to be a predator, and specifically to enjoy causing fear/suffering. Our empathy with cats doesn't lead us to abandon our empathy for their prey, and so we are inclined to make compromises with that empathy.

Given better technology, we could make non-sentient artificial mice that are indistinguishable by the cats (but their extrapolated volition, to some degree, would feel deceived and betrayed by this), or we could just ensure that cats no longer seek to cause fear/suffering.

I hope that humans' extrapolated volitions aren't cruel (though maybe they are when judged by Superhappy standards). Regardless, an AI that's guaranteed to have empathy for us is not guaranteed, and in general quite unlikely, to have no other conflicts with our volitions; and the kind of compromises it will analogously make will probably be larger and stranger than the cat example.

Better than paperclips, but perhaps missing many dimensions we care about.

Very cool! How does this affect your quest for bounded analogues of Löbian reasoning?

I used to believe, as do many Christians, that an open-hearted truthseeker will become convinced of the existence of the true God once they are exposed. To say otherwise makes missionary work seem rather manipulative (albeit still important for saving souls). More importantly, the principle is well attested in Christian thought and in the New Testament (Jesus with Nicodemus, Paul with the Athenians, etc).

There are and have been world religions that don't evangelize because they don't have the same assumption, but Christianity in particular is greatly wounded if that assumption proves false.

I have not read the book but I think this is exactly wrong, in that what happens after the ??? step is that shareholder value is not maximized.

I think you misinterpreted the book review: Caroline was almost surely making a Underpants Gnomes reference, which is used to indicate that the last thing does not follow in any way from the preceding.

Load More