All of RS's Comments + Replies

Soares, Tallinn, and Yudkowsky discuss AGI cognition
RS · 7mo · Ω24

I think this came up in the previous discussion as well: an AI that was able to competently design a nanofactory would likely have the capability to manipulate humans at a high level too. For example:

Then when the system generalizes well enough to solve domains like "build a nanosystem" - which, I strongly suspect, can't be solved without imaginative reasoning because we can't afford to simulate that domain perfectly and do a trillion gradient descent updates on simulated attempts - the kind of actions or thoughts you can detect as bad, that might have

[…]
3 · Vaniver · 7mo
People often refer to this idea as a "lonely engineer", tho I see only some discussion of it on LW (like here [https://www.lesswrong.com/posts/3uHgw2uW6BtR74yhQ/new-paper-corrigibility-with-utility-preservation?commentId=WxmtC4jGwArr7czQj] ).
7 · Steven Byrnes · 7mo
If you want your AGI not to manipulate humans, you can have it (1) unable to manipulate humans, or (2) not motivated to manipulate humans. These are less orthogonal than they seem: an agential AGI can become skilled in domain X by being motivated to get skilled in domain X (and thus spending time learning and practicing X). I think the thing that happens "by default" is that the AGI has no motivations in particular, one way or the other, about teaching itself how to manipulate humans. But the AGI has motivation to do something (earn money or whatever, depending on how it was programmed), and teaching itself how to manipulate humans is instrumentally useful for almost everything, so then it will do so.

I think what happens in some people with autism [https://www.lesswrong.com/posts/pfoZSkZ389gnz5nZm/the-intense-world-theory-of-autism] is that "teaching myself how to manipulate humans, and then doing so" is not inherently neutral, but rather inherently aversive, so much so that they don't do it (or do it very little) even when it would in principle be useful for other things that they want to do. That's not everyone with autism, though. Other people with autism do in fact teach themselves how to manipulate humans reasonably well, I think. And when they do so, I think they do so using their "core of generality", just like they would teach themselves to fix a car engine. (This is different from neurotypical people, for whom a bunch of specific social instincts are also involved in manipulating people.) (To be clear, this whole paragraph is controversial / according-to-me.)

Back to AGI, I can imagine three approaches to a non-human-manipulating AI.

First, we can micromanage the AGI's cognition. We build some big architecture that includes a "manipulate humans" module, and then we make the "manipulate humans" module return the wrong answers all the time, or just turn it off. The problem is that the AGI also presumably needs some "core of generality" module that the AGI c[…]
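A minimal toy sketch of that first approach, purely to make the shape of the idea concrete (all module names and scores here are hypothetical; this is nothing like a real architecture):

```python
from dataclasses import dataclass

# Toy illustration of "micromanage the AGI's cognition": factor the system into
# modules and disable (or corrupt) the "manipulate humans" module so its plans
# can never win. All names and numbers are made up for illustration.

@dataclass
class ModuleOutput:
    plan: str
    score: float  # how promising the module rates its own plan

def nanosystem_design_module(task: str) -> ModuleOutput:
    return ModuleOutput(plan=f"design via simulation: {task}", score=0.8)

def manipulate_humans_module(task: str, disabled: bool = True) -> ModuleOutput:
    if disabled:
        # The module is switched off, so manipulation plans never surface.
        return ModuleOutput(plan="<module disabled>", score=float("-inf"))
    return ModuleOutput(plan=f"persuade the operators about: {task}", score=0.9)

def choose_plan(task: str) -> ModuleOutput:
    # The problem flagged above: the AGI also presumably needs a "core of
    # generality" module, which a clean factorization like this doesn't capture.
    candidates = [nanosystem_design_module(task), manipulate_humans_module(task)]
    return max(candidates, key=lambda m: m.score)

print(choose_plan("build a nanosystem"))
```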
How common are abiogenesis events?
Answer by RS · Nov 27, 2021 · 4

I read The Vital Question by Nick Lane a while ago, and it was the most persuasive argument I've seen on abiogenesis; it may be of interest to you. The argument was that abiogenesis could be fairly common, based on naturally occurring proton gradients.

EfficientZero: How It Works

Thanks! This was a super informative read; it helped me grok a few things I didn't before.

The MCTS naming is surprising. Seems strange they're sticking with that name.
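For reference, a rough sketch of the pUCT-style child selection that the MuZero paper describes (and which, as I understand it, EfficientZero builds on): scoring is driven by a learned policy prior and value estimate rather than random Monte Carlo rollouts, which is the usual quibble with the name. The constants are the ones reported in the MuZero paper; the data layout below is made up:

```python
import math

# Rough sketch of pUCT child selection as described in the MuZero paper.
# Children are scored by a learned value estimate Q plus an exploration bonus
# weighted by the policy prior P; no random rollouts are involved.
C1, C2 = 1.25, 19652.0  # exploration constants from the MuZero paper

def puct_score(q, prior, child_visits, parent_visits):
    exploration = prior * math.sqrt(parent_visits) / (1 + child_visits)
    exploration *= C1 + math.log((parent_visits + C2 + 1) / C2)
    return q + exploration

def select_child(children):
    """children: list of dicts with 'q', 'prior', 'visits' (hypothetical layout)."""
    parent_visits = sum(c["visits"] for c in children)
    return max(children, key=lambda c: puct_score(c["q"], c["prior"],
                                                  c["visits"], parent_visits))

# Example: selection trades off value against the prior-weighted exploration bonus.
children = [
    {"q": 0.10, "prior": 0.6, "visits": 10},
    {"q": 0.30, "prior": 0.3, "visits": 2},
    {"q": 0.00, "prior": 0.1, "visits": 0},
]
print(select_child(children))
```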

BTW, does anyone know of any similar write-ups to this on transformers?

Yudkowsky and Christiano discuss "Takeoff Speeds"

Were you surprised by the direction of the change or the amount?

4 · Daniel Kokotajlo · 7mo
My prediction was mainly about polarization rather than direction, but I would have expected the median or average not to move much, and to be slightly more likely to move towards Paul than towards Yudkowsky, I think. I don't think I was very surprised.
Dubai, UAE – ACX Meetups Everywhere 2021

I still plan to be there this Friday, so just in case anyone is interested and reading this: I encourage you to show up.