I think the reward tampering paper has a pretty good description of the various failure modes of wireheading. Though I guess it would be nice to have something like the Goodhart Taxonomy post, but for reward tampering/wireheading. 

Is it possible to do the same with replies to comments? 

Answer by Algon, Nov 23, 2022

Which equation do you want to prove? The first part of 5.8 follows from the second part of 5.8 and 5.7 via recursion. You can get the reverse implication via induction. 

More generally, when dealing with a problem, always be concrete. Substitute some values in, starting with the simplest cases, until you get something you can reason about and prove an analogous statement for. In this case, just let n=2 and expand out the terms so you're not hiding the detail you need behind an abstraction. 

Also, this is not the right place to ask these questions. You want to go to the Maths Stackexchange. And please, provide enough information, stated precisely, so that people can answer your question. 

Slower takeoff causes shorter timelines ... Moreover slower takeoff correlates with longer timelines

Uh, unless I'm quite confused, you mean faster takeoff causes shorter timelines. Right?

A video on the geometric derivative by the ever excellent Michael Penn:

The geometric derivative is the instantaneous exponential growth rate, i.e. $f^*(x) = \lim_{h \to 0} \left( \frac{f(x+h)}{f(x)} \right)^{1/h} = e^{f'(x)/f(x)}$, where $f^*$ is the geometric derivative. 
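As a rough numerical sanity check of that idea, here is a minimal sketch assuming the standard definition of the geometric derivative as the limit of (f(x+h)/f(x))^(1/h), which equals exp(f'(x)/f(x)) for positive, differentiable f. The function names and the step size h are illustrative choices of mine, not anything from the video.

```python
import math


def geometric_derivative(f, x, h=1e-6):
    """Approximate the geometric derivative of f at x.

    Uses a small finite h in place of the limit h -> 0 of
    (f(x + h) / f(x)) ** (1 / h). Assumes f is positive near x.
    The default h is an arbitrary illustrative choice.
    """
    return (f(x + h) / f(x)) ** (1.0 / h)


def exp_growth(x):
    # f(x) = e^(2x): f'(x)/f(x) = 2 everywhere, so the geometric derivative is e^2.
    return math.exp(2.0 * x)


def square(x):
    # f(x) = x^2: f'(x)/f(x) = 2/x, so the geometric derivative is e^(2/x).
    return x * x


approx_exp = geometric_derivative(exp_growth, 1.0)
approx_square = geometric_derivative(square, 3.0)

# Compare against the closed form exp(f'(x) / f(x)).
exact_exp = math.exp(2.0)
exact_square = math.exp(2.0 / 3.0)
```

For pure exponential growth the geometric derivative is constant, in the same way the ordinary derivative is constant for linear growth, which is what the first example is meant to show.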

Eh. I think this framing isn't cutting reality at its joints. To the extent it does describe something real, it seems to be pointing towards that old rift between Christiano and Yud. But, like, you can just call that prosaic AI safety vs sharp-left-turn AI safety. And, you know, I don't think we've got a great name for this stuff. But the other stuff Scott's talking about doesn't quite fit in with that framing. 

Like the stuff about getting mainstream scientists on board. Everyone in AI safety would like it if the possibility of X-risks from AI was taken seriously; it's just that they differ in what they think the shape of the problem is, and as such have different opinions on how valuable the work being done on prosaic ML safety stuff is. 

I think most people in the AI safety community are leery about outreach towards the public though? Like, in terms of getting governments on board, sure. In terms of getting joe-average to understand this stuff, well, that stinks too much of politics and seems like something which could go wildly wrong. 

Did you edit this post? I could have sworn it wasn't this long, or this clear, earlier on.

That's a shame. LW readers seem to be complaining about being inundated with AI content, so I suspect regular injections of progress studies content would be a breath of fresh air. 

I can't see the images on Firefox, Opera or Edge. Cool post though.

Usually I bounce off of Reframing Superintelligence and your recent work. I don't know what's different about this article, but I managed to understand what you were getting at without having to loop over each paragraph five times. It's like I'm reading Drexler circa 1992, and I love it. 

RE simulators: Isn't the whole problem with simulators that we're not sure how to extract the information we want from the system? If we knew how to get it to simulate alignment researchers working on the problem for a long time until it gets solved, then that seems like a major alignment advance. Similarly, with my hazy conception of narrow AI systems, I have the impression that getting weak AI systems to do substantial alignment work would constitute a major alignment advance in itself. I guess I want to know how much work you estimate is needed before we could design a more detailed version of what you've outlined and expect it to work.
