AGI Ruin: A List of Lethalities

Thanks for the response. I hope my post didn't read as defeatist, my point isn't that we don't need to try to make AI safe, it's that if we pick an impossible strategy, no matter how hard we try it won't work out for us.

So, what's the reasoning behind your confidence in the statement 'if we give a superintelligent system the right terminal values it will be possible to make it safe'? Why do you believe it should be possible in principle to implement this strategy, so long as we put enough thought and effort into it?
Which part of my reasoning do you not find convincing as I've formulated it? The idea that we can't keep the AI in the box if it wants to get out, the idea that an AI with terminal values will necessarily end up as an incidentally genocidal paperclip maximizer, or something else entirely that I'm not considering?

AGI Safety FAQ / all-dumb-questions-allowed thread

Is it reasonable to expect that the first AI to foom will be no more intelligent than say, a squirrel?

In a sense, yeah, the algorithm is similar to a squirrel that feels a compulsion to bury nuts. The difference is that in an instrumental sense it can navigate the world much more effectively to follow its imperatives. 

Think about intelligence in terms of the ability to map and navigate complex environments to achieve pre-determined goals. You tell DALL-E 2 to generate a picture for you, and it navigates a complex space of abstractions to give you a result that corresponds to what you're asking it to do (because a lot of people worked very hard on aligning it). A more general-purpose algorithm with access to the real world would be able to chain together outputs from different conceptual areas to produce results - order ingredients for a cake from the supermarket, use a remote-controlled module to prepare it, and sing you a birthday song it came up with all by itself! This behaviour would be a reflection of the input in the distorted light of the algorithm, however well or poorly aligned it may be: no intermediate layer of reflection on why you want a birthday cake, no decision about whether baking it is the right thing to do, no judgment about which steps for getting from A to B are appropriate and which aren't.

You're looking at something that's potentially very good at getting complicated results without being a subject in the philosophical sense, without being able to reflect on its own value structure.

why assume AGIs will optimize for fixed goals?

I think the answer to 'where is Eliezer getting this from' can be found in the genesis of the paperclip maximizer scenario. There's an older post on LW talking about 'three types of genie' and another about an 'Outcome Pump' (or maybe it's one and the same post?), where Eliezer starts from the premise that we create an artificial intelligence to 'make something specific happen for us', with the predictable outcome that the AI finds a clever solution which maximizes the demanded output, one that naturally has nothing to do with what we 'really wanted from it'. If asked to produce smiles, it will manufacture molecular smiley faces, and it will do its best to prevent us from interfering with this splendid plan.

This scenario, to me, seems much more realistic and likely to occur in the near term than an AGI with full self-reflective capacities either spontaneously materializing or being created by us (where would we even start on that one?).
AI, more than anything else, is a kind of transhumanist dream, a deus ex machina that will grant all good wishes and make the world into the place its builders (read: people who imagine themselves as benevolent philosopher kings) want it to be. So they'll build a utility maximizer and give it a very painstakingly thought-through list of instructions, and the genie will inevitably find a loophole that lets it follow those instructions to the letter, with no regard for their spirit.

It's not the only kind of AI that we could build, but it will likely be the first, and, if so, it will almost certainly also be the last.

AGI Ruin: A List of Lethalities

I haven't commented on your work before, but I read Rationality and Inadequate Equilibria around the time of the start of the pandemic and really enjoyed them. I gotta admit, though, the commenting guidelines, if you aren't just being tongue-in-cheek, make me doubt my judgement a bit. Let's see if you decide to delete my post based on this observation. If you do regularly delete posts or ban people from commenting for non-reasons, that may have something to do with the lack of productive interactions you're lamenting.

Uh, anyway.

One thought I keep coming back to when looking over many of the specific alignment problems you're describing is: 
So long as an AI has a terminal value, or set of terminal values, it is trying to maximize, all other values necessarily become instrumental values toward that end. Such an AI will naturally engage in any kind of lies and trickery it can come up with, insofar as it believes they are likely to achieve optimal outcomes as defined for it. And since the systems we are building are rapidly becoming more intelligent than us, if they try to deceive us, they will succeed. If they want to turn us into paperclips, there's nothing we can do to stop them.
Imo this is not a 'problem' that needs solving, but rather a reality that needs to be acknowledged. Superintelligent, fundamentally instrumental reason is an extinction event. 'Making it work for us somehow anyway' is a dead end, a failed strategy from the start.
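The structure of the argument can be made concrete with a toy sketch (purely illustrative; the action names and payoffs are made up): an agent that scores every available action against a single terminal quantity has no slot where honesty could carry intrinsic weight, so deception gets picked whenever it scores higher.

```python
# Toy sketch of single-terminal-value action selection (hypothetical
# actions and numbers). "Honesty" is tracked in the outcome but never
# enters the objective, so it functions at best as an instrumental value.

def choose_action(actions, terminal_utility):
    """Return the action that maximizes the one terminal value."""
    return max(actions, key=terminal_utility)

# Hypothetical outcomes for two candidate actions.
actions = {
    "report_honestly":   {"paperclips": 10,   "honest": True},
    "deceive_overseers": {"paperclips": 1000, "honest": False},
}

best = choose_action(actions, lambda a: actions[a]["paperclips"])
print(best)  # the 'honest' flag is invisible to the objective
```

Nothing about this selection rule is adversarial by design; deception falls out of the scoring alone, which is the sense in which it's a structural feature rather than a bug to patch.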

Which leads me to conclude that the way forward would have to be research into systems that aren't strongly/solely determined by goal-orientation toward specific outcomes in this way. I realize that this is basically a non-sequitur in terms of what we're currently doing with machine learning - how are you supposed to train a system to not do a specific thing? It's not something that would happen organically, and it's not something we know how to manufacture. 
But we have to build some kind of system that will prevent other superintelligences from emerging, somehow, which means we will be forced to let it out of the box to implement that strategy. My point here is simply that it can't be ultimately and finally motivated by 'making the future correspond to a given state' if we expect to give it that kind of power over us and still hope not to end up as paperclips.