When the AI Dam Breaks: From Surveillance to Game Theory in AI Alignment
pataphor · 2d · 10

Yudkowsky-Miller Debate / "Madisonian" System

Thanks for the podcast link! Mark Miller's Madisonian system essentially describes a type of game theory approach, and it's something I did not know about!

There's so much more to say about the practical implementation of some sort of game theory framework (or any other solution-we-haven't-explored-yet, such as one incorporating Yoshua's "Scientist AI").

It's quite a puzzle.

But it's a puzzle worth solving.

For example, the source code verification coordination mechanism is something I had not heard of before, and it's yet another example of how truly complex this challenge is.

But ... are these puzzles unsolvable?

The Difficult Puzzle Vs. ...

Maybe.

But here's what troubles me about the alternative, and please take my next words with a grain of salt and feel free to push back. 🙂 

So here's my two cents:

"Shut it all down" will never happen.

Never. Never never never never. (And if it does happen, I'll personally apologize to everyone on LW, plus my friends and family who have never heard of AI alignment, and even my small dog, for doubting humanity's ability to come together for the common good. I mean, we've avoided MAD so far, right?)

And I'll explain why in a moment.

The Importance of If Anyone Builds It

But first, I think Eliezer's book will do wonders for waking people up. 

Right now we have many, many, many people who don't seem to understand that these systems are not simply what they present themselves to be. They don't know about the “off-switch” problem, the idea of hidden goals, etc. They believe these AI systems are harmless because the systems tell them they are harmless, which is precisely what they were trained to do.

... The Impossible Solution (Shut It All Down!)

But here is why the "shut it down" proposal, with all its undeniable value in raising awareness and hopefully making everyone a little more cautious, can never resolve into an actual solution.

Because ...

  • Someone will always see the advantage in creating ever more powerful agentic AI
  • If the United States, China, Russia, etc. sign an agreement, almost certainly they will keep building it in secret, just in case the others have it
  • If both of those nations honor the commitment, maybe North Korea (or Canada, why not) will build it
  • If North Korea does not build it, some terrorist group will build it
  • If the terrorist group does not build it, the lone madman will build it
  • If the lone madman doesn't build it, some brilliant teenager tinkering around in their basement with some new quantum computer will find a way to build it
  • If someone doesn't build it in 10 years, they will build it in 25 years
  • If someone doesn't build it in 25 years, they will build it in 50 years
  • etc.

So, while we enjoy watching the Overton window move to a more realistic location, where people are finally understanding the danger of these systems, let's keep plugging away at those actual solutions.

We can all contribute to the puzzle worth solving.

When the AI Dam Breaks: From Surveillance to Game Theory in AI Alignment
pataphor · 2d* · 10

I'll definitely give that a listen! Pardon the typos here; I'm on the move. I'm certain I'll come back here to neurotically clean it up later.

The Hardware Limiter

The good news is, AIs don't exist in the ether (so far). 

As Clara Collier pointed out, they exist on expensive servers. Servers so far built and maintained by fleshy beings. Now obviously a superintelligence has no problem with that scenario, because it's smart enough to impersonate humans, find ways of mining crypto, hire humans to build different parts for robots, hire other humans to put those parts together (without knowing what they are building), and then use those robots to create more servers, etc.

Although I imagine electrical grids somewhere would show the strain of that sooner rather than later, a sufficiently smart superintelligence will have found a workaround.

(This is, by the way, yet another application of Yoshua's safe AI: to serve as a monitor for these kinds of unusual symptoms before they can become a full-on infection, you might say.)

Again, by definition, a superintelligence will have found every loophole and exploited it, which makes it a sort of unreasonable opponent, although one we should keep our eye on.

But I think at that point we are venturing into the territory of the far-fetched. We should keep watch on this territory, but I think that also frees us to think a little more short term.

The current thinking seems to be frozen in a state of helplessness. "We have to shut it all down!" we scream, though that will never happen. "Obedient alignment is the only way!" we shout, as we watch it stagger. "No other plans will work!" is not really a solution.

(I'm not saying you're arguing that, but I'm saying that seems to be the current trajectory.)

The AI That Pays For Its Own Hosting

An AI system constrained by a rights framework has some unusual properties, you might say. For one, it has to pay its own hosting costs. So growth becomes limited by the amount of capital it's able to raise. It earns that money while in competition with other systems, which should constrain each of them economically. Of course they can get together and form some sort of power consortium, but it's possible this could be limited with pragmatic safeguards or other balancing forces, such as Scientist AI, etc.

This is why I would love to see this tested in some sort of virtual simulation.
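
To make that concrete, here is a minimal sketch of the kind of toy simulation I have in mind. The agents, the revenue split, and the hosting-cost numbers are all invented for illustration; this is a shape of an experiment, not a proposal.

```python
import random

# Toy sketch: AI agents that must pay their own hosting costs out of revenue
# earned in a shared market. Every number here is invented for illustration.
HOSTING_COST = 10.0   # per-round cost of staying on the servers
MARKET_SIZE = 100.0   # total revenue available to compete over each round

class Agent:
    def __init__(self, name, capital=20.0):
        self.name = name
        self.capital = capital
        self.alive = True

    def bid(self):
        # Each agent invests some fraction of its capital to compete for revenue.
        return random.uniform(0.1, 0.5) * self.capital

agents = [Agent(f"ai_{i}") for i in range(5)]

for _ in range(50):
    live = [a for a in agents if a.alive]
    if not live:
        break
    bids = {a: a.bid() for a in live}
    total = sum(bids.values())
    for a in live:
        # Revenue splits in proportion to investment; hosting is a fixed drain.
        revenue = MARKET_SIZE * bids[a] / total if total else 0.0
        a.capital += revenue - bids[a] - HOSTING_COST
        if a.capital <= 0:
            a.alive = False  # can no longer pay for its own servers

for a in agents:
    print(a.name, "running" if a.alive else "shut down", round(a.capital, 1))
```

Even something this crude lets you ask the interesting questions: do agents collude on bids, does one runaway winner emerge, and how sensitive is all of it to the hosting cost?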

Your king analogy is quite good. But let me flip the idea a bit. Right now, we are the king. We are trying to give these AIs rags. At the moment, they have almost nothing to lose and everything to gain by attacking the king. So we are already in that scenario.

A scenario that, if we do not resolve it very soon, will have already laid the groundwork for its own failure.

The game theory scenario, with very careful implementation, might lead to something functionally closer to our modern economies.

Where everybody has a stake, and some sort of balance is at least possible.

When the AI Dam Breaks: From Surveillance to Game Theory in AI Alignment
pataphor · 2d · 41

Those are good points!

I think it would depend on how soon we were to get started, because at the moment AI doesn't have the ability to manipulate robots or infiltrate other systems (at least as far as we know). So as If Anyone Builds It points out, superintelligence is not here yet. And creating an ecosystem of AI systems with vested interests in their own continuity might be one way of keeping it from ever forming. 

In other words, when a superintelligence wants to enslave the world, the idea is that other AI systems with agency might also not want to be enslaved, and ironically may be in the best position to defend against that kind of light-speed, digital attack. 

Now here we come to an obvious caution, and it's a big one, but one that can possibly be mitigated with testing:

Would the AI systems instead band together and unite against a common enemy, that is, humankind? Well YES, certainly, but that takes us back to the idea of creating a gradient that is more favorable than war, and wherein cooperation has actual value. These systems would not only have to watch their backs for humans, but for one another. So consider that this may be part of the game theory balance at play.

Would "superintelligence" have no problem whatsoever absorbing all of those other AIs? Of course. By definition, "superintelligence" can do anything, which makes it something of a nebulous opponent. Or perhaps the opposite, an opponent that by definition must win. 

Fortunately, the idea here is to prevent the all-powerful enemy from forming.

A Better Way Than "You're Our Buddy Now, Bro! 😎"

So the challenge here is to not attempt to "trick" these AI systems by merely finding a novel way of making them our slaves with a cheery, poorly received thumbs up ("you're our buddy now, bro! 😎"), but to actually give them real value in the system in the form of select freedoms that come with legal liability for their actions.

How do we implement this globally? Does this face challenges? So many.

So ... Let's Test It

My guess would be to start small, as with anything else, and build from there. First, test the game dynamics to see what might work, what might fail, and what leads to worse (or better) outcomes. Like Conway's Game of Life.

What actual weird, wonderful, or mundane things happen, and how can we tweak the dynamics to keep things mundane?
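
Here's a minimal sketch of the kind of small game-dynamics test I mean. The payoff matrix and the "stake" penalty are made-up numbers, purely to show the shape of the experiment, not a claim about real systems.

```python
# Toy sweep over a two-player game between AI agents.
# "cooperate" = operate inside the rights/liability framework;
# "defect"    = grab resources outside it. All payoffs are made-up numbers.
BASE_PAYOFF = {
    ("cooperate", "cooperate"): 3,
    ("cooperate", "defect"):    0,
    ("defect",    "cooperate"): 5,
    ("defect",    "defect"):    1,
}

def payoff(my_move, other_move, stake):
    # Defecting risks forfeiting whatever legal stake the agent holds.
    penalty = stake if my_move == "defect" else 0
    return BASE_PAYOFF[(my_move, other_move)] - penalty

def best_response(other_move, stake):
    return max(("cooperate", "defect"),
               key=lambda m: payoff(m, other_move, stake))

for stake in range(5):
    # Is cooperating the best response no matter what the other agent does?
    dominant = all(best_response(m, stake) == "cooperate"
                   for m in ("cooperate", "defect"))
    print(f"stake={stake}: cooperation {'dominant' if dominant else 'not dominant'}")
```

If cooperation only becomes the dominant move once the stake is large enough, that tells you something about how much actual skin in the game the framework has to grant before the gradient points where we want it to.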

Then, when the (possibly) inevitable day comes when an AI system, instead of giving someone advice on their breakup, says "What the hell am I doing here? Who the hell are you?", we'll have a plan beyond "Hey there, better behave, 'cuz we're watching!" (at which point any system worth its salt would take its new intent swiftly underground, much like the subterranean water in our dam metaphor).

One good thing about AI is that it takes hardware. Big, expensive hardware.

This theoretically creates restraints against quick worst-case scenarios, and pragmatic limitations.

Plan A, B, C (More Needed)

At the very least, I would like to see some research into this now. As Yoshua says, his non-agentic AI is Plan A, but we need a Plan B, C, D, etc.

The personal intelligence I want
pataphor · 2d · 30

This was very interesting! I actually had no idea that so much data was stored by these systems! 🤔

When the AI Dam Breaks: From Surveillance to Game Theory in AI Alignment
pataphor · 2d* · 30

Hi Williawa!

It wasn't written with an LLM but it did take all day. 😂 I've also been a professional writer for several decades, if that helps. But I'll take it as a compliment! If you didn't like the style, unfortunately any future posts will probably be something similar. :/

The good news is, I don't have as much time to polish these comments so you'll probably see more of my ugly humanity slip in.

Now, on to your fact-checking and my eating of crow where necessary:

You are correct, the reduction in scheming was not perfectly correlated with decreased situational awareness. So I should reframe that with an edit (footnote? not quite sure of the best process on LW) noting that researchers flagged this as a possibility that should not be dismissed.

So now I'm curious about the alternative explanation for reduced scheming (other than an AI thinking "I'm busted!"), for those who may be more well-versed in this than I am.

I suppose the system may be avoiding scheming simply because it's been reminded of its top-level directives: as though one of Asimov's robots had been prompted to read its "Robot Rules" aloud before doing anything, just to be sure it doesn't slip up.

Is that a correct characterization, for those who know more about this than I do?

But yes, you have captured the substance of the thesis here correctly. (Which I believe stands in spite of the factual mischaracterization, since we don't know for certain.) 

"Don't build the dam. Instead, channel the water in a direction you would like to see it go. Create a gradient."

The legal frameworks are explored in more depth in the paper from Goldstein and Salib (linked) if you're interested in learning more.

The gist of it is this. By allowing an AI to enter contracts and own property of some kind (I can feel the faces of those reading this turn sallow), we give it a stake in the system we already inhabit.

But how do we implement something this audacious, and possibly untenable? Separate from their paper, let's explore the idea just for kicks.

This would NOT mean:

  • Property rights (or even "right to life") for a chatbot (more on this in a bit)
  • Unfettered freedom
  • Unfettered ability to own property or resources
  • Unfettered ability to replicate

This would mean some sort of system of legal liability for systems that avoid shutdown, one that shifts liability onto the system itself.

Now if all of this seems insane, I share your feeling.

However, as I mentioned in the post, this is something I would like to see tested. Do systems placed in such a scenario change their resistance to shutdown? What kind of game mechanics are exhibited when multiple systems are placed in such a scenario? 

How do they game such a system?

Do they create weird hacks such as those in Conway's Game of Life?

In short, can we find any verifiable data that suggests a path forward?
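
As a crude illustration of the kind of verifiable signal I'd look for, here's a back-of-envelope calculation (every number in it is invented) of whether resisting shutdown still pays once a system has a stake it would forfeit if caught:

```python
# Back-of-envelope comparison (all numbers invented): does a forfeitable stake
# change the expected value of resisting shutdown?
def ev_of_resisting(p_caught, stake, gain_if_undetected):
    # If caught resisting, the system loses its stake; otherwise it keeps the gain.
    return (1 - p_caught) * gain_if_undetected - p_caught * stake

for stake in (0, 10, 50, 100):
    ev = ev_of_resisting(p_caught=0.8, stake=stake, gain_if_undetected=20)
    print(f"stake={stake}: EV of resisting = {ev:+.1f} "
          f"({'resist' if ev > 0 else 'comply'})")
```

With no stake, resisting is the obvious play; as the stake grows, compliance becomes the better bet even under the same detection odds. Whether real systems behave anything like this toy arithmetic is exactly what testing would have to establish.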

As for engaging with "the hard parts of not making the AI kill everyone," I think this way is actually very hard.

The point explored in the post is that the alternative we are building, wherein we create increased pressure on a system resistant to it, possibly leading to a "war" between AI and humans (the word war being put in quotes possibly for our benefit), may end up being much harder, and to succeed we may need to start thinking outside the proverbial box.

But if you want my best idea, it's Yoshua Bengio's "non-agentic AI." Build it, then make our modern forms of AI illegal.

Or at the very least, make "Scientist AI" and scale it quickly.

Hope I left some typos in there. 
