Tom_Breton_(Tehom)

@Retired Urologist: ISTM it's a combination of two things:

  1. It's a bit of nerd cultural heritage most of us have in common.
  2. Science fiction, more than other fiction, tends to deal with ideas, especially science ideas, and with predictions about the future. That's not to say that it usually deals well with them: science flubs abound, and the aforementioned Star Wars is a big offender. But it's usually wrestling with them at least a little, and therein lies another reason it comes up so often in discussions of ideas.

As a machine-learning problem, it would be straightforward: The second learning algorithm (scientist) did it wrong. He's supposed to train on half the data and test on the other half. Instead he trained on all of it and skipped validation. We'd also be able to measure how relatively complex the theories were, but the problem statement doesn't give us that information.

As a human learning problem, it's foggier. The second guy could still have honestly validated his theory against the data, or not. And it's not straightforward to show that one human-readable theory is more complex than another.

But with the information we're given, we don't know anything about that. So ISTM the problem statement has abstracted away those elements, leaving us with learning algorithms done right and done wrong.
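The machine-learning version of the point can be sketched in a few lines of Python. This is a minimal illustration, not anything from the original problem: the data (a noisy line y = 2x + 1), the helper names, and the 50/50 split are all hypothetical. A "theory" that merely memorizes its training data scores perfectly on the data it has seen, and only the held-out half exposes it, which is exactly the check the second scientist skipped.

```python
import random

def holdout_split(data, train_fraction=0.5, seed=0):
    """Shuffle the data and split it into a training set and a held-out test set."""
    items = list(data)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]

def fit_line(points):
    """A simple theory: ordinary least-squares fit of y = a*x + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b

def fit_lookup_table(points):
    """A maximally complex theory: memorize every observation, guess 0 otherwise."""
    table = dict(points)
    return lambda x: table.get(x, 0.0)

def mean_squared_error(model, points):
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)

# Hypothetical data: y = 2x + 1 plus Gaussian noise, at x = 0..19.
rng = random.Random(42)
data = [(float(x), 2.0 * x + 1.0 + rng.gauss(0, 0.5)) for x in range(20)]

train, test = holdout_split(data)
for name, fit in [("line", fit_line), ("lookup table", fit_lookup_table)]:
    model = fit(train)
    print(f"{name}: train MSE={mean_squared_error(model, train):.3f}, "
          f"test MSE={mean_squared_error(model, test):.3f}")
```

The lookup table has zero training error by construction; judged only on the data it was built from, it looks like the better theory.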

"Gentlemen, I do not mind being contradicted, and I am unperturbed when I am attacked, but I confess I have slight misgivings when I hear myself being explained." -- Lord Balfour, to the English Parliament

C. S. Lewis termed this device "Bulverism": explaining why X is so {dumb, crazy, misinformed, whatever} as to claim Y, without lowering oneself to arguing against Y. Lewis, however, was not above committing Bulverism himself.

The novice thinks that Friendly AI is a problem of coercing an AI to make it do what you want, rather than the AI following its own desires. But the real problem of Friendly AI is one of communication - transmitting category boundaries, like "good", that can't be fully delineated in any training data you can give the AI during its childhood.

Or, more generally, it's not just a binary classification problem but a measurement problem: how to measure benefit to humans, or human satisfaction.

It has sometimes struck me that this FAI requirement has a lot in common with something we were talking about on the futarchy list a while ago: specifically, how to measure a populace's satisfaction in a robust way. (Meta: exploring the details here would be going off on a tangent. Unfortunately, I can't easily link to the futarchy list because Typepad has decided Yahoo links are "potential comment spam.")

Of course with futarchy we want to do so for a different purpose, informing a decision market. At first glance the purposes might seem to have little in common. Futarchy contemplates just human participants. The human participants might well be aided by machines, but that is their business alone. FAI contemplates transcendent AI, where humanity cannot hope to truly control it anymore but can only hope that we have raised it properly (so to speak).

But beneath the surface they have important properties in common. Each contemplates an immensely intelligent mechanism that must do the right thing across an unimaginably broad panorama of issues. Both need to inform this mechanism's utility function, so both need to measure benefit to humans accurately and robustly. Both could be dangerous if the metric has loopholes. So both need a metric that is not a fallible proxy for benefit to humans but a true measure of it. Both need this metric to be secure against intelligent attack: even the best metric does little good if an attacker can change it into something else. And both have to be started with the right metric, or with something that leads quite surely to it, because correcting them later will be impossible. (Robin speculated that futarchy could generate its own future utility function, but I believe such an approach can only cause degeneration.)

I conclude that there must be at least a strong resemblance between a desirable utility metric for futarchy and a desirable utility metric for FAI.

Beyond this, I speculate that futarchy has advantages as a sort of platform for FAI. I'll call the combination "futurAIrchy".

First, it might teach a young FAI better than any human teacher could. For example, the young FAI (or several versions or instances of it) would participate much like any other trader, but use the market feedback to refine its knowledge and procedures.

However, certain caprices of the market (the January slump, that sort of thing) might lead the FAI to learn bad or irrelevant tenets (e.g., "January is an evil time"). That pseudo-knowledge would cause sub-optimal decisions and would risk insane behavior (e.g., "Forcibly sedate everyone during January").

So I think we'd want FAI trader(s) to be insulated from the less meaningful patterns of the market. I propose that FAIs trade through a front end that concerns itself only with hedging against such patterns, making them irrelevant as far as the FAI can tell. Call it a "front-end AI". (Problems: Determining the right borderline between the two as they both get more sophisticated. Who or what determines that, under what rules, and how could they abuse the power? Should there be just one front-end AI, arbitrarily many, or many but according to some governing rule?)
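The proposal is for the front end to hedge in the market itself. As a much cruder stand-in that shows only the intended effect on what the FAI sees, here is ordinary seasonal adjustment in Python: subtract each calendar month's average so that a recurring calendar pattern averages out to zero. The function name and the (month, value) input format are hypothetical.

```python
from collections import defaultdict

def remove_calendar_effect(observations):
    """Subtract each calendar month's mean from that month's observations.

    observations: list of (month, value) pairs, month in 1..12 (assumed format).
    Returns the same pairs with the per-month mean removed, so a pattern like
    a January slump no longer shows up as a January-specific signal.
    """
    by_month = defaultdict(list)
    for month, value in observations:
        by_month[month].append(value)
    month_mean = {m: sum(vals) / len(vals) for m, vals in by_month.items()}
    return [(month, value - month_mean[month]) for month, value in observations]

# Usage: January returns are systematically low, February's high.
print(remove_calendar_effect([(1, -2.0), (1, -4.0), (2, 1.0), (2, 3.0)]))
# → [(1, 1.0), (1, -1.0), (2, -1.0), (2, 1.0)]
```

Unlike real hedging, this only filters the data; it does nothing about the market prices the FAI would actually trade at.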

Secondly, the structure above might be an unusually safe architecture for FAI. That is, it would be a permanent rule that the only legitimate components are:

  • Many FAIs that do nothing except discover information and trade in the futarchy market through a front-end AI. They merely try to maximize their profit (subject to a predetermined risk tolerance and similar details).
  • One or many front-end AIs that do nothing except discover information and hedge in the market, also maximizing their profit.
  • A decision mechanism governing the borderline between FAIs and front-end AIs. It might just be a separate decision market.
  • Many subordinate AIs whose scope of action is not limited by the rules given here, but which are entirely subordinate to the decisions of the futarchy market, to the point where it's hard-wired that the market can pull a subordinate AI's plug.
  • A mechanism to measure human satisfaction or benefit to humans. This is ultimately what controls futurAIrchy. The metric has to be generated from humans' self-reports and situations. There's a lot more to be said.
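To make the plug-pulling rule concrete, here is a toy Python sketch. It assumes the market's verdict on a subordinate AI can be reduced to two conditional estimates of the human-satisfaction metric, one if the AI keeps running and one if it is halted, in the spirit of futarchy's "adopt whichever branch the market prices higher." All class and method names are hypothetical, and the metric itself is simply taken as an input.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class SubordinateAI:
    """Hypothetical subordinate component: acts freely, but only while allowed to run."""
    name: str
    running: bool = True
    actions_taken: int = 0

    def step(self) -> None:
        # A halted subordinate does nothing; the switch is not under its control.
        if self.running:
            self.actions_taken += 1

@dataclass
class FutarchyMarket:
    """Hypothetical market holding the hard-wired power to halt any subordinate."""
    subordinates: Dict[str, SubordinateAI] = field(default_factory=dict)

    def register(self, ai: SubordinateAI) -> None:
        self.subordinates[ai.name] = ai

    def decide(self, name: str, welfare_if_kept: float, welfare_if_halted: float) -> None:
        # Futarchy-style rule: take the branch with the higher conditional
        # estimate of the (assumed) human-satisfaction metric.
        if welfare_if_halted > welfare_if_kept:
            self.subordinates[name].running = False

# Usage: the market prices "halt" higher, so the subordinate stops acting.
market = FutarchyMarket()
helper = SubordinateAI("helper")
market.register(helper)
helper.step()
market.decide("helper", welfare_if_kept=0.4, welfare_if_halted=0.6)
helper.step()
print(helper.running, helper.actions_taken)  # False 1
```

The sketch deliberately says nothing about where the two estimates come from; that is the measurement problem in the last bullet.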

Problems: "log-rolling" where different components collude and thereby accidentally defeat the system. I don't see an exploit yet but that doesn't mean there isn't one. Is there yet a separate mechanism for securing the system against collusion?

What becomes of the profit that these AIs make? Surely we don't put so much real spending power in their silicon hands. But then, all they can do is re-invest it. Perhaps the money ceases to be human-spendable money and becomes just tokens.

What if an FAI goes bankrupt, or becomes inordinately wealthy? I propose that the behavior be that of a population search algorithm (e.g., a genetic algorithm, though it's not clear how or whether crossover should be used). Bankrupt FAIs, or even low-scoring ones, cease to exist, and successful ones reproduce.
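A minimal sketch of the population-search idea in Python, using truncation selection with mutation and no crossover (since it's unclear whether crossover makes sense here). Everything is an assumption for illustration: a trader is reduced to one numeric strategy parameter and a token balance, only the top half by balance survives each round, and offspring start with a fixed, hypothetical token allowance rather than inheriting wealth.

```python
import random
from dataclasses import dataclass

@dataclass
class Trader:
    """Hypothetical FAI trader: one strategy parameter plus a token balance."""
    strategy: float
    balance: float

def next_generation(population, survive_fraction=0.5, start_balance=100.0,
                    mutation_scale=0.1, rng=None):
    """One round of selection without crossover.

    Bankrupt traders (balance <= 0) are removed, only the top survive_fraction
    of the population (by balance) is kept, and the population is refilled
    with mutated copies of randomly chosen survivors.
    """
    rng = rng or random.Random()
    size = len(population)
    solvent = sorted((t for t in population if t.balance > 0),
                     key=lambda t: t.balance, reverse=True)
    survivors = solvent[:max(1, int(size * survive_fraction))]
    if not survivors:
        raise ValueError("every trader went bankrupt")
    children = []
    while len(survivors) + len(children) < size:
        parent = rng.choice(survivors)
        # Offspring copy the parent's strategy with small Gaussian noise.
        children.append(Trader(parent.strategy + rng.gauss(0, mutation_scale),
                               start_balance))
    return survivors + children
```

Whether offspring should instead be funded out of the parent's balance is exactly the kind of open question raised above; the fixed allowance is just the simplest choice.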

If FAIs are like persisting individuals, their hardware is an issue. For example, when a bankrupt FAI is replaced by a wealthy one's offspring, what if the bankrupt one's hardware just isn't fast enough? One proposal: it is all somehow hardware-balanced so that only the algorithms make a difference. Another proposal: FAIs (or another component that works with them) can buy and sell the hardware FAIs run on. Thus a bankrupt FAI's hardware has already been sold. But then it is not so obvious how reproduction should be managed.

There's plenty more to be said about futurAIrchy but I've gone on long enough for now.

Fascinating that you could present Löb's theorem as a cartoon, Eliezer.

One tiny nitpick: the support for statement 9 seems to be wrong. It reads (1, MP), but that doesn't follow. Perhaps you meant (8, 1, MP).
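For reference, modus ponens takes two premises, an implication and its antecedent, which is why a justification citing a single line can't license it. The rule and the standard statement of Löb's theorem for Peano Arithmetic are below; the cartoon's own numbered steps are not reproduced here.

```latex
\frac{A \qquad A \rightarrow B}{B}\;(\text{MP})
\qquad\qquad
\text{If } \mathrm{PA} \vdash \Box P \rightarrow P \text{, then } \mathrm{PA} \vdash P.
```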

"Yes, I am the last man to have walked on the moon, and that's a very dubious and disappointing honor. It's been far too long." -- Gene Cernan

That doesn't seem like a pro-rationality quote to me. It has a space-y, science-y theme, which may connote rationality, but its content seems anti-rational to me.