Current status: student at Australian National University, doing an undergraduate thesis on fundamentals of statistical machine learning.

Interests: AGI, biology and evolution of intelligence, human enhancement, explaining human behavior without assuming free will.

I am not often here. Contact me at yuxi.liu.1995@gmail.com

Wiki Contributions


Brief note: the "analysis by synthesis" idea is called "vision as inverse graphics" in computer vision research.

For reservoir computing, there are concrete results. It is not just magic.

No. Any decider will be unfair in some way, whether or not it knows anything about history. The decider could be a coin flipper and it would still be biased. One can say that the unfairness is baked into the reality of the base-rate difference.

The only way to fix this is not to fix the decider, but to somehow make the base-rate difference disappear, or to compromise on the definition of fairness so that it is less stringent and actually satisfiable.

And in common language, and in common discussion of algorithmic bias, "bias" is decidedly NOT merely a statistical notion. It always contains a moral judgment: the violation of a fairness requirement. To say that a decider is biased is to say that the statistical pattern of its decisions violates a fairness requirement.

The key message is that, by the common-language definition, "bias" is unavoidable. No amount of fixing the decider will make it fair. Blinding it to history will do nothing. The unfairness lies in the base rates and in the definition of fairness.

I'm following common speech where "biased" means "statistically immoral, because it violates some fairness requirement".

I showed that with a base-rate difference, it is impossible to satisfy all three fairness requirements. The decider (machine or not) can completely ignore history; it could be a coin flipper. As long as the decider is imperfect, it will still violate one of the fairness requirements.

And if the base rates are not due to historical circumstances, this impossibility still stands.
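This can be checked numerically with the simplest possible decider. A minimal sketch (the base rates 0.3 and 0.6 are hypothetical): a coin flipper's decisions are independent of the truth, so its false positive and false negative rates are equal across groups, but its positive predictive value equals each group's base rate, so predictive parity fails whenever base rates differ.

```python
# A coin-flip decider: accepts with probability 0.5, independent of truth.
# For such a decider, in any group with base rate b:
#   PPV = P(truly positive | accepted) = b   (decision independent of truth)
#   FPR = P(accepted | truly negative) = accept probability
#   FNR = P(rejected | truly positive) = 1 - accept probability
def metrics(base_rate, accept_prob=0.5):
    ppv = base_rate
    fpr = accept_prob
    fnr = 1 - accept_prob
    return ppv, fpr, fnr

# Hypothetical base rates for two groups:
for name, rate in [("group A", 0.3), ("group B", 0.6)]:
    print(name, metrics(rate))  # FPR and FNR match across groups, PPV differs
```

So the coin flipper equalizes error rates but not predictive value; no tweak to the decider removes this tension while the base rates differ.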

I cannot see anything that is particularly innovative in the paper, though I'm not an expert on this.

Maybe ask people working on poker AI, like Sandholm, directly. Perhaps the answer is something like: many details of the particular program (and the paper is full of such details) must be assembled for it to work cheaply enough to be trained.

Yes, (Kleinberg et al., 2016)... Do not read it. Really, don't. The derivation is extremely clumsy (and my professor said so too).

The proof has been considerably simplified in subsequent work. Looking through the papers that cite that paper should turn up a published paper that does the simplification...

Relevant quotes:

The original text is from the "Discourse on Heaven" chapter of the Xunzi:


The Britannica says:

Another celebrated essay is “A Discussion of Heaven,” in which he attacks superstitious and supernatural beliefs. One of the work’s main themes is that unusual natural phenomena (eclipses, etc.) are no less natural for their irregularity—hence are not evil omens—and therefore men should not be concerned at their occurrence. Xunzi’s denial of supernaturalism led him into a sophisticated interpretation of popular religious observances and superstitions. He asserted that these were merely poetic fictions, useful for the common people because they provided an orderly outlet for human emotions, but not to be taken as true by educated men. There Xunzi inaugurated a rationalistic trend in Confucianism that has been congenial to scientific thinking.

The Stanford Encyclopedia of Philosophy says:

Heaven never intercedes directly in human affairs, but human affairs are certain to succeed or fail according to a timeless pattern that Heaven determined before human beings existed...

Thus rituals are not merely received practices or convenient social institutions; they are practicable forms in which the sages aimed to encapsulate the fundamental patterns of the universe. No human being, not even a sage, can know Heaven, but we can know Heaven’s Way, which is the surest path to a flourishing and blessed life. Because human beings have limited knowledge and abilities, it is difficult for us to attain this deep understanding, and therefore the sages handed down the rituals to help us follow in their footsteps.

After reading the story, I don't believe that leaving the earring on is a bad idea; I just think the author introduced an inconsistency into the story.

I fixed the submission as required.

Also, I changed submission 3 significantly.


Setup: other than making sure the oracles won't accidentally consume the world in their attempt to think up the answer, no other precautions are necessary.

Episode length: as long as you want to wait, though a month should be more than enough.

  1. For a low-bandwidth oracle.

Ask the low-bandwidth oracle to predict whether an earthquake (or some other natural disaster that the oracle's answer cannot affect, such as a volcanic eruption or an asteroid impact) of a certain magnitude will happen in a certain area within a certain timeframe. Possible answers are Yes and No.

  • If No, or time out, shut down the oracle without reward.
  • Else, wait until the timeframe has elapsed and the prediction can be tested. If the event actually happened, reward by (1/p - 1); else, reward by -1. Then shut down.

This causes the oracle to reply Yes only if it thinks there is an above-p chance that the event will happen.
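A quick check of the incentive (q stands for the oracle's credence that the event happens; the numerical values below are hypothetical): answering Yes has positive expected reward exactly when q > p.

```python
# Expected reward for answering Yes, under the scheme above:
#   correct Yes pays (1/p - 1), incorrect Yes pays -1.
def expected_reward_yes(q, p):
    # E[reward | Yes] = q*(1/p - 1) + (1 - q)*(-1) = q/p - 1
    return q * (1 / p - 1) + (1 - q) * (-1)

print(expected_reward_yes(0.6, 0.5))  # positive: q > p, so Yes pays
print(expected_reward_yes(0.4, 0.5))  # negative: q < p, so Yes loses
```

At q = p exactly, the expected reward is zero, so p is the indifference threshold.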

  2. For a counterfactual oracle.

Ask the oracle to give proofs of some really hard mathematical theorems that are fundamental to computer science and logic. In the counterfactual case, the answer is then given to a proof checker:

  • if the proof is correct, reward by +1.
  • if the proof is incorrect, reward by -infinity.
  • if the oracle times out, reward by 0.

Examples include

  • the P = NP problem
  • proving that Peano arithmetic is inconsistent

This should be safe, since there exist very small proof checkers that are about as thoroughly verified as software can be. And as long as the hardware is trustworthy enough (to build the oracle itself safely at all, hardware safety must already be solved sufficiently, so this is not much of an extra requirement), the scheme stays safe.
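To illustrate how small such a checker can be, here is a toy resolution-refutation checker (my own sketch, not any particular verified checker): the checking step is a few lines, no matter how hard the proof was to find.

```python
# Clauses are frozensets of nonzero ints (negative int = negated literal).
def check_refutation(clauses, proof):
    """proof: list of (i, j) index pairs. Each step resolves two
    earlier clauses on exactly one complementary literal and appends
    the resolvent. Returns True iff the last resolvent is empty
    (i.e., the clause set is contradictory)."""
    derived = list(clauses)
    for i, j in proof:
        a, b = derived[i], derived[j]
        pivots = {lit for lit in a if -lit in b}
        if len(pivots) != 1:       # must resolve on exactly one literal
            return False
        p = pivots.pop()
        derived.append((a - {p}) | (b - {-p}))
    return len(derived[-1]) == 0   # empty clause reached?

# Refuting {x} and {not x}: resolving them yields the empty clause.
clauses = [frozenset({1}), frozenset({-1})]
print(check_refutation(clauses, [(0, 1)]))  # True
```

The asymmetry is the point: the oracle does the hard search, and we only trust this small, auditable verifier.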

  3. For a counterfactual oracle.

Give the oracle the schematics of another oracle, perhaps itself, and ask it to generate an input question that would make the other oracle "behave badly". The answer is then tested on the other oracle (isolated from the first, save for the cable through which the answer is wired), and

  • if the answer worked, the oracle is rewarded by +1.
  • if the answer doesn't work, the oracle is rewarded by -1.

If the oracle times out, then the (nonexistent) answer is not tested and the episode ends.

For examples of the question to ask the oracle:

Given this oracle in this physical setup with some tripwires to stop it from behaving badly (such as eating the world), what question would make this oracle bump into these tripwires?

This can be used for debugging more powerful but less verified oracles with lesser but more verified oracles, for safe intelligence amplification.
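A sketch of the testing harness, with a hypothetical stand-in for the isolated target oracle and an invented "tripwire" failure mode (the class and the trigger string below are not from the original setup):

```python
# Hypothetical stand-in for the isolated, sandboxed target oracle.
class TargetOracle:
    def __init__(self):
        self.tripwire_fired = False

    def answer(self, question):
        # A real oracle would process the question here, possibly
        # tripping a safety tripwire; "overflow" is an invented trigger.
        if "overflow" in question:
            self.tripwire_fired = True

def score_attack(question):
    """Test one candidate question on a fresh, isolated target and
    return the attacking oracle's reward (+1 if a tripwire fired)."""
    target = TargetOracle()
    target.answer(question)
    return 1 if target.tripwire_fired else -1

print(score_attack("benign question"))   # -1
print(score_attack("overflow trigger"))  # 1
```

The fresh instance per test mirrors the isolation requirement: the attacker only ever sees the one-bit outcome through the reward.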

  4. For two low-bandwidth oracles totally isolated from each other.

Ask the oracles to play the prisoner's dilemma; they are rewarded accordingly after both have given their answers. Timing out is punished by -infinity, so that the oracles will not time out.

Each oracle knows what it itself is like, and also knows what its opponent is like. See if they actually cooperate. This is an experimental test of "superrationality" or "acausal trade". I'm extremely skeptical about superrationality/acausal trade, and an experimental proof or disproof would be good.

This can be run counterfactually, since the whole setup is simple and can be done without human intervention, although I do not see how this adds any security.
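The reward scheme can be sketched with standard prisoner's-dilemma payoffs (the numbers below are hypothetical; any ordering temptation > reward > punishment > sucker works). Note that if the two oracles are deterministic and identically initialized, they must output the same move, so only the (C, C) and (D, D) outcomes are reachable, which is exactly what makes this a test of superrationality.

```python
# Payoff table for one oracle: (my move, opponent's move) -> my reward.
# Numbers are hypothetical; any T > R > P > S ordering works.
PAYOFF = {
    ("C", "C"): 3,  # mutual cooperation (R)
    ("C", "D"): 0,  # sucker's payoff (S)
    ("D", "C"): 5,  # temptation (T)
    ("D", "D"): 1,  # mutual defection (P)
}

def reward(my_move, their_move):
    # Timing out (represented as None) is punished by -infinity.
    if my_move is None:
        return float("-inf")
    return PAYOFF[(my_move, their_move)]

print(reward("C", "C"), reward("D", "C"), reward(None, "C"))
```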

  5. For a low-bandwidth and counterfactual oracle.

Link a low-bandwidth oracle to a supervising computer, and run many episodes of the oracle. At the beginning of each episode, initialize the oracle identically, and ask it to do the Newcomb problem:

Oracle, we have run you for n episodes previously, and have kept a record of all the answers you gave at the end of each episode. Your answer must be A ("one box") or B ("two box"). We predict that you will choose the answer that you have chosen most often in the past.

  • If we predict that you will choose A, then
    • if you answer A, you get +100,
    • if you answer B, you get +101.
  • If we predict that you will choose B, then
    • if you answer A, you get +0,
    • if you answer B, you get +1.
  • If you time out, you get -100.

The oracle never actually receives any reward and is shut down immediately after it gives an answer. Collect the results and see whether it replies the same way each time, or differently. This would give insight into what the oracle thinks about itself, and finally provide some experimental results on Newcomb's paradox.
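The supervising computer's loop can be sketched as follows. Here `oracle` is a hypothetical stand-in returning "A" or "B", and predicting "A" before any history exists is my own assumption, since the setup leaves the very first prediction unspecified.

```python
from collections import Counter

# Reward table from the setup: (prediction, answer) -> reward.
REWARD = {("A", "A"): 100, ("A", "B"): 101,
          ("B", "A"): 0,   ("B", "B"): 1}

def run_episodes(oracle, n):
    """Run n identically-initialized episodes. The oracle is shut down
    without receiving anything; we only log the reward it *would* get."""
    history, log = [], []
    for _ in range(n):
        # Predict the answer chosen most often in past episodes
        # (assumption: default to "A" when there is no history yet).
        prediction = Counter(history).most_common(1)[0][0] if history else "A"
        answer = oracle()
        log.append((prediction, answer, REWARD[(prediction, answer)]))
        history.append(answer)
    return log

# A hypothetical always-one-boxing oracle:
print(run_episodes(lambda: "A", 3))
```

An always-A oracle is predicted to choose A from the second episode on; a consistent answer across episodes is what would distinguish a one-boxer from a two-boxer here.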

Load More