Yuxi_Liu

Yuxi Liu is a PhD student in Computer Science at the Berkeley Artificial Intelligence Research Lab, researching the scaling laws of large neural networks.

Personal website: https://yuxi-liu-wired.github.io/


Comments

In an intelligence community context, the American spy satellites like the KH program achieved astonishing things in photography, physics, and rocketry—things like handling ultra-high-resolution photography in space (with its unique problems like disposing of hundreds of gallons of water in space) or scooping up landing satellites in helicopters were just the start. (I was skimming a book the other day which included some hilarious anecdotes—like American spies would go take tourist photos of themselves in places like Red Square just to assist trigonometry for photo analysis.) American presidents obsessed over the daily spy satellite reports, and this helped ensure that the spy satellite footage was worth obsessing over. (Amateurs fear the CIA, but pros fear NRO.)

What is that book with the fun anecdotes?

I use a fairly basic Quarto template for my website. The code for the entire site is on GitHub.

The source code is actually right there in the post: click the Code button, then click View Source.

https://yuxi-liu-wired.github.io/blog/posts/perceptron-controversy/

Concretely speaking, are you suggesting that a two-layer fully connected network trained by backpropagation, with ~100 neurons in each layer (thus ~20000 weights), would have been uneconomical even in the 1960s, even if backprop had been known?

I am asking because the great successes of late-1980s and early-1990s connectionism, including LeNet digit recognition, NETtalk, and TD-Gammon, were all of that order of magnitude. They seem within reach for the 1960s.

Concretely speaking, TD-Gammon cost about 2e13 FLOPs to train, and in 1970, 1 million FLOP/sec cost 1 USD, so with 10000 USD of hardware, it would take about 1 day to train.

And it is interesting that you mentioned magnetic cores. The MINOS II machine, built in 1962 by the Stanford Research Institute group, had precisely a grid of magnetic core memory. Couldn't they have scaled it up and added some extra circuitry to allow backpropagation?

As a check on the calculation: according to some 1960s literature, magnetic core logic could run at up to 10 kHz. So with ~1e4 weights each updated 1e4 times a second, that would be 1e8 FLOP/sec right there. TD-Gammon would then take about 2e5 seconds, or roughly 2 days, the same order of magnitude as the previous calculation.
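The back-of-the-envelope arithmetic can be reproduced in a few lines of Python. This is a minimal sketch under the comment's own simplifications: one FLOP per weight per clock cycle, every weight updated in parallel, biases ignored. The 100-100-100 layer layout (which gives the ~20000 weights mentioned earlier) and the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400
TD_GAMMON_TRAIN_FLOP = 2e13   # training-compute figure used in the comment
CORE_LOGIC_HZ = 1e4           # ~10 kHz magnetic-core logic rate cited above


def dense_weight_count(layer_sizes):
    """Weights in a fully connected net (biases ignored): sum of fan_in * fan_out."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


def training_seconds(total_flop, n_weights, rate_hz, flop_per_weight_update=1):
    """Wall-clock time if every weight is updated once per clock cycle, all in parallel."""
    throughput = n_weights * rate_hz * flop_per_weight_update  # FLOP/sec
    return total_flop / throughput


# Hypothetical layout: 100 inputs -> 100 hidden units -> 100 outputs.
n_weights = dense_weight_count([100, 100, 100])

for n in (1e4, n_weights):
    t = training_seconds(TD_GAMMON_TRAIN_FLOP, n, CORE_LOGIC_HZ)
    print(f"{n:.0e} weights: {n * CORE_LOGIC_HZ:.0e} FLOP/s, "
          f"{t:.0e} s ({t / SECONDS_PER_DAY:.1f} days)")
```

With ~1e4 weights this gives 1e8 FLOP/sec and about 2e5 seconds (2.3 days); with the 20000-weight layout it gives 2e8 FLOP/sec and about 1e5 seconds (1.2 days). A real backpropagation step costs several FLOPs per weight, so raising `flop_per_weight_update` would shorten the time only if the hardware actually delivered that extra throughput; otherwise the estimate should be scaled up by that constant factor.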

I was thinking of porting it here in full. It is in R Markdown format, but all the citations would be quite difficult to port. They look like [@something2000].

Does LessWrong allow convenient citations?
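If the editor accepts Markdown footnotes, one workaround is to convert the Pandoc-style keys mechanically before pasting. Below is a minimal Python sketch. It assumes footnote syntax of the form [^1], only handles bracketed citations that begin with @ (prefixes such as "see" are not matched, and locators such as "p. 39" are dropped), and the bibliography dictionary is a hypothetical stand-in for whatever the .bib file provides.

```python
import re

# Bracketed Pandoc-style citations that start with "@": [@key] or [@key1; @key2].
CITATION = re.compile(r"\[(@[^\]]+)\]")
KEY = re.compile(r"@([\w:-]+)")


def citations_to_footnotes(text, bibliography):
    """Turn [@key] citations into numbered Markdown footnotes.

    `bibliography` maps citation keys to formatted reference strings;
    keys missing from it fall back to the bare key.
    """
    numbers = {}  # citation key -> footnote number, in order of first use

    def replace(match):
        markers = []
        for key in KEY.findall(match.group(1)):
            numbers.setdefault(key, len(numbers) + 1)
            markers.append(f"[^{numbers[key]}]")
        return "".join(markers)

    body = CITATION.sub(replace, text)
    notes = [f"[^{n}]: {bibliography.get(key, key)}" for key, n in numbers.items()]
    return body + "\n\n" + "\n".join(notes)


sample = "As argued earlier [@something2000], and again [@other2010; @something2000]."
print(citations_to_footnotes(sample, {"something2000": "Formatted reference for something2000"}))
```

Each distinct key gets one footnote number on first use, and later citations of the same key reuse that number, so the appended note list has no duplicates.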

In David Roden's Posthuman Life, a book that is otherwise abstruse and obscurely metaphysical, there is an interesting argument for making posthumans before we know what they might be (indeed, he rejects the precautionary principle as applied to making posthumans):

  • CLAIM. We have an obligation to make posthumans, or not prevent their appearance.

  • PROOF.

    • Principle of accounting: we have an obligation to understand posthumans
    • Speculative posthumanism: there could be radical posthumans
    • Radical posthumans are impossible to understand unless we actually meet them
    • We can only meet radical posthumans if we make them (intentionally or accidentally).
  • This creates an ethical paradox, the posthuman impasse.

    • we are unable to evaluate any posthuman condition. Since posthumans could result from some iteration of our current technical activity, we have an interest in understanding what they might be like. If so, we have an interest in making or becoming posthumans.

    • to plan for the future evolution of humans, we should evaluate what posthumans are like, which kinds are good, and which kinds are bad, before we make them.
    • most kinds of posthumans can only be evaluated after they appear.
    • completely giving up on making posthumans would lock humanity at its current level, which means giving up great goods for fear of great bads. This is objectionable on grounds similar to those used by transhumanists.

The quote

All energy must ultimately be spent pointlessly and unreservedly, the only questions being where, when, and in whose name... Bataille interprets all natural and cultural development upon the earth to be side-effects of the evolution of death, because it is only in death that life becomes an echo of the sun, realizing its inevitable destiny, which is pure loss.

It is from page 39 of The Thirst for Annihilation (Chapter 2, "The curse of the sun").

Note that the book was published in 1992, early in Nick Land's career. In it, Land mixes Bataille's theory with his own. I just reread Chapter 2, and it is definitely more Bataille than Land.

Land has two faces. With his "cyberpunk face", he writes against top-down control. In this he is in sync with many typical anarchists, but with a strong emphasis on technology. In "Machinic Desire", he put it this way: "In the near future the replicants — having escaped from the off-planet exile of private madness - emerge from their camouflage to overthrow the human security system."

With his "intelligence face", he argues for maximal intelligence, even when it leads to a singleton. A capitalist economy growing bigger and more efficient is desirable precisely because it is the most intelligent thing in this patch of the universe. In the essay "Pythia Unbound", "Pythia" seems likely to become such a singleton.

With either face, maximizing waste heat is not his goal.

A small comment about Normative Realism: from my reading, Wilfrid Sellars' theory strongly influenced Normative Realism. The idea goes like this:

Agents are players in a game of "giving and asking for reasons". To be an agent is simply to follow the rules of the game. Not playing the game would be either self-inconsistent or community-inconsistent. Either way, a group of agents can only do science if they are players of the game.

With this argument, he aimed to secure the "manifest image of man" against the "scientific image of man". In programming terms: free will has to be implemented, or at least simulated, by the program's APIs.

Assuming that being able to do science is a necessary condition for dominance and power (in the Darwinian game of survival), we will meet either agents or beings so weak that we need not worry about them (shades of social Darwinism).

Brief note: the "analysis by synthesis" idea is called "vision as inverse graphics" in computer vision research.
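A toy illustration of the idea: write a forward "graphics" model that renders an image from scene parameters, then perceive by searching for the parameters whose rendering best matches what was observed. The one-dimensional bar renderer, the bounded noise model, and the exhaustive search below are hypothetical simplifications; practical inverse-graphics systems use far richer renderers and probabilistic or gradient-based inference rather than brute force.

```python
import random

WIDTH = 64  # hypothetical size of the 1-D "image"


def render(center, radius, width=WIDTH):
    """Toy graphics model: pixels within `radius` of `center` are bright, the rest dark."""
    return [1.0 if abs(x - center) <= radius else 0.0 for x in range(width)]


def analyze_by_synthesis(observed):
    """Invert the renderer: find the scene parameters whose rendering best matches the observation."""
    width = len(observed)
    best_params, best_err = None, float("inf")
    for center in range(width):
        for radius in range(1, width // 2):
            synthesized = render(center, radius, width)
            err = sum((s - o) ** 2 for s, o in zip(synthesized, observed))
            if err < best_err:
                best_params, best_err = (center, radius), err
    return best_params


rng = random.Random(0)
# Observation: the scene (center=20, radius=5) rendered, plus bounded pixel noise.
observed = [v + rng.uniform(-0.2, 0.2) for v in render(20, 5)]
print(analyze_by_synthesis(observed))  # → (20, 5)
```

Because the noise is bounded by 0.2, any wrong parameter setting mismatches at least one pixel and pays more squared error than the true scene, so the search recovers (20, 5).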

For reservoir computing, there are concrete results. It is not just magic.

No. Any decider will be unfair in some way, whether or not it knows anything about history. Even a coin flipper would be biased. One can say that the unfairness is baked into the reality of base-rate differences.

The only way to fix this is not to fix the decider, but either to somehow make the base-rate difference disappear, or to relax the definition of fairness until it is no longer so stringent and becomes satisfiable.
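The coin-flipper claim can be checked with exact population-level arithmetic. The sketch below assumes two hypothetical groups with base rates 0.3 and 0.6 and three common statistical fairness criteria: equal flag rates (demographic parity), equal true- and false-positive rates (equalized odds), and equal positive predictive value (predictive parity). It compares a coin flipper with a perfect oracle; the function and variable names are illustrative only.

```python
def group_metrics(base_rate, p_flag_if_pos, p_flag_if_neg):
    """Population-level rates for one group, given how often the decider
    flags truly positive and truly negative members."""
    tp = base_rate * p_flag_if_pos            # P(positive and flagged)
    fp = (1 - base_rate) * p_flag_if_neg      # P(negative and flagged)
    flag_rate = tp + fp                       # P(flagged): demographic parity compares this
    return {
        "flag_rate": flag_rate,
        "TPR": p_flag_if_pos,                 # equalized odds compares TPR...
        "FPR": p_flag_if_neg,                 # ...and FPR across groups
        "PPV": tp / flag_rate,                # predictive parity compares this
    }


BASE_RATES = {"A": 0.3, "B": 0.6}  # hypothetical groups with different base rates

DECIDERS = {
    "coin flipper": (0.5, 0.5),    # flags anyone with probability 1/2, ignoring all data
    "perfect oracle": (1.0, 0.0),  # flags exactly the true positives
}

for name, (p_pos, p_neg) in DECIDERS.items():
    print(name)
    for group, rate in BASE_RATES.items():
        m = group_metrics(rate, p_pos, p_neg)
        print(f"  group {group}: " + ", ".join(f"{k}={v:.2f}" for k, v in m.items()))
```

The coin flipper gets identical flag rates, TPR, and FPR (all 0.50) in both groups, but its PPV equals each group's base rate (0.30 vs 0.60), so it violates predictive parity. The perfect oracle has PPV = 1.00 and perfect error rates in both groups, but flags 30% of group A and 60% of group B, so it violates demographic parity. Neither decider looks at history; the gap comes from the base rates.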

And in common language and common discussion of algorithmic bias, "bias" is decidedly NOT merely a statistical definition. It always contains a moral judgment: violation of a fairness requirement. To say that a decider is biased is to say that the statistical pattern of its decision violates a fairness requirement.

The key message is that, by the common language definition, "bias" is unavoidable. No amount of trying to fix the decider will make it fair. Blinding it to the history will do nothing. The unfairness is in the base rate, and in the definition of fairness.
