jdp

While Paul was at OpenAI, they accidentally overoptimized a GPT policy against a positive sentiment reward model. This policy evidently learned that wedding parties were the most positive thing that words can describe, because whatever prompt it was given, the completion would inevitably end up describing a wedding party.

In general, the transition into a wedding party was reasonable and semantically meaningful, although there was at least one observed instance where instead of transitioning continuously, the model ended the current story by generating a section break and began an unrelated story about a wedding party.
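The failure mode is easy to reproduce in miniature. Here is a toy sketch; the vocabulary, reward table, and greedy "policy" below are all made up for illustration (OpenAI's actual setup used a learned reward model and RL fine-tuning), but it shows the same collapse: over-optimizing generation against a sentiment reward converges on whatever tokens the reward scores highest, regardless of the prompt.

```python
# Toy illustration of reward-model overoptimization. Everything here
# (vocabulary, reward weights) is invented for the example.

VOCAB = ["the", "dog", "ran", "sad", "storm", "wedding", "party",
         "joyful", "celebration", "beautiful"]

# Stand-in "sentiment reward model": scores a completion by summing
# per-word weights, favoring positive-sentiment words.
POSITIVE = {"wedding": 3.0, "party": 2.5, "joyful": 2.0,
            "celebration": 2.0, "beautiful": 1.5}

def reward(tokens):
    return sum(POSITIVE.get(t, -0.1) for t in tokens)

def overoptimized_completion(prompt, length=6):
    # A perfectly overoptimized policy: greedy argmax against the
    # reward model at every step. The prompt is ignored entirely,
    # which is the degenerate behavior the anecdote describes.
    out = []
    for _ in range(length):
        out.append(max(VOCAB, key=lambda t: reward(out + [t])))
    return out

print(overoptimized_completion("A storm gathered over the hills"))
# → ['wedding', 'wedding', 'wedding', 'wedding', 'wedding', 'wedding']
```

Whatever the prompt, the completion is all weddings, because "wedding" is the single highest-reward token and nothing in the objective penalizes ignoring the context.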

This example is very interesting to me for a couple of reasons:

Possibly the most interesting thing about this example is that it's a convergent outcome across (sensory) modalities; negative prompting Stable Diffusion on sinister things gives a similar result:

https://twitter.com/jd_pressman/status/1567571888129605632
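Mechanically, negative prompting works through classifier-free guidance: the negative prompt's prediction takes the place of the unconditional one, so each denoising step is extrapolated away from it. A sketch of the arithmetic with toy scalar values (Stable Diffusion does this with full noise-prediction tensors, not scalars):

```python
def cfg(pred_negative, pred_positive, guidance_scale):
    # Classifier-free guidance: start from the negative-prompt prediction
    # and extrapolate toward the positive-prompt prediction. At a high
    # guidance scale the result is pushed hard *away* from the negative.
    return pred_negative + guidance_scale * (pred_positive - pred_negative)

# Toy 1-D example: +1.0 points toward "sinister", 0.0 is neutral.
print(cfg(pred_negative=1.0, pred_positive=0.0, guidance_scale=7.5))
# → -6.5, i.e. strongly anti-"sinister"
```

With the negative prompt set to sinister concepts and nothing positive to steer toward, the sampler ends up maximizing "anti-sinister" imagery, which is plausibly why it lands somewhere similar to the overoptimized sentiment policy.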

Answer by jdp, Dec 16, 2021

The book Silicon Dreams: Information, Man, and Machine by Robert Lucky is where I got mine. It's a pop science book that explores the theoretical limits of human computer interaction using information theory. It's written to do exactly the thing you're asking for: Convey deep intuitions about information theory using a variety of practical examples without getting bogged down in math equations or rote exercises.

Covers topics like:

  • What are the bottlenecks to human information processing?
  • What is Shannon's theory of information and how does it work?
  • What input methods exist for computers and what is their bandwidth/theoretical limit?
  • What's the best keyboard layout?
  • How do (contemporary, the book was written in 1989) compression methods work?
  • How fast can a person read, and what are the limits of methods that purport to make it faster?
  • If an n-gram Markov chain becomes increasingly English like as it's scaled, does that imply a sufficiently advanced Markov chain is indistinguishable from human intelligence?
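That last question is easier to reason about with a toy model in hand. A minimal word-level bigram chain (the corpus is made up for illustration): the model only knows P(next word | current word), so its output is locally English-like but carries no long-range state, which is the crux of the question about scaling it up.

```python
# Minimal word-level bigram (2-gram) Markov chain text generator.
import random
from collections import defaultdict

corpus = ("the cat sat on the mat and the dog sat on the rug "
          "and the cat saw the dog").split()

# Count bigram transitions: word -> list of observed successors.
transitions = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    transitions[a].append(b)

def generate(start, n, seed=0):
    # Sample a chain of n words, each chosen only from successors
    # actually observed after the previous word.
    rng = random.Random(seed)
    out = [start]
    for _ in range(n - 1):
        out.append(rng.choice(transitions[out[-1]]))
    return " ".join(out)

print(generate("the", 8))
```

Every adjacent pair in the output is a real English bigram from the corpus, yet the chain has no memory beyond one word; "scaling it up" means raising the n-gram order and corpus size, and Lucky's question is whether that alone ever adds up to intelligence.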

Much of his inquiry concerns the extent to which AI methods can bridge the fundamental gaps between human and electronic-computer information processing. As a result he spends a lot of time breaking down the way that various GOFAI methods work in the context of information theory. Given the things you want to understand it for, this seems like it would be very useful to you.

Very grim. I think that almost everybody is bouncing off the real hard problems at the center and doing work that is predictably not going to be useful at the superintelligent level, nor does it teach me anything I could not have said in advance of the paper being written. People like to do projects that they know will succeed and will result in a publishable paper, and that rules out all real research at step 1 of the social process.

This is an interesting critique, but it feels off to me. There's actually a lot of 'gap' between the neat theory explanation of something in a paper and actually building it. I can imagine many papers where I might say:

"Oh, I can predict in advance what will happen if you build this system with 80% confidence."

But if you just kinda like, keep recursing on that:

"I can imagine what will happen if you build the n+1 version of this system with 79% confidence..."

"I can imagine what will happen if you build the n+2 version of this system with 76% confidence..."

"I can imagine what will happen if you build the n+3 version of this system with 74% confidence..."

It's not so much that my confidence starts dropping (though it does), as that you are beginning to talk about a fairly long lead time in practical development work.

As anyone who has worked with ML knows, it takes a long time to get a functioning codebase with all the kinks ironed out and methods that do the things they theoretically should do. So I could imagine a lot of AI safety papers whose results are, fundamentally, completely predictable, but where actually building the system they describe is still very useful for developing your implementing-AI-safety muscles.

I'm also concerned that you admit you have no theoretical angle of attack on alignment, but seem to see empirical work as hopeless. AI is full of theory developed as post-hoc justification of what starts out as empirical observation. To quote an anonymous person who is familiar with the history of AI research:

REDACTED

Yeah. This is one thing that soured me on Schmidhuber. I realized that what he is doing is manufacturing history.

Creating an alternate reality/narrative where DL work flows from point A to point B to point C every few years, when in fact, B had no idea about A, and C was just tinkering with A.

Academic pedigrees reward post hoc ergo propter hoc on a mass scale.

And of course, post-AlphaGo, I find this intellectual forgery to be not merely annoying and bad epistemic practice, but a serious contribution to X-Risk.

By falsifying how progress actually happened, it prioritizes theoretical work while downplaying empirical work, implementation, trial-and-error, and the preeminent role of compute.

In Schmidhuber's history, everyone knows all about DL and meta-learning, and DL history is a grand triumphant march from the perceptron to the neocognitron to Schmidhuber's LSTM to GPT-3 as a minor uninteresting extension of his fast memory work, all unfolding exactly as foreseen.

As opposed to what actually happened, which was a bunch of apes poking in the mud, drawing symbols and grunting at each other, until a big monolith containing a thousand GPUs appeared out of nowhere, the monkeys punched the keyboard a few times, and everyone bowed in awe.

And then going back and saying 'ah yes, Grog foresaw the monolith when he smashed his fist into the mud and made a vague rectangular shape'.

My usual example is ResNets. Super important, one of the most important discoveries in DL... and if you didn't read a bullshit PR interview Microsoft put out in 2016 or something where they admit it was simply trying out random architectures until one worked, all you have is the paper placidly explaining "obviously resnets are a good idea because they make the gradients flow and can be initialized to the identity transformation; in accordance with our theory, we implemented and trained a resnet CNN on ImageNet..."

Discouraging the processes by which serendipity can occur when you have no theoretical angle of attack seems suicidal to me, to put it bluntly. While I'm quite certain there is a large amount of junk work on AI safety, we would likely do well to put together some kind of process where more empirical approaches are taken faster with more opportunities for 'a miracle' as you termed it to arise.

As a fellow "back reader" of Yudkowsky, I have a handful of books to add to your recommendations:

Engines Of Creation by K. Eric Drexler

Great Mambo Chicken and The Transhuman Condition by Ed Regis

EY has cited both at one time or another as the books that 'made him a transhumanist'. His early concept of future shock levels is probably based in no small part on the structure of these two books. The Sequences themselves borrow a ton from Drexler, and you could argue that the entire 'AI risk' vs. nanotech split from the extropians represented an argument about whether AI causes nanotech or nanotech causes AI.

I'd also like to recommend a few more books that postdate The Sequences but as works of history help fill in a lot of context:

Korzybski: A Biography by Bruce Kodish

A History Of Transhumanism by Elise Bohan

Both of these are thoroughly well researched works of history that help make it clearer where LessWrong 'came from' in terms of precursors. Kodish's biography in particular is interesting because Korzybski gets astonishingly close to stating the X-Risk thesis in Manhood of Humanity:

At present I am chiefly concerned to drive home the fact that it is the great disparity between the rapid progress of the natural and technological sciences on the one hand and the slow progress of the metaphysical, so-called social “sciences” on the other hand, that sooner or later so disturbs the equilibrium of human affairs as to result periodically in those social cataclysms which we call insurrections, revolutions and wars.

… And I would have him see clearly that, because the disparity which produces them increases as we pass from generation to generation—from term to term of our progressions—the “jumps” in question occur not only with increasing violence but with increasing frequency.

And in fact Korzybski's philosophy came directly out of the intellectual scene dedicated to preventing a second world war after the first one; in that sense there's a clear, unbroken line from the first modern concerns about existential risk to Yudkowsky.