Review of A Map that Reflects the Territory

A few clarifications on the "new words" section:

Mediocristan and Extremistan are terms coined (I think) by Nassim Nicholas Taleb in his book The Black Swan. They don't exactly mean what you say they do. The idea is that Mediocristan is an imaginary country where things have thin-tailed (e.g., normal) distributions, so differences are usually modest in size, while Extremistan is an imaginary country where things have fat-tailed (e.g., power-law) distributions, so differences are sometimes huge. You then say something belongs to Mediocristan or Extremistan depending on whether the associated distributions are thin-tailed or fat-tailed.
The ones from EY's "Local Validity ..." post are, I think, just made up for the occasion and he intends it to be obvious from context at least roughly what they mean.
- "Code of the Light": how good, principled, rational, nice, honest people behave.
- "Straw authoritarians": strawman-authoritarians: that is, authoritarians who are transparently stupid and malicious, rather than whatever the most defensible sort of authoritarian might be.
The "memetic collapse" thing is a link, to a (spit) Facebook post by EY where he says this: "The Internet is selecting harder on a larger population of ideas, and sanity falls off the selective frontier once you select hard enough [...] the Internet, and maybe television before it, selected much more harshly from a much wider field of memes; and also allowed tailoring content more narrowly to narrower audiences [...] We're looking at a collapse of reference to expertise because deferring to expertise costs a couple of hedons compared to being told that all your intuitions are perfectly right, and at the harsh selective frontier there's no room for that. We're looking at a collapse of interaction between bubbles because there used to be just a few newspapers serving all the bubbles; and now that the bubbles have separated there's little incentive to show people how to be fair in their judgment of ideas for other bubbles [...] It seems plausible to me that *basic* software for intelligent functioning is being damaged by this hypercompetition [...] If you look at how some bubbles are talking and thinking now, "intellectually feral children" doesn't seem like entirely inappropriate language". In other words: changes in how communication works have enabled processes that systematically make us stupider, less tolerant, etc., and also get off my lawn.
"Glomarization" does yield search-engine hits when you spell it right; one of them is a Wikipedia page entitled "Glomar response" which explains it pretty clearly.
I don't think memorizing the Bible or the digits of pi is a great example of "counterfeit understanding"; some of the people who memorize the Bible have a pretty good understanding of what it means, and people who memorize the digits of pi generally (I think) understand what digits mean. One of the best expositions of the counterfeit-understanding thing I know of comes from Richard Feynman writing about the terrible state of science education in Brazil at one time (I don't know whether it's improved since then); see e.g. here.
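To make the Mediocristan/Extremistan distinction above concrete, here is a minimal sketch that draws samples from a thin-tailed and a fat-tailed distribution and asks how much of the total the single largest sample accounts for. The specific choices (a normal distribution with mean 170 and standard deviation 10, a Pareto distribution with shape 1.1, 10,000 samples, the function names) are illustrative assumptions, not anything taken from Taleb's book or the review.

```python
import random


def largest_share(samples):
    """Fraction of the total contributed by the single largest sample."""
    return max(samples) / sum(samples)


def simulate(n=10_000, seed=0):
    """Return (thin-tailed share, fat-tailed share) for n samples each."""
    rng = random.Random(seed)
    # "Mediocristan": thin-tailed, e.g. something like heights in cm.
    # Mean and standard deviation are illustrative assumptions.
    thin = [rng.gauss(170.0, 10.0) for _ in range(n)]
    # "Extremistan": fat-tailed, here a Pareto (power-law) distribution.
    # The shape parameter 1.1 is an illustrative assumption.
    fat = [rng.paretovariate(1.1) for _ in range(n)]
    return largest_share(thin), largest_share(fat)


thin_share, fat_share = simulate()
print(f"largest sample's share of the total, thin-tailed: {thin_share:.6f}")
print(f"largest sample's share of the total, fat-tailed:  {fat_share:.6f}")
```

In the thin-tailed case no single sample matters much to the total; in the fat-tailed case one sample can account for a noticeable fraction of it, which is the sense in which differences there are "sometimes huge".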

Regarding digits of pi, N. Gisin promotes the constructivist idea that certain mathematical expressions mean nothing in that they do not relate to anything real. One cannot make a scientific hypothesis involving them. The hundred-billionth twenty-digit sequence of pi is smaller than the Planck length.

[-]gjm4y20

There's still a well-defined answer to the question of what the digits mean, and indeed of what they mean as digits of pi; e.g., the hundred-billionth digit of pi is what you get by carrying out a pi-computing algorithm and looking at the hundred-billionth digit of its output. Anyway, no one is memorizing that many digits of pi.

[EDITED to add:] On the other hand, people certainly memorize enough digits of pi that, e.g., an error in the last digit they memorize would make a sub-Planck-length difference to the length of a (euclidean-planar) circle whose diameter is that of the observable universe. (Size of observable universe is tens of billions of light-years; a year is 3x10^7 seconds so that's say 10^18 light-seconds; light travels at 3x10^8 m/s so that's < 10^27m; I forget just how short the Planck length is but I'm pretty sure it's > 10^-50m; so 80 digits should be enough, and even I have memorized that many digits of pi (and forgotten many of them again).)
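The back-of-envelope estimate above can be redone with less rounded figures. This is a rough sketch only: the constants below (about 93 billion light-years for the diameter of the observable universe, about 9.46x10^15 m per light-year, about 1.6x10^-35 m for the Planck length) are standard approximate values that are not in the comment itself, and the variable names are made up for illustration.

```python
import math

# Approximate physical constants (assumptions, not taken from the thread).
LIGHT_YEAR_M = 9.4607e15      # metres in one light-year
PLANCK_LENGTH_M = 1.616e-35   # Planck length in metres
DIAMETER_LY = 9.3e10          # diameter of the observable universe, light-years

diameter_m = DIAMETER_LY * LIGHT_YEAR_M

# circumference = pi * diameter, so an error of eps in pi changes the
# circumference by eps * diameter. A wrong digit in the k-th decimal place
# changes pi by less than 10**-(k - 1), so we need
#     diameter * 10**-(k - 1) < Planck length.
ratio = diameter_m / PLANCK_LENGTH_M
digits_needed = math.ceil(math.log10(ratio)) + 1

print(f"diameter in metres: {diameter_m:.3e}")
print(f"decimal places after which an error is sub-Planck-length: {digits_needed}")
```

Under these assumptions the answer comes out comfortably below 80 decimal places, consistent with the looser bounds used in the estimate.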

[-]William Gasarch4y10

Thanks! I will incorporate your comments into the review!

[-]William Gasarch4y10

I have incorporated the comments of gym into my review.

[-]gjm4y20

Pedantic note: it's "gjm", not "gym"; they're my initials.

[-]William Gasarch4y10

Fixed! Thanks!

[-]Ruby4y120

Thanks for the review! A link is a good way to post, but an even better way would be to reproduce the text here in this post. (People are generally much less likely to read things they have to click through to.)

[-]Ben Pace4y80

Oh yeah, +1 to this. I was surprised the post only had 11 karma when I saw it (William had sent me an advance copy and I’d really liked reading it) but when I saw that it was a link post I understood why.

[-]William Gasarch4y10

The review is a LaTeX file and I posted the link to the PDF file that was generated.

Is there an easy way to make it so that it's not a link?

[-]Ruby4y20

I'm afraid that beyond copy-pasting the resulting text, there isn't. :(

[-]cata4y60

Thanks for writing this! As someone who spends a lot of time hanging out on LessWrong and the other fora where these discussions are worked out in realtime, it's fun to read you coming at them fresh.

[-]Daniel Kokotajlo4y60

Great review! One comment:

But the collection bogs down with a series of essays (about 1/3 of the book) on Paul Christiano's research on Iterated Amplification. This technique seems to be that the AI has to tell you at regular intervals why it is doing what it is doing, to avoid the AI gaming the system. That sounds interesting! But then the essays seem to debate whether it's a good idea or not. I kept shouting at the book JUST TRY IT OUT AND SEE IF IT WORKS! Did Paul Christiano DO this? If so what were the results? We never find out.

First of all, that's not how I would describe the core idea of IDA. Secondly, and much more importantly, we can't try it out yet because our AIs aren't smart enough. For example, we could ask GPT-3 to tell us why it is doing what it is doing... but as far as we can tell it doesn't even know! And if it did know, maybe it could be trained to tell us... but maybe that only works because it's too dumb to know how to convincingly lie to us, and if it were smarter, the training methods would stop working.

Our situation is analogous to someone in medieval Europe who has a dragon egg and is trying to figure out how to train and domesticate the dragon so it doesn't hurt anyone. You can't "just try out" ideas because your egg hasn't hatched yet. The best you can do is (a) practice on non-dragons (things like chickens, lizards, horses...) and hope that the lessons generalize, (b) theorize about domestication in general, so that you have a firm foundation on which to stand when "crunch time" happens and you are actually dealing with a live dragon and trying to figure out how to train it, (c) theorize about dragon domestication in particular by imagining what dragons might be like, e.g. "We probably won't be able to put it in a cage like we do with chickens and lizards and horses because it will be able to melt steel with its fiery breath..."

Currently people in the AI risk community are pursuing the analogues of a, b, and c. Did I leave out any option d? I probably did, I'd be interested to hear it!

[-]William Gasarch4y50

How would you describe the core ideas of IDA? I will incorporate your answer into the review and hence make it more accurate! I will also incorporate the reason why we can't try it out yet.

[-]Daniel Kokotajlo4y60

IDA stands for iterated distillation and amplification. The idea is to start with a system M which is aligned (such as a literal human), amplify it into a smarter system Amp(M) (such as by letting it think longer or spin off copies of itself), then distill the amplified system into a new system M+ which is smarter than M but dumber than Amp(M), and then repeat indefinitely to scale up the capabilities of the system while preserving alignment.

The important thing is to ensure that the amplification and distillation steps both preserve alignment. That way, we start with an aligned system and continue having aligned systems every step of the way even as they get arbitrarily more powerful. How does the amplification step preserve alignment? Well, it depends on the details of the proposal, but intuitively this shouldn't be too hard--letting an aligned agent think longer shouldn't make it cease being aligned. How does the distillation step preserve alignment? Well, it depends on the details of the proposal, but intuitively this should be possible -- the distilled agent M+ is dumber than Amp(M) and Amp(M) is aligned, so hopefully Amp(M) can "oversee" the training/creation of M+ in a way that results in M+ being aligned also. Intuitively, M+ shouldn't be able to fool or deceive Amp(M) because it's not as smart as Amp(M).
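The shape of the amplify/distill loop described above can be sketched with a deliberately trivial toy. This is only an illustration of the control flow, not Paul Christiano's actual proposal: the "model" is a lookup table of Fibonacci numbers, amplification answers one harder question by consulting two copies of the current model, and distillation stores the amplified answers in a new, fast table. All names (make_model, amplify, distill, ida) are hypothetical, and the toy says nothing about alignment. Unlike the description above, the distilled model here reproduces Amp(M) exactly on the questions it was trained on; it is merely faster.

```python
from typing import Callable, Dict

Model = Callable[[int], int]


def make_model(table: Dict[int, int]) -> Model:
    """A fast model M: it can only answer questions stored in its table."""
    def model(n: int) -> int:
        return table[n]  # KeyError means "beyond this model's ability"
    return model


def amplify(model: Model) -> Model:
    """Amp(M): slower but more capable, built by consulting copies of M."""
    def amplified(n: int) -> int:
        try:
            return model(n)
        except KeyError:
            # Decompose the hard question into two easier ones for M.
            return model(n - 1) + model(n - 2)
    return amplified


def distill(amplified: Model, questions: range) -> Model:
    """M+: a fast model trained to reproduce Amp(M) on the given questions."""
    return make_model({n: amplified(n) for n in questions})


def ida(rounds: int) -> Model:
    """Run the amplify-then-distill loop for a number of rounds."""
    model = make_model({0: 0, 1: 1})  # the initial system M
    cap = 1                           # hardest question M can answer
    for _ in range(rounds):
        cap += 1
        model = distill(amplify(model), range(cap + 1))
    return model


final_model = ida(10)
print(final_model(10))
```

Each round extends the model's reach by one question; in a real proposal the hard part is arguing that each step also preserves alignment, which this toy does not attempt to model.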

[-]ESRogs4y60

Also note that the AlphaZero algorithm is an example of IDA:

- The amplification step is when the policy / value neural net is used to play out a number of steps in the game tree, resulting in a better guess at what the best move is than just using the output of the net directly.
- The distillation step is when the policy / value net is trained to match the output of the game tree exploration process.

[-]William Gasarch4y30

I have incorporated your comments and also acknowledged you. Thanks!

[-]rsaarelm4y10

So, somewhat inconsequential stylistic thing. I open a PDF link, see it's written in LaTeX, I start expecting something written more or less like an academic paper. This is written in very much a chatty, free-flowing blog post style, with jokes like calling neologisms "newords", so the whole thing feels a bit more off-kilter than was intended. This style of writing would probably work better as an HTML blog post (which could then be posted directly as a LessWrong post here instead of hosted elsewhere and linked).

[-]Neel Nanda4y50

Just noting that I had the opposite reaction - I was pleasantly surprised by the fun style after the formal framing, and this made the whole thing more fun for me.

[-]William Gasarch4y20

Interesting that the format gives a (in this case incorrect) indicator of the type of article it is.

In the early days of word processors the fear was that first drafts would be typed and look like they were far more done than they were, resulting in worse final drafts. I don't think this happened; in fact, the opposite has happened: people can just keep on polishing and polishing.

