Review of A Map that Reflects the Territory

by William Gasarch, 12th Sep 2021


I had read about lesswrong.com on Scott Aaronson's blog but never read it. Then there was an offer to review A Map that Reflects the Territory which (as you prob know) is a collection of lesswrong essays.

I originally had just a link to my review, which I still have in this post; however, several people (correctly) urged me to make it a post you can just read directly. Due to the FAQ being out of date and my own luddite-tendencies I was unable to do this, but Ruby did it for me, for which I am grateful. The full text appears below the link. 

Here is a link to my review: 

https://www.cs.umd.edu/~gasarch/BLOGPAPERS/lesswrong.pdf

Here is the review in text to just read:

A Map that Reflects the Territory: Essays by the Less Wrong Community
Author: Less Wrong 
Publisher: Less Wrong Press 
720 pages, Year: 2020 
$30.00 But See Note on Availability 

Reviewer: William Gasarch (gasarch@umd.edu )

Availability As of September 2021 when I finished this review the book was out of stock on Amazon. However: 

1 Introduction 

Less Wrong is a forum where people post essays. The stated philosophy is: We are a community dedicated to improving our reasoning and decision-making. We seek to hold true beliefs and to be effective at accomplishing our goals. More generally, we work to develop and practice the art of human rationality. That seems to cover a lot of ground! A satire of it would say the following: There are discussions about discussions, discussions about arguments, arguments about discussions, and arguments about arguments. 

That is not fair. The topics seem to be (1) find the truth in science and in life, (2) AGI (Artificial General Intelligence), and (3) probability. The most common non-trivial word in this book might be Bayes (a trivial word would be something like the which is likely more common but less interesting). 

This book is a best-of collection. We quote the preface: Users wrote reviews of the best posts of 2018, and voted on them using the quadratic voting system, popularized by Glen Weyl and Vitalik Buterin. From the 2000+ posts published that year, the Review narrowed down the 44 most interesting and valuable posts. 

The collection of posts from the Less Wrong forum is now gathered together in a book titled A Map that Reflects the Territory: Essays by the Less Wrong Community.

This set of essays is a set of five books, titled Epistemology, Agency, Coordination, Curiosity, Alignment. Each book is small—about 6 inches long and 4 inches wide. 

2 General Comments 

PROS Many of the essays bring up a point that I had not thought of before. Many of the essays say something interesting in passing while getting to their point. 

CONS Some of the essays are trying to say something interesting but have no examples. There are times I am crying out give me an example! (reminds me of my days as a pure math major). Some of the essays are locally good but it’s not clear what their point is.

CAVEAT (both a PRO and a CON) Many of the essays use a word or phrase as though I am supposed to already know them. If I was a regular member of the forum then perhaps I would know them. In the modern electronic age I can try to look them up. This is a PRO in that I learn new words and phrases. For me this is a really big PRO since I collect new words and phrases as a hobby. This is a CON in that going to look things up disrupts the flow of the essays. And sometimes I can’t find the new word or phrase on the web. 

In the third to last section of this review I will have a list of all of the words and phrases I learned by reading these books and either their meaning or that I could not find their meaning. Why third to last? Because the second to last section is my summary opinion and the reader of this review should be able to find it quickly (the last section is acknowledgments). I posted to lesswrong a request to tell me what some of the words mean and got a few responses. But there are more. If you know what one of the ones I do not know means, email me please!

3 Epistemology 

I quote the first sentence: 

The first book is about epistemology, how we come to know the world. Most of the essays are on how to have a good argument. (Reminds me of Monty Python’s classic sketch the argument clinic. The essays are more enlightening but less funny.) 

Scott Alexander’s Varieties of Argumentative Experience is especially good and has . . . wait for it . . . examples! Here is one concept I found very interesting: double-crux. Say Alice thinks gun control is good and Bob thinks gun control is bad. They should find related statements X and Y such that if X is true Alice will change her mind, and if Y is true then Bob will change his mind. In this case it could be

X is If we have gun control then crime will go up. 

Y is If we have gun control then crime will go down. 

Hence the argument can now focus on a question that can be studied objectively. (I will now plug my cousin Adam Winkler’s book Gunfight: The Battle over the Right to Bear Arms, which is an intelligent discussion of gun control including the history of the issue.)

Another essay that I interpret as being on the topic of how to have a good argument is Local Validity as a Key to Sanity and Civilization by Eliezer Yudkowsky. The essay is actually about laws and norms, but it’s more about the need to avoid having laws that only apply to some people and not others. While this seems like an obvious point, he gives it history and context.

There are essays by Alkjash about how to come up with new ideas: babble and prune. Have lots of (possibly half-baked) ideas, and then prune to get the good ones. There is a delicate balance here: how much to babble? how much to prune? A fascinating aside in the article: babies can make all the phonemes; they learn language mostly by pruning.

The essay Naming the Nameless by Sarah Constantin is about aesthetics and arguments. Why are artists left wing? What to do if you are a conservative who likes modern art? She then critiques certain types of arguments from an aesthetic point of view.

The last essay, Towards a New Technical Explanation of Technical Explanation by Abram Demski, is the most technical. It’s about logic, uncertainty, and probability. It seems to point to a way to predict things under uncertainty; however, there are no examples. I felt like shouting Does it Work? Can you test it?

4 Agency 

I quote the first sentence: 

The second book is about agency, the ability to take action in the world and control the future. 

Despite the above sentence, this book does not have a coherent theme; however, it does have several very interesting essays. 

Eliezer Yudkowsky has two essays on honesty: Meta-Honesty: Firming Up Honesty Around the Edge Case (The Basics) and Meta-Honesty: Firming Up Honesty Around the Edge Case (The Details). When should one be honest? The usual easy example is lying to Nazis who ask if you are hiding Jews. Is there a consistent rule you can use? The essays suggest rules that involve never lying about lying. The second essay has two conversations that are so funny they should be made into a Monty Python sketch:

Dumbledore trying to find out if Harry Potter robbed a bank, 

and the Gestapo asking about hiding Jews. What makes these conversations hilarious is that all parties know all about the issue of meta-honesty. Eliezer does admit that these scenarios would never happen. These essays raise interesting points but do not really give a solution. There probably is no solution. 

Michael Valentine Smith’s essay Noticing the Taste of the Lotus is about noticing that you are (say) playing a computer game to get more points, and using those points to buy things so that you can . . . play better and get more points so that you can buy things . . . We (I mean every human) need to BREAK OUT OF THIS DEATH SPIRAL.

Scott Alexander’s The Tails Coming Apart as a Metaphor for Life begins by talking about the following: even though reading and writing scores are correlated, the top reading score is usually not the top writing score. He then applies this observation to happiness and morality. That is, different definitions of happiness are somewhat correlated, but not at the high end. Same for morality. This essay gave me lots to think about, though I don’t know what to conclude.
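The reading-and-writing observation is easy to check with a small simulation. The sketch below is mine, not the essay's: it assumes the two scores are standard normal with a chosen correlation rho, and the names simulate_scores, correlation, top_scorer_matches, and fraction_matching are made up for illustration.

```python
import math
import random


def simulate_scores(n, rho, rng):
    """Return n (reading, writing) pairs whose correlation is about rho."""
    pairs = []
    for _ in range(n):
        reading = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        # Mixing the reading score with independent noise gives correlation rho.
        writing = rho * reading + math.sqrt(1.0 - rho * rho) * noise
        pairs.append((reading, writing))
    return pairs


def correlation(pairs):
    """Sample Pearson correlation of the two columns."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


def top_scorer_matches(pairs):
    """True if the best reader is also the best writer."""
    best_reader = max(range(len(pairs)), key=lambda i: pairs[i][0])
    best_writer = max(range(len(pairs)), key=lambda i: pairs[i][1])
    return best_reader == best_writer


def fraction_matching(trials, n, rho, seed=0):
    """Fraction of simulated classes where one person tops both scores."""
    rng = random.Random(seed)
    hits = sum(top_scorer_matches(simulate_scores(n, rho, rng))
               for _ in range(trials))
    return hits / trials
```

Calling fraction_matching(200, 1000, 0.8) simulates 200 groups of 1000 people whose scores are strongly correlated; under these assumptions the top reader is the top writer in only a minority of the groups, which is the point the essay starts from.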

The other essays were of the same type: they made some interesting points but didn’t really answer the rather hard questions they set out to tackle. This reminds me of what I liked about philosophy (my minor in college): the questions raised (e.g., What is Truth? What is Knowledge? What is Beauty?) are not going to be answered, but reading about the attempt to answer them is interesting. 

5 Coordination 

I quote the first sentence: 

This third book is about coordination, the ability of multiple agents to work together. 

Four of the essays are on game theory. They all go beyond the usual introduction of the Prisoner’s Dilemma and hence are all interesting. My challenge is to give 1-2 sentences about each one.

1. Anti-Social Punishment by Martin Sustrik. This describes an experiment that people really did involving whether a player does what’s good for himself or what’s good for the group. Results are interesting and seem to really tell us something.

2. The Costly Coordination Mechanism of Common Knowledge by Ben Pace. The key to the Prisoner’s Dilemma is that the parties cannot talk to each other. In the real world how do enough people talk to each other so that they do not fall into the dilemma?

3. The Pavlov Strategy by Sarah Constantin. This describes strategies for Prisoner’s Dilemma.

4. Inadequate Equilibria vs Governance of the Commons by Martin Sustrik. This gives real examples of how people got around the tragedy of the commons. 
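For readers who have not met the strategy named in item 3: Pavlov is usually described as win-stay, lose-shift for the repeated Prisoner’s Dilemma. The sketch below is my own illustration, not code from the essay. The payoff numbers (3, 0, 5, 1) are the common textbook convention and are an assumption here, as are the function names and the noise_at parameter, which forces one mistaken move.

```python
COOPERATE, DEFECT = "C", "D"

# Assumed textbook payoffs: (my move, their move) -> my payoff.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def pavlov(my_last, their_last):
    """Win-stay, lose-shift: keep the last move after a payoff of 3 or 5."""
    if my_last is None:
        return COOPERATE
    if PAYOFF[(my_last, their_last)] >= 3:
        return my_last
    return DEFECT if my_last == COOPERATE else COOPERATE


def tit_for_tat(my_last, their_last):
    """Cooperate first, then copy the opponent's previous move."""
    return COOPERATE if their_last is None else their_last


def play(strategy_a, strategy_b, rounds, noise_at=None):
    """Play the repeated game; flip player A's move in round noise_at."""
    a_last = b_last = None
    history = []
    for r in range(rounds):
        a = strategy_a(a_last, b_last)
        b = strategy_b(b_last, a_last)
        if r == noise_at:
            # Model a single accidental move by player A.
            a = DEFECT if a == COOPERATE else COOPERATE
        history.append((a, b))
        a_last, b_last = a, b
    return history
```

With play(pavlov, pavlov, 6, noise_at=2) the two players fall into one round of mutual defection after the mistake and then return to cooperating. With play(tit_for_tat, tit_for_tat, 6, noise_at=2) the same mistake leaves them alternating defections for the rest of the run.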

Prediction Markets: When do they work? by Zvi Mowshowitz is an excellent article about, as the title says, when Prediction Markets work. I was most intrigued by the fact that insider trading is quite legal; however, if it is known that people are doing it, fewer people might use that market.

The Intelligent Social Web by Michael Valentine Smith views life as improv. In order for a scene to work everyone must naturally follow their role. In life we have a view of ourselves that we have to stick to to make the scene work. We may change slowly to adapt to a different scene. This is a fascinating way to view life! 

On the Loss and Preservation of Knowledge by Samo Burja begins with the question: What would Aristotle have thought of Artificial Intelligence? No it doesn’t! The essay really begins with the question How would you approach the question of “What would Aristotle have thought of Artificial Intelligence?”. It goes on to talk about how knowledge, schools of thought, and philosophies have a hard time being preserved, and gives signs that they were or were not. Alas, it is likely that the Aristotelian philosophy is not well enough preserved to answer the question (that’s my opinion).

There are a few other essays, but the ones I mentioned are the highlights. This was my favorite book since so many of the essays were interesting. 

6 Curiosity 

I quote the first sentence: 

The fourth book is about curiosity, which is the desire to understand how the world works 

The three essays Is Science Slowing Down? by Scott Alexander and Why Did Everything Take So Long? and Why Everything Might Have Taken So Long both by Katja Grace look at the pace of science and other advancement. Scott Alexander argues that science is slowing down and he gives good reasons for this. Katja Grace examines why, for example, even though humans have been around for 50,000 years the wheel was invented only about 6000 years ago. So for 44,000 years people didn’t have the wheel! (My students are amazed that 30 years ago people didn’t have Netflix.)

The essay What Motivated Rescuers During the Holocaust? by Martin Sustrik is interesting in both what they can say about the question and how they can say anything about the question. The essay Is Clickbait Destroying Our Intelligence by Eliezer Yudkowsky is locally interesting but wanders around quite a bit. 

The essay What Makes People Intellectually Active? by Abram Demski is somewhat interesting but longer than it needs to be.

The essay Are Minimal Circuits Daemon-Free? by Paul Christiano is about circuits (really AI systems) that satisfy the problem constraints but not in the way that you want. It was too technical for my tastes. Also (and this is not an objection) it may have fit better in the book Alignment.

There are a few other essays, but the ones I mentioned are the highlights.

7 Alignment 

I quote the first sentence: 

This fifth book is about alignment, the problem of aligning the thoughts and goals of artificial intelligence with those of humans. 

The essay Specification Gaming Examples in AI by Victoria Krakovna is about when AI systems do well but for the wrong reason. For example, a deep-learning model to detect pneumonia did well, but only because the more serious cases used a different X-ray machine. Victoria Krakovna has a longer article and 61 examples here.

This is great in that we now see what the problem is. Then there was a great satirical essay, The Rocket Alignment Problem by Eliezer Yudkowsky. There were some other essays of mild interest about what might happen (e.g., slow and steady or fast and abrupt progress). But the collection bogs down with a series of essays (about 1/3 of the book) on Paul Christiano’s research on Iterated Amplification. The idea is that you start with a system M that is aligned: it gives the right answers for the right reasons. Perhaps a literal human. You then amplify to a smarter system Amp(M) (perhaps letting it think longer or spinning off copies of itself). Then you (and this is the key!) distill Amp(M) into a system M+ which is aligned. Repeat this many times. But note that you always make sure it’s aligned.

That sounds interesting! And it might work! But then the essays seem to debate whether it’s a good idea or not. I kept shouting at the book JUST TRY IT OUT AND SEE IF IT WORKS! I have since learned (from the comments below) that current AI is just not smart enough to do this yet. This raises a question: How much should one debate if an approach will work before the approach is possible to try? IMHO, less than they do in this book.
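The amplify-then-distill loop described above can at least be written down as a schematic. This is only my reading of the summary in this review, not anything from the essays: the function iterated_amplification and its parameters amplify, distill, and is_aligned are hypothetical placeholders supplied by the caller, and the alignment check is exactly the hard part that nobody knows how to write.

```python
def iterated_amplification(model, amplify, distill, is_aligned, rounds):
    """Schematic loop: amplify M, distill to M+, keep M+ only if aligned.

    All four callables are placeholders (assumptions for illustration);
    the essays do not specify them and is_aligned is the open problem.
    """
    if not is_aligned(model):
        raise ValueError("the starting system M must already be aligned")
    for _ in range(rounds):
        amplified = amplify(model)      # Amp(M): smarter, but slower or costlier
        candidate = distill(amplified)  # M+: a cheaper system imitating Amp(M)
        if not is_aligned(candidate):
            break                       # keep the last system known to be aligned
        model = candidate
    return model
```

As a toy usage, take integers as stand-ins for systems, amplify=lambda m: m * 3, distill=lambda m: m - 1, and is_aligned=lambda m: m < 10. Starting from 1, the loop accepts 2 and then 5, rejects 14, and returns 5.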

8 Newords that I Learned or Tried to Learn From These Books 

This section has a list of newords that I learned (or tried to learn) from reading this book. In most cases I was able to look them up. In some cases I found out by posting on lesswrong. There are still some cases where I do not know what the newords mean.

8.1 From the book Epistemology 

1. No Free Lunch Theorem If an ML algorithm does well on one set of data it will do badly on another (this is a simplification). This is not just an informal statement—it has been formalized and proven. 

2. Code of the Light On page 19, in the article Local Validity as a Key to Sanity and Civilization by Eliezer Yudkowsky, is the following sentence: 

I’ve been musing recently about how a lot of the standard Code of the Light isn’t really written down anywhere anyone can find. 

Google Searches for code of the light only lead to the essay. The phrase was in green so I thought maybe in the original it was a link that would tell me what it means. Nope. 

When I posted an early version of this review I got this comment from gjm, which I slightly paraphrase.

I think that EY made up these terms for the occasion and he intends them to be, at least roughly, clear from context. It means “how good, principled, rational, nice, honest people behave.”

3. Straw Authoritarians On page 20, in the article Local Validity as a Key to Sanity and Civilization by Eliezer Yudkowsky, is the following sentence:

Those who are not real-life straw authoritarians (who are sadly common) will cheerfully agree that there are some forms of goodness, even most forms of goodness, that it is not wise to legislate. 

When I posted an early version of this review I got this comment from gjm, which I slightly paraphrase.

Authoritarians who are transparently stupid and malicious, rather than whatever the most defensible sort of authoritarian might be. 

4. Whispernet Justice System Being tried in the court of public opinion. I am guessing from context. Google only points to the essay it appeared in. Even so, this should be a word!

5. The Great Stagnation The name of a pamphlet by Tyler Cowen from 2011 that argues that the American Economy has run out of steam for a variety of reasons. The phrase is now used independent of the book but with the same meaning. 

6. Memetics The study of memes in a culture. 

7. Memetic collapse On page 27, in the article Local Validity as a Key to Sanity and Civilization by Eliezer Yudkowsky, is the following sentence:

It’s [the book Little Fuzzy by H. Beam Piper] from 1962, when the memetic collapse had started but not spread very far into science fiction. 

Google Searches only lead to the same essay I read this in. Searches on lesswrong lead to a few hits but they all seem to presuppose the reader knows the term. 

When I posted an early version of this review I got a comment from gjm which quotes from a Facebook post by EY. I paraphrase the Facebook post:

Since people can select just what they agree with (on the internet, on Facebook, etc) there is a collapse of references to expertise. Deferring to expertise costs a couple of hedons [2] compared to being told your intuitions are right. We’re looking at a collapse of interactions between bubbles because there used to be just a few newspapers serving all the bubbles; and now that the bubbles have separated there’s little incentive to show people how to be fair in their judgment of ideas from other bubbles. In other words: changes in how communication works have enabled processes that systematically made us stupider, less tolerant, etc., and also get off of my lawn.

[2] I had to look this one up: a hedon is a unit of pleasure used to theoretically weight people’s happiness. Like what I get when I find a cool new word.

8. AGI Artificial General Intelligence 

9. Double cruxing Alice and Bob are having an argument. Get them to agree on a fact that would change their mind. Example: Alice is for gun control and Bob is against it. If Alice would change her mind if she knew gun control causes crime to go UP and Bob would change his mind if he knew gun control causes crime to go DOWN then they have reduced their disagreement to a factual statement that can be investigated. 

8.2 From the Book Agency 

1. Lotus Eater In the Odyssey they land on the Island of Lotus-Eaters. The taste of the lotus is so good that your goal is to eat them and you ignore other goals. Some of today’s games have that property: you accumulate points that allow you to play more to get more points . . . I’ve also heard of going to the gym and lifting weights so you get better at lifting weights. Origin might be Duncan Sabien.

2. Mediocristan (gjm had a comment on the review which corrected an earlier version of this entry.)

On page 16, in the article The tails coming apart as a metaphor for life by Scott Alexander, is the following: 

This leads to (to steal words from Taleb) a Mediocristan resembling the training data where the category works fine, vs an Extremistan where everything comes apart

In Nassim Nicholas Taleb’s book The Black Swan, Mediocristan and Extremistan are imaginary countries. In Mediocristan things have thin-tailed distributions, so differences are moderate. In Extremistan there are fat-tailed distributions, so differences are sometimes huge. These countries are used to indicate if data is thin-tailed or fat-tailed.

3. Extremistan A fat-tailed event that can spread (e.g., COVID)

4. Deontology An ethical system that uses rules to tell right from wrong. Once the rules are set, no need for God or anything else. 

5. Glomarization Always saying ‘I cannot confirm or deny’. I got this definition from the essay. It’s a more common term than I had thought: there is a Wikipedia entry on Glomar Response.

6. Dunbar’s number The number of people that we can interact with comfortably. Dunbar estimated it to be 150. Be careful whom you choose for your 150th friend.

8.3 From the Book Coordination 

1. Miasma This seems to be the opposite of hype, but Google says its an unpleasant smell. 

2. Goodhart’s Demon I know what Goodhart’s law is (if a measure becomes a target it ceases to be a measure). I could not find Goodhart’s Demon anywhere on the web. 

3. Hansonian Death Trap On page 73, in the article Prediction Markets: When Do They Work, is the following: 

If you’re dealing with a hyper-complex Hansonian death trap of a conditional market where its 99% to not happen, even with good risk measurement tools that don’t tie up more money than necessary, no one is going want to put in the work and tie up the funds. 

Google Searches only turned up hits to this essay. Searches within lesswrong point to a few more hits, but they presuppose the reader knows the term. I did find an economist named Robin Hanson who (1) seems to talk about the kind of things lesswrong talks about (e.g., overcoming bias), and (2) is mentioned a lot on lesswrong. So he could be the Hansonian part. Or not. And I still don’t know what the death trap part means.

4. The Costanza Do the opposite of what you naively think you should do. This is from an episode of Seinfeld where George Costanza intentionally does this since all of his past decisions have been wrong. Not to be confused with pulling a Costanza which means, if you are fired, show up for work the next day as if you weren’t, as if your boss was just joking. 

5. Lucas Critique It is naive to predict the effect of an economic policy based on past uses of it. 

6. Counterfeit Understanding Knowing the words but not their meaning. Like people who memorize proofs in math line-by-line but do not know the intuition behind them. I have had students who memorize proof templates but do not really understand the proofs or the intuitions behind them. In one case a student took the proof template for showing √2 and √3 irrational and used it to prove √4 irrational.
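For the curious, here is where the template in the √4 story breaks. This is my reconstruction of the standard argument (the student's actual write-up is not reproduced here), with p standing for the number under the square root:

```latex
\begin{align*}
&\text{Template: assume } \sqrt{p} = \tfrac{a}{b} \text{ with } \gcd(a,b) = 1. \\
&\text{Then } a^2 = p\,b^2, \text{ so } p \mid a^2. \\
&\text{Key step: } p \mid a^2 \implies p \mid a \quad (\text{valid when } p \text{ is prime, e.g. } p = 2, 3). \\
&\text{Write } a = pc. \text{ Then } p^2 c^2 = p\,b^2, \text{ so } b^2 = p\,c^2 \text{ and } p \mid b, \text{ contradicting } \gcd(a,b) = 1. \\
&\text{For } p = 4: \quad 4 \mid 2^2 \text{ but } 4 \nmid 2, \text{ so the key step fails (and indeed } \sqrt{4} = 2 \text{ is rational).}
\end{align*}
```

Someone who understands the proof knows the key step uses primality; someone who memorized the template does not notice that it was used at all.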

8.4 From the Book Curiosity 

1. Dectupled Multiply by 10. 

2. Price’s law of scientific contributions If there are n people on a project then half of the work will be done by √n people. For example, on a project with 100 people, about 10 of them do half of the work.

3. Yudkowsky’s law of mad science Every 18 months the min IQ needed to destroy the world decreases by one. Scary! Note that Yudkowsky is a contributor to these essays. So is Yudkowsky’s law really a thing? Yes!

4. Opsec Short for operational security. 

5. Bystander Effect On page 28, in the article What Motivated Rescuers During the Holocaust? by Martin Sustrik, is the following: 

As I already said, I am not an expert on the topic, but if what we see here is not an instance of the bystander effect, I’ll eat my hat. 

He is referring to the fact that people who begin helping one Jew escape the Nazis end up helping more.

The phrase Bystander Effect is on the web! A lot! It seems to be that the more people that are bystanders who could prevent something bad from happening, the less likely someone really will. This seems different from how it’s used in the essay.

When I asked the lesswrong forum about this I got two responses: 

(a) beriukay said: Since I have not read the first one [the article], I could only speculate that the people who end up helping realize that nobody else is going to do anything to help, which breaks them out of the effect and they end up helping more.

Excellent! This seems to say that what the author of the article meant is that this is an example of the converse of the standard bystander effect.

(b) Tetraspace Grouping said: The bystander effect is an explanation of the whole story:

  • Because of the bystander effect, most people weren’t rescuers during the Holocaust, even though that was obviously the morally correct thing to do; they were in a large group of people who could have intervened but didn’t.
  • The standard way to break the bystander effect is by pointing out a single individual in the crowd to intervene, which is effectively what happened to people who became rescuers by circumstance that forced them into action. 

6. Memetically This seems to be related to memes but I could not find the word on the web. 

7. The Sequences On page 83, in the article What Makes People Intellectually Active? by Abram Demski, is the following: 

What is the difference between a smart person who has read the Sequences and considers AI x-risk important and interesting, but continues to be primarily a consumer of ideas, and someone who starts having ideas? 

The Sequences is impossible to look up on Google. Fortunately, if you search on the lesswrong site you get the following 

The original sequences were written by Eliezer Yudkowsky with the goal of creating a book on rationality. Someone with the name MIRI has since collated and edited the sequences into Rationality: From AI to Zombies. If you are new to Less Wrong, this book is the best place to start.

Darn. I started with A Map that Reflects the Territory: Essays by the Less Wrong Community.

8. Yed graphs On page 85, in the article What Makes People Intellectually Active? by Abram Demski, is the following: 

I might write one day on topics that interest me, and have sprawling Yed graphs in which I’m trying to make sense of confusing and conflicting evidence.

However, I did not know what a Yed graph is.

When I asked the lesswrong forum what a Yed graph is I got a pointer to a product, Yed Graph Editor, that generates high quality graphs. Here is the pointer: 

https://www.yworks.com/products/yed 

When I was looking for what a Yed graph was, I did come across that, but I thought it was not how the term was being used in the article. 

9. LW-corpus Everything in the Less Wrong website. 

10. TAP On page 92, in the article What Makes People Intellectually Active? by Abram Demski, is the following: 

It’s like the only rationality technique is TAPs, and you only set up taps of the form “resemblance to rationality concept” → “think of rationality concept”.

When I asked the lesswrong forum what TAP was I found out that it stands for trigger action plan and I got a pointer to another lesswrong article that may be where the term originated. Here is the pointer

 

9 Should You Read This Book? 

Yes. 

Okay, I will elaborate on that. 

In the spirit of the Less Wrong community, I looked at evidence on this question. What kind of evidence? I went through all five books and, for each article, marked it either E for Excellence, G for Good, or M for Meh (none were B for Bad). 

1. Epistemology E-1, G-6, M-3. 

2. Agency E-2, G-2, M-1. 

3. Coordination E-6, G-2, M-2. 

4. Curiosity E-4, G-2, M-4. 

5. Alignment E-2, G-3, M-5. 

What to do with this information? 

1. There are 15 excellent articles! That’s . . . excellent!

2. There are 15 good articles! That’s . . . good?

3. There are 15 meh articles! That’s . . . meh.

(I did not plan to have 15-15-15. Honestly! In the spirit of Eliezer Yudkowsky’s essays on Meta-Honesty I tell you that this is not the kind of thing I would lie about.)

So is 15-15-15 a good ratio? Yes! And note that the good articles are still . . . good. But let’s take a more birds-eye view (Do birds really have a good view? Do crows fly “as the crow flies”?). What did I learn from reading these 45 essays?

1. Many interesting questions were raised that I had not thought of. Here is just a sample: (1) Why do inventions take so long to be invented? (2) Why do I play too much Dominion online? (From Noticing the Taste of the Lotus, and it also says why I should stop.)

2. Many interesting meta questions were raised that I had not thought of. Here is just a sample: Can we know what Aristotle would think of AI? 

3. Some answers or inroads on these questions were made. Sometimes the answers were actual answers. Sometimes they gave me things to think about. Both outcomes are fine. 

4. Some newords for my newords hobby! 

So are there any negatives? Yes 

1. There were some words that I had to go look up. (For some of them, I still don’t know what they mean.) This interrupted the flow of the articles. I re-iterate that this can also be seen as a positive as you get to learn new words. 

2. The problem above points to a bigger problem: lesswrong writers (and I presume readers) seem to have their own language and hidden assumptions that it may take an outsider a while to catch onto. 

3. Some of the essays need examples. This may also be part of the bigger problem: lesswrong writers (and I presume readers) may already know of the examples or some context. And again, it makes it a bit rough for outsiders. 

And now for the elephant in the room: Why buy a book if the essays are on the web for free?
I have addressed this issue in the past since I’ve reviewed 3 blog books (see
https://www.cs.umd.edu/~gasarch/BLOGPAPERS/lipton.pdf
https://www.cs.umd.edu/~gasarch/BLOGPAPERS/liptonregan.pdf
https://www.cs.umd.edu/~gasarch/BLOGPAPERS/tao.pdf

) and have written my own blog book: Problems with a Point: Exploring Math and Computer Science by Gasarch and Kruskal (see

https://www.amazon.com/Problems-Point-Exploring-Computer-Science/dp/9813279974 ). 

Here is an abbreviated quote from my book that applies to the book under review.

The Elephant in the Room

So why should you buy this book if it’s available for free?

1. Trying to find which entries are worth reading would be hard. There are a lot of entries and it really is a mixed bag. 

2. There is something about a book that makes you want to read it. Having words on a screen just doesn’t do. I used to think this was my inner-Luddite talking, but younger people agree, especially about math-on-the-screen. 

10 Acknowledgments 

I thank Oliver Habryka for giving me this opportunity to review these books, and Ben Pace for proofreading and useful comments. 

I thank beriukay, Tetraspace Grouping, and gjm (probably not their real names) for clarifying some of the words and phrases that either I did not know or thought I did but was wrong.

I thank Ruby Bloom for helping me get this document into a form that people could read directly, as opposed to having a pointer to it.

All of the people acknowledged helped make this review less wrong.


 

Comments

A few clarifications on the "new words" section:

 

  • Mediocristan and Extremistan are terms coined (I think) by Nassim Nicholas Taleb, in his book The Black Swan. They don't exactly mean what you say they do; the idea is that Mediocristan is an imaginary country where things have thin-tailed (e.g., normal) distributions and so differences are usually modest in size, and Extremistan is an imaginary country where things have fat-tailed (e.g., power-law) distributions and so differences are sometimes huge, and then you say something belongs to Mediocristan or Extremistan depending on whether the associated distributions are thin-tailed or fat-tailed.
  • The ones from EY's "Local Validity ..." post are, I think, just made up for the occasion and he intends it to be obvious from context at least roughly what they mean.
    • "Code of the Light": how good, principled, rational, nice, honest people behave.
    • "Straw authoritarians": strawman-authoritarians: that is, authoritarians who are transparently stupid and malicious, rather than whatever the most defensible sort of authoritarian might be.
  • The "memetic collapse" thing is a link, to a (spit) Facebook post by EY where he says this: "The Internet is selecting harder on a larger population of ideas, and sanity falls off the selective frontier once you select hard enough [...] the Internet, and maybe television before it, selected much more harshly from a much wider field of memes; and also allowed tailoring content more narrowly to narrower audiences [...] We're looking at a collapse of reference to expertise because deferring to expertise costs a couple of hedons compared to being told that all your intuitions are perfectly right, and at the harsh selective frontier there's no room for that. We're looking at a collapse of interaction between bubbles because there used to be just a few newspapers serving all the bubbles; and now that the bubbles have separated there's little incentive to show people how to be fair in their judgment of ideas for other bubbles [...] It seems plausible to me that *basic* software for intelligent functioning is being damaged by this hypercompetition [...] If you look at how some bubbles are talking and thinking now, "intellectually feral children" doesn't seem like entirely inappropriate language". In other words: changes in how communication works have enabled processes that systematically make us stupider, less tolerant, etc., and also get off my lawn.
  • "Glomarization" does yield search-engine hits when you spell it right; one of them is a wikipedia page entitled "Glomar response" which explains it pretty clearly.
  • I don't think memorizing the Bible or the digits of pi is a great example of "counterfeit understanding"; some of the people who memorize the Bible have a pretty good understanding of what it means, and people who memorize the digits of pi generally (I think) understand what digits mean. One of the best expositions of the counterfeit-understanding thing I know of comes from Richard Feynman writing about the terrible state of science education in Brazil at one time (I don't know whether it's improved since then); see e.g. here.
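To make the Mediocristan/Extremistan distinction from the first bullet concrete, here is a small sketch. The two distributions and their parameters (a normal with mean 170 and spread 10, a Pareto with shape 1.1) are made up for illustration and are not taken from Taleb. The function `top_share` is a hypothetical helper that reports how much of the total comes from the single largest sample.

```python
import random


def top_share(samples):
    """Fraction of the total contributed by the single largest sample."""
    return max(samples) / sum(samples)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000

# "Mediocristan": thin-tailed, e.g. heights in cm (illustrative parameters)
thin = [rng.gauss(170, 10) for _ in range(n)]

# "Extremistan": fat-tailed, e.g. wealth (Pareto, illustrative shape 1.1)
fat = [rng.paretovariate(1.1) for _ in range(n)]

print(f"thin-tailed top share: {top_share(thin):.6f}")
print(f"fat-tailed  top share: {top_share(fat):.6f}")
```

In the thin-tailed sample no single draw matters much to the total; in the fat-tailed one the largest draw typically accounts for a visible chunk of it, which is the sense in which differences are "sometimes huge".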

Regarding digits of pi, N. Gisin promotes the constructivist idea that certain mathematical expressions mean nothing in that they do not relate to anything real. One cannot make a scientific hypothesis involving them. The hundred-billionth twenty digit sequence of pi is smaller than the Planck length.

There's still a well-defined answer to the question of what the digits mean, and indeed of what they mean as digits of pi; e.g., the hundred-billionth digit of pi is what you get by carrying out a pi-computing algorithm and looking at the hundred-billionth digit of its output. Anyway, no one is memorizing that many digits of pi.

[EDITED to add:] On the other hand, people certainly memorize enough digits of pi that, e.g., an error in the last digit they memorize would make a sub-Planck-length difference to the length of a (Euclidean-planar) circle whose diameter is that of the observable universe. (Size of observable universe is tens of billions of light-years; a year is 3x10^7 seconds so that's say 10^18 light-seconds; light travels at 3x10^8 m/s so that's < 10^27m; I forget just how short the Planck length is but I'm pretty sure it's > 10^-50m; so 80 digits should be enough, and even I have memorized that many digits of pi (and forgotten many of them again).)
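The back-of-envelope estimate above can be redone with less rounded inputs. This is only a sketch: the diameter of about 93 billion light-years and the Planck length of about 1.6x10^-35 m are values I am assuming, and an "error in the last digit" is modelled as an error of 10^-k in pi, which changes the circumference by diameter x 10^-k.

```python
import math

SECONDS_PER_YEAR = 3.156e7      # about 3x10^7, as in the comment
SPEED_OF_LIGHT_M_S = 2.998e8    # about 3x10^8 m/s
PLANCK_LENGTH_M = 1.616e-35     # assumed value, roughly 1.6x10^-35 m
DIAMETER_LIGHT_YEARS = 9.3e10   # assumed: about 93 billion light-years

diameter_m = DIAMETER_LIGHT_YEARS * SECONDS_PER_YEAR * SPEED_OF_LIGHT_M_S


def decimal_places_needed(diameter_m, tolerance_m):
    """Smallest k such that diameter * 10**-k <= tolerance.

    An error of 10**-k in pi shifts the circumference (pi * diameter)
    by diameter * 10**-k.
    """
    return math.ceil(math.log10(diameter_m / tolerance_m))


k = decimal_places_needed(diameter_m, PLANCK_LENGTH_M)
print(f"diameter ~ {diameter_m:.2e} m, decimal places needed: {k}")
```

Under these assumptions the answer lands in the low sixties, so the conservative figure of 80 digits has plenty of slack.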

Thanks! I will incorporate your comments into the review!

I have incorporated the comments of gym into my review.

Pedantic note: it's "gjm", not "gym"; they're my initials.

Thanks for the review! A link is a good way to post, but an even better way would be to reproduce the text here in this post. (People are generally much less likely to read things they have to click through to.)

Oh yeah, +1 to this. I was surprised the post only had 11 karma when I saw it (William had sent me an advance copy and I’d really liked reading it) but when I saw that it was a link post I understood why.

The review is a latex file and I posted the link to the pdf file that was generated.

Is there an easy way to make it so that it's not a link?

I'm afraid that beyond copy-pasting the resulting text, there isn't. :(

Thanks for writing this! As someone who spends a lot of time hanging out on LessWrong and the other fora where these discussions are worked out in realtime, it's fun to read you coming at them fresh.

Great review! One comment:

But the collection bogs down with a series of essays (about 1/3 of the book) on Paul Christiano's research on Iterated Amplification. This technique seems to be that the AI has to tell you at regular intervals Why it is doing what it is doing, to avoid the AI gaming the system. That sounds interesting! But then the essays seem to debate whether it's a good idea or not. I kept shouting at the book JUST TRY IT OUT AND SEE IF IT WORKS! Did Paul Christiano DO this? If so what were the results? We never find out.

First of all, that's not how I would describe the core idea of IDA. Secondly, and much more importantly, we can't try it out yet because our AIs aren't smart enough. For example, we could ask GPT-3 to tell us why it is doing what it is doing... but as far as we can tell it doesn't even know! And if it did know, maybe it could be trained to tell us... but maybe that only works because it's too dumb to know how to convincingly lie to us, and if it were smarter, the training methods would stop working.

Our situation is analogous to someone in medieval Europe who has a dragon egg and is trying to figure out how to train and domesticate the dragon so it doesn't hurt anyone. You can't "just try out" ideas because your egg hasn't hatched yet. The best you can do is (a) practice on non-dragons (things like chickens, lizards, horses...) and hope that the lessons generalize, (b) theorize about domestication in general, so that you have a firm foundation on which to stand when "crunch time" happens and you are actually dealing with a live dragon and trying to figure out how to train it, (c) theorize about dragon domestication in particular by imagining what dragons might be like, e.g. "We probably won't be able to put it in a cage like we do with chickens and lizards and horses because it will be able to melt steel with its fiery breath..."

Currently people in the AI risk community are pursuing the analogues of a, b, and c. Did I leave out any option d? I probably did, I'd be interested to hear it!

How would you describe the core ideas of IDA? I will incorporate your answer into the review and hence make it more accurate!  I will also incorporate the reason why we can't try it out yet.

IDA stands for iterated distillation and amplification. The idea is to start with a system M which is aligned (such as a literal human), amplify it into a smarter system Amp(M) (such as by letting it think longer or spin off copies of itself), then distill the amplified system into a new system M+ which is smarter than M but dumber than Amp(M), and then repeat indefinitely to scale up the capabilities of the system while preserving alignment.

The important thing is to ensure that the amplification and distillation steps both preserve alignment. That way, we start with an aligned system and continue having aligned systems every step of the way even as they get arbitrarily more powerful. How does the amplification step preserve alignment? Well, it depends on the details of the proposal, but intuitively this shouldn't be too hard--letting an aligned agent think longer shouldn't make it cease being aligned. How does the distillation step preserve alignment? Well, it depends on the details of the proposal, but intuitively this should be possible -- the distilled agent M+ is dumber than Amp(M) and Amp(M) is aligned, so hopefully Amp(M) can "oversee" the training/creation of M+ in a way that results in M+ being aligned also. Intuitively, M+ shouldn't be able to fool or deceive Amp(M) because it's not as smart as Amp(M).
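The shape of the loop described above can be sketched in a few lines. Everything here is a toy of my own invention: `Agent`, its `capability` number, its `aligned` flag, and the factors 2.0 and 0.75 are hypothetical stand-ins, not part of any actual IDA proposal. The only point is the control flow (amplify, then distill, then repeat) and the requirement that both steps keep the system aligned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    capability: float  # toy scalar standing in for "how smart"
    aligned: bool      # toy flag standing in for "is aligned"


def amplify(m: Agent) -> Agent:
    # Stand-in for "think longer / spin off copies": twice as capable.
    return Agent(m.capability * 2.0, m.aligned)


def distill(amp: Agent) -> Agent:
    # Stand-in for training a cheaper system overseen by Amp(M):
    # it keeps 75% of the capability, so M < M+ < Amp(M).
    return Agent(amp.capability * 0.75, amp.aligned)


def ida(m: Agent, rounds: int) -> Agent:
    """Repeat amplify-then-distill, refusing to continue if alignment is lost."""
    for _ in range(rounds):
        amp = amplify(m)
        if not amp.aligned:
            raise ValueError("amplification step lost alignment")
        m_plus = distill(amp)
        if not m_plus.aligned:
            raise ValueError("distillation step lost alignment")
        m = m_plus
    return m


final = ida(Agent(capability=1.0, aligned=True), rounds=3)
print(final)
```

In a real system the hard part is exactly what this toy assumes away: there is no `aligned` flag to check, which is why the essays argue about whether each step preserves alignment.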

Also note that the AlphaZero algorithm is an example of IDA:

  • The amplification step is when the policy / value neural net is used to play out a number of steps in the game tree, resulting in a better guess at what the best move is than just using the output of the net directly.
  • The distillation step is when the policy / value net is trained to match the output of the game tree exploration process.
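The two steps in the list above can be illustrated on a game small enough to solve by hand. This is a heavily simplified sketch and not AlphaZero itself: the game (a pile of stones, remove 1 or 2, taking the last stone wins), the one-step lookahead in place of tree search, and the lookup table in place of a neural net are all assumptions chosen to keep it short. The names `amplify`, `distill`, `N` and `MOVES` are hypothetical.

```python
N = 12          # largest pile size in the toy game
MOVES = (1, 2)  # a move removes 1 or 2 stones; taking the last stone wins


def amplify(value, n):
    """One-step lookahead: best outcome for the player to move at pile n."""
    return max(-value[n - m] for m in MOVES if m <= n)


def distill(value):
    """Refit the table so it matches the lookahead at every non-empty pile."""
    new_value = dict(value)
    for n in range(1, N + 1):
        new_value[n] = amplify(value, n)
    return new_value


# value[n] is from the point of view of the player to move:
# +1 win, -1 loss, 0 "no idea yet". Facing an empty pile means you lost.
value = {n: 0 for n in range(N + 1)}
value[0] = -1

for _ in range(N):
    value = distill(value)

print(value)
```

Each round the table gets right at least one more pile size than before, because the lookahead is always a little better informed than the table it consults. After enough rounds the table marks exactly the multiples of 3 as lost for the player to move, which is the known solution of this game.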

I have incorporated your comments and also ack you. Thanks!

So, somewhat inconsequential stylistic thing. I open a PDF link, see it's written in LaTeX, I start expecting something written more or less like an academic paper. This is written in very much a chatty, free-flowing blog post style, with jokes like calling neologisms "newords", so the whole thing feels a bit more off-kilter than was intended. This style of writing would probably work better as an HTML blog post (which could then be posted directly as a LessWrong post here instead of hosted elsewhere and linked).

Just noting that I had the opposite reaction - I was pleasantly surprised by the fun style after the formal framing, and this made the whole thing more fun for me

Interesting that the format gives an (in this case incorrect) indicator of the type of article it is.

In the early days of word processors the fear was that first drafts would be typed and look like they were far more done than they were, resulting in worse final drafts. I don't think this happened; in fact the opposite has happened: people can just keep on polishing and polishing.