# Recommendations

Predictably Wrong
Argument and Analysis
The Methods of Rationality

# Recent Discussion

I look at graphs like these (From the GPT-3 paper), and I wonder where human-level is:

Gwern seems to have the answer here

GPT-2-1.5b had a cross-entropy validation loss of ~3.3 (based on the perplexity of ~10 in Figure 4, and $\log_2 10 \approx 3.32$). GPT-3 halved that loss to ~1.73 judging from Brown et al 2020 and using the scaling formula ($2.57 \cdot (3.64 \cdot 10^3)^{-0.048}$). For a hypothetical GPT-4, if the scaling curve continues for another 3 orders or so of compute (100–1000×) before crossing over and hitting harder diminishing returns, the cross-entropy loss will drop to ~1.24 (...
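As a rough check on the numbers in that quote, here is a minimal sketch of the arithmetic. It follows the quote's apparent convention of converting perplexity to loss with a base-2 log, and it assumes the power-law fit L(C) = 2.57 · C^(-0.048), with C in petaflop/s-days and roughly 3.64 × 10^3 PF-days for GPT-3, as I read them from Brown et al 2020. The function and constant names are mine, not from any paper.

```python
import math

def loss_from_perplexity(perplexity: float) -> float:
    """Cross-entropy for a given perplexity, assuming base-2 logs."""
    return math.log2(perplexity)

def scaling_law_loss(compute_pf_days: float, a: float = 2.57, b: float = 0.048) -> float:
    """Assumed power-law fit L(C) = a * C**(-b)."""
    return a * compute_pf_days ** (-b)

GPT3_COMPUTE = 3.64e3  # assumed GPT-3 training compute, in petaflop/s-days

gpt2_loss = loss_from_perplexity(10)               # perplexity of ~10
gpt3_loss = scaling_law_loss(GPT3_COMPUTE)
gpt4_loss = scaling_law_loss(GPT3_COMPUTE * 1e3)   # hypothetical: 3 more orders of magnitude

print(f"{gpt2_loss:.2f} {gpt3_loss:.2f} {gpt4_loss:.2f}")
```

Under those assumptions the three values come out at about 3.32, 1.73 and 1.24, matching the figures in the quote.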

6gwern1hTo simplify Daniel's point: the pretraining paradigm [https://www.gwern.net/newsletter/2020/05#why-does-pretraining-work] claims that language draws heavily on important domains like logic, causal reasoning, world knowledge, etc; to reach human absolute performance (as measured in prediction: perplexity/cross-entropy/bpc), a language model must learn all of those domains roughly as well as humans do; GPT-3 obviously has not learned those important domains to a human level; therefore, if GPT-3 had the same absolute performance as humans but not the same important domains, the pretraining paradigm must be false because we've created a language model which succeeds at one but not the other. There may be a way to do pretraining right, but one turns out to not necessarily follow from the other and so you can't just optimize for absolute performance and expect the rest of it to fall into place. (It would have turned out that language models can model easier or inessential parts of human corpuses enough to make up for skipping the important domains; maybe if you memorize enough quotes or tropes or sayings, for example, you can predict really well while still failing completely at commonsense reasoning, and this would hold true no matter how much more data was added to the pile.) As it happens, GPT-3 has not reached the same absolute performance because we're just comparing apples & oranges. I was only talking about WebText in my comment there, but Omohundro is talking about Penn Tree Bank & 1BW. As far as I can tell, GPT-3 is still substantially short of human performance.
2avturchin1hAgreed. Superhuman levels are unlikely to be achieved simultaneously in different domains, even for a universal system. For example, a model could be universal and superhuman in math, but not superhuman in, say, emotion reading. Bad for alignment.
7gwern1hI think Omohundro is wrong here. His GPT-3 perplexity of 20.5 must be for Penn Tree Bank [https://arxiv.org/pdf/2005.14165.pdf#page=11&org=openai]. However, his 'humans' perplexity of 12 is for a completely different dataset! Tracing his citations from his video to Shen et al 2017 [https://pdfs.semanticscholar.org/7fe4/e308de5b2e5b509be8636c169e7928c242d9.pdf] , which uses 1 Billion Word Benchmark [https://arxiv.org/abs/1312.3005]. 1BW was not reported in the GPT-3 paper because it was one of the datasets affected by contamination and dropped from evaluation [https://arxiv.org/pdf/2005.14165.pdf#subsection.3.1]. I've never read the Penn Tree Bank [https://repository.upenn.edu/cgi/viewcontent.cgi?article=1246&context=cis_reports] or 1BW so I can't compare. At best, I'd guess that if 1BW is collected from "English newspapers", that's less diverse than the Brown Corpus [https://en.wikipedia.org/wiki/Brown_Corpus] which goes beyond newspapers, and so perplexities will be lower on 1BW than PTB. However, some searching turned up no estimates for human performance on either PTB or WebText, so I can't guess what the real human vs GPT-3 comparison might be. I'm also a little puzzled what the 'de-tokenizers' are that the Radford GPT paper mentions are necessary for doing the perplexity calculations at all... (There are a lot of papers estimating English text entropy in terms of bits per character, but because of the BPEs and other differences, I don't know how to turn that into a perplexity which could be compared to the reported GPT-3 performance on Penn Tree Bank/WebText/LAMBADA/etc, which is why I didn't include a human baseline in my comment there - I just don't know.) No.

Looking more into reported perplexities, the only benchmark which seems to allow direct comparison of human vs GPT-2 vs GPT-3 is LAMBADA.

LAMBADA was benchmarked at a GPT-2 perplexity of 8.6, and a GPT-3 perplexity of 3.0 (zero-shot) & 1.92 (few-shot). OA claims in their GPT-2 blog post (but not the paper) that human perplexity is 1-2, but provides no sources and I couldn't find any. (The authors might be guessing based on how LAMBADA was constructed: examples were filtered by whether two independent human raters provided the same right answer.) Since L...

Sometimes there's a concept that can be difficult to understand when it is entangled with everything else that needs to be understood about our physics.

If you isolate that concept in a simpler universe, it makes it easier to explain how the concept works.

What are such examples?

(I feel like I asked a similar question somewhere at some point, but can't find it)

Euclidean geometry (which is 2500 years old), Newtonian physics and the special theory of relativity immediately come to mind.

and error and hyperparameter tuning that would probably increase the cost several-fold.

All of which was done on much smaller models and GPT-3 just scaled up existing settings/equations - they did their homework. That was the whole point of the scaling papers, to tell you how to train the largest cost-effective model without having to brute force it! I think OA may well have done a single run and people are substantially inflating the cost because they aren't paying any attention to the background research or how the GPT-3 paper pointedly omits any discussion of hyperparameter tuning and implies only one run (eg the dataset contamination issue).

2Mati_Roy7hWhen you're sufficiently curious, everything feels like a rabbit hole. Challenge me by saying a very banal statement ^_^ x-post: https://www.facebook.com/mati.roy.09/posts/10158883322499579
2mr-hire1hI'm tired because I didn't sleep well.
2Mati_Roy7hSort of smashing both of those sayings together: > “If you wish to make an apple pie from scratch, you must first invent the universe.” -Carl Sagan > "Any sufficiently analyzed magic is indistinguishable from science!" -spin on Clarke's third law to get: Sufficiently understanding an apple pie is indistinguishable from understanding the world.

This Sunday at 12pm (PT), we're running another session of "lightning talks" by curated LessWrong authors (see here for previous weeks' transcripts).

• For the first hour, we will have a series of lightning talks each lasting about 5 minutes followed by discussion. The talks will be short and focus on presenting one core idea well, rather than rushing through a lot of content.
• From 1PM to 2PM, we'll have a hangout in breakout rooms. If you are not interested in the talks, feel free to just show up for this part (or the other way around).
• We want to give top LessWrong writers an interesting space to
1Pongo2hWondering if these weekly talks should be listed in the Community Events section?

Yeah, we have an open PR that adds online events to the Community section and the navigation menu on the left. Currently all events need a physical location, which is obviously pretty dumb during a global pandemic where most events are online, but it obviously made sense in the pre-pandemic world, so we've been encouraging a number of online meetup organizers to post them as normal posts instead.

A Priori

Traditional Rationality is phrased as social rules, with violations interpretable as cheating: if you break the rules and no one else is doing so, you're the first to defect - making you a bad, bad person.  To Bayesians, the brain is an engine of accuracy: if you violate the laws of rationality, the engine doesn't run, and this is equally true whether anyone else breaks the rules or not.

Consider the problem of Occam's Razor, as confronted by Traditional philosophers.  If two hypotheses fit the same observations equally well, why believe the simpler one is more likely to be true?

1Idan Arye4hI'm not sure I follow - what do you mean by "didn't work"? Shouldn't it work the same as the heliocentric theory, seeing how every detail in its description is identical to the heliocentric model?
1TAG4hYou keep assuming verificationism in order to prove verificationism. They assert different things because they mean different things, because the dictionary meanings are different. In the thought experiment we are considering, the contents of the box can never be tested. Nonetheless $10 and $100 mean different things.

They assert different things because they mean different things, because the dictionary meanings are different.

The Quotation is not the Referent. Just because the text describing them is different doesn't mean the assertions themselves are different.

Eliezer identified evolution with the blind idiot god Azathoth. Does this make evolution a religious Lovecraftian concept?

Scott Alexander identified the Canaanite god Moloch with the principle that forces you to sacrifice your values for the competition. Does this make that principle an actual god? Should we pr...

1Idan Arye3hI'm not sure you realize how strong a statement "the contents of the box can never be tested" is. It means even if we crack open the box we won't be able to read the writing on the bill. It means that even if we somehow tracked all the $20 and all the $100 bills that were ever printed, their current location, and whether or not they were destroyed, we won't be able to find one which is missing and deduce that it is inside the box. It means that even if we had a powerful atom-level scanner that can accurately map all the atoms in a given volume and put the box inside it, it won't be able to detect if the atoms are arranged like a $20 bill or like a $100 bill. It means that even if a superintelligent AI capable of time reversal calculations tried to simulate a time reversal it wouldn't be able to determine the bill's value. It means that the amount printed on that bill has no effect on the universe, and was never affected by the universe. Can you think of a scenario where that happens, but the value of the dollar bill is still meaningful? Because I can easily describe a scenario where it isn't: Dollar bills were originally "promises" for gold. They were signed by the Treasurer and the secretary of the Treasury because the Treasury is the one responsible for fulfilling that promise. Even after the gold standard was abandoned, the principle that the Treasury is the one casting the value into the dollar bills remains. This is why the bills are still signed by the Treasury's representatives. So, the scenario I have in mind is that the bill inside the box is a special bill - instead of a fixed amount, it says the Treasurer will decide if it is worth 20 or 100 dollars. The bill is still signed by the Treasurer and the secretary of the Treasury, and thus has the same authority as regular bills. And, in order to fulfill the condition that the value of the bill is never known - the Treasurer is committed to never deciding the worth of that bill.
Is it still meaningful to ask

A friend observed that fewer people in the effective altruism movement are married than you might expect. I was curious: what are marriage rates like within the EA community? The 2018 EA Survey asked about relationship status, and we can look at how that varies by age:

I'm using "ever married" for people who are currently married or have ever been married, including people who are now divorced, widowed, or separated. Since some of these buckets might be pretty small, let's add sample size information:

The anonymized survey data doesn't have 35-44 data, and the 65+ gro...

4Vanessa Kosoy9hAnother factor that might be in play is, if you're married with children then you have responsibilities towards your family, and that is an incentive against spending resources on altruistic causes.
1Mati_Roy15hIf anyone wants to do that, I would be really interested in seeing such an analysis but for children, as I see relationship status mostly as a proxy for that, but probably not a very good one. If no one does it, I might do it at some point.
3jefftk6hWas number of children ever asked on an EA survey?

I just made a quick search, and it seems like it never was :o

There's an interesting corollary of semi-decidable languages that sounds like the kind of cool fact you would teach in class, but somehow I've never heard or read it anywhere.

A semi-decidable language is a set $L \subseteq \Sigma^*$ over a finite alphabet $\Sigma$ such that there exists a Turing machine $M$ such that, for any $x \in \Sigma^*$, if you run $M$ on input $x$, then [if $x \in L$, it halts after finitely many steps and outputs '1', whereas if $x \notin L$, it does something else (typically, it runs forever)].

The halting problem is semi-decidable. I.e., the language of all bit codes of Turing Machines ...
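To make the definition concrete, here is a minimal Python sketch of a semi-decider for "this machine halts on this input". It is only an illustration: a "machine" is modelled as a Python generator function that yields once per step rather than as a real Turing machine encoding, and the names `semi_decide`, `halts_if_even` and the `max_steps` cutoff are hypothetical. With `max_steps=None` the procedure returns 1 whenever the machine halts and otherwise runs forever, which is exactly the semi-decidable behaviour; the cutoff exists only so the demo terminates.

```python
from typing import Callable, Iterator, Optional

Machine = Callable[[int], Iterator[None]]

def semi_decide(machine: Machine, x: int, max_steps: Optional[int] = None) -> Optional[int]:
    """Return 1 if `machine` halts on `x`; otherwise keep running.

    If `max_steps` is given, give up after that many steps and return None,
    meaning "no answer yet" (NOT "does not halt").
    """
    steps = 0
    for _ in machine(x):              # each yield is one simulated step
        steps += 1
        if max_steps is not None and steps >= max_steps:
            return None
    return 1                          # generator finished, so the machine halted

def halts_if_even(x: int) -> Iterator[None]:
    """Toy machine: halts after x steps on even input, loops forever on odd input."""
    if x % 2 == 0:
        for _ in range(x):
            yield
    else:
        while True:
            yield

print(semi_decide(halts_if_even, 4))                  # halts, so we get an answer
print(semi_decide(halts_if_even, 3, max_steps=1000))  # never halts; the cutoff returns None
```

The asymmetry is the point: a "yes" arrives after finitely many steps, but no finite amount of waiting ever turns silence into a "no".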

Against Victimhood

Cross-posted, as always, from Putanumonit.

I have written many posts in the shape of giving life advice. I hear back from readers who take it and those who refuse it. Either is good — I’m just a guy on the internet, to be consumed as part of a balanced diet of opinions.

But occasionally I hear: who are you to give life advice, your own life is so perfect! This sounds strange at first. If you think I’ve got life figured out, wouldn’t you want my advice? I think what they mean is that I haven’t had to overcome the hardships they have, hostile people and adverse circumstances.

I talk quite ofte...

1George8hI agree that victim mentality is useless, but reminding oneself that you were a victim of certain things isn't. Outside of, maybe, a pure objectivist, reminding yourself that a certain system or group is against you can serve as a good driver of rational actions, i.e. you can use it to tone down your empathy and act in a more self-interested way towards that group. Of course, the key word here is "self-interest": the problem you rightfully point out with victim mentality is that people often act upon it in ways that aren't self-interested, going into depressive or aggressive spirals that are of no help to themselves and at most (though seldom) just serve to hurt their victimizer, though often at greater personal cost.
4Ericf17hI think as a meta level the relocated comment is still important. People who are systematically oppressed, might have a different perspective than Jacob, who has been transiently hurt. For example, I have seen several different black people with an audience support contextual victim-hood, but the stance from white men is almost all in agreement with Jacob. As someone with neither history, I won't further speculate.

A systematically oppressed group can still be wrong. Being oppressed gives you an experience other people don't have, but doesn't give you epistemic superpowers. You can still derive wrong conclusions, despite having access to special data.

Anecdote time: When I was a kid, I was bullied by someone who did lots of sport. As a result, I developed an unconscious aversion to sport. (Because I didn't want to be like him, and I didn't want to participate in things that reminded me of him.) Obviously, this only further reduced the quality of my...

2lionhearted7hYeah, I have first-pass intuitions but I genuinely don't know. In an era with both more trustworthy scholarship (replication crisis, etc) and less polarization, I think this would actually be an amazing topic for a variety of longitudinal studies. Alas, probably not possible right now.
Rationality for Kids?

So I really appreciate the lessons I've learned from "Rationality", but I wish I had learned them earlier in life. We are now homeschooling my kids, and I want to volunteer to teach my kids plus others who are interested lessons about thinking rationally.

Does anyone have recommendations on how to put together a curriculum which gets at the core ideas of rationality, but is oriented towards young kids? Some criteria:

Children will likely range from 7-11, meaning they should be simple concepts and require very little prior knowledge and only the simplest math.

Lessons should be int...

I am so glad this question is here, as it's very relevant to my post a few weeks back about Effective Children Education.

By the way, I recommend following Duncan Sabien (referenced in the post below) on Facebook, he has good posts about children edu, e.g. his speech for sixth-graders (referenced by someone else here - but she picked the good parts).

As mentioned below, Julia Galef also sometimes mentions something related, but I haven't found much

1Answer by Xkcd10hFWIW I wanted to plug an older thread on this topic: https://www.lesswrong.com/posts/wSEGsjDPtSWkAtXce/teaching-rationality-to-kids

Going by the Risks from Learned Optimization sequence, it's not clear if mesa-optimization is a big threat if the model continues to be updated throughout deployment. I suspect this has been discussed before (links welcome), but I didn't find anything with a quick search.

Lifelong/online/continual learning is popular and could be the norm in future. I'm interested in how that (and other learning paradigms, if relevant) fits into beliefs about mesa-optimization risk.

If you believe the arguments hold up under a lifelong learning paradigm: is that because there could still be enough time between u...

Thanks. I think I understand, but I'm still confused about the effect on the risk of catastrophe (i.e. not just being pseudo-aligned, but having a catastrophic real-world effect). It may help to clarify that I was mainly thinking of deceptive alignment, not other types of pseudo-alignment. And I'll admit now that I phrased the question stronger than I actually believe, to elicit more response :)

I agree that the probability of pseudo-alignment will be the same, and that an unrecoverable action could occur despite the threat of modification. I'm interested i...

Crossposted from Vessel Project.

My last article, “Life Through Quantum Annealing” was an exploration of how a broad range of physical phenomena — and possibly the whole universe — can be mapped to a quantum computing process. But the article simply accepts that quantum annealing behaves as it does; it does not attempt to explain why. That answer lies somewhere within a “true” description of quantum mechanics, which is still an outstanding problem.

Despite the massive predictive success of quantum mechanics, physicists still can’t agree on how it...

Other than the assumption that all universes are inherently loopy, what mechanism would actually prevent such a universe (structured as a loop with multiple timelines) from itself being the original cause for a linear timeline (one that dissolves into a maximally entropic state and stops evolving)? Seems like these offshoots could be extremely numerous compared with timelines that successfully renew an earlier state.

Moral public goods

Automatically crossposted

Suppose that a kingdom contains a million peasants and a thousand nobles, and:

• Each noble makes as much as 10,000 peasants put together, such that collectively the nobles get 90% of the income.
• Each noble cares about as much about themselves as they do about all peasants put together.
• Each person’s welfare is logarithmic in their income.

Then it’s simultaneously the case that:

1. Nobles prefer to keep money for themselves rather than donate it to peasants—money is worth 10,000x as much to a peasant, but a noble cares 1,000,000 times less about the peasant’s welfare.
2. Nobles pref
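Claim 1 can be checked with a few lines of arithmetic. This is only a sketch under the stated assumptions: a peasant's income is normalised to 1, welfare is log(income) so the marginal value of a dollar is 1/income, and the variable names are mine.

```python
PEASANTS = 1_000_000
NOBLES = 1_000
PEASANT_INCOME = 1.0                      # normalised
NOBLE_INCOME = 10_000 * PEASANT_INCOME    # each noble earns as much as 10,000 peasants

# Share of total income going to the nobles collectively.
noble_total = NOBLES * NOBLE_INCOME
noble_share = noble_total / (noble_total + PEASANTS * PEASANT_INCOME)

def marginal_welfare(income: float) -> float:
    """Derivative of log(income): the welfare value of one extra dollar."""
    return 1.0 / income

# How much more a dollar is worth to a peasant than to a noble.
value_ratio = marginal_welfare(PEASANT_INCOME) / marginal_welfare(NOBLE_INCOME)

# A noble weighs all peasants together as much as themselves,
# so one individual peasant gets weight 1 / PEASANTS.
weight_per_peasant = 1.0 / PEASANTS

# Value (to the noble) of giving a dollar to one peasant, relative to keeping it.
gain_from_giving = weight_per_peasant * value_ratio

print(round(noble_share, 3), round(value_ratio), round(gain_from_giving, 4))
```

The nobles' share comes out at 10/11, about 91%, which the post rounds to 90%. The dollar is worth 10,000 times as much to the peasant, but weighted by the noble's concern the gift is worth only about 0.01 of keeping it, so a noble acting alone keeps the money.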

Potentially relevant new paper:

The logic of universalization guides moral judgment
To explain why an action is wrong, we sometimes say: “What if everybody did that?” In other words, even if a single person’s behavior is harmless, that behavior may be wrong if it would be harmful once universalized. We formalize the process of universalization in a computational model, test its quantitative predictions in studies of human moral judgment, and distinguish it from alternative models. We show that adults spontaneously make moral judgments

The obvious reason that Moloch is the enemy is that it destroys everything we value in the name of competition and survival. But this is missing the bigger picture. We value what we value because, in our ancestral environment, those tended to be the things that helped us with competition and survival. If the things that help us compete and survive end up changing, then evolution will ensure that the things we value change as well.

To borrow a metaphor: Elua cheats. The hedonic treadmill has nothing on the axiological treadmill.

Consider a thought experiment. In Meditations on Moloch, Scott Alexa...

Eventually - sure. But for that eventuality to take place, the "electrical shock tyranny" would have to be more resilient than any political faction we've known of and persist for thousands of years. I doubt that this would be possible.

Hi all, I've been working on some AI forecasting research and have prepared a draft report on timelines to transformative AI. I would love feedback from this community, so I've made the report viewable in a Google Drive folder here.

With that said, most of my focus so far has been on the high-level structure of the framework, so the particular quantitative estimates are very much in flux and many input parameters aren't pinned down well -- I wrote the bulk of this report before July and have received feedback since then that I haven't fully incorporated yet. I'd prefer ...

This is superb, and I think it'll have a substantial impact on debate going forward. Great work!

• Short-term willingness to spend is something I've been thinking a lot about recently. My beliefs about expansion rates are strangely bimodal:
• If AI services are easy to turn into monopolies - if they have strong moats - then the growth rate should be extraordinary as legacy labour is displaced and the revenues are re-invested into improving the AI. In this case, blowing through \$1bn/run seems plausible.
• If AI services are easy to commodify - weak or no moat
6Daniel Kokotajlo11hThanks for doing this, this is really good! Some quick thoughts, will follow up later with more once I finish reading and digesting: --I feel like it's unfair to downweight the less-compute-needed scenarios based on recent evidence, without also downweighting some of the higher-compute scenarios as well. Sure, I concede that the recent boom in deep learning is not quite as massive as one might expect if one more order of magnitude would get us to TAI. But I also think that it's a lot bigger than one might expect if fifteen more are needed! Moreover I feel that the update should be fairly small in both cases, because both updates are based on armchair speculation about what the market and capabilities landscape should look like in the years leading up to TAI. Maybe the market isn't efficient; maybe we really are in an AI overhang. --If we are in the business of adjusting our weights for the various distributions based on recent empirical evidence (as opposed to more a priori considerations) then I feel like there are other pieces of evidence that argue for shorter timelines. For example, the GPT scaling trends seem to go somewhere really exciting if you extrapolate it four more orders of magnitude or so. --Relatedly, GPT-3 is the most impressive model I know of so far, and it has only 1/1000th as many parameters as the human brain has synapses. I think it's not crazy to think that maybe we'll start getting some transformative shit once we have models with as many parameters as the human brain, trained for the equivalent of 30 years. Yes, this goes against the scaling laws, and yes, arguably the human brain makes use of priors and instincts baked in by evolution, etc. But still, I feel like at least a couple percentage points of probability should be added to "it'll only take a few more orders of magnitude" just in case we are wrong about the laws or their applicability. It seems overconfident not to. 
Maybe I just don't know enough about the scaling laws and stuff
5abergal13hFrom Part 4 of the report: I'm not sure I totally follow why this should be true-- is this predicated on already assuming that the computation to train a neural network equivalent to a brain with N neurons scales in some particular way with respect to N?
4Daniel Kokotajlo1dAn important question IMO is whether or not those massive expenditures are for making large neural nets, as opposed to training them for a long time or having loads of them in parallel or something else entirely like researcher salaries. My guess is that Tesla, Waymo, etc. use neural nets 2+ orders of magnitude smaller than GPT-3 (as measured by parameter count.) Ditto for call center automation, robots, etc.

Economists say free trade is good because of "comparative advantage". But what is comparative advantage? Why is it good?

This is sometimes considered an arcane part of economics. (Wikipedia defines it using "autarky".) But it's really a very simple idea. Anyone can use it to understand the world and make decisions.

# I Islands

Say you live alone on an island.

Each week you gather and eat 10 coconuts and 10 bananas. It takes you five minutes to gather a coconut, and 10 minutes for a banana. Thus, you work 150 minutes per week.

| | You Need | Time to gather one | Time You Spend |
|---|---|---|---|
| Coconuts | 10 | 5 minutes | |
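The weekly total stated above is easy to verify. A tiny sketch, where the dictionary layout and names are my own choice:

```python
# item -> (units needed per week, minutes to gather one unit)
needs = {
    "coconuts": (10, 5),
    "bananas": (10, 10),
}

# Minutes spent per item, then the weekly total.
minutes_per_item = {item: qty * mins for item, (qty, mins) in needs.items()}
total_minutes = sum(minutes_per_item.values())

print(minutes_per_item, total_minutes)
```

That gives 50 minutes on coconuts and 100 on bananas, for the 150 minutes per week quoted in the text.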

The ZOPA issue you raise actually disappears when the trade involves a lot of players, not only two.

Let's say we have N players. The first consequence would be the existence of a unique price. A lot of mechanisms can lead to a unique price: you could spy on your neighbors to see if they get a better deal than you do, or you could just have a price in mind which gets updated each time you get a deal or you don't. If I get a deal, that's suspicious: my price wasn't good enough, so I'll update it. If I don't, I was too greedy, I'...

13Raemon15hCurated. I've read several explanations of comparative advantage over the years, and I found this to be among the most clear and accessible ones that I've read. I also liked the juxtaposition with ZOPA. I found Villiam's followup comment [https://www.lesswrong.com/posts/eLRSCC7r4KinuxqZX/comparative-advantage-and-when-to-blow-up-your-island?commentId=gNFCrAn5fmLMGPHFt#IV_ZOPA] additionally helpful for solidifying how a couple different economics principles fit together.

Summary: I think it’s important for surveys about the future of technology or society to check how people's predictions of the future depend on their beliefs about what actions or responsibilities they and others will take on. Moreover, surveys should also help people to calibrate their beliefs about those responsibilities by collecting feedback from the participants about their individual plans. Successive surveys could help improve the group's calibration as people update their responsibilities upon hearing from each other. Further down, I’ll argue that not doing this, i.e. surveying...

5David Krueger14hI think something like this might be pretty valuable among EAs/rationalists dealing with COVID. We clearly have a lot of people doing independent research, and a lot of it might be redundant. I think EA/rats are mostly patting ourselves on the back RE how well we predicted COVID, and not thinking about how we could've done even better. I think we could've done better at coordinating within these communities, e.g. in an ideal world, we might've done better (and might do better) by coordinating efforts to channel COVID-efforts into producing community-facing or public-facing reports / recommendations. And we might've empowered some community members to be our "COVID czars" (e.g. by a combination of volunteering, voting, predicting who would do a good job, and funding them), and thus allowed many community members to spend a lot less energy on COVID.

Yeah, I've been thinking the same. It feels like there are a number of action-coordination dimensions where we could have done substantially better (a substantial number of which will still be relevant for a while, so there is still time to improve).

Here are some somewhat unconnected unconfident thoughts on criticism that I’ve been thinking about recently.

---

A while ago, when I started having one-on-ones with people I was managing, I went into the meetings with a list of questions I was going to ask them. After the meetings, I’d look at my notes and realize that almost all the value of the meeting came from the part where I asked them what the worst parts of their current work situation were and what the biggest mistakes I was making as a manager were.

I started thinking that almost the whole point of meetings like that is to...

Resonating with what Romeo's saying. For instance, in this quote in the original...

If I felt more secure and more superior to the people in the conversation, I think it would be easier to behave better

...I would differentiate "more secure" and "more superior". There's a version of the latter that is quite contemptuous, which is usually a whole layer on top of insecurity.

1jmh16hI wonder if there is not also another way to approach your goal. This may not get to everything you wish to improve but perhaps gets out of the whole eliciting honest and constructive criticism from your direct reports. For the most part I see a people manager's job as not only making sure they are doing their job but more importantly have the support and resources needed to accomplish their responsibilities. So an indirect way of assessing your own performance on the job as their manager is to inquire about the challenges they are facing and then consider what role you can play in removing some of the challenges.
4Alexei21hThis post reminds me of this one: https://www.lesswrong.com/posts/jkvAYpzjk8DF5tszD/conflict-the-rules-of-engagement-and-professionalism And I agree with both pretty strongly. I think one way to get deep criticism / feedback is to start a “success spiral” in that direction. That is, start with requesting and responding to small feedback that’s easy for the other person to give and for you to receive and implement.

## Introduction

This was originally posted here.

I've been researching, for quite some time, the prospect of machine learning on a wider variety of data types than normally considered; things other than tables of numbers and categories. In particular, I want to do ML for program and proof synthesis which requires, at the very least, learning the structures of trees or graphs which don't come from a differentiable domain. Normal ML algorithms can't handle these; though some recent methods, such as graph neural networks and transformers, can be adapted to this domain with some promising results. How...

This post was inspired by orthonormal's post Developmental Stages of GPTs and the discussion that followed, so only part of it is original.

First I'll aim to provide a crisper version of the argument for why GPT wants to mesa-optimize. Specifically, I'll explain a well-known optimization algorithm used in text generation, and argue that GPT can improve performance on its objective by learning to implement something like this algorithm internally.

Then I'll offer some ideas of mine about how we might change this.

# Explanation of beam search

Our goal is to generate plausible text. We evaluate whe...
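For readers who have not seen it, here is a minimal beam search sketch in Python. It is not taken from the post: the toy `TOY_MODEL` table, its vocabulary, and the `beam_width` and `max_len` parameters are all made up for illustration, and real implementations usually add refinements such as length normalisation. At each step it extends every unfinished partial sequence by every candidate token, scores the extensions by total log-probability, and keeps only the best `beam_width` of them.

```python
import math
from typing import Dict, List, Tuple

# Toy "language model": maps the previous token to next-token probabilities.
# Entirely made up for illustration.
TOY_MODEL: Dict[str, Dict[str, float]] = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.4, "</s>": 0.1},
    "a": {"cat": 0.9, "dog": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def next_token_logprobs(prefix: List[str]) -> Dict[str, float]:
    """Log-probabilities of each possible next token given the prefix."""
    return {tok: math.log(p) for tok, p in TOY_MODEL[prefix[-1]].items()}

def beam_search(beam_width: int = 2, max_len: int = 5) -> List[Tuple[List[str], float]]:
    """Return up to `beam_width` sequences with their total log-probability, best first."""
    beams: List[Tuple[List[str], float]] = [(["<s>"], 0.0)]
    for _ in range(max_len):
        candidates: List[Tuple[List[str], float]] = []
        for seq, score in beams:
            if seq[-1] == "</s>":
                candidates.append((seq, score))       # finished: carry forward unchanged
                continue
            for tok, logp in next_token_logprobs(seq).items():
                candidates.append((seq + [tok], score + logp))
        # Keep only the highest-scoring partial sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(seq[-1] == "</s>" for seq, _ in beams):
            break
    return beams

best_seq, best_score = beam_search(beam_width=2)[0]
greedy_seq, greedy_score = beam_search(beam_width=1)[0]
print(best_seq, round(math.exp(best_score), 2))
print(greedy_seq, round(math.exp(greedy_score), 2))
```

In this toy model a beam of width 2 finds "a cat" with probability 0.36, while width 1 (greedy decoding) commits to "the" early and ends up with "the cat" at 0.30. That gap is the sense in which looking ahead over several continuations can beat always taking the locally most likely token.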

6gwern17hBeam search has never worked for likelihood-trained NNs, since at least char-RNNs back in 2015. Beam search does trigger repetition and other pathologies in GPT, see "The Curious Case of Neural Text Degeneration", Holtzman et al 2019 [https://arxiv.org/abs/1904.09751]. And while unlikelihood training seems to help, it's not a silver bullet, and is a bit ad hoc (especially if you think of it in terms of reinforcement learning).
2steve215218hSuppose I said (and I actually believe something like this is true): "GPT often considers multiple possibilities in parallel for where the text is heading—including both where it's heading in the short-term (is this sentence going to end with a prepositional phrase or is it going to turn into a question?) and where it's heading in the long-term (will the story have a happy ending or a sad ending?)—and it calculates which of those possibilities are most likely in light of the text so far. It chooses the most likely next word in light of this larger context it figured out about where the text is heading." If that's correct, would you call GPT a mesa-optimizer?
6John_Maxwell18hWell I suppose mesa-optimization isn't really a binary is it? Like, maybe there's a trivial sense in which self-attention "mesa-optimizes" over its input when figuring out what to pay attention to. But ultimately, what matters isn't the definition of the term "mesa-optimization", it's the risk of spontaneous internal planning/optimization that generalizes in unexpected ways or operates in unexpected domains. At least in my mind. So the question is whether this considering multiple possibilities about text stuff could also improve its ability to consider multiple possibilities in other domains. Which depends on whether the implementation of "considering multiple possibilities" looks more like beam search vs very domain-adapted heuristics.

I think the Transformer is successful in part because it tends to solve problems by considering multiple possibilities, processing them in parallel, and picking the one that looks best. (Selection-type optimization.) If you train it on text prediction, that's part of how it will do text prediction. If you train it on a different domain, that's part of how it will solve problems in that domain too.

I don't think GPT builds a "mesa-optimization infrastructure" and then applies that infrastructure to language modeling. I don't think it needs to. I think the Tr...

Epistemic Status: I only know as much as anyone else in my reference class (I build ML models, I can grok the GPT papers, and I don't work for OpenAI or a similar lab). But I think my thesis is original.

Related: Gwern on GPT-3

For the last several years, I've gone around saying that I'm worried about transformative AI, an AI capable of making an Industrial Revolution sized impact (the concept is agnostic on whether it has to be AGI or self-improving), because I think we might be one or two cognitive breakthroughs away from building one.

GPT-3 has made me move up my timelines, because it makes me...

BTW with regard to "studying mesa-optimization in the context of such systems", I just published this post: Why GPT wants to mesa-optimize & how we might change this.

I'm still thinking about the point you made in the other subthread about MAML. It seems very plausible to me that GPT is doing MAML type stuff. I'm still thinking about if/how that could result in dangerous mesa-optimization.