Nominated Posts for the 2019 Review

Posts need at least 2 nominations to continue into the Review Phase.
Nominate posts that you have personally found useful and important.
Sort by: fewest nominations

2019 Review Discussion

(Written for Arbital in 2017.)


Introduction to the introduction: Why expected utility?

So we're talking about how to make good decisions, or the idea of 'bounded rationality', or what sufficiently advanced Artificial Intelligences might be like; and somebody starts dragging up the concepts of 'expected utility' or 'utility functions'.

And before we even ask what those are, we might first ask, Why?

There's a mathematical formalism, 'expected utility', that some people invented to talk about making decisions. This formalism is very academically popular, and appears in all the textbooks.

But so what? Why is that necessarily the best way of making decisions under every kind of circumstance? Why would an Artificial Intelligence care what's academically popular? Maybe there's some better way of thinking about rational agency? Heck, why is this formalism popular in the first place?

We can ask the same kinds of questions about probability theory:

Okay, we have this mathematical formalism in which...

31johnswentworth4hReviewTHINGS TO TAKE AWAY FROM THE ESSAY First and foremost: Yudkowsky makes absolutely no mention whatsoever of the VNM utility theorem [https://en.wikipedia.org/wiki/Von_Neumann%E2%80%93Morgenstern_utility_theorem]. This is neither an oversight nor a simplification. The VNM utility theorem is not the primary coherence theorem. It's debatable whether it should be considered a coherence theorem at all. Far and away the most common mistake when arguing about coherence (at least among a technically-educated audience) is for people who've only heard of VNM to think they know what the debate is about. Looking at the top-voted comments on this essay: * the first links to a post which argues against VNM on the basis that it assumes probabilities and preferences are already in the model * the second argues that two of the VNM axioms are unrealistic I expect that if these two commenters read the full essay, and think carefully about how the theorems Yudkowsky is discussing differ from VNM, then their objections will look very different. So what are the primary coherence theorems, and how do they differ from VNM? Yudkowsky mentions the complete class theorem in the post, Savage's theorem comes up in the comments, and there are variations on these two and probably others as well. Roughly, the general claim these theorems make is that any system either (a) acts like an expected utility maximizer under some probabilistic model, or (b) throws away resources in a pareto-suboptimal manner. One thing to emphasize: these theorems generally do not assume any pre-existing probabilities (as VNM does); an agent's implied probabilities are instead derived. Yudkowsky's essay does a good job communicating these concepts, but doesn't emphasize that this is different from VNM. One more common misconception which this essay quietly addresses: the idea that every system can be interpreted as an expected utility maximizer. This is technically true, in the sense that we can always pick a ut

No need to think about editing at this point, we'll sort out all editing issues after the review. (And for this specific issue, all hyperlinks in the books have been turned into readable footnotes, which works out just fine in the vast majority of cases.)

3Tetraspace Grouping4dNominationI have used this post quite a few times as a citation when I want to motivate the use of expected utility theory as an ideal for making decisions, because it explains how it's not just an elegant decisionmaking procedure from nowhere but a mathematical inevitability of the requirements to not leave money on the table or to accept guaranteed losses. I find the concept of coherence theorems a better foundation than the normal way this is explained, by pointing at the von Neumann-Morgensten axioms and saying "they look true".
4Zack_M_Davis4dNominationThis is the second nomination in order to get this in the official Review pool, in order for John S. Wentworth's future "more detailed review" [https://www.lesswrong.com/posts/RQpNHSiWaXTvDxt6R/coherent-decisions-imply-consistent-utilities?commentId=ADzemSrCbCo5NQzBa] to be in the official Review pool.

The Introduction

The Curse of the Counterfactual is a side-effect of the way our brains process is-ought distinctions. It causes our brains to compare our past, present, and future to various counterfactual imaginings, and then blame and punish ourselves for the difference between reality, and whatever we just made up to replace it.

Seen from the outside, this process manifests itself as stress, anxiety, procrastination, perfectionism, creative blocks, loss of motivation, inability to let go of the past, constant starting and stopping on one goal or frequent switching between goals, low self-esteem and many other things. From the inside, however, these counterfactuals can feel more real to us than reality itself, which can make it difficult to even notice it's happening, let alone being able to stop it.

Unfortunately, even though each specific instance of the curse can be defused using relatively simple techniques, we can’t just remove the parts of

...
8Jacobian3dNominationI've come across a lot of discussion recently about self-coercion, self-judgment, procrastination, shoulds, etc. Having just read it, I think this post is unusually good at offering a general framework applicable to many of these issues (i.e., that of the "moral brain" taking over). It's also peppered with a lot of nice insights, such as why feeling guilty about procrastination is in fact moral licensing that enables procrastination. While there are many parts of the posts that I quibble with (such as the idea of the "moral brain" as an invariant specialized module), this post is a great standalone introduction and explanation of a framework that I think is useful and important.
4pjeby2dI'm curious what the objection to the "moral brain" term is. As used in this article, it's mainly shorthand for a complex interaction of social learning, biases, specialized emotions, and prospect theory's notion of a baseline expectation of what one "ought" to have or be able to get in a specific circumstance or in exchange for a specific cost. (Or conversely what some specific thing "ought" to cost.)
2Jacobian1dThis statement for example: > Motivating you to punish things is what that part of your brain does, after all; it’s not like it can go get another job! I'm coming more from a predictive processing / bootstrap learning / constructed emotion paradigm in which your brain is very flexible about building high-level modules like moral judgment and punishment. The complex "moral brain" that you described is not etched into our hardware and it's not universal, it's learned. This means it can work quite differently or be absent in some people, and in others it can be deconstructed or redirected — "getting another job" as you'd say. I agree that in practice lamenting the existence of your moral brain is a lot less useful than dissolving self-judgment case-by-case. But I got a sense from your description that you see it as universal and immutable, not as something we learned from parents/peers and can unlearn. P.S. Personal bias alert — I would guess that my own moral brain is perhaps in the 5th percentile of judginess and desire to punish transgressors. I recently told a woman about EA and she was outraged about young people taking it on themselves to save lives in Africa when billionaires and corporations exist who aren't helping. It was a clear demonstration of how different people's moral brains are.

Personal bias alert — I would guess that my own moral brain is perhaps in the 5th percentile of judginess and desire to punish transgressors

Note that this is not evidence in favor of being able to unlearn judginess, unless you're claiming you were previously at the opposite end of the spectrum, and then unlearned it somehow. If so, then I would love to know what you did, because it would be 100% awesome and I could do with being a lot less judgy myself, and would love a way to not have to pick off judgmental beliefs one at a time.

If you have something ... (read more)

[EDIT: A crucial consideration was pointed out in the comments. For all the designs I've looked at, it's cheaper to just get a heat exchanger and ventilation fans, and blow the air outside/pull it inside and eat the extra heating costs/throw on an extra layer of clothing, than it is to buy a CO2 stripper. There's still an application niche for poorly ventilated rooms without windows, but that describes a lot fewer occasions than my previous dreams of commercial use.]


So, I have finally completed building a CO2 stripper that removes CO2 from the air to (hopefully) improve cognition in environments with high CO2 levels. In California, the weather is pretty good so it's easy to just crack a window at any point during the year, but other areas get quite cold during the winter or quite warm during summer and it's infeasible to open a window unless you want to...

2Ben Pace4dNominationI want to nominate this as a thoughtful reflection on a project, but it matters to me whether the stripper worked. Diffractor, did you test it as Gwern suggested, and did it remove CO2 from a room?

It is currently disassembled in my garage, will be fully tested when the 2.0 version is built, and the 2.0 version has had construction stalled for this year because I've been working on other projects. The 1.0 version did remove CO2 from a room as measured by a CO2 meter, but the size and volume made it not worthwhile.

Response To: Who Likes Simple Rules?

Epistemic Status: Working through examples with varying degrees of confidence, to help us be concrete and eventually generalize.

Robin Hanson has, in his words, “some puzzles” that I will be analyzing. I’ve added letters for reference.

  • A] People are often okay with having either policy A or policy B adopted as the standard policy for all cases. But then they object greatly to a policy of randomly picking A or B in particular cases in order to find out which one works better, and then adopt it for everyone.
  • B] People don’t like speed and red-light cameras; they prefer human cops who will use discretion. On average people don’t think that speeding enforcement discretion will be used to benefit society, but 3 out of 4 expect that it will benefit them personally. More generally people seem to like a crime law system where at least a dozen different people are authorized to
  • ...
2habryka2dNominationI still really like this post, but also still think it really could use some cleaning up. I think a cleaned up version of this post could probably make my top 10 of posts from 2019, and so it seems worth nominating for the review.

Might be helpful to say more about what it would mean to clean up this particular post?

This post is eventually about partial agency. However, it's been a somewhat tricky point for me to convey; I take the long route. Epistemic status: slightly crazy.


I've occasionally said that everything boils down to credit assignment problems.

One big area which is "basically credit assignment" is mechanism design. Mechanism design is largely about splitting gains from trade in a way which rewards cooperative behavior and punishes uncooperative behavior. Many problems are partly about mechanism design:

  • Building functional organizations;
  • Designing markets to solve problems (such as prediction markets, or kidney-transplant trade programs);
  • Law, and law enforcement;
  • Practical coordination problems, such as splitting rent;
  • Social norms generally;
  • Philosophical issues in ethics/morality (justice, fairness, contractualism, issues in utilitarianism).

Another big area which I claim as "basically credit assignment" (perhaps more controversially) is artificial intelligence.


In the 1970s, John Holland kicked off the investigation of learning classifier systems. John Holland had recently invented the Genetic Algorithms paradigm, which applies an evolutionary paradigm to...

2habryka2dNominationMost of my points from my curation notice still hold. And two years later, I am still thinking a lot about credit assignment as a perspective on many problems I am thinking about.

This seems like one I would significantly re-write for the book if it made it that far. I feel like it got nominated for the introductory material, which I wrote quickly in order to get to the "main point" (the gradient gap). A better version would have discussed credit assignment algorithms more.

Introduction

Internal Family Systems (IFS) is a psychotherapy school/technique/model which lends itself particularly well for being used alone or with a peer. For years, I had noticed that many of the kinds of people who put in a lot of work into developing their emotional and communication skills, some within the rationalist community and some outside it, kept mentioning IFS.

So I looked at the Wikipedia page about the IFS model, and bounced off, since it sounded like nonsense to me. Then someone brought it up again, and I thought that maybe I should reconsider. So I looked at the WP page again, thought “nah, still nonsense”, and continued to ignore it.

This continued until I participated in CFAR mentorship training last September, and we had a class on CFAR’s Internal Double Crux (IDC) technique. IDC clicked really well for me, so I started using it a lot and also facilitating it to...

5johnpeterwest2dWow. So glad I ended up on a Goodreads review for the IFS main book and this article was recommended. Just wanted to say thank you for the metaphor presented, really helpful.

Glad it was of use! :)

This is Part I of the Specificity Sequence

Specificity turns any argument into a game of 3D Chess. Just when it seems like your argument is a clash of two ground armies, you can use your specificity powers to take off and fly all over the conceptual landscape. Fly, I say!

"Uber exploits its drivers!"

Want to see what a 3D Chess argument looks like? Behold the conversation I had the other day with my friend “Steve”:

Steve: Uber exploits its drivers by paying them too little!

Steve’s statement was a generic one, lacking specific detail. So I shot back with my own generic counterpoint:

Liron: No, job creation is a force for good at any wage. Uber creates increased demand for labor, which drives wages up in the economy as a whole.

You can see I was showing off my mastery of basic economics. This seemed like a good move to me at the time, but...

Nominating this whole sequence. It’s a blast, even if reading it felt very jumpy and stop-and-start. And I love how it’s clearly a self-example. But overall it’s just some really key lessons, taught better than any other place on the internet that I know.

2habryka2dNominationI really liked this whole sequence. I think I have some disagreements with its presentation (a bit too loud for my taste), but I have actually repeatedly mentally referred back to the idea of specificity that is proposed here, and the sequence caused me to substantially update that trying to be more specific is a surprisingly powerful level in lots of different situations. I also just really like the example and concreteness driven content of the sequence.
7elityre5dNominationI liked this post, though I am afraid that it will suggest the wrong spirit.

If the thesis in Unlocking the Emotional Brain (UtEB) is even half-right, it may be one of the most important books that I have read. Written by the psychotherapists Bruce Ecker, Robin Ticic and Laurel Hulley, it claims to offer a neuroscience-grounded, comprehensive model of how effective therapy works. In so doing, it also happens to formulate its theory in terms of belief updating, helping explain how the brain models the world and what kinds of techniques allow us to actually change our minds. Furthermore, if UtEB is correct, it also explains why rationalist techniques such as Internal Double Crux [1 2 3] work.

UtEB’s premise is that much if not most of our behavior is driven by emotional learning. Intense emotions generate unconscious predictive models of how the world functions and what caused those emotions to occur. The brain then uses those models to guide our future behavior. Emotional issues...

This post discusses something I have found hard to put into words, and helps draw it out for everyone to talk about. Seems very valuable to include in the review.

This post begins the Immoral Mazes sequence. See introduction for an overview of the plan. Before we get to the mazes, we need some background first.

Meditations on Moloch

Consider Scott Alexander’s Meditations on Moloch. I will summarize here. 

Therein lie fourteen scenarios where participants can be caught in bad equilibria.

  1. In an iterated prisoner’s dilemma, two players keep playing defect.
  2. In a dollar auction, participants massively overpay.
  3. A group of fisherman fail to coordinate on using filters that efficiently benefit the group, because they can’t punish those who don’t profi by not using the filters.
  4. Rats are caught in a permanent Malthusian trap where only those who do nothing but compete and consume survive. All others are outcompeted.
  5. Capitalists serve a perfectly competitive market, and cannot pay a living wage.
  6. The tying of all good schools to ownership of land causes families to work two jobs whose incomes are then captured by the owners of land.
  7. Farmers outcompeted foragers
...

Nominating this whole sequence. I learned a lot from it.

DeepMind released their AlphaStar paper a few days ago, having reached Grandmaster level at the partial-information real-time strategy game StarCraft II over the summer.

This is very impressive, and yet less impressive than it sounds. I used to watch a lot of StarCraft II (I stopped interacting with Blizzard recently because of how they rolled over for China), and over the summer there were many breakdowns of AlphaStar games once players figured out how to identify the accounts.

The impressive part is getting reinforcement learning to work at all in such a vast state space- that took breakthroughs beyond what was necessary to solve Go and beat Atari games. AlphaStar had to have a rich enough set of potential concepts (in the sense that e.g. a convolutional net ends up having concepts of different textures) that it could learn a concept like "construct building P" or "attack unit Q" or "stay out...

This together with Rick's post on the topic really helped me navigate the whole Alphastar thing, and I've been coming back to it a few times to help me figure out how general current ML methods are (I think I disagree a good amount with it, but still think it makes a good number of points). 

This is crossposted from the AI Impacts blog.

Artificial intelligence defeated a pair of professional Starcraft II players for the first time in December 2018. Although this was generally regarded as an impressive achievement, it quickly became clear that not everybody was satisfied with how the AI agent, called AlphaStar, interacted with the game, or how its creator, DeepMind, presented it. Many observers complained that, in spite of DeepMind’s claims that it performed at similar speeds to humans, AlphaStar was able to control the game with greater speed and accuracy than any human, and that this was the reason why it prevailed.

Although I think this story is mostly correct, I think it is harder than it looks to compare AlphaStar’s interaction with the game to that of humans, and to determine to what extent this mattered for the outcome of the matches. Merely comparing raw numbers for actions taken per...

This was really useful at the time for helping me orient around the whole "how good are AIs at real-time strategy" thing at the time, and I think is still the post I would refer to the most (together with orthonormal's post, which I also nominated).

2Ben Pace2dNominationIt's a really detailed analysis of this situation, and I think this sort of analysis probably generalizes to lots of cases of comparing ML to humans. I'm not confident though, and would update a bunch from a good review of this.

Reply to: Decoupling vs Contextualising Norms

Chris Leong, following John Nerst, distinguishes between two alleged discursive norm-sets. Under "decoupling norms", it is understood that claims should be considered in isolation; under "contextualizing norms", it is understood that those making claims should also address potential implications of those claims in context.

I argue that, at best, this is a false dichotomy that fails to clarify the underlying issues—and at worst (through no fault of Leong or Nerst), the concept of "contextualizing norms" has the potential to legitimize derailing discussions for arbitrary political reasons by eliding the key question of which contextual concerns are genuinely relevant, thereby conflating legitimate and illegitimate bids for contextualization.

Real discussions adhere to what we might call "relevance norms": it is almost universally "eminently reasonable to expect certain contextual factors or implications to be addressed." Disputes arise over which certain contextual factors those are, not whether context matters at all.

The

...

This post gave specific words to a problem I've run into many times, and am just pretty glad to have words for. It also became relevant in a bunch of contexts I was in. 

4Vaniver4dNominationThere's a set of posts by Zack_M_Davis in a similar vein that came out in 2019; some examples are Maybe Lying Doesn't Exist [https://www.lesswrong.com/posts/bSmgPNS6MTJsunTzS/maybe-lying-doesn-t-exist], Firming Up Not-Lying Around Its Edge Cases Is Less Broadly Useful Than One Might Think [https://www.lesswrong.com/posts/MN4NRkMw7ggt9587K/firming-up-not-lying-around-its-edge-cases-is-less-broadly] , Free Speech and Triskaidekaphobic Calculators: A Reply to Hubinger on the Relevance of Public Online Discussion to Existential Risk [https://www.lesswrong.com/posts/yaCwW8nPQeJknbCgf/free-speech-and-triskaidekaphobic-calculators-a-reply-to] , and this one. Overall, this was the one I liked the most at the time (as evidenced by strong-upvoting it, and only weak or not upvoting the others). It points clearly at a confusion underlying a common dichotomy, in a way that I think probably changed the discourse afterwards for the better.

A few years ago, the rationalsphere was small, and it was hard to get funding to run even one organization. Spinning up a second one with the same focus area might have risked killing the first one.

By now, I think we have the capacity (financial, coordinational and human-talent-wise) that that's less of a risk. Meanwhile, I think there are a number of benefits to having more, better, friendly competition.

Reasons competition seems good

Diversity of worldviews is better.

Two research orgs might develop different schools of thought that lead to different insights. This can lead to more ideas as well as avoiding the tail risks of bias and groupthink.

Easier criticism.

When there's only one org doing A Thing, criticizing that org feels sort of like criticizing That Thing. And there may be a worry that if the org lost funding due to your criticism, That Thing wouldn't get done at all. Multiple...

Nominating this post as much for the main body as well as Ray's top-level comment. I guess maybe this post is somewhat downstream of me, so it's not super surprising I like it, but I do think many many parts of the world could really benefit from more healthy competitions, and I've set many plans into motion that try to create more competition in ways that I think improves things quite a bit.

2Ben Pace2dNominationThis seems basically right to me.

This post covers the set-up and results from our exploration in amplifying generalist research using predictions, in detail. It is accompanied by a second post with a high-level description of the results, and more detailed models of impact and challenges. For an introduction to the project, see that post.

___

The rest of this post is structured as follows.

First, we cover the basic set-up of the exploration.

Second, we share some results, in particular focusing on the accuracy and cost-effectiveness of this method of doing research.

Third, we briefly go through some perspectives on what we were trying to accomplish and why that might be impactful, as well as challenges with this approach. These are covered more in-depth in a separate post.

Overall, we are very interested in feedback and comments on where to take this next.

Set-up of the experiment

A note on the experimental design

To begin with, we note that...

I really like amplification and want people to try it more. This was the most serious real-life effort in amplification that I can remember, and while I don't think it's results convinced ended up being super surprising to me, the methodology was quite good, and I would like to see more of it (or somewhat enhanced versions of it)

3Ben Pace2dNominationI was really excited by this research — in particular I was overjoyed to see the priors, predictions, and updates — and would like to see more built on it.

The following is QRI's unified theory of music, meditation, psychedelics, depression, trauma, and emotional processing. Implications for how the brain implements Bayesian updating, and future directions for neuroscience. Crossposted from http://opentheory.net

-----------------

Context: follow-up to The Neuroscience of Meditation and A Future For Neuroscience; a unification of (1) the Entropic Brain & REBUS (Carhart-Harris et al. 2014; 2018; 2019), (2) the Free Energy Principle (Friston 2010), (3) Connectome-Specific Harmonic Waves (Atasoy et al. 2016; 2017), and (4) QRI’s Symmetry Theory of Valence (Johnson 2016; Gomez Emilsson 2017).

0. Introduction

Why is neuroscience so hard?

Part of the problem is that the brain is complicated. But we’ve also mostly been doing it wrong, trying to explain the brain using methods that couldn’t possibly generate insight about the things we care about.

On QRI’s lineages page, we suggest there’s a distinction between ‘old’ and ‘new’ neuroscience:

Traditionally, neuroscience has been concerned with cataloguing the brain, e.g. collecting discrete observations
...

I think this post is 90% likely to make very little sense, but, ever since reading it I can't get rid of the spark of doubt that maybe this post is saying something really important and valuable and all study of rationality that does not understand it is doomed from the start.

I do think even without this post being anywhere close to right I got some useful things out of it, but by far the strongest reason for why I am nominating this post is because I want people to review it and engage with it critically.

2Ben Pace2dNominationI would like to see this post reviewed. (Jacob Falkovich, here's looking at you.)

Note: I'll be trying not to engage too much with the object level discussion here – I think my marginal time on this topic is better spent thinking and writing longform thoughts. See this comment.

Over the past couple months there was some extended discussion including myself, Habryka, Ruby, Vaniver, Jim Babcock, Zvi, Ben Hoffman, Jessicata and Zack Davis. The discussion has covered many topics, including "what is reasonable to call 'lying'", and "what are the best ways to discuss and/or deal with deceptive patterns in public discourse", "what norms and/or principles should LessWrong aspire to" and others.

This included comments on LessWrong, email, google-docs and in-person communication. This post is intended as an easier-to-read collection of what seemed (to me) like key points, as well as including my current takeaways.

Part of the challenge here was that it seemed like Benquo and I had mostly similar models, but many critiques I made seemed

...

I was sadly not part of the conversations involved, but this writeup is pretty helpful and I think important.

2habryka2dNominationI changed my mind on a lot of things around the time these conversations happened. I don't know how much this writeup catches the generators of those updates, but I do think it captures more than any other post I know of, and I do think the things I learned from Jessica, Ben and Zack are quite valuable and important.

This post is based on chapter 15 of Uri Alon’s book An Introduction to Systems Biology: Design Principles of Biological Circuits. See the book for more details and citations; see here for a review of most of the rest of the book.

Fun fact: biological systems are highly modular, at multiple different scales. This can be quantified and verified statistically, e.g. by mapping out protein networks and algorithmically partitioning them into parts, then comparing the connectivity of the parts. It can also be seen more qualitatively in everyday biological work: proteins have subunits which retain their function when fused to other proteins, receptor circuits can be swapped out to make bacteria follow different chemical gradients, manipulating specific genes can turn a fly’s antennae into legs, organs perform specific functions, etc, etc.

On the other hand, systems designed by genetic algorithms (aka simulated evolution) are decidedly not modular. This can also be quantified...

Coming back to this post, I have some thoughts related to it that connect this more directly to AI Alignment that I want to write up, and that I think make this post more important than I initially thought. Hence nominating it for the review. 

Originally posted on The Roots of Progress, August 12, 2017

I recently finished The Alchemy of Air, by Thomas Hager. It's the story of the Haber-Bosch process, the lives of the men who created it, and its consequences for world agriculture and for Germany during the World Wars.

What is the Haber-Bosch process? It's what keeps billions of people in the modern world from starving to death. In Hager's phrase: it turns air into bread.


Some background. Plants, like all living organisms, need to take in nutrients for metabolism. For animals, the macronutrients needed are large, complex molecules: proteins, carbohydrates, fats. But for plants they are elements: nitrogen, phosphorus and potassium (NPK). Nitrogen is needed in the largest quantities.

Nitrogen is all around us: it constitutes about four-fifths of the atmosphere. But plants can't use atmospheric nitrogen. Nitrogen gas, , consists of two atoms held together by a triple covalent bond. The strength of

...

Seconding johnswentworth nominations. This was I think my favorite post from Jason in 2019, and I still think the study of progress is pretty crucial for a lot of work on LessWrong, and this post does a pretty good job of it. 

4johnswentworth4dNominationI'd love to nominate to basically everything Jason writes. Heck, I'd totally buy a book of posts from roots of progress. But of those which showed up on LW in 2019, this is one of the two which were most roots-of-progress-y.

This essay is an adaptation of a talk I gave at the Human-Aligned AI Summer School 2019 about our work on mesa-optimisation. My goal here is to write an informal, accessible and intuitive introduction to the worry that we describe in our full-length report.

I will skip most of the detailed analysis from our report, and encourage the curious reader to follow up this essay with our sequence or report.

The essay has six parts:

Two distinctions draws the foundational distinctions between
“optimised” and “optimising”, and between utility and reward.

What objectives? discusses the behavioral and internal approaches to understanding objectives of ML systems.

Why worry? outlines the risk posed by the utility ≠ reward gap.

Mesa-optimisers introduces our language for analysing this worry.

An alignment agenda sketches different alignment problems presented by these ideas, and suggests transparency and interpretability as a way to solve them.

Where does this leave us? summarises the essay and suggests where to look

...

I think of Utility != Reward as probably the most important core point from the Mesa-Optimizer paper, and I preferred this explanation over the one in the paper (though it leaves out many things and wouldn't want it to be the only thing someone reads on the topic)

2Ben Pace6dNominationI nominate this alongside the sequence, as a less formal explanation of the core ideas. I can imagine this essay being the more widely read and intuitive one.

Thankyou to Sisi Cheng (of the Working as Intended comic) for the excellent drawings.

Suppose we have a gearbox. On one side is a crank, on the other side is a wheel which spins when the crank is turned. We want to predict the rotation of the wheel given the rotation of the crank, so we run a Kaggle competition.

We collect hundreds of thousands of data points on crank rotation and wheel rotation. 70% are used as training data, the other 30% set aside as test data and kept under lock and key in an old nuclear bunker. Hundreds of teams submit algorithms to predict wheel rotation from crank rotation. Several top teams combine their models into one gradient-boosted deep random neural support vector forest. The model achieves stunning precision and accuracy in predicting wheel rotation.

On the other hand, in a very literal sense, the model contains no gears. Is...

This is just such a central idea we use on LessWrong, explained well and with great images.

(If it is published in the book, it should be included alongside Val's original post on the subject.)

Load More