# Recommendations

Predictably Wrong
Argument and Analysis
The Methods of Rationality

# Recent Discussion

This is an experiment in short-form content on LW2.0. I'll be using the comment section of this post as a repository of short, sometimes-half-baked posts that either:

1. don't feel ready to be written up as a full post
2. I think the process of writing them up might make them worse (i.e. longer than they need to be)

4Raemon10hA thing I might have maybe changed my mind about: I used to think a primary job of a meetup/community organizer was to train their successor, and develop long-term sustainability of leadership. I still hold out for that dream. But, it seems like a pattern is: 1) a community organizer with passion and vision founds a community 2) they eventually move on, and pass it on to one successor who's pretty closely aligned and competent 3) then the First Successor has to move on too, and then... there isn't anyone obvious to take the reins, but if no one does the community dies, so some people reluctantly step up. And... then forever after it's a pale shadow of its original self. For semi-branded communities (such as EA, or Rationality), this also means that if someone new with energy/vision shows up in the area, they'll see a meetup, they'll show up, they'll feel like the meetup isn't all that good, and then move on. Whereas otherwise they (maybe??) might have founded a new one whose direction they got to shape more. I think this also applies to non-community organizations (e.g. a founder hands the reins to a new CEO who hands the reins to a new CEO who doesn't quite know what to do). So... I'm kinda wondering if second-generation successors should just... err on the side of shutting the thing down when they leave, rather than trying desperately to find a replacement. The answer isn't obvious. There is value that continues to be created by the third+ generation. I think I've mostly gone from "having a firm opinion that you should be proactively training your successor" to "man, I dunno, finding a suitable successor is actually pretty hard, mrrr?"
2Dagon17mWhat's different for the organizer and first successor, in terms of their ability to do the primary job of finding their successor? I also note the pattern you mention (one handoff mostly succeeds, community degrades rapidly around the time the first successor leaves with no great second successor). But I also have seen a lot of cases where the founder fails to hand off in the first place, and some where it's handed off to a committee or formal governance structure, and then eventually dies for reasons that don't seem caused by succession. I wonder if you've got the causality wrong - communities have a growth/maintenance/decline curve, which varies greatly in the parameters, but not so much in the shape. It seems likely to me that the leaders/organizers REACT to changes in the community by joining, changing their involvement, or leaving, rather than causing those changes.

I'm not Ray, but I'll take a stab --

The founder has a complete vision for the community/meetup/company/etc. They were able to design a thing that (as long as they continue putting in energy) is engaging, and they instinctively know how to change it so that it continues being great for participants.

The first successor has an incomplete, operational/keep-things-running-the-way-they-were type vision. They cargo-cult whatever the founder was doing. They don't have enough vision to understand the 'why' behind all the decisions. But putting your finger on their ...

In my experience, constant-sum games are considered to provide "maximally unaligned" incentives, and common-payoff games are considered to provide "maximally aligned" incentives. How do we quantitatively interpolate between these two extremes? That is, given an arbitrary  payoff table representing a two-player normal-form game (like Prisoner's Dilemma), what extra information do we need in order to produce a real number quantifying agent alignment?

If this question is ill-posed, why is it ill-posed? And if it's not, we should probably understand how to quantify such a basic aspect of multi-agent interactions, if we want to reason about complicated multi-agent situations whose outcomes determine the value of humanity's future. (I started considering this question with Jacob Stavrianos over the last few months, while supervising his SERI project.)

Thoughts:

• Assume the alignment function has range  or .
• Constant-sum
...
2gjm3hSorry, I think I wasn't clear about what I don't understand. What is a "strategy profile (like stag/stag)"? So far as I can tell, the usual meaning of "strategy profile" is the same as that of "strategy", and a strategy in a one-shot game of stag hunt looks like "stag" or "hare", or maybe "70% stag, 30% hare"; I don't understand what "stag/stag" means here. ---- It is absolutely standard in game theory to equate payoffs with utilities. That doesn't mean that you have to do the same, of course, but I'm sure that's why Dagon said what he did and it's why when I was enumerating possible interpretations that was the first one I mentioned. (The next several paragraphs are just giving some evidence for this; I had a look on my shelves and described what I found. Most detail is given for the one book that's specifically about formalized 2-player game theory.) "Two-Person Game Theory" by Rapoport, which happens to be the only book dedicated to this topic I have on my shelves, says this at the start of chapter 2 (titled "Utilities"): Unfortunately, Rapoport is using the word "payoffs" to mean two different things here. I think it's entirely clear from context, though, that his actual meaning is: you may begin by specifying monetary payoffs, but what we care about for game theory is payoffs as utilities. Here's more from a little later in the chapter: A bit later: and: As I say, that's the only book of formal game theory on my shelves. Schelling's Strategy of Conflict has a little to say about such games, but not much and not in much detail, but it looks to me as if he assumes payoffs are utilities. The following sentence is informative, though it presupposes rather than stating: "But what configuration of value systems for the two participants -- of the "payoffs", in the language of game theory -- makes a deterrent threat credible?" (This is from the chapter entitled "International Strategy"; in my copy it's on page 13.) Rapoport's "Strategy and Conscience" isn't a
2Answer by JonasMoss3hAlright, here comes a pretty detailed proposal! The idea is to find out if the sum of expected utility for both players is “small” or “large” using the appropriate normalizers. First, let's define some quantities. (I'm not overly familiar with game theory, and my notation and terminology are probably non-standard. Please correct me if that's the case!)

* $A$. The payoff matrix for player 1.
* $B$. The payoff matrix for player 2.
* $s, r$. The mixed strategies for players 1 and 2. These are probability vectors, i.e., vectors of non-negative numbers summing to 1.

Then the expected payoff for player 1 is the bilinear form $s^T A r = \sum_{i,j} s_i a_{ij} r_j$ and the expected payoff for player 2 is $s^T B r = \sum_{i,j} s_i b_{ij} r_j$. The sum of payoffs is $s^T (A+B) r$. But we're not done defining stuff yet. I interpret alignment to be about welfare. Or how large the sum of utilities is when compared to the best-case scenario and the worst-case scenario. To make an alignment coefficient out of this idea, we will need

* $l(A,B)$. This is the lower bound to the sum of payoffs, $l(A,B) = \min_{u,v} \left[ u^T (A+B) v \right]$, where $u, v$ are probability vectors. Evidently, $l(A,B) = \min(A+B)$.
* $u(A,B)$. The upper bound to the sum of payoffs in the counterfactual situation where the payoff to player 1 is not affected by the actions of player 2, and vice versa. Then $u(A,B) = \max_{u,v} u^T A v + \max_{u,v} u^T B v$. Now we find that $u(A,B) = \max A + \max B$.

Now define the alignment coefficient of the strategies $(s,r)$ in the game defined by the payoff matrices $(A,B)$ as

$$a = \frac{s^T (A+B) r - l(A,B)}{u(A,B) - l(A,B)}.$$

The intuition is that alignment quantifies how the expected payoff sum $s^T (A+B) r$ compares to the best possible payoff sum $u(A,B)$ attainable when the payoffs are independent. If they are equal, we have perfect alignment ($a = 1$). On the other hand, if $s^T (A+B) r = l(A,B)$, the expected payoff sum is as bad as it could possibly be, and we have minimal alignment ($a = 0$). The only problem is that $u(A,B) = l(A,B)$ makes the denominator equal to 0; but in this case, $u(A,B) = s^T (A+B) r$ as well, which I believe
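The alignment coefficient defined above can be tried out numerically. What follows is only a minimal sketch in plain Python, not the commenter's own code: the function names `bilinear` and `alignment` are made up for illustration, the stag hunt payoff numbers are assumed rather than taken from the thread, and raising an error in the degenerate case $u(A,B) = l(A,B)$ is a placeholder choice, since the comment is cut off before saying how that case should be handled.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def bilinear(s: Sequence[float], m: Matrix, r: Sequence[float]) -> float:
    """Return s^T M r, the expected payoff under mixed strategies s and r."""
    return sum(s[i] * m[i][j] * r[j]
               for i in range(len(s))
               for j in range(len(r)))


def alignment(a: Matrix, b: Matrix,
              s: Sequence[float], r: Sequence[float]) -> float:
    """Alignment coefficient (s^T (A+B) r - l) / (u - l) from the comment."""
    total = [[x + y for x, y in zip(row_a, row_b)]
             for row_a, row_b in zip(a, b)]
    lower = min(min(row) for row in total)            # l(A,B) = min(A+B)
    upper = (max(max(row) for row in a)
             + max(max(row) for row in b))            # u(A,B) = max A + max B
    if upper == lower:
        # Assumption: the source is truncated here, so this sketch just refuses.
        raise ValueError("degenerate game: u(A,B) equals l(A,B)")
    return (bilinear(s, total, r) - lower) / (upper - lower)


# Hypothetical stag hunt payoffs (row 0 / column 0 = stag, 1 = hare).
STAG_A: Matrix = [[4, 0], [3, 3]]   # player 1
STAG_B: Matrix = [[4, 3], [0, 3]]   # player 2

if __name__ == "__main__":
    print(alignment(STAG_A, STAG_B, [1, 0], [1, 0]))  # both hunt stag
    print(alignment(STAG_A, STAG_B, [0, 1], [0, 1]))  # both hunt hare
    print(alignment(STAG_A, STAG_B, [1, 0], [0, 1]))  # stag vs. hare
```

With these assumed payoffs, stag/stag reaches the upper bound ($a = 1$), the mismatched profile sits at the lower bound ($a = 0$), and hare/hare lands in between.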
2Answer by Vitor4hQuick sketch of an idea (written before deeply digesting others' proposals):

Intuition: Just like player 1 has a best response (starting from a strategy profile $s$, improve her own utility as much as possible), she also has an altruistic best response (which maximally improves the other player's utility).

Example: stag hunt. If we're at (rabbit, rabbit), then both players are perfectly aligned. Even if player 1 was infinitely altruistic, she can't unilaterally cause a better outcome for player 2.

Definition: given a strategy profile $s$, an $a$-altruistic better response is any strategy of one player that gives the other player at least $a$ extra utility for each point of utility that this player sacrifices.

Definition: player 1 is $a$-aligned with player 2 if player 1 doesn't have an $x$-altruistic better response for any $x > a$.

* $0$-aligned: non-spiteful player. They'll give "free" utility to other players if possible, but they won't sacrifice any amount of their own utility for the sake of others.
* $c$-aligned for $c \in (0,1)$: slightly altruistic. Your happiness matters a little bit to them, but not as much as their own.
* $1$-aligned: positive-sum maximizer. They'll yield their own utility as long as the total sum of utility increases.
* $c$-aligned for $c \in (1,\infty)$: subservient player. They'll optimize your utility with higher priority than their own.
* $\infty$-aligned: slave. They maximize others' utility, completely disregarding their own.

Obvious extension from players to strategy profiles: How altruistic would a player need to be before they would switch strategies?

On re-reading this I messed up something with the direction of the signs. Don't have time to fix it now, but the idea is hopefully clear.

# Three case studies

## 1. Incentive landscapes that can’t feasibly be induced by a reward function

You’re a deity, tasked with designing a bird brain. You want the bird to get good at singing, as judged by a black-box hardcoded song-assessing algorithm that you already built into the brain last week. The bird chooses actions based in part on within-lifetime reinforcement learning involving dopamine. What reward signal do you use?

Well, we want to train the bird to sing the song correctly. So it’s easy: the bird practices singing, and it listens to its own song using the song-assessing black box, and it does RL using the rule:

The better the song sounds, the higher the reward.

Oh wait. The bird is also deciding how much time to spend practicing singing, versus foraging...

Nice post!

I'm generally bullish on multiple objectives, and this post is another independent arrow pointing in that direction. Some other signs which I think point that way:

• The argument from Why Subagents?. This is about utility maximizers rather than reward maximizers, but it points in a similar qualitative direction. Summary: once we allow internal state, utility-maximizers are not the only inexploitable systems; markets/committees of utility-maximizers also work.
• The argument from Fixing The Good Regulator Theorem. That post uses some incoming informatio
3Archimedes2hThis was enlightening for me. I suspect the concept of treating agents (artificial, human, or otherwise) as multiple interdependent subsystems working in coordination, each with its own roles, goals, and rewards, rather than as a single completely unified system is critical for solving alignment problems. I recently read Entangled Life (by Merlin Sheldrake), which explores similar themes. One of the themes is that the concept of the individual is not so easily defined (perhaps not even entirely coherent). Every complex being is made up of smaller systems and also part of a larger ecosystem and none of these levels can truly be understood independently of the others.

Talk is cheap.  Someone who says "I want to die eventually" isn't actually invested in the answer - it's just them justifying to themselves why they're not exercising, eating right, and otherwise planning for a long future.

Tracey Davis went to the second-floor girls' lavatory for her regular practice. Tracey drew her Gibson Flying V2 from its case. Tracey inserted her Gibson Flying V2 back in its case. What was the point?

If you looked through the bars of the wrought iron cage by Luna Lovegood's bed you wouldn't see anything inside. The cage's door was sealed with a heavy padlock. Luna didn't remember locking it.

Tracey liked that there was a staircase from the Slytherin dormitories to Ravenclaw. It meant she didn't have to answer the raven's riddles. Tracey could answer the raven's riddles. If she wanted to. (Not that she had ever tried.) It was just demeaning.

The Ravenclaw common room was brightly-lit by the tall windows all around the circular Common Room. A...

1Measure3hIs this after the concert in part 4? Does Tracey not remember it?
1Measure3h"plan music" did you mean "play"?

Fixed. Thanks.

"Your fingers are all wrong. So is your posture. And how you hold it," said Myrtle.

Tracey tried again. Her fingers hurt.

"Still wrong. Sit like this," Myrtle demonstrated.

Tracey mirrored Myrtle.

"No. Sit literally right here where I'm sitting," said Myrtle.

Tracey reached into Myrtle. It felt like ice water. Myrtle sat still. Tracey gritted her teeth, took a deep breath and superimposed herself. Tracey's skin rippled grey where bits of Myrtle protruded.

"You feel cold," said Tracey.

"Ghosts can't feel temperature. We can't smell. We can't taste. We see in shades of grey. But we can hear," said Myrtle.

Myrtle moved her hands into position. Tracey followed.

Nearly Headless Nick's deathday anniversary was October 31st. All the Hogwarts ghosts had attended. Many wore formal white sheets.

Fixed. Thanks.

A friend of mine has been bootstrapping a business-to-business software-as-a-service startup that's seeing serious growth. It needs someone who can put dedicated effort into scaling it, but my friend is near the end of their career and looking to retire. What do people do in this situation?

More details: they were running a traditional labor-limited small business and they automated some of the work. This automation was a huge improvement and they realized it could be useful to other companies. In early 2019 they had a web app ready and started taking external customers. In late 2020 they started to see serious growth, which has continued. They let me share some numbers:

This is a run rate of ~$340k/y, up from ~$100k/y a quarter ago, ~$52k a quarter before that, ~$12k/y a quarter before...
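To make the growth figures quoted above concrete, here is a rough sketch of the quarter-over-quarter arithmetic. It uses only the four approximate run rates given in the post, assumes the "~$52k" figure is also an annualized run rate, and the function names are hypothetical.

```python
from typing import List, Sequence


def quarterly_multiples(run_rates: Sequence[float]) -> List[float]:
    """Growth multiple from each quarter to the next (oldest first)."""
    return [later / earlier
            for earlier, later in zip(run_rates, run_rates[1:])]


def mean_quarterly_multiple(run_rates: Sequence[float]) -> float:
    """Geometric-mean growth multiple per quarter over the whole span."""
    quarters = len(run_rates) - 1
    return (run_rates[-1] / run_rates[0]) ** (1 / quarters)


# Approximate annualized run rates from the post, oldest quarter first.
RUN_RATES = [12_000, 52_000, 100_000, 340_000]

if __name__ == "__main__":
    for multiple in quarterly_multiples(RUN_RATES):
        print(f"{multiple:.2f}x")
    print(f"mean per quarter: {mean_quarterly_multiple(RUN_RATES):.2f}x")
```

On these rounded inputs the run rate has roughly tripled per quarter on average, though the individual quarters vary a lot, so the average should be read loosely.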

2Dagon1hStartups are individual, highly varied, and idiosyncratic. ~20 hours/week sounds a lot like a pretty good retirement - why not carry on for a few more years, and figure out if the "zero churn" for the last few months is real, or just a blip from being first to notice this was worth solving? Alternately, hire out most of the work, so they have a smaller income stream, but can spend half (probably not less) the time. The primary question is who wants to provide the energy and vision to grow to the point that it's worth anything, either to hit a VC or to sell to a bigger company. For many small businesses, transition is by partnership - someone "buys in" over the course of a decade by taking a lower salary in exchange for ownership. But I don't know of any cases where that happens for a TINY business that's a sole proprietorship.

~20 hours/week sounds a lot like a pretty good retirement

They're only putting 20 hours a week into it because they have a lot of other things going on, but I think it would grow much more effectively with multiple full-time people. Given its current growth rate, I think it probably should have two full-time people, one technical and one non-technical, similar to a classic early stage software startup?

As the world knows, the FDA approved Biogen’s anti-amyloid antibody today, surely the first marketed drug whose Phase III trial was stopped for futility. I think this is one of the worst FDA decisions I have ever seen, because – like the advisory committee that reviewed the application, and like the FDA’s own statisticians – I don’t believe that Biogen really demonstrated efficacy. No problem apparently. The agency seems to have approved it based on its demonstrated ability to clear beta-amyloid, and is asking Biogen to run a confirmatory trial to show efficacy.

[...]

So the FDA has, for expediency’s sake, bought into the amyloid hypothesis although every single attempt to translate that into a beneficial clinical effect has failed. I really, really don’t like the precedent that this

...

what doesn’t get approved, now?

What doesn't get approved are the things which haven't spent enough money and obeisance to ritual to submit the requests properly, and probably a few things which are actively harmful.

I know that this is just the libertarian turn

Yeah, I wish it were.  What it really means is that the magical thinking that "FDA approval" is what makes something a "good drug" gets a little more weight, and does nothing to reduce interference or make it cheaper to bring useful things to people who can benefit.

3ChristianKl2hThis decision made it harder to bring new Alzheimer drugs to market as one of the people who resigned from the FDA's advisory panel explains on CNN [https://edition.cnn.com/videos/health/2021/06/11/dr-joel-perlmutter-intv-fda-advisor-resign-alzheimers-aducanumab-sot-nr-vpx.cnn] . Companies already knew beforehand that Alzheimer drugs are a multi-billion dollar market. The fact that the FDA approved the drug based on reducing amyloid beta plaques suggests that other companies are incentivized to develop drugs that target amyloid beta plaques as well instead of going for something that's actually promising. Previous discussion of why targeting amyloid beta plaques likely isn't a good idea can be found at the recent LessWrong post Core Pathways of Aging [https://www.lesswrong.com/posts/ui6mDLdqXkaXiDMJ5/core-pathways-of-aging].
4ChristianKl3hThree experts resign as FDA advisers over approval of Alzheimer’s drug [https://arstechnica.com/science/2021/06/three-experts-resign-as-fda-advisors-over-approval-of-alzheimers-drug/]

(originally posted at Secretum Secretorum)

I think there is something fascinating and useful about many of the observations, adages, and aphorisms that we (often sarcastically) designate as eponymous laws, effects, or principles in the same way we might for a scientific law. They are often funny (and memorable because of it), but many of them do speak to very fundamental aspects of human psychology and the human condition more generally. Murphy’s Law is probably the most well known example.

Murphy’s Law – “Anything that can go wrong will go wrong”

There are also a few lesser-known corollaries.

Murphy's Second Law – “Nothing is as easy as it looks”

Murphy's Third Law – “Everything takes longer than you think it will (even when you account for Murphy’s Third Law)”

Murphy's Fourth Law – “If there is...

I searched far and wide to compile a comprehensive list of eponymous laws/effects/principles

Really? My very first search turned up https://en.wikipedia.org/wiki/List_of_eponymous_laws , which is a little briefer than your list, but actually meets your criteria of being eponymous. You include a bunch of things like "you broke it rule" and "theory of internet relativity", which are not named after anyone, and are more like https://en.wikipedia.org/wiki/Iron_law , or just observations.