Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
This is a special post for quick takes by habryka. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

In an attempt to get myself to write more, here is my own shortform feed. Ideally I would write something daily, but we will see how it goes.


Welp, I guess my life is Comic Sans today. The EA Forum snuck some code into our deployment bundle for my account in particular, lol:

Screenshot for posterity.



And finally, I am freed from this curse.
I hope the partial unveiling of your user_id hash will not doom us all, somehow.
You can just get people's userIds via the API, so it's nothing private. 

Thoughts on integrity and accountability

[Epistemic Status: Early draft version of a post I hope to publish eventually. Strongly interested in feedback and critiques, since I feel quite fuzzy about a lot of this]

When I started studying rationality and philosophy, I had the perspective that people who were in positions of power and influence should primarily focus on how to make good decisions in general and that we should generally give power to people who have demonstrated a good track record of general rationality. I also thought of power as this mostly unconstrained resource, similar to having money in your bank account, and that we should make sure to primarily allocate power to the people who are good at thinking and making decisions.

That picture has changed a lot over the years. While I think there is still a lot of value in the idea of "philosopher kings", I've made a variety of updates that significantly changed my relationship to allocating power in this way:

  • I have come to believe that people's ability to come to correct opinions about important questions is in large part a result of whether their social and monetary incentives reward them when they ha
... (read more)

Just wanted to say I like this a lot and think it'd be fine as a full fledged post. :)

More than fine. Please do post a version on its own. A lot of strong insights here, and where I disagree there's good stuff to chew on. I'd be tempted to respond with a post. I do think this has a different view of integrity than I have, but in writing it out, I notice that the word is overloaded and that I don't have as good a grasp of its details as I'd like. I'm hesitant to throw out a rival definition until I have a better grasp here, but I think the thing you're in accordance with is not beliefs so much as principles?
1Eli Tyre

This was a great post that might have changed my worldview some.

Some highlights:


People's rationality is much more defined by their ability to maneuver themselves into environments in which their external incentives align with their goals, than by their ability to have correct opinions while being subject to incentives they don't endorse. This is a tractable intervention and so the best people will be able to have vastly more accurate beliefs than the average person, but it means that "having accurate beliefs in one domain" doesn't straightforwardly generalize to "will have accurate beliefs in other domains".

I've heard people say things like this in the past, but haven't really taken it seriously as an important component of my rationality practice. Somehow what you say here is compelling to me (maybe because I recently noticed a major place where my thinking was majorly constrained by my social ties and social standing) and it prodded me to think about how to build "mech suits" that not only increase my power but incentivize my rationality. I now have a todo item to "think about principles for incentivizing true belief... (read more)

3mako yass
I think you might be conflating two things under "integrity". Having more confidence in your own beliefs than the shared/imposed beliefs of your community isn't really a virtue or... it's more just a condition that a person can be in; whether it's virtuous is completely contextual. Sometimes it is, sometimes it isn't. I can think of lots of people who should have more confidence in other people's beliefs than they have in their own. In many domains, that's me. I should listen more. I should act less boldly. An opposite of that sense of integrity is the virtue of respect (recognising other people's qualities); it's a skill. If you don't have it, you can't make use of other people's expertise very well. A superfluence of respect is a person who is easily moved by others' feedback, usually, a person who is patient with their surroundings. On the other hand I can completely understand the value of {having a known track record of staying true to self-expression, claims made about the self}. Humility is actually a part of that. The usefulness of delineating that into a virtue separate from the more general Honesty is clear to me.
There's a lot of focus on personally updating based on evidence. Groups aren't addressed as much. What does it mean for a group to have a belief? To have honesty or integrity?
See Sinclair: "It is difficult to get a man to understand something, when his salary depends upon his not understanding it!"

Does anyone have any takes on the two Boeing whistleblowers who died under somewhat suspicious circumstances? I haven't followed this in detail, and my guess is it is basically just random chance, but it sure would be a huge deal if a publicly traded company now was performing assassinations of U.S. citizens. 

Curious whether anyone has looked into this, or has thought much about baseline risk of assassinations or other forms of violence from economic actors.

@jefftk comments on the HN thread on this:  Another HN commenter says (in a different thread): 

I'm probably missing something simple, but what is 356? I was expecting a probability or a percent, but that number is neither.

I think it means 356 or more people are needed in the population for there to be a >5% chance of 2+ deaths in a 2-month span from that population.
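For readers who want to see how a threshold like this can be derived, here is a minimal sketch. It assumes deaths are independent and that each person has the same probability `p` of dying within a 2-month window. The value `ASSUMED_P = 0.001` is purely an illustrative assumption, not a figure from the thread or from the HN comments, and the function names are hypothetical.

```python
def prob_at_least_two(n: int, p: float) -> float:
    """P(X >= 2) for X ~ Binomial(n, p), computed as 1 - P(X = 0) - P(X = 1)."""
    p_zero = (1 - p) ** n
    p_one = n * p * (1 - p) ** (n - 1)
    return 1 - p_zero - p_one


def min_population(p: float, threshold: float = 0.05) -> int:
    """Smallest population size n for which P(at least two deaths) exceeds threshold."""
    n = 2
    while prob_at_least_two(n, p) <= threshold:
        n += 1
    return n


# Illustrative assumption, NOT taken from the thread:
# each person has a 0.1% chance of dying in any given 2-month window.
ASSUMED_P = 0.001
print(min_population(ASSUMED_P))
```

With this assumed rate the search returns 356. The rate actually used in the original calculation is not stated here, so the match should not be read as a reconstruction of it. A different assumed rate moves the threshold substantially, which is why the choice of reference population matters so much in this kind of argument.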


I think there should be some sort of adjustment for Boeing not being exceptionally sus before the first whistleblower death - shouldn't privilege Boeing until after the first death, should be thinking across all industries big enough that the news would report on the deaths of whistleblowers. which I think makes it not significant again. 

Shouldn't that be counting the number squared rather than the number?
2Seth Herd
Ummm, wasn't one of them just about to testify against Boeing in court, on their safety practices? And they "committed suicide" after saying the day before how much they were looking forward to finally getting a hearing on their side of the story? That's what I read; I stopped at that point, thinking "about zero chance that wasn't murder".
I think the priors here are very low, so while I agree it looks suspicious, I don't think it's remotely suspicious enough to have the correct posterior be "about zero chance that wasn't murder". Corporations, at least in the U.S., very rarely murder people.
2Seth Herd
That's true, but the timing and incongruity of a "suicide" the day before testifying seems even more absurdly unlikely than corporations starting to murder people. And it's not like they're going out and doing it themselves; they'd be hiring a hitman of some sort. I don't know how any of that works, and I agree that it's hard to imagine anyone invested enough in their job or their stock options to risk a murder charge; but they may feel that their chances of avoiding charges are near 100%, so it might make sense to them. I just have absolutely no other way to explain the story I read (sorry I didn't get the link since this has nothing to do with AI safety) other than that story being mostly fabricated. People don't say "finally tomorrow is my day" in the evening and then put a gun in their mouth the next morning without being forced to do it. Ever. No matter how suicidal, you're sticking around one day to tell your story and get your revenge. The odds are so much lower than somebody thinking they could hire a hit and get away with it, and make a massive profit on their stock options. They could well also have a personal vendetta against the whistleblower as well as the monetary profit. People are motivated by money and revenge, and they're prone to misestimating the odds of getting caught. They could even be right that in their case it's near zero. So I'm personally putting it at maybe 90% chance of murder.
Poisoning someone with an MRSA infection seems possible, but if that's what happened, it required capabilities that are not easily available. If such a thing happened in another case, people would likely speak of nation-state capabilities.
2Nathan Young
I find this a very suspect detail, though the base rate of conspiracies is very low.

A thing that I've been thinking about for a while has been to somehow make LessWrong into something that could give rise to more personal-wikis and wiki-like content. Gwern's writing has a very different structure and quality to it than the posts on LW, with the key components being that they get updated regularly and serve as more stable references for some concept, as opposed to a post which is usually anchored in a specific point in time. 

We have a pretty good wiki system for our tags, but never really allowed people to just make their personal wiki pages, mostly because there isn't really any place to find them. We could list the wiki pages you created on your profile, but that doesn't really seem like it would allocate attention to them successfully.

I was thinking about this more recently as Arbital is going through another round of slowly rotting away (its search currently being broken and this being very hard to fix due to annoying Google App Engine restrictions) and thinking about importing all the Arbital content into LessWrong. That might be a natural time to do a final push to enable people to write more wiki-like content on the site.


somehow make LessWrong into something that could give rise to more personal-wikis and wiki-like content. Gwern's writing has a very different structure and quality to it than the posts on LW...We have a pretty good wiki system for our tags, but never really allowed people to just make their personal wiki pages, mostly because there isn't really any place to find them. We could list the wiki pages you created on your profile, but that doesn't really seem like it would allocate attention

Multi-version wikis are a hard design problem.

It's something that people kept trying, when they soured on a regular Wikipedia: "the need for consensus makes it impossible for minority views to get a fair hearing! I'll go make my own Wikipedia where everyone can have their own version of an entry, so people can see every side! with blackjack & hookers & booze!" And then it becomes a ghost town, just like every other attempt to replace Wikipedia. (And that's if you're lucky: if you're unlucky you turn into Conservapedia or Rational Wiki.) I'm not aware of any cases of 'non-consensus' wikis that really succeed - it seems that usually, there's so little editor activity to go around that having ... (read more)

So, the key difficulty this feels to me like it's eliding is the ontology problem. One thing that feels cool about personal wikis is that people come up with their own factorization and ontology for the things they are thinking about. Like, we probably won't have a consensus article on the exact ways L in Death Note made mistakes, but would be sadder without that kind of content. So I think in addition to the above there needs to be a way for users to easily and without friction add a personal article for some concept they care about, and to have a consistent link to it, in a way that doesn't destroy any of the benefits of the collaborative editing. My sense is that collaboratively edited wikis tend to thrive heavily around places where there is a clear ontology and decay when the ontology is unclear or the domain permits many orthogonal carvings. This makes video game wikis so common and usually successful, as via the nature of their programming they will almost always have a clear structure to them (the developer probably coded an abstraction for "enemies" and "attack patterns" and "levels" and so the wiki can easily mirror them and document them). It feels to me that anything that wants to somehow build a unification of personal wikis and consensus wikis needs to figure out how to gracefully handle the ontology problem.

One thing that feels cool about personal wikis is that people come up with their own factorization and ontology for the things they are thinking about...So I think in addition to the above there needs to be a way for users to easily and without friction add a personal article for some concept they care about, and to have a consistent link to it, in a way that doesn't destroy any of the benefits of the collaborative editing.

My proposal already provides a way to easily add a personal article with a consistent link, while preserving the ability to do collaborative editing on 'public' articles. Strictly speaking, it's fine for people to add wiki entries for their own factorization and ontology.

There is no requirement for those to all be 'official': there doesn't have to be a 'consensus' entry. Nothing about a /wiki/Acausal_cooperation/gwern user entry requires the /wiki/Acausal_cooperation consensus entry to exist. (Computers are flexible like that.) That just means there's nothing there at that exact URL, or probably better, it falls back to displaying all sub-pages of user entries like usual. (User entries presumably get some sort of visual styling, in the same way that comments o... (read more)
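The fallback behaviour described in the comment above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not LessWrong's actual routing code: the in-memory `WikiStore`, the `resolve_wiki_path` function, and the return labels are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class WikiStore:
    # Hypothetical in-memory store mapping a URL path to a page body.
    pages: dict[str, str] = field(default_factory=dict)

    def user_entries(self, topic_path: str) -> list[str]:
        """Return the paths of all entries stored beneath a topic path."""
        prefix = topic_path.rstrip("/") + "/"
        return sorted(p for p in self.pages if p.startswith(prefix))


def resolve_wiki_path(store: WikiStore, path: str) -> tuple[str, object]:
    """Serve the page at `path` if it exists; otherwise fall back to a
    listing of user sub-pages; otherwise report that nothing is there."""
    path = path.rstrip("/")
    if path in store.pages:
        return ("page", store.pages[path])
    entries = store.user_entries(path)
    if entries:
        return ("listing", entries)
    return ("missing", None)


# A user entry exists, but no consensus entry has been written.
store = WikiStore({"/wiki/Acausal_cooperation/gwern": "gwern's personal entry"})
print(resolve_wiki_path(store, "/wiki/Acausal_cooperation"))
```

The point of the sketch is only that the user entry and the consensus entry are independent keys: nothing about storing the former requires the latter to exist.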

1. Users can just create pages corresponding to their own categories.
2. Like Notion, we could allow two-way links between pages, so users would just tag the category in their own custom inclusions.
I agree with Gwern. I think it's fairly rare that someone wants to write the whole entry themselves or articles for all concepts in a topic. It's much more likely that someone just wants to add their own idiosyncratic takes on a topic. For example, I'd love to have a convenient way to write up my own idiosyncratic takes on decision theory. I tried including some of these in the main Wiki, but it (understandably) was reverted. I expect that one of the main advantages of this style of content would be that you can just write a note without having to bother with an introduction or conclusion. I also think it would be fairly important (though not at the start) to have a way of upweighting the notes added by particular users. I agree with Gwern that this may result in more content being added to the main wiki pages when other users are in favour of this.
5Seth Herd
TLDR: The only thing I'd add to Gwern's proposal is making sure there are good mechanisms to discuss changes. Improving the wiki and focusing on it could really improve alignment research overall. Using the LW wiki more as a medium for collaborative research could be really useful in bringing new alignment thinkers up to speed rapidly. I think this is an important part of the overall project; alignment is seeing a burst of interest, and being able to rapidly make use of bright new minds who want to donate their time to the project might very well make the difference in adequately solving alignment in time. As it stands, someone new to the field has to hunt for good articles on any topic, and they provide some links to other important articles, but that's not really their job. The wiki's tags do serve that purpose. The articles are sometimes a good overview of that concept or topic, but more community focus on the wiki could make them work much better as a way Ideally each article aims to be a summary of current thinking on that topic, including both majority and minority views. One key element is making this project build community rather than strain it. Having people with different views work well collaboratively is a bit tricky. Good mechanisms for discussion are one way to reduce friction and any trend toward harsh feelings when one's contributions are changed. The existing comment system might be adequate, particularly with more of a norm of linking changes to comments, and linking to comments from the main text for commentary.
Do you have an underlying mission statement or goal that can guide decisions like this?  IMO, there are plenty of things that should probably continue to live elsewhere, with some amount of linking and overlap when they're lesswrong-appropriate.   One big question in my mind is "should LessWrong use a different karma/voting system for such content?".  If the answer is yes, I'd put a pretty high bar for diluting LessWrong with it, and it would take a lot of thought to figure out the right way to grade "wanted on LW" for wiki-like articles that aren't collections/pointers to posts.  
One small idea: Have the ability to re-publish posts to allPosts or the front page after editing. This worked in the past, but now doesn't anymore (as I noticed recently when updating this post).
Yeah, the EA Forum team removed that functionality (because people kept triggering it accidentally). I think that was a mild mistake, so I might revert it for LW.
Cool idea, but before doing this one obvious inclusion would be to make it easier to tag LW articles, particularly your own articles, in posts by @including them.

Btw is happening. LW post and frontpage banner probably going up Sunday or early next week. 

Thoughts on voting as approve/disapprove and agree/disagree:

One of the things that I am most uncomfortable with in the current LessWrong voting system is how often I feel conflicted between upvoting something because I want to encourage the author to write more comments like it, and downvoting something because I think the argument that the author makes is importantly flawed and I don't want other readers to walk away with a misunderstanding about the world.

I think this effect quite strongly limits certain forms of intellectual diversity on LessWrong, because many people will only upvote your comment if they agree with it, and downvote comments they disagree with, and this means that arguments supporting people's existing conclusions have a strong advantage in the current karma system. Whereas the most valuable comments are likely ones that challenge existing beliefs and that are rigorously arguing for unpopular positions.

A feature that has been suggested many times over the years is to split voting into two dimensions. One dimension being "agree/disagree" and the other being "approve/disapprove". Only the "approve/disapprove" dimension m... (read more)

Having a reaction for "changed my view" would be very nice.

Features like custom reactions give me this feeling that.. language will emerge from allowing people to create reactions that will be hard to anticipate but, in retrospect, crucial. Playing a similar role that body language plays during conversation, but designed, defined, explicit.

If someone did want to introduce the delta through this system, it might be necessary to give the coiner of a reaction some way of linking an extended description. In casual exchanges.. I've found myself reaching for an expression that means "shifted my views in some significant lasting way" that's kind of hard to explain in precise terms, and probably impossible to reduce to one or two words, but it feels like a crucial thing to measure. In my description, I would explain that a lot of dialogue has no lasting impact on its participants, it is just two people trying to better understand where they already are. When something really impactful is said, I think we need to establish a habit of noticing and recognising that.

But I don't know. Maybe that's not the reaction type that will justify the feature. Maybe it will be something we can't think of now.

Generally, it seems useful to be able to take reduced measurements of the mental states of the readers.

the language that will emerge from allowing people to create reactions that will be hard to anticipate but, in retrospect, crucial

This is essentially the concept of a folksonomy, and I agree that it is potentially both applicable here and quite important.

5Rob Bensinger
I like the reactions UI above, partly because separating it from karma makes it clearer that it's not changing how comments get sorted, and partly because I do want 'agree'/'disagree' to be non-anonymous by default (unlike normal karma). I agree that the order of reacts should always be the same. I also think every comment/post should display all the reacts (even just to say '0 Agree, 0 Disagree...') to keep things uniform. That means I think there should only be a few permitted reacts -- maybe start with just 'Agree' and 'Disagree', then wait 6+ months and see if users are especially clamoring for something extra. I think the obvious other reacts I'd want to use sometimes are 'agree and downvote' + 'disagree and upvote' (maybe shorten to Agree+Down, Disagree+Up), since otherwise someone might not realize that one and the same person is doing both, which loses a fair amount of this thing I want to be fluidly able to signal. (I don't think there's much value to clearly signaling that the same person agreed and upvoted or disagreed and downvoted a thing.) I would also sometimes click both the 'agree' and 'disagree' buttons, which I think is fine to allow under this UI. :)
2Said Achmiz
Why not Slashdot-style?
Slashdot has tags, but each tag still comes with a vote. In the above, the goal would be explicitly to allow for the combination of "upvoted though I still disagree", which I don't think would work straightforwardly with the Slashdot system. I also find it quite hard to skim for anything on Slashdot, including the tags (and the vast majority of users can't add reactions on Slashdot at any given time, so there isn't much UI for it).
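To make the two-dimension split discussed in this thread concrete, here is a minimal sketch. It assumes that only the approve/disapprove axis feeds sorting while agree/disagree is merely displayed; the class, field, and function names are hypothetical and do not describe how LessWrong stores votes.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    text: str
    approval: int = 0   # approve/disapprove axis: assumed to drive sorting
    agreement: int = 0  # agree/disagree axis: displayed only, never sorted on

    def vote(self, approve: int = 0, agree: int = 0) -> None:
        """Record one voter's input; each axis takes -1, 0, or +1 independently."""
        for value in (approve, agree):
            if value not in (-1, 0, 1):
                raise ValueError("each vote must be -1, 0, or +1")
        self.approval += approve
        self.agreement += agree


def sort_comments(comments: list[Comment]) -> list[Comment]:
    """Order comments by approval alone, highest first."""
    return sorted(comments, key=lambda c: c.approval, reverse=True)


# "Upvoted though I still disagree": three voters approve but disagree.
contrarian = Comment("rigorous argument for an unpopular position")
for _ in range(3):
    contrarian.vote(approve=1, agree=-1)

# One voter agrees and approves.
popular = Comment("short restatement of the consensus view")
popular.vote(approve=1, agree=1)

for c in sort_comments([popular, contrarian]):
    print(c.approval, c.agreement, c.text)
```

Because the axes are stored separately, the well-argued but unpopular comment sorts first even though its agreement score is negative, which is the combination the single-axis system cannot express.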

What is the purpose of karma?

LessWrong has a karma system, mostly based off of Reddit's karma system, with some improvements and tweaks to it. I've thought a lot about more improvements to it, but one roadblock that I always run into when trying to improve the karma system, is that it actually serves a lot of different uses, and changing it in one way often means completely destroying its ability to function in a different way. Let me try to summarize what I think the different purposes of the karma system are:

Helping users filter content

The most obvious purpose of the karma system is to determine how long a post is displayed on the frontpage, and how much visibility it should get.

Being a social reward for good content

This aspect of the karma system comes out more when thinking about Facebook "likes". Often when I upvote a post, it is more of a public signal that I value something, with the goal that the author will feel rewarded for putting their effort into writing the relevant content.

Creating common-knowledge about what is good and bad

This aspect of the karma system comes out the most when dealing with debates, though it's present in basically any kar... (read more)

This is really good and I missed it until now. I vote for you making this a full-on post. I think it's fine as is for that.

I just came back from talking to Max Harms about the Crystal trilogy, which made me think about rationalist fiction, or the concept of hard sci-fi combined with explorations of cognitive science and philosophy of science in general (which is how I conceptualize the idea of rationalist fiction). 

I have a general sense that one of the biggest obstacles for making progress on difficult problems is something that I would describe as “focusing attention on the problem”. I feel like after an initial burst of problem-solving activity, most people when working on hard problems, either give up, or start focusing on ways to avoid the problem, or sometimes start building a lot of infrastructure around the problem in a way that doesn’t really try to solve it. 

I feel like one of the most important tools/skills that I see top scientists or problem solvers in general use, is utilizing workflows and methods that allow them to focus on a difficult problem for days and months, instead of just hours.

I think at least for me, the case of exam environments displays this effect pretty strongly. I have a sense that in an exam environment, if I am given a question, I successfully focus my fu

... (read more)
4Eli Tyre
This is a really important point, which I kind of understood ("research" means having threads of inquiry that extend into the past and future), but I hadn't been thinking of it in terms of workflows that facilitate that kind of engagement.
nods I've gotten a lot of mileage over the years from thinking about workflows and systems that systematically direct your attention towards various parts of reality. 
Yes, fiction has a lot of potential to change mindsets. Many philosophers actually look to the greatest novelists, inferring the motives and solutions of their heroes, to come up with general theories that touch the very core of how our society is laid out. Most of this comes from the fact that we are already immersed in a meta-story, externally and internally. Much of our effort is focused on internal rationalizations to gain something where a final outcome has already been thought out, whether this is consciously known to us or not. I think that in fiction this is laid out perfectly. So analyzing fiction is rewarding in a sense. Especially when realizing that when we go to exams or interviews we're rapidly immersing ourselves in an isolated story with motives and objectives (what we expect to happen), we create our own little world, our own little stories.
Warning: HPMOR spoilers! I suspect that fiction can conveniently ignore the details of real life that could ruin seemingly good plans. Let's look at HPMOR. The implication for real life is that, similarly, smart plans are still likely to fail, and you know it. Which is probably why you are not trying hard enough. You probably already remember situations in your past when something seemed like a great idea, but still failed. Your brain may predict that your new idea would belong to the same reference class.
While I agree that this is right, your two objections are both explicitly addressed within the relevant chapter:  Obviously things could have still gone wrong, and Eliezer has explicitly acknowledged that HPMOR is a world in which complicated plans definitely succeed a lot more than they would in the normal world, but he did try to at least cover the obvious ways things could go wrong. 
2Ben Pace
I have covered both of your spoilers in spoiler tags (">!").

Is intellectual progress in the head or in the paper?

Which of the two generates more value:

  • A researcher writes up a core idea in their field, but only a small fraction of good people read it in the next 20 years
  • A researcher gives a presentation at a conference to all the best researchers in their field, but none of them write up the idea later

I think which of the two will generate more value determines a lot of your strategy about how to go about creating intellectual progress. In one model what matters is that the best individuals hear about the most important ideas in a way that then allows them to make progress on other problems. In the other model what matters is that the idea gets written as an artifact that can be processed and evaluated by reviews and the proper methods of the scientific progress, and then built upon when referenced and cited.

I think there is a tradeoff of short-term progress against long-term progress in these two approaches. I think many fields can go through intense periods of progress when focusing on just establishing communication between the best researchers of the field, but would be surprised if that period lasts longer than one or two decades. He... (read more)

Depends if you're sticking specifically to "presentation at a conference", which I don't think is necessarily that "high bandwidth". Very loosely, I think it's something like (ordered by "bandwidth"): repeated small-group or individual interaction (e.g. apprenticeship, collaboration) >> written materials >> presentations. I don't think I could have learned Kaj's models of multi-agent minds from a conference presentation (although possibly from a lecture series). I might have learnt even more if I was his apprentice.
What if someone makes a video? (Or the powerpoint/s used in the conference are released to the public?)
This was presuming that that would not happen (for example, because there is a vague norm that things are kind-of confidential and shouldn't be posted publicly).

Thoughts on minimalism, elegance and the internet:

I have this vision for LessWrong of a website that gives you the space to think for yourself, and doesn't constantly distract you with flashy colors and bright notifications and vibrant pictures. Instead it tries to be muted in a way that allows you to access the relevant information, but still gives you the space to disengage from the content of your screen, take a step back and ask yourself "what are my goals right now?".

I don't know how well we achieved that so far. I like our frontpage, and I think the post-reading experience is quite exceptionally focused and clear, but I think there is still something about the way the whole site is structured, with its focus on recent content and new discussion that often makes me feel scattered when I visit the site.

I think a major problem is that LessWrong doesn't make it easy to do only a single focused thing on the site at a time, and it doesn't currently really encourage you to engage with the site in a focused way. We have the library, which I do think is decent, but the sequence navigation experience is not yet fully what I would like it to be, and when... (read more)

4mako yass
When I was a starry eyed undergrad, I liked to imagine that reddit might resurrect old posts if they gained renewed interest, if someone rediscovered something and gave it a hard upvote, that would put it in front of more judges, which might lead to a cascade of re-approval that hoists the post back into the spotlight. There would be no need for reposts, evergreen content would get due recognition, a post wouldn't be done until the interest of the subreddit (or, generally, user cohort) is really gone. Of course, reddit doesn't do that at all. Along with the fact that threads are locked after a year, this is one of many reasons it's hard to justify putting a lot of time into writing for reddit.

Thoughts on negative karma notifications:

  • An interesting thing that I and some other people on the LessWrong team noticed (as well as some users) was that since we created karma notifications we feel a lot more hesitant to downvote older comments, since we know that this will show up for the other users as a negative notification. I also feel a lot more hesitant to retract my own strong upvotes or upvotes in general since the author of the comment will see that as a downvote.
  • I've had many days in a row in which I received +20 or +30 karma, followed by a single day where by chance I received a single downvote and ended up at -2. The emotional valence of having a single day at -2 was somehow stronger than the emotional valence of multiple days of +20 or +30.
What I noticed on the EA Forum is that the whole karma thing is messing with my S1 processes and makes me unhappy on average. I've not only turned off the notifications, but also hidden all karma displays in comments via CSS, and the experience is much better.
I... feel conflicted about people deactivating the display of karma on their own comments. In many ways karma (and downvotes in particular) serve as a really important feedback source, and I generally think that people who reliably get downvoted should change how they are commenting, and them not doing so usually comes at high cost. I think this is more relevant to new users, but is still relevant for most users. Deactivating karma displays feels a bit to me like someone who shows up at a party and says "I am not going to listen to any subtle social feedback that people might give me about my behavior, and I will just do things until someone explicitly tells me to stop", which I think is sometimes the correct behavior and has some good properties in terms of encouraging diversity of discussion, but I also expect that this can have some pretty large negative impact on the trust and quality of the social atmosphere. On the other hand, I want people to have control over the incentives that they are under, and think it's important to give users a lot of control over how they want to be influenced by the platform. And there is also the additional thing, which is that if users just deactivate the karma display for their comments without telling anyone then that creates an environment of ambiguity where it's very unclear whether someone receives the feedback you are giving them at all. In the party metaphor this would be like showing up and not telling anyone that you are not going to listen to subtle social feedback, which I think can easily lead to unnecessary escalation of conflict. I don't have a considered opinion on what to incentivize here, besides being pretty confident that I wouldn't want most people to deactivate their karma displays, and that I am glad that you told me here that you did. This means that I will err on the side of leaving feedback by replying in addition to voting (though this obviously comes at a significant cost to me, so it might be game t
4Said Achmiz
Well… you can’t actually stop people from activating custom CSS that hides karma values. It doesn’t matter how you feel about it—you can’t affect it! It’s therefore probably best to create some mechanism that gives people what they want to get out of hiding karma, while still giving you what you want out of showing people karma (e.g., a “hide karma but give me a notification if one of my comments is quite strongly downvoted” option—not suggesting this exact thing, just brainstorming…).
Hmm, I agree that I can't prevent it in that sense, but I think defaults matter a lot here, as does just normal social feedback and whatever the social norms are. It's not at all clear to me that the current equilibrium isn't pretty decent, where people can do it, but it's reasonably inconvenient to do it, and so allows the people who are disproportionately negatively affected by karma notifications to go that route. I would be curious whether there are any others who do the same as Jan does, and if there are many, then we can figure out what the common motivations are and see whether it makes sense to elevate it to some site-level feature.
6Said Achmiz
But this is an extremely fragile equilibrium. It can be broken by, say, someone posting a set of simple instructions on how to do this. For instance: Anyone running the uBlock Origin browser extension can append several lines to their “My Filters” tab in the uBlock extension preferences, and thus totally hide all karma-related UI elements on Less Wrong. (PM me if you want the specific lines to append.) Or someone makes a browser extension to do this. Or a user style. Or…
FWIW I also think it's quite possible the current equilibrium is decent (which is part of the reason why I did not post something like "How I turned karma off" with simple instructions on how to do it on the forum, which I did consider). On the other hand I'd be curious about more people trying it and reporting their experiences. I suspect many people kind of don't have this action in the space of things they usually consider - I'd expect what most people would do is 1) just stop posting 2) write about their negative experience 3) complain privately.
Actually I turned off the karma for all comments, not just mine. The bold claim is that my individual taste in what's good on the EA Forum is in important ways better than the karma system, and the karma signal is similar to sounds made by a noisy mob. If I want, I can actually predict reasonably well what sounds the crowd will make on average, so it is not any new source of information. But it still messes with your S1 processing and motivations. Continuing with the party metaphor, I think it is generally not that difficult to understand what sort of behaviour will make you popular at a party, and what sorts of behaviour, even when they are quite good in a broader scheme of things, will make you unpopular at parties. Also personally I often feel something like "I actually want to have good conversations about juicy topics in a quiet place, unfortunately all you people are congregating at this super loud space, with all these status games, social signals, and ethically problematic norms about how to treat other people" toward most parties. Overall I posted this here because it seemed like an interesting datapoint. Generally I think it would be great if people moved toward writing information-rich feedback instead of voting, so such a shift seems good. From what I've seen on the EA Forum it's quite rarely "many people" doing anything. More often it is like 6 users upvote a comment, 1 user strongly downvotes it, and something like karma 2 is the result. I would guess you may be at greater risk of the distorted perception that this represents some meaningful opinion of the community. (Also I see some important practical cases where people are misled by "noises of the crowd" and it influences them in a harmful way.)
If people are checking karma changes constantly and getting emotional validation or pain from the result, that seems like a bad result. And yes, the whole 'one -2 and three +17s feels like everyone hates me' thing is real, can confirm.
Because of the way we do batching, you can't check karma changes constantly (unless you go out of your way to change your settings), because we batch karma notifications on a 24h basis by default.
I mean, you can definitely check your karma multiple times a day to see where the last two sig digits are at, which is something I sometimes do.
True. We did very intentionally avoid putting your total karma on the frontpage anywhere, as most other platforms do, to avoid people getting sucked into that unintentionally, but you can still do that on your profile. I hope we aren't wasting a lot of people's time by causing them to check their profile all the time. If we do, it might be the correct choice to also only update that number every 24h.
2Rob Bensinger
I've never checked my karma total on LW 2.0 to see how it's changed.
In my case, it sure feels like I check my karma often because I often want to know what my karma is, but maybe others differ.
3Ben Pace
Do our karma notifications disappear if you don’t check them that day? My model of Zvi suggested to me this is attention-grabbing and bad. I wonder if it’s better to let folks be notified of all days’ karma updates ‘til their most recent check in, and maybe also see all historical ones ordered by date if they click on a further button, so that the info isn’t lost and doesn’t feel scarce.
Nah, they accumulate until you click on them.
Which is definitely better than it expiring, and 24h batching is better than instantaneous feedback (unless you were going to check posts individually for information already, in which case things are already quite bad). It's not obvious to me what encouraging daily checks here is doing for discourse as opposed to being a Skinner box.

The motivation was (among other things) several people saying to us "yo, I wish LessWrong was a bit more of a skinner box because right now it's so thoroughly not a skinner box that it just doesn't make it into my habits, and I endorse it being a stronger habit than it currently is."

See this comment and thread.
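
For concreteness, here is a minimal sketch of the batching behavior described in this thread: karma changes are held back, released in 24h batches by default, and released batches accumulate until the user clicks on them. The class and method names (`KarmaInbox`, `record`, `tick`, `click`) are hypothetical and chosen for illustration; this is not LessWrong's actual implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Default batching interval mentioned in the thread (assumed fixed here).
BATCH_WINDOW = timedelta(hours=24)


@dataclass
class KarmaInbox:
    """Hypothetical per-user store of karma changes (illustration only)."""
    last_release: datetime
    pending: list = field(default_factory=list)  # changes not yet shown to the user
    visible: list = field(default_factory=list)  # released batches, kept until clicked

    def record(self, delta: int) -> None:
        """Store a vote's karma change without notifying the user yet."""
        self.pending.append(delta)

    def tick(self, now: datetime) -> None:
        """Release pending changes as one summed batch once the window has elapsed."""
        if now - self.last_release < BATCH_WINDOW:
            return
        if self.pending:
            self.visible.append(sum(self.pending))
            self.pending = []
        self.last_release = now

    def click(self) -> list:
        """Return all accumulated batches and clear them, as when the user opens the notification."""
        batches, self.visible = self.visible, []
        return batches
```

In this sketch nothing expires: a user who stays away for a week would see seven batches when they next click, which matches the "they accumulate until you click on them" behavior described above.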

It's interesting to see how people's votes on a post or comment are affected by other comments. I've noticed that a burst of vote count changes often appears after a new and apparently influential reply shows up.
Yeah, I had the same occurrence + feeling recently when I wrote the quant trading post. It felt like: "Wait, who would downvote this post...??" It's probably more likely that someone just retracted an upvote.
0mako yass
Reminder: If a person is not willing to explain their voting decisions, you are under no obligation to waste cognition trying to figure them out. They don't deserve that. They probably don't even want that.

That depends on what norm is in place. If the norm is to explain downvoting, then people should explain, otherwise there is no issue in not doing so. So the claim you are making is that the norm should be for people to explain. The well-known counterargument is that this disincentivizes downvoting.

you are under no obligation to waste cognition trying to figure them out

There is rarely an obligation to understand things, but healthy curiosity ensures progress on recurring events, irrespective of morality of their origin. If an obligation would force you to actually waste cognition, don't accept it!

1mako yass
I'm not really making that claim. A person doesn't have to do anything condemnable to be in a state of not deserving something. If I don't pay the baker, I don't deserve a bun. I am fine with not deserving a bun, as I have already eaten. The baker shouldn't feel like I am owed a bun. Another metaphor is that the person who is beaten on the street by silent, masked assailants should not feel like they owe their oppressors an apology.
4Said Achmiz
Do you mean anything by this beyond “you don’t have an obligation to figure out why people voted one way or another, period”? (Or do you think that I [i.e., the general Less Wrong commenter] do have such an obligation?) Edit: Also, the “They don’t deserve that” bit confuses me. Are you suggesting that understanding why people upvoted or downvoted your comment is a favor that you are doing for them?
2mako yass
Sometimes a person won't want to reply and say outright that they thought the comment was bad, because it's just not pleasant, and perhaps not necessary. Instead, they might just reply with information that they think you might be missing, which you could use to improve, if you chose to. With them, an engaged interlocutor will be able to figure out what isn't being said. With them, it can be productive to try to read between the lines. Isn't everything relating to writing good comments a favor that you are doing for others? But I don't really think in terms of favors. All I mean to say is that we should write our comments for the sorts of people who give feedback. Those are the good people. Those are the people who're a part of a good faith self-improving discourse. Their outgroup are maybe not so good, and we probably shouldn't try to write for their sake.
I think I disagree. If you are getting downvoted by 5 people and one of them explains why, then even if the other 4 are not explaining their reasoning it's often reasonable to assume that more than just the one person had the same complaints, and as such you likely want to update more that it's better for you to change what you are doing.
6mako yass
We don't disagree.

Thoughts on impact measures and making AI traps

I was chatting with Turntrout today about impact measures, and ended up making some points that I think are good to write up more generally.

One of the primary reasons why I am usually unexcited about impact measures is that I have a sense that they often "push the confusion into a corner" in a way that actually makes solving the problem harder. As a concrete example, I think a bunch of naive impact regularization metrics basically end up shunting the problem of "get an AI to do what we want" into the problem of "prevent the agent from interfering with other actors in the system".

The second one sounds easier, but mostly just turns out to also require a coherent concept and reference of human preferences to resolve, and you got very little from pushing the problem around that way, and sometimes get a false sense of security because the problem appears to be solved in some of the toy problems you constructed.

I am definitely concerned that Turntrout's AUP does the same, just in a more complicated way, but am a bit more optimistic than that, mostly because I do have a sense that in the AUP case there is actually some meaningful reduction go

... (read more)
7Matthew Barnett
[ETA: This isn't a direct reply to the content in your post. I just object to your framing of impact measures, so I want to put my own framing in here] I tend to think that impact measures are just tools in a toolkit. I don't focus on arguments of the type "We just need to use an impact measure and the world is saved" because this indeed would be diverting attention from important confusion. Arguments for not working on them are instead more akin to saying "This tool won't be very useful for building safe value aligned agents in the long run." I think that this is probably true if we are looking to build aligned systems that are competitive with unaligned systems. By definition, an impact penalty can only limit the capabilities of a system, and therefore does not help us to build powerful aligned systems. To the extent that they meaningfully make cognitive reductions, this is much more difficult for me to analyze. On one hand, I can see a straightforward case for everyone being on the same page when the word "impact" is used. On the other hand, I'm skeptical that this terminology will meaningfully input into future machine learning research. The above two things are my main critiques of impact measures personally.
I think a natural way of approaching impact measures is asking "how do I stop a smart unaligned AI from hurting me?" and patching hole after hole. This is really, really, really not the way to go about things. I think I might be equally concerned and pessimistic about the thing you're thinking of. The reason I've spent enormous effort on Reframing Impact is that the impact-measures-as-traps framing is wrong! The research program I have in mind is: let's understand instrumental convergence on a gears level. Let's understand why instrumental convergence tends to be bad on a gears level. Let's understand the incentives so well that we can design an unaligned AI which doesn't cause disaster by default. The worst-case outcome is that we have a theorem characterizing when and why instrumental convergence arises, but find out that you can't obviously avoid disaster-by-default without aligning the actual goal. This seems pretty darn good to me.

Printing more rationality books: I've been quite impressed with the success of the printed copies of R:A-Z and think we should invest resources into printing more of the other best writing that has been posted on LessWrong and the broad diaspora.

I think a Codex book would be amazing, but I think there also exists potential for printing smaller books on things like Slack/Sabbath/etc., and many other topics that have received a lot of other coverage over the years. I would also be really excited about printing HPMOR, though that has some copyright complications to it.

My current model is that there exist many people interested in rationality who don't like reading longform things on the internet and are much more likely to read things when they are in printed form. I also think there is a lot of value in organizing writing into book formats. There is also the benefit that the book now becomes a potential gift for someone else to read, which I think is a pretty common way ideas spread.

I have some plans to try to compile some book-length sequences of LessWrong content and see whether we can get things printed (obviously in coordination with the authors of the relevant pieces).

Congratulations! Apparently it worked!

Forecasting on LessWrong: I've been thinking for quite a while about somehow integrating forecasts and prediction-market like stuff into LessWrong. Arbital has these small forecasting boxes that look like this:

Arbital Prediction Screenshot

I generally liked these, and think they provided a good amount of value to the platform. I think our implementation would probably take up less space, but the broad gist of Arbital's implementation seems like a good first pass.

I do also have some concerns about forecasting and prediction markets. In particular I have a sense that philosophical and mathematical progress only rarely benefits from attaching concrete probabilities to things, and works more via mathematical proof and trying to achieve very high confidence on some simple claims by ruling out all other interpretations as obviously contradictory. I am worried that emphasizing probability much more on the site would make progress on those kinds of issues harder.

I also think a lot of intellectual progress is primarily ontological, and given my experience with existing forecasting platforms and Zvi's sequence on prediction markets, they are not very good at resolving ontological confusions and ... (read more)


This feature is important to me. It might turn out to be a dud, but I would be excited to experiment with it. If it was available in a way that was portable to other websites as well, that would be even more exciting to me (e.g. I could do this in my base blog).

Note that this feature can be used for more than forecasting. One key use case on Arbital was to see who was willing to endorse or disagree with, to what extent, various claims relevant to the post. That seemed very useful.

I don't think having internal betting markets is going to add enough value to justify the costs involved. Especially since it both can't be real money (for legal reasons, etc) and can't not be real money if it's going to do what it needs to do.

There are some external platforms that one could integrate with, here is one that is run by some EA-adjacent people: I am currently confused about whether using an external service is a good idea. In some sense it makes things more modular, but it also limits the UI design-space a lot and lengthens the feedback loop. I think I am currently tending towards rolling our own solution and maybe allowing others to integrate it into their site.
4Rob Bensinger
One small thing you could do is to have probability tools be collapsed by default on any AIAF posts (and maybe even on the LW versions of AIAF posts). Also, maybe someone should write a blog post that's a canonical reference for 'the relevant risks of using probabilities that haven't already been written up', in advance of the feature being released. Then you could just link to that a bunch. (Maybe even include it in the post that explains how the probability tools work, and/or link to that post from all instances of the probability tool.) Another idea: Arbital had a mix of (1) 'specialized pages that just include a single probability poll and nothing else'; (2) 'pages that are mainly just about listing a ton of probability polls'; and (3) 'pages that have a bunch of other content but incidentally include some probability polls'. If probability polls on LW mostly looked like 1 and 2 rather than 3, then that might make it easier to distinguish the parts of LW that should be very probability-focused from the parts that shouldn't. I.e., you could avoid adding Arbital's feature for easily embedding probability polls in arbitrary posts (and/or arbitrary comments), and instead treat this more as a distinct kind of page, like 'Questions'. You could still link to the 'Probability' pages prominently in your post, but the reduced prominence and site support might cause there to be less social pressure for people to avoid writing/posting things out of fears like 'if I don't provide probability assignments for all my claims in this blog post, or don't add a probability poll about something at the end, will I be seen as a Bad Rationalist?'
5Rob Bensinger
Also, if you do something Arbital-like, I'd find it valuable if the interface encourages people to keep updating their probabilities later as they change. E.g., some (preferably optional) way of tracking how your view has changed over time. Probably also make it easy for people to re-vote without checking (and getting anchored by) their old probability assignment, for people who want that.

Note that Paul Christiano warns against encouraging sluggish updating by massively publicising people’s updates and judging them on it. Not sure what implementation details this suggests yet, but I do want to think about it.

4Rob Bensinger
Yeah, strong upvote to this point. Having an Arbital-style system where people's probabilities aren't prominently timestamped might be the worst of both worlds, though, since it discourages updating and makes it look like most people never do it. I have an intuition that something socially good might be achieved by seeing high-status rationalists treat ass numbers as ass numbers, brazenly assign wildly different probabilities to the same proposition week-by-week, etc., especially if this is a casual and incidental thing rather than being the focus of any blog posts or comments. This might work better, though, if the earlier probabilities vanish by default and only show up again if the user decides to highlight them. (Also, if a user repeatedly abuses this feature to look a lot more accurate than they really were, this warrants mod intervention IMO.)

Had a very aggressive crawler basically DDoS-ing us from a few dozen IPs for the last hour. Sorry for the slower server response times. Things should be fixed now.

Random thoughts on game theory and what it means to be a good person

It does seem to me like there doesn’t exist any good writing on game theory from a TDT perspective. Whenever I read classical game theory, I feel like the equilibria that are being described obviously fall apart when counterfactuals are being properly brought into the mix (like D/D in prisoner’s dilemmas).

The obvious problem with TDT-based game theory is that, just as with Bayesian epistemology, the vast majority of direct applications are completely computationally intractable. It’s kind of obvious what should happen in games with lots of copies of yourself, but as soon as anything participates that isn’t a precise copy, everything gets a lot more confusing. So it is not fully clear what a practical game-theory literature from a TDT-perspective would look like, though maybe the existing LessWrong literature on Bayesian epistemology might be a good inspiration.
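
To make the "exact copy" case concrete, here is a small sketch of a one-shot prisoner's dilemma. The payoff numbers are illustrative assumptions (any values with temptation > reward > punishment > sucker would do), and the function names are made up for this example. It contrasts the classical pure-strategy Nash equilibrium with the choice that is best when the other player is guaranteed to output the same action as you. It says nothing about the harder case of non-identical opponents.

```python
from itertools import product

ACTIONS = ("C", "D")

# PAYOFF[(mine, theirs)] = my payoff. Illustrative numbers, assumed for this sketch.
PAYOFF = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # I cooperate, they defect
    ("D", "C"): 5,  # I defect, they cooperate
    ("D", "D"): 1,  # mutual defection
}


def pure_nash_equilibria():
    """Profiles of this symmetric game where neither player gains by deviating alone."""
    equilibria = []
    for a, b in product(ACTIONS, repeat=2):
        a_is_best_reply = all(PAYOFF[(a, b)] >= PAYOFF[(x, b)] for x in ACTIONS)
        b_is_best_reply = all(PAYOFF[(b, a)] >= PAYOFF[(y, a)] for y in ACTIONS)
        if a_is_best_reply and b_is_best_reply:
            equilibria.append((a, b))
    return equilibria


def best_action_against_exact_copy():
    """If the opponent is an exact copy, both actions match, so only (C, C) and (D, D) are reachable."""
    return max(ACTIONS, key=lambda action: PAYOFF[(action, action)])


print(pure_nash_equilibria())            # [('D', 'D')]
print(best_action_against_exact_copy())  # C
```

The classical analysis holds the other player's action fixed while you deviate, which is exactly the counterfactual that stops making sense when the other player is a copy of you.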

Even when you can’t fully compute everything (and we don’t even really know how to compute everything in principle), you might still be able to go through concrete scenarios and list considerations and perspectives that incorporate TDT-perspectives. I guess in t

... (read more)

Reading through this, I went "well, obviously I pay the mugger...

...oh, I see what you're doing here."

I don't have a full answer to the problem you're specifying, but something that seems relevant is the question of "How much do you want to invest in the ability to punish defectors [both in terms of maximum power-to-punish, a-la nukes, and in terms of your ability to dole out fine-grained-exactly-correct punishment, a-la skilled assassins]"

The answer to this depends on your context. And how you have answered this question determines whether it makes sense to punish people in particular contexts.

In many cases it might make sense to have some amount of randomization, where at least some of the time you really disproportionately punish people, but you don't have to pay the cost of doing so every time.

Answering a couple of the concrete questions:


Right now, in real life, I've never been mugged, and I feel fine basically investing zero effort into preparing for being mugged. If I do get mugged, I will just hand over my wallet.

If I was getting mugged all the time, I'd probably invest effort into a) figuring out what good policies existed ... (read more)

2Lukas Finnveden
Any reason why you mention timeless decision theory (TDT) specifically? My impression was that functional decision theory (as well as UDT, since they're basically the same thing) is regarded as a strict improvement over TDT.
Same thing, it's just the handle that stuck in my mind. I think of the whole class as "timeless", since I don't think there exists a good handle that describes all of them.

Making yourself understandable to other people

(Epistemic status: Processing obvious things that have likely been written many times before, but that are still useful to have written up in my own language)

How do you act in the context of a community that is vetting constrained? I think there are fundamentally two approaches you can use to establish coordination with other parties:

1. Professionalism: Establish that you are taking concrete actions with predictable consequences that are definitely positive

2. Alignment: Establish that you are a competent actor that is acting with intentions that are aligned with the aims of others

I think a lot of the concepts around professionalism arise when you have a group of people who are trying to coordinate, but do not actually have aligned interests. In those situations you will have lots of contracts and commitments to actions that have well-specified outcomes and deviations from those outcomes are generally considered bad. It also encourages a certain suppression of agency and a fear of people doing independent optimization in a way that is not transparent to the rest of the group.

Given a lot of these drawbacks, it seems natural to aim for e... (read more)

I had forgotten this post, reread it and still think it's one of the better things of its length I've read recently.
Glad to hear that! Seems like a good reason to publish this as a top-level post. Might go ahead and do that in the next few days.
+1 for publishing as a top level post

This FB post by Matt Bell on the Delta Variant helped me orient a good amount:

As has been the case for almost the entire pandemic, we can predict the future by looking at the present. Let’s tackle the question of “Should I worry about the Delta variant?” There’s now enough data out of Israel and the UK to get a good picture of this, as nearly all cases in Israel and the UK for the last few weeks have been the Delta variant. [1] Israel was until recently the most-vaccinated major country in the world, and is a good analog to the US because they’ve almost entirely used mRNA vaccines.

- If you’re fully vaccinated and aren’t in a high risk group, the Delta variant looks like it might be “just the flu”. There are some scary headlines going around, like “Half of new cases in Israel are among vaccinated people”, but they’re misleading for a couple of reasons. First, since Israel has vaccinated over 80% of the eligible population, the mRNA vaccine still is 1-((0.5/0.8)/(0.5/0.2)) = 75% effective against infection with the Delta variant. Furthermore, the efficacy of the mRNA vaccine is still very high ( > 90%) against hosp

... (read more)
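
The arithmetic in the quoted post, 1-((0.5/0.8)/(0.5/0.2)) = 75%, compares per-capita case rates between the vaccinated and unvaccinated groups. A short sketch of that calculation, using only the two numbers from the quote (half of cases among the vaccinated, 80% of the eligible population vaccinated); the function name is made up for illustration, and the estimate is crude because it ignores differences such as age or testing rates between the groups:

```python
def vaccine_efficacy(case_share_vaccinated: float, population_share_vaccinated: float) -> float:
    """Estimate efficacy as 1 minus the ratio of per-capita case rates (vaccinated / unvaccinated)."""
    rate_vaccinated = case_share_vaccinated / population_share_vaccinated
    rate_unvaccinated = (1 - case_share_vaccinated) / (1 - population_share_vaccinated)
    return 1 - rate_vaccinated / rate_unvaccinated


# Numbers from the quoted post: 50% of cases are vaccinated, 80% of people are vaccinated.
print(round(vaccine_efficacy(0.5, 0.8), 2))  # 0.75
```

The same function shows why the headline is misleading: as the vaccinated share of the population rises, the vaccinated share of cases rises too, even if efficacy stays constant.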

This seems like potentially a big deal:

> Troubling—the worst variant to date, the #DeltaVariant is now the new fastest growing variant in US. This is the so-called “Indian” variant #B16172 that is ravaging the UK despite high vaccinations because it has immune evasion properties. Here is why it’s trouble—Thread. #COVID19

There's also a strong chance that delta is the most transmissible variant we know even without its immune evasion (source: I work on this, don't have a public source to share). I agree with your assessment that delta is a big deal.
The fact that we still use the same sequence to vaccinate seems like civilisational failure. 
Those graphs all show the percentage share of the different variants, but more important would be the actual growth rate. Is the delta variant growing, or is it just shrinking less quickly than the others?

@Elizabeth was interested in me crossposting this comment from the EA Forum since she thinks there isn't enough writing on the importance of design on LW. So here it is.

Atlas reportedly spent $10,000 on a coffee table. Is this true? Why was the table so expensive?

Atlas at some point bought this table, I think: At that link it costs around $2200, so I highly doubt the $10,000 number.

Lightcone then bought that table from Atlas a few months ago at the listing price, since Jonas thought the purchase ... (read more)

I like this shortform feed idea!

We launched the books on Product Hunt today! 

Leaving this here: 2d9797e61e533f03382a515b61e6d6ef2fac514f

Since this hash is publicly posted, is there any timescale for when we should check back to see the preimage?

If relevant, I will reveal it within the next week.
Preimage was:  Hashed using SHA-1.
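
For anyone who wants to check a revealed preimage against the posted hash, here is a minimal sketch using Python's standard `hashlib`. The helper names are hypothetical, and the UTF-8 encoding and exact whitespace of the original message are assumptions; the preimage itself is not reproduced here.

```python
import hashlib

# The 40-hex-character digest posted above.
POSTED_COMMITMENT = "2d9797e61e533f03382a515b61e6d6ef2fac514f"


def sha1_hex(text: str) -> str:
    """SHA-1 digest of the UTF-8 encoding of `text`, as lowercase hex."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def matches_commitment(candidate: str, commitment: str = POSTED_COMMITMENT) -> bool:
    """True if `candidate` hashes to the commitment. Every character counts, including trailing newlines."""
    return sha1_hex(candidate) == commitment.lower()


# Sanity check against a well-known SHA-1 test vector.
print(sha1_hex("abc"))  # a9993e364706816aba3e25717850c26c9cd0d89d
```

Posting a hash first and the preimage later works as a simple commitment. Since practical SHA-1 collisions have been demonstrated, SHA-256 would be the more conservative choice for a new commitment.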