Recent Discussion

When there is a train, plane, or bus crash, it's newsworthy: it doesn't happen very often, lots of lives at stake, lots of people are interested in it. Multiple news outlets will send out reporters, and we will hear a lot of details. On the other hand, a car crash does not get this treatment unless there is something unusual about it like a driverless car or an already newsworthy person involved.

The effects are not great: while driving is relatively dangerous, both to the occupants and people outside, our sense of danger and impact is poorly calibrated by the news we read. My guess is that most people's intuitive sense of the danger of cars versus trains, planes, and buses has been distorted by this coverage, where most people, say, do not expect buses to be >16x safer than cars. This also...

Well, even after eyeballing this graph, I still don't expect to be 16x safer on a bus than while driving my car. My experience is that car crashes are covered at least by local news, and the overwhelming majority of car crashes I've heard of involved drunk drivers, ludicrous speed, or idiots looking at their smartphones instead of the road. A bus is safer than a car mainly because the average bus driver is more scrupulous than the average car driver. Do we have data on car crash fatalities limited to public services like taxis? My best guess is that fatalities would decrease by an order of magnitude when you restrict to rule-abiding drivers.

Hi! This is my area of expertise - I work in the road safety field and spent 9 months investigating fatal car crashes. You are right that there are definite "Darwin Award" candidates, but there are also deeply relatable ones that could happen to anyone.

Some anecdotes off the top of my head:

  • A person accidentally had their car in reverse instead of forward when maneuvering after leaving a parking spot. This resulted in their car falling into the ocean and the passenger dying.
  • A very common crash type that usually results in very little damage: two vehic…
Personally, I’m more familiar with folks creating entirely new nonprofit media outlets to focus on reporting in an area that they believe to deserve better coverage (many of which then seek to partner with traditional publishers on specific projects once they have a demonstrated body of work), rather than directly funding that coverage at an existing paper. I think Religion News Service is basically an older representative of this approximate model, and topic-focused non-profit journalism organizations like this seem to be popping up more frequently as traditional models of funding journalism come under increasing strain. More current examples that appear to fit this approximate pattern include The Intercept for coverage on surveillance and adjacent issues, The Marshall Project for issues relevant to criminal justice reform, and Anthropocene Magazine for climate change solutions.
In Berkeley there's a sign outside city hall (put up by anti-car activists) listing the number of weeks since someone died in a car collision - perhaps that solution could work here. Altho TBH that sign updated me towards thinking cars are safer than I realized - the current count is 59 weeks.

Difficulties with nutrition research:

  1. ~Impossible to collect information on a population level. We could dig into the reasons this is true, but it doesn't matter because...
  2. High variance between people means population data is of really limited applicability
  3. Under the best case circumstances, by the time you have results from an individual it means they've already hurt themselves. 


The best way through that I see is to use population studies to find responsive markers, so people can run fast experiments on themselves. But it's still pretty iffy.

This is a linkpost for It's intended as an introduction to practical Bayesian probability for those who are skeptical of the notion. I plan to keep the primary link up to date with improvements and corrections, but won't do the same with this LessWrong post, so see there for the most recent version.

Any time a controversial prediction about the future comes up, there's a type of interaction that's pretty much guaranteed to happen.

Alice: "I think this thing is 20% likely to occur."

Bob: "Huh? How could you know that? You just made that number up!"

Or in Twitterese:

That is, any time someone attempts to provide a specific numerical probability on a future event, they'll be inundated with claims that that number is meaningless. Is this true? Does it make...

Flip it the same way every time, and it will land the same way every time.

You are assuming determinism. Determinism is not known to be true.

The traditional interpretation of probability is known as frequentist probability. Under this interpretation, items have some intrinsic “quality” of being some % likely to do one thing vs. another.

No, frequentist probability just says that events fall into sets of a comparable type which have relative frequencies. You don't need to assume indeterminism for frequentism.

> It's obvious once you think about i…

I could get quite in depth about this, but I'm going to assume most people have a fair amount of experience with this subject. Some examples to keep in mind so you have context for my point are Discord (new mobile app and changes to its feature set over the years), Reddit (old.reddit compared to new), and LessWrong (discussions feature).

The crux of my question is this: separate from enshittification due to capitalistic forces (changes made to attempt to please investors, create endless growth, make more money generally), are changes to apps and websites worse on average in some clear and obvious way than their previous versions, or is the evident outrage at changes to the UI and concept of these platforms from a general 'fear or dislike of change' present in...

I think the point here is relevant as well. A site's users are usually the people for whom the existing UI works best, and changes are usually aimed at bringing new people to the site. 

mako yass
My first guess would be that it just isn't common for organizations to constitute with resilient meritocratic internal hiring (sometimes because that's genuinely not in the interest of the founders), and they regress towards the mean (being shit) over time. And the rise happens especially quickly with web apps, so sustainable internal dynamics aren't required for them to rise to prominence, and the network effects are strong, so it's more visible?
Answer by Dagon
All companies and products get changed over time.  And they're optimizing on multiple dimensions, some of which are cost and complexity of delivery, attractiveness to new users, usability by existing users, and different segments of paying customers (including, usually, advertisers and financiers).   In MOST cases, there's a selection effect of early (or current) users liking things well enough to overcome the hurdles of use, so many changes that are to optimize for other stakeholders (especially the very important potential users who haven't yet started using the product) are likely to seem counterproductive to those who find it good as-is. Add to this the uncertainty of WHAT precisely makes it good, and what will make it better for the target subsets, and a lot of changes are just bad, but there's no way to know that without trying it.   And it's known that many changes are path-dependent and change itself bothers many, so undoing a neutral or somewhat negative change (even when very negative to some users) is often seen as more costly than leaving it, and making other changes later.
Garrett Baker
You seem possibly right, looking at some data with ChatGPT. It collected the following data:

| Year | MAUs (Millions) | Growth Rate (%) |
|------|-----------------|-----------------|
| 2013 | 90   | 95.65 |
| 2014 | 174  | 93.33 |
| 2015 | 199  | 14.37 |
| 2017 | 250  | 25.63 |
| 2018 | 331  | 32.40 |
| 2019 | 430  | 29.91 |
| 2020 | 619  | 43.95 |
| 2021 | 861  | 39.10 |
| 2022 | 1195 | 38.79 |

2019 seems to have had a decrease in growth compared to 2018; maybe that's attributable to the redesign? If nothing else, the redesign doesn't look so impactful.
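As a sanity check on the ChatGPT-collected numbers, the growth-rate column can be recomputed directly from the MAU column (note the data skips 2016, so the 2017 "growth rate" actually spans two years, and 2013's rate can't be checked without a 2012 figure):

```python
# Recompute year-over-year growth from the MAU figures in the table above.
# Caveat: 2016 is missing from the collected data, and 2013's rate would
# need the unlisted 2012 MAU.
maus = {2013: 90, 2014: 174, 2015: 199, 2017: 250, 2018: 331,
        2019: 430, 2020: 619, 2021: 861, 2022: 1195}

def growth_rates(maus: dict) -> dict:
    """Percent growth of each year's MAUs relative to the previous listed year."""
    years = sorted(maus)
    return {cur: round((maus[cur] - maus[prev]) / maus[prev] * 100, 2)
            for prev, cur in zip(years, years[1:])}

print(growth_rates(maus))
# 2019's rate (29.91) is indeed a dip from 2018's (32.40), per the comment above.
```

The recomputed rates match the table, which at least means the two columns are internally consistent, whatever the provenance of the underlying MAU numbers.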

Note: this is a repost of a Facebook post I made back in December 2022 (plus some formatting).  I'm putting it up here to make it easier to link to and because it occurred to me that it might be a good idea to show it to the LW audience specifically.

Board Vision

As impressive as ChatGPT is on some axes, you shouldn't rely too hard on it for certain things because it's bad at what I'm going to call "board vision" (a term I'm borrowing from chess). This generalized "board vision" is the ability to concretely model or visualize the state of things (or how it might change depending on one's actions) like one might while playing a chess game.

I tested ChatGPT's board vision in chess itself. I...

Not sure what you mean by 100 percent accuracy, and of course you probably already know this, but 3.5 Instruct Turbo plays chess at about 1800 Elo, fulfilling your constraints (and has about 5 illegal moves, potentially fewer, in 8205).

Laws as Rules

We speak casually of the laws of nature determining the distribution of matter and energy, or governing the behavior of physical objects. Implicit in this rhetoric is a metaphysical picture: the laws are rules that constrain the temporal evolution of stuff in the universe. In some important sense, the laws are prior to the distribution of stuff. The physicist Paul Davies expresses this idea with a bit more flair: "[W]e have this image of really existing laws of physics ensconced in a transcendent aerie, lording it over lowly matter." The origins of this conception can be traced back to the beginnings of the scientific revolution, when Descartes and Newton established the discovery of laws as the central aim of physical inquiry. In a scientific culture...

laws of nature do have a privileged role in physical explanation, but that privilege is due to their simplicity and generality, not to some mysterious quasi-causal power they exert over matter. The fact that a certain generalization is a law of nature does not account for the truth and explanatory power of the generalization, any more than the fact that a soldier has won the Medal of Honor accounts for his or her courage in combat.

That's a self-defeating analogy. So long as the process of pinning a medal on someone is epistemically valid, it does indica…


The LessWrong team is shipping a new experimental feature today: dialogue matching!

I've been leading work on this (together with Ben Pace, kave, Ricki Heicklen, habryka and RobertM), so wanted to take some time to introduce what we built and share some thoughts on why I wanted to build it. 

New feature! 🎉

There's now a dialogue matchmaking page at

Here's how it works:

  • You can check a user you'd potentially be interested in having a dialogue with, if they were, too
  • They can't see your checks unless you match

It also shows you some interesting data: your top upvoted users over the last 18 months, how much you agreed/disagreed with them, what topics they most frequently commented on, and what posts of theirs you most recently read. 

  • Next, if you find a match,

I am in literally the exact same situation, and think your proposed remedy makes sense.

Tl;dr: Looking for hard debugging tasks for evals, paying greater of $60/hr or $200 per example.

METR (formerly ARC Evals) is interested in producing hard debugging tasks for models to attempt as part of an agentic capabilities evaluation. To create these tasks, we’re seeking repos containing extremely tricky bugs. If you send us a codebase that meets the criteria for submission (listed below), we will pay you $60/hr for time spent putting it into our required format, or $200, whichever is greater. (We won’t pay for submissions that don’t meet these requirements.) If we’re particularly excited about your submission, we may also be interested in purchasing IP rights to it. We expect to want about 10-30 examples overall depending on the diversity. We're likely to be putting bounties on additional...

One particularly amusing bug I was involved with was with an early version of the content recommendation engine at the company I worked at (this is used by websites to recommend related content on the website, such as related videos, articles, etc.). One of the customers for the recommendation engine was a music video service, and we/they noticed that One Direction's song called Infinity was showing up at the top of our recommendations a little too often. (I think this was triggered by the release of another One Direction song bringing the Infinity song in…

Cool idea! This is an odd phrasing to me in two ways:

1. "Contains a bug that would [be hard] to solve"

So I think a lot of this depends on your definition of "solve". I frequently run into bugs where I expect identifying and fixing the exact root cause of the bug would take upwards of 20 hours (e.g. it's clearly some sort of a race condition, but nobody has yet managed to reproduce it) but sidestepping the bug is fast (slap a lock on updating whatever entity around both entire operations that were trying to update that entity).

For an example of what I mean, see the Segmentation Fault when Using SentenceTransformer Inside Docker Container question: the conclusion seems to be "there's a known bug with using pytorch with Python 3.11 on an Apple M2 Mac within a docker container, you can fix it by using a different version of Python, a different version of pytorch, or a different physical machine."

2. "6 hours for a skilled programmer to solve"

In my experience a lot of bugs are almost impossible to diagnose if you've never seen a bug in this class before and don't know how to use the debugging toolchain, and trivial to diagnose if you have seen and dealt with this kind of bug before. Looking at the same example from before, I bet there's at least one engineer at Apple and one pytorch dev who, if you got them together, could fire up Apple's internal equivalent of gdb and figure out exactly what tensor operations SentenceTransformer.forward() is trying to do, which operation failed, why it failed, and what equivalent operation would work in place of the failing one. It's likely something extremely dumb and shallow if you get the right people in a room together working on it. Without the ability to debug what's going on in the apple-specific part of the stack I bet this would take at least 20 hours to solve, probably much more (because the tooling sucks, not because the problem is inherently difficult).

So I guess my question is whether "hard" bugs of the form "thi…
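The "slap a lock around both entire operations" sidestep described above can be sketched in a few lines of Python (a toy illustration with a hypothetical shared entity, not code from any real codebase):

```python
import threading

# Toy illustration of sidestepping a suspected race condition: rather than
# pinning down the exact unsynchronized access, serialize both *entire*
# operations that touch the shared entity with one coarse lock.
_entity_lock = threading.Lock()
entity = {"value": 0}

def operation_a(n: int) -> None:
    with _entity_lock:  # the whole read-modify-write runs under the lock
        current = entity["value"]
        entity["value"] = current + n

def operation_b(n: int) -> None:
    with _entity_lock:
        current = entity["value"]
        entity["value"] = current - n
```

Coarse locking can cost throughput, but when root-causing the race would take 20+ hours, this "solves" the bug in the practical sense the comment is pointing at.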
Michael Roe
Some examples of bugs that were particularly troublesome on a recent project:

1. In the MIPS backend for the LLVM compiler there is one point where it ought to be checking whether the target CPU is 32-bit or 64-bit. Instead, it checks if the MIPS version number is mips64. Problem: there were 64-bit MIPS versions before mips64, e.g. mips4, so the check is wrong. Obvious when you see the line of code, but days of tracing through thousands and thousands of lines of code till you find it.
2. With a particular version of FreeBSD on MIPS, it works fine on single core but the console dies on multi core. The serial line interrupt is routed to one of the cores. On receiving an interrupt, the OS disables the interrupt and puts handling the interrupt on a scheduling queue. When the task is taken off the queue, the interrupt is handled and then re-enabled. Problem: the last step might be scheduled to run on a different core. If that happens, the interrupt remains disabled on the core that receives the interrupt, and enabled on a core that never receives it. Console output dies.
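The first bug boils down to testing an ISA *name* when the real question is word size. A stripped-down sketch of that shape of mistake (hypothetical names and helper functions, not the actual LLVM code):

```python
# Hypothetical reconstruction of the shape of bug 1: MIPS III/IV/V were
# already 64-bit ISAs, long before the ISA named "mips64" existed.
SIXTY_FOUR_BIT_ISAS = {"mips3", "mips4", "mips5", "mips64", "mips64r2"}

def is_64bit_buggy(isa: str) -> bool:
    # Bug: conflates "the target is 64-bit" with "the ISA is named mips64".
    return isa == "mips64"

def is_64bit_fixed(isa: str) -> bool:
    # Fix: ask the actual question (is this ISA 64-bit?).
    return isa in SIXTY_FOUR_BIT_ISAS

# mips4 is a 64-bit ISA, but the buggy check misclassifies it:
print(is_64bit_buggy("mips4"), is_64bit_fixed("mips4"))  # False True
```

As the comment says: obvious once you see the one line, brutal to find from the symptoms.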
This reminded me of a bug I spent weeks figuring out, at the beginning of my career. Not sure if something like this would qualify, and I do not have the code anyway.

I wrote a relatively simple piece of code that called a C library produced at the same company. Other people had used the same library for years without any issues. My code worked correctly on my local machine; worked correctly during the testing; and when deployed to the production server, it worked correctly... for about an hour... and then it stopped working.

I had no idea what to do about this. I was an inexperienced junior programmer; I didn't have direct access to the production machine and there was nothing in the logs; and I could not reproduce the bug locally and neither could the tester. No one else had any problem using the library, and I couldn't see anything wrong in my code.

About a month later, I figured out...

...that at some moment, the library generated a temporary file, in the system temporary directory...

...the temporary file had a name generated randomly...

...the library even checked for the (astronomically unlikely) possibility that a file with a given name might already exist, in which case it would generate another random name and check again (up to 100 times, and then it would give up, because potentially infinite loops were not allowed by our strict security policy).

Can you guess the problem now? If anyone wants to reproduce this in Python and collect the reward, feel free to do so.
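One hypothetical way such a setup could fail after roughly an hour of production traffic (this is a guess at the mechanism, not the confirmed cause from the story): if the "random" name generator is seeded with a constant, every call draws the same sequence of names, and if the files are never deleted, the bounded retry loop eventually finds all of its candidates taken and gives up.

```python
import os
import random
import tempfile

def make_temp_file(directory: str) -> str:
    # Hypothetical bug: a constant seed means every call draws the *same*
    # sequence of "random" names, and the files are never cleaned up.
    rng = random.Random(12345)
    for _ in range(100):  # bounded retries (no infinite loops allowed)
        name = "tmp%09d" % rng.randrange(10**9)
        path = os.path.join(directory, name)
        if not os.path.exists(path):
            open(path, "w").close()  # created, but never deleted
            return path
    raise RuntimeError("gave up after 100 collisions")

d = tempfile.mkdtemp()
for _ in range(100):
    make_temp_file(d)  # the first 100 calls succeed, each one retry deeper...
# ...and the 101st call finds all 100 candidate names taken and gives up.
```

Each successful call leaves one more file in the way of the next, so the library degrades gradually and then fails completely, exactly the "works for a while, then stops" symptom.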

A note on how to approach this sequence:

If you were exactly like me, I would ask you to savor this sequence, not scarf it. I would ask you to approach each of these essays in an expansive, lingering, thoughtful sort of mood. I would ask you to read them a little bit at a time, perhaps from a comfortable chair with a warm drink beside you, and to take breaks to make dinner, sing in the car, talk to your friends, and sleep.

These essays are reflections on the central principles I have gradually excavated from my past ten years of intellectual labor. I am a very slow thinker myself; if you move too quickly, I expect we’ll miss each other completely.

There’s a certain kind of thing that...

(This is a review of the entire sequence.)

On the day when I first conceived of this sequence, my room was covered in giant graph paper sticky notes. The walls, the windows, the dressers, the floor. Sticky pads everywhere, and every one of them packed with word clouds and doodles in messy bold marker.

My world is rich. The grain of the wood on the desk in front of me, the slightly raw sensation inside my nostrils that brightens each time I inhale, the pressure of my search for words as I write that rises up through my chest and makes my brain feel like it’s breathing through a straw. I know as well as almost anybody what MacNeice called “the drunkenness of things being various”, “incorrigibly plural”. I am awash in details; sometimes I swim, sometimes I drown, and in rare merciful moments, I float. People talk about missing the forest for the trees; I am a creature of individual leaves.

The sticky notes with which I had covered my walls were my attempts to recall every twig and branch I had seen while developing my approach to rationality, ever since I asked myself what the existing art is missing back in 2013. Each page was an attempted portrait of a different tree. The sentence I somehow pulled together for this sequence—“Knowing the territory takes patient and direct observation”—was my sketch of the entire forest all at once.

On that day, it had seemed a literally incomprehensible pile of details, as nearly everything I write about does until some time after I’ve published. Yet after two more years of work on this project, I still think that sketch is not only accurate, but pretty close to complete.

I am proud of this sequence. It’s far from perfect; it’s far from adequate, in fact. And I’ll talk about that, too. But as a first-pass summary of how I think about “Intro to Naturalism”, it’s right to say that overall, I think it may be the best thing I’ve done so far.

*

I doubt it’s worth much on its own, though. It was really never meant to be

Oh man 

a) I am excited by the prospect of there eventually being some kind of naturalism book

b) I like the idea of either reframing away from Naturalism, or introducing the concept more thoroughly. I was definitely among the people going "huh?" at it, but I feel interested in the prospect of establishing the version of the concept that was inspiring to you.

c) I feel especially excited for you doing an exploration of decisiveness/rapid-iteration/efficiency/other-virtues-that-seem-maybe-at-odds with patience, and then saying more things about patience…