When there is a train, plane, or bus crash, it's newsworthy: it doesn't happen very often, lots of lives are at stake, and lots of people are interested in it. Multiple news outlets will send out reporters, and we will hear a lot of details. On the other hand, a car crash does not get this treatment unless there is something unusual about it, like a driverless car or an already newsworthy person being involved.
The effects are not great: while driving is relatively dangerous, both to the occupants and people outside, our sense of danger and impact is poorly calibrated by the news we read. My guess is that most people's intuitive sense of the danger of cars versus trains, planes, and buses has been distorted by this coverage, where most people, say, do not expect buses to be >16x safer than cars. This also...
This is a linkpost for https://outsidetheasylum.blog/understanding-subjective-probabilities/. It's intended as an introduction to practical Bayesian probability for those who are skeptical of the notion. I plan to keep the primary link up to date with improvements and corrections, but won't do the same with this LessWrong post, so see there for the most recent version.
Any time a controversial prediction about the future comes up, there's a type of interaction that's pretty much guaranteed to happen.
Alice: "I think this thing is 20% likely to occur."
Bob: "Huh? How could you know that? You just made that number up!"
Or in Twitterese:
That is, any time someone attempts to provide a specific numerical probability for a future event, they'll be inundated with claims that the number is meaningless. Is this true? Does it make...
I could get quite in depth about this, but I'm going to assume most people have a fair amount of experience with this subject. Some examples to keep in mind for context are Discord (the new mobile app and changes to its feature set over the years), Reddit (old.reddit compared to new), and LessWrong (the discussions feature).
The crux of my question is this: separate from enshittification due to capitalistic forces (changes made to please investors, create endless growth, or make more money generally), are changes to apps and websites worse on average in some clear and obvious way than their previous versions, or is the evident outrage at changes to the UI and concept of these platforms from a general 'fear or dislike of change' present in...
Note: this is a repost of a Facebook post I made back in December 2022 (plus some formatting). I'm putting it up here to make it easier to link to and because it occurred to me that it might be a good idea to show it to the LW audience specifically.
As impressive as ChatGPT is on some axes, you shouldn't rely too hard on it for certain things because it's bad at what I'm going to call "board vision" (a term I'm borrowing from chess). This generalized "board vision" is the ability to concretely model or visualize the state of things (or how it might change depending on one's actions) like one might while playing a chess game.
I tested ChatGPT's board vision in chess itself. I...
We speak casually of the laws of nature determining the distribution of matter and energy, or governing the behavior of physical objects. Implicit in this rhetoric is a metaphysical picture: the laws are rules that constrain the temporal evolution of stuff in the universe. In some important sense, the laws are prior to the distribution of stuff. The physicist Paul Davies expresses this idea with a bit more flair: "[W]e have this image of really existing laws of physics ensconced in a transcendent aerie, lording it over lowly matter." The origins of this conception can be traced back to the beginnings of the scientific revolution, when Descartes and Newton established the discovery of laws as the central aim of physical inquiry. In a scientific culture...
The LessWrong team is shipping a new experimental feature today: dialogue matching!
I've been leading work on this (together with Ben Pace, kave, Ricki Heicklen, habryka and RobertM), so I wanted to take some time to introduce what we built and share some thoughts on why I wanted to build it.
There's now a dialogue matchmaking page at lesswrong.com/dialogueMatching
Here's how it works:
It also shows you some interesting data: your top upvoted users over the last 18 months, how much you agreed/disagreed with them, what topics they most frequently commented on, and what posts of theirs you most recently read.
Tl;dr: We're looking for hard debugging tasks for evals, paying the greater of $60/hr or $200 per example.
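To make the "greater of" rule concrete, here is a minimal sketch of the arithmetic in Python. The function name `payout` and its parameters are hypothetical, chosen only for illustration. It assumes the submission already meets the criteria described below and that the time is measured in hours.

```python
def payout(hours_spent: float, hourly_rate: float = 60.0, minimum: float = 200.0) -> float:
    """Return the greater of hourly pay and the flat per-example amount.

    Hypothetical helper: the rate and minimum come from the post; the
    function itself is only an illustration of the stated rule.
    """
    if hours_spent < 0:
        raise ValueError("hours_spent must be non-negative")
    return max(hours_spent * hourly_rate, minimum)


print(payout(2))  # 2 h * $60 = $120, below the $200 floor -> 200.0
print(payout(5))  # 5 h * $60 = $300, above the $200 floor -> 300.0
```

Under this reading, the flat $200 applies to any example that takes less than about 3 hours 20 minutes to format ($200 / $60 per hour), and the hourly rate applies beyond that.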
METR (formerly ARC Evals) is interested in producing hard debugging tasks for models to attempt as part of an agentic capabilities evaluation. To create these tasks, we’re seeking repos containing extremely tricky bugs. If you send us a codebase that meets the criteria for submission (listed below), we will pay you $60/hr for time spent putting it into our required format, or $200, whichever is greater. (We won’t pay for submissions that don’t meet these requirements.) If we’re particularly excited about your submission, we may also be interested in purchasing IP rights to it. We expect to want about 10-30 examples overall depending on the diversity. We're likely to be putting bounties on additional...
A note on how to approach this sequence:
If you were exactly like me, I would ask you to savor this sequence, not scarf it. I would ask you to approach each of these essays in an expansive, lingering, thoughtful sort of mood. I would ask you to read them a little bit at a time, perhaps from a comfortable chair with a warm drink beside you, and to take breaks to make dinner, sing in the car, talk to your friends, and sleep.
These essays are reflections on the central principles I have gradually excavated from my past ten years of intellectual labor. I am a very slow thinker myself; if you move too quickly, I expect we’ll miss each other completely.
There’s a certain kind of thing that...