When there is a train, plane, or bus crash, it's newsworthy: it doesn't happen very often, many lives are at stake, and lots of people are interested in it. Multiple news outlets will send out reporters, and we will hear a lot of details. On the other hand, a car crash does not get this treatment unless there is something unusual about it, like a driverless car or an already newsworthy person being involved.
The effects are not great: while driving is relatively dangerous, both to the occupants and people outside, our sense of danger and impact is poorly calibrated by the news we read. My guess is that most people's intuitive sense of the danger of cars versus trains, planes, and buses has been distorted by this coverage, where most people, say, do not expect buses to be >16x safer than cars. This also...
Difficulties with nutrition research:
The best way through I see is to use population studies to find responsive markers, so people can run fast experiments on themselves. But it's still pretty iffy.
This is a linkpost for https://outsidetheasylum.blog/understanding-subjective-probabilities/. It's intended as an introduction to practical Bayesian probability for those who are skeptical of the notion. I plan to keep the primary link up to date with improvements and corrections, but won't do the same with this LessWrong post, so see there for the most recent version.
Any time a controversial prediction about the future comes up, there's a type of interaction that's pretty much guaranteed to happen.
Alice: "I think this thing is 20% likely to occur."
Bob: "Huh? How could you know that? You just made that number up!"
Or in Twitterese:
That is, any time someone attempts to provide a specific numerical probability on a future event, they'll be inundated with claims that the number is meaningless. Is this true? Does it make...
Flip it the same way every time, and it will land the same way every time.
You are assuming determinism. Determinism is not known to be true.
The traditional interpretation of probability is known as frequentist probability. Under this interpretation, items have some intrinsic “quality” of being some % likely to do one thing vs. another.
No, frequentist probability just says that events fall into sets of a comparable type which have relative frequencies. You don't need to assume indeterminism for frequentism.
It’s obvious once you think about i...
I could get quite in depth about this, but I'm going to assume most people have a fair amount of experience with this subject. Some examples to keep in mind, so you have context for my point, are Discord (the new mobile app and changes to its feature set over the years), Reddit (old.reddit compared to new), and LessWrong (the discussions feature).
The crux of my question is this: separate from enshittification due to capitalistic forces (changes made to attempt to please investors, create endless growth, make more money generally), are changes to apps and websites worse on average in some clear and obvious way than their previous versions, or is the evident outrage for changes to the UI and concept of these platforms from a general 'fear or dislike of change' present in...
Note: this is a repost of a Facebook post I made back in December 2022 (plus some formatting). I'm putting it up here to make it easier to link to and because it occurred to me that it might be a good idea to show it to the LW audience specifically.
As impressive as ChatGPT is on some axes, you shouldn't rely too hard on it for certain things because it's bad at what I'm going to call "board vision" (a term I'm borrowing from chess). This generalized "board vision" is the ability to concretely model or visualize the state of things (or how it might change depending on one's actions) like one might while playing a chess game.
I tested ChatGPT's board vision in chess itself. I...
Not sure what you mean by 100 percent accuracy, and you probably already know this, but 3.5 Instruct Turbo plays chess at about 1800 Elo while fulfilling your constraints (with about 5 illegal moves, potentially fewer, in 8205): https://github.com/adamkarvonen/chess_gpt_eval
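To put the illegal-move figure in proportion, here is a minimal sketch of the arithmetic. It takes the two counts from the comment above at face value (5 as the upper bound on illegal moves, 8205 as the total) and assumes the total counts individual moves, which the comment does not state explicitly. The function name `illegal_move_rate` is made up for illustration and is not part of the linked repo.

```python
def illegal_move_rate(illegal: int, total: int) -> float:
    """Return the fraction of moves that were illegal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return illegal / total


# Counts quoted in the comment; 5 is described as an upper bound.
ILLEGAL_MOVES = 5
TOTAL_MOVES = 8205  # assumption: this is a count of moves, not games

rate = illegal_move_rate(ILLEGAL_MOVES, TOTAL_MOVES)

print(f"Illegal move rate: {rate:.4%}")
print(f"Roughly 1 illegal move per {round(1 / rate)} moves")
```

Under that assumption the rate is well under a tenth of a percent, which is low but still not the "100 percent accuracy" the comment is replying to.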
We speak casually of the laws of nature determining the distribution of matter and energy, or governing the behavior of physical objects. Implicit in this rhetoric is a metaphysical picture: the laws are rules that constrain the temporal evolution of stuff in the universe. In some important sense, the laws are prior to the distribution of stuff. The physicist Paul Davies expresses this idea with a bit more flair: "[W]e have this image of really existing laws of physics ensconced in a transcendent aerie, lording it over lowly matter." The origins of this conception can be traced back to the beginnings of the scientific revolution, when Descartes and Newton established the discovery of laws as the central aim of physical inquiry. In a scientific culture...
laws of nature do have a privileged role in physical explanation, but that privilege is due to their simplicity and generality, not to some mysterious quasi-causal power they exert over matter. The fact that a certain generalization is a law of nature does not account for the truth and explanatory power of the generalization, any more than the fact that a soldier has won the Medal of Honor accounts for his or her courage in combat.
That's a self-defeating analogy. So long as the process of pinning a medal on someone is epistemically valid, it does indica...
The LessWrong team is shipping a new experimental feature today: dialogue matching!
I've been leading work on this (together with Ben Pace, kave, Ricki Heicklen, habryka and RobertM), so wanted to take some time to introduce what we built and share some thoughts on why I wanted to build it.
There's now a dialogue matchmaking page at lesswrong.com/dialogueMatching
Here's how it works:
It also shows you some interesting data: your top upvoted users over the last 18 months, how much you agreed/disagreed with them, what topics they most frequently commented on, and what posts of theirs you most recently read.
Tl;dr: Looking for hard debugging tasks for evals, paying greater of $60/hr or $200 per example.
METR (formerly ARC Evals) is interested in producing hard debugging tasks for models to attempt as part of an agentic capabilities evaluation. To create these tasks, we’re seeking repos containing extremely tricky bugs. If you send us a codebase that meets the criteria for submission (listed below), we will pay you $60/hr for time spent putting it into our required format, or $200, whichever is greater. (We won’t pay for submissions that don’t meet these requirements.) If we’re particularly excited about your submission, we may also be interested in purchasing IP rights to it. We expect to want about 10-30 examples overall depending on the diversity. We're likely to be putting bounties on additional...
One particularly amusing bug I was involved with was with an early version of the content recommendation engine at the company I worked at (this is used by websites to recommend related content on the website, such as related videos, articles, etc.). One of the customers for the recommendation engine was a music video service, and we/they noticed that One Direction's song called Infinity was showing up at the top of our recommendations a little too often. (I think this was triggered by the release of another One Direction song bringing the Infinity song in...
A note on how to approach this sequence:
If you were exactly like me, I would ask you to savor this sequence, not scarf it. I would ask you to approach each of these essays in an expansive, lingering, thoughtful sort of mood. I would ask you to read them a little bit at a time, perhaps from a comfortable chair with a warm drink beside you, and to take breaks to make dinner, sing in the car, talk to your friends, and sleep.
These essays are reflections on the central principles I have gradually excavated from my past ten years of intellectual labor. I am a very slow thinker myself; if you move too quickly, I expect we’ll miss each other completely.
There’s a certain kind of thing that...
Oh man
a) I am excited by the prospect of there eventually being some kind of naturalism book
b) I like the idea of either reframing away from Naturalism, or introducing the concept more thoroughly. I was definitely among the people going "huh?" at it, but I feel interested in the prospect of establishing the version of the concept that was inspiring to you.
c) I feel especially excited for you doing an exploration of decisiveness/rapid-iteration/efficiency/other-virtues-that-seem-maybe-at-odds with patience, and then saying more things about patience. ...
Hi! This is my area of expertise: I work in the road safety field and spent 9 months investigating fatal car crashes. You are right that there are definite "Darwin Award" candidates, but there are also deeply relatable ones that could happen to anyone.
Some anecdotes off the top of my head:
- A person accidentally had their car in reverse instead of forward when maneuvering after leaving a parking spot. This resulted in their car falling into the ocean and the passenger dying.
- A very common crash type that usually results in very little damage: two vehic...