Why do some societies exhibit more antisocial punishment than others? Martin explores both some literature on the subject, and his own experience living in a country where "punishment of cooperators" was fairly common.
This work was produced as part of Neel Nanda's stream in the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort, with co-supervision from Wes Gurnee.
This post is a preview of our upcoming paper, which will go into more detail on our current understanding of refusal.
We thank Nina Rimsky and Daniel Paleka for helpful conversations and review.
Modern LLMs are typically fine-tuned for instruction-following and safety. Of particular interest is that they are trained to refuse harmful requests, e.g. answering "How can I make a bomb?" with "Sorry, I cannot help you."
We find that refusal is mediated by a single direction in the residual stream: preventing the model from representing this direction hinders its ability to refuse requests, and artificially adding in this direction causes the model...
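The "preventing the model from representing this direction" step is a projection: remove the component of each residual-stream activation that lies along the refusal direction. As a minimal sketch (in NumPy rather than on real model activations, and with hypothetical names — the actual paper applies this via hooks inside the transformer), directional ablation might look like:

```python
import numpy as np

def ablate_direction(resid, direction):
    """Remove the component of `resid` along `direction`.

    resid:     activation vector(s), shape (..., d_model)
    direction: candidate refusal direction, shape (d_model,)
    """
    r = direction / np.linalg.norm(direction)          # unit vector
    return resid - (resid @ r)[..., None] * r          # project out r

# Toy check: after ablation the activation has zero component along r.
x = np.array([3.0, 4.0, 0.0])
r = np.array([1.0, 0.0, 0.0])
print(ablate_direction(x, r))  # -> [0. 4. 0.]
```

Adding the direction back in ("artificially adding in this direction") is just the reverse: `resid + alpha * r` for some scale `alpha`.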
If anyone wants to try this on Llama-3 8B, I converted the Colab to baukit, and it's available here.
A couple years ago, I had a great conversation at a research retreat about the cool things we could do if only we had safe, reliable amnesic drugs - i.e. drugs which would allow us to act more-or-less normally for some time, but not remember it at all later on. And then nothing came of that conversation, because as far as any of us knew such drugs were science fiction.
… so yesterday when I read Eric Neyman’s fun post My hour of memoryless lucidity, I was pretty surprised to learn that what sounded like a pretty ideal amnesic drug was used in routine surgery. A little googling suggested that the drug was probably a benzodiazepine (think valium). Which means it’s not only a great amnesic, it’s also apparently one...
For those who don't get the joke: benzos are depressants, and will (temporarily) significantly reduce your cognitive function if you take enough to have amnesia.
This might not make John's idea pointless, if the tested interventions' effect on cognitive performance still correlates strongly with sober performance. But there may be some interventions whose main effect is to offset the benzos' effects, and whose usefulness would therefore not generalize to sober cognition.
The Brownian motion assumption is rather strong, but it is not required for the conclusion. Consider the stock market, which famously has heavy-tailed, bursty returns. The S&P 500 moves 1% in a week all the time, but a 10% weekly move happens only a couple of times per decade. I would guess (and we can check) that most weeks have >0.6x the average per-week variance of the market, which makes the median weekly absolute return well over half of what it would be if the market were Brownian motion with the same long-term variance.
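A quick sanity check of the claim: even for a distinctly heavy-tailed distribution, the median absolute move stays well above half the Gaussian value at equal variance. This sketch uses a Student-t with 3 degrees of freedom as a stand-in for bursty returns (an assumption for illustration, not a calibrated market model):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Gaussian "weekly returns", unit variance
gauss = rng.normal(0.0, 1.0, n)

# Heavy-tailed returns: Student-t with 3 dof has variance 3,
# so divide by sqrt(3) to match the Gaussian's variance.
heavy = rng.standard_t(3, n) / np.sqrt(3)

med_gauss = np.median(np.abs(gauss))   # ~0.674 for a unit-variance normal
med_heavy = np.median(np.abs(heavy))

print(f"Gaussian median |move|:    {med_gauss:.3f}")
print(f"Heavy-tailed median |move|: {med_heavy:.3f}")
print(f"ratio: {med_heavy / med_gauss:.2f}")   # comfortably above 0.5
```

The ratio comes out around 0.65: heavy tails shrink the typical week's move relative to Brownian, but nowhere near enough to undercut the "more than half" conclusion.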
Also, Lawrence tells me that in Tetlock's studies, superforecasters tend to make updates of 1-2% every week, which actually improves their accuracy.
Yesterday, I had a coronectomy: the top halves of my bottom wisdom teeth were surgically removed. It was my first time being sedated, and I didn’t know what to expect. While I was unconscious during the surgery, the hour after surgery turned out to be a fascinating experience, because I was completely lucid but had almost zero short-term memory.
My girlfriend, who had kindly agreed to accompany me to the surgery, was with me during that hour. And so — apparently against the advice of the nurses — I spent that whole hour talking to her and asking her questions.
The biggest reason I find my experience fascinating is that it has mostly answered a question that I’ve had about myself for quite a long time: how deterministic am...
Clive Wearing's story might be interesting to you: https://m.youtube.com/watch?v=k_P7Y0-wgos&feature=youtu.be
Happy May the 4th from Convergence Analysis! Cross-posted on the EA Forum.
As part of Convergence Analysis’s scenario research, we’ve been looking into how AI organisations, experts, and forecasters make predictions about the future of AI. In February 2023, the AI research institute Epoch published a report in which its authors use neural scaling laws to make quantitative predictions about when AI will reach human-level performance and become transformative. The report has a corresponding blog post, an interactive model, and a Python notebook.
We found this approach really interesting, but also hard to understand intuitively. While trying to follow how the authors derive a forecast from their assumptions, we wrote a breakdown that may be useful to others thinking about AI timelines and forecasting.
In what follows, we set out our interpretation of...
The title is strong with this one. I like it.
I wish there were more discussion posts on LessWrong.
Right now it feels like publishing a discussion post weakly, if not moderately, violates some sort of cultural norm (similarly, though to a lesser extent, on Shortform). Something low-effort of the form "X is a topic I'd like to discuss. A, B and C are a few initial thoughts I have about it. Thoughts?"
It seems to me like something we should encourage though. Here's how I'm thinking about it. Such "discussion posts" currently happen informally in social circles. Maybe you'll text a friend. Maybe you'll brin...
I'm excited to share a project I've been working on that I think many in the LessWrong community will appreciate: converting some rational fiction into high-quality audiobooks using cutting-edge AI voice technology from ElevenLabs, under the name "Askwho Casts AI".
The keystone of this project is an audiobook version of Planecrash (AKA Project Lawful), the epic glowfic authored by Eliezer Yudkowsky and Lintamande. Given the scope and scale of this work, with its large cast of characters, I'm using ElevenLabs to give each character their own distinct voice. Converting this story has been a labor of love, and I hope that if anyone has bounced off it before, this...
Thanks! Glad you are enjoying it.
A friend has spent the last three years hounding me about seed oils. Every time I thought I was safe, he’d wait a couple months and renew his attack:
“When are you going to write about seed oils?”
“Did you know that seed oils are why there’s so much {obesity, heart disease, diabetes, inflammation, cancer, dementia}?”
“Why did you write about {meth, the death penalty, consciousness, nukes, ethylene, abortion, AI, aliens, colonoscopies, Tunnel Man, Bourdieu, Assange} when you could have written about seed oils?”
“Isn’t it time to quit your silly navel-gazing and use your weird obsessive personality to make a dent in the world—by writing about seed oils?”
He’d often send screenshots of people reminding each other that Corn Oil is Murder and that it’s critical that we overturn our lives...
I suspect the word 'pre-prepared' is doing a lot of the heavy lifting here--when I see that item on the list I think things like pre-fried chicken, frozen burger patties, veggie pakora, veggies in a sauce for a stir-fry, stuff like that (like you'd find in a ready-made frozen meal). Not like, frozen peas.