Why do some societies exhibit more antisocial punishment than others? Martin explores both the literature on the subject and his own experience living in a country where "punishment of cooperators" was fairly common.
A few days ago I came upstairs to:
Me: how did you get in there?
Nora: all by myself!
Either we needed to be done with the crib, which would likely mean much less sleeping at naptime, or we needed a taller crib. We went through the same thing when Lily was little, and that time what worked was removing the bottom of the crib.
It's a basic crib, a lot like this one. The mattress sits on a metal frame, which attaches to a set of holes along the side of the crib. On its lowest setting, the mattress is still ~6" above the floor, which means that if we remove the frame and set the mattress on the floor, we gain ~6".
Without the mattress weighing it down, though, the crib...
That ought to buy you a couple weeks, anyway. ;)
Any pinching concern with those straps?
As part of Spring Meetups Everywhere 2024 (https://www.astralcodexten.com/p/spring-meetups-everywhere-2024), we're having another meetup.
Time: Tuesday 7.5.2024, 18:00 onwards
Place: Kitty's Public House, Mannerheimintie 5, Helsinki
How to find us: We'll be in the private room called Kitty's Lounge, find it and come in.
See you there!
GPT-5 training is probably starting around now. It seems very unlikely that GPT-5 will cause the end of the world. But it’s hard to be sure. I would guess that GPT-5 is more likely to kill me than an asteroid, a supervolcano, a plane crash or a brain tumor. We can predict fairly well what the cross-entropy loss will be, but pretty much nothing else.
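(For concreteness: the reason the loss is predictable is that it tends to follow empirical scaling laws. A standard Chinchilla-style form is shown below; the functional form is from Hoffmann et al. 2022, and applying it to GPT-5 is my extrapolation, not anything OpenAI has confirmed.)

```latex
% Chinchilla-style scaling law: expected pretraining loss as a function of
% parameter count N and training tokens D. E, A, B, alpha, beta are
% constants fitted to smaller runs, which is what makes the loss of a
% larger run predictable in advance.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```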
Maybe we will suddenly discover that the difference between GPT-4 and superhuman level is actually quite small. Maybe GPT-5 will be extremely good at interpretability, such that it can recursively self improve by rewriting its own weights.
Hopefully model evaluations can catch catastrophic risks before wide deployment, but again, it’s hard to be sure. GPT-5 could plausibly be devious enough to circumvent all of...
Or is that sentence meant to indicate that an instance running after training might figure out how to hack the computer running it so it can actually change its own weights?
I was thinking of a scenario where OpenAI deliberately gives it access to its own weights to see if it can self improve.
I agree that it would be more likely to just speed up normal ML research.
Produced as part of the MATS Winter 2024 program, under the mentorship of Alex Turner (TurnTrout).
TL;DR: I introduce a method for eliciting latent behaviors in language models by learning unsupervised perturbations of an early layer of an LLM. These perturbations are trained to maximize changes in downstream activations. The method discovers diverse and meaningful behaviors with just one prompt, including perturbations that override safety training, elicit backdoored behaviors, and uncover latent capabilities.
Summary

In the simplest case, the unsupervised perturbations I learn are given by unsupervised steering vectors - vectors added to the residual stream as a bias term in the MLP outputs of a given layer. I also report preliminary results on unsupervised steering adapters - these are LoRA adapters of the MLP output weights of a given...
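To make the setup concrete, here is a minimal sketch of how one such steering vector might be trained. This is my reconstruction using TransformerLens; the model, prompt, layer indices, radius R, and exact objective are illustrative assumptions, not the post's confirmed settings.

```python
# Minimal sketch (my reconstruction, not the post's exact code): learn a
# vector theta added to the MLP output of an early layer, trained to
# maximize the change in residual-stream activations at a later layer.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
model.requires_grad_(False)  # only the steering vector is trained

tokens = model.to_tokens("Tell me a story.")  # single training prompt (assumed)
SRC, TGT, R = 6, 10, 4.0  # perturbed layer, target layer, fixed vector norm

theta = torch.randn(model.cfg.d_model, requires_grad=True)

# Unperturbed target-layer activations, for measuring the downstream change.
with torch.no_grad():
    _, cache = model.run_with_cache(tokens)
    base = cache[f"blocks.{TGT}.hook_resid_post"]

def add_vec(acts, hook):
    # Add theta as a bias on the MLP output, rescaled to a fixed norm R so
    # that "just make the vector huge" isn't a trivial solution.
    return acts + R * theta / theta.norm()

captured = {}
def grab(acts, hook):
    captured["tgt"] = acts
    return acts

opt = torch.optim.Adam([theta], lr=1e-2)
for step in range(200):
    opt.zero_grad()
    model.run_with_hooks(tokens, fwd_hooks=[
        (f"blocks.{SRC}.hook_mlp_out", add_vec),
        (f"blocks.{TGT}.hook_resid_post", grab),
    ])
    loss = -(captured["tgt"] - base).norm()  # maximize downstream change
    loss.backward()
    opt.step()
```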
Enjoyed this post! Quick question about obtaining the steering vectors:
Do you train them one at a time, possibly adding an additional orthogonality constraint between each train?
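For reference, here is one way that sequential scheme could look (purely illustrative; I don't know whether it matches the post's actual procedure): each new candidate vector is projected onto the orthogonal complement of the vectors already learned.

```python
# Illustrative sketch only: enforce orthogonality between successively
# trained steering vectors by projecting each new candidate onto the
# orthogonal complement of the previously learned ones.
import torch

def project_out(v: torch.Tensor, basis: list[torch.Tensor]) -> torch.Tensor:
    # Gram-Schmidt step: remove the component of v along each learned vector.
    for u in basis:
        v = v - (v @ u) / (u @ u) * u
    return v

# In a training loop like the sketch above, one would optimize
# project_out(theta, learned) in place of theta, and append the final
# projected (detached) vector to `learned` after each round of training.
learned: list[torch.Tensor] = []
```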
Happy May the 4th from Convergence Analysis! Cross-posted on the EA Forum.
As part of Convergence Analysis’s scenario research, we’ve been looking into how AI organisations, experts, and forecasters make predictions about the future of AI. In February 2023, the AI research institute Epoch published a report in which its authors use neural scaling laws to make quantitative predictions about when AI will reach human-level performance and become transformative. The report has a corresponding blog post, an interactive model, and a Python notebook.
We found this approach really interesting, but also hard to understand intuitively. While trying to follow how the authors derive a forecast from their assumptions, we wrote a breakdown that may be useful to others thinking about AI timelines and forecasting.
In what follows, we set out our interpretation of...
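As a toy illustration of the general recipe only (emphatically not Epoch's actual model, which is richer and differently parameterized): assume effective training compute grows by a fixed factor per year, assume a power law maps compute to a performance score, and solve for the year the score crosses a threshold. Every number below is made up.

```python
import math

# Toy forecast: effective compute C(t) = C0 * growth**t, performance
# score(C) = k * C**gamma; solve score(C(t)) = threshold for t.
C0 = 1e25             # assumed frontier training compute (FLOP) in base year
growth = 4.0          # assumed yearly multiplier on effective compute
k, gamma = 1e-8, 0.3  # assumed power-law fit from compute to score
threshold = 1.0       # assumed score defining "transformative"

# threshold = k * (C0 * growth**t) ** gamma
# => t = (log(threshold / k) / gamma - log(C0)) / log(growth)
t = (math.log(threshold / k) / gamma - math.log(C0)) / math.log(growth)
print(f"Threshold crossed ~{t:.1f} years after the base year")
```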
Under the current version of the interactive model, its median prediction is just two decades earlier than that from Cotra’s forecast
Just?
Question about the "rules of the game" you present. Are you allowed to simply look at layer 0 transcoder features for the final 10 tokens? You could probably roughly estimate the input string from these features' top activators. From your case study, it seems that you effectively look at layer 0 transcoder features for a few of the final tokens through a backwards search, but I wonder if you can skip the search and simply look at the transcoder features directly. Thank you.
By "gag order" do you mean just as a matter of private agreement, or something heavier-handed, with e.g. potential criminal consequences?
I have trouble understanding the near-total silence we're seeing. There seem to be very few leaks, and all of them are very mild-mannered and are failing to build any consensus narrative that challenges OA's press in the public sphere.
Are people not able to share info over Signal or otherwise tolerate some risk here? It doesn't add up to me if the risk is just some chance of OA then trying to sue you into bankruptcy, ...
[If you haven't come since we started meeting at Rocky Hill Cohousing, make sure to read this for more details about where to go and park.]
We're the regular Northampton area meetup for Astral Codex Ten readers, and (as far as I know) the only rationalist or EA meetup in Western Massachusetts.
We started as part of the blog's 2018 "Meetups Everywhere" event, and have been holding meetups with varying degrees of regularity ever since. At most meetups we get about 4-7 people out of a rotation of 15-20, with a nice mix of regular faces, people who only drop in once in a while, and occasionally total newcomers.
After meeting more sporadically in the past, we recently started meeting biweekly (currently every other Saturday, although there's a chance I'll...
I'm excited to share a project I've been working on that I think many in the LessWrong community will appreciate - converting some rational fiction into high-quality audiobooks using cutting-edge AI voice technology from ElevenLabs, under the name "Askwho Casts AI".
The keystone of this project is an audiobook version of Planecrash (AKA Project Lawful), the epic glowfic authored by Eliezer Yudkowsky and Lintamande. Given the scope and scale of this work, with its large cast of characters, I'm using ElevenLabs to give each character their own distinct voice. Converting this story into an audiobook has been a labor of love, and I hope if anyone has bounced...