William_S4dΩ681558
27
I worked at OpenAI for three years, from 2021 to 2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to managing a team of 4 people which worked on trying to understand language model features in context, leading to the release of an open source "transformer debugger" tool. I resigned from OpenAI on February 15, 2024.
Pretending not to see when a rule you've set is being violated can be optimal policy in parenting sometimes (and I bet it generalizes). Example: suppose you have a toddler and a "rule" that food only stays in the kitchen. The motivation is that each time food is brought into the living room there is a small chance of an accident resulting in a permanent stain. There's a cost to enforcing the rule as the toddler will put up a fight. Suppose that one night you feel really tired and the cost feels particularly high. If you enforce the rule, it will be much more painful than it's worth in that moment (meaning, fully discounting future consequences). If you fail to enforce the rule, you undermine your authority which results in your toddler fighting future enforcement (of this and possibly all other rules!) much harder, as he realizes that the rule is in fact negotiable / flexible. However, you have a third choice, which is to credibly pretend to not see that he's doing it. It's true that this will undermine your perceived competence, as an authority, somewhat. However, it does not undermine the perception that the rule is to be fully enforced if only you noticed the violation. You get to "skip" a particularly costly enforcement, without taking steps back that compromise future enforcement much. I bet this happens sometimes in classrooms (re: disruptive students) and prisons (re: troublesome prisoners) and regulation (re: companies that operate in legally aggressive ways). Of course, this stops working and becomes a farce once the pretense is clearly visible. Once your toddler knows that sometimes you pretend not to see things to avoid a fight, the benefit totally goes away. So it must be used judiciously and artfully.
I wish there were more discussion posts on LessWrong. Right now it feels like it weakly if not moderately violates some sort of cultural norm to publish a discussion post (similar but to a lesser extent on the Shortform). Something low effort of the form "X is a topic I'd like to discuss. A, B and C are a few initial thoughts I have about it. What do you guys think?" It seems to me like something we should encourage though. Here's how I'm thinking about it. Such "discussion posts" currently happen informally in social circles. Maybe you'll text a friend. Maybe you'll bring it up at a meetup. Maybe you'll post about it in a private Slack group. But if it's appropriate in those contexts, why shouldn't it be appropriate on LessWrong? Why not benefit from having it be visible to more people? The more eyes you get on it, the better the chance someone has something helpful, insightful, or just generally useful to contribute. The big downside I see is that it would screw up the post feed. Like when you go to lesswrong.com and see the list of posts, you don't want that list to have a bunch of low quality discussion posts you're not interested in. You don't want to spend time and energy sifting through the noise to find the signal. But this is easily solved with filters. Authors could mark/categorize/tag their posts as being a low-effort discussion post, and people who don't want to see such posts in their feed can apply a filter to filter these discussion posts out. Context: I was listening to the Bayesian Conspiracy podcast's episode on LessOnline. Hearing them talk about the sorts of discussions they envision happening there made me think about why that sort of thing doesn't happen more on LessWrong. Like, whatever you'd say to the group of people you're hanging out with at LessOnline, why not publish a quick discussion post about it on LessWrong?
habryka4d4922
7
Does anyone have any takes on the two Boeing whistleblowers who died under somewhat suspicious circumstances? I haven't followed this in detail, and my guess is it is basically just random chance, but it sure would be a huge deal if a publicly traded company were now performing assassinations of U.S. citizens. Curious whether anyone has looked into this, or has thought much about baseline risk of assassinations or other forms of violence from economic actors.
Dalcy4d447
1
Thoughtdump on why I'm interested in computational mechanics:
* one concrete application to natural abstractions from here: tl;dr, belief structures generally seem to be fractal shaped. one major part of natural abstractions is trying to find the correspondence between structures in the environment and concepts used by the mind. so if we can do the inverse of what adam and paul did, i.e. 'discover' fractal structures from activations and figure out what stochastic process they might correspond to in the environment, that would be cool
* ... but i was initially interested in reading compmech stuff not with a particular alignment relevant thread in mind but rather because it seemed broadly similar in directions to natural abstractions.
* re: how my focus would differ from my impression of current compmech work done in academia: academia seems faaaaaar less focused on actually trying out epsilon reconstruction in real world noisy data. CSSR is an example of a reconstruction algorithm. apparently people did compmech stuff on real-world data, don't know how good, but effort-wise far too little invested compared to theory work
* would be interested in these reconstruction algorithms, eg what are the bottlenecks to scaling them up, etc.
* tangent: epsilon transducers seem cool. if the reconstruction algorithm is good, a prototypical example i'm thinking of is something like: pick some input-output region within a model, and literally try to discover the hmm model reconstructing it? of course it's gonna be unwieldily large. but, to shift the thread in the direction of bright-eyed theorizing ...
* the foundational Calculi of Emergence paper talked about the possibility of hierarchical epsilon machines, where you do epsilon machines on top of epsilon machines and for simple examples where you can analytically do this, you get wild things like coming up with more and more compact representations of stochastic processes (eg data stream -> tree -> markov model -> stack automata -> ... ?)
* this ... sounds like natural abstractions in its wildest dreams? literally point at some raw datastream and automatically build hierarchical abstractions that get more compact as you go up
* haha but alas, (almost) no development afaik since the original paper. seems cool
* and also more tangentially, compmech seemed to have a lot to talk about providing interesting semantics to various information measures aka True Names, so another angle i was interested in was to learn about them.
* eg crutchfield talks a lot about developing a right notion of information flow - obvious usefulness in eg formalizing boundaries?
* many other information measures from compmech with suggestive semantics: cryptic order? gauge information? synchronization order? check ruro1 and ruro2 for more.

Popular Comments

Recent Discussion

TLDR:

  1. Around Einstein-level, relatively small changes in intelligence can lead to large changes in what one is capable of accomplishing.
    1. E.g. Einstein was a bit better than the other best physicists at seeing deep connections and reasoning, but was able to accomplish much more in terms of impressive scientific output.
  2. There are architectures where small changes can have significant effects on intelligence.
    1. E.g. small changes in human-brain-hyperparameters: Einstein’s brain didn’t need to be trained on 3x the compute of normal physics professors for him to become much better at forming deep understanding, even without intelligence improving intelligence.

Einstein and the heavy tail of human intelligence

1905 is often described as the "annus mirabilis" of Albert Einstein. He founded quantum physics by postulating the existence of (light) quanta, explained Brownian motion, introduced the special relativity theory and...

1Towards_Keeperhood6h
I think research on what you propose should definitely not be public and I'd recommend against publicly trying to push this alignment agenda.
3Towards_Keeperhood6h
(I think) Planck found the formula that matched the empirically observed distribution, but had no explanation for why it should hold. Einstein found the justification for this formula.
1RussellThor13h
OK, but if that were true then there would have been many more Einstein-like breakthroughs since then. More likely is that such low-hanging fruit has been plucked and a similar intellect is well into diminishing returns. That is, given our current technological society and >50 year history of smart people trying to work on everything, if there are such breakthroughs to be made, then the IQ required is now higher than in Einstein's day.

I think you are misjudging the mental attributes that are conducive to scientific breakthroughs. 

My (not very well informed) understanding is that Einstein was not especially brilliant in terms of raw brainpower (better at math and such than the average person, of course, but not much better than the average physicist). His advantage was instead being able to envision theories that did not occur to other people. What might be described as high creativity rather than high intelligence.

Other attributes conducive to breakthroughs are a willingness to wor...

If you’ve ever been to Amsterdam, you’ve probably visited, or at least heard about the famous cookie store that sells only one cookie. I mean, not a piece, but a single flavor.

I’m talking about Van Stapele Koekmakerij of course, where you can get one of the world's most delicious chocolate chip cookies. Unless you arrive at opening hour, you're likely to find a long queue extending from the store’s doorstep down the street. When I visited the city a few years ago, I watched the sensation myself: a nervous crowd waited as the rumor of ‘out of stock’ cookies spread along the line.

Van Stapele Koekmakerij - Cookie Shop in Amsterdam
Owner Vera Van Stapele with fresh-baked cookies, via store website

The store, despite becoming a landmark for tourists, stands for an idea that seems to...

Dagon14m20

I think this depends a whole lot on the domain/product, the scalability vs locality question (cookies get worse if more are made in the same place and then distributed, most software doesn't), and the network effect (software that depends on many people using the same thing).

I love my Kindle (and have since the ugly angular V1). It would be very hard to argue that Amazon is particularly good at "one thing well", though. Almost none of its other products are that focused, and it's only because serious readers spend so much on books that they've...

I just finished a program where I taught two classes of high school seniors, two classes a day for four weeks, as part of my grad program. 

This experience was a lot of fun and it was rewarding, but it was really surprising, and even if only in small ways prompted me to update my beliefs about the experience of being a professor. Here are the three biggest surprises I encountered.

 

1: The Absent-Minded Professor Thing is Real

I used to be confused and even a little bit offended when at my meetings with my advisor every week, he wouldn't be able to remember anything about my projects, our recent steps, or what we talked about last week. 

Now I get it. Even after just one week of classes, my short-term...

What are your goals when you teach?

What gives you pleasure when teaching?

2sapphire10h
Did the students really want to learn? A few times I de facto taught a course on 'calculus with proofs' to a few students who wanted to learn from someone who seemed smart and motivated. I didn't get any money and they didn't get paid. We met twice a week. I could give some lectures and they discuss problems for a few hours. There was homework. We all took it very seriously. It was clearly not a small amount of work but I frankly found it invigorating. Normal classes were usually not invigorating. I will say I found tutoring much more invigorating than teaching courses. When a student comes to you for tutoring they tend to REALLY wanna learn the material fast. Often there is a test coming up. They pay attention. If you are good they are grateful for your attention. And you feel grateful to them too! It's wonderful to see someone learn fast. Many students will be genuinely sincerely thankful to their good tutors. It makes sense: the tutors were trying to help them learn as efficiently as possible! I think teaching is soul crushing because neither the student nor the teacher is properly motivated. Teachers are not providing their students with optimal learning environments, they aren't even trying (whereas a good tutor is actually trying. Even at a commercial tutoring center). And the students aren't trying all that hard to learn. This is my usual conclusion but the issue is not lack of skill in most cases. It's lack of 'actually trying' and lack of 'minimal attempt at good practices'. Professors/teachers just have to teach whoever shows up and needs the class. This is true in tutoring centers too, to some degree. But in tutoring there is much more expectation students will find a tutor they vibe with. Education could be fun and invigorating for both sides if both sides came into the experience with a sincere attitude to try. But unless goals are aligned where could such an attitude come from? And goals are often not aligned. I will say as a professor I also wa
3Viliam10h
Former teacher here. Like avancil said, education is organized by amateurs. Having it organized by non-teachers has its own risks (optimizing for legible goals, ignoring all tacit knowledge of teachers), but there should be some way to get best practices from other professions to teachers. Also, university education of teachers is horribly inadequate (at least at my school it was), and the on-job training is mostly letting the new guy sink or swim. To handle multiple things, you need to keep notes. As a software developer, I just carry my notebook everywhere, and I have a note-keeping program (cherrytree) where I make a new node for each task. So if I was a teacher again, I would either do this, or a paper equivalent of it. (Maybe keep a notebook with one page per student. And one page per week, for short notes about things that need to be done that week. I would just start with something, and then adapt as needed.) Yeah, the inability to take a bathroom break when you need it can be really bad. There should be a standard mechanism to call for help; just someone to come and take care of the class for 10 minutes. More generally, to call for assistance when needed; for example what would you do if a student got hurt somehow, and you need to find help, but you also cannot leave the class alone. (Schools sometimes offer a solution, which usually turns out to be completely inadequate, e.g. "call this specific person for help", and when you do, "sorry I am busy right now".) There should probably be a phone for that in the teachers' room, and someone specific should be assigned phone duty every moment between 8AM and 3PM, and it's their job to come no questions asked. Debates about education are usually horribly asymmetric, because everyone had the experience of being a student, but many of them naively assume they know what it is like to be a teacher. Now you know the constraints the teachers work under; some of them are difficult to communicate. I think the task switc
9avancil16h
As a former teacher, I firmly believe that if we want to reform schools, we must reform the teaching profession and school management structures. At least, we should address the things that are most insane: * A school district is a big operation, with many having thousands of employees, and budgets running into the hundreds of millions of dollars. And it is usually run by literal amateurs. As in, the school board is a group of unpaid volunteers. * As tough as it is to be a teacher, consider what it's like to be a principal: You get the most odious parts of being a teacher (dealing with discipline, contentious meetings with parents), with a longer workday, shorter (if any) summer vacation, much greater responsibility, much greater public exposure (and corresponding chance of getting fired for some perceived failure), but not really that much more pay. It's hardly surprising that it's hard to find good people to take that job. So, as a teacher, you can't count on competent support from management. But, you really need it. * The teaching profession takes a lot of skills. Yet, the job description for a first year teacher, and the job description for a 30th year teacher are identical. Imagine hiring an engineer fresh out of college and asking them to do what a senior architect does. * But, from a practical standpoint, the job of the inexperienced teacher is often much more challenging. The experienced teacher gets to pick the honors classes, the electives, etc., to teach. The inexperienced teacher gets stuck with the remedial classes. It's not uncommon for a new teacher to get hired to teach class sections that were added at the last minute -- and those sections will be full of students who got put into those sections at the last minute, because they didn't have their act together, didn't pass, didn't register, etc. * As you discovered, an inexperienced teacher will find it a lot of work to deal with even one or two classes. Where I work now, if someone was asked t

Previously: On the Proposed California SB 1047.

Text of the bill is here. It focuses on safety requirements for highly capable AI models.

This is written as an FAQ, tackling all questions or points I saw raised.

Safe & Secure AI Innovation Act also has a description page.

Why Are We Here Again?

There have been many highly vocal and forceful objections to SB 1047 this week, in reaction to a (disputed and seemingly incorrect) claim that the bill has been ‘fast tracked.’ 

The bill continues to have a substantial chance of becoming law according to Manifold, where the market has not moved on recent events. The bill has been referred to two policy committees, one of which put out this 38-page analysis.

The purpose of this post is to gather and analyze all...

Jiro32m20

"This very clearly does not" apply to X and "I have an argument that it doesn't apply to X" are not the same thing.

(And it wouldn't be hard for a court to make some excuse like "these specific harms have to be $500m, and other harms 'of similar severity' means either worse things with less than $500m damage or less bad things with more than $500m damage". That would explain the need to detail specific harms while putting no practical restriction on what the law covers, since the court can claim that anything is a worse harm.

Always assume that laws of this type are interpreted by an autistic, malicious, genie.)

2lc6h
Robin Hanson has apparently asked the same thing. It seems like such a bizarre question to me:
* Most people do not have the constitution or agency for criminal murder
* Most companies do not have secrets large enough that assassinations would reduce the size of their problems on expectation
* Most people who work at large companies don't really give a shit if that company gets fined or into legal trouble, and so they don't have the motivation to personally risk anything organizing murders to prevent lawsuits
2Ben Pace1h
I think my model of people is that people are very much changed by the affordances that society gives them and the pressures they are under. In contrast with this statement, a lot of hunter-gatherer people had to be able to fight to the death, so I don't buy that it's entirely about the human constitution. I think if it was a known thing that you could hire an assassin on an employee and unless you messed up and left quite explicit evidence connecting you, you'd get away with it, then there'd be enough pressures to cause people in-extremis to do it a few times per year even in just high-stakes business settings. Also my impression is that business or political assassinations exist to this day in many countries; a little searching suggests Russia, Mexico, Venezuela, possibly Nigeria, and more. I generally put a lot more importance on tracking which norms are actually being endorsed and enforced by the group / society as opposed to primarily counting on individual ethical reasoning or individual ethical consciences. (TBC I also am not currently buying that this is an assassination in the US, but I didn't find this reasoning compelling.)
lc40m20

Also my impression is that business or political assassinations exist to this day in many countries; a little searching suggests Russia, Mexico, Venezuela, possibly Nigeria, and more.

Oh definitely. In Mexico in particular business pairs up with organized crime all of the time to strong-arm competitors. But this happens when there's an "organized crime" tycoons can cheaply (in terms of risk) pair up with. Also, OP asked about why companies don't assassinate whistleblowers all the time specifically.

a lot of hunter-gatherer people had to be able to fight

...

YouTube link

What’s going on with deep learning? What sorts of models get learned, and what are the learning dynamics? Singular learning theory is a theory of Bayesian statistics broad enough in scope to encompass deep neural networks that may help answer these questions. In this episode, I speak with Daniel Murfet about this research program and what it tells us.

Topics we discuss:

...
3Seth Herd14h
Please just wait until you have the podcast link to post these to LW? We probably don't want to read it if you went to the trouble of making a podcast. This is now available as a podcast if you search. I don't have the RSS feed link handy.
2DanielFilan12h
Sorry - YouTube's taking an abnormally long time to process the video.

Update: there's now a YouTube link

2DanielFilan12h
I've added a link to listen on Apple Podcasts.

Some people have suggested that a lot of the danger of training a powerful AI comes from reinforcement learning. Given an objective, RL will reinforce any method of achieving the objective that the model tries and finds to be successful including things like deceiving us or increasing its power.

If this were the case, then if we want to build a model with capability level X, it might make sense to try to train that model either without RL or with as little RL as possible. For example, we could attempt to achieve the objective using imitation learning instead. 

However, if, for example, the alternate was imitation learning, it would be possible to push back and argue that this is still a black-box that uses gradient descent so we...

2Chris_Leong14h
You mention that society may do too little of the safer types of RL. Can you clarify what you mean by this?

In brief: large amounts of high quality process based RL might result in AI being more useful earlier (prior to them becoming much smarter). This might be expensive and annoying (e.g. it might require huge amounts of high quality human labor) such that by default labs do less of this relative to just scaling up models than would be optimal from a safety perspective.

5porby15h
Calling MuZero RL makes sense. The scare quotes are not meant to imply that it's not "real" RL, but rather that the category of RL is broad enough that it belonging to it does not constrain expectation much in the relevant way. The thing that actually matters is how much the optimizer can roam in ways that are inconsistent with the design intent. For example, MuZero can explore the superhuman play space during training, but it is guided by the structure of the game and how it is modeled. Because of that structure, we can be quite confident that the optimizer isn't going to wander down a path to general superintelligence with strong preferences about paperclips.
4Steven Byrnes7h
Right, and that wouldn’t apply to a model-based RL system that could learn an open-ended model of any aspect of the world and itself, right? I think your “it is nearly impossible for any computationally tractable optimizer to find any implementation for a sparse/distant reward function” should have some caveat that it only clearly applies to currently-known techniques. In the future there could be better automatic-world-model-builders, and/or future generic techniques to do automatic unsupervised reward-shaping for an arbitrary reward, such that AIs could find out-of-the-box ways to solve hard problems without handholding.

A couple years ago, I had a great conversation at a research retreat about the cool things we could do if only we had safe, reliable amnestic drugs - i.e. drugs which would allow us to act more-or-less normally for some time, but not remember it at all later on. And then nothing came of that conversation, because as far as any of us knew such drugs were science fiction.

… so yesterday when I read Eric Neyman’s fun post My hour of memoryless lucidity, I was pretty surprised to learn that what sounded like a pretty ideal amnestic drug was used in routine surgery. A little googling suggested that the drug was probably a benzodiazepine (think valium). Which means it’s not only a great amnestic, it’s also apparently one...

4Algon7h
@habryka this comment has an anomalous amount of karma. It showed up on popular comments, I think, and I'm wondering if people liked the comment when they saw it there, which led to a feedback loop of more eyeballs on the comment, more likes, more eyeballs, etc. If so, is that the intended behaviour of the popular comments feature? It seems like it shouldn't be.
6habryka2h
Yeah, seems like a kinda bad feedback loop. It doesn't seem to usually happen in that the comments I've seen upvoted in that section usually don't get this extremely many upvotes on a comment this short. I don't have a great solution. We could do something that's more clever and algorithmic, which doesn't seem crazy but I am also hesitant to do because it's a lot of work and also I like more straightforward and simple algorithms for transparency reasons.

IDK, I think this comment warrants the level of karma. OP is proposing messing around with a drug class that kills thousands of people per year. Even only counting benzo overdoses that don't involve opioids, it kills ~1500 people per year. Source: https://nida.nih.gov/research-topics/trends-statistics/overdose-death-rates (you can download the data from that page to see precise numbers).

It's not often that a forum comment could save a life!

Linch1h20

I can see some arguments in your direction but would tentatively guess the opposite. 

Fooming Shoggoths Dance Concert

June 1st at LessOnline

After their debut album I Have Been A Good Bing, the Fooming Shoggoths are performing at the LessOnline festival. They'll be unveiling several previously unpublished tracks, such as
"Nothing is Mere", feat. Richard Feynman.

Ticket prices rise $100 on May 13th