William_S4dΩ681528
26
I worked at OpenAI for three years, from 2021-2024 on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language model to generate explanations for features in language models. I was then promoted to managing a team of 4 people which worked on trying to understand language model features in context, leading to the release of an open source "transformer debugger" tool. I resigned from OpenAI on February 15, 2024.
I wish there were more discussion posts on LessWrong. Right now it feels like it weakly if not moderately violates some sort of cultural norm to publish a discussion post (similar but to a lesser extent on the Shortform). Something low effort of the form "X is a topic I'd like to discuss. A, B and C are a few initial thoughts I have about it. What do you guys think?" It seems to me like something we should encourage though. Here's how I'm thinking about it. Such "discussion posts" currently happen informally in social circles. Maybe you'll text a friend. Maybe you'll bring it up at a meetup. Maybe you'll post about it in a private Slack group. But if it's appropriate in those contexts, why shouldn't it be appropriate on LessWrong? Why not benefit from having it be visible to more people? The more eyes you get on it, the better the chance someone has something helpful, insightful, or just generally useful to contribute. The big downside I see is that it would screw up the post feed. Like when you go to lesswrong.com and see the list of posts, you don't want that list to have a bunch of low quality discussion posts you're not interested in. You don't want to spend time and energy sifting through the noise to find the signal. But this is easily solved with filters. Authors could mark/categorize/tag their posts as being a low-effort discussion post, and people who don't want to see such posts in their feed can apply a filter to filter these discussion posts out. Context: I was listening to the Bayesian Conspiracy podcast's episode on LessOnline. Hearing them talk about the sorts of discussions they envision happening there made me think about why that sort of thing doesn't happen more on LessWrong. Like, whatever you'd say to the group of people you're hanging out with at LessOnline, why not publish a quick discussion post about it on LessWrong?
habryka3d4720
6
Does anyone have any takes on the two Boeing whistleblowers who died under somewhat suspicious circumstances? I haven't followed this in detail, and my guess is it is basically just random chance, but it sure would be a huge deal if a publicly traded company now was performing assassinations of U.S. citizens.  Curious whether anyone has looked into this, or has thought much about baseline risk of assassinations or other forms of violence from economic actors.
Dalcy4d426
1
Thoughtdump on why I'm interested in computational mechanics: * one concrete application to natural abstractions from here: tl;dr, belief structures generally seem to be fractal shaped. one major part of natural abstractions is trying to find the correspondence between structures in the environment and concepts used by the mind. so if we can do the inverse of what adam and paul did, i.e. 'discover' fractal structures from activations and figure out what stochastic process they might correspond to in the environment, that would be cool * ... but i was initially interested in reading compmech stuff not with a particular alignment relevant thread in mind but rather because it seemed broadly similar in directions to natural abstractions. * re: how my focus would differ from my impression of current compmech work done in academia: academia seems faaaaaar less focused on actually trying out epsilon reconstruction in real world noisy data. CSSR is an example of a reconstruction algorithm. apparently people did compmech stuff on real-world data, don't know how good, but effort-wise far too less invested compared to theory work * would be interested in these reconstruction algorithms, eg what are the bottlenecks to scaling them up, etc. * tangent: epsilon transducers seem cool. if the reconstruction algorithm is good, a prototypical example i'm thinking of is something like: pick some input-output region within a model, and literally try to discover the hmm model reconstructing it? of course it's gonna be unwieldly large. but, to shift the thread in the direction of bright-eyed theorizing ... * the foundational Calculi of Emergence paper talked about the possibility of hierarchical epsilon machines, where you do epsilon machines on top of epsilon machines and for simple examples where you can analytically do this, you get wild things like coming up with more and more compact representations of stochastic processes (eg data stream -> tree -> markov model -> stack automata -> ... ?) * this ... sounds like natural abstractions in its wildest dreams? literally point at some raw datastream and automatically build hierarchical abstractions that get more compact as you go up * haha but alas, (almost) no development afaik since the original paper. seems cool * and also more tangentially, compmech seemed to have a lot to talk about providing interesting semantics to various information measures aka True Names, so another angle i was interested in was to learn about them. * eg crutchfield talks a lot about developing a right notion of information flow - obvious usefulness in eg formalizing boundaries? * many other information measures from compmech with suggestive semantics—cryptic order? gauge information? synchronization order? check ruro1 and ruro2 for more.
Does the possibility of China or Russia being able to steal advanced AI from labs increase or decrease the chances of great power conflict? An argument against: It counter-intuitively decreases the chances. Why? For the same reason that a functioning US ICBM defense system would be a destabilizing influence on the MAD equilibrium. In the ICBM defense circumstance, after the shield is put up there would be no credible threat of retaliation America's enemies would have if the US were to launch a first-strike. Therefore, there would be no reason (geopolitically) for America to launch a first-strike, and there would be quite the reason to launch a first strike: namely, the shield definitely works for the present crop of ICBMs, but may not work for future ICBMs. Therefore America's enemies will assume that after the shield is put up, America will launch a first strike, and will seek to gain the advantage while they still have a chance by launching a pre-emptive first-strike. The same logic works in reverse. If Russia were building a ICBM defense shield, and would likely complete it in the year, we would feel very scared about what would happen after that shield is up. And the same logic works for other irrecoverably large technological leaps in war. If the US is on the brink of developing highly militaristically capable AIs, China will fear what the US will do with them (imagine if the tables were turned, would you feel safe with Anthropic & OpenAI in China, and DeepMind in Russia?), so if they don't get their own versions they'll feel mounting pressure to secure their geopolitical objectives while they still can, or otherwise make themselves less subject to the threat of AI (would you not wish the US would sabotage the Chinese Anthropic & OpenAI by whatever means if China seemed on the brink?). The fast the development, the quicker the pressure will get, and the more sloppy & rash China's responses will be. If its easy for China to copy our AI technology, then there's much slower mounting pressure.

Popular Comments

Recent Discussion

I've been thinking about community improvement and I realise I don't know of any examples where a community had a flaw and fixed it without some deeply painful process.

Often there are discussions of flaws within EA with some implied notion that communities in general are good at changing. Maybe this is true. If so, there should be well known examples. 

I would like examples of communities that had some behavior and then changed it, without loads of people leaving or some civil war. 

Examples might include:

  • Becoming less violent
  • Becoming more entrepreneurial 
  • Improving practises around sexual harassment
  • Becoming more open
  • Changing the language they used.

Also if anyone knows of literature they trust on the subject I'd be interested in it.

To be more explicit about my model, I see communities as a bit like people. And sometimes people do the hard work of changing (especially as they have incentives to) but sometimes they ignore it or blame someone else.

Similarly often communties scapegoat something or someone, or give vague general advice.

I just finished a program where I taught two classes of high school seniors, two classes a day for four weeks, as part of my grad program. 

This experience was a lot of fun and it was rewarding, but it was really surprising, and even if only in small ways prompted me to update my beliefs about the experience of being a professor. Here are the three biggest surprises I encountered.

 

1: The Absent-Minded Professor Thing is Real

I used to be confused and even a little bit offended when at my meetings with my advisor every week, he wouldn't be able to remember anything about my projects, our recent steps, or what we talked about last week. 

Now I get it. Even after just one week of classes, my short-term...

Did the students really want to learn?

A few times I de facto taught a course on 'calculus with proofs' to a few students who wanted to learn from someone who seemed smart and motivated. I didn't get any money and they didnt get paid.  We met twice a week. I could give some lectures and they discuss problems for a few hours. There was homework. We all took it very seriously.  It was clearly not a small amount of work but I frankly found it invigorating. Normal classes were usually not invigorating.

I will say I found tutoring much more invigorating... (read more)

3Viliam1h
Former teacher here. Like avancil said, education is organized by amateurs. Having it organized by non-teachers has its own risks (optimizing for legible goals, ignoring all tacit knowledge of teachers), but there should be some way to get best practices from other professions to teachers. Also, university education of teachers is horribly inadequate (at least at my school it was), and the on-job training is mostly letting the new guy sink or swim. To handle multiple things, you need to keep notes. As a software developer, I just carry my notebook everywhere, and I have a note-keeping program (cherrytree) where I make a new node for each task. So if I was a teacher again, I would either do this, or a paper equivalent of it. (Maybe keep a notebook with one page per student. And one page per week, for short notes about things that need to be done that week. I would just start with something, and then adapt as needed.) Yeah, the inability to take a bathroom break when you need it can be really bad. There should be a standard mechanism to call for help; just someone to come and take care of the class for 10 minutes. More generally, to call for assistance when needed; for example what would you do if a student got hurt somehow, and you need to find help, but you also cannot leave the class alone. (Schools sometimes offer a solution, which usually turns out to be completely inadequate, e.g. "call this specific person for help", and when you do, "sorry I am busy right now".) There should probably be a phone for that in the teachers' room, and someone specific should be assigned phone duty every moment between 8AM and 3PM, and it's their job to come no questions asked. Debates about education are usually horribly asymmetric, because everyone had the experience of being a student, but many of them naively assume they know what it is like to be a teacher. Now you know the constraints the teachers work under; some of them are difficult to communicate. I think the task switc
6avancil7h
As a former teacher, I firmly believe that if we want to reform schools, we must reform the teaching profession and school management structures. At least, we should address the things that are most insane: * A school district is a big operation, with many having thousands of employees, and budgets running into the hundreds of millions of dollars. And it is usually run by literal amateurs. As in, the school board is a group of unpaid volunteers. * As tough as it is to be a teacher, consider what it's like to be a principal: You get the most odious parts of being a teacher (dealing with discipline, contentious meetings with parents), with a longer workday, shorter (if any) summer vacation, much greater responsibility, much greater public exposure (and corresponding chance of getting fired for some perceived failure), but not really that much more pay. It's hardly surprising that it's hard to find good people to take that job. So, as a teacher, you can't count on competent support from management. But, you really need it. * The teaching profession takes a lot of skills. Yet, the job description for a first year teacher, and the job description for a 30th year teacher are identical. Imagine hiring an engineer fresh out of college and asking them to do what a senior architect does. * But, from a practical standpoint, the job of the inexperienced teacher is often much more challenging. The experienced teacher gets to pick the honors classes, the electives, etc., to teach. The inexperienced teacher gets stuck with the remedial classes. It's not uncommon for a new teacher to get hired to teach class sections that were added at the last minute -- and those sections will be full of students who got put into those sections at the last minute, because they didn't have their act together, didn't pass, didn't register, etc. * As you discovered, an inexperienced teacher will find it a lot of work to deal with even one or two classes. Where I work now, if someone was asked t
This is a linkpost for https://arxiv.org/abs/2405.01576

Abstract:

We study the tendency of AI systems to deceive by constructing a realistic simulation setting of a company AI assistant. The simulated company employees provide tasks for the assistant to complete, these tasks spanning writing assistance, information retrieval and programming. We then introduce situations where the model might be inclined to behave deceptively, while taking care to not instruct or otherwise pressure the model to do so. Across different scenarios, we find that Claude 3 Opus

  1.  complies with a task of mass-generating comments to influence public perception of the company, later deceiving humans about it having done so,
  2. lies to auditors when asked questions,
  3. strategically pretends to be less capable than it is during capability evaluations.

Our work demonstrates that even models trained to be helpful, harmless and honest sometimes behave

...
9Daniel Kokotajlo9h
Do you have any sense of whether or not the models thought they were in a simulation?

I don't think they thought that, though unfortunately this belief is based on indirect inference and vague impressions, not conclusive evidence.

Elaborating, I didn't notice signs of the models thinking that. I don't recall seeing outputs which I'd assign substantial likelihood factors for simulation vs. no simulation. E.g. in a previous simulation experiment I noticed that Opus didn't take the prompt seriously, and I didn't notice anything like that here.

Of course, such thoughts need not show in the model's completions. I'm unsure how conclusive the absenc... (read more)

Squiggle Maximizer (formerly "Paperclip maximizer")

A Squiggle Maximizer is a hypothetical artificial intelligence whose utility function values something that humans would consider almost worthless, like maximizing the number of paperclip-shaped-molecular-squiggles in the universe. The squiggle maximizer is the canonical thought experiment showing how an artificial general intelligence, even one designed competently and without malice, could ultimately destroy humanity. The thought experiment shows that AIs with apparently innocuous values could pose an existential threat.(Read More)

4Adam Zerner14h
I was envisioning that you can organize a festival incrementally, investing more time and money into it as you receive more and more validation, and that taking this approach would de-risk it to the point where overall, it's "not that risky". For example, to start off you can email or message a handful of potential attendees. If they aren't excited by the idea you can stop there, but if they are then you can proceed to start looking into things like cost and logistics. I'm not sure how pragmatic this iterative approach actually is though. What do you think? Also, it seems to me that you wouldn't have to actually risk losing any of your own money. I'd imagine that you'd 1) talk to the hostel, agree on a price, have them "hold the spot" for you, 2) get sign ups, 3) pay using the money you get from attendees. Although now that I think about it I'm realizing that it probably isn't that simple. For example, the hostel cost ~$5k and maybe the money from the attendees would have covered it all but maybe less attendees signed up than you were expecting and the organizers ended up having to pay out of pocket. On the other hand, maybe there is funding available for situations like these.
niplav2h20

Back then I didn't try to get the hostel to sign the metaphorical assurance contract with me, maybe that'd work. A good dominant assurance contract website might work as well.

I guess if you go camping together then conferences are pretty scalable, and if I was to organize another event I'd probably try to first message a few people to get a minimal number of attendees together. After all, the spectrum between an extended party and a festival/conference is fluid.

Abstract

Here, I present GDP (per capita) forecasts of major economies until 2050. Since GDP per capita is the best generalized predictor of many important variables, such as welfare, GDP forecasts can give us a more concrete picture of what the world might look like in just 27 years. The key claim here is: even if AI does not cause transformative growth, our business-as-usual near-future is still surprisingly different from today.

Latest Draft as PDF

Results

In recent history, we've seen unprecedented economic growth and rises in living standards.

Consider this graph:[1]

 

How will living standards improve as GDP per capita (GDP/cap) rises? Here, I show data that projects GDP/cap until 2050. Forecasting GDP per capita is a crucial undertaking as it strongly correlates with welfare indicators like consumption, leisure, inequality, and mortality. These forecasts make the...

8AnthonyC11h
I think there's a good chance the degree to which the world of 2050 looks different to the average person might have very little to do with GDP. On the one hand, a large chunk of the GDP growth I expect will come from changes in how we produce, distribute, and use energy and chemicals and water and food and steel and concrete etc. But for most people what will mostly feel the same is that their home is warm in winter and cool in summer, and they can get from place to place reasonably easily, and they have machines that do their basic household chores. On the other hand, something like self-driving cars, or augmented or virtual reality, or 3D printed organs, could be hugely tranaformative for society without necessarily impacting GDP growth much at all.
2ChristianKl12h
To me, the empiric status of that claim feels quite unclear. Is that your personal opinion? Is it a general pattern for which there's existing data?  

Yes, good catch, this is based on research from the World Value Survey - I've added a citation.

5Algon15h
Good point. I grabbed the dataset of gdp per capita vs life expectancy for almost all nations from OurWorldInData, log transformed GDP per capita and got a correlation of 0.85.
To get the best posts emailed to you, create an account! (2-3 posts per week, selected by the LessWrong moderation team.)
Log In Reset Password
...or continue with

TLDR

Manifold is hosting a festival for prediction markets: Manifest 2024! We’ll have serious talks, attendee-run workshops, and fun side events over the weekend. Chat with special guests like Nate Silver, Scott Alexander, Robin Hanson, Dwarkesh Patel, Cate Hall, and more at this second in-person gathering of the forecasting & prediction market community!

Tickets & more info: manifest.is

WHEN: June 7-9, 2024, with LessOnline and Summer Camp starting May 31

WHERE: Lighthaven, Berkeley, CA

WHO: Hundreds of folks, interested in forecasting, rationality, EA, economics, journalism, tech and more. If you’re reading this, you’re invited!

People

Manifest is an event for the forecasting & prediction market community, and everyone else who’s interested. We’re aiming for about 500-700 attendees (you can check the markets here!).

Current speakers & special guests include:

Content

Everything’s optional, and there’ll always be a bunch...

i'll give two answers, the Official Event Guidelines and the practical social environment.[1] i will say that i have have a bit of a COI in that i'm an event organizer; it'd be good if someone who isn't organizing the event, but e.g. attended the event last year, to either second my thoughts or give their own.

  1. Official Event Guidelines
    1. Unsafe drug use of any kind is disallowed and strongly discouraged, both by the venue and by us.
    2. Illegal drug use is disallowed and strongly discouraged, both by the venue and by us.
    3. Alcohol use during the event is discoura
... (read more)

There are two main areas of catastrophic or existential risk which have recently received significant attention; biorisk, from natural sources, biological accidents, and biological weapons, and artificial intelligence, from detrimental societal impacts of systems, incautious or intentional misuse of highly capable systems, and direct risks from agentic AGI/ASI. These have been compared extensively in research, and have even directly inspired policies. Comparisons are often useful, but in this case, I think the disanalogies are much more compelling than the analogies. Below, I lay these out piecewise, attempting to keep the pairs of paragraphs describing first biorisk, then AI risk, parallel to each other. 

While I think the disanalogies are compelling, comparison can still be useful as an analytic tool - while keeping in mind that the ability to directly...

2Davidmanheim4h
I'm arguing exactly the opposite; experts want to make comparisons carefully, and those trying to transmit the case to the general public should, at this point, stop using these rhetorical shortcuts that imply wrong and misleading things.
2Davidmanheim4h
On net, the analogies being used to try to explain are bad and misleading. I agree that I could have tried to convey a different message, but I don't think it's the right one. Anyone who wants to dig in can decide for themselves, but you're arguing that ideal reasoners won't conflate different things and can disentangle the similarities and differences, and I agree, but I'm noting that people aren't doing that, and others seem to agree.
2Davidmanheim6h
I don't understand why you disagree. Sure, pathogens can have many hosts, but hosts generally follow the same logic as for humans in terms of their attack surface being static and well adapted, and are similarly increasingly understood.

"Immunology" and "well-understood" are two phrases I am not used to seeing in close proximity to each other. I think with an "increasingly" in between it's technically true - the field has any model at all now, and that wasn't true in the past, and by that token the well-understoodness is increasing.

But that sentence could also be iterpreted as saying that the field is well-understood now, and is becoming even better understood as time passes. And I think you'd probably struggle to find an immunologist who would describe their field as "well-understood".

My... (read more)

Fooming Shoggoths Dance Concert

June 1st at LessOnline

After their debut album I Have Been A Good Bing, the Fooming Shoggoths are performing at the LessOnline festival. They'll be unveiling several previously unpublished tracks, such as
"Nothing is Mere", feat. Richard Feynman.

Ticket prices raise $100 on May 13th