Gear-level models are expensive - often prohibitively expensive. Black-box approaches are usually much cheaper and faster. But black-box approaches rarely generalize - they're subject to Goodhart, need to be rebuilt when conditions change, don't identify unknown unknowns, and are hard to build on top of. Gears-level models, on the other hand, offer permanent, generalizable knowledge which can be applied to many problems in the future, even if conditions shift.

Rationality+Rationality+World Modeling+World Modeling+AIAIWorld OptimizationWorld OptimizationPracticalPracticalCommunityCommunity
Personal Blog+
Our current big stupid: not preparing for 40% agreement Epistemic status: lukewarm take from the gut (not brain) that feels rightish The "Big Stupid" of the AI doomers 2013-2023 was AI nerds' solution to the problem "How do we stop people from building dangerous AIs?" was "research how to build AIs".  Methods normal people would consider to stop people from building dangerous AIs, like asking governments to make it illegal to build dangerous AIs, were considered gauche.  When the public turned out to be somewhat receptive to the idea of regulating AIs, doomers were unprepared. Take: The "Big Stupid" of right now is still the same thing.  (We've not corrected enough).  Between now and transformative AGI we are likely to encounter a moment where 40% of people realize AIs really could take over (say if every month another 1% of the population loses their job).  If 40% of the world were as scared of AI loss-of-control as you, what could the world do? I think a lot!  Do we have a plan for then? Almost every LessWrong post on AIs are about analyzing AIs.  Almost none are about how, given widespread public support, people/governments could stop bad AIs from being built. [Example: if 40% of people were as worried about AI as I was, the US would treat GPU manufacture like uranium enrichment.  And fortunately GPU manufacture is hundreds of time harder than uranium enrichment!  We should be nerding out researching integrated circuit supply chains, choke points, foundry logistics in jurisdictions the US can't unilaterally sanction, that sort of thing.] TLDR, stopping deadly AIs from being built needs less research on AIs and more research on how to stop AIs from being built. *My research included 😬
Very Spicy Take Epistemic Note:  Many highly respected community members with substantially greater decision making experience (and Lesswrong karma) presumably disagree strongly with my conclusion. Premise 1:  It is becoming increasingly clear that OpenAI is not appropriately prioritizing safety over advancing capabilities research. Premise 2: This was the default outcome.  Instances in history in which private companies (or any individual humans) have intentionally turned down huge profits and power are the exception, not the rule.  Premise 3: Without repercussions for terrible decisions, decision makers have no skin in the game.  Conclusion: Anyone and everyone involved with Open Phil recommending a grant of $30 million dollars be given to OpenAI in 2017 shouldn't be allowed anywhere near AI Safety decision making in the future. To go one step further, potentially any and every major decision they have played a part in needs to be reevaluated by objective third parties.  This must include Holden Karnofsky and Paul Christiano, both of whom were closely involved.  To quote OpenPhil: "OpenAI researchers Dario Amodei and Paul Christiano are both technical advisors to Open Philanthropy and live in the same house as Holden. In addition, Holden is engaged to Dario’s sister Daniela."
My current perspective is that criticism of AGI labs is an under-incentivized public good. I suspect there's a disproportionate amount of value that people could have by evaluating lab plans, publicly criticizing labs when they break commitments or make poor arguments, talking to journalists/policymakers about their concerns, etc. Some quick thoughts: * Soft power– I think people underestimate the how strong the "soft power" of labs is, particularly in the Bay Area.  * Jobs– A large fraction of people getting involved in AI safety are interested in the potential of working for a lab one day. There are some obvious reasons for this– lots of potential impact from being at the organizations literally building AGI, big salaries, lots of prestige, etc. * People (IMO correctly) perceive that if they acquire a reputation for being critical of labs, their plans, or their leadership, they will essentially sacrifice the ability to work at the labs.  * So you get an equilibrium where the only people making (strong) criticisms of labs are those who have essentially chosen to forgo their potential of working there. * Money– The labs and Open Phil (which has been perceived, IMO correctly, as investing primarily into metastrategies that are aligned with lab interests) have an incredibly large share of the $$$ in the space. When funding became more limited, this became even more true, and I noticed a very tangible shift in the culture & discourse around labs + Open Phil * Status games//reputation– Groups who were more inclined to criticize labs and advocate for public or policymaker outreach were branded as “unilateralist”, “not serious”, and “untrustworthy” in core EA circles. In many cases, there were genuine doubts about these groups, but my impression is that these doubts got amplified/weaponized in cases where the groups were more openly critical of the labs. * Subjectivity of "good judgment"– There is a strong culture of people getting jobs/status for having “good judgment”. This is sensible insofar as we want people with good judgment (who wouldn’t?) but this often ends up being so subjective that it ends up leading to people being quite afraid to voice opinions that go against mainstream views and metastrategies (particularly those endorsed by labs + Open Phil). * Anecdote– Personally, I found my ability to evaluate and critique labs + mainstream metastrategies substantially improved when I spent more time around folks in London and DC (who were less closely tied to the labs). In fairness, I suspect that if I had lived in London or DC *first* and then moved to the Bay Area, it’s plausible I would’ve had a similar feeling but in the “reverse direction”. With all this in mind, I find myself more deeply appreciating folks who have publicly and openly critiqued labs, even in situations where the cultural and economic incentives to do so were quite weak (relative to staying silent or saying generic positive things about labs). Examples: Habryka, Rob Bensinger, CAIS, MIRI, Conjecture, and FLI. More recently, @Zach Stein-Perlman, and of course Jan Leike and Daniel K. 
If your endgame strategy involved relying on OpenAI, DeepMind, or Anthropic to implement your alignment solution that solves science / super-cooperation / nanotechnology, consider figuring out another endgame plan.
I'm surprised at people who seem to be updating only now about OpenAI being very irresponsible, rather than updating when they created a giant public competitive market for chatbots (which contains plenty of labs that don't care about alignment at all), thereby reducing how long everyone has to solve alignment. I still parse that move as devastating the commons in order to make a quick buck.

Popular Comments

Recent Discussion


I'm not sure if those are precisely the terms of the charter, but that's besides the point. It is still "private" in the sense that there is a small group of private citizens who own the thing and decide what it should do with no political accountability to anyone else. As for the "non-profit" part, we've seen what happens to that as soon as it's in the way.

I largely disagree (even now I think having tried to play the inside game at labs looks pretty good, although I have sometimes disagreed with particular decisions in that direction because of opportunity costs). I'd be happy to debate if you'd find it productive (although I'm not sure whether I'm disagreeable enough to be a good choice).
2Wei Dai
Agreed that it reflects on badly on the people involved, although less on Paul since he was only a "technical advisor" and arguably less responsible for thinking through / due diligence on the social aspects. It's frustrating to see the EA community (on EAF and Twitter at least) and those directly involved all ignoring this. ("shouldn’t be allowed anywhere near AI Safety decision making in the future" may be going too far though.)
I don't think this is true. Nonprofits can aim to amass large amounts of wealth, they just aren't allowed to distribute that wealth to its shareholders. A good chunk of obviously very wealthy and powerful companies are nonprofits.

What's the risk that AI, agi, asi or whatever either

  1. Causes extinction
  2. Decides to torture us for science or whatever else reason

I've heard the likelihood of doom is high but is the likelihood of torture even higher?


When do you think it would happen if it did happen?

2Answer by Dave Orr
If you want a far future fictional treatment of this kind of situation, I recommend Surface Detail by Iain Banks.
2Answer by Seth Herd
The keyword for discussions of this topic is s-risk, for suffering risk. People don't generally think it's too likely, but as with other topics in alignment, there is reasonable disagreement. One source of s-risk is curiosity as a core drive in AI. Depending on how you define curiosity, that could lead to AGIs trying to "learn everything about humans", including how we'd react to really awful situations (as well as, presumably, really great situations). So I think it's possible but unlikely. But you can find more discussion by searching for s-risks
What do you think the likelihood of extinction is and when would it probably happen?
2Thomas Kwa
Seems reasonable except that Eliezer's p(doom | trying to solve alignment) in early 2023 was much higher than 50%, probably more like 98%. AGI Ruin was published in June 2022 and drafts existed since early 2022. MIRI leadership had been pretty pessimistic ever since AlphaGo in 2016 and especially since their research agenda collapsed in 2019.
I am talking about belief state in ~2015, because everyone was already skeptical about policy approach at that time.
This seems wrong. Scott Alexander and Robin Hanson are two of the most thoughtful thinkers on policy in the world and have a long history of engaging with LessWrong and writing on here. Zvi is IMO also one of the top AI policy analysts right now. Definitely true policy thinking here has a huge libertarian bent, but I think it's pretty straightforwardly wrong to claim that LW does not have a history of being a thoughtful place to have policy discussions (indeed, I am hard-pressed to find any place in public with a better history)

Oh good point– I think my original phrasing was too broad. I didn't mean to suggest that there were no high-quality policy discussions on LW, moreso meant to claim that the proportion/frequency of policy content is relatively limited. I've edited to reflect a more precise claim:

The vast majority of high-quality content on LessWrong is about technical stuff, and it's pretty rare to see high-quality policy discussions on LW these days (Zvi's coverage of various bills would be a notable exception). Partially as a result of this, some "serious policy people" d

... (read more)

The forum has been very much focused on AI safety for some time now, thought I'd post something different for a change. Privilege.

Here I define Privilege as an advantage over others that is invisible to the beholder. This may not be the only definition, or the central definition, or not how you see it, but that's the definition I use for the purposes of this post. I also do not mean it in the culture-war sense as a way to undercut others as in "check your privilege". My point is that we all have some privileges [we are not aware of], and also that nearly each one has a flip side. 

In some way this is the inverse of The Lens That Does Not See Its Flaws: The...


What are the advantages of noticing all of this?

  • better model of the world;
  • not being an asshole, i.e. not assuming that other people could do just as well as you, if they only were not so fucking lazy;
  • realizing that your chances to achieve something may be better than you expected, because you have all these advantages over most potential competitors, so if you hesitated to do something because "there are so many people, many of them could do it much better than I could", the actual number of people who could do it may be much smaller than you have assumed, and most of them will be busy doing something else instead.
Hmmm, I think the original post was an interesting idea. I think your comment points to something related but different. Perhaps taboo words?
The article suggests "invisible advantage". Other options: "unnoticed advantage", "unknown advantage".

Ilya Sutskever and Jan Leike have resigned. They led OpenAI's alignment work. Superalignment will now be led by John Schulman, it seems. Jakub Pachocki replaced Sutskever as Chief Scientist.

Reasons are unclear (as usual when safety people leave OpenAI).

The NYT piece (archive) and others I've seen don't really have details.

OpenAI announced Sutskever's departure in a blogpost.

Sutskever and Leike confirmed their departures in tweets.


Friday May 17:

Superalignment dissolves.

Leike tweets, including:

I have been disagreeing with OpenAI leadership about the company's core priorities for quite some time, until we finally reached a breaking point.

I believe much more of our bandwidth should be spent getting ready for the next generations of models, on security, monitoring, preparedness, safety, adversarial robustness, (super)alignment, confidentiality, societal impact, and related topics.

These problems are quite hard to get right,


This interview was terrifying to me (and I think to Dwarkesh as well), Schulman continually demonstrates that he hasn't really thought about the AGI future scenarios in that much depth and sort of handwaves away any talk of future dangers. 

Right off the bat he acknowledges that they reasonably expect AGI in 1-5 years or so, and even though Dwarkesh pushes him he doesn't present any more detailed plan for safety than "Oh we'll need to be careful and cooperate with the other companies...I guess..."


In case people missed this, another safety researcher recently left OpenAI: Ryan Lowe. I don't know Ryan's situation, but he was a "research manager working on AI alignment."
To get the best posts emailed to you, create an account! (2-3 posts per week, selected by the LessWrong moderation team.)
Log In Reset Password
...or continue with
The best method of improving sample efficiency might be more like AlphaZero. The simplest method that's more likely to be discovered might be more like training on the same data over and over with diminishing returns. Since we are talking low-hanging fruit, I think it's reasonable that first forays into significantly improved sample efficiency with respect to real data are not yet much better than simply using more unique real data.
2Alexander Gietelink Oldenziel
I would be genuinely surprised if training a transformer on the pre2014 human Go data over and over would lead it to spontaneously develop alphaZero capacity. I would expect it to do what it is trained to: emulate / predict as best as possible the distribution of human play. To some degree I would anticipate the transformer might develop some emergent ability that might make it slightly better than Go-Magnus - as we've seen in other cases - but I'd be surprised if this would be unbounded. This is simply not what the training signal is.

We start with an LLM trained on 50T tokens of real data, however capable it ends up being, and ask how to reach the same level of capability with synthetic data. If it takes more than 50T tokens of synthetic data, then it was less valuable per token than real data.

But at the same time, 500T tokens of synthetic data might train an LLM more capable than if trained on the 50T tokens of real data for 10 epochs. In that case, synthetic data helps with scaling capabilities beyond what real data enables, even though it's still less valuable per token.

With Go, we ... (read more)

I'm ambivalent on this. If the analogy between improvement of sample efficiency and generation of synthetic data holds, synthetic data seems reasonably likely to be less valuable than real data (per token). In that case we'd be using all the real data we have anyway, which with repetition is sufficient for up to about $100 billion training runs (we are at $100 million right now). Without autonomous agency (not necessarily at researcher level) before that point, there won't be investment to go over that scale until much later, when hardware improves and the cost goes down.

The history of science has tons of examples of the same thing being discovered multiple time independently; wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.

But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly after anyways, then the discovery probably wasn't very counterfactually impactful.

Alas, nobody seems to have made a list of highly counterfactual scientific discoveries, to complement wikipedia's list of multiple discoveries.


1yanni kyriacos
Hi Jonas! Would you mind saying about more about TMI + Seeing That Frees? Thanks!
1Jonas Hallgren
Sure! Anything more specific that you want to know about? Practice advice or more theory?

Thanks :) Uh, good question. Making some good links? Have you done much nondual practice? I highly recommend Loch Kelly :)

I’ve seen a lot about GPT4o being kinda bad, and I’ve experienced that myself. This surprises me. Now I will say something that feels like a silly idea. Is it possible that having the audio/visual part of the network cut off results in 4o’s poor reasoning? As in, the whole model is doing some sort of audio/visual reasoning. But we don’t have the whole model, so it can’t reason in the way it was trained to. If that is the case, I’d expect that when those parts are publicly released, scores on benchmarks shoot up? Do people smarter and more informed than me have predictions about this?
I'm confused by what you mean that GPT-4o is bad? In my experience it has been stronger than plain GPT-4, especially at more complex stuff. I do physics research and it's the first model that can actually improve the computational efficiency of parts of my code that implement physical models. It has also become more useful for discussing my research, in the sense that it dives deeper into specialized topics, while the previous GPT-4 would just respond in a very handwavy way. 

Man, I wish that was my experience. I feel like I’m constantly asking GPT4o a question, getting a weird or bad response. Then switching to 4 to finish the job.

LessOnline Festival

May 31st to June 2nd, Berkely CA