Mostly agree. For some more starting points, see posts with the AI-assisted alignment tag. I recently did a rough categorization of strategies for AI-assisted alignment here.
If this strategy is promising, it likely recommends fairly different prioritisation from what the alignment community is currently doing.
Not totally sure about this, my impression (see chart here) is that much of the community already considers some form of AI-assisted alignment to be our best shot. But I'd still be excited for more in-depth categorization and prioritization of strateg...
Agree directionally. I made a similar point in my review of "Is power-seeking AI an existential risk?":
In one sentence, my concern is that the framing of the report and decomposition is more like “avoid existential catastrophe” than “achieve a state where existential catastrophe is extremely unlikely and we are fulfilling humanity’s potential”, and this will bias readers toward lower estimates.
Meanwhile Rationality A-Z is just super long. I think anyone who's a long-term member of LessWrong or the alignment community should read the whole thing sooner or later – it covers a lot of different subtle errors and philosophical confusions that are likely to come up (both in AI alignment and in other difficult challenges).
My current guess is that the meme "every alignment person needs to read the Sequences / Rationality A-Z" is net harmful. They seem to have been valuable for some people but I think many people can contribute to reducing AI x-risk ...
Written and forecasted quickly, numbers are very rough. Thomas requested I make a forecast before anchoring on his comment (and I also haven't read others).
I’ll make a forecast for the question: What’s the chance a set of >=1 warning shots counterfactually tips the scales between doom and a flourishing future, conditional on a default of doom without warning shots?
We can roughly break this down into:
Just made a bet with Jeremy Gillen that may be of interest to some LWers, would be curious for opinions:
Sure, I wasn't clear enough about this in the post (there was also some confusion on Twitter about whether I was only referring to Christiano and Garfinkel rather than any "followers").
I was thinking about roughly hundreds of people in each cluster, with the bar being something like "has made at least a few comments on LW or EAF related to alignment and/or works or is upskilling to work on alignment".
Epistemic status: Exploratory
My overall chance of existential catastrophe from AI is ~50%.
My split of worlds we succeed is something like:
Good point, and you definitely have more expertise on the subject than I do. I think my updated view is ~5% on this step.
I might be underconfident about my pessimism on the first step (competitiveness of process-based systems) though. Overall I've updated to be slightly more optimistic about this route to impact.
Most problems that people work on in research are roughly the right difficulty, because the ambition level is adjusted to be somewhat challenging but not unachievable. If it's too hard then the researcher just moves on to another project. This is the problem selection process we're used to, and might bias our intuitions here.
On the other hand, we want to align AGI because it's a really important problem, but we have no control over how difficult that problem is. And if you think about the distribution of difficulties of all possible problems, it would be a hu...
Thanks for clarifying your views; makes sense that there isn't a clean distinction between accelerating alignment and theoretical thinking.
I do think there is a distinction between doing theoretical thinking that might be a prerequisite to safely accelerate alignment research substantially, and directly accelerating theoretical alignment. I thought you had updated between these two, toward the second; do you disagree with that?
My understanding is that they have very short (by my lights) timelines which recently updated them toward pushing much more toward just trying to automate alignment research rather than thinking about the theory.
Our plan to accelerate alignment does not preclude theoretical thinking, but rather requires it. The mainline agenda atm is not full automation (which I expect to be both more dangerous and less useful in the short term), but what I've been calling "cyborgism": I want to maximize the bandwidth between human alignment researchers and AI tools/oracles/assistants/simulations. It is essential that these tools are developed by (or in a tight feedback loop with) actual alignment researchers doing theory work, because we want to simulate and play with thought pro...
Haven’t yet had a chance to read the article, but from verbal conversations I’d guess they’d endorse something similar (though probably not every word) to Thomas Larsen’s opinion on this in Footnote 5 in this post:
...Answer: I see a categorical distinction between trying to align agentic and oracle AIs. Conjecture is trying only for oracle LLMs, trained without any RL pressure giving them goals, which seems way safer. OpenAI doing recursive reward modeling / IDA type schemes involves creating agentic AGIs and therefore also faces a lot more alignment issues
See The academic contribution to AI safety seems large and comments for some existing discussion related to this point
It’s probably a bit frustrating to not have their work summarized, and then be asked to explain their own work, when all of their work is published already
Fair, I see why this would be frustrating and apologize for any frustration caused. In an ideal world we would have read many of these papers and summarized them ourselves, but that would have taken a lot of time and I think the post was valuable to get out ASAP.
ETA: Probably it would have been better to include more of a disclaimer on the "everyone" point from the get-go, I think not doing this was a mistake.
"strongly influences the organization that builds AGI" applies to all alignment research initiatives, right? Alignment researchers at e.g. DeepMind have less of an uphill battle, but they still have to convince the rest of DeepMind to adopt their work.
Yes, I didn't mean to imply this was necessarily an Ought-specific problem and I guess it may have been a bit unfair for me to only do a BOTEC on Ought. I included it because I had the most fleshed-out thoughts on it but it could give the wrong impression about relative promise when others don't hav...
(speaking for just myself, not Thomas but I think it’s likely he’d endorse most of this)
I agree it would be great to include many of these academic groups; the exclusion wasn't out of any sort of malice. Personally I don't know very much about what most of these groups are doing or their motivations; if any of them want to submit brief write ups I'd be happy to add them! :)
edit: lol, Thomas responded with a similar tone while I was typing
Good point. For myself:
Given that all the forecasts seem to be wrong in the "things happened faster than we expected" direction, we should probably expect HLAI to happen faster than expected as well.
I don't think we should update too strongly on these few data points; e.g. a previous analysis of Metaculus' AI predictions found "weak evidence to suggest the community expected more AI progress than actually occurred, but this was not conclusive". MATH and MMLU feel more relevant than the average Metaculus AI prediction but not enough to strongly outweigh the previous finding...
Steelmanning might be particularly useful in cases where we have reason to believe those who have engaged most with the arguments are biased toward one side of the debate.
As described in But Have They Engaged with the Arguments?, perhaps a reason many who dismiss AI risk haven't engaged much with the arguments is the selection effect of engaging more if the first arguments one hears seem true. Therefore it might be useful to steelman arguments by generally reasonable people against AI risk that might seem off due to lack of engagement with existing count...
Overall agree that progress was very surprising and I'll be thinking about how it affects my big picture views on AI risk and timelines; a few relatively minor nitpicks/clarifications below.
For instance, superforecaster Eli Lifland posted predictions for these forecasts on his blog.
I'm not a superforecaster (TM) though I think some now use the phrase to describe any forecasters with good ~generalist track records?
...While he notes that the Hypermind interface limited his ability to provide wide intervals on some questions, he doesn’t make that complaint for t
Yeah I've been sporadically making progress on a personal forecasting retrospective, will include reflections and updated forecasts if/when I get around to finishing that.
Given the success of this experiment, we should propose a modified version of futarchy where laws are similarly written letter by letter!
Thanks, I agree with this and it's probably not good branding anyway.
I was thinking the "challenge" was just doing the intervention (e.g. being vegan), but agree that the framing is confusing since it refers to something different in the clinical context. I will edit my shortforms to reflect this updated view.
[crossposted from EA Forum]
Reflecting a little on my shortform from a few years ago, I think I wasn't ambitious enough in trying to actually move this forward.
I want there to be an org that does "human challenge"-style RCTs across lots of important questions that are extremely hard to get at otherwise, including (top 2 are repeated from previous shortform):
Edited to add: I no longer think "human challen...
(epistemic status: exploratory)
I think more people who are into LessWrong in high school through college should consider trying Battlecode. It's somewhat similar to The Darwin Game, which was pretty popular on here, and I think generally the type of people who like LessWrong will both enjoy and be good at Battlecode. (edited to add: A short description of Battlecode is that you write a bot to beat other bots at a turn-based strategy game. Each unit executes its own code, so communication/coordination is often one of the most interesting parts.)
I did it with friends f...
Your prior is for discontinuities throughout the entire development of a technology, so shouldn't your prior be for discontinuity at any point during the development of AI, rather than discontinuity at or around the specific point when AI becomes AGI? It seems this would be much lower, though we could then adjust upward based on the particulars of why we think a discontinuity is more likely at AGI.
Holden Karnofsky wrote on Cold Takes:
I estimate that there is more than a 10% chance we'll see transformative AI within 15 years (by 2036); a ~50% chance we'll see it within 40 years (by 2060); and a ~2/3 chance we'll see it this century (by 2100).
I copied these bins to create Holden's approximate forecasted distribution (note that Holden's forecast is for Transformative AI rather than human-level AGI):

Compared to the upvote-weighted mixture in the OP, it puts more probability on longer timelines, with a median of 2060 vs. 2047 and 1/3 vs. 1/5 on after 210...
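For anyone wanting to reproduce this kind of comparison, here's a minimal sketch, assuming simple linear interpolation between Holden's three stated points (cruder than the binned distribution I actually built in Elicit, and reading ">10% by 2036" as roughly 0.10):

```python
import numpy as np

# Holden's stated cumulative probabilities for transformative AI
years = np.array([2036, 2060, 2100])
cum_prob = np.array([0.10, 0.50, 2 / 3])

# Linearly interpolate the CDF between the stated points to read off
# summary statistics, e.g. the median (where cumulative probability hits 0.5)
median_year = float(np.interp(0.5, cum_prob, years))
p_after_2100 = 1 - cum_prob[-1]

print(median_year)   # 2060.0 by construction, since P(by 2060) = 0.50
print(p_after_2100)  # ~1/3 probability of TAI after 2100
```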
It's very likely that when the US intelligence community reports on August 25 on their data about the origins of COVID-19, they will conclude that it was a lab leak.
Are you open to betting on this? GJOpen community is at 9% that the report will conclude that lab leak is more likely than not, I’m at 12%.
In particular, my actual credence in lab leak is higher (~45%) but I’m guessing the most likely outcome of the report is that it’s inconclusive, and that political pressures will play a large role in the outcome.
Someone who is near the top of the leaderboard is both accurate and highly experienced
I think this unfortunately isn't true right now, and just copying the community prediction would place very highly (I'm guessing if made as soon as the community prediction appeared and updated every day, easily top 3 (edit: top 10)). See my comment below for more details.
You can look at someone's track record in detail, but we're also planning to roll out more ways to compare people with each other.
I'm very glad to hear this. I really enjoy Metaculus but my main gripe ...
I actually think it's worth tracking: ConsensusBot should be a user, it should always update continuously to the public consensus prediction in its absence, and it shouldn't be counted as a prediction, so we can see what it looks like and how it scores.
And there should be a contest to see if anyone can use a rule that looks only at predictions, and does better than ConsensusBot (e.g. by deciding whose predictions to care about more vs. less, or accounting for systematic bias, etc).
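As one illustration of what such a rule could look like: a weighted average of forecasts in log-odds space, with weights set by past track record. This is a hypothetical sketch (the weights and numbers are made up, and it's not anything Metaculus actually implements):

```python
import math

def aggregate(predictions, weights):
    """A rule that 'looks only at predictions': weighted average in
    log-odds space, weighting forecasters by (e.g.) past track record.
    predictions: list of probabilities in (0, 1); weights: nonnegative floats."""
    total = sum(weights)
    log_odds = sum(
        w * math.log(p / (1 - p)) for p, w in zip(predictions, weights)
    ) / total
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical: three forecasters; the one with the better record gets 3x weight
print(aggregate([0.6, 0.7, 0.9], [1.0, 1.0, 3.0]))
```

Averaging in log-odds rather than probability space lets confident, well-calibrated forecasters pull the aggregate toward the extremes, which plain probability averaging can't do.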
If the user is interested in getting into the top ranks, this strategy won't be anything like enough.
I think this isn't true empirically for a reasonable interpretation of top ranks. For example, I'm ranked 5th on questions that have resolved in the past 3 months due to predicting on almost every question.
Looking at my track record, for questions resolved in the last 3 months, evaluated at all times, here's how my log score looks compared to the community:
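(For reference, the binary log score here is just the log of the probability assigned to the outcome that actually happened; a minimal sketch with made-up numbers, not Metaculus' exact scoring rule:)

```python
import math

def log_score(p, outcome):
    """Log of the probability assigned to the realized outcome.
    Higher (closer to 0) is better; log(0.5) is the know-nothing baseline."""
    return math.log(p if outcome else 1 - p)

# Hypothetical single question: my forecast vs the community's, resolved Yes
my_p, community_p = 0.80, 0.65
print(log_score(my_p, True) - log_score(community_p, True))
# positive difference -> beat the community on this question
```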
It looks like people can change their predictions after they initially submit them. Is this history recorded somewhere, or just the current distribution?
We do store the history. You can view them by going to https://elicit.org/binary then searching for the question, e.g. https://elicit.org/binary?binaryQuestions.search=Will%20there%20be%20more%20than%2050. Although as noted by Oli, we currently only display predictions that haven't been withdrawn.
...Is there an option to have people "lock in" their answer? (Maybe they can still edit/delete for a shor
Epistemic status: extremely uncertain
I created my Elicit forecast by:

[I work for Ought.]
I must admit I haven't followed the discussions you're referring to but if I were to spend more time forecasting this question I would look into them.
I didn't include effects of COVID in my forecast as it looks like the Zillow Home Value Index for Seattle has remained relatively steady since March (2% drop). I'm skeptical that there are likely to be large effects from COVID in the future when there hasn't been a large effect from COVID thus far.
A few reasons I could be wrong:
My forecast is based on:
I don't have a background in quantum computing, so there's a chance I'm misinterpreting the question in some way, but I learned a lot doing the research for the forecast (like that there's a lot of controversy regarding whether quantum supremacy has been achieved yet).
Amusingly, during my research I stumbled upon this Metaculus...
My forecast is based on historical data from Zillow. I explained my reasoning in the notes. The summary is that housing prices haven't changed very much in Seattle since April 2019 (on the whole it's risen 1%). On the other hand, prices in more expensive areas have stayed the same or declined slightly. I settled on a boring median of the price staying the same. Due to how stable the prices have been recently, I think most of the variation will come from the individual house and which neighborhood it's in, with an outside chance of large Seat...
I think it's >1% likely that one of the first few surveys Rohin conducted would result in a fraction of >0.5.
Evidence from When Will AI Exceed Human Performance?, in the form of median survey responses of researchers who published at ICML and NIPS in 2015:
I'd be curious to see how well The alignment problem from a deep learning perspective and Without specific countermeasures... would do.