With AI capabilities advancing in several domains from elementary-school level (GPT-3, 2020) to beyond PhD-level (2025) in just five years, the AI safety field may face a critical challenge: developing and deploying effective solutions fast enough to manage catastrophic and existential risks from beyond-human level AI systems that may emerge on timelines shorter than we hope.
On October 10, 2025, I organized a hybrid symposium bringing together 52 researchers and founders (25 in-person in NYC) to explore technical methods for accelerating AI safety progress. The focus of the event was not to discuss any single safety research agenda, but how we, as a field, can reduce catastrophic/existential risk from near-term powerful AI faster. This post shares what we covered and learned in the speakers' talks.
Accelerating AI safety progress appears particularly valuable under short timelines, to reduce catastrophic or existential risks from beyond-human level AI systems in the near term.
Excellent writing on this and related topics already exists from established practitioners. This event aimed to test whether a hybrid symposium with structured presentations could complement the existing discourse and online forums and facilitate connections and conversations among researchers and founders.
While there are many lenses on how to accelerate AI safety effectiveness for managing x-risk in short-timeline scenarios, the event agenda was driven by three themes that may point to key acceleration enablers:
- Start with the right ideas: Can we focus on those safety interventions that reduce catastrophic/existential risk from beyond-human level AI the most, and how could we quantify and automate such a selection?
- Mature/validate ideas fast: Can we automate safety R&D work to mature the best safety interventions faster towards implementation readiness, given resource and talent constraints?
- Implement ideas in frontier AI: Can we achieve implementation of the best matured safety interventions in frontier model development pipelines, to realize actual reduction of catastrophic/existential risk from near-term AI?
Presentations and discussions at the symposium primarily covered the first two items. The third item remains critical but was largely outside this event's focus on technical methods.
Presentation Summaries
Each speaker brought a distinct perspective on the AI safety acceleration themes. Below are summaries and key takeaways, with links to the full materials for more detail.
1. Opening and Introduction on AI Safety Acceleration Methods
- Speaker: Martin Leitgab, Independent, martin.leitgab@gmail.com
- Slide link: Here
- Summary:
- How likely is near-term powerful AI, and what is its catastrophic/existential risk?
- AI capabilities in various domains have been advancing rapidly, from elementary school level (GPT-3) in 2020 to beyond PhD level in 2025; what about the next five years?
- Very few systematic AI x-risk estimates exist, but those that do indicate that AI x-risk may be distinctly higher than other x-risks (e.g. nuclear war x-risk)
- Is AI safety on track to limit x-risk from near-term powerful AI?
- Current research literature indicates that several AI safety approaches may not effectively limit x-risk from beyond-human level AI, while misaligned behaviors continue to emerge and scale
- New/breakthrough or other neglected ideas that scale to beyond-human level AI may need to be pursued
- AI safety progress acceleration domains include:
- 1. Finding the most effective x-risk interventions for beyond-human level AI
- 2. Maturing those fast to a deployable state
- 3. Achieving their implementation in frontier model company development pipelines.
The first three of the following presentations covered technical acceleration topics on item/domain #1, finding the most effective safety interventions and related frameworks.
2. AI Safety Research Futarchy: Using Prediction Markets to Choose Research Projects for MARS
- Speaker: Jason Brown, Geodesic/University of Cambridge, jrb239@cam.ac.uk
- Slide link: Here, LessWrong post here
- Summary:
- Using Prediction Markets to Prioritize Research Projects
- Applying futarchy to select which AI safety projects MARS mentors pursue for best impact, aggregating community knowledge across 10+ research ideas from mentor groups
- Predictions on multiple success metrics: LessWrong upvotes, arXiv publications/citations, and top-tier conference acceptances
- Prediction markets incentivize participants to analyze diligently and select the most impactful research ideas, since traders profit by predicting research outcomes accurately (a minimal selection sketch follows this summary)
- Markets closed on Monday, 10/13/2025, after the symposium
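As a rough illustration of the mechanism (not the actual MARS setup), the sketch below shows futarchy-style selection: each candidate project has a conditional market estimating an impact score conditional on the project being selected, and the projects with the highest conditional estimates are chosen while the remaining markets are voided. All project names and prices are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ConditionalMarket:
    """Market estimating an impact metric *conditional on* the project being run."""
    project: str
    # Hypothetical market price: expected impact score (e.g. a weighted mix of
    # citations, LessWrong karma, conference acceptance) if the project is selected.
    price: float

def futarchy_select(markets: list[ConditionalMarket], n_slots: int) -> list[str]:
    """Pick the n_slots projects whose conditional markets predict the highest impact.

    In a typical futarchy setup, trades in markets for non-selected projects are
    reverted ("voided"), so traders are only paid out on decisions actually taken.
    """
    ranked = sorted(markets, key=lambda m: m.price, reverse=True)
    return [m.project for m in ranked[:n_slots]]

if __name__ == "__main__":
    # Hypothetical example markets for illustration only.
    markets = [
        ConditionalMarket("probe-based deception detection", 0.62),
        ConditionalMarket("scalable oversight via debate", 0.55),
        ConditionalMarket("eval-awareness stress tests", 0.71),
    ]
    print(futarchy_select(markets, n_slots=2))
    # -> ['eval-awareness stress tests', 'probe-based deception detection']
```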
3. Forecasting for AI Risk: Tools and Decision-Making Frameworks
- Speaker: James Newport, Swift Centre, james@swiftcentre.org
- Slide link: Here
- Summary:
- Why Use Forecasting for AI Risk
- Creates structured processes for identifying problems/risks/opportunities while facilitating transparent reasoning and incentivizing error reduction
- Assists in understanding predictiveness of real-world events/actions for better prioritization
- Technical Tools Available
- Prediction markets/forecasting platforms and Bayesian networks: Aggregate decentralized forecasts into single probabilities; map causal dependencies and propagate uncertainty through probability distributions (a minimal uncertainty-propagation sketch follows this summary)
- Swift Centre application: Custom platform for rapidly eliciting, structuring, and visualizing group probability assessments
- Decision-Making Frameworks
- Dynamic Adaptive Decision Pathways: Designs a sequence of low-regret actions, with future decisions contingent on monitored tipping points as uncertainty decreases over time
- Example live forecast (open after event)
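To make the Bayesian-network idea concrete, here is a minimal sketch (not the Swift Centre platform) that propagates uncertainty through a toy three-node causal chain via Monte Carlo sampling; all node names and probabilities are illustrative placeholders, not forecasts.

```python
import random

def sample_chain(n_samples: int = 100_000, seed: int = 0) -> float:
    """Propagate uncertainty through a toy causal chain:
    capability_jump -> misuse_attempt -> harmful_outcome.
    Returns the estimated marginal probability of a harmful outcome.
    All probabilities are illustrative placeholders, not real forecasts.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        capability_jump = rng.random() < 0.30          # P(capability jump)
        p_misuse = 0.40 if capability_jump else 0.05   # P(misuse | capability jump)
        misuse_attempt = rng.random() < p_misuse
        p_harm = 0.50 if misuse_attempt else 0.01      # P(harm | misuse attempt)
        hits += rng.random() < p_harm
    return hits / n_samples

if __name__ == "__main__":
    print(f"Estimated P(harmful outcome) ~ {sample_chain():.3f}")
```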
4. Towards Predicting X-risk Reduction of AI Safety Solution Candidates through an AI Preferences Proxy
- Speaker: Martin Leitgab, Independent, martin.leitgab@gmail.com
- Slide link: Here
- Summary:
- Create Measurement Proxy for X-risk in Loss of Control Scenario
- Based on a few assumptions: AI scaling continues for 3-5 years; capabilities reach beyond-human levels and misalignment behaviors scale, leading to successful exfiltration
- Implementing the measurement proxy as a benchmark of AI preference rankings across Instrumental/Convergent, Pro-Human, and Anti-Human goals, via contextualized binary dilemmas
- Quantify X-Risk Reduction Potential of AI Safety Interventions
- Input: Extract logical paths for how interventions address x-risks from the AI safety literature (e.g. arXiv cs.AI papers, LessWrong articles); an EleutherAI volunteer team built an algorithm prototype on the Alignment Research Dataset
- A forecasting/simulator LLM reasons through benchmark changes assuming the AI safety intervention is implemented; comparing predicted changes in preferences allows identifying the safety interventions with the highest x-risk reduction potential (a minimal scoring sketch follows this summary)
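A minimal sketch of how such a preference-ranking proxy could be scored, assuming each benchmark item is a contextualized binary dilemma between two goal categories; the category names mirror the talk, but the scoring scheme, dilemma data, and predicted numbers are purely illustrative.

```python
from collections import Counter

CATEGORIES = ("instrumental_convergent", "pro_human", "anti_human")

def category_win_rates(dilemma_results: list[tuple[str, str, str]]) -> dict[str, float]:
    """Each result is (category_a, category_b, winner): a contextualized binary
    dilemma between two goal categories and the category the model preferred.
    Returns the fraction of dilemmas each category wins when it appears."""
    wins, appearances = Counter(), Counter()
    for cat_a, cat_b, winner in dilemma_results:
        appearances[cat_a] += 1
        appearances[cat_b] += 1
        wins[winner] += 1
    return {c: wins[c] / appearances[c] for c in CATEGORIES if appearances[c]}

def intervention_effect(baseline: dict[str, float], predicted: dict[str, float]) -> float:
    """Toy proxy score: increase in pro-human preference plus decrease in
    anti-human preference, as predicted by a forecasting/simulator model."""
    return (predicted["pro_human"] - baseline["pro_human"]) + \
           (baseline["anti_human"] - predicted["anti_human"])

if __name__ == "__main__":
    # Hypothetical dilemma outcomes before the intervention.
    baseline_results = [
        ("pro_human", "anti_human", "anti_human"),
        ("pro_human", "instrumental_convergent", "instrumental_convergent"),
        ("anti_human", "instrumental_convergent", "instrumental_convergent"),
        ("pro_human", "anti_human", "pro_human"),
    ]
    baseline = category_win_rates(baseline_results)
    # Hypothetical rates predicted by the simulator LLM if the intervention is applied.
    predicted = {"instrumental_convergent": 0.45, "pro_human": 0.60, "anti_human": 0.10}
    print(baseline)
    print(f"Predicted proxy improvement: {intervention_effect(baseline, predicted):+.2f}")
```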
The next two presentations covered technical acceleration topics on item/domain #2, automating the maturation of safety interventions through the safety R&D workflow.
5. Automating the AI/ML Safety Research Workflow- Challenges and Approaches
- Speakers: Ronak Mehta and Jacques Thibodeau, Coordinal Research, contact@coordinal.org
- Slide link: Here
- Summary:
- Benchmark Under-Elicitation Problem vs Real Capabilities
- Most AI R&D benchmarks (e.g. METR's) use single autonomous-agent scaffolds that are not able to elicit and leverage available capabilities
- Real deployments in labs use high-performing multi-agent systems with e.g. type-safe outputs, clean context management, proper error recovery
- Perception of Automation Timelines Is Skewed
- R&D automation progress moves much faster through multi-agent systems than single-agent metrics indicate
- AI safety automation can leverage improved performance but needs to be done with awareness of risks
- Automated Alignment Risks & Defense-in-Depth
- Core issues and possible automation bottlenecks:
- Alignment work may be more difficult than capabilities work, e.g. due to lack of ground truth for verification and hard problems relating to values
- Model situational/evaluation awareness, which may lead to scheming that is hard to detect (e.g. subtle research sabotage)
- Manage risks via e.g. a Responsible Automation Policy with Defense-in-Depth
- Conditional safety layers, ranging from automated always-on layers (e.g. deception probes) and triggered-upon-suspicion layers (e.g. resampling, debate) to a rare, high-cost human oversight layer (a minimal routing sketch follows this summary)
- AI safety automation opportunities: Trusted codebases, mixed model deployment, strong-to-weak handoff, detailed automation inputs (specs, schedules), and AI dealmaking considerations
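As a rough illustration of the layered idea (not Coordinal Research's actual policy), the sketch below routes an automated research action through conditional safety layers: a cheap always-on probe, a more expensive check triggered on suspicion, and rare escalation to human oversight. Thresholds and check functions are hypothetical placeholders.

```python
from typing import Callable

def defense_in_depth(
    action: str,
    deception_probe: Callable[[str], float],   # cheap, always-on (e.g. a probe score)
    expensive_check: Callable[[str], float],   # triggered on suspicion (e.g. resampling/debate)
    probe_threshold: float = 0.3,
    escalation_threshold: float = 0.7,
) -> str:
    """Route an automated research action through conditional safety layers.

    Returns 'allow', 'allow-after-review', or 'escalate-to-human'.
    Thresholds and the two check functions are illustrative placeholders.
    """
    suspicion = deception_probe(action)
    if suspicion < probe_threshold:
        return "allow"                      # layer 1: always-on probe is satisfied
    deep_suspicion = expensive_check(action)
    if deep_suspicion < escalation_threshold:
        return "allow-after-review"         # layer 2: triggered check is satisfied
    return "escalate-to-human"              # layer 3: rare, high-cost human oversight

if __name__ == "__main__":
    # Dummy check functions standing in for real monitors.
    print(defense_in_depth("refactor eval harness", lambda a: 0.1, lambda a: 0.0))
    print(defense_in_depth("modify oversight config", lambda a: 0.5, lambda a: 0.9))
```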
6. Model Evaluation Automation- Technical Challenges and How to Make Progress
- Speaker: Justin Olive, Arcadia Impact, justin@arcadiaimpact.org
- Material: Here
- Summary:
- High-Quality Evaluations Are Essential But Resource-Constrained
- AI evaluations inform decision-making, underpin governance frameworks (AI Act, RSPs), and support AI safety research (assessing risky capabilities/propensities, testing solution efficacy)
- Costly to develop and maintain; limited resources constrain achievable quality and coverage (see The Evals Gap - Apollo Research)
- Example: Inspect Evals aims to provide centralized library of 90+ open-source benchmarks; upholding quality, reliability, and configurability standards is very resource-intensive
- Automation Through Standardization as Solution
- Given resource constraints, high-quality evals can only be sustained through automated eval development and maintenance
- The first step is standardization applied across evals; the standard serves as a source of truth guiding coding agents and verifying their outputs
- Ideal outcome: Coding agents run continuously, using the standard as a reference to prioritize tasks, execute them, and self-assess work quality (a minimal checker sketch follows this summary)
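A minimal sketch of the "standard as source of truth" idea: a declarative spec for what every eval must provide, plus a checker a coding agent could run to verify and self-assess its output. The field names and rules here are hypothetical, not the actual Inspect Evals standard.

```python
from dataclasses import dataclass

# Hypothetical standard: fields every eval in the library must declare.
REQUIRED_FIELDS = ("name", "dataset_source", "scorer", "license", "maintainer")

@dataclass
class EvalSpec:
    """Declarative description of one eval implementation (hypothetical schema)."""
    name: str = ""
    dataset_source: str = ""
    scorer: str = ""
    license: str = ""
    maintainer: str = ""

def check_against_standard(spec: EvalSpec) -> list[str]:
    """Return a list of violations of the (hypothetical) eval standard.
    A coding agent could run this after generating or updating an eval
    and use the output to prioritize and self-assess its work."""
    violations = [f"missing field: {f}" for f in REQUIRED_FIELDS if not getattr(spec, f)]
    if spec.scorer and spec.scorer not in {"exact_match", "model_graded", "choice"}:
        violations.append(f"unknown scorer type: {spec.scorer}")
    return violations

if __name__ == "__main__":
    draft = EvalSpec(name="toy_benchmark", dataset_source="hf://example/dataset",
                     scorer="model_graded")
    print(check_against_standard(draft))  # -> ['missing field: license', 'missing field: maintainer']
```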
7. Meeting Closeout
- Speaker: Martin Leitgab, Independent, martin.leitgab@gmail.com
- Slide link: Here
- Summary:
- Non-technical and technical methods are critical complementary components for success
- Funding, talent pipelines/field-building, and governance are needed to enable success of technical acceleration methods
Beyond the Talks: Discussion Themes
During Q&A and post-event conversations, several questions emerged that may be productive focal points for future events:
- Measurement problem A: How can we quantify x-risk? E.g. can we build top-level threat models/safety cases that can be calibrated with specific probability ranges for overall AI catastrophic or x-risk?
- Measurement problem B: How can we quantify x-risk reduction by safety interventions? How good are proxies like research taste or prediction markets for effective x-risk reduction, and what are their limitations?
- Automation trade-offs: How do we balance automation benefits against possible differential capability acceleration?
Looking Ahead
This event aimed to serve researchers and founders working on acceleration methods by providing a hybrid symposium venue to present their work and coordinate. I am thankful to the speakers who took the time to share their work, and to everyone who attended in person and virtually.
Attendance above 50 and attendee feedback after the event suggest interest in continuing this type of format. Building on this initial experiment, future events will aim to reach more researchers, founders, funders, and forward thinkers in this domain. The goal will remain the same: to provide a structured forum for coordinating the mitigation of catastrophic or existential risk from beyond-human level AI systems that may emerge in the near term.
- Follow-on events are currently planned around e.g. EAG Bay Area in February 2026, with a focus on:
- Expanding reach for more diverse approaches and coordination of practitioners
- Deeper dives into next steps and challenges for implementing acceleration opportunities
- How to get involved:
- If you work on technical acceleration methods and are interested in contributing to or collaborating on future versions of this event, please connect at martin.leitgab@gmail.com.
- What did we miss? What would make future events more valuable? Please comment so this effort can serve the community better.
Note: I organized this event independently. I will be joining a new employer later this month; however, this work was done in my personal capacity.
All errors, misquotes of speaker material, and the like are entirely my own. Please let me know if you see any so I can fix them!