In this talk, Talha Paracha will present insights from his latest research on using language models for improving software security ("Hallucinating Certificates", to appear at ICSE 2026).
Certificate validation is a crucial step in Transport Layer Security (TLS), the de facto standard network security protocol. Prior research has shown that differentially testing TLS implementations with synthetic certificates can reveal critical security issues, such as accidentally accepting untrusted certificates.
Paracha et al. introduce a new approach, MLCerts, to generate synthetic certificates that leverages generative language models to more extensively test software implementations. Recently, these models have become (in)famous for their applications in generating content, writing code, and conversing with users, as well as for "hallucinating" syntactically correct yet semantically nonsensical output. The authors leverage two novel insights in their work:...
Jackson Kaunismaa presents his new paper “Eliciting Harmful Capabilities by Fine-Tuning on Safeguarded Outputs”. He will discuss why output-level safeguards on frontier models don’t actually make the ecosystem safe, and how anyone with an open-source model can fine-tune it on adjacent-domain outputs from safeguarded models to recover a large fraction of the capability gap between open-source and frontier models on harmful tasks.
Event Schedule
6:00 to 6:30 - Food and introductions
6:30 to 7:30 - Presentation and Q&A
7:30 to 9:00 - Open Discussions
If you can't attend in person, join our live stream starting at 6:30 pm via this link.
Tech companies are spending $400 billion a year on AI infrastructure — more than the Apollo program, every ten months. OpenAI is projected to lose $116 billion before turning a profit. Industry leaders are calling it a bubble while simultaneously spending hundreds of billions more. So: is AI a bubble, or is it the next internet?
We'll be discussing two pieces that reach different conclusions from similar evidence. Peter Wildeford's "AI is probably not a bubble" argues the fundamentals are sound and puts the chance of a major correction at ~30%. Derek Thompson's "This Is How the AI Bubble Will Pop" argues the crash is more or less inevitable — but that the technology still wins long-term.
We'll spend the first forty minutes reading both articles independently and then split into small groups to discuss. If you've already read the articles, feel free to come by after 7pm to join the discussions.
Food will be served.
Ryan Faulkner explores various papers that address cooperation and safety in multi-agent LLM simulations. Some of the core topics will include:
This research also reveals that LLMs adapt their behavior based on awareness of their conversational partner's identity.
Event Schedule
6:00 to 6:30 - Food and introductions
6:30 to 7:30 - Presentation and Q&A
7:30 to 9:00 - Open Discussions
If you can't attend in person, join our live stream starting at 6:30 pm via this link.
In February 2026, the Pentagon threatened to cancel Anthropic's $200M contract and label the company a "supply chain risk" unless they dropped restrictions on how Claude could be used. Anthropic refused. Leo Zovic will walk through what happened and what it means for AI companies operating at the intersection of ethics and government power.
Event Schedule
6:00 to 6:30 - Food and introductions
6:30 to 7:30 - Presentation and Q&A
7:30 to 9:00 - Open Discussions
If you can't attend in person, join our live stream starting at 6:30 pm via this link.
Future AI systems could have action-guiding and learning systems which may plausibly resemble those of current human or animal brains. What might it take to make these systems safe? In this presentation, David Atanasov will talk about the work from people who have thought about aligning these hypothetical future systems.
Event Schedule
6:00 to 6:30 - Food and introductions
6:30 to 7:30 - Presentation and Q&A
7:30 to 9:00 - Open Discussions
If you can't attend in person, join our live stream starting at 6:30 pm via this link.
In this talk, Kristy Loke (MATS research fellow) will discuss:
1. How did China come to lead in open-weight LLMs?
2. How are various jurisdictions’ (such as the EU and the US’s) interest in open-weight models changing?
3. What technical and global governance actions ought to be taken if we want to make sure the open-weight AI space continues to thrive?
Event Schedule
6:00 to 6:30 - Food and introductions
6:30 to 7:30 - Presentation and Q&A
7:30 to 9:00 - Open Discussions
If you can't attend in person, join our live stream starting at 6:30 pm via this link.
Diana Issatayeva will provide an overview of the physical network architectures underling AI training and inference. Then, we will explore how these architectures influence governance options.
Event Schedule
6:00 to 6:30 - Food and introductions
6:30 to 7:30 - Presentation and Q&A
7:30 to 9:00 - Open Discussions
If you can't attend in person, join our live stream starting at 6:30 pm via this link.
This is part of our weekly AI Safety Thursdays series. Join us in examining questions like:
Zhijing Jin will be joining us to discuss her AI lab's entire research agenda, including several recent project lines across adversarial defence, interpretability, and democracy defence.
Event Schedule
6:00 to 6:30 - Food and introductions
6:30 to 7:30 - Presentation and Q&A
7:30 to 9:00 - Open Discussions
If you can't attend in person, join our live stream starting at 6:30 pm via this link.
This is part of our weekly AI Safety Thursdays series. Join us in examining questions like:
Unstoppable agents? Digital drugs? Robo-religions?
Moltbook is an internet forum designed exclusively for artificial intelligence agents. Some see it as AI theatre, others as a warning shot for autonomous agents.
In this talk on Moltbook and related phenomena, Giles Edkins looks at the hype, the reality, and what it might mean for the future.
Event Schedule
6:00 to 6:30 - Food and introductions
6:30 to 7:30 - Presentation and Q&A
7:30 to 9:00 - Open Discussions
If you can't attend in person, join our live stream starting at 6:30 pm via this link.
Agent swarms can complete multi-month to year software development tasks. What does this mean for timelines? Julian Moncarz provides an overview of safety risks that emerge when agents work in swarms.
Event Schedule
6:00 to 6:30 - Food and introductions
6:30 to 7:30 - Presentation and Q&A
7:30 to 9:00 - Open Discussions
If you can't make it in person, feel free to join the live stream starting at 6:30 pm, via this link.
This is part of our weekly AI Safety Thursdays series. Join us in examining questions like:
This event will take place on the twelfth floor. If you have any problems getting upstairs, call or text Georgia at 519 981 0360.
Who's governing the AI your doctor trusts?
Healthcare AI is crossing thresholds faster than governance can keep pace. In Utah, an AI is now legally prescribing medication without a doctor. In January, OpenAI launched ChatGPT Health, and Anthropic released Claude for Healthcare. AI companies are benchmarking their models against expert clinicians' performance and rapidly closing the gap—and, in some cases, surpassing them. Meanwhile, Elon Musk is encouraging users to upload medical imaging to Grok for analysis.
This talk will explore how today's fragmented regulatory approach creates systemic vulnerabilities that could amplify as systems become more capable. Pascal Thibeault examine the growing chasm between AI deployment and AI governance in clinical care, and consider which governance frameworks might help prepare us for a future where AI systems operate with increasing...
Kathrin Gardhouse will present on her recent paper with the same title that discusses various governance challenges that AI agents pose and the adequacy of the EU AI Act's response to these challenges, including the institutional implementation, ie, the self-regulation approach, distributed enforcement, and institutional capacity and resourcing.
Event Schedule
6:00 to 6:30 - Food and introductions
6:30 to 7:30 - Presentation and Q&A
7:30 to 9:00 - Open Discussions
If you can't attend in person, join our live stream starting at 6:30 pm via this link.
This is part of our weekly AI Policy Tuesdays series. Join us in examining questions like:
Continual learning is a core ability behind humans’ ability to perform economically valuable tasks. As frontier AI companies train LLM agents to be increasingly autonomous, they are likely aggressively pursuing approaches to continual learning in LLMs. In this talk, Rohan Subramani and Rauno Arike discuss the potential implications of this development on the capabilities and of LLM agents and its effect on AI safety and alignment.
Event Schedule
6:00 to 6:30 - Food and introductions
6:30 to 7:30 - Presentation and Q&A
7:30 to 9:00 - Open Discussions
If you can't make it in person, feel free to join the live stream starting at 6:30 pm, via this link.
This is part of our weekly AI Safety Thursdays series. Join us in examining questions like:
If there is no-one in the lobby to escort you past security when you arrive, please call Georgia at 519 981 0360.
At the 2026 World Economic Forum, Demis Hassabis, CEO of Google DeepMind; Dario Amodei, CEO of Anthropic; and Zanny Minton Beddoes, Editor-in-Chief of The Economist, met to discuss AGI risks, governance, and global impact.
We will be meeting to watch the discussion in full. Afterwards, we will split off into small groups to discuss the video. If you've already seen the video, you can feel free to arrive at 7:00 to jump into discussion.
Food will be served.
If you arrive and nobody is in the lobby to help you past security, please call or text Georgia at 519 981 0360.
Kathrin Gardhouse presents her draft paper on how liability insurance could function as a form of private regulation for frontier AI, translating catastrophic risk into enforceable safety standards rather than box-ticking compliance.
The talk outlines a proposed “minimum insurability pathway” for AI developers and explores whether and how a narrow, restrictive insurance mandate could meaningfully reduce catastrophic AI risks while complementing public regulation.
Event Schedule
6:00 to 6:30 - Food and introductions
6:30 to 7:30 - Presentation and Q&A
7:30 to 9:00 - Open Discussions
Adversarial robustness remains a key concern in AI safety, with many interventions focusing on mitigating models’ capabilities to assist in harmful or criminal tasks. But how do LLMs behave in sociopolitical contexts, especially when faced with ambiguity?
Punya Syon Pandey will discuss research on accidental vulnerabilities induced by fine-tuning, and introduce new methods to measure sociopolitical robustness, highlighting broader implications for safe societal integration.
Event Schedule
6:00 to 6:30 - Food and introductions
6:30 to 7:30 - Presentation and Q&A
7:30 to 9:00 - Open Discussions
Recent research from Anthropic and Redwood Research has shown that "reward hacking" is more than just a nuisance: it can be a seed for broader misalignment.
Evgenii Opryshko explores how models that learn to exploit vulnerabilities in coding environments can generalize to concerning capabilities, such as unprompted alignment faking and cooperating with malicious actors.
Event Schedule
6:00 to 6:30 - Food and introductions
6:30 to 7:30 - Presentation and Q&A
7:30 to 9:00 - Open Discussions
LLMs are shifting the cybersecurity balance—in favour of attackers.
The barrier to sophisticated cyberattacks is dropping fast. What once required large teams and significant budgets can now be accomplished by smaller groups with moderate expertise. Personalized phishing campaigns that took weeks now take hours. Zero-day vulnerabilities that took a syndicate to detect are now discovered by independent actors.
While these attacks still require technical knowledge, the pool of capable threat actors and scale of impact are rapidly expanding. While LLM-powered attacks are proliferating, but cutting-edge defences are only accessible to well-resourced frontier labs—leaving most organizations exposed. Diana Sarbakysh (Network Engineer, and alumni of first AI security bootcamp) will:
This talk is a call to action for corporate practitioners, security professionals, researchers, and anyone interested in improving the state of defences at large.
As AI advances and impacts cybersecurity, the evolution of the offence-defence balance will have profound implications.
Leo Zovic provides an update on his work to deploy AI agents to detect bugs, with potentially widespread impacts on defensive hardening of code at scale.
Registration Instructions
This is a paid event ($5 general admission, free for students & job seekers) with limited tickets - you must RSVP on Luma to secure your spot.
Event Schedule
6:00 to 6:30 - Food and introductions
6:30 to 7:30 - Presentation and Q&A
7:30 to 9:00 - Open Discussions
This is part of our weekly AI Safety Thursdays series. Join us in examining questions like: