FYI, I wouldn't say at all that AI safety is under-represented in the EU (if anything, it would be easier to argue the opposite). Many safety orgs (including mine) supported the Codes of Practice, and almost all the Chairs and Vice-Chairs are respected governance researchers. It's probably still good for people to give feedback; I just don't want to give the impression that this is a neglected area.
Also, as far as I know, no public statement of intent to sign the code has been made. That said, apart from the copyright section, most of it is in line with RSPs, which makes signing more reasonable.
Thank you, Ariel! I guess I've let my personal opinion shine through. In general, I do not see many regulatory efforts that articulate alignment or interpretability requirements or translate them into actionable compliance obligations. The AI Act mentions alignment only vaguely, for example.
And as far as I saw, the third draft of the Codes (Safety & Security) mentions alignment / misalignment with a "this may be relevant to include" tone rather than providing specifics as to how GenAI providers are expected to document individual misalignment risks and appropriate mitigation strategies.
And interpretability / mech interp is not mentioned at all, not even in the context of model explainability or transparency.
This is why I hoped to see feedback from this community: to find out whether I am overstating my concerns.
I'd suggest updating the language in the post to clarify things and not overstate :)
Regarding the 3rd draft: opinions varied among the people I work with, but we are generally happy. Loss of Control is included in the selected systemic risks, as is CBRN. Appendix 1.2 also has useful things, though some valid concerns were raised there about compatibility with the AI Act's language that still need tweaking (possibly merging parts of 1.2 into the selected systemic risks). As for interpretability, the code is meant to be outcome-based, and the main reason evals are mentioned is that they are in the Act. Prescribing interpretability isn't something the code can do, and it probably shouldn't, since these techniques aren't good enough yet to be mandated for mitigating systemic risks.
I was reading "The Urgency of Interpretability" by Dario Amodei, and the following part made me think about our discussion.
"Second, governments can use light-touch rules to encourage the development of interpretability research and its application to addressing problems with frontier AI models. Given how nascent and undeveloped the practice of “AI MRI” is, it should be clear why it doesn’t make sense to regulate or mandate that companies conduct them, at least at this stage: it’s not even clear what a prospective law should ask companies to do. But a requirement for companies to transparently disclose their safety and security practices (their Responsible Scaling Policy, or RSP, and its execution), including how they’re using interpretability to test models before release, would allow companies to learn from each other while also making clear who is behaving more responsibly, fostering a “race to the top”."
I agree with you that, at this stage, a regulatory requirement to disclose the interpretability techniques used to test models before release would not be very useful for outcome-based CoPs. But I hope there is a path forward for this approach in the near future.
Thanks a lot for your follow-up. I'd love to connect on LinkedIn if that's okay; I'm very grateful for your feedback!
I'd say: "I believe that more feedback from alignment and interpretability researchers is needed" instead. Thoughts?
Sure! And regarding edits: I have not gone through the full request for feedback yet; I expect to have a better sense late next week of which contributions are most needed and how to prioritize. I mainly wanted to comment first on the obvious things that stood out to me from the post.
There is also an Evals workshop in Brussels on Monday where we might learn more. I know of some non-EU-based technical safety researchers who are attending, which is great to see.
Thanks for sharing this. Based on the About page, my 'vote' as an EU citizen working in an ML/AI position could conceivably count for a little more, so it seems worth doing. I'll put it in my backlog and aim to get to it on time (it does seem like a lengthy task).
It will probably be lengthy, but thank you very much for contributing! DM me if you come across any "legal" questions about the AI Act :)
What AI safety researchers should weigh in on:
- Whether training compute is a sufficient proxy for generality or risk, and what better metrics might exist.
- How to define and detect emergent capabilities that warrant reclassification or new evaluations.
- What kinds of model evaluations or interpretability audits should qualify as systemic risk mitigation.
- How downstream fine-tuning workflows (RLHF, scaffolding, etc.) may create latent alignment risk even at moderate compute scales.
- How the Code of Practice could embed meaningful safety standards beyond documentation. E.g., through commitments to research mechanistic transparency, continuous monitoring, and post-deployment control mechanisms.
This is where I'd personally hope that anyone giving feedback focuses their attention.
Even just to bring the right strategies to the attention of policy spheres.
Please, be thorough. I've provided a breakdown of the main points in the Targeted consultation document, but I'd recommend looking at the Safety and Security section of the Third draft of the General-Purpose AI Code of Practice.
It can be downloaded here: Third Draft of the General-Purpose AI Code of Practice published, written by independent experts | Shaping Europe’s digital future
In an ideal world, well-meaning regulation coming from the EU could become a global standard and really make a difference. In reality, however, I see little value in EU-specific regulations like these. They are unlikely to impact frontier AI companies such as OpenAI, Anthropic, Google DeepMind, xAI, and DeepSeek, all of which are based outside the EU. These firms might accept the cost of exiting the EU market if regulations become too burdensome.
While the EU market is significant, in a fast-takeoff, winner-takes-all AI race (as outlined in the AI-2027 forecast), market access alone may not sway these companies’ safety policies. Worse, such regulations could backfire, locking the EU out of advanced AI models and crippling its competitiveness. This could deter other nations from adopting similar rules, further isolating the EU.
As an EU citizen, I view the game theory in an "AGI-soon" world as follows:
Alignment Hard
EU imposes strict AI regulations → Frontier companies exit the EU or withhold their latest models, continuing the AI race → Unaligned AI emerges, potentially catastrophic for all, including Europeans. Regulations prove futile.
Alignment Easy
EU imposes strict AI regulations → Frontier companies exit the EU, continuing the AI race → Aligned AI creates a utopia elsewhere (e.g., the US), while the EU lags, stuck in a technological "stone age."
Both scenarios are grim for Europe.
I could be mistaken, but the current US administration and leaders of top AI labs seem fully committed to a cutthroat AGI race, as articulated in situational awareness narratives. They appear prepared to go to extraordinary lengths to maintain supremacy, undeterred by EU demands. Their primary constraints are compute and, soon, energy - not money! If AI becomes a national security priority, access to near-infinite resources could render EU market losses a minor inconvenience. Notably, the comprehensive AI-2027 forecast barely mentions Europe, underscoring its diminishing relevance.
For the EU to remain significant, I see two viable strategies:
OpenAI, Anthropic, and Google DeepMind are already the main signatories to these Codes of Practice.
So whatever is agreed and negotiated is what will impact frontier AI companies. That is the problem.
I'd love to see specific criticisms from you on sections 3, 4 or 5 of this post! I am happy to provide feedback myself based on useful suggestions that come up in this thread.
Do you have any public evidence that OpenAI, Anthropic and Google DeepMind will sign?
From my perspective, this remains uncertain and will likely depend on several factors, including the position of the US government and the final code's content (particularly regarding measures that are unpopular among companies, such as the independent third-party assessment in Measure 11).
My understanding is that they expressed willingness to sign, but lobbying efforts on their side are still ongoing, as is the entire negotiation.
The only big provider I've heard has explicitly refused to sign is Meta: EIPA in Conversation With - Preparing for the EU GPAI Codes of Practice (somewhere between minutes 34 and 38).
The European AI Office is currently writing the rules for how general-purpose AI (GPAI) models will be governed under the EU AI Act.
They are explicitly asking for feedback on how to interpret and operationalize key obligations under the AI Act.
This includes the thresholds for systemic risk, the definition of GPAI, how to estimate training compute, and when downstream fine-tuners become legally responsible.
The largest labs (OpenAI, Anthropic, Google DeepMind) have already expressed willingness to sign on to the Codes of Practice voluntarily.
These codes will become the de facto compliance baseline, and potentially a global reference point.
I believe that more feedback from alignment and interpretability researchers is needed
Input is urgently needed to ensure the guidelines reflect more specific concerns around misalignment, loss of control, emergent capabilities, robust model evaluation, and the need for interpretability audits.
Key intervention points include how "high-impact capabilities" are defined, what triggers systemic risk obligations, and how documentation, transparency, and ongoing risk mitigation should be operationalized for frontier models.
Without this input, we risk locking in a governance regime that optimizes for PR risk (copyright and vague definitions of bias), not existential or alignment-relevant risk.
This is a rare opportunity for independent voices to influence what responsible governance of frontier models should actually require. If left unchallenged, vague obligations could crystallize into compliance practices that are performative... what I understand as fake.
You do not need to be a European citizen: anyone with relevant expertise can provide feedback (but please make sure to select the correct category under "Which stakeholder category would you consider yourself in?").
📅 Feedback is open until 22 May 2025, 12:00 CET.
🗳️ Submit your response here
I haven’t yet seen a technical, safety-focused summary of what the GPAI Codes of Practice are actually aiming to regulate, so I’ve put one together.
I hope it’s useful to the AI safety community. Since the full breakdown is long, here’s a TL;DR:
The Commission’s upcoming guidelines will define how the EU interprets the obligations for general-purpose AI providers under the AI Act. The guidelines will cover:
The AI Act defines a "general-purpose AI model" as a foundation model that displays significant generality: it can perform a wide range of tasks and is used downstream across multiple applications.
Critically, the EU is trying to separate GPAI models from full AI systems, making it clear that the underlying model (e.g. LLaMA, GPT-4) can trigger obligations even if it's not wrapped in a user-facing product.
This means that training and release decisions upstream carry regulatory weight, even before fine-tuning or deployment.
Notably, models used exclusively for research, development, or prototyping are excluded, until they are released. Once placed on the market, obligations kick in.
Because there’s no mature benchmark for generality yet, the EU is anchoring its definition of GPAI on training compute. Their proposed threshold:
A model is presumed to be GPAI if it can generate text or images and was trained with >10²² FLOP.
This threshold acts as a presumption of generality.
If your model meets this compute threshold and has generative capacity, the EU assumes it can perform many distinct tasks and should be governed as a GPAI model.
However, the presumption is rebuttable: you can argue your model is too narrow despite the compute, or that a low-compute model has generality due to capabilities.
This is a crude but actionable standard. The EU is effectively saying:
Examples show how this might play out:
This marks the first time a training compute threshold is being proposed as a regulatory signal for generality.
While imperfect, it sets a precedent for model-based regulation, and may evolve into a cornerstone of GPAI classification across other jurisdictions.
It also suggests a future where training disclosures and FLOP estimates become a key part of legal compliance.
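To make the presumption concrete, here is a minimal sketch of how the check could look in code. Only the >10²² FLOP figure, the text-or-image condition, and the rebuttable nature of the presumption come from the draft guidelines; the function and the example numbers are mine.

```python
# Illustrative only: the constant reflects the proposed presumption threshold;
# the function name and structure are my own.

GPAI_PRESUMPTION_FLOP = 1e22  # proposed threshold for presumed generality


def is_presumed_gpai(training_flop: float, generates_text_or_images: bool) -> bool:
    """A model is presumed GPAI if it is generative (text or images) and was
    trained with more than ~1e22 FLOP. The presumption is rebuttable either way."""
    return generates_text_or_images and training_flop > GPAI_PRESUMPTION_FLOP


print(is_presumed_gpai(training_flop=3e23, generates_text_or_images=True))  # True: presumed GPAI
print(is_presumed_gpai(training_flop=5e21, generates_text_or_images=True))  # False: presumption not triggered
```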
The EU is trying to draw a regulatory line between what counts as a new model versus a new version of an existing model.
This matters because many obligations are triggered per model, so if you release a "new" model, you might have to redo everything.
The preliminary approach is simple:
A “distinct model” starts with a new large pre-training run.
Everything based on that run (fine-tunes, upgrades, or checkpoints) is a model version.
However, if a model update by the same provider uses a large amount of compute (defined as >⅓ of the original model’s threshold), and it significantly changes the model’s risk profile, then it might count as a new model even if it's technically a version.
The thresholds are:
This distinction has direct implications for:
If this holds, providers could:
This is effectively a compliance modularity rule. It lets labs scale governance across model variants, but still holds them to task if new versions introduce emergent risk.
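For illustration, a rough sketch of how the version-vs-new-model heuristic could be encoded. The one-third figure comes from the draft text above; treating "significantly changes the risk profile" as a boolean judgement call, and the function itself, are my own simplifications.

```python
# Illustrative sketch of the "new model vs. model version" heuristic described above.
# The 1/3 figure comes from the draft; modelling the risk-profile change as a
# boolean judgement call is my simplification.

def classify_update(update_flop: float,
                    original_threshold_flop: float,
                    significantly_changes_risk_profile: bool) -> str:
    """Classify an update by the same provider as a distinct model or a model version."""
    large_update = update_flop > original_threshold_flop / 3
    if large_update and significantly_changes_risk_profile:
        return "potentially a new distinct model (per-model obligations may re-trigger)"
    return "model version of the original pre-training run"


# A light fine-tune using well under a third of the relevant threshold stays a version.
print(classify_update(update_flop=1e21,
                      original_threshold_flop=1e22,
                      significantly_changes_risk_profile=False))
```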
For safety researchers, this section could be leveraged to advocate for stronger triggers for reclassification, especially in the face of rapid capability shifts from relatively small training updates.
The EU distinguishes between providers (entities that develop or significantly modify general-purpose AI models) and deployers (those who build or use AI systems based on those models).
There’s special attention here on downstream actors: those who fine-tune or otherwise modify an existing GPAI model.
The guidelines introduce a framework to determine when downstream entities become providers in their own right, triggering their own set of obligations.
Key scenarios where an entity is considered a provider:
Even collaborative projects can count as providers, usually via the coordinator or lead entity.
This is one of the most consequential parts of the guidelines for open-source model ecosystems and fine-tuning labs.
The EU draws a line between minor modifications (e.g., light fine-tunes) and substantial overhauls that make you legally responsible for the resulting model.
You’re presumed to become a new “provider” (with specific compliance obligations) if:
In this case, you are only responsible for the modification, not the full model: meaning your documentation, data disclosure, and risk assessment duties apply only to the part you changed.
Things get more serious if you are modifying or contributing to a model that crosses the systemic risk threshold (currently 10²⁵ FLOP total training compute).
You’re treated as a new provider of a GPAI model with systemic risk if:
In these cases:
While no current modifications are assumed to meet this bar, the guidelines are explicitly future-proofing for when downstream players (including open-source projects) have access to higher compute.
The AI Act includes limited exemptions for open-source GPAI models, but only if specific criteria are met.
Critically, monetized distribution invalidates the exemption: this includes charging for access, offering paid support services, or collecting user data for anything other than basic interoperability or security.
The EU is formalizing training compute as a trigger for systemic risk obligations, creating a scalable governance mechanism that doesn't rely on subjective capability benchmarks.
While I am not fully convinced that this is the best benchmark, these rules open the door to continuous monitoring and reporting infrastructure for high-compute training runs, and I think that's something alignment researchers could potentially build around.
If your model crosses certain FLOP thresholds, you're presumed to be developing:
To apply these thresholds, the EU is proposing methods to estimate compute, and defining when you need to notify the Commission if you're approaching or crossing them.
The EU outlines two accepted methods:
For transformers:
FLOP ≈ 6 × Parameters × Training Examples
Either method is valid. Providers choose based on feasibility, but must document assumptions if using approximations (e.g., for synthetic data generation).
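As a worked example of the architecture-based approximation, here is a short sketch applying the 6 × parameters × training examples rule and comparing the result against the two thresholds discussed in this post. The parameter and example counts are placeholders, not estimates for any real model.

```python
# Placeholder numbers only: a worked example of FLOP ≈ 6 × parameters × training examples,
# compared against the two thresholds discussed in this post.

def approximate_training_flop(n_parameters: float, n_training_examples: float) -> float:
    """Architecture-based approximation for transformer training compute."""
    return 6 * n_parameters * n_training_examples


flop = approximate_training_flop(n_parameters=70e9, n_training_examples=2e12)
print(f"Estimated training compute: {flop:.2e} FLOP")  # ≈ 8.40e+23 FLOP
print("Presumed GPAI (>1e22)?", flop > 1e22)           # True
print("Systemic risk (>1e25)?", flop > 1e25)           # False
```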
The “cumulative compute” for regulatory purposes includes:
It does not include:
This cumulative total determines whether a model passes the systemic risk threshold (10²⁵ FLOP).
The guidance also covers model compositions. E.g., Mixture-of-Experts architectures must include compute from all contributing models.
You are expected to estimate compute before the large pre-training run begins (based on planned GPU allocations or token counts). If you're not above the threshold at first, you're still required to monitor ongoing compute usage and notify the EU Commission if you cross the threshold later.
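Here is a hypothetical sketch of what that obligation could look like operationally: estimate before the run, keep a running total during it, and flag when a notification would be needed. Only the 10²⁵ FLOP figure and the estimate-then-monitor expectation come from the guidance; the class, its names, and the notification logic are invented for illustration.

```python
# Hypothetical sketch: only the 1e25 FLOP figure and the "estimate before the run,
# monitor during it, notify if crossed" expectation come from the guidance above.

SYSTEMIC_RISK_FLOP = 1e25


class ComputeTracker:
    """Tracks planned and cumulative training compute against the systemic-risk threshold."""

    def __init__(self, planned_flop: float):
        self.cumulative_flop = 0.0
        self.notified = False
        if planned_flop > SYSTEMIC_RISK_FLOP:
            self._notify("planned run is expected to exceed the systemic-risk threshold")

    def log_training_step(self, step_flop: float) -> None:
        self.cumulative_flop += step_flop
        if not self.notified and self.cumulative_flop > SYSTEMIC_RISK_FLOP:
            self._notify("cumulative compute has crossed the systemic-risk threshold")

    def _notify(self, reason: str) -> None:
        # Placeholder for whatever internal process triggers a Commission notification.
        self.notified = True
        print(f"Notification required: {reason}")


tracker = ComputeTracker(planned_flop=8e24)  # below the threshold at planning time
tracker.log_training_step(6e24)
tracker.log_training_step(5e24)              # crossing the line triggers one notification
```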
Entry into application of the GPAI obligations: 2 August 2025
Providers who placed GPAI models on the market before that date have until 2 August 2027 to comply.
This includes documentation, risk assessment, and training data transparency, though retroactive compliance is not required if it would involve retraining, unlearning, or disproportionate effort.
This gives labs and developers a two-year compliance window, but only for models placed on the market before 2 August 2025. Anything new after that date must comply immediately.
Enforcement of the Code of Practice
The EU’s Code of Practice (CoP) will become the default pathway for demonstrating compliance with the AI Act’s obligations for general-purpose AI models. It’s not legally binding, but adhering to it comes with clear enforcement advantages:
Non-signatories must prove compliance through alternative methods, such as detailed reporting, gap analyses, or independent evaluations, and may face higher scrutiny.
Supervision by the AI Office
The AI Office will be the lead enforcement authority for all GPAI model obligations. Enforcement officially begins on 2 August 2026, following a one-year grace period.
The AI Office can:
Confidential business information and IP will be protected under Article 78, but the Commission is investing in a long-term regulatory infrastructure for evaluating frontier models.
If you have any questions or would like to discuss specific sections in more detail, feel free to comment below.
I’ll do my best to answer, either directly or by reaching out to contacts in the Working Groups currently drafting and negotiating the Codes of Practice.
You can also find additional contact details in my LessWrong profile.
If you're considering submitting feedback and would like to coordinate, compare notes, or collaborate on responses, please reach out! I'd be happy to connect.