LESSWRONG
is fundraising!
LW

Musings from a Lawyer turned AI Safety researcher (ShortForm) — LessWrong

Musings from a Lawyer turned AI Safety researcher (ShortForm)

3rd Mar 2025

2 min read

1

This is a special post for quick takes by Katalina Hernandez. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

I will use the Shortform to link my posts and Quick Takes.

LongForm post for debate: The EU Is Asking for Feedback on Frontier AI Regulation (Open to Global Experts)—This Post Breaks Down What’s at Stake for AI Safety

The European AI Office is currently finalizing Codes of Practice to regulate how general-purpose AI (GPAI) models be governed under the EU AI Act. They are explicitly asking for global expert input, and feedback is open to anyone, not just EU citizens.

The guidelines in development will shape:

The definition of "systemic risk"
How training compute triggers obligations
When fine-tuners or downstream actors become legally responsible
What counts as sufficient transparency, evaluation, and risk mitigation

Major labs (OpenAI, Anthropic, Google DeepMind) have already expressed willingness to sign the upcoming Codes of Practice. These codes will likely become the default enforcement standard across the EU and possibly beyond.

So far, AI safety perspectives are seriously underrepresented.

Without strong input from AI Safety researchers and technical AI Governance experts, these rules could lock in shallow compliance norms (mostly centered on copyright or reputational risk) while missing core challenges around interpretability, loss of control, and emergent capabilities.

I’ve written a detailed Longform post breaking down exactly what’s being proposed, where input is most needed, and how you can engage.

Even if you don’t have policy experience, your technical insight could shape how safety is operationalized at scale.

📅 Feedback is open until 22 May 2025, 12:00 CET
🗳️ Submit your response here

Happy to connect with anyone individually for help drafting meaningful feedback.

Quick Take: "Alignment with human intent" explicitly mentioned in European law

The AI alignment community had a major victory in the regulatory landscape, and it went unnoticed by many.

The EU AI Act explicitly mentions "alignment with human intent" as a key focus area in relation to regulation of systemic risks.

As far as I know, this is the first time “alignment” has been mentioned by a law, or major regulatory text.

It’s buried in Recital 110, but it’s there.

And it also makes research on AI Control relevant:

"International approaches have so far identified the need to pay attention to risks from potential intentional misuse or unintended issues of control relating to alignment with human intent".

The EU AI Act also mentions alignment as part of the Technical documentation that AI developers must make publicly available.

This means that alignment is now part of the EU’s regulatory vocabulary.

LongForm post for debate: For Policy’s Sake: Why We Must Distinguish AI Safety from AI Security in Regulatory Governance

𝐓𝐋;𝐃𝐑

I understand that Safety and Security are two sides of the same coin.

But if we don’t clearly articulate 𝐭𝐡𝐞 𝐢𝐧𝐭𝐞𝐧𝐭 𝐛𝐞𝐡𝐢𝐧𝐝 𝐀𝐈 𝐬𝐚𝐟𝐞𝐭𝐲 𝐞𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧𝐬, we risk misallocating stakeholder responsibilities when defining best practices or regulatory standards.

For instance, a provider might point to adversarial robustness testing as evidence of “safety” compliance: when in fact, the measure only hardens the model against 𝐞𝐱𝐭𝐞𝐫𝐧𝐚𝐥 𝐭𝐡𝐫𝐞𝐚𝐭𝐬 (security), without addressing the internal model behaviors that could still cause harm to users.

𝐈𝐟 𝐫𝐞𝐠𝐮𝐥𝐚𝐭𝐨𝐫𝐬 𝐜𝐨𝐧𝐟𝐥𝐚𝐭𝐞 𝐭𝐡𝐞𝐬𝐞, 𝐡𝐢𝐠𝐡-𝐜𝐚𝐩𝐚𝐛𝐢𝐥𝐢𝐭𝐲 𝐥𝐚𝐛𝐬 𝐦𝐢𝐠𝐡𝐭 "𝐦𝐞𝐞𝐭 𝐭𝐡𝐞 𝐥𝐞𝐭𝐭𝐞𝐫 𝐨𝐟 𝐭𝐡𝐞 𝐥𝐚𝐰" 𝐰𝐡𝐢𝐥𝐞 𝐛𝐲𝐩𝐚𝐬𝐬𝐢𝐧𝐠 𝐭𝐡𝐞 𝐬𝐩𝐢𝐫𝐢𝐭 𝐨𝐟 𝐬𝐚𝐟𝐞𝐭𝐲 𝐚𝐥𝐭𝐨𝐠𝐞𝐭𝐡𝐞𝐫.

Opinion Post: Scaling AI Regulation: Realistically, what Can (and Can’t) Be Regulated?

Should we even expect regulation to be useful for AI safety?
Is there a version of AI regulation that wouldn’t be performative?
How do you see the "Brussels effect" playing out for AI Safety?
Are regulatory sandboxes a step in the right direction?

Musings from a Lawyer turned AI Safety researcher (ShortForm)

61 comments, sorted by

top scoring

Click to highlight new comments since: Today at 5:06 PM

[-]Katalina Hernandez3mo6911

Luiza Jarovsky just endorsed IABIED. This is actually significant.

Luiza Jarovsky is one of the most influential people in the corporate AI Governance space right now: her newsletter has 80,000+ subscribers (mostly lawyers in the Data and Tech space) and has trained 1,300+ professionals from Google, Amazon, Microsoft, Meta, Apple, and the likes.

Her audience are basically compliance lawyers who think AI safety means "don't be racist" not "don't kill everyone." For her to recommend IABIED to that network is a non-trivial update on Overton window movement. These people literally sit in deployment decision meetings at Fortune 500s.

Corporate governance crowd is normally immune to longtermist args, but IABIED is cracking that.

[-]Ben Pace3mo387

I thought this meant she endorsed the title as her beliefs, so in case anyone else was confused, she has endorsed the book as worth reading, but not (to my knowledge) endorsed its thesis as "what I believe".

Here's what she writes:

Although, in general, I disagree with catastrophic framings of AI risk (which have been exploited by AI CEOs to increase interest in their products, as I recently wrote in my newsletter), the AI safety debate is an important one, and it concerns all of us.

There are differing opinions on the current path of AI development and its possible futures. There are also various gray zones and unanswered questions on possible ways to mitigate risk and avoid harm.

Yudkowsky has been researching AI alignment for over 20 years, and together with Soares, he has built a strong argument for why AI safety concerns are urgent and why action is needed now. Whether you agree with their tone or not, their book is worth reading.

Seems that she finds Eliezer+Nate credible in their concerns because they are not AI company CEOs but have been working on the problem for 2 decades.

[-]Taylor G. Lunt3mo135

Wow, those LinkedIn comments are ignorant. That's a level of basic, uninformed stupidity I haven't been exposed to in a long time.

I’ll tell you why this is wrong. Throughout history the most educated people have sort peace, it’s the least educated and most ignorant that have caused most wars

Many comments of this caliber. Setting aside the fact this is obviously wrong, and the fact that you shouldn't be commenting on a book without having read a single word of it, and all the other negative things you can say, this makes me wonder how much of a bubble I'm living in. Is this really how people in general think?

I normally frequent Hacker News and LessWrong and so on. I naturally stay away from places like LinkedIn because they make me cringe. I wonder if that instinct has been giving me a warped view of humanity.

I wonder if there would be any value in going into such less-enlightened spaces and fighting the good fight, debating people like a 2000s-era atheist. It seems to have mostly worked out for the atheists.

[-]Viliam3mo158

Is this really how people in general think?

Yes, I think this is considered standard outside the rationalist community. Strong opinions, zero evidence, sometimes statements that contradict facts that should be common knowledge, the general vibe of "I am very smart and everyone who disagrees with me is an idiot and will be attacked verbally".

It is so easy to forget when you are inside the rationalist bubble. But I think it is a reason why some people are so attracted to the rationalist bubble (even if they may not care about rationality per se, e.g. many ACX readers).

Hacker News is much better than the average, but even there I often find factually wrong trivially verifiable statements that remain unchallenged and upvoted as long as they have the right vibes. Even if I reply with a short factual correction and link the evidence, no one seems to care.

Many frequent users of LinkedIn are managers, and those are subject to specific selection pressures. I understand that comment as a public reminder for Luiza Jarovsky that she is outside the Overton window. Translated to autistic speech, the comment says: "High-status people propose solutions, low-status people complain about problems. The book mostly talks about problems, therefore it is low-status and you should not associate with it."

EDIT: Oh, more horrible comments:

A: I would recommend doing some research on the authors before advertising their books… just a thought
B: Can you tell us what our findings then would be? With links, perhaps?
A: you’re the editor and researcher right? You’ll find plenty of reasons why this shouldn’t be endorsed.

This would get an instant ban on ACX. The usual frustrating "there is a problem with your argument, but I am not telling you what it is". Like a monkey throwing feces.

capitalism already killed humans

Oh sure, that's why you are still typing. What kind of discourse is possible with people who can't distinguish between a literal meaning and a metaphor?

(Compared to this, comments like "The tool cannot outpace the source" are at least honest arguments.)

[-]Kabir Kumar3mo51

wonder if there would be any value in going into such less-enlightened spaces and fighting the good fight, debating people like a 2000s-era atheist. It seems to have mostly worked out for the atheists.

Not in a debatey way, but in an informative way, yes. Pretty easy too

[-]Taylor G. Lunt3mo10

You have done so?

[-]Kabir Kumar3mo30

yes. https://www.linkedin.com/feed/update/urn:li:activity:7381791560054546432/?commentUrn=urn%3Ali%3Acomment%3A(activity%3A7381791560054546432%2C7381807819039088640)&dashCommentUrn=urn%3Ali%3Afsd_comment%3A(7381807819039088640%2Curn%3Ali%3Aactivity%3A7381791560054546432)&dashReplyUrn=urn%3Ali%3Afsd_comment%3A(7381818125865943040%2Curn%3Ali%3Aactivity%3A7381791560054546432)&replyUrn=urn%3Ali%3Acomment%3A(activity%3A7381791560054546432%2C7381818125865943040)

[-]Taylor G. Lunt3mo10

Nice. It seems like the comment was pretty much ignored though, which is a tactic I think normal people often use with comments they don't like.

[-]Kabir Kumar2mo41

Not really - the guy DMed me and we had a call. He wanted to learn more. Also, talked about meeting up in Washington, when I host an event there.

[-]Taylor G. Lunt2mo20

Wow, that's awesome. I've never had someone I refuted online try to schedule a call with me.

[-]Kabir Kumar2mo40

Huh. I think I've had at least 3. And done so myself for at least 2 people who refuted me

[-]samuelshadrach3mo10

gpt-5 can read IABIED, you can just ask gpt-5 to predict what Eliezer's replies to these comments would be and send those. Mikhael Samin has a chatbot for this, Davey Morse used to host one if I remember, there are probably others.

[-]O O3mo73

She is? She just seems like a standard LinkedIn grifter.

[-]Katalina Hernandez3mo*62

Perhaps this is a better reference point: https://academy.aitechprivacy.com/ai-governance-training

Her academy is very, very popular among Tech DPOs and lawyers (data protection officers). I am not saying she isn't a typical Linkedin influencer.

But her posting about something, in the data protection and corp AI Governance ^[1]world (in terms of influence), is akin to us seeing Zvi Mowshowitz post about something. Does this make sense?

^{^}
DPOs are mostly also doing AI Governance in the tech and tech adjacent industry now. I should do a post about this because it may be part of the problem soon.

[-]O O3mo10

I'm not in law, but it seems more like an online course trying to be sold to me than a real conference. There is a long list of company logos, a bunch of credits and certifications promised, and a large blob of customer testimonials.

Did some quick googling and an actual conference would look like this. https://www.lsuite.co/techgc

I'm surprised this comment has so many upvotes. Did anyone actually click the link?

[-]Viliam3mo73

I clicked the link. I usually don't read LinkedIn, but I think the things that rub you (and me) the wrong way are simply how the LinkedIn power users communicate normally. Their bubble is not ours.

Seems to me that Luiza Jarovsky made a big step outside the Overton window (and a few replies called her out on that), which is as much as we could reasonably hope for. Mentioning the book as "important, although I disagree with the framing" is a huge improvement over the "low-status shit, don't touch that" current status.

[-]Katalina Hernandez3mo50

Thank you! You've managed to explain exactly what I thought when I saw this link. And re the LinkedIn comment - I'm actually surprised that people are surprised. I know people who post very high quality articles there, but mostly it's become slop land. The pattern I'm noticing is: LinkedIn writers who value quality slowly transitioning to Substack, and those in their audiences that want to think moving with them.

[-]Katalina Hernandez3mo30

It's not a conference, it's an online course and it's one of the most popular among privacy professionals. The most popular being those offered by IAPP (International Association of Privacy Professionals). She's legitimately that well regarded. I'd love to know where the disconnect is for you?

Although ironically, I was talking to my boyfriend about how people in law or compliance would have the same reaction ("what? This guy is important?") if I said so about Zvi and just linked his Substack XD. I guess different impressions in different communities.

[-]Caleb Biddulph3mo2-3

She does seem like a LinkedIn grifter, but if she's a popular LinkedIn grifter I guess this could mean something.

I'm not sure if important people at Fortune 500s are reading LinkedIn grifter newsletters. Or if Fortune 500s that aren't Alphabet or Nvidia are actually relevant for AI.

Maybe Luisa Jarovsky's recommendation is primarily important as an indicator that "normies" (who can vote, etc.) are aware of IABIED.

This is the 29th book Luisa has recommended for her "AI book club," so possibly she just needed something to recommend and IABIED is a recent AI book with a lot of marketing around it. And even in her recommendation, she mentions that she "disagrees with catastrophic framings of AI risk."

[-]Warty3mo33

she "disagrees with catastrophic framings of AI risk."

that's not very consistent with my understanding of the words "endorsed IABIED" from OP

[-]Katalina Hernandez3mo*302

This is what she says:

Why read:

Although, in general, I disagree with catastrophic framings of AI risk (which have been exploited by AI CEOs to increase interest in their products, as I recently wrote in my newsletter), the AI safety debate is an important one, and it concerns all of us.

There are differing opinions on the current path of AI development and its possible futures. There are also various gray zones and unanswered questions on possible ways to mitigate risk and avoid harm.

Yudkowsky has been researching AI alignment for over 20 years, and together with Soares, he has built a strong argument for why AI safety concerns are urgent and why action is needed now. Whether you agree with their tone or not, their book is worth reading.

[-]Katalina Hernandez4mo6810

The UN General Assembly just passed a resolution to set up an Independent Scientific Panel on Artificial Intelligence and an annual Global Dialogue on AI Governance.

40 experts will be chosen by an independent appointment committee, half nominated by UN states and half appointed by the Secretary-General.

As of 27 Aug 2025, the UN says it will run an open call for nominations and then the Secretary-General will recommend 40 names to the General Assembly. No names have been announced ^[1] yet.

Two caveats jump out:

The mandate is explicitly non-military only, so the domain where AI systems are most likely to be deployed recklessly is left out...
The outputs are to be “policy-relevant but non-prescriptive”, i.e. climate-style summaries rather than hard mandates.

Still, the commitment to “issue evidence-based scientific assessments synthesizing and analysing existing research” leaves a narrow window of hope.

If serious AI-safety experts are appointed, this panel can be a real venue for risk awareness and cross-border coordination on x-risk mitigation.

On the contrary, without clear guardrails on composition and scope, it risks becoming a “safety”-branded accelerator for capabilities.

^{^}
I'd expect Yoshua Bengio to be a top suggestion already, (among other reasons) as he recently led the Safety & Security chapter of the EU General Purpose AI Code of Practice.

[-]Thomas Kwa4mo*91

Item 3 has some constraints on members (emphasis mine):

Requests the Secretary-General to launch a published criteria-based open call and to recommend a list of 40 members of the Panel to be appointed by the General Assembly in a time-bound manner, on the basis of their outstanding expertise in artificial intelligence and related fields, an interdisciplinary perspective, and geographical and gender balance, taking into account candidacies from a broad representation of varying levels of technological development, including from developing countries, with due consideration to nominations from Member States and with no more than two selected candidates of the same nationality or affiliation and no employees of the United Nations system;

This means only two each from US, UK, China, etc. I wonder what the geographic and gender balance will actually look like; these will significantly influence the average expertise type and influence of members.

My guess is that x-risk mitigation will not be the primary focus at first just because over half of the experts are American and British men and there are so many other interests to represent. Nor would industry be heavily represented because it skews too American (and the document mentioned CoIs, and the 7 goals of the Dialogue are mostly not about frontier capabilities). But in the long term, unless takeoff is fast, developing countries will realize the US is marching towards DSA and interesting things could happen.

Edit: my guess for the composition would be similar to the existing High-level Advisory Body on Artificial Intelligence, with 39 members, of which

19 men, 20 women
15 academia/research (including 10 professors), 10 government, 4 from big tech and scaling labs, 10 other
17 Responsibility / Safety / Policy oversight positions, 22 other positions
Nationalities:
- 9 Americas (incl. 4 US), 11 Europe, 11 Asia (incl. 2 China), 1 Oceania, 5 Africa. They heavily overweighted Europe, which has 28% of the seats but only 9% of world population
- 19 high income countries, 24 LMIC
- gpt5 isn't sure about some of the dual nationality cases though
1 big name in x-risk (Jaan Tallinn)

[-]ChristianKl4mo124

On the flip side, this means that if we do know people who are AI experts but not from the US/EU/China, forwarding this information to them so that they can apply with a higher chance of being accepted might be valuable.

[-]Thomas Kwa4mo31

Agree, especially from developing countries without a strong preexisting stance on AI, where choices could be less biased towards experts who already have lots of prestige, and more weighted on merits + lobbying.

[-]M. Y. Zuo4mo21

Huh that is a really good point. There are way too many people with US/UK backgrounds to easily differentiate between the expert pretenders and the really substantial experts. It’s even getting harder to do so on LW for many topics as karma becomes less and less meaningful.

And I can’t imagine the secretary general’s office will have that much time to scrutinize each proposed candidate, so it might even be a positive thing overall.

[-]Katalina Hernandez4mo12

Exactly! Thank you for highlighting this.

[-]Katalina Hernandez4mo40

Yes, these are the usual selection criteria constrains for policy panels. And I agree that the vast majority of big names are US (some UK) based and male. But hey, there are lesser known voices in EU policy that care about AI Safety. But I do share your concern. I'll have the opportunity to ask about this at CAIDP at some point soon (Centre for AI and Digital Policy). I think many people would agree that it's a good opportunity to talk about AIS awareness in less involved members states...

[-]Katalina Hernandez8mo*561

The European AI Office is finalizing Codes of Practice that will define how general-purpose AI (GPAI) models are governed under the EU AI Act.

They are explicitly asking for global expert input, and feedback is open to anyone, not just EU citizens.

The guidelines in development will shape:

The definition of "systemic risk"
How training compute triggers obligations
When fine-tuners or downstream actors become legally responsible
What counts as sufficient transparency, evaluation, and risk mitigation

Major labs (OpenAI, Anthropic, Google DeepMind) have already expressed willingness to sign the upcoming Codes of Practice. These codes will likely become the default enforcement standard across the EU and possibly beyond.

I believe that more feedback from alignment and interpretability researchers is needed.

I’ve written a detailed Longform post breaking down exactly what’s being proposed, where input is most needed, and how you can engage.

Even if you don’t have policy experience, your technical insight could shape how safety is operationalized at scale.

📅 Feedback is open until 22 May 2025, 12:00 CET
🗳️ Submit your response here

Happy to connect with anyone individually for help drafting meaningful feedback.

[-]whestler8mo21

Thanks for posting and bringing attention to this! I have forwarded to my friend who works in AI safety.

[-]Katalina Hernandez8mo10

Thank you!!

[-]Katalina Hernandez4mo522

The European Commission Seeks Experts for Its AI Scientific Panel

The European Commission is now accepting applications for the Scientific Panel of Independent Experts, focusing on general-purpose AI (GPAI). This panel will support the enforcement of the AI Act, and forms part of the institutional scaffolding designed to ensure that GPAI oversight is anchored in technical and scientific legitimacy.

The panel will advise the EU AI Office and national authorities on:

Systemic risks associated with GPAI models
Classification of GPAI models (including systemic-risk designation)
Evaluation methods (benchmarks, red-teaming, risk analysis)
Cross-border market surveillance
Enforcement tools and alerts for emerging risks

This is the institutional embodiment of what many in this community have been asking for: real technical expertise informing regulatory decision-making.

Who Should Apply?

The Commission is selecting 60 experts for a renewable 24-month term.

Members are appointed in a personal capacity and must be fully independent of any GPAI provider (i.e. no employment, consulting, or financial interest). Names and declarations of interest will be made public.

Relevant expertise must include at least one of the following:

Model evaluation and risk analysis: Red-teaming, capability forecasting, adversarial testing, human uplift studies, deployment evaluations
Risk methodologies: Risk taxonomies, modeling, safety cases
Mitigations and governance: Fine-tuning protocols, watermarking, guardrails, safety processes, incident response, internal audits
Misuse and systemic deployment risks: CBRN risk, manipulation of populations, discrimination, safety/security threats, societal harms
Cyber offense risks: Autonomous cyber operations, zero-day exploitation, social engineering, threat scaling
Provider-side security: Prevention of leakage, theft, circumvention of safety controls
Emergent model risks: Deception, self-replication, miscoordination, alignment drift, long-horizon planning
Compute and infrastructure: Compute auditing, resource thresholds, verification protocols

Eligibility:

PhD in a relevant field OR equivalent track record
Proven scientific impact in AI/GPAI research or systemic-risk analysis
Must demonstrate full independence from GPAI providers

Citizenship: At least 80% of panel members must be from EU/EEA countries. The other 20% can be from anywhere, so international researchers are eligible.

Deadline: 14 September 2025

Application link: EU Survey – GPAI Expert Panel

Contact: EU-AI-SCIENTIFIC-PANEL@ec.europa.

Why It's Worth Considering

Even if you’re selected for the Scientific Panel, you are still allowed to carry out your own independent research as long as it is not for GPAI providers and you comply with confidentiality obligations.

This isn’t a full-time, 40-hour-per-week commitment. Members are assigned specific tasks on a periodic basis, with deadlines. If you’d like to know more before applying, I can direct you to the Commission’s contacts for clarification.

Serving on the panel enhances your visibility, sharpens your policy-relevant credentials, and helps fund your own work without forcing you to “sell your soul” or abandon independent projects.

[-]Lucie Philippon4mo215

Note that the documentation says they'll aim to recruit 1-3 nationals from each EU country (plus Norway, Iceland and Liechtenstein). As far as I understood, it does not require them to be living in their home country at the time of applying. Therefore, people from small European countries have especially good odds.

Note also that the gender balance goal would also increase the chance for any women applying.

[-]Lucie Philippon4mo20

Thanks for sharing! I've been sharing it with people working in AI Safety across Europe.

Do you know if anyone is doing a coordinated project to reach out to promising candidates from all EU countries and encourage them to apply? I'm worried that great candidates may not hear about it in time.

[-]Katalina Hernandez4mo10

Hi, Lucie! Not to my knowledge. I have only seen this advertised by people like Risto Uuk or Jonas Schuett in their newsletters, and informally mentioned in events by people who currently work in the AI Office. But I am not aware of efforts to reach out to specific candidates.

[-]Katalina Hernandez8mo*515

The AI alignment community had a major victory in the regulatory landscape, and it went unnoticed by many.

The EU AI Act explicitly mentions "alignment with human intent" as a key focus area in relation to regulation of systemic risks.

As far as I know, this is the first time “alignment” has been mentioned by a law, or major regulatory text.

It’s buried in Recital 110, but it’s there. And it also makes research on AI Control relevant:

The EU AI Act also mentions alignment as part of the Technical documentation that AI developers must make publicly available.

This means that alignment is now part of the EU’s regulatory vocabulary.

But here’s the issue: most AI governance professionals and policymakers still don’t know what it really means, or how your research connects to it.

I’m trying to build a space where AI Safety and AI Governance communities can actually talk to each other.

If you're curious, I wrote an article about this, aimed at the corporate decision-makers that lack literacy on your area.

Would love any feedback, especially from folks thinking about how alignment ideas can scale into the policy domain.

Here is the Substack link (I also posted it on LinkedIn):

https://open.substack.com/pub/katalinahernandez/p/why-should-ai-governance-professionals?utm_source=share&utm_medium=android&r=1j2joa

My intuition says that this was a push from Future of Life Institute.

Thoughts? Did you know about this already?

[-]Lucius Bushnaq8mo30

I did not know about this already.

[-]Katalina Hernandez8mo10

I don't think it's been widely discussed within AI Safety forums. Do you have any other comments, though? Epistemic pessimism is welcomed XD. But I did think that this was at least update-worthy.

[-]Lucie Philippon8mo20

I did not know about this either. Do you know whether the EAs in the EU Commission know about it?

[-]Katalina Hernandez8mo10

Hi Lucie, thanks so much for your comment!

I’m not very involved with the Effective Altruism community myself, though I did post the same Quick Take on the EA Forum today, but I haven’t received any responses there yet. So I can’t really say for sure how widely known this is.

For context: I’m a lawyer working in AI governance and data protection, and I’ve also been doing independent AI safety research from a policy angle. That’s how I came across this, just by going through the full text of the AI Act as part of my research.

My guess is that some of the EAs working closely on policy probably do know about it, and influenced this text too! But it doesn’t seem to have been broadly highlighted or discussed in alignment forums so far. Which is why I thought it might be worth flagging.

Happy to share more if helpful, or to connect further on this.

[-]Katalina Hernandez3mo4328

The problem with most proposals for an “AGI ban” is that they define the target by outcome (e.g. “powerful AI with the potential to cause human extinction”). I know that even defining AGI is already problematic. But unless we specify and prohibit the actual thing we want to ban, we’ll leave exploitable gray areas wide open. And those loopholes will undermine the very purpose of the ban.

This is why I wrote The Problem with Defining an "AGI Ban" by Outcome (a lawyer's take).

My post argues that for an AGI ban to work, it needs what other existential-risk regimes already have: strict liability, bright lines, and enforceable thresholds. Nuclear treaties don’t ban “weapons that could end humanity”; they ban fissile quantities and test yields. Product liability doesn’t wait for intent; it attaches liability to defects outright.

To actually ban AGI or AI leading to human extintion, we need to ban the precursors that make extinction-plausible systems possible, not "the possibility of extinction" itself.

In nuclear treaties, the ban is tied to bright-line thresholds (8 kg of plutonium, 25 kg of highly enriched uranium, a zero-yield test ban, or delivery systems above 500 kg/300 km). These are crisp, measurable precursors that enable verification and enforcement. So, what are the AGI equivalents of those thresholds?

Until we can define capability-based precursors (functional “red lines” that make extinction-plausible systems possible), any AGI ban will remain rhetoric rather than enforceable law.

I don't claim to have the answers. My aim is to form the right questions that AI governance discussions should be putting in front of lawyers and policymakers. The whole purpose of this post is to stress-test those assumptions. I’d be genuinely grateful for pushback, alternative framings, or constructive debate.

[-]Katalina Hernandez3mo97

Genuinely curious about people's opinions on this - I'd appreciate hearing why you disagree if you downvoted 🙏🏼. Seems like I'll be thinking about definitions for the foreseeable future, and I need constructive pushback.

[-]Darren McKee3mo87

An excellent point and it's important to highlight the works that needs to be done.

(I think you're stance is more obvious to people in the legal/regulatory/policy space - of course, specific definitions are required and it can't work otherwise).

[-]Katalina Hernandez4mo395

Many talented lawyers do not contribute to AI Safety, simply because they've never had a chance to work with AIS researchers or don’t know what the field entails.

I am hopeful that this can improve if we create more structured opportunities for cooperation. And this is the main motivation behind the upcoming AI Safety Law-a-thon, organised by AI-Plans:^[1]

A hackathon where every team pairs one lawyer with one technical AI safety researcher. Each pair will tackle challenges drawn up from real legal bottlenecks and overlooked AI safety risks.

From my time in the tech industry, my suspicion is that if more senior counsel actually understood alignment risks, frontier AI deals would face far more scrutiny. Right now, most law firms would focus on IP rights or privacy clauses when giving advice to their clients- not on whether model alignment drift could blow up the contract six months after signing.

We launched the event one day ago, and we already have an impressive lineup of senior counsel from top firms and regulators. What we still need are technical AI safety people to pair with them!

If you join, you'll help stress-test the legal scenarios and point out the alignment risks that are not salient to your counterpart (they’ll be obvious to you, but not to them).

You’ll also get the chance to put your own questions to experienced attorneys.

📅 25–26 October
🌍 Hybrid: online + in-person (London)

If you’re up for it, sign up here: https://luma.com/8hv5n7t0

Feel free to DM me if you want to raise any queries!

^{^}
NOTE: I really want to improve how I communicate updates like these. If this sounds too salesy or overly persuasive, it would really help me if you comment and suggest how to improve the wording.
I find this more effective than just downvoting- but of course, do so if you want. Thank you in advance!.

[-]kave4moModerator Comment40

This seems like a pretty cool event and I'm excited it's happening.

That said, I've removed this Quick Take from the frontpage. Advertising, whether for events or for role openings or similar, is generally not something we want on the frontpage of LessWrong.

In this case, now that it's off the front page, this shortform might be insufficiently visible. I'd encourage you to make a top-level post / event about it, which will get put on personal, but might still be a bit more visible.

[-]Garrett Baker4mo42

Hm, I found this ad valuable, and now I wonder whether the LessWrong team has considered a special classifieds category of posts, separate from personal blog posts and frontpages.

[-]Katalina Hernandez4mo*60

Your feedback was super useful for me! I created this event. If you want and have time, would you mind sending me a DM with any other thoughts you have? Thank you! https://www.lesswrong.com/events/rRLPycsLdjFpZ4cKe/ai-safety-law-a-thon-we-need-more-technical-ai-safety

[-]kave4mo20

Classified does seem kind of cool! Do you expect you would upweight "classified" higher than "personal" in your tag filters?

[-]Garrett Baker4mo21

I think I would do that selectively, when I have the time, energy, or need for such ads.

[-]kave4mo51

Oh, hm. That's not the sort of things users follow-through on in my experience. Not saying that this makes Classified a bad idea, but I think it needs a different UI solution (e.g. appearing in the sidebar).

[-]Katalina Hernandez4mo10

Hi Kave! Thanks for letting me know, and for providing an explanation! I have now created an event and a personal long form post explaining what the event is about. I am really hoping that enough technical AI Safety researchers sign up, fingers crossed :).

[-]Garrett Baker4mo30

Can you be more concrete about who is in the impressive lineup? I understand privacy is a factor here, so just give the information you can.

[-]Katalina Hernandez4mo*100

Thanks for this! Sure. Without revealing identities or specific affiliations: We have attorneys who consult for big tech companies (fortune 500, big labs...). We also have in-house counsel multinationals. And also government lawyers / people advising regulatory bodies and policymaking.

Honestly, I'm surprised by the reception. I think it'll be a great opportunity for both technical and legal profiles to network and exchange knowledge.

[-]Katalina Hernandez3mo*190

(Epistemic status: This is a recent experience that disappointed me personally. I'm not making formal claims or making representations for the specific company. I'm sharing this because it shifted my thinking about AI adoption incentives in ways that might be relevant to this community's discussions about "responsible AI").

I received an email to my corporate account informing me that my MS Copilot license was revoked.
Why? To save money, the company I work at revokes employees' Copilot licenses if they "fail to use it" for 30 days in a row...

I've been on annual leave, and I simply did not need Copilot during the last month. So, I had to get my line manager to request that my license is reinstated if I wanted to have access to the only "approved" AI tool available to us.

In data privacy, we use the term "dark pattern" to refer to covert UX practices that trick users into accepting terms that don't respect their data rights, or the general principle of users' choice. It's difficult to explain why, but this practice ("punishing" a worker for not using an AI tool for a month) sparks the same instincts in me as manipulative dark patterns.

I will be more inclined towards using AI for everyday tasks because, if I don't and my license gets revoked, I'll have to deal with an annoying bureaucratic process to get it back. I think there is a difference between allowing responsible use of technology and low-key forcing your staff to use AI. This sounds more like the latter.

This (and yesterday's debate about LLM usage) really makes me second-guess some beliefs I had about "reponsible AI" practices. I do not think I am ready to write a more thorough comment about it^[1], but I thought this may be interesting for others to read.

^{^}
I'll probably write something on Linkedin, because I do want other people in the industry to see this. I'll also cross-post to my Substack.

[-]Dagon3mo70

This seems so incredibly normal to me in the world of corporate-paid licensed software that I cannot see the connection to "responsible AI". It's a stupid way of paying for LLM usage, leading to stupid policies that value visible cost savings FAR over invisible employee waste.

If it were just consumption based ($X/1M tokens or the like), they wouldn't have to switch things per user per month. Of course, then they'd impose OTHER arbitrary and stupid limits because they'd try to find the "high value per token" uses....

In any case, I deeply sympathize that almost every employer is stupid along this kind of dimension (or is so disconnected from finances that it's stupid on other dimensions).

[-]Katalina Hernandez3mo10

Thank you! I think this is deeply subjective and I disclaimed that I'm still forming my thoughts around this.

Yes, the business justification is the cost-effective angle. But I can also see how people in the sector are almost started to be "shamed" for not using AI, and this type of incentive ("use it ot lose it, and be subject to bureaucracy") will only lead to more of this.

Coming from a data privacy background, I guess I'm particularly allergic to anything that forces people to agree to things, that's not consent. Also, it'll become less and less common to see 100% human generated emails and reports. Even for functions that wouldn't typically use AI.

I guess I'm extra alert (as a data and RespAI person) for the moment people really cannot opt out from AI use in their roles...

But I 100% accept that there isn't an obvious connection, thanks for raising it.

[-]Dagon3mo21

The jump from "shamed" to "missing a job requirement" is pretty short, and while I don't like that employers are forcing LLM usage (because I don't like arbitrary monitoring of just some parts of a job), I don't think it's a rights violation either - they can set whatever requirements they like.

Coming from a data privacy background, I guess I'm particularly allergic to anything that forces people to agree to things, that's not consent.

Employment very often requires agreement to many things about the job. I don't know if it's truly euvoluntary consent, but it is legal and binding. You can revoke consent at any point by resigning.

I do understand your discomfort that more and more jobs are perceived by employers as LLM-required. I don't think it's all that different from other required tools that so many jobs have.

[-]Katalina Hernandez7mo90

Would a safety-focused breakdown of the EU AI Act be useful to you?

The Future of Life Institute published a great high-level summary of the EU AI Act here: https://artificialintelligenceact.eu/high-level-summary/

What I’m proposing is a complementary, safety-oriented summary that extracts the parts of the AI Act that are most relevant to AI alignment researchers, interpretability work, and long-term governance thinkers.

It would include:

Provisions related to transparency, human oversight, and systemic risks.
Notes on how technical safety tools (e.g. interpretability, scalable oversight, evals) might interface with conformity assessments, or the compliance exemptions available for research work.
Commentary on loopholes or compliance dynamics that could shape industry behavior.
What the Act doesn't currently address from a frontier risk or misalignment perspective.

Target length: 3–5 pages, written for technical researchers and governance folks who want signal without wading through dense regulation.

If this sounds useful, I’d love to hear what you’d want to see included, or what use cases would make it most actionable.

And if you think this is a bad idea, no worries. Just please don’t downvote me into oblivion, I just got to decent karma :).

Thanks in advance for the feebdack!

[-]mic7mo22

I guess this should wait for the final draft of the GPAI Code of Practice to be released?

[-]Katalina Hernandez7mo10

I'll try to make it as helpful as possible so, yes. But I thought I'd gather feedback from now ☺️.

[-]Katalina Hernandez4mo40

Two days ago, I published a Substack article called "The Epistemics of Being a Mudblood: Stress Testing intellectual isolation". I wasn’t sure whether to cross-post it here, but a few people encouraged me to at least share the link.

By background I’m a lawyer (hybrid Legal-AI Safety researcher), and I usually write about AI Safety to spread awareness among tech lawyers and others who might not otherwise engage with the field.

This post, though, is more personal: a reflection on how “deep thinking” and rationalist habits have shaped my best professional and personal outputs, even through long phases of intellectual isolation. Hence the “mudblood” analogy, which (to my surprise) resonated with more people than I expected.

Sharing here in case it’s useful. Obviously very open to criticism and feedback (that’s why I’m here!), but also hoping it’s of some help. :)

[+][comment deleted]4mo10

Moderation Log