Stephen McAleese

Software Engineer interested in AI and AI safety.

Comments

AI Safety Field Growth Analysis 2025
Stephen McAleese · 2d · 20

That's a good question. One approach I took is to look at the research agendas and outputs (e.g. Google DeepMind's AI safety research agenda) and estimate the number of FTEs based on those.

I would say that I'm including teams that are working full-time on advancing technical AI safety or interpretability (e.g. the GDM Mechanistic Interpretability Team). 

To the best of my knowledge, there are a few teams like that at Google DeepMind and Anthropic though I could be underestimating given that these organizations have been growing rapidly over the past few years.

A weakness of this approach is that there could be a large number of staff who work on AI safety some of the time and significantly increase the effective number of AI safety FTEs at the organization.
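
As a toy illustration of what I mean by effective FTEs (the head count and fractions below are made up):

```python
# Hypothetical example: five staff who each spend part of their time on AI safety.
part_time_fractions = [0.2, 0.5, 0.1, 0.3, 0.4]  # fraction of each person's time spent on safety
effective_ftes = sum(part_time_fractions)
print(effective_ftes)  # 1.5 effective FTEs on top of the full-time head count
```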

AI Safety Field Growth Analysis 2025
Stephen McAleese · 2d · 20

Good observation, thanks for sharing.

One possible reason is that I've included more organizations in this updated post, which would raise many of the estimates.

Another reason is that in the old post, I used a linear model that assumed an organization started with 1 FTE when founded and increased linearly to its current number (example: an organization has 10 FTEs in 2025 and was founded in 2015. Assume 1 FTE in 2015, 2 FTEs in 2016 ... 10 in 2025).

The new model is simpler and just assumes the current number for all years (e.g. 10 in 2015 and 10 in 2025), so its estimates for earlier years are higher than the previous model's. See my response to Daniel above.
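
To make the difference concrete, here's a toy sketch of the two models (the example organization, founded in 2015 with 10 FTEs in 2025, is hypothetical):

```python
def linear_model(founded: int, current_ftes: float, year: int, current_year: int = 2025) -> float:
    """Old model: 1 FTE at founding, growing linearly to the current number."""
    if year < founded:
        return 0.0
    span = current_year - founded
    growth_per_year = (current_ftes - 1) / span if span else 0.0
    return 1 + growth_per_year * (year - founded)

def constant_model(founded: int, current_ftes: float, year: int) -> float:
    """New model: the current FTE count is assumed for every year since founding."""
    return current_ftes if year >= founded else 0.0

for year in (2015, 2020, 2025):
    print(year, linear_model(2015, 10, year), constant_model(2015, 10, year))
# linear: 1.0, 5.5, 10.0 vs. constant: 10, 10, 10
```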

AI Safety Field Growth Analysis 2025
Stephen McAleese · 2d · 40

I think it's hard to pick a reference class for the field of AI safety because the number of FTEs working on comparable fields or projects can vary widely.

Two extreme examples:
- Apollo Program: ~400,000 FTEs
- Law of Universal Gravitation: 1 FTE (Newton)

Here are some historical challenges which seem comparable to AI safety since they are technical, focused on a specific challenge, and relatively recent [1]:

  • Pfizer-BioNTech vaccine (2020): ~2,000 researchers and ~3,000 FTEs for manufacturing and logistics
  • Human Genome Project (1990 - 2003): ~3,000 researchers across ~20 major centers
  • ITER fusion experiment (2006 - present): ~2,000 engineers and scientists, ~5,000 FTEs in total
  • CERN and the LHC (1994 - present): ~3,000 researchers working onsite, ~15,000 collaborators around the world.

I think these projects show that it's possible to make progress on major technical problems with a few thousand talented and focused people.

  1. ^

    These estimates were produced using ChatGPT with web search.

AI Safety Field Growth Analysis 2025
Stephen McAleese · 2d · 30

I'm pretty sure that's just a mistake. Thanks for spotting it! I'll remove the duplicated row.

For each organization, I estimated the number of FTEs by looking at the team members page, LinkedIn, and what kinds of outputs have been produced by the organization and who is associated with them. Then the final estimate is an intuitive guess based on this information.

AI Safety Field Growth Analysis 2025
Stephen McAleese · 2d · 40

Thanks for your helpful feedback, Daniel. I agree that the estimate for 2015 (~50 FTEs) is too high. The reason is that the simple model assumes the number of FTEs is constant over time from the moment the organization is founded.

For example, the FTE value associated with Google DeepMind is 30 today and the company was founded in 2010, so the value back then is probably too high.

Perhaps a more realistic model would assume that the organization has 1 FTE when founded and grows linearly, though this model would be inaccurate for organizations that grow rapidly and then plateau in size after being founded.
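
To illustrate the plateau concern, here's a toy variant that ramps up linearly and then saturates at the current size (the 5-year ramp-up period is an arbitrary assumption; the numbers match the DeepMind example above):

```python
def ramp_then_plateau(founded: int, current_ftes: float, year: int, ramp_years: int = 5) -> float:
    """Hypothetical model: 1 FTE at founding, linear growth over ramp_years, then constant."""
    if year < founded:
        return 0.0
    progress = min((year - founded) / ramp_years, 1.0)
    return 1 + (current_ftes - 1) * progress

# E.g. ~30 FTEs today, founded in 2010:
print([round(ramp_then_plateau(2010, 30, y), 1) for y in (2010, 2013, 2015, 2025)])
# [1.0, 18.4, 30.0, 30.0]
```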

Interpretability is the best path to alignment
Stephen McAleese · 1mo · 30

Thanks for the post. It covers an important debate: whether mechanistic interpretability is worth pursuing as a path towards safer AI. The post is logical and makes several good points, but I find its style too formal for LessWrong and think it could be rewritten to be more readable.

Understanding LLMs: Insights from Mechanistic Interpretability
Stephen McAleese · 1mo · 40

Thank you for taking the time to comment and for pointing out some errors in the post! Your attention to detail is impressive. I updated the post to reflect your feedback:

  • I removed the references to S1 and S2 in the IOI description and fixed the typos you mentioned.
  • I changed "A typical vector of neuron activations such as the residual stream..." to "A typical activation vector such as the residual stream..."

Good luck with the rest of the ARENA curriculum! Let me know if you come across anything else.

Turing-Test-Passing AI implies Aligned AI
Stephen McAleese · 1mo · 20

While I disagree with a lot of this post, I thought it was interesting and I don't think it should have negative karma.

Stephen McAleese's Shortform
Stephen McAleese · 1mo · 50

I haven't heard anything about RULER on LessWrong yet:

RULER (Relative Universal LLM-Elicited Rewards) eliminates the need for hand-crafted reward functions by using an LLM-as-judge to automatically score agent trajectories. Simply define your task in the system prompt, and RULER handles the rest—no labeled data, expert feedback, or reward engineering required.

✨ Key Benefits:

  • 2-3x faster development - Skip reward function engineering entirely
  • General-purpose - Works across any task without modification
  • Strong performance - Matches or exceeds hand-crafted rewards in 3/4 benchmarks
  • Easy integration - Drop-in replacement for manual reward functions

Apparently it allows LLM agents to learn from experience and significantly improves reliability.

Link: https://github.com/OpenPipe/ART
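
For anyone who hasn't seen the LLM-as-judge pattern before, here's a minimal sketch of the general idea (this is not the actual ART/RULER API; the judge model name, prompt, and 0-10 scale are assumptions of mine):

```python
# Minimal sketch of LLM-as-judge reward scoring, assuming the OpenAI Python SDK
# and an OPENAI_API_KEY in the environment. Not the real RULER implementation.
import json
from openai import OpenAI

client = OpenAI()

def judge_trajectories(task_description: str, trajectories: list[str]) -> list[float]:
    """Ask a judge LLM to score each agent trajectory relative to the others."""
    prompt = (
        f"Task: {task_description}\n\n"
        + "\n\n".join(f"Trajectory {i}:\n{t}" for i, t in enumerate(trajectories))
        + "\n\nScore each trajectory from 0 to 10 on how well it completes the task. "
          "Reply with only a JSON list of numbers, one per trajectory."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical choice of judge model
        messages=[{"role": "user", "content": prompt}],
    )
    scores = json.loads(response.choices[0].message.content)
    return [float(s) for s in scores]  # usable as RL rewards, no hand-crafted reward function
```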

Summary of our Workshop on Post-AGI Outcomes
Stephen McAleese · 1mo · 64

These talks are fascinating. Thanks for sharing.

Posts

  • AI Safety Field Growth Analysis 2025 (28 karma, 5d, 13 comments)
  • Understanding LLMs: Insights from Mechanistic Interpretability (40 karma, 1mo, 2 comments)
  • How Can Average People Contribute to AI Safety? (16 karma, 7mo, 4 comments)
  • Shallow review of technical AI safety, 2024 (197 karma, Ω, 9mo, 35 comments)
  • Geoffrey Hinton on the Past, Present, and Future of AI (23 karma, 1y, 5 comments)
  • Could We Automate AI Alignment Research? (34 karma, Ω, 2y, 10 comments)
  • An Overview of the AI Safety Funding Situation (73 karma, 2y, 10 comments)
  • Retrospective on ‘GPT-4 Predictions’ After the Release of GPT-4 (26 karma, 3y, 6 comments)
  • GPT-4 Predictions (112 karma, 3y, 27 comments)
  • Stephen McAleese's Shortform (3 karma, 3y, 14 comments)
Wikitag Contributions

  • Road To AI Safety Excellence (3 years ago, +3/-2)