Have you considered doing e.g. 80k careers coaching, or otherwise sitting down with a fieldbuilder like Ryan Kidd or James Fox, or a funder like OpenPhil or Jaan Tallinn? It seems like you're looking for someone with a global perspective who knows what the field needs, so you can shape yourself to that mould, and those are the people who would know.
Thank you for the suggestion. This is something I've both considered and done in practice, having received coaching from some of the people/orgs you've mentioned. I should get in touch with OpenPhil to see if they'd like to fund any of this work, though, as that could be a useful feedback signal.
Thank you for sharing your knowledge of existing work in the comment below as well - it's super helpful!
Re: 10, I think the dual-use nature of evals is widely recognized by influential people, e.g. David Krueger, Dan Hendrycks, Nitarshan of UK AISI, etc. The overrepresentation of evals in the global portfolio is mostly because the work is tractable, easy to report on, doesn't require a CS degree, is low-resource (you don't need your own inference hardware or access to model weights to do evals), and seems useful. Cf. mechinterp: also known to be dual-use, but still very tractable and easy to measure and report on.
Hello everyone! 👋
At the start of the year, I was trying to figure out what my next goals should be regarding AI safety. I ended up making a list of 130 small, concrete ideas spanning all kinds of domains. A lot of them are about gathering and summarizing information, skilling myself up, and helping others be more effective.
I needed a rough way to prioritize them, so I built a prioritization framework inspired by 80,000 Hours. I won't go into the details here, but basically it let me filter the list down to the top 12 ideas, one for each month.
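For a rough sense of the mechanics, the scoring boils down to something like the sketch below. The criteria, weights, and the two sample rows are placeholders for illustration, not my actual spreadsheet:

```python
# Placeholder sketch: each idea gets 1-5 scores on a few criteria,
# the weighted sum is its priority, and the top 12 survive.
WEIGHTS = {"impact": 3, "tractability": 2, "neglectedness": 2, "personal_fit": 3}

ideas = [
    # (idea, {criterion: score on a 1-5 scale}) -- example scores only
    ("Visualize the research/governance portfolio",
     {"impact": 3, "tractability": 4, "neglectedness": 3, "personal_fit": 4}),
    ("Offer productivity coaching to safety researchers",
     {"impact": 3, "tractability": 5, "neglectedness": 2, "personal_fit": 5}),
    # ...roughly 130 rows in the real spreadsheet
]

def priority(scores):
    # Weighted sum of the criterion scores.
    return sum(weight * scores[criterion] for criterion, weight in WEIGHTS.items())

# Keep the 12 highest-scoring ideas, one per month.
shortlist = sorted(ideas, key=lambda item: priority(item[1]), reverse=True)[:12]
for name, scores in shortlist:
    print(f"{priority(scores):>3}  {name}")
```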
In the first half of the year I did the most urgent ones, and now I'm in quadrant 2 of the Eisenhower matrix (important but not urgent). The shortlist still has 12 things to do, since I've come up with new ideas during the first half of 2025. However, I need a bit more input to narrow down the next steps.
This is where you come in. I'd like to hear your thoughts on the shortlist and the relative importance of its tasks. It would be especially valuable to know if something is already being done by someone who can get it done; then I can skip it and move closer to the margin.
1. Visualize the distribution of people in research/governance fields; make the safety "portfolio" clearer and more tangible
2. Find out what % of Schmidt Futures or UK AISI's safety-earmarked budget actually gets used
3. Catalogue and evaluate practical plans to deter or influence RSI
4. Investigate and post about how crucial it is to get mainstream media out of the competition/race framing
5. Come up with a response to a friend's argument about lack of agency
6. Talk about evaluations at Chiba University's AI safety workshop
7. Offer productivity tools and coaching to AI safety researchers
8. Do a calculation on whether it's net positive for Reaktor to work on AI safety
9. Write a LW/AF post convincing people in safety to take lower salaries
10. Write about how evals might actually push capabilities, as model developers take on new challenges
11. Write an article on how easy to shut down AGI/ASI systems should be
12. Write a LW/AF post about why we should divest half of interpretability
If you have thoughts or context to share about any of these, that would be much appreciated. Thank you!
Doing the prioritization was a useful exercise in itself: I noticed most of my ideas seemed OK at the time I had them, but were actually pretty bad upon further inspection. Being able to prioritize and narrow them down by 90% was valuable. It even helped me generate better ideas afterwards, since I can easily compare new ideas to the existing ones.
This also resolved another problem of mine: ideas and advice from other people, while useful, often fail to account for my personal situation (a strong desire to live in Japan) and unique opportunities (a broad network across cultural and language borders).
Putting numbers to all the ideas made it easier to understand and articulate what I value and am motivated by. It's also nice to have immediate feedback about an idea in an unambiguous form, and I like being able to forget about all the random details: I can simply add notes to my spreadsheet and refer to them when needed.
However, now I've finished the highest-priority tasks and need to figure out what to do in the latter half of the year. My aim has been to do one thing per month; that seems sustainable with my other commitments. I'm hoping that sharing this publicly will encourage others to seek more feedback about their priorities as well.