I generally think blogging is a good way to communicate intellectual progress, so I see this as a good development!
Some thoughts on your first blogpost:
At OpenAI, we research how we can develop and deploy increasingly capable AI, and in particular AI capable of recursive self-improvement (RSI)
My reaction: Wait, what, why? I guess it's nice that it's this direct, but it feels sad that this is written as the bottom line.
To be clear, I agree that given that this is OpenAI's stance, it's good to say it plainly! But I was hoping that at least the safety team would have the position that "we will try to determine whether there is any way to build RSI safely, and will strongly advocate for not doing so if we think it cannot be done safely".
Like, a thing that feels particularly sad here is that I was assuming that figuring out whether this can be done safely, or studying that question, is one of the key responsibilities of the safety team. This is an update that it isn't, which is sad (and IMO creates some responsibility for members of the safety team to express concern about that publicly, but IDK, OpenAI seems to be in a messy state with regards to that kind of stuff).
Thank you for pointing this out! While OpenAI has been public about our plans to build an AI scientist, it is of course crucial that we do this safely, and if it is not possible to do it safely, we should not do it at all.
We have written about this before:
OpenAI is deeply committed to safety, which we think of as the practice of enabling AI’s positive impacts by mitigating the negative ones. Although the potential upsides are enormous, we treat the risks of superintelligent systems as potentially catastrophic and believe that empirically studying safety and alignment can help global decisions, like whether the whole field should slow development to more carefully study these systems as we get closer to systems capable of recursive self-improvement. Obviously, no one should deploy superintelligent systems without being able to robustly align and control them, and this requires more technical work.
but we should have mentioned this in the hello world post too. We have now updated it with a link to this paragraph.
It looks like OpenAI is following Anthropic's lead, which is great!
Google DeepMind's alignment team also has a blog, but it's much more targeted towards laypeople, mostly shares papers as they come out rather than sharing informal research, and IMO isn't as nice as a dedicated website or even a Substack. It might be worth considering doing something like this at GDM, subject to tradeoffs on researchers' time and Google's internal publication restrictions.
We mostly just post more informal stuff directly to LessWrong / Alignment Forum (see e.g. our interp updates).
Having a separate website doesn't seem that useful to readers. I generally see the value proposition of a separate website as attaching the company's branding to the post, which helps the company build a better reputation. It can also help boost the reach of an individual piece of research, but this is a symmetric weapon, and so applying it to informal research seems like a cost to me, not a benefit. Is there some other value you see?
(Incidentally, I would not say that our blog is targeted towards laypeople. I would say that it's targeted towards researchers in the safety community who have a small amount of time to spend and so aren't going to read a full paper. E.g. this post spends a single sentence explaining what scheming is and then goes on to discuss research about it; that would absolutely not work in a piece targeting laypeople.)
The OpenAI Alignment Research Blog launched today at 11 am PT, with 1 introductory post and 2 technical posts!
Blog: https://alignment.openai.com/
Thread on X: https://x.com/j_asminewang/status/1995569301714325935
Speaking purely personally: when I joined the Alignment team at OpenAI in January, I saw there was more safety research than I'd expected, not to mention interesting thinking on the future of alignment. But that research & thinking didn't really have a place to go, since it's often too short or informal for the main OpenAI blog, and most OpenAI researchers aren't on LessWrong. I'm hoping this blog is a more informal, lower-friction home than the main one, and that this new avenue of publishing encourages sharing and transparency.