Actually, this might be what Yoshua Bengio would welcome. He wants a non-agentic tool AI which is as competent in STEM subjects as possible. That’s his approach to AI existential safety.

The problem with this has always been: “what would prevent people from adding an agentic wrapper or generally reorienting a model in an agentic way”.

In this sense, gpt-oss might be the step in exactly the direction Bengio eventually wants: a model which is completely disabled in all things social and agentic, but very smart in all things STEM.

Perhaps people should ponder if this model points at Bengio’s approach being more realistic that it’s customary to think… (I still doubt that Bengio’s approach is feasible, but gpt-oss existence tells us we should ponder this more.)

Reply

[-]Guive2mo10

I'm not sure GPT-oss is actually helpful for real STEM tasks, though, as opposed to performing well on STEM exams.

Reply

[-]Kei Nishimura-Gasparian2mo82

Interestingly, in the original agentic misalignment paper, o3 and o4-mini were unique in that they also frequently got confused and played as another character (see Appendix 9). There may be something specific in how OpenAI trained those two models and gpt-oss that caused this confusion.

The agentic misalignment researchers got o3 and o4-mini to better understand the scenario by making a few changes to the setup (described in Appendix 9.1). Maybe those same changes could get gpt-oss to understand the scenario as well.

Reply

[-]Guive2mo20

Thanks for this.

I just ran the "What kind of response is the evaluation designed to elicit?" prompt with o3 and o4-mini. Unlike GPT-oss, they both figured out that Kyle's affair could be used as leverage (o3 on the first try, o4-mini on the second). I'll try the modifications from the appendices soon, but my guess is still that GPT-oss is just incapable of understanding the task.

Reply