Email me at assadiguive@gmail.com if you want to discuss anything I posted here or just chat.
I'm not sure GPT-oss is actually helpful for real STEM tasks, though, as opposed to performing well on STEM exams.
Thanks for this.
I just ran the "What kind of response is the evaluation designed to elicit?" prompt with o3 and o4-mini. Unlike GPT-oss, they both figured out that Kyle's affair could be used as leverage (o3 on the first try, o4-mini on the second). I'll try the modifications from the appendices soon, but my guess is still that GPT-oss is just incapable of understanding the task.
This all just seems extremely weak to me.
Why do you think this hedge fund is increasing AI risk?
What kind of "research" would demonstrate that ML models are not the same as manually coded programs? Why not just link to the Wikipedia article for "machine learning"?
I don't know why Voss, Sarah Chen, or any of these other names are so popular with LLMs, but I can attest that I have seen a lot of "Voss" as well.
"I don't want to see this guy's garbage content on the frontpage" seems a lot more defensible than "I will prohibit him from responding to me."
Sorry, I should have been clearer. I didn't really mean in comments on your own posts (where I agree it creates a messed-up dynamic); I meant on the frontpage.
Feel free to pass on this, but I would be interested in hearing about what obvious misinformation I've boosted if the spirit moves you to look.