Two more disclaimers from both policies that worry me:
Meta writes:
Security Mitigations - Access is strictly limited to a small number of experts, alongside security protections to prevent hacking or exfiltration insofar as is technically feasible and commercially practicable.
"Commercially practicable" is so load-bearing here. With a disclaimer like this, why not publicly commit to writing million-dollar checks to anyone who asks for one? It basically means "We'll do this if it's in our interest, and we won't if it's not." Which is like, duh. That's the decision procedure for everything you do.
I do think the public intention setting is good, and it might support the codification of these standards, but it does not a commitment make.
Google, on the other hand, has this disclaimer:
The safety of frontier AI systems is a global public good. The protocols here represent our current understanding and recommended approach of how severe frontier AI risks may be anticipated and addressed. Importantly, there are certain mitigations whose social value is significantly reduced if not broadly applied to AI systems reaching critical capabilities. These mitigations should be understood as recommendations for the industry collectively: our adoption of them would only result in effective risk mitigation for society if all relevant organizations provide similar levels of protection, and our adoption of the protocols described in this Framework may depend on whether such organizations across the field adopt similar protocols.
I think it's funny these are coming out this week as an implicit fulfillment of the Seoul commitments. Did I miss some language in the Seoul commitments saying "we can abandon these promises if others are doing worse or if it's not in our commercial interest?"
And, if not, do policies with disclaimers like these really count as fulfilling a commitment? Or is it more akin to Anthropic's non-binding anticipated safeguards for ASL-3? If it's the latter, then fine, but I wish they'd label it as such.
I recently created a simple workflow to allow people to write to the Attorneys General of California and Delaware to share thoughts + encourage scrutiny of the upcoming OpenAI nonprofit conversion attempt.
I think this might be a high-leverage opportunity for outreach. Both AG offices have already begun investigations, and Attorneys General are elected officials who are primarily tasked with protecting the public interest, so they should care what the public thinks and prioritizes. Unlike e.g. congresspeople, I don't think AGs often receive grassroots outreach (I found ~0 examples of this in the past), and an influx of polite and thoughtful letters may have some influence — especially from CA and DE residents, although I think anyone impacted by their decision should feel comfortable contacting them.
Personally I don't expect the conversion to be blocked, but I do think the value and nature of the eventual deal might be significantly influenced by the degree of scrutiny on the transaction.
Please consider writing a short letter — even a few sentences is fine. Our partner handles the actual delivery, so all you need to do is submit the form. If you want to write one on your own and can't find contact info, feel free to dm me.
OpenAI has finally updated the "o1 system card" webpage to include evaluation results from the o1 model (or, um, a "near final checkpoint" of the model). Kudos to Zvi for first writing about this problem.
They've also made a handful of changes to the system card PDF, including an explicit acknowledgment of the fact that they did red teaming on a different version of the model from the one that released (text below). They don't mention o1 pro, except to say "The content of this card will be on the two checkpoints outlined in Section 3 and not on the December 17th updated model or any potential future model updates to o1."
Practically speaking, with o3 just around the corner, these are small issues. But I see the current moment as the dress rehearsal for truly powerful AI, where the stakes and the pressure will be much higher, and thus acting carefully + with integrity will be much more challenging. I'm frustrated OpenAI is already struggling to adhere to its preparedness framework and generally act in transparent and legible ways.
I've uploaded a full diff of the changes to the system card, both web and PDF, here. I'm also frustrated that these changes were made quietly, and that the "last updated" timestamp on the website still reads "December 5th".
As part of our commitment to iterative deployment, we continuously refine and improve our models. The evaluations described in this System Card pertain to the full family of o1 models, and exact performance numbers for the model used in production may vary slightly depending on system updates, final parameters, system prompt, and other factors.
More concretely, for o1, evaluations on the following checkpoints [1] are included:
• o1-near-final-checkpoint
• o1-dec5-release
Between o1-near-final-checkpoint and the releases thereafter, improvements included better format following and instruction following, which were incremental post-training improvements (the base model remained the same). We determined that prior frontier testing results are applicable for these improvements. Evaluations in Section 4.1, as well as Chain of Thought Safety and Multilingual evaluations were conducted on o1-dec5-release, while external red teaming and Preparedness evaluations were conducted on o1-near-final-checkpoint [2]
"OpenAI is constantly making small improvements to our models and an improved o1 was launched on December 17th. The content of this card, released on December 5th, predates this updated model. The content of this card will be on the two checkpoints outlined in Section 3 and not on the December 17th updated model or any potential future model updates to o1"
"Section added after December 5th on 12/19/2024" (In reality, this appears on the web archive version of the PDF between January 6th and January 7th)
I agree with your odds, or perhaps mine are a bit higher (99.5%?). But if there were foul play, I'd sooner point the finger at the national security establishment than at OpenAI. As far as I know, intelligence agencies committing murder is much more common than companies doing so. And OpenAI's progress is seen as critically important to both.
Lucas gives GPT-o1 the homework for Harvard’s Math 55, it gets a 90%
The linked tweet makes it look like Lucas also had an LLM doing the grading... taking this with a grain of salt!
I've used both data center and rotating residential proxies :/ But I am running it on the cloud. Your results are promising, so I'm going to see whether an OpenAI-specific checker run locally works for me, or else try a new proxy provider.
Thanks again for looking into this.
Ooh this is useful for me. The pastebin link appears broken - any chance you can verify it?
I definitely get 403s and captchas pretty reliably for OpenAI and OpenAI alone (and notably not Google, Meta, Anthropic, etc.) with an instance based on https://github.com/dgtlmoon/changedetection.io. Will have to look into cookie refreshing. I have had some success with randomizing IPs, but maybe I don't have the cookies sorted.
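One way to isolate whether it's the IP or the cookie/header fingerprint tripping the 403s is a quick standalone probe outside changedetection.io. Here's a rough Python sketch, not a drop-in for any real setup; the proxy URLs are placeholders for whatever your provider gives you:

```python
# Rough sketch (placeholders, not a real setup): probe a URL through a randomly
# chosen proxy with browser-like headers and a fresh cookie jar each time, to
# see whether the 403s track the IP or the cookie/header fingerprint.
import random
import requests

PROXIES = [
    "http://user:pass@proxy1.example.com:8000",  # placeholder proxy endpoints
    "http://user:pass@proxy2.example.com:8000",
]

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}

def probe(url: str) -> int:
    proxy = random.choice(PROXIES)
    # A new Session per request means no stale cookies carried across IPs,
    # which is one mismatch that can trip bot detection.
    with requests.Session() as s:
        s.headers.update(HEADERS)
        s.proxies.update({"http": proxy, "https": proxy})
        return s.get(url, timeout=30).status_code

for _ in range(5):
    print(probe("https://openai.com/policies/"))  # example target page
```

If the status codes vary with the proxy, IP reputation is the likely culprit; if it's 403 across the board, the missing cookies/fingerprint are probably what's being flagged.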
Sorry, I might be missing something: subdomains are subdomain.domain.com, whereas ChatGPT.com is a separate domain entirely (not a subdomain of openai.com), right? In either case, I'm sure there are benefits to doing things consistently — both may be on the same server, subject to the same attacks, beholden to the same internal infosec policies, etc.
So I do believe they have their own private reasons for it. Didn't mean to imply that they've maliciously done this to prevent some random internet guy's change tracking or anything. But I do wish they would walk it back on the openai.com pages, or at least in their terms of use. It's hypocritical, in my opinion, that they are so cautious about automated access to their own site while relying on such access so completely from other sites. Feels similar to when they tried to press copyright claims against the ChatGPT subreddit. Sure, it's in their interest for potentially nontrivial reasons, but it also highlights how weird and self-serving the current paradigm (and their justifications for it) are.
It's the first official day of the AI Action Summit, and thus it's also the day that the Seoul Commitments (made by sixteen companies last year to adopt an RSP/safety framework) have come due. I've made a tracker/report card for each of these policies at www.seoul-tracker.org.
I'll plan to keep this updated for the foreseeable future as policies get released/modified. Don't take the grades too seriously — think of it as one opinionated take on the quality of the commitments as written and, where there is evidence, as implemented. Do feel free to share feedback if anything you see surprises you, or if you think the report card misses something important.
My personal takeaway is that both compliance and quality for these policies are much worse than I would have hoped. I believe many people's theories of change for these policies gesture at something about a race to the top, where companies are eager to outcompete each other on safety to win talent and public trust, but I don't sense much urgency or rigor here. Another theory of change is that this is a sort of laboratory for future regulation, where companies can experiment now with safety practices and the best ones could be codified. But most of the diversity between policies here is in how vague they can be while claiming to manage risks :/
I'm really hoping this changes as AGI gets closer and companies feel they need to do more to prove to govts/public that they can be trusted. Part of my hope is that this report card makes clear to outsiders that not all voluntary safety frameworks are equally credible.