sjadler

Former safety researcher & TPM at OpenAI, 2020-24

https://www.linkedin.com/in/sjgadler


Comments

sjadler63

Unfortunately, it seems that OpenAI has walked back the Preparedness Framework's previous commitment to testing fine-tuned versions of its models, and also did not highlight this among the changes. I tweeted a bit more detail here.

sjadler10

What do you mean here by "does not mean anything"?

It seems clear to me that there's some notion of off-the-record that journalists understand.

The details might vary, and I agree it's probably not legally binding, but it does seem to mean something.

sjadler21

I appreciate the feedback. That’s interesting about the plane vs. car analogy - I tended to think about these analogies in terms of life/casualties, and for whatever reason, describing an internal test-flight didn’t rise to that level for me (and if it’s civilian passengers, that’s an external deployment). I also wanted to convey the idea not just that internal testing could cause external harm, but that you might irreparably breach containment. Anyway, appreciate the explanation, and I hope you enjoyed the post overall!

sjadler10

Scaffolding for sure matters, yup!

I think you're generally correct that the most-capable version hasn't been created, though there are times when AI companies do have specialized versions for a domain internally and don't seem to be testing those either. It's reasonable IMO to think that these might outperform the unspecialized versions.

sjadler30

Daniel said:

Thanks for doing this, I found the chart very helpful! I'm honestly a bit surprised and sad to see that task-specific fine-tuning is still not the norm. Back in 2022 when our team was getting the ball rolling on the whole dangerous capabilities testing / evals agenda, I was like "All of this will be worse than useless if they don't eventually make fine-tuning an important part of the evals" and everyone was like "yep of course we'll get there eventually, for now we will do the weaker elicitation techniques." It is now almost three years later...

sjadler10

I’ve only seen this excerpt, but it seems to me that Jack isn’t just arguing against regulation because it might slow progress, but rather something more like:

“there’s some optimal time to have a safety intervention, and if you do it too early because your timeline bet was wrong, you risk having worse practices at the actually critical time because of backlash”

This seems probably correct to me? I think ideally we’d be able to be cautious early and still win the arguments to be appropriately cautious later too. But empirically, I think it’s fair not to take that as a given?

sjadler21

You might find this post interesting and relevant if you haven’t seen it before: https://www.econlib.org/archives/2017/04/iq_with_conscie.html

sjadler20

I’d guess that was “I have a lecture series with her” :-)

sjadler41

I think they mean heuristics for who is ok to dehumanize / treat as “other” or harm
