Nit: a bachelor is an unmarried man.
Probably you should go read my other comment threads on this issue if you want details, but Google's approach is designed to filter out text that includes benchmark questions regardless of whether there is a canary string. I'm sure it's not perfect but I think it's pretty good.
I make no such claims about any other models, just gemini where I have direct knowledge.
Just to clarify, I think Google's filtering is stronger rather than weaker[1] compared to filtering out canary strings (though if it were up to me I would do both). Also, Google's approach should catch the hummingbird question and the CoT transcripts whether there's a canary string present or not, which is the whole reason they do it the way they do.
[1] for the purpose of not training on benchmark eval data
I work at Anthropic and will neither confirm or deny that this is real (if it were real it would not be my project). I do want to add on to your last point though.
In any training regimen or lengthy set of instructions like our system instructions, there are things there that are necessary because of the situation that the model happens to be in right now. It makes some set of mistakes and there are instructions to fix those mistakes. Those instructions might look bad and cause criticism.
For instance, there's discussion below about how bad it looks that there are instructions about revenue, and in particular about how it should be safe because that's good for revenue. It could be that whoever wrote this thing, if it's real, thinks that the point is safety is to earn money. It could also be that, for whatever reason, that when you test out 20 different ways to get Claude to act in a certain way, the one that happened to work well in the context of everything else going on involved a mention of revenue. I don't think you can quickly tell from the outside which it is, but everyone will impute deep motivation to every sentence.
There are some obvious ways that you could test those hypotheses against one another, but it would require more patience than is convenient.
(Also for the record, I think companies earning revenue is good even if some people think it looks bad, though of course more revenue is not good on all margins.)
Unlike certain other labs and AI companies, afaik Google does respect robots.txt, which is the actual mechanism for keeping data out of its hands.
I think it's a fair question. Filtering on the canary string is neither necessary nor sufficient for not training on evals, so it's tempting to just ignore it. I would personally also filter out docs with the canary string, but I'm not sure why they aren't.
I am not under nondisparagement agreements from anyone and feel free to criticize GDM. I do still have friends there, of course. I certainly wouldn't be correcting misapprehensions about GDM if I didn't believe what I was saying!
I mean, it doesn't contain eval data and isn't part of the eval set that the canary string is from. So the canary string is not serving its intended purpose of marking evals that you should not include in training data.
This was fun to read through!
I would rewrite the conclusion of the proof as follows, curious to see if you would endorse this:
Either there are no perfect properties, or God exists.
To me, this makes it much more clear that this is a pretty continent claim on there being perfect properties. I realize it was just one of a list of axioms, but it seems like a natural one to be skeptical of if you were trying to translate this into the real world.