This question may be stupid, but aren't canary strings quite fragile as a safety mechanism for training? If someone were to copy and paste material from one webpage to another without the string (intentionally or not), wouldn't that defeat the purpose?
If so, finding something more fail-safe that researchers could use to avoid contamination seems like an interesting project that would benefit the field.
This doesn't solve the issue of models figuring out for themselves how to bypass it, but it would stop a lot of the easy cases of misuse of the information.
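A minimal sketch of the failure mode, assuming a training pipeline whose only decontamination step is dropping any document that contains the canary verbatim. The canary value and the `filter_corpus` helper are hypothetical, not any real pipeline's code:

```python
# Hypothetical canary marker; real benchmarks publish their own unique string.
CANARY = "HYPOTHETICAL-CANARY-GUID-0000"


def filter_corpus(docs: list[str], canary: str = CANARY) -> list[str]:
    """Keep only documents that do not contain the canary string verbatim."""
    return [doc for doc in docs if canary not in doc]


benchmark_item = "Q: What is 17 * 23? A: 391"

# The original page carries the canary alongside the benchmark item.
original_page = f"{benchmark_item}\n{CANARY}"
# Someone copies the question and answer elsewhere but drops the canary line.
copied_page = benchmark_item

corpus = [original_page, copied_page, "Unrelated web text."]
kept = filter_corpus(corpus)
print(kept)
```

Only the page that still carries the canary is removed; the copied question and answer pass through the filter unchanged.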