Thanks for posting this! This seems to be important to balance dataset before training CCS probes.
Another strange thing is that accuracy of CCS degrades for auto-regressive models like GPT-J, LLAMA. For GPT-J it is about random choose performance as per the DLK paper (Collins et al, 2022), about 50-60%. And in the ITI paper (Kenneth et al, 2023) they chose linear regression probe instead of CCS, and say that CCS was so poor that it was near random (same as in the DLK paper). Do you have thoughts on that? Perhaps they used bad datasets as per your research?
What do you think are realistic ways to enforce this on a global level? It seems UN can't enforce regulations world widely, USA and EU work in their areas only. Others can catch up but somewhat unlikely to do it.