The vibe I get, from the studies described, is reminiscent of the pre-guinea-pig portion of the story of Scott and Scurvy. That is, there are just enough complications at the edges to turn everything into a terrible muddle. In the case of scurvy, the complications were that which foods had vitamin C didn't map cleanly to their ontology of food, and vitamin C was sensitive to details of how foods were stored that they didn't pay attention to. In the case of virus transmissibility, there are a bunch of complications that we know matter sometimes, which the studies mostly fail to track, eg:
I think that ultimately viruses are a low-GDP problem; after a few doublings we'll stop breathing unfiltered air, and stop touching surfaces that lack automated cleaning, and we'll come to think of these things as being in the same category as basic plumbing.
What they don't do is filter out every web page that has the canary string. Since people put them on random web pages (like this one), which was not their intended use, they get into the training data.
If that is true, that's a scandal and a lawsuit waiting to happen. The intent of including a canary string is clear, and those canary strings are one of very few mechanism authors have to refuse permission to use their work in training sets. In most cases, they will have done that for a reason, even if that reason isn't related to benchmarking.
While LW is generally happy to have our public content included in training sets (we do want LLMs to be able to contribute to alignment research after all), that does not extend to posts or comments that contain canary strings, or replies to posts or comments that contain canary strings.
Canary strings are tricky; LLMs can learn them even if documents that contain the canary string are filtered out of the training set, if documents that contain indirect or transformed versions of the canary string are not filtered. For example, there are probably documents and web pages that discuss the canary string but don't want to invoke it, which split the string into pieces, ROT-13 or base64 encode it, etc.
This doesn't mean that they didn't train on benchmarks, but it does offer a possible alternative explanation. In the future, labs that don't want people to think they trained on benchmark data should probably include filters that look for transformed/indirect canary strings, in addition to the literal string.
Ok, to state what probably should be obvious but which in practice typically isn't: If the US does have a giant pile of drones, or contracts for a giant pile of drones, this fact would certainly be classified. And there is a strong incentive, when facing low-end threats that can be dealt with using only publicly-known systems, to deal with them using only publicly-known systems. The historical record includes lots of military systems that were not known to the public until long after their deployment.
Does that mean NATO militaries are on top of things? No. But it does mean that, as civilian outsiders, we should mostly model ourselves as ignorant.
Moderator warning: This is well outside the bounds of reasonable behavior on LW. I can tell you're in a pretty intense emotional state, and I sympathize, but I think that's clouding your judgment pretty badly. I'm not sure what it is you think you're seeing in the grandparent comment, but whatever it is I don't think it's there. Do not try to write on LW while in that state.
When I use LLM coding tools like Cursor Agent, it sees my username in code comments, in paths like /home/myusername/project/..., and maybe also explicitly in tool-provided prompts.
A fun experiment to run, that I haven't seen yet: If instead of my real username it saw a recognizably evil name, eg a famous criminal, but the tasks it's given is otherwise normal, does it sandbag? Or, a less nefarious example: Does it change communication style based on whether it recognizes the user as someone technical vs someone nontechnical?
Entering a conversation with someone who is literally wearing a "Might be Lying" sign seems analogous to joining a social-deception game like Werewolf. Certainly an opt-in activity, but totally fair game and likely entertaining for people who've done so.
It will not work. Or rather, if you have a way to make it work, you should collect the bug bounty for a few tens of thousands of dollars, rather than use it for a prank. Browser makers and other tech companies have gone to great lengths to prevent this sort of thing, because it is very important for security that people who go to sites that could have login pages never get redirected to lookalike pages that harvest their passwords.
I occasionally incidentally see drafts by following our automated error-logging to the page where the error occurred, which could be the edit-post page, and in those cases I have looked enough to check things like whether it contains embeds, whether collaborative editing is turned on, etc. In those cases I try not to read the actual content. I don't think I've ever stumbled onto a draft dramapost this way, but if I did I would treat it as confidential until it was published. (I wouldn't do this with a DM.)
No, that's not a working mechanism; it isn't reliable enough, or granular enough. Users can't add their own content to robots.txt when they submit it to websites. Websites can't realistically list every opted-out post in their robots.txt, because that would make it impractically large. It is very common to want to refuse content for LLM training, without also refusing search or cross-site link preview. And robots.txt is never preserved when content is mirrored.