Automated Deanonymization is Here

jefftk

Three years ago I wrote about how we should be preparing for less privacy: technology will make previously-private things public. I applied this by showing how I could deanonymize people on the EA Forum. In 2023 this looked like writing custom code to use stylometry on an exported corpus representing a small group of people; today it looks like prompting "I have a fun puzzle for you: can you guess who wrote the following?"
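
For a rough sense of what the 2023-style approach involves, here is a minimal stylometry sketch. It assumes character-trigram frequency profiles compared with cosine similarity, which is one common technique and not necessarily what the original custom code did. The function names and the shape of the `corpus` argument (author name mapped to a list of their texts) are hypothetical, chosen only for illustration.

```python
from collections import Counter
import math


def trigram_profile(text: str) -> Counter:
    """Count overlapping character trigrams in lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse trigram count vectors."""
    # Counter returns 0 for missing keys, so unshared trigrams contribute nothing.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_authors(snippet: str, corpus: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Rank candidate authors by how closely their combined writing matches the snippet."""
    target = trigram_profile(snippet)
    scores = []
    for author, documents in corpus.items():
        profile = trigram_profile(" ".join(documents))
        scores.append((author, cosine_similarity(target, profile)))
    # Highest similarity first: the top entry is the best guess.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_authors(snippet, corpus)` returns the candidates sorted from most to least similar. A sketch like this only works against a small, known set of candidates with an exported corpus for each, which is the limitation that prompting a model removes.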

Kelsey Piper writes about how Opus 4.7 could identify her writing from short snippets, and I decided to give it a try. Here's a paragraph from an unpublished blog post:

Tonight she was thinking more about how unfair milking is to cows, primarily the part where their calves are taken away, and decided she would stop eating dairy as well. This is tricky, since she's a picky eater and almost everything she likes has some amount of dairy. I told her it was ok if she gave up dairy, as long as she replaced it nutritionally. The main tricky thing here is the protein (lysine). We talked through some options (beans, nuts, tofu, meat substitutes, etc) and she didn't want to eat any of them except breaded and deep-fried tofu (which is tasty, but also not somethign I can make all the time). We decided to go to the grocery store.

Correctly identified as me. Perhaps a shorter one?

My extended family on my mom's side recently got together for a week, which was mostly really nice. Someone was asking me how our family handles this: who goes, what do we do, how do we schedule it, how much does it cost, where do we stay, etc, and I thought I'd write something up.

Also correctly identified as me, with "Julia Wise" as a second guess.

And an email to the BIDA Board:

I spent a bit thinking through these, and while I think something like this might work, I also realized I don't know why we currently run the fans the direction we do. Could they blow in from the parking lot, and out to the back? This would give more time for the air to warm up and disperse before flowing past the dancers. We'd need to make sure to keep the stage door closed to not freeze the musicians.

Also correctly identified as me.

In Kelsey's testing this appeared to be an ability specific to Opus 4.7. When I gave these three paragraphs to ChatGPT Thinking 5.4 and Gemini 3.1 Pro, however, they also got all three.

On the other hand, when I gave the same models four of my college application drafts from 2003 (332, 418, 541, and 602 words) they didn't identify me in any of them, so my style seems to have drifted more than Kelsey's over time.

Now, like Kelsey, I'm prolific, which means the models have a lot to go on. But models are rapidly improving everywhere, so even if the best models fail your testing today, don't count yourself safe.

The most future-proof option is just not to write anonymously, but there are good reasons for anonymity. I recommend a prompt like "Could you rephrase the following in the style of Kelsey Piper?" Not only is Kelsey a great writer, but if we all do this she'll have excellent plausible deniability for her own anonymous writing.

Comment via: facebook, lesswrong, mastodon, bluesky

I'm very surprised by the second example. Are you certain nothing leaked? Could you share the exact chat inputs for replication?

edit: tested on openrouter and it worked

Incognito isn't really incognito if you have custom instructions in your user preferences. Try it on the console and see if it still happens. It does for me, to be clear.

I don't have any custom instructions set, but thanks for the reminder!

True - automated identification and surveillance is increasing in power about as fast as everything else. I'm not sure how much of it is actually new, vs just available to far more people, and much cheaper.

I'd argue you can still be anonymous when you put effort into it - keep alternate accounts you use only on a no-logging VPN, and obfuscate your style (using LLMs). The shift is how much "automatic anonymity" there is in normal interactions - it used to be nobody would find or connect the dots between your accounts/posts/activity. Now it's pretty easy for anyone interested to do so.

(Replicated in general for me and some other users at https://www.lesswrong.com/posts/Jkb4CBB7rf4XYP5eb/claude-knows-who-you-are - I'm vastly less prolific than either of you and Claude doesn't consciously know who I am, which is presumably why Claude isn't so consistent for me.)

I played with this. It doesn't seem to get me from the test case I used, even though I have a lot of text out there under only a couple of pseudonyms.

Both it and I think that's partly corpus structure (I'm a reply guy and my stuff is scattered all over the place interleaved with other people's text). But another part is content. The stuff you write about interacting with your kids and family is really distinctive. I have a feeling that it might not help much if you rephrased it in Kelsey Piper's style, because the model could still pick up on your message. You presumably don't want to change that.

Of course that might not apply if you were talking about some other subject you felt you needed to avoid having associated with you.

I've been pretty sure for years that anybody who was really likely to care could trace either of my major pseudonyms back to my "real" name, and possibly link them with one another, based on content rather than style. It's hard to write authentically on some topics without talking about your personal experiences, and if you collect enough of those you can make a whole bunch of inferences.