(Prompted by the post: On Media Synthesis: An Essay on The Next 15 Years of Creative Automation, where Yuli comments "Deepfakes exist as the tip of the warhead that will end our trust-based society")
There are answers to the problem of deepfakes. I thought of one very soon after first hearing about the problem, and later found that David Brin described the same thing 20 years ago in The Transparent Society. The idea seems not to have surfaced or propagated at all in the deepfake discourse, and I find that a little disturbing. There is a cartoon Robin Hanson sitting on my shoulder, wryly whispering "Fearmongering is not about preparation" and "News is not about informing". I hope he isn't right. Anyway.
In short, if we want to stay sane, we will start building cameras with tamperproof seals that sign the data they produce with a manufacturer's RSA signature, verifying that the footage comes directly from a real camera, and we will require all news providers to publish a checked (for artifacts of doctoring and generation), verified, signed (unedited) online copy of any footage they air. If we want to be extra thorough (and we should), we will also allocate public funding to the production of disturbing, surreal, inflammatory, but socially mostly harmless deepfakes to exercise the public's epistemic immune system, ensuring that they remain vigilant enough to check the national library of evidence for signed raws before acting on any interesting new video. I'm sure you'll find many talented directors who'd jump at the chance to produce these vaccinating works, and I think the tradition will find plenty of popular support, if properly implemented. The works could be great entertainment, as will the ensuing identification of dangerously credulous fools.
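To make the sign-and-verify step concrete, here is a minimal sketch in Python. It uses textbook RSA with deliberately tiny primes purely for illustration (a real camera would use a vetted cryptographic library and standard key sizes; the clip bytes and all names here are made up): the camera hashes its raw footage and signs the digest, and anyone holding the public key can check that the footage is unmodified.

```python
import hashlib

# Toy textbook-RSA parameters. Tiny primes, for illustration only --
# real hardware would use a proper library (e.g. RSA-3072 or Ed25519).
P, Q = 61, 53
N = P * Q                    # public modulus (3233)
PHI = (P - 1) * (Q - 1)      # 3120
E = 17                       # public exponent
D = pow(E, -1, PHI)          # private exponent, kept inside the sealed camera

def sign(footage: bytes) -> int:
    """Hash the raw footage and sign the digest with the camera's private key."""
    digest = int.from_bytes(hashlib.sha256(footage).digest(), "big") % N
    return pow(digest, D, N)

def verify(footage: bytes, signature: int) -> bool:
    """Anyone holding the public key (N, E) can check the signature."""
    digest = int.from_bytes(hashlib.sha256(footage).digest(), "big") % N
    return pow(signature, E, N) == digest

clip = b"raw sensor frames..."
sig = sign(clip)
assert verify(clip, sig)                    # untouched footage verifies
assert not verify(clip + b"doctored", sig)  # any edit breaks the signature
```

A news provider's published raw would carry such a signature, and any reader could re-run the verification against the manufacturer's public key.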
Technical thoughts about those sealed cameras
The camera's seal should be fragile. When it is broken (when there is any slight shift in gas pressure or membrane conductivity, when the components move, or when the unpredictable, randomly chosen build parameters fall out of calibration), the camera's registered private key will be thoroughly destroyed with a flash of UV, current, and, ideally, magnesium fire, so that it cannot be extracted and used to produce false signatures. It may be common for these cameras to fail spontaneously. We can live with that. The core components of cameras will mostly continue to get cheaper.
I wish I could discuss practical processes for ensuring, through auditing, that the cameras' private keys are kept secret during manufacture. We will need to avoid a situation where manufacturing rights are limited to a select few and the price of authorised sealed cameras climbs into unaffordable ranges, making them inaccessible to the public and to smaller news agencies, but I don't know enough about the industrial process to discuss that.
(Edit: It occurs to me that the manufacturing process would not have to inject a private key from the outside; the factory would never need access to the private key at all. Sealed camera components can be given a noise-generation module and generate their key themselves after manufacture is complete. They can then communicate their public key to the factory, which will publicly register it as belonging to one of the manufacturer's cameras. Video signatures can then be verified by finding the camera's pubkey in the manufacturer's registry.)
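A sketch of that flow, assuming nothing about real hardware: the keypair is generated inside the (simulated) sealed unit, the factory only ever learns the public key, and verification goes through the manufacturer's public registry. Toy RSA with tiny primes again, for illustration only; all names here (`SealedCamera`, `enroll`, the serial numbers) are made up.

```python
import hashlib, math, random

# Tiny primes for illustration only; real hardware would use a CSPRNG
# and standard key sizes.
SMALL_PRIMES = [101, 103, 107, 109, 113, 127, 131, 137, 139, 149]

class SealedCamera:
    """Toy model: the keypair is generated inside the sealed unit after
    manufacture; only (serial, public key) ever leaves the device."""
    def __init__(self, serial: str):
        self.serial = serial
        p, q = random.sample(SMALL_PRIMES, 2)   # stand-in for a hardware noise source
        n, phi = p * q, (p - 1) * (q - 1)
        e = 3
        while math.gcd(e, phi) != 1:
            e += 2
        self.public_key = (n, e)                # reported to the factory
        self._d = pow(e, -1, phi)               # never leaves the seal

    def sign(self, footage: bytes) -> int:
        n, _ = self.public_key
        digest = int.from_bytes(hashlib.sha256(footage).digest(), "big") % n
        return pow(digest, self._d, n)

# The manufacturer's public registry: serial -> public key.
registry = {}

def enroll(camera: SealedCamera) -> None:
    registry[camera.serial] = camera.public_key

def verify(serial: str, footage: bytes, signature: int) -> bool:
    """Look up the claimed camera in the registry and check the signature."""
    if serial not in registry:
        return False
    n, e = registry[serial]
    digest = int.from_bytes(hashlib.sha256(footage).digest(), "big") % n
    return pow(signature, e, n) == digest

cam = SealedCamera("CAM-0001")
enroll(cam)
clip = b"frames..."
assert verify("CAM-0001", clip, cam.sign(clip))       # registered camera verifies
assert not verify("CAM-9999", clip, cam.sign(clip))   # unregistered serial is rejected
```

The point of the structure is that the private exponent `_d` is computed and held entirely inside the device; a compromised factory can at worst enroll bogus cameras, not forge signatures for existing ones.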
There's also an attack I'm not sure how to address. Very high-resolution screens and lenses could be used to show a sealed camera a scene that doesn't exist. The signature attests that the camera genuinely sees it, but it still isn't real. I'll name it the Screen Illusion Analogue Hole Attack (SIAHA).
It might be worth putting some kind of GPS chip inside the sealed portion of the camera, so that the attack's illusion screen would need to be physically moved to the location where the fake event was supposed to have happened. That would limit the applications of such an attack, but GPS is currently very easy to fool, so we'll need to find a better location-verification technology than GPS. (This is not an isolated need.)
I initially imagined that a screen of sufficient fidelity, framerate, and dynamic range would be prohibitively expensive to produce. Now it occurs to me that the VR field aspires to make such screens ubiquitous. [Edit: in retrospect, I think the arguments against commercial VR-based SIAHA I've come up with here are pretty much fatal. It won't happen. There's too much of a difference between what cameras can see and what screens for humans can produce. If a SIAHA screen can be made at all, it will be very expensive.]
- Resolution targets may eventually be met.
- A point in favour: maximum human-perceptible pixel density will be approached. Eye tracking will open the way to foveated rendering, wherein only the small patch of the scene the user is looking directly at is rendered at maximum resolution. Current rendering hardware is already beefy enough to support foveated rendering, since it lets us significantly down-spec the resolution of everything the user isn't looking at. The hardware will not necessarily accept streaming 4K raw footage fresh out of the box (more likely 720p footage plus another 720p patch for the fovea), but the pixels will all be there and the screen will be dense enough; it will be very possible to produce hardware that can do it, if not by modifying a headset, then by modifying its factory.
- A point against: Video cameras will sometimes want to go beyond retinal pixel density for post-production digital zoom. They will want to capture much more than a human standing in their position can see, and I can see no reason consumer-grade screens should ever come to output more detail than a human can see.
- Framerate targets will be met because if you dip below 100fps in VR, players puke. It's a hard requirement. There will never be a commercial VR headset that couldn't do it.
- A point against: if the framerate of the screen is not much, much higher than the framerate of the camera, and the two are merely similar, then frame skips or tears will occur unless they are perfectly synchronised.
- Realistic dynamic range might take longer than the other two, but there will be a demand for it... though perhaps we will never want a screen that can flash with the brightness of the sun. If cameras of the future can record that level of brightness, that may be some defence against this kind of attack, at least for outdoor scenes.
- Colour accuracy may remain difficult to replicate with screens. Cameras already accidentally record infra-red light, and screens for humans will never need to produce infra-red. I'm not sure how current cameras' colour accuracy compares to the human eye's; I suspect it's higher, but I'm not able to confirm that.
[in conclusion, I think cheap SIAHA is unlikely]
In summary: A combination of technologies, laws, and fun social practices can probably mostly safeguard us against the problem of convincingly doctored video evidence. Some of the policing and economic challenges are a bit daunting, but not obviously insoluble.
It's unclear to me whether deepfakes are going to be that big of an issue in the first place. Written text can already be "faked" - that is, anyone can claim to have witnessed anything and write a report saying so. Photographs can likewise already be altered, staged, or edited in a misleading way.
Society solves the problem by trusting different claims depending on how reliable the source is considered to be. If an established newspaper publishes an investigative report citing anonymous sources, then they are generally trusted because they have staked their reputation on it, even though the whole report could have been made up. But if your crazy neighbor makes the same claim, they are much less likely to be widely believed.
It seems to me that at worst, deepfakes will only take us to a point where photos are about as trustworthy as the written word. But we seem to mostly already operate fine in a world where the written word is trivial to "fake". I'm sure that photography and video being harder to fake makes some contribution to it being easier to trust claims, but my intuition is that most trustworthiness still comes from the sources themselves being considered trustworthy.
On a fundamental level, I agree. However, there are some aspects of this technology that make me wonder whether things might be a tad different this time, and whether past experience will accurately predict the future. Artificial intelligence is a different beast from what we are used to, that is to say, "mechanical effort".
When it comes to multimedia deepfakes, the threat is less "people believe everything they see" and more "people no longer trust anything they see". The reason we trust written text and photographs is that most of us have never dealt with forged letters, and most altered photos are very obviously altered. What's more, there are consequences for faking them. When I was a child, I sometimes had my senile grandmother write letters detailing why I was "sick" and couldn't come to school, or had her sign homework under my father's name. Eventually, the teachers found out and stopped trusting any letter I brought in, even the ones legitimately from my father.
There's no reason to trust manufacturers or governments in this. There's already plenty of fakery in text and photographic reporting, and political topics are some of the worst cases for it. Basically, these are human reliability problems - why would you expect better for video technology, and why would you expect human institutions to solve them now, when they haven't for hundreds of years?
The only "solutions" (really more "viable responses" than "solutions") I see are:
Most important news is not about pictures. You can't prove anything that really matters with a picture. To the extent that this means people will believe picture- or video-based proofs of arguments less, that's a good thing.
Economist articles without pictures are higher-quality news than the latest scandal about the speed at which person A touched person B.
Alternative: notarised alibis as a service. The anonymous (and dubiously valid) video received by the broadcaster has me red-handed in the library with the obvious melee weapon, but MY signed and notarised personal cloud of things has me harmlessly drinking in a pub in a different location at the same time, which proves beyond reasonable doubt that the deepfake is fake.
In other words: it’s a tough call ensuring all the wannabe bad actors have adequately sealed and trusted cameras, at which point panoptical surveillance by a vaguely trusted system starts to seem like a good alternative.
(This feels very Brin, so I may have stolen it from him)
Depending how trustworthy the surveillance is, this may merely be an express route to a different dystopia.
The cryptographic component of the camera would be manufactured without a private key in place. Instead, there's a write-only memory section that any number of third parties can write their private keys to. That way, we can rely on existing legally trusted institutions like notaries, and nobody should be able to extract too much rent.
This raises the question of how the notary can know that the "cryptographic component" in front of them isn't some scammer's USB stick. Require each cryptographic component to have many compartments, each separate from the others. Breaking the seal on one wipes its memory, but not its architecture. Given a supposed cryptographic component, the notary randomly selects a few compartments and destructively verifies their architecture. The remaining compartments are each told one fresh private key, and the notary publishes (and signs with his personal private key) a statement that whoever can authenticate himself with at least x of these particular public keys is a trusted camera. To produce fake images, an adversary would have to have subverted at least x compartments that the notary didn't destructively verify. A component can be shipped around to a few notaries, so long as nobody breaks so many compartments that fewer than some notary's x are left.
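The compartment scheme above can be sketched as follows, again with toy RSA and tiny primes. Everything here (the compartment count, the spot-check count, the threshold x, the function names) is an illustrative assumption, not a real protocol:

```python
import hashlib, math, random

# Tiny primes, illustration only.
PRIMES = [101, 103, 107, 109, 113, 127, 131, 137, 139, 149]

def make_keypair():
    """Toy RSA keypair: returns ((n, e) public, d private)."""
    p, q = random.sample(PRIMES, 2)
    n, phi = p * q, (p - 1) * (q - 1)
    e = 3
    while math.gcd(e, phi) != 1:
        e += 2
    return (n, e), pow(e, -1, phi)

def digest(footage: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(footage).digest(), "big") % n

N_COMPARTMENTS, SPOT_CHECKED, X = 8, 3, 3

# The notary destructively inspects a random sample of compartments...
compartments = list(range(N_COMPARTMENTS))
inspected = random.sample(compartments, SPOT_CHECKED)
surviving = [c for c in compartments if c not in inspected]

# ...and provisions each surviving compartment with a fresh keypair.
keys = {c: make_keypair() for c in surviving}
policy = {c: pub for c, (pub, _) in keys.items()}   # published, notary-signed

def cosign(footage: bytes) -> dict:
    """Every surviving compartment signs the footage digest independently."""
    return {c: pow(digest(footage, pub[0]), priv, pub[0])
            for c, (pub, priv) in keys.items()}

def accept(footage: bytes, sigs: dict) -> bool:
    """Authentic iff at least X registered compartments produced valid signatures."""
    valid = sum(1 for c, s in sigs.items()
                if c in policy
                and pow(s, policy[c][1], policy[c][0]) == digest(footage, policy[c][0]))
    return valid >= X

clip = b"frames..."
sigs = cosign(clip)
assert accept(clip, sigs)
# An adversary who subverted fewer than X compartments cannot reach the threshold:
too_few = dict(list(sigs.items())[:X - 1])
assert not accept(clip, too_few)
```

The spot-check makes cheating probabilistic for the scammer: a fake component must survive a random destructive sample and still control at least x of the remaining slots.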
Of course, "don't roll your own crypto" says all of this is to be interpreted as conjecture.
Great takes on all this, better than a typical reply.
Yes. But maybe instead of the physical sealing we could use a blockchain that registers the time and geotag of the recording?
A blockchain cannot validate the information it receives if the hardware is not secure, so it cannot replace the Fail-Deadly Key.
A blockchain could, on the other hand, timestamp the footage's hash as soon as it is generated, proving it was created before a certain point in time.
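A minimal sketch of what such timestamping buys, assuming nothing blockchain-specific: an append-only hash chain in which each entry commits to the previous one, so a footage hash provably entered the log before every later entry. A public blockchain plays the same role with many independent witnesses holding copies of the chain.

```python
import hashlib, json, time

# Toy append-only timestamp log. Each entry records the previous entry's
# hash, the footage's SHA-256, and a wall-clock time.
chain = [{"prev": "0" * 64, "hash": "genesis", "t": 0.0}]

def _entry_hash(entry: dict) -> str:
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

def commit(footage: bytes) -> dict:
    """Append the footage's hash to the chain, linked to the current head."""
    entry = {
        "prev": _entry_hash(chain[-1]),
        "hash": hashlib.sha256(footage).hexdigest(),
        "t": time.time(),
    }
    chain.append(entry)
    return entry

def chain_valid() -> bool:
    """Any tampering with an earlier entry breaks every later link."""
    return all(chain[i]["prev"] == _entry_hash(chain[i - 1])
               for i in range(1, len(chain)))

commit(b"clip-A")
commit(b"clip-B")
assert chain_valid()
chain[1]["hash"] = "forged"   # try to rewrite history...
assert not chain_valid()      # ...and the chain no longer verifies
```

Note what this does and does not prove: it shows the hash existed before later entries, but says nothing about whether the footage itself was genuine — which is why it complements, rather than replaces, the sealed-camera signature.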
Maybe use something like an externally projected cryptographic timestamp, via the photo flash?