Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I often hear about deepfakes--pictures/videos that can be entirely synthesized by a deep learning model and made to look real--and how this could greatly amplify the "fake news" phenomenon and really undermine the ability of the public to actually evaluate evidence.

And this sounds like a well-founded worry, but then I got to thinking: what about Photoshop? That has existed for decades, and for all that time it's been possible to doctor images to look real. So why should deepfakes be any scarier?

Part of it could be that we can fake videos, not just images, but that can't be all of it.

I suspect the main reason is that in the future, deepfakes will also be able to fool experts. This does seem like an important threshold.

This raises another question: is it, in fact, impossible to fool experts with Photoshop? Are there fundamental limitations that prevent it from being this potent, and was this always understood, so that people weren't particularly fearful of it? (FWIW, when I learned about Photoshop as a kid I freaked out with Orwellian visions even worse than people have about deepfakes now, and pretty much only relaxed out of conformity. I remain ignorant about the technical details of Photoshop and its capabilities.)

But even if deepfakes are bound to cross this threshold (not that it's a fine line) in a way Photoshop never could, aren't there also plenty of things which experts have had, and still have, trouble classifying as real/fake? Wikipedia's list of hoaxes is extensive, though most of those fooled the public rather than experts. But I feel like there are plenty of hoaxes that lasted hundreds of years before being debunked (the Shroud of Turin, or maybe fake fossils?).

I guess we're just used to seeing fewer hoaxes in modern times. In the past, hoaxes abounded, and there often weren't the proper experts around to debunk them, so those times probably warranted a greater degree of epistemic learned helplessness or something. But over the last century, our forgery-spotting techniques have gotten a lot better while the corresponding forgeries just haven't kept up, so we happen to live in a time where the "offense" is relatively weaker than the "defense" -- but there's no particular reason it should stay that way.

I'm really not sure how worried I should be about deepfakes, but having just thought through all that, it does seem like the existence of "evidence" in political discourse is not an all-or-nothing phenomenon. Images/videos will likely come to be trusted less, maybe other things as well if deep learning contributes in other ways to the "offense" more than the "defense". And maybe things will reach a not-so-much-worse equilibrium. Or maybe not, but the deepfake phenomenon certainly does not seem completely new.

Part of it could be that we can fake videos, not just images, but that can't be all of it.

There have been a handful of news stories about things like a company being scammed out of a lot of money (the CEO's voice was faked over the phone). This is a different issue from "the public" being fooled.

In reasoning about AGI, we're all aware of the problems with anthropomorphizing, but it occurs to me that there's also a cluster of bad reasoning that comes from an (almost?) opposite direction, where you visualize an AGI to be a mechanical automaton and draw naive conclusions based on that.

For instance, every now and then I've heard someone from this community say something like:

What if the AGI runs on the ZFC axioms (among other things), and finds a contradiction, and by the principle of explosion it goes completely haywire?

Even if ZFC is inconsistent, this hardly seems like a legitimate concern. There's no reason to hard-code ZFC into an AI unless we want a narrow AI that's just a theorem prover (e.g. Logic Theorist). Anything close to AGI will necessarily build rich world models, and from the standpoint of these, ZFC wouldn't literally be everything. ZFC would just be a sometimes-useful tool it discovers for organizing its mathematical thinking, which in turn is just a means toward understanding physics etc. better, much as humans wouldn't go crazy if ZFC yielded a contradiction.

The general fallacy I'm pointing to isn't just "AGI will be logic-based" but something more like "AGI will act like a machine, an automaton, or a giant look-up table". This is technically true, in the same way humans can be perfectly described as a giant look-up table, but it's just the wrong level of abstraction for thinking about agents (most of the time) and can lead one to silly conclusions if one isn't really careful.

For instance, my (2nd-hand, half-baked, and lazy) understanding of Penrose's argument is as follows: Godel's theorems say formal systems can't do X, humans can do X, therefore human brains can't be fully described as formal systems (or maybe he references Turing machines and the halting problem, but the point is similar). Note that this makes sense as stated; the catch is that "the human brain when broken down all the way to a Turing machine" is what the Godel/Turing results apply to, not "the human brain at the level of abstraction we use to think about it (in terms of 'thoughts', 'concepts', etc.)". It's not at all clear that the latter even resembles a formal system, at least not one rich enough for the Godel/Turing results to apply. The fact that it's "built out of" the former means nothing on this point: the set of PA proofs longer than 10 characters does not itself constitute a formal system, and fleshing out the "built out of" would probably require solving a large chunk of neuroscience.

Again, I'm just using straw-Penrose here as an example because, while we all agree it's an invalid argument, this is mostly because it concludes something LW overwhelmingly agrees is false. When taken at face value, it "looks right" and the actual error isn't completely obvious to find and spell out (hence I've left it in a black spoiler box). I claim that if the argument draws a conclusion that isn't obviously wrong or even reinforces your existing viewpoint, then it's relatively easy to think it makes sense. I think this is what's going on when people here make arguments for AGI dangers that appeal to its potential brittleness or automata-like nature (I'm not saying this is common, but I do see it occasionally).

But there's a subtlety here, because there are some ways in which AGI potentially will be more brittle due to its mathematical formulation. For instance, adversarial examples are a real concern, and those are pretty much only possible because ML systems output numerical probabilities, from which an adversary can infer the gradient of the model's output with respect to its input and perturb the input along it.
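As a minimal sketch of this gradient-following idea (not any particular published attack), consider a toy logistic-regression "model": because the output is a differentiable probability, an attacker who can infer the gradient can nudge every input coordinate a small step against its sign, FGSM-style, and move the model's belief substantially. All weights and inputs below are made up for illustration.

```python
import numpy as np

# Toy differentiable "model": logistic regression with fixed, made-up weights,
# outputting a probability in (0, 1).
w = np.array([1.5, -2.0, 0.5])
b = 0.1

def predict_proba(x):
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

# FGSM-style perturbation: for a linear model, the gradient of the output
# with respect to the input is proportional to w, so stepping each input
# coordinate by epsilon against sign(w) pushes the probability down while
# changing the input only slightly (bounded by epsilon per coordinate).
def fgsm_perturb(x, epsilon):
    return x - epsilon * np.sign(w)

x = np.array([0.2, 0.4, -0.1])
p_before = predict_proba(x)
p_after = predict_proba(fgsm_perturb(x, epsilon=0.3))
assert p_after < p_before  # belief moved along the inferred gradient
```

The point isn't the specific attack but the structural fact: exposing smooth numerical outputs hands the adversary a direction to push in, which a hard yes/no answer would not.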

And of course, as I said at the start, an opposing fallacy is thinking AGI will be more human-like by default. To be clear, I think the fallacy I'm gesturing at here is the less dangerous one in the worst case, but the more common one on LW (i.e. it occurs at a rate > 0).

K-complexity: The minimum description length of something (relative to some fixed description language)

Cake-complexity: The minimum description length of something, where the only noun you can use is "cake"
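Since K-complexity itself is uncomputable, a standard computable stand-in is compressed length, which upper-bounds description length relative to a fixed decompressor. A small sketch of that proxy (my own illustration, not from the original comment):

```python
import os
import zlib

# K-complexity is uncomputable, but the length of a compressed encoding
# gives a crude, computable upper bound on description length
# (ignoring the fixed cost of the decompressor itself).
def description_length_upper_bound(data: bytes) -> int:
    return len(zlib.compress(data))

regular = b"cake" * 250          # 1000 bytes of one repeated word
random_data = os.urandom(1000)   # 1000 essentially incompressible bytes

# The regular string has a short description ("'cake' 250 times");
# the random bytes barely compress at all.
assert description_length_upper_bound(regular) < 100
assert description_length_upper_bound(random_data) > 900
```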

[This comment is no longer endorsed by its author]

Are we allowed to I-am-Groot the word "cake" to encode several bits per word, or do we have to do something like repeat "cake" until the primes that it factors into represent a desired binary string?

(edit: ah, only nouns, so I can still use whatever I want in the other parts of speech. or should I say that the naming cakes must be "cake", and that any other verbal cake may be whatever this speaking cake wants)

To be clear I unendorsed the idea about a minute after posting because it felt like more of a low-effort shitpost than a constructive idea for understanding the world (and I don't want to make that a norm on shortform). That said I had in mind that you're describing the thing to someone who you can't communicate with beforehand, except there's common knowledge that you're forbidden any nouns besides "cake". In practice I feel like it degenerates to putting all the meaning on adjectives to construct the nouns you'd want to use. E.g. your own "speaking cake" to denote a person, "flat, vertical, compartmentalizing cakes" to denote walls. Of course you'd have to ban any "-like" and "-esque" constructions and similar things, but it's not clear to me if the boundaries there are too fuzzy to make a good rule set.

Actually, maybe this could be a board game similar to charades. You get a random word such as "elephant", and you write down a description of it under this constraint. Then the description is gradually read off, and your team tries to guess the word from the description. It's inverse to charades in that the reading is done in a monotone, without body language (and could even be done by the other team).
