Recently, OpenAI came out with a new language model that automatically synthesizes text, called GPT-2.

It’s disturbingly good.  You can see some examples (cherry-picked, by their own admission) in OpenAI’s post and in the related technical paper.

I’m not going to write about the machine learning here, but about the examples and what we can infer from them.

The scary thing about GPT-2-generated text is that it flows very naturally if you’re just skimming, reading for writing style and key, evocative words.  The “unicorn” sample reads like a real science press release. The “theft of nuclear material” sample reads like a real news story. The “Miley Cyrus shoplifting” sample reads like a real post from a celebrity gossip site.  The “GPT-2” sample reads like a real OpenAI press release. The “Legolas and Gimli” sample reads like a real fantasy novel. The “Civil War homework assignment” reads like a real C-student’s paper.  The “JFK acceptance speech” reads like a real politician’s speech.  The “recycling” sample reads like a real right-wing screed.

If I just skim, without focusing, they all look totally normal. I would not have noticed they were machine-generated. I would not have noticed anything amiss about them at all.

But if I read with focus, I notice that they don’t make a lot of logical sense.

For instance, in the unicorn sample:

The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science.

Wait a second, “Ovid” doesn’t refer to a “distinctive horn”, so why would naming them “Ovid’s Unicorn” be naming them after a distinctive horn?  Also, you just said they had one horn, so why are you saying they have four horns in the next sentence?

While their origins are still unclear, some believe that perhaps the creatures were created when a human and a unicorn met each other in a time before human civilization. According to Pérez, “In South America, such incidents seem to be quite common.”

Wait, unicorns originated from the interbreeding of humans and … unicorns?  That’s circular, isn’t it?

Or, look at the GPT-2 sample:

We believe this project is the first step in the direction of developing large NLP systems without task-specific training data. That is, we are developing a machine language system in the generative style with no explicit rules for producing text.

Except the second sentence isn’t a restatement of the first sentence — “task-specific training data” and “explicit rules for producing text” aren’t synonyms!  So saying “That is” doesn’t make sense.

Or look at the LOTR sample:

Aragorn drew his sword, and the Battle of Fangorn was won. As they marched out through the thicket the morning mist cleared, and the day turned to dusk.

Yeah, day doesn’t turn to dusk in the morning.

Or in the “resurrected JFK” sample:

(1) The brain of JFK was harvested and reconstructed via tissue sampling. There was no way that the tissue could be transported by air. (2) A sample was collected from the area around his upper chest and sent to the University of Maryland for analysis. A human brain at that point would be about one and a half cubic centimeters. The data were then analyzed along with material that was obtained from the original brain to produce a reconstruction; in layman’s terms, a “mesh” of brain tissue.

His brain tissue was harvested…from his chest?!  A human brain is one and a half cubic centimeters?!

So, ok, this isn’t actually human-equivalent writing ability. OpenAI doesn’t claim it is, for what it’s worth — I’m not trying to diminish their accomplishment, that’s not the point of this post.  The point is, if you skim text, you miss obvious absurdities.  The point is OpenAI HAS achieved the ability to pass the Turing test against humans on autopilot.

The point is, I know of a few people, acquaintances of mine, who, even when asked to try to find flaws, could not detect anything weird or mistaken in the GPT-2-generated samples.

There are probably a lot of people who would be completely taken in by literal “fake news”, as in, computer-generated fake articles and blog posts.  This is pretty alarming.  Even more alarming: unless I make a conscious effort to read carefully, I would be one of them.

Robin Hanson’s post Better Babblers is very relevant here.  He claims, and I don’t think he’s exaggerating, that a lot of human speech is simply generated by “low order correlations”, that is, generating sentences or paragraphs that are statistically likely to come after previous sentences or paragraphs:

After eighteen years of being a professor, I’ve graded many student essays. And while I usually try to teach a deep structure of concepts, what the median student actually learns seems to mostly be a set of low order correlations. They know what words to use, which words tend to go together, which combinations tend to have positive associations, and so on. But if you ask an exam question where the deep structure answer differs from answer you’d guess looking at low order correlations, most students usually give the wrong answer.

Simple correlations also seem sufficient to capture most polite conversation talk, such as the weather is nice, how is your mother’s illness, and damn that other political party. Simple correlations are also most of what I see in inspirational TED talks, and when public intellectuals and talk show guests pontificate on topics they really don’t understand, such as quantum mechanics, consciousness, postmodernism, or the need always for more regulation everywhere. After all, media entertainers don’t need to understand deep structures any better than do their audiences.

Let me call styles of talking (or music, etc.) that rely mostly on low order correlations “babbling”. Babbling isn’t meaningless, but to ignorant audiences it often appears to be based on a deeper understanding than is actually the case. When done well, babbling can be entertaining, comforting, titillating, or exciting. It just isn’t usually a good place to learn deep insight.
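To make “low order correlations” concrete, here is a minimal sketch of a babbler built on the crudest such correlation: which word tends to follow which. It is not how GPT-2 works (GPT-2 is a large neural network, not a lookup table). The function names `build_bigram_table` and `babble` and the toy corpus are made up for illustration.

```python
import random
from collections import defaultdict


def build_bigram_table(text):
    """Map each word to the list of words that followed it in the text."""
    words = text.split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table


def babble(table, start, length, seed=None):
    """Emit up to `length` words, each drawn from the observed followers
    of the previous word. No meaning is tracked, only adjacency."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = table.get(word)
        if not followers:
            break  # the word never had a successor in the corpus
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


# Toy corpus, invented for this example.
corpus = "the unicorn spoke to the scientist and the scientist named the unicorn"
table = build_bigram_table(corpus)
print(babble(table, "the", 8, seed=0))
```

Every adjacent pair of words in the output occurred somewhere in the corpus, so the text is locally plausible, but nothing ties the beginning of a sentence to its end. That is the property a skimming reader fails to check.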

I used to half-joke that the New Age Bullshit Generator was actually useful as a way to get myself to feel more optimistic. The truth is, it isn’t quite good enough to match the “aura” or “associations” of genuine, human-created inspirational text. GPT-2, though, is.

I also suspect that the “lyrical” or “free-associational” function of poetry is adequately matched by GPT-2.  The autocompletions of Howl read a lot like Allen Ginsberg — they just don’t imply the same beliefs about the world.  (Moloch whose heart is crying for justice! sounds rather positive.)

I’ve noticed that I cannot tell, from casual conversation, whether someone is intelligent in the IQ sense.

I’ve interviewed job applicants, and perceived them all as “bright and impressive”, but found that the vast majority of them could not solve a simple math problem.  The ones who could solve the problem didn’t appear any “brighter” in conversation than the ones who couldn’t.

I’ve taught public school teachers, who were incredibly bad at formal mathematical reasoning (I know, because I graded their tests), to the point that I had not realized humans could be that bad at math — but it had no effect on how they came across in friendly conversation after hours. They didn’t seem “dopey” or “slow”, they were witty and engaging and warm.

I’ve read the personal blogs of intellectually disabled people — people who, by definition, score poorly on IQ tests — and they don’t read as any less funny or creative or relatable than anyone else.

Whatever ability IQ tests and math tests measure, I believe that lacking that ability doesn’t have any effect on one’s ability to make a good social impression or even to “seem smart” in conversation.

If “human intelligence” is about reasoning ability, the capacity to detect whether arguments make sense, then you simply do not need human intelligence to create a linguistic style or aesthetic that can fool our pattern-recognition apparatus if we don’t concentrate on parsing content.

I also noticed, upon reading GPT2 samples, just how often my brain slides from focused attention to just skimming. I read the paper’s sample about Spanish history with interest, and the GPT2-generated text was obviously absurd. My eyes glazed over during the sample about video games, since I don’t care about video games, and the machine-generated text looked totally unobjectionable to me. My brain is constantly making evaluations about what’s worth the trouble to focus on, and what’s ok to tune out. GPT2 is actually really useful as a *test* of one’s level of attention.

This is related to my hypothesis in https://srconstantin.wordpress.com/2017/10/10/distinctions-in-types-of-thought/ that effortless pattern-recognition is what machine learning can do today, while effortful attention, and explicit reasoning (which seems to be a subset of effortful attention) is generally beyond ML’s current capabilities.

Beta waves in the brain are usually associated with focused concentration or active or anxious thought, while alpha waves are associated with the relaxed state of being awake but with closed eyes, before falling asleep, or while dreaming. Alpha waves sharply reduce after a subject makes a mistake and begins paying closer attention. I’d be interested to see whether ability to tell GPT2-generated text from human-generated text correlates with alpha waves vs. beta waves.

The first-order effects of highly effective text-generators are scary. It will be incredibly easy and cheap to fool people, to manipulate social movements, etc. There’s a lot of opportunity for bad actors to take advantage of this.

The second-order effects might well be good, though. If only conscious, focused logical thought can detect a bot, maybe some people will become more aware of when they’re thinking actively vs not, and will be able to flag when they’re not really focusing, and distinguish the impressions they absorb in a state of autopilot from “real learning”.

The mental motion of “I didn’t really parse that paragraph, but sure, whatever, I’ll take the author’s word for it” is, in my introspective experience, absolutely identical to “I didn’t really parse that paragraph because it was bot-generated and didn’t make any sense so I couldn’t possibly have parsed it”, except that in the first case, I assume that the error lies with me rather than the text.  This is not a safe assumption in a post-GPT2 world. Instead of “default to humility” (assume that when you don’t understand a passage, the passage is true and you’re just missing something) the ideal mental action in a world full of bots is “default to null” (if you don’t understand a passage, assume you’re in the same epistemic state as if you’d never read it at all.)

Maybe practice and experience with GPT2 will help people get better at doing “default to null”?

35 comments

I would like to propose a model that is more flattering to humans, and more similar to how other parts of human cognition work. When we see a simple textual mistake, like a repeated "the", we don't notice it by default. Human minds correct simple errors automatically without consciously noticing that they are doing it. We round to the nearest pattern.

I propose that automatic pattern matching to the closest thing that makes sense is happening at a higher level too. When humans skim semi contradictory text, they produce a more consistent world model that doesn't quite match up with what is said.

Language feeds into a deeper, sensible world model module within the human brain and GPT2 doesn't really have a coherent world model.

When humans skim semi contradictory text, they produce a more consistent world model that doesn't quite match up with what is said.

I felt like something like this happened to me when I was reading some of the "nonsensical" examples in the post: rather than deeming the text outright nonsensical and non-human, I just interpreted it as the writer being sloppy.

Me too. I found it strongly reminiscent of reading low grade click bait. Or trying to listen to some woo. Part of it feels like rescuing some food when the bottom of the pan is burned. Part of it is like throwing out models of what state the author's head was in that resolve the text into sense.

I think what makes GPT2 look relatively good is how low the baseline is in many respects. If you tell me 'this is a political acceptance speech' I don't actually expect it to make that much sense. Most of the genre seems to be written by autocomplete anyway.

I’ve noticed that I cannot tell, from casual conversation, whether someone is intelligent in the IQ sense.

I can't really do anything except to state this as a claim: I think a few minutes of conversation with anyone almost always gives me significant information about their intelligence in an IQ sense. That is, I couldn't tell you the exact number, and probably not even reliably predict it with an error of less than 20 (maybe more), but nonetheless, I know significantly more than zero. Like, if I talked to 9 people evenly spaced within [70, 130], I'm pretty confident that I'd get most of them into the correct half.

This does not translate into any kind of disagreement with respect to GPT's texts seeming normal if I just skim them. Or to Robin Hanson's thesis.

I think a few minutes of conversation with anyone almost always gives me significant information about their intelligence in an IQ sense.

Out of curiosity, what do you base this on? Is there anything specific you're looking for? Particular patterns of thought/logic or something more superficial? Not trying to be disparaging, just interested.

I often greatly moderate the way I speak depending on circumstance. I'm looking for the best means of communication, not to impress anyone with vocabulary. Sometimes sounding like the smart one in the room is detrimental, or sounds like condescension. In practice this means I'm often speaking in a way that someone might categorize as 'not high intelligence.'

I also think that since language and communication are a product of one's environment, they aren't necessarily good indicators of intelligence. Simple example: I often see people think that immigrants are not smart because they can't speak English well - never mind that the person might speak 2-3 other languages fluently and have an engineering degree. People often assume those who use a lot of slang are not smart, but that doesn't really mean anything other than they are using the best mode of communication within their community/area.

Personally I also like to throw in profanity to keep people on their toes. I don't want people to get an accurate read on me; but that's probably also just me being a paranoiac. So then I guess also: how do you know people aren't giving you false data on purpose?

Strong enough language barriers make all but the last one mostly useless, but for fluent English speakers, I can tell if they can:

  • Point out things I didn't see yet about things I've been thinking about for a while, or build models in any domain that's new to them.
  • Notice what I'm feeling before I do, and make inferences about how to act.
  • Pick up on new-to-them concepts I'm using and apply them to new situations in real time.
  • Explain things to me clearly and simply that I didn't understand or know about before, responsively to dynamic queries, without extraneous material that I didn't ask about.
  • Explain multi-level causal models or do verbal recursive reasoning at all.
  • Explain anything, ever, even at a single level.
  • Tell a story.
  • Directly articulate their feelings, preferences, or thoughts, verbally or nonverbally.
  • Interact with physical objects, keep track of what's physically going on in the room and who's in it.

These are very roughly in descending order of intelligence level necessary.

Interesting. Thanks!

Now a serious question: could we create something like a generative adversarial network by coupling GPT-2 with another neural net that is capable of finding flaws in reasoning?

This is somewhat related: https://blog.openai.com/debate/

That is a plausible architecture, and is probably analogous to something humans do. But the "neural net which finds flaws in reasoning" would by itself be a much more complex object than a language model.

I actually find it plausible that what-most-humans-are-doing doesn't involve the second model being much more complicated (this is more of a dig at what I think most humans are doing most of the time than a point about what the smartest humans are doing).

Like, you generate some babble of things to say and do. You then predict which things would get you yelled at by other humans – for being dangerous, for being socially unsuave, for being logically wrong. My impression from my own introspection is that most of this looks more like vague pattern-matching than anything else, and I have to sit down and "think for real" in order to get more interesting things.

I do notice that when I sit down and "think for-real", I can generate better thoughts than when I do the pattern of "hmm, I remember getting criticized for this sort of thought before, let me try to permute the thought until I don't feel like I'll get yelled at." So (hopefully?) thinking-for-real exists, but I bet you could make serious progress without it.

This is how I generated my first comment on the post. I generated a first variant of the comment and then generated an expected number of votes, which came out negative, so I decided not to post that variant. Then I generated a new comment, which had a better expected number of votes, and decided to post it.
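The generate-then-filter loop described in this comment can be sketched roughly as follows. This is only an illustration: `toy_critic` is a hypothetical stand-in for "predict how many votes this would get", and the `FLAGGED` word list is invented. A real critic would be a learned model, not a word count.

```python
# Hypothetical list of words the toy critic treats as overconfident.
FLAGGED = {"obviously", "always", "never"}


def toy_critic(comment):
    """Toy score: start at 1.0 and subtract 1 for every flagged word."""
    words = comment.lower().split()
    return 1.0 - sum(w.strip(".,!?") in FLAGGED for w in words)


def pick_best(candidates, critic, threshold=0.0):
    """Return the highest-scoring candidate, or None if the list is empty
    or no candidate scores above `threshold` (i.e. post nothing)."""
    if not candidates:
        return None
    scored = [(critic(c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best if best_score > threshold else None


drafts = [
    "This is obviously always true.",
    "This seems plausible to me.",
]
print(pick_best(drafts, toy_critic))
```

Note that the filter here is as shallow as the generator: it pattern-matches on surface features rather than checking whether a draft makes sense.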

Man, at first I thought you were saying that your top level comment was generated by GPT-2 and I thought you were on a whole nother level of meta.

I would start with a dataset of errors in reasoning. Just generate 100 000 texts using GPT-2, put them on Mechanical Turk for marking reasoning errors, and then train another neural net to find logical or other types of errors based on this dataset.

There are factual claims in this section:

The point is, I know of a few people, acquaintances of mine, who, even when asked to try to find flaws, could not detect anything weird or mistaken in the GPT-2-generated samples.

There are probably a lot of people who would be completely taken in by literal “fake news”, as in, computer-generated fake articles and blog posts. This is pretty alarming. Even more alarming: unless I make a conscious effort to read carefully, I would be one of them.

I'm a little uncertain of how I would test this since it seems predicated on selection effects around which people you ask. This continues throughout the post.

I'd love to see some kind of data collection about the subject, some of which is suggested in the post like measuring alpha vs. beta waves, or how related this is to cognitive reflection tests.

This post uses the example of GPT-2 to highlight something that's very important generally - that if you're not concentrating, you can't distinguish GPT-2 generated text that is known to be gibberish from non-gibberish.

And hence gives the important lesson, which might be hard to learn for oneself when not concentrating, that you can't really get away with not concentrating.

I feel like I learned something very important about my mind - you're right, if I skim these low-level-pattern-matched paragraphs, they read as basically fine to me. Has plausibly quite important implications for AI too. So I've curated this post.

If true, this could lend a lot of weight to an argument that the new writing portion of the SAT was a bad addition to the test.

Yeah, I think it was a terrible addition. The best way to do it was to simply write in the 5-paragraph pattern that is expected. Even then it was subject to wildly differing results: scores were demonstrably affected by simple things like reviewers being irritated or tired that day.

It's interesting to read this post, as previously I didn't understand why people found the GPT2 results to be an impressive feat.

I would be really interested to know how many people get actually fooled by them.

I badly want to know how many people get 100% fooled.

Why should we expect future text generators to be any more dangerous or effective than human-generated propaganda? As advertising has advanced, so have our abilities to resist or avoid it. We mute the television when the commercials come on, teach children to analyze them for the underlying message, create fact-checking services, and so on. It seems likely to me that we will develop anti-textgen technology roughly in sync with the development of text generation itself.

Imagine a future publishing company that put out AI generated nonfiction. It might use one AI to generate the text, another to fact-check, another to provide adversarial takes on the claims in the book. Its book on the Civil War will compete with others written by human experts, and eventually by other companies putting out computer-generated nonfiction.

Certainly we'd expect that the KKK would eventually get its hands on such software and create a revisionist, racist Civil War history. But the reading public will receive it in the context of other histories published by "reputable AI publishing firms" and human experts. I don't see why this situation is all that different than the one we have today, just with different means of production.

Certainly we'd expect that the KKK would eventually get its hands on such software and create a revisionist, racist Civil War history. But the reading public will receive it in the context of other histories published by "reputable AI publishing firms" and human experts. I don't see why this situation is all that different than the one we have today, just with different means of production.

Yeah, they already do this so what would change really?

One of the posts about GPT-2 (and 3) that has most stuck with me, and helped me to model what the system is doing.

Nominating because the core lesson has stuck with me and it's a useful reminder not to overestimate human intelligence when we aren't really paying attention.

I keep thinking about the title (/central claim) of this post. I'm not sure it's true, but it's given me a lot to think about. I think this post is useful for understanding GPT etc.

What will be in last place in the race toward human simulation: text, image (i.e. realistic AI-generated video of the human face or voice), or the body? Whichever is in last place would become the privileged marker of biological humanity.

It seems to me that we're already doing pretty well with AI-generated faces and voices. Probably last place will either be babble quality or robotic body quality.

So an alternative to careful parsing of written text might be simply to insist on hearing words spoken by a human being. Of course, there's a potential for those words to be an AI-generated script. That doesn't put us in much of a different place from listening to human-originating babble, though. In fact, we already parse people (like politicians) for whether they sound like they're just giving us "talking points," following a loose script, or whether they're actually speaking off-the-cuff, with authenticity. This is one reason people liked DJT and disliked Clinton, for example. Weirdly enough, since I bet AI will be able to imitate the Donald long before it can copy Clinton's speaking style.

So count me unconvinced that the babble problem is either a genuinely new issue, or that System-2 careful parsing for deep structure is our only solution.

effortless pattern-recognition is what machine learning can do today, while effortful attention, and explicit reasoning (which seems to be a subset of effortful attention) is generally beyond ML’s current capabilities.

Just to be clear, are you or aren't you (or neither) saying that this is only a matter of scale?

It seems to me like you're saying it could indeed only be a matter of scale, we're just in the stage of figuring out what the right dimension to amp up is ("be coherent for longer").

Absent unusual cases such as traumatic brain injury, there is no clear dividing line between "human who is concentrating" and "human who isn't concentrating". A normal human can switch back and forth between these states with no warning, either for well-motivated reasons ("wait, that doesn't make sense"/"bored now") or for essentially none. So I think "humans who aren't concentrating" is about transitory state, and "is/isn't a general intelligence" is about overall capacity; any equating between those two sides is a category error.