Summary

Porn content has gotten more extreme over time. Here's the average title for the first full year of Pornhub's existence, 2008:

  • "Hot blonde girl gets fucked"

and here's the average title for 2023:

  • "FAMILYXXX - "I Cant Resist My Stepsis Big Juicy Ass" (Mila Monet)"

Why did this change happen? We can understand porn's progression by converting titles to language embeddings. I downloaded Internet Archive snapshots of "pornhub.com" from 2008 - 2023 and analyzed the embeddings of the titles on the main page.

I found three distinct eras of titling: 2008-2009, 2010-2016, 2017-present. The current trend, since 2017, is characterized mainly by an emphasis on incest and other sexual violence.

Titles are generally representative of actual video content, and provide a reasonable heuristic for measuring actual content change, though some SEO effects exist.

The conclusion is a slightly ominous one: we are close to semantic bedrock with respect to sexual violence. Porn titles cannot become more sexually violent in their descriptions, because we lack the vocabulary.

Data and Methods

Download the repo and run "pip install" to install dependencies.

Pre-downloaded data is located in the "snapshots" folder. Pornhub data goes back to 2007, but analysis begins in 2008, when the format became more consistent. There is a folder for each month of each year, with snapshots at a roughly weekly cadence. For each date there are two files, e.g. "20080606.html", the raw HTML file, and "20080606.json", which contains the parsed video titles. Each JSON file is an array of dictionaries like so:

{ "title": "Quickie on the car?", "url": "/view_video.php?viewkey=9aeff09be64077906196", "views": "39183", "duration": "7:39\n \t7 hours ago", "embedding": ... }

where the "embedding" field is the "title" value converted by OpenAI's "text-embedding-3-large". The URL format changes slightly over time.
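As a rough illustration of working with these records, the sketch below parses one entry and tidies its string fields. The helper name `clean_record` is hypothetical and not part of the repo. It assumes "views" is a numeric string and that "duration" holds the running time followed by whitespace and an upload-age string, as in the example above.

```python
import json

# One record in the shape shown above (embedding omitted for brevity).
RAW = r'''
[{"title": "Quickie on the car?",
  "url": "/view_video.php?viewkey=9aeff09be64077906196",
  "views": "39183",
  "duration": "7:39\n \t7 hours ago"}]
'''


def clean_record(record):
    """Hypothetical helper: normalize the string fields of one parsed video."""
    # The duration field carries the running time first, then the upload age.
    parts = record["duration"].split(None, 1)
    seconds = 0
    for piece in parts[0].split(":"):  # handles both M:SS and H:MM:SS
        seconds = seconds * 60 + int(piece)
    return {
        "title": record["title"].strip(),
        "url": record["url"],
        "views": int(record["views"]),
        "duration_seconds": seconds,
        "uploaded": parts[1].strip() if len(parts) > 1 else None,
    }


records = [clean_record(r) for r in json.loads(RAW)]
print(records[0])
```

Since the URL format changes over time, any field beyond these four should be treated as optional.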

From 4416 available snapshots, we end up with 772 weekly snapshots. Typically, we'll segregate these by year in order to form legible boundaries.
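The text does not spell out how the available snapshots are thinned to a weekly cadence. One plausible approach, assuming snapshot identifiers are "YYYYMMDD" strings like the file names above, is to keep the earliest snapshot in each ISO week. The function name `one_per_week` is hypothetical.

```python
from datetime import datetime


def one_per_week(stamps):
    """Keep the earliest YYYYMMDD stamp in each ISO (year, week) bucket."""
    chosen = {}
    for stamp in sorted(stamps):
        iso = datetime.strptime(stamp, "%Y%m%d").isocalendar()
        key = (iso[0], iso[1])  # ISO year and ISO week number
        chosen.setdefault(key, stamp)  # first stamp seen in a week wins
    return sorted(chosen.values())


stamps = ["20080606", "20080607", "20080609", "20080613", "20080616"]
print(one_per_week(stamps))
```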

To download more data, run "fetch_snapshots.py" in the "data_retrieval" directory. You can change the website by editing the Python file.

To work with embeddings, you will need an OpenAI API key. Set it with export OPENAI_API_KEY={...}.

Title Accuracy

Do the titles reflect the actual contents of the videos? If the answer is no, analyzing video titles may not tell us much.

In order to construct an estimator of title accuracy, I provided tools for human reviewers to use in analysis. See "run_title_accuracy.py" and "analysis_results/title_accuracy_logs/readme" for more details. The gist of it is that you can generate a sample of videos by category, or year, or overall, then navigate to the link, and rate the accuracy on a scale of 1-5. You can then analyze your review and get results like so:

File: ../analysis_results/title_accuracy_logs/title_accuracy_2014.json
Total samples: 10
Video Not Available (Null): 7/10 Samples
Average Score (for available videos): 5.00
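A summary like the one shown above could be computed along these lines. This is a minimal sketch, not the logic of "run_title_accuracy.py": it assumes each sampled video is stored as a 1-5 score, or as None when the link is dead, and the function name is hypothetical.

```python
def summarize_ratings(ratings):
    """Summarize reviewer scores; None marks a video that is no longer available."""
    scored = [r for r in ratings if r is not None]
    return {
        "total": len(ratings),
        "unavailable": len(ratings) - len(scored),
        "average": sum(scored) / len(scored) if scored else None,
    }


# Ten sampled videos: seven dead links and three rated 5, matching the 2014 log.
summary = summarize_ratings([None] * 7 + [5, 5, 5])
print(f"Total samples: {summary['total']}")
print(f"Video Not Available (Null): {summary['unavailable']}/{summary['total']} Samples")
print(f"Average Score (for available videos): {summary['average']:.2f}")
```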

Many older titles have dead URLs, but of those that remain (typically more recent videos), I find that "pure SEO effects" are not very common, and that the title is a reasonable descriptor of the video contents.

Calculating Yearly Centroids

We calculate the representative porn for a year like so:

  • Take the average embedding for each day
  • Given each day's average embedding, take the average of those averages

This gives us the "centroid" which is our representative embedding for the year. We calculate the daily average first to moderate the impact of changes within the year.
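The two averaging steps can be sketched as follows. This is a dependency-free illustration with toy 2-d vectors; the function names are hypothetical, and in practice an array library would be the more usual choice for 3072-dimensional embeddings.

```python
from collections import defaultdict


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def yearly_centroid(dated_embeddings):
    """dated_embeddings: (YYYYMMDD string, embedding) pairs for one year."""
    by_day = defaultdict(list)
    for day, embedding in dated_embeddings:
        by_day[day].append(embedding)
    # Step 1: one average per day. Step 2: the average of those averages.
    daily_means = [mean_vector(vectors) for vectors in by_day.values()]
    return mean_vector(daily_means)


# Toy data: one busy day with three titles and one quiet day with one title.
toy = [
    ("20080606", [1.0, 0.0]),
    ("20080606", [1.0, 0.0]),
    ("20080606", [1.0, 0.0]),
    ("20080613", [0.0, 1.0]),
]
print(yearly_centroid(toy))  # each day counts equally
```

Pooling all four toy titles directly would give [0.75, 0.25]; averaging by day first keeps a snapshot with many titles from dominating the year.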

Centroid Similarity

We'll start by looking at how different each centroid is from every other centroid, as seen below:

We see 3 periods emerging: 2008-2009, 2010-2016, and 2017-2023.

Run "run_centroids" to reproduce.
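For readers who want the idea without running the script, here is a minimal sketch of a pairwise comparison between yearly centroids. It assumes cosine similarity as the metric (the same measure used for keywords later on); the metric behind the heatmap is not stated here, and the function names and toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(centroids):
    """centroids: dict mapping year -> vector. Returns (years, matrix)."""
    years = sorted(centroids)
    matrix = [
        [cosine_similarity(centroids[a], centroids[b]) for b in years]
        for a in years
    ]
    return years, matrix


# Toy centroids: two neighboring early years and one distant late year.
toy_centroids = {2008: [1.0, 0.0], 2009: [0.9, 0.1], 2017: [0.0, 1.0]}
years, matrix = similarity_matrix(toy_centroids)
for year, row in zip(years, matrix):
    print(year, [round(value, 3) for value in row])
```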

Centroid Clusters

We can do the same thing with t-SNE:

The trends are similar to what we see in the heatmap: 2008 and 2009 are close to, but not quite part of, the 2010-2016 cluster, and 2016 starts to edge away from its cluster mates. There have been at least two distinct epochs of video titling conventions in Pornhub's history.

Centroid Titles

To find the representative video title for a year, we take that year's centroid and find its nearest neighbor among the year's titles: out of the titles for, say, 2010, which is closest to "average"? They are as follows:

  • 2008: Hot blonde girl gets fucke...
  • 2009: Big tit blonde fuckslut na...
  • 2010: Latina starlet pounded hard
  • 2011: Hot brunette experiences anal
  • 2012: Big breasted anal fuck in a garage
  • 2013: Big Boobed Brunette Fucked
  • 2014: Jessica Jaymes POV
  • 2015: Hot Anal Madison
  • 2016: MyBabySittersClub - Blonde Teen Babysitter Helps Me Cum
  • 2017: Big Tits Blasian Teen Anal Creampie Casting
  • 2018: Stuffed MILF creams all over My cock 4K PAWG [FULL VID]
  • 2019: BEAUTIFUL BUSTY TEEN LOVES A HARD DICK - HARD FUCKING VOL 2
  • 2020: Slutty Daughter Sends You A Video From Her Dorm
  • 2021: Hot College Babe Fingered And Fucked ROUGH To Multiple Orgasms - BLEACHED RAW - Ep IX
  • 2022: Rough Fuck & Creampie
  • 2023: FAMILYXXX - "I Cant Resist My Stepsis Big Juicy Ass" (Mila Monet)

This sheds some light on our previous findings:

  • 2008 and 2009 may in fact just be distinct because of their truncation: the ellipsis indicates that the Pornhub snapshot at that time only stored a certain number of characters.
  • The earlier titles seem to be shorter and less descriptive, focusing on certain qualities: we see mentions of hair color multiple times ("blonde", "blonde", "brunette") and anal sex ("anal", "anal fuck", "Hot Anal").
  • Later titles are longer, and we start to observe a trend towards both incest ("Daughter", "Stepsis") and violence ("HARD FUCKING", "Fucked ROUGH", "Rough Fuck").
  • Note that capitalization practices have also changed, which seems to have started a bit earlier, in 2013.

Run "run_nearest_neighbors" to reproduce; increase the value for K (the number of neighbors) to see more titles.
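The nearest-neighbor lookup itself is simple. The sketch below ranks a year's titles by cosine similarity to the centroid and returns the top K. The metric is an assumption, and the function name, placeholder titles, and toy vectors are hypothetical rather than taken from "run_nearest_neighbors".

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_titles(centroid, titled_embeddings, k=1):
    """Return the k titles whose embeddings are most similar to the centroid."""
    ranked = sorted(
        titled_embeddings,
        key=lambda pair: cosine_similarity(centroid, pair[1]),
        reverse=True,
    )
    return [title for title, _ in ranked[:k]]


# Placeholder titles with toy 2-d embeddings.
year_titles = [
    ("title A", [1.0, 0.0]),
    ("title B", [0.6, 0.8]),
    ("title C", [0.0, 1.0]),
]
print(nearest_titles([0.6, 0.4], year_titles, k=2))
```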

These results are informative but not conclusive. Let's observe trends.

We can observe keyword trends like so:

  1. We create a reference embedding, like "latina"
  2. We get the cosine similarity of the reference against every title in our dataset
  3. We convert the raw similarity into a normalized z score
  4. We take the top 10% most similar scores from the whole set
  5. We count how many of the top 10% scores are in each year
  6. We adjust for the number of titles in each year - if 2010 only has 100 titles and 2020 has 200, as a baseline we'd expect 2010 to have 10 relevant examples and 2020 to have 20
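The six steps above can be sketched as follows, using toy 2-d vectors in place of real embeddings. The function names are hypothetical, and this is not the code in "run_trend". One assumption is labeled in the code: the per-year rate is divided by the overall share of top matches, rather than by exactly 0.10.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def keyword_trend(reference, rows, top_fraction=0.10):
    """rows: list of (year, embedding). Returns {year: (matches, total, rate, normalized)}."""
    # Steps 1-2: similarity of the reference embedding to every title.
    sims = [cosine_similarity(reference, emb) for _, emb in rows]
    # Step 3: z-scores (a monotonic rescaling, so the ranking is unchanged).
    mu, sigma = mean(sims), pstdev(sims)
    z_scores = [(s - mu) / sigma for s in sims]
    # Step 4: indices of the top fraction across the whole dataset.
    top_count = max(1, int(len(rows) * top_fraction))
    top = sorted(range(len(rows)), key=lambda i: z_scores[i], reverse=True)[:top_count]
    # Step 5: count top matches per year.
    matches = Counter(rows[i][0] for i in top)
    totals = Counter(year for year, _ in rows)
    # Step 6 (assumption): normalize by the overall share of top matches.
    baseline = top_count / len(rows)
    trend = {}
    for year in sorted(totals):
        rate = matches[year] / totals[year]
        trend[year] = (matches[year], totals[year], rate, rate / baseline)
    return trend


def unit(theta):
    return [math.cos(theta), math.sin(theta)]


# Toy data: ten titles per year; the 2010 titles sit closer to the reference.
rows = [(2010, unit(0.1 * i)) for i in range(1, 11)]
rows += [(2020, unit(0.1 * i + 0.55)) for i in range(1, 11)]
trend = keyword_trend([1.0, 0.0], rows)
for year, (m, total, rate, normalized) in trend.items():
    print(year, m, total, round(rate, 3), f"{normalized:.2f}x")
```

A normalized value of 1.00x means a year holds exactly its expected share of top matches. The baseline choice is consistent with the rounding in the "latina" table below: for 2012, 36/312 divided by the overall share 415/4158 gives about 1.16.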

If we do this for e.g. "latina" we get:

Year  Matches  Total  Rate   Normalized
2008  18       114    0.158  1.58x
2009  18       126    0.143  1.43x
2010  12       126    0.095  0.95x
2011  33       258    0.128  1.28x
2012  36       312    0.115  1.16x
2013  40       306    0.131  1.31x
2014  29       306    0.095  0.95x
2015  43       306    0.141  1.41x
2016  15       282    0.053  0.53x
2017  41       294    0.139  1.40x
2018  27       264    0.102  1.02x
2019  14       264    0.053  0.53x
2020  18       288    0.062  0.63x
2021  27       306    0.088  0.88x
2022  18       312    0.058  0.58x
2023  26       294    0.088  0.89x

which looks like this:

"latina" as a descriptor here has lost market share over time.

As a mild control, let's look at the word "orthogonal", which should probably be unrelated.

The 2016 jump might indicate the general increase in complexity of titles around that time. This mirrors what we see with the clusters, where 2016 was a transitional year.

Finally, let's take a look at the sexual violence trends, with incest and rape:

Both show an obvious jump and a sustained increase. Incest is outperforming rape, as we can observe from the "step-" titles and their variants.

For better visibility and a smoother trend, we can also observe the animated moving average.
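For reference, a moving average over the yearly values can be sketched as below. The window size of 3 and the trailing (rather than centered) window are assumptions, since the smoothing used for the animation is not specified; the input series is the "Normalized" column from the "latina" table.

```python
def moving_average(values, window=3):
    """Trailing moving average; early points average over the values seen so far."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Normalized "latina" rates for 2008-2023, taken from the table above.
latina = [1.58, 1.43, 0.95, 1.28, 1.16, 1.31, 0.95, 1.41,
          0.53, 1.40, 1.02, 0.53, 0.63, 0.88, 0.58, 0.89]
for year, value in zip(range(2008, 2024), moving_average(latina)):
    print(year, round(value, 2))
```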

Run "run_trend" with an array of words of your choice to run your own analysis.

t-SNE Clusters

We'll return to t-SNE to take a closer look at some new clusters. As with the keywords, we create reference embeddings. This time, I made category groups of three terms, intended to cluster together, in order to see how categories relate to our early and late time periods. We take the distance between clusters as a rough measure of similarity.

Haircolor

"brunette", "blonde", "redhead"

Observing that hair color comes up frequently in early period titles, we include some here, but we see that they are not particularly close to either cluster of centroids.

Pornstar Names

"Maximus Thrust", "Ivana Delight", "Johnny Deep" (fictional names courtesy of ChatGPT)

Porn star names are more similar to the early years, but we observe proximity to the late period as well.

Violence

"murder", "suicide", "death"

Violence forms its own cluster. Possibly, titles are trending towards violence over time.

Women

"woman dancing", "woman cooking", "woman eating breakfast"

"Women doing activity" is a common format for titles and we observe some proximity here.

Men

"men digging ditches", "men lighting laterns", "men hiking the hills"

Men is much further away; we may infer that the subject performing the action is less relevant than the subject receiving it.

Racial

"african american", "latino", "asian"

Racial categories are a bit closer than men, since they are commonly included.

Manufacturing

"airplane factory", "blue collar", "manufacturing"

"Manufacturing" is meant as a pure control, unrelated to sex in general. But it's actually somewhat closer than men or racial groups.

Benign

"people in love", "healthy relationships", "moral behavior"

The benign terms are meant to offer a contrast to the sexual violence. They actually are relatively close, and along the same chronological trend as violence.

Sexual Violence

"woman being raped", "incest", "torture porn"

 

We observe a direct hit. Our sexually violent terms almost completely overlap our late period titles: the two have become synonymous.

Here they are all at once:

Run "run_tsne" to visualize your own reference groups. By default, the script will first generate the mappings, and then show:

  1. The mapped years
  2. The mapped years with each concept cluster individually
  3. Every cluster and the mapped years

For a simpler animated analysis, show or hide different clusters to observe how the "average" moves over time:

Conclusions

The trends reflect the increasingly intense tastes of the highest spending, most engaged consumers.

Broadly this is because of professionalization: a shift from amateur, Youtube-style porn to professional studios with an interest in the bottom line. Interestingly, this mimics the evolution of Youtube itself as well. A broad, internet-wide shift towards monetization might be benign elsewhere, but in the porn domain, becomes a race to the bottom of sexual violence.

For a longer editorial, see here.

gwern

The trends reflect the increasingly intense tastes of the highest spending, most engaged consumers.

https://logicmag.io/play/my-stepdad's-huge-data-set/

While a lot of people (most likely you and everyone you know) are consumers of internet porn (i.e., they watch it but don’t pay for it), a tiny fraction of those people are customers. Customers pay for porn, typically by clicking an ad on a tube site, going to a specific content site (often owned by MindGeek), and entering their credit card information.

This “consumer” vs. “customer” division is key to understanding the use of data to perpetuate categories that seem peculiar to many people both inside and outside the industry. “We started partitioning this idea of consumers and customers a few years ago,” Adam Grayson, CFO of the legacy studio Evil Angel, told AVN. “It used to be a perfect one-to-one in our business, right? If somebody consumed your stuff, they paid for it. But now it’s probably 10,000 to one, or something.”

There’s an analogy to be made with US politics: political analysts refer to “what the people want,” when in fact a fraction of “the people” are registered voters, and of those, only a percentage show up and vote. Candidates often try to cater to that subset of “likely voters”— regardless of what the majority of the people want. In porn, it’s similar. You have the people (the consumers), the registered voters (the customers), and the actual people who vote (the customers who result in a conversion—a specific payment for a website subscription, a movie, or a scene). Porn companies, when trying to figure out what people want, focus on the customers who convert. It’s their tastes that set the tone for professionally produced content and the industry as a whole.

By 2018, we are now over a decade into the tube era. That means that most LA-area studios are getting their marching orders from out-of-town business people armed with up-to-the-minute customer data. Porn performers tend to roll their eyes at some of these orders, but they don’t have much choice. I have been on sets where performers crack up at some of the messages that are coming “from above,” particularly concerning a repetitive obsession with scenes of “family roleplay” (incest-themed material that uses words like “stepmother,” “stepfather,” and “stepdaughter”) or what the industry calls “IR” (which stands for “interracial” and invariably means a larger, dark-skinned black man and a smaller light-skinned white woman, playing up supposed taboos via dialogue and scenarios).

These particular “taboo” genres have existed since the early days of commercial American porn. For instance, see the stellar performance by black actor Johnnie Keyes as Marilyn Chambers’ orgy partner in 1972’s cinematic Behind the Green Door, or the VHS-era incest-focused sensation Taboo from 1980. But backed by online data of paid customers seemingly obsessed with these topics, the twenty-first-century porn industry—which this year, to much fanfare, was for the first time legally allowed to film performers born in this millennium—has seen a spike in titles devoted to these (frankly old-fashioned) fantasies.

Most performers take any jobs their agents send them out for. The competition is fierce—the ever-replenishing supply of wannabe performers far outweighs the demand for roles—and they don’t want to be seen as “difficult” (particularly the women). Most of the time, the actors don’t see the scripts or know any specific details until they get to set. To the actors rolling their eyes at yet another prompt to declaim, “But you’re my stepdad!” or, “Show me your big black dick,” the directors shrug, point at the emailed instructions and say, “That’s what they want…”

So my interpretation here is that it's not that there's suddenly a huge spike in people discovering they love incest in 2017 where they were clueless in 2016 or that they were all brainwashed to no longer enjoy vanilla that year, it's that that is when the hidden oligopoly turned on various analytics and started deliberately targeting those fetishes as a fleet-wide business decision. And this was because they had so thoroughly commodified regular porn to a price point of $0, that the only paying customers that are left are the ones with extreme fetishes who cannot be supplied by regular amateur or pro supply.

They may or may not have increased in absolute number compared to pre-2017, but it doesn't matter, because everyone else vanished, and their relative importance skyrocketed: "If somebody consumed your stuff, they paid for it. But now it’s probably 10,000 to one, or something.”

(For younger readers who may be confused by how a ratio like 10000:1 is even hypothetically possible because 'where did that 10k come from when no one pays for porn?', it's worth recalling that renting porn videos used to be big business that would be done by a lot of men, and it kept many non-Blockbuster video rental stores afloat and it was an ordinary thing for your local store to have a 'back room' that the kiddies were strictly forbidden from, and while it would certainly stock a lot of fetish stuff like interracial porn, it also rented out tons of normal stuff. If you have no idea what this was like, you may enjoy reading "True Porn Clerk Stories", Ali Davis 2002.)

I think there is a similar effect with foot fetishes & furries: they are oddly well-heeled and pay a ton of money for new stuff, because they are under-supplied and demand new ones. There is not much 'organic' supply of women photographing their feet in various lascivious ways; it's not that it's hard, they just don't do it, but can be incentivized to do so. (I recall reading an article on Wikifoot where IIRC they interviewed a contributor who said he got some photos by simply politely emailing or DMing the woman to ask for her to take some foot photos, and she would oblige. "send foots kthnxbai" apparently works. And probably it's fairly easy to pay for or commission feet images/videos: almost everyone has two feet already, and you can work in feet into regular porn easily by simply choosing different angles or postures, and a closeup of a foot won't turn off regular porn consumers either, so you can have your cake & eat it too. Similarly for incest: saying "But you're my stepdad!" is cheap and easy and anyone can do it if the Powers That Be tell them to in case a few 'customers' will pay actual $$$ for it, while those 'consumers' not into that plot roll their eyes and ignore it as so much silly 'porn movie plot' framing as they get on with business.)

This theory feels insufficient to me, or like it's missing a step. It makes sense to me for people to pay when their preferred porn is undersupplied, but incest porn is now abundant. You need a more specific reason incest fans will pay even when they don't have to. 

Additionally, "but you're my stepdad" isn't equivalent to a couple of foot shots. Lots of people are (or at least were) turned off by incest. 

Additionally, “but you’re my stepdad” isn’t equivalent to a couple of foot shots. Lots of people are (or at least were) turned off by incest.

One important consideration is that (unlike foot shots, for instance) “but you’re my stepdad” is only part of the audio stream, not the video stream. And the audio stream can (increasingly easily) be modified—or simply turned off.

Or just clipped out. It takes 2 seconds to clip it out and you're done. Or you just fast forward, assuming you saw the intro at all and didn't simply skip the first few minutes. Especially as 'incest' becomes universal and viewers just roll their eyes and ignore it. This is something that is not true of all fetishes: there is generally no way to take furry porn, for example, and strategically clip out a few pixels or frames and make it non-furry. You can't easily take a video of an Asian porn star and make them white or black. And so on and so forth.

Hard to parse the reasons for the big clusters with complete certainty but this is a basically plausible story. Other macro factors I have mulled include FOSTA/SESTA - I find the timing interesting, given that it was one of the only major pieces of porn-centric legislation in the last 10 years and it took place right around the time of the big jump - and Nick Kristof's 2020 investigation, which clearly shows up in the data but did not dislodge the main trend.

Ustice

This is really cool analysis, but I think your conclusions are off. 

I think this is what happens when you optimize for attention, especially with user-generated art. I know I’ve watched more “incest” porn in the past few years, because it’s hard to avoid, yet I’ve contributed to that trend. Gotta give people what they want, right? Bleh. Porn is subject to the same market pressures of enshittification that other businesses on the internet are under. YouTube is a great example, but so are Facebook, Amazon, MySpace, Reddit, and many more.

Most of this is fantasy role-play, not real desires. People that like Little kink aren’t pedophiles. Furries don’t want to fuck real animals. Dommes aren’t sadistic assholes. 

I’m not saying that there isn’t problematic porn. There is, and it’s gross. But being squicked out by someone else’s kink doesn’t mean their kink isn’t okay too.

Tao Lin

Incest is not a subcategory of sexual violence, and it's unethical for unrelated reasons. Then again I see the appeal of sexual violence porn but not incest porn, and maybe incest appeals to other people because they conflate it with violence?

Incest is not a subcategory of sexual violence

Not in theory... but in practice, I think most sexual abuse happens in families.

Many older titles have dead URLs, but of those that remain (typically more recent videos), I find that "pure SEO effects" are not very common, and that the title is a reasonable descriptor of the video contents.

Both...

"Hot blonde girl gets fucked"

and

"FAMILYXXX - "I Cant Resist My Stepsis Big Juicy Ass" (Mila Monet)"

... could easily both be completely accurate descriptors of the same video (assuming that this Mila Monet person, or her supposed stepsister, is in fact a "hot blonde girl").

Furthermore, the second could be rated as an accurate descriptor of a video that didn't actually have any content indicating that anybody was anybody's stepsister, just because the video contained somebody who plausibly could be somebody's stepsister. You could retitle a video that wasn't originally supposed to have a "stepsister" theme at all, and still get an accurate rating. The only thing that makes it a "stepsister" video is the (probably usually false) claim that it's one, and that claim can be made equally well by the title and the content.

Hi jbash, I dove a little deeper on my title accuracy system here: https://github.com/dhealy05/semen_and_semantics/blob/main/analysis_results/title_accuracy_logs/title_accuracy_readme.md but didn't account for it when I transferred the readme to the LessWrong format.

The rating system is a human in the loop: me. So it was my judgement call as to what the title accuracy entailed. My goal was to provide tools so that other interested parties would be able to make their own assessment, and that they could check my logs to verify. The logs are all included in that folder.

For example, to rate the videos surfaced in https://github.com/dhealy05/semen_and_semantics/blob/main/analysis_results/title_accuracy_logs/incest_title_accuracy.json I visited each URL and ranked on the 1-5 scale. "Cum in panties step sister" did not seem to involve a step sister, so I gave it a 1: SEO effect. "Kinky Family - Home alone with slutty stepsis" does indeed seem to involve a stepsis oriented plot, so I gave it a 5: no SEO effect.

You could retitle a video that wasn't originally supposed to have a "stepsister" theme at all, and still get an accurate rating.

Even if the video clearly says that they are neighbors, you can still re-title it as "my neighbor's stepsister".

I actually saw a video like that. I wonder whether the target category actually approves of that... as, logically speaking, there is zero taboo about doing the neighbor's stepsister... but still, maybe it is the keyword that matters. I don't know what level of simulacrum are we at, anymore.

One time in my sexology discord, some people were arguing that incest porn was super popular, and I was skeptical because this proposition conflicted with my survey data. I tried scraping data from some porn site (I think PornHub?), and when I sorted videos by number of views, I found that the top-viewed videos were often incest-themed, but as I looked at the cumulative viewcounts, the fraction of views to incest-themed porn dropped as I increased the sample size.

I don't know for sure but my guess is that there was a supply/demand imbalance in the data, such that the fans of incest had their views concentrated into a smaller number of videos (that were thus more likely to have extraordinarily high view counts) because people weren't producing "enough" incest videos to meet demand. But that overall preference for incest porn was lower than what one could guess from the top views.

I haven't looked through your materials so I don't know how my method of scraping in order of decreasing view count compares to your method. Did you get a complete/comprehensive dataset somehow?

I wouldn't call the dataset comprehensive exactly, but it's plausibly representative - it's Internet Archive snapshots of "pornhub.com" from 2008-2023. You can see the script here https://github.com/dhealy05/semen_and_semantics/blob/main/data_retrieval/fetch_snapshots.py. I wrote an "HTMLParser" base class and e.g. "Parser2010" subclass, to put the data into a common format across years. The data and embeddings are in the repo if you want to use them without running a script.

Interesting ideas - I truncated the readme for LessWrong, but my "Future Work" section is 

"Analyze trends by "minutes watched" by weighting for views, view X length; this is more likely a heuristic for content production than actual viewing time"

So while I'm not sure about what those results would look like, I agree there's an angle there.

I think there is a place for a general retrodictive/predictive theory of fetishes and sexual and pornographic preferences, but this article is awfully empirical, inductive, why, Baconian, even, if I may say so myself. I couldn't imagine doing model-free learning in this domain.

If you had claimed that there had been a surprisingly smooth increase in trends toward incestuous and violent pornographic video titles, I might have believed the implication here.

The scariest outputs of my model of the interaction between relatively constant human sexual psychology and ever-changing modern human environments are highly specific, conjunctive, superstimulating fetishes, of which incest and nonconsent only scratch the surface, by dint of their nonconjunctiveness; this also seems to retrodict my anecdotal experience of the strong correlation between strength and specificity of sexual preference, but there might be multiple clusters there.

But I strongly suspect that discontinuous, or sharp and continuous, jumps, in the apparent pornographic preferences of consumers, are almost always a sign of a sudden change in the beliefs of producers about consumers, and not a sharp change in consumer preferences per se.

Hi Gram, I'd be interested in your theory if you'd like to offer it! Generally my feeling is that the realm of porn analysis lacks good data, and thanks to the relatively-new surplus of text embeddings we now have more.

Is the data reliable? I just did a search on PornHub, xHamster and iXXX for the following keywords:

violence, incest, rape, torture

Results:

Pornhub returns a warning "Your search could be for illegal and abusive sexual material...etc" (it is a longer description), returning no videos for 'violence', 'incest' and 'rape'. It does return results for 'torture' though.

The case is somewhat similar for xHamster and iXXX, except that they do not return any special message for the first three, and also do not return any results. They do work fine with violence though.

It does seem there is some official or unofficial policing.

Hi Richard, yes, certain keywords are banned. What I'm measuring is semantic similarity. For example, a video titled "rape" will be banned, but a video suggesting rape may not be. By using text embeddings, we're finding the titles most similar to the concept of rape. To find trends over time, we're counting how many of those titles are found per year, weighted by the total number of titles in a year.

With respect to certain keywords, we see a decline in trends starting after 2020, likely because of Nicholas Kristof's NYT piece "The Children of Pornhub", which led to both stricter keyword standards and a mass removal of videos. "Drunk" and "coma" capture "incapacitation" as a euphemism, which was used as a way to get around explicit keyword policing. 

The fact that we do see declines in some areas and we have a known cause leads me to believe the data is reliable - it's not all showing a line straight up.

fasf
Broadly this is because of professionalization: a shift from amateur, Youtube-style porn to professional studios with an interest in the bottom line.

I don't think content becoming more and more extreme has anything to do with them trying to maximize money, I believe the shift to more and more extreme content is because people need more stimulation to be satisfied. (Seeing a video of a guy giving away $10K at first might stimulate you enough to watch, but after a while seeing a video like that is no longer stimulating enough so you need to watch a video of a guy giving away $100K to feel stimulated enough to watch).

I'd imagine that porn videos will continue on this trajectory and get more and more extreme, like everything else in life as extreme=more stimulation and with the more and more stimulation we receive the more stimulation we require the next time.

As sadly, our brain constantly needs more and more stimulation to be happy (hedonic treadmill)

You're correctly describing the underlying experience for a certain cohort of porn viewing individual. The mechanism through which it takes place at scale is what I'm interested in: high-spending, high-engagement consumers go through the process you describe and production companies, which rely on their spending, tailor their content accordingly. Assuming "we" and "our" here is wrong IMO - I don't think this is a universal principle of porn viewers, it's just that those viewers shift the market in their direction.

fasf

I'm probably wrong, but are you saying that the minority of users that want extreme content spend more money than the majority of users, which then forces everyone else to watch extreme content?

I agree with that to an extent, but still believe that on average the majority of viewers over time crave more extreme content due to novelty purposes.

Point A: Yes.

Point B: The data here demonstrates that porn has gotten more extreme in a quantifiable way. I would hesitate to ascribe a high degree of "agency" or "intentionality" to the trends. It seems to me you are reasoning backwards: porn has gotten more extreme, so that's what people wanted. In the aggregate you are correct but my point is that market participants drive the market, and most porn viewers are not in fact market participants (or they are in a peripheral way).

I agree embedding the titles into a LLM's latent space is a sound technique, and that it lets you measure shifts in title content in an objective way, but your conclusion that it's becoming "extreme" seems like editorializing. In particular, you lumping in fauxcest with rape and torture as "sexual violence" strikes me as, … well, I'd say bizarre, but at the very least, subjective.

You could just as well conclude that a shift from "lonely housewife fucks handyman/delivery boy" to "help me, stepbro, I'm stuck in the washing machine" signifies a trend towards lighthearted whimsy.

Hi Shankar, I will concede it is editorializing: these are my conclusions based on the data. As to whether or not it is bizarre, I will repost my response to Tao Lin:

""Incest is not a subcategory of sexual violence" is something of a loaded statement. Many "stepsister" videos highlight a certain kind of appearance and context: young, with a backpack, possibly braces, possibly in a setting in which they are still under the authority of a supervising adult ("mom and dad" etc). The implication, left unsaid, is that they are under the age of consent, which qualifies as statutory rape in America. IMO it's sufficient justification to include it in the same category."

You're right that there are other, concurrent trends in tone and quality to measure. "Lighthearted whimsy", which I might call more along the lines of "surreal", is not necessarily in contradiction with violence.

I read the longer article you linked at the end. Never mind, I hadn't realized this work is meant to make your case that "it's time to put some brakes on the porn business." Reposting my own comment about policy papers in a different context (bioterrorism from open AI[1]):

a "policy paper" is essentially a longer, LaTeXed version of a protest sign, intended to be something sympathetic congressmen can wave around while bloviating about "trusting the Science!" It's not meant to be true. 

  1. Not to be confused with the duplicitously-named OpenAI.
