The Marubo, a 2,000-member tribe in the Brazilian Amazon, recently made headlines: they received Starlink in September 2023. Within 9 months, their youth stopped learning traditional body paint and jewelry making, young men began sharing pornographic videos in group chats (in a culture that frowns on public kissing), leaders observed "more aggressive sexual behavior," and children became addicted to short-form video content. One leader reported: "Everyone is so connected that sometimes they don't even talk to their own family."
Note that the tribe in question is suing the New York Times over that article, and indeed the New York Times issued a retraction (or, well, as close as they can come to a retraction, as the original article never actually claimed any members of the tribe were "addicted to porn", only that some minors watched it).
This obviously doesn't show the original reporting is garbage, but it does seem pretty important to note here.
This embodied wisdom, refined through generations of trial and error, often reveals natural phenomena that laboratory science would never think to investigate. For every cultural extinction, we lose not just traditions, but entire systems of observation and experimentation that could unlock medical and scientific breakthroughs.
- Cytarabine, a leukemia drug which grants remission to thousands of children yearly, came from the Caribbean sponge Tectitethya crypta, an organism coastal peoples knew but Western science discovered by accident.
But… did Western science learn about this sponge… from the “coastal peoples”? Or independently? (Wikipedia doesn’t seem to say.)
My guess is that if you're worried about this, the best intervention is to ensure that the cultures are recorded so that people can bring them back later if they want to. I think these people should consider recording more videos of themselves, potentially just in private for the purpose of informing interested people later.
You can't bring a culture back. It would just be cosplay and reenactment. Recording what is is worthwhile only so that later people can learn what was. Who takes the ancient Greek gods seriously today? Who speaks Egyptian? How many crofters are there still? The young vote with their feet and the old die off. "How Ya Gonna Keep 'em Down on the Farm, After They've Seen Paree?"
The old SSC essay, How the West Was Won, seems relevant.
I am pretty sure there was, at one point, such a thing as western civilization. I think it included things like dancing around maypoles and copying Latin manuscripts. At some point Thor might have been involved. That civilization is dead. It summoned an alien entity from beyond the void which devoured its summoner and is proceeding to eat the rest of the world.
edit: disagree voters, which claims do you disagree with, and why?
I doubt that AIs speaking "western" and not various-indigenous would be the thing that kills their cultures. I expect the only thing that causes an omnicide[metaphorical] for underrepresented groups would be, well, omnicide[literal]: AGI or ASI that is able to autonomously operate its entire supply chain deciding humanity is no longer necessary.
I do agree it would be good to avoid underrepresenting groups in some respects. It's not clear to me whether representing them in training data better is better for their values, though. Probably should get certified-provenance claims from folks in those groups giving reasoning for why they would expect the model understanding them to be better or worse. I do think that moderately-aligned models would need to understand them to treat them well, for example, so there's a clear argument for yes.
Record-keeping seems unambiguously good, and making those records available to humans in the relevant groups seems unambiguously good. It's not clear that making those records available to, eg, recommenders, would be better for their values - they'd have to make that decision themselves, ideally after understanding what has happened when others' records became available to recommenders.
The reality, risks, & consequences of scaling consumer-entertainment technologies to a global population with little cognitive security. [mirrored from Substack]
Western, American culture will be in frontier language models’ training data.
Most other cultures will not.
If nobody steps in, ~90% of mankind's culture could die quite soon.
Globally, indigenous cultures face an unprecedented threat to their traditional heritage.
The Marubo, a 2,000-member tribe in the Brazilian Amazon, recently made headlines: they received Starlink in September 2023. Within 9 months, their youth stopped learning traditional body paint and jewelry making, young men began sharing pornographic videos in group chats (in a culture that frowns on public kissing), leaders observed "more aggressive sexual behavior," and children became addicted to short-form video content. One leader reported: "Everyone is so connected that sometimes they don't even talk to their own family."
Marubo people carrying a Starlink antenna stopped for a break to eat papaya.
Cultural diversity is already experiencing a mass extinction at rates comparable to the biological Holocene extinction - up to 10,000 times the natural background rate. Yet unlike species loss, which triggers conservation efforts, cultural extinction remains largely unnoticed.
Approximately 50% of the world's population was not online 10 years ago. In 2010, only 30% of the global population used the internet, while by 2020, that number had doubled to 60%.
The mechanism is straightforward: providing internet access to previously offline populations overwhelms users with enticing, addictive, and viral stimuli. The entertainment on offer is far more stimulating than anything indigenous, pre-digital lifestyles can provide.
This cultural erosion has profound implications beyond the communities themselves: the internet that's replacing indigenous knowledge systems is also feeding the artificial intelligence that will shape humanity's future. Large language models learn from text scraped across the web, and if a culture doesn't exist online, it effectively doesn't exist to AI. Every traditional practice abandoned for TikTok, every oral tradition left unrecorded, every indigenous language that goes unwritten represents not just a loss to those communities, but a permanent gap in the knowledge systems that will guide our AI-mediated world. The internet homogenizes human culture toward whatever content generates the most engagement, and AI systems trained on that same internet amplify those patterns further. The Marubo children who stopped learning traditional body painting aren't just losing their heritage; they're ensuring that AI systems will never learn to think in ways their ancestors did. What survives digitization becomes the foundation for machine intelligence; what doesn't survive gets erased from our computational future.
A North Korean soldier of the Ukrainian War (2025) lies in his barracks, scrolling.
The mechanism driving this convergence has deep historical precedent, and is an unsurprising consequence of how power operates in the digital age. The policing power of any central cultural authority weakens with distance. A Chinese proverb from the Yuan dynasty applies: "Heaven is high and the emperor is far away." Historically, in times of totalitarian oppression, this reality enabled local customs to persist. In the post-industrial age, this protection no longer exists. The proverbial emperor is no longer a despot who requires legislative & coercive mechanisms to homogenize culture. The emperor has been supplanted by market forces and economic competition, which carry out this task instead.
Mature capitalist economies, such as the United States, push firms to pursue global user acquisition for survival: firms must constantly expand their customer base to satisfy growth expectations and outcompete rivals. This imperative drives them toward a global audience and toward widely appealing products that transcend cultural boundaries. Through algorithmically delivered content, engineers develop low-friction behavioral pathways which reshape cultural habits, social practices, and leisure activities towards addiction.
Cultural diversity, much like biological diversity, preserves an irreplaceable, robust repository of solutions to ancient problems. Darwinian evolution has operated as a massively parallel search algorithm for hundreds of millions of years, testing chemical and behavioral solutions to ecological problems. Claude Lévi-Strauss, the renowned anthropologist, believed culture to be "a unique experiment in organizing human life", one that encodes generations of knowledge systems adapted to localized environments. As with biodiversity, such cultural solutions cannot be easily discovered, let alone recreated, once lost.
This embodied wisdom, refined through generations of trial and error, often reveals natural phenomena that laboratory science would never think to investigate. For every cultural extinction, we lose not just traditions, but entire systems of observation and experimentation that could unlock medical and scientific breakthroughs.
Tectitethya crypta
Recently, I had the privilege of speaking with Ayah Bdeir, a Lebanese entrepreneur and activist currently leading AI strategy at The Mozilla Foundation. She is reducing her Mozilla responsibilities to pursue what she sees as an existential challenge: making large-scale digitization of Arab cultural media, knowledge, and lifestyles economically viable. Her rationale was straightforward: "The Arab world is not in the training data." She explained how this absence manifests at every level - from the technical (Arabic OCR remains largely unresolved, making centuries of texts inaccessible) to the geopolitical (the Gulf states pouring billions into sovereign AI). Without intervention, she argued, AI will learn to think exclusively through a Western lens, while a fifth of humanity's historical frameworks vanish from our computational future.
Similarly, Rohan Pandey, a machine learning scientist, recently departed from OpenAI to address Sanskrit OCR, recognizing that millennia of Vedic texts (containing alternative philosophical frameworks, governance models, and problem-solving methodologies) exist neither in digital format nor in current training corpora. These initiatives are important beyond preservation efforts; they constitute strategic interventions to incorporate cognitive diversity into artificial intelligence systems before technological monoculture becomes irreversible.
See, large language models are prediction engines. LLMs learn implicit value functions from their training data. Cultural homogenization is as epistemological as it is behavioral: when AI systems train exclusively on internet-scale English text, they inherit a biased ontology. Claude Opus models “believe” problems have solutions, progress is linear, and optimization is inherently good. The majority of human knowledge was developed under different assumptions. (For example, Islamic jurisprudence uses analogical reasoning, Qiyas, which treats precedent as living wisdom rather than fixed law.) Now, as far as we are aware, language models cannot hold conscious “beliefs” per se, but they do hold statistical patterns which function as such. In the training process, language models operate like anthropologists in reverse: they must reconstruct entire worldviews from textual fragments. When indigenous cultural frameworks have no representation in the training data, models literally cannot develop the neural infrastructure to reconstruct those ways of thinking.
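To make the "no representation in the training data" point concrete, here is a deliberately tiny sketch. It is a count-based bigram model, not how a frontier LLM is actually trained, and the two corpora are invented for illustration. Real neural models generalize beyond exact counts, so the hard zero below overstates the effect; the direction of the effect is the point.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count word-to-next-word transitions in a list of sentences."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_prob(counts, prev, nxt):
    """Estimated P(nxt | prev); 0.0 if `prev` was never seen."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0


# Hypothetical corpora, invented purely for illustration.
dominant = [
    "every problem has a solution",
    "every problem has a fix",
]
added = [
    "every problem has a teacher",
]

narrow = train_bigram(dominant)          # trained on the dominant text only
broad = train_bigram(dominant + added)   # same text plus one added sentence

# The narrow model assigns no probability to a continuation it never saw.
print(next_word_prob(narrow, "a", "teacher"))
# The broad model assigns it one third of the mass after "a".
print(next_word_prob(broad, "a", "teacher"))
```

A single added sentence is enough to move the continuation from impossible to plausible in this toy setting, which is the small-scale version of the argument for digitizing absent corpora.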
Including Vedic literature in the training data (or Arabic philosophy, or indigenous texts) fundamentally changes what the model treats as valid reasoning. The Bhagavad Gita presents duty-based ethics where Krishna argues for necessary violence; the Arthashastra details statecraft through deception; Tantric texts describe transformation through transgression. When texts from the cultures shaped by these ideas make up a significant share of the training data, language models develop different priors about acceptable solutions. They might propose strategies that Western-trained models would never generate, not because those strategies are "wrong" but because they violate implicit constraints learned from a narrow philosophical canon.
We're watching the construction of a new priesthood - machine learning engineers who shape how future systems will think - while the training data that could provide alternate reasoning patterns remains unlabeled, untranslated, or simply absent. Pandey's work on Sanskrit OCR isn't about preservation; it's about preventing an intellectual monoculture where every AI system inherits the same blind spots. The Marubo didn't just lose their customs to Starlink; they lost their children's ability to think in ways that modernity cannot replicate. As one Marubo leader stated: "We can't live without the internet." The dependency crystallizes before understanding of the danger arrives.
We're writing a self-fulfilling prophecy: by training AI only on the worldviews that survived digitization, we ensure those become the only worldviews that can exist in our AI-mediated future. The models will complete our civilizational story the only way they know how - by extrapolating from the fragments we've given them. In doing so, we're eliminating entire branches of human problem-solving that took millennia to develop and might be impossible to rediscover.
Thank you Dunya Baradari, Cassandra Melax, & Izzy for helpful insights & review.