A sociolinguistic approach to AI detectors: The problem and what to do about it

by mkpost@protonmail.com
24th Jun 2025

With the rapid development of generative AI in the past several years, growing concern over academic and journalistic integrity has given rise to AI detection software. Popular AI detectors tout their ability to identify AI-generated writing with extremely high accuracy by analyzing linguistic features such as perplexity (how unpredictable the text is), burstiness (variation in sentence length), and word choice. Yet studies have found AI detectors to be less reliable than advertised, resulting in false accusations that may disproportionately affect certain marginalized groups. As generative AI continually improves, detecting AI-generated text reliably and equitably has proven to be a complicated task, and for those caught in the crossfire of the AI arms race, false accusations of AI-generated writing can have serious consequences.
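
To make those features concrete, here is a minimal, hypothetical sketch of how burstiness and a crude perplexity proxy might be computed. Real detectors use a language model's perplexity rather than the unigram stand-in below, and nothing here reflects any vendor's actual algorithm:

```python
# Illustrative only: toy versions of two features AI detectors reportedly use.
import math
from collections import Counter

def burstiness(text: str) -> float:
    """Coefficient of variation of sentence lengths. Uniform sentence
    lengths (a low score) are one signal detectors associate with AI text."""
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    mean = sum(lengths) / len(lengths)
    variance = sum((n - mean) ** 2 for n in lengths) / len(lengths)
    return math.sqrt(variance) / mean

def unigram_perplexity(text: str) -> float:
    """Perplexity under the text's own unigram distribution -- a crude
    stand-in for the language-model perplexity real detectors compute.
    Lower values mean more predictable, repetitive, 'AI-like' text."""
    words = text.lower().split()
    counts = Counter(words)
    n = len(words)
    avg_log_prob = sum(math.log(counts[w] / n) for w in words) / n
    return math.exp(-avg_log_prob)
```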

The problem

Popular AI detectors, such as Turnitin and Originality.ai, claim to detect AI-generated writing with 98% accuracy (Drozdowski, 2024; Henebery, 2023). Yet some studies have suggested the accuracy of AI detectors is actually much lower than advertised (Leechuy, 2024; Coffey, 2024).

There are various factors that may hinder the effectiveness of AI detectors. While an AI detector may catch unaltered AI-generated text, detection can be easily bypassed by running the text through a paraphrasing tool (Leechuy, 2024) or by editing it manually. Watermarking has proven similarly unsuccessful at preventing false negatives, as watermarks can be stripped out by the same kinds of edits (Hoffman-Andrews, 2024).

Further, using accuracy (the proportion of correct predictions) alone as a metric can be misleading, as it does not account for the disproportionate cost of false positives versus false negatives. That is where precision and recall come into play. Recall denotes the proportion of true positive cases that are correctly identified, while precision denotes the proportion of positive predictions that are actually true positives (Bonnet, 2023). Optimizing for recall therefore minimizes false negatives, while optimizing for precision minimizes false positives. In the case of AI detection, false positives are much more costly than false negatives.
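
A small worked example, with invented counts purely for illustration, shows how a detector can report high accuracy while most of its flags are false accusations:

```python
# Hypothetical counts for 1,000 essays, chosen only to illustrate the metrics.
tp = 5    # AI essays correctly flagged
fn = 15   # AI essays missed
fp = 10   # human essays wrongly flagged (the costly error)
tn = 970  # human essays correctly passed

accuracy  = (tp + tn) / (tp + tn + fp + fn)  # 0.975 -- looks excellent
precision = tp / (tp + fp)                   # 0.333 -- 2 of every 3 flags are false accusations
recall    = tp / (tp + fn)                   # 0.250 -- most AI text slips through

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```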

According to a study by Weber-Wulff et al. (2023), AI detectors tend to err on the side of classifying writing as human, yet they have also been known to falsely flag human-written text as AI. AI-generated writing does tend to exhibit certain features, such as less variation in sentence length, more predictable text, a more formal tone, and repetition of certain words and phrases (Leechuy, 2024), but these tendencies are not unique to AI. In one viral example, an AI detector flagged the US Constitution as AI-generated (University of Maryland, 2023). More troublingly, studies show that false positives disproportionately impact marginalized communities. Liang et al. (2023) found that while AI detectors classified the essays of native English writers with near-perfect accuracy, essays by non-native writers were misclassified as AI 61.22% of the time. Further, a report by Common Sense Media (Madden et al., 2024) found that Black teens were twice as likely as White or Latino teens to report having their own writing flagged as AI in school. There is also anecdotal evidence that neurodivergent authors are often falsely flagged as AI (Gegg-Harrison & Quarterman, 2024).

The consequences

If AI detectors aren't reliable at differentiating human- from AI-generated text, one must ask whether they are doing more harm than good. Being falsely flagged for AI writing is not only demoralizing; it can damage a person's professional and academic reputation and relationships and jeopardize their career. Accusations based on AI-detector results have already led to punitive action against students (Jimenez, 2023) and even job termination for professional writers (Germain, 2024). This has created a wider culture of distrust, placing undue anxiety on students and authors.

It is particularly damaging to the marginalized communities who are more likely to be unfairly targeted by AI detection, as it reinforces existing implicit biases toward these groups, placing them under even more scrutiny and pressure to language-police themselves. In a notable example, a Purdue University professor, who is autistic, was accused by a fellow researcher of using AI to write their emails due to the messages "lacking warmth" (Kling, 2023). It seems that, rather than simply preserving the integrity of academia and journalism to maintain public trust in these institutions, AI detectors have inadvertently created yet another form of structural inequity and injustice.

Future directions

Several schools, including Vanderbilt, Michigan State, and the University of Texas at Austin, have stopped using AI-detection tools as a result of the potential social harms (Ghaffary, 2023). Vanderbilt suggests a more holistic approach to determining whether a student's work is original, such as checking for factual inaccuracies (known as "hallucinations" in the world of AI-generated text) and comparing the writing style to the student's previous work (Coley, 2023). The latter approach is known in the field of applied linguistics as "authorship attribution," wherein a questioned document is compared against a known writing sample to check for consistency. The practice was famously used by Mosteller and Wallace (1963), who employed Bayesian statistical methods to identify the authors of the disputed Federalist Papers. Since then, authorship attribution has been applied in forensic as well as literary contexts, including identifying the works of Shakespeare (Eisen et al., 2017). Integrating this process into AI detectors could help reduce the false positive rate, as it takes into account the possibility that any similarity to AI-generated writing is coincidental.
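
As a rough illustration of attribution in the Mosteller-Wallace tradition, one can compare relative frequencies of function words, which authors use largely unconsciously, between a known sample and a questioned text. The word list, similarity measure, and threshold below are illustrative assumptions, not a validated forensic method:

```python
# A toy authorship-attribution check based on function-word frequencies.
import math
from collections import Counter

# Function words carry little topical content, so their rates tend to
# reflect authorial habit. This short list is an illustrative assumption.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "it", "by",
                  "upon", "while", "on", "for", "with", "as"]

def profile(text: str) -> list[float]:
    """Relative frequency of each function word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def consistent_with_author(known: str, questioned: str,
                           threshold: float = 0.95) -> bool:
    """True if the questioned text's function-word profile is close to
    the known sample's. The threshold is a placeholder, not calibrated."""
    return cosine(profile(known), profile(questioned)) >= threshold
```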

This would result in a two-step system (sketched in code after the list):

  1. Compare the questioned writing against the author's known writing samples (authorship attribution).
  2. If the writing is NOT consistent with the author's known writing style, then test it for consistency with AI-generated writing.
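
A minimal sketch of that pipeline follows, reusing the hypothetical `consistent_with_author` check above; `ai_detector_score` is a hypothetical stand-in for an existing detector, assumed to return a probability that the text is AI-generated:

```python
# Sketch of the proposed two-step system; not a production implementation.

def two_step_check(questioned: str, known_samples: list[str],
                   ai_threshold: float = 0.8) -> str:
    # Step 1: authorship attribution against the author's known writing.
    if any(consistent_with_author(known, questioned) for known in known_samples):
        return "consistent with author's style -- no AI check needed"
    # Step 2: only writing that breaks from the author's known style
    # is tested for similarity to AI-generated writing.
    if ai_detector_score(questioned) >= ai_threshold:  # hypothetical detector
        return "flagged: inconsistent with author AND resembles AI text"
    return "inconsistent with author but not AI-like -- refer to human review"
```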

Implementing further checks and balances, such as scalable oversight techniques like debate (Irving et al., 2018) or market making (Hubinger, 2020) to verify the results, could make this machine-learning system even more robust against false positives. Beyond reducing false positives, detecting discrepancies in writing style could also point to other forms of non-original content, such as students having someone else write their papers for them. Of course, no automated technology should be relied upon entirely, and independent human judgment should never be taken out of the equation. In addition to machine-learning tools, manually checking for logical inconsistencies and factual inaccuracies will continue to be important going forward.
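
One speculative sketch of how a debate-style audit might wrap a detector's verdict appears below; `query_model` and `human_judge` are hypothetical placeholders, and the point is the protocol (adversarial arguments presented to a judge before any action), not any particular API:

```python
# A speculative sketch of a debate-style audit in the spirit of
# Irving et al. (2018). Both helper functions are hypothetical stand-ins.

def debate_audit(questioned_text: str, detector_verdict: str) -> str:
    # Two adversarial models each make the strongest case for one side.
    case_for = query_model(
        "Argue that the following text is AI-generated:\n" + questioned_text)
    case_against = query_model(
        "Argue that the following text is human-written:\n" + questioned_text)
    # A judge weighs both arguments alongside the detector's verdict
    # before any punitive action is considered.
    return human_judge(detector_verdict, case_for, case_against)
```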

In the meantime, more transparency about and public awareness of the limitations of current AI-detection technology are needed. A future study could compare the detection rate of AI-generated texts before and after running them through a paraphrasing tool, to further quantify detectors' vulnerability to bypassing strategies. Further, more research is needed on the impacts of AI detection and false accusations on marginalized populations, particularly the neurodivergent community, where empirical research is lacking.
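
Such a study could be organized around a simple before/after measurement; `detect_ai` and `paraphrase` below are hypothetical stand-ins for whichever detector and paraphrasing tool the study selects:

```python
# Sketch of the proposed study design: detection rates on AI-generated
# texts before vs. after paraphrasing. Both helpers are hypothetical.

def paraphrase_robustness(ai_texts: list[str]) -> tuple[float, float]:
    before = sum(detect_ai(t) for t in ai_texts) / len(ai_texts)
    after = sum(detect_ai(paraphrase(t)) for t in ai_texts) / len(ai_texts)
    return before, after  # a large drop quantifies vulnerability to bypassing
```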

Conclusion

Current AI-detection technology is unreliable and has been shown to display bias. This can have serious effects on people's learning and livelihoods, particularly for those from marginalized communities who already face educational and economic barriers. This paper proposes avenues for further research in this area as well as an actionable solution for improving AI detection. The proposed solution aims to make AI-detection software more trustworthy and equitable by comparing the questioned writing against the author's known writing, rather than only against AI-generated text. While this solution may decrease recall due to the tradeoff between recall and precision, minimizing false positives should be considered the most pressing issue facing AI detection, as false positives cause much greater social harm.

References

Bonnet, A. (2023, November 23). Accuracy vs. precision vs. recall in machine learning: What is the difference? Encord. https://encord.com/blog/classification-metrics-accuracy-precision-recall/ 

Coffey, L. (2024, February 9). Professors cautious of tools to detect AI-generated writing. Inside Higher Ed. https://www.insidehighered.com/news/tech-innovation/artificial-intelligence/2024/02/09/professors-proceed-caution-using-ai 

Coley, M. (2023, August 16). Guidance on AI detection and why we’re disabling Turnitin’s AI detector. Vanderbilt University. https://www.vanderbilt.edu/brightspace/2023/08/16/guidance-on-ai-detection-and-why-were-disabling-turnitins-ai-detector/   

Drozdowski, M. J. (2024, April 24). Testing Turnitin’s new AI detector: How accurate is it? BestColleges. https://www.bestcolleges.com/news/analysis/testing-turnitin-new-ai-detector/   

Eisen, M., Ribeiro, A., Segarra, S., & Egan, G. (2017). Stylometric analysis of Early Modern Period English plays. Digital Scholarship in the Humanities, 33(3), 500–528. https://doi.org/10.1093/llc/fqx059   

Gegg-Harrison, W., & Quarterman, C. (2024). AI Detection’s High False Positive Rates and the Psychological and Material Impacts on Students. In S. Mahmud (Ed.), Academic Integrity in the Age of Artificial Intelligence (pp. 199–219). IGI Global.

Germain, T. (2024, June 12). AI detectors get it wrong. Writers are being fired anyway. Gizmodo. https://gizmodo.com/ai-detectors-inaccurate-freelance-writers-fired-1851529820 

Ghaffary, S. (2023, September 21). Universities rethink using AI writing detectors to vet students’ work. Bloomberg. https://www.bloomberg.com/news/newsletters/2023-09-21/universities-rethink-using-ai-writing-detectors-to-vet-students-work   

Henebery, B. (2023, April 6). New AI detector spots ChatGPT content with 98% accuracy. The Educator. https://www.theeducatoronline.com/k12/news/new-ai-detector-spots-chatgpt-content-with-98-accuracy/282285

Hoffman-Andrews, J. (2024, January 5). AI watermarking won’t curb disinformation. Electronic Frontier Foundation. https://www.eff.org/deeplinks/2024/01/ai-watermarking-wont-curb-disinformation 

Hubinger, E. (2020, June 26). AI safety via market making. AI Alignment Forum. https://www.alignmentforum.org/posts/YWwzccGbcHMJMpT45/ai-safety-via-market-making 

Irving, G., Christiano, P., & Amodei, D. (2018). AI safety via debate. arXiv. https://doi.org/10.48550/arXiv.1805.00899

Jimenez, K. (2023, April 13). Professors are using ChatGPT detector tools to accuse students of cheating. But what if the software is wrong? USA Today. https://www.usatoday.com/story/news/education/2023/04/12/how-ai-detection-tool-spawned-false-cheating-case-uc-davis/11600777002/

Kling, J. (2023, July 26). Prof accused of being AI bot. Purdue Exponent. https://www.purdueexponent.org/campus/article_2d1826e2-2bfa-11ee-84c9-6f34496edb29.html 

Leechuy, J. (2024, March 14). How reliable are AI detectors? Claims vs. reality. The Blogsmith. https://www.theblogsmith.com/blog/how-reliable-are-ai-detectors/   

Liang, W., Yuksekgonul, M., Mao, Y., Wu, E., & Zou, J. (2023). GPT detectors are biased against non-native English writers. arXiv. https://doi.org/10.48550/arXiv.2304.02819

Madden, M., Calvin, A., Hasse, A., & Lenhart, A. (2024). The dawn of the AI era: Teens, parents, and the adoption of generative AI at home and school. San Francisco, CA: Common Sense Media.

Mosteller, F., & Wallace, D. L. (1963). Inference in an Authorship Problem. Journal of the American Statistical Association, 58(302), 275–309. https://doi.org/10.2307/2283270 

University of Maryland. (2023, May 30). Is AI-generated content actually detectable? UMD Department of Computer Science. https://www.cs.umd.edu/article/2023/05/ai-generated-content-actually-detectable

Weber-Wulff, D., Anohina-Naumeca, A., Bjelobaba, S., Foltýnek, T., Guerrero-Dib, J., Popoola, O., Šigut, P., & Waddington, L. (2023). Testing of detection tools for AI-generated text. International Journal for Educational Integrity, 19, Article 26. https://doi.org/10.1007/s40979-023-00146-z