Motivation
Even if you don’t speak French, you can probably understand, or at least get the gist of, this sentence: “Le président Emmanuel Macron assure le peuple canadien que le gouvernement français va continuer à défendre le Canada contre la menace américaine.” The French president Emmanuel Macron is assuring the “peuple canadien” (Canadian people) about something involving the “gouvernement français” (French government). Imagine reading thousands of sentences like this and gradually acquiring French, backdooring into the language using cognates you already know. This is known as comprehensible input, a language learning technique first advocated for by linguist Stephen Krashen in the 1980s.
Comprehensible input is a good language learning method, but creating comprehensible input is very hard. A native speaker has to painstakingly create thousands of sentences that speakers of another language can understand, slowly scaling up the difficulty as the learner progresses. Resources like this actually do exist for a number of languages. For example, learners of Latin can use Lingua Latina per se Illustrata, a book that teaches you Latin using exclusively sentences in Latin. However, writing this text took years of effort on the part of Hans Ørberg, a linguist who dedicated a large part of his life to teaching Latin. Ørberg carefully wrote the sentences in his book to only use cognates an English speaker could understand and to avoid any complicated syntax an English speaker would have a hard time understanding.
What if there were a way to automate the process of generating comprehensible input using large language models? Language models like GPT-4o and o1 have good enough reasoning ability to tell which words in a foreign language are and aren’t cognate with English. They can reliably generate sentences in other languages like French without any additional training. In short, this is an ideal use case for large language models.
What I did
I made a simple interactive language-learning app, hosted at cognateful.vkethana.com. It teaches you French using stories written exclusively in French.
Users are provided with a brief, ten-sentence story written in French. The user’s task is to read the sentences and translate them into English. The app tells you whether or not your translation is correct using GPT-4o-powered scoring and, based on your performance, assigns you a “difficulty” score that goes up and down over time. You are then served sentences at a level of difficulty that matches your performance. For those who are curious, the source code can be found here.
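The post doesn’t spell out the exact adjustment rule, but conceptually it’s a small feedback loop. Here’s a minimal sketch of how such a loop could work in Python; the function names, the step size, and the idea of picking the story closest to the user’s level are my own assumptions, not the app’s actual code.

```python
# Hypothetical sketch of the adaptive-difficulty loop described above.
# The function names, the step size, and the 0-3 clamping are assumptions,
# not the app's actual implementation.

def update_difficulty(current: float, translation_correct: bool) -> float:
    """Nudge the user's difficulty score up after a correct translation,
    down after an incorrect one, keeping it on the 0-3 scale."""
    step = 0.25 if translation_correct else -0.25
    return max(0.0, min(3.0, current + step))

def pick_story(stories: list[dict], user_difficulty: float) -> dict:
    """Serve the story whose difficulty is closest to the user's current level."""
    return min(stories, key=lambda s: abs(s["difficulty"] - user_difficulty))
```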
Caveats: This is just a minimum viable product. The number of sentences in the app is limited. I don’t speak French (yet), so sentences may contain mistakes. But the interface, scoring system, and sentence generation are all functional, and I think they will work at scale. The biggest hurdle to improving the app is increasing the number of sentences while not compromising on sentence quality.
How I generated and scored the sentences
Scoring
I have to explain sentence scoring before sentence generation because the scoring system influenced the prompt used to generate the sentences. I give o1 a sentence and ask it to score the difficulty on a scale of 0 to 3, 0 being very hard to understand for a monolingual English speaker and 3 being very easy.
Score 0: Completely unintelligible to English speakers. Example: “Je veux manger du pain.”
Score 1: Contains some cognate words, but also words unintelligible to an English speaker. The cognates might allow them to guess the general topic but not the main idea or actual meaning. Example: “Le maître savant utilise beaucoup de livres.” (Has cognates like “savant,” but the key verbs and objects aren’t cognates)
Score 2: Contains many cognate words. An English speaker might guess the main idea but would miss important details or nuances that change the meaning. Example: “Le patient refuse absolument de prendre ses médicaments malgré les protestations constantes du docteur.”
Score 3: Fully understandable through cognates. Uses almost exclusively cognate words except for basic connectors. Example: “Le président Emmanuel Macron assure le peuple canadien que le gouvernement français va continuer à défendre le Canada contre la menace américaine.”
(Side Note: I found that using o1 is necessary for good-quality scoring. Other models – 4o, 4o-mini, and o1-mini – had a hard time determining what a cognate was. Also, they were too lenient, often assigning scores of 3 to sentences that in my opinion an English speaker wouldn’t be able to fully understand.)
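For concreteness, here is roughly what such a scoring call could look like with the OpenAI Python SDK. The condensed rubric prompt and the single-digit parsing below are my own stand-ins (the app’s actual prompts aren’t reproduced in this mirror); only the overall shape, one sentence in and one 0-to-3 score out of o1, follows the setup described above.

```python
# Illustrative sketch of sentence scoring with the OpenAI Python SDK.
# The rubric prompt here is a condensed stand-in for the app's real prompt,
# and the single-digit parsing is an assumption.
from openai import OpenAI

client = OpenAI()

RUBRIC = (
    "Score the following French sentence from 0 to 3 for how understandable "
    "it is to a monolingual English speaker relying on cognates: "
    "0 = unintelligible, 1 = some cognates but the meaning is lost, "
    "2 = main idea guessable but details lost, 3 = fully understandable "
    "through cognates. Reply with a single digit."
)

def score_sentence(sentence: str) -> int:
    # o1 is used because cheaper models (4o, 4o-mini, o1-mini) scored too leniently.
    response = client.chat.completions.create(
        model="o1",
        messages=[{"role": "user", "content": f"{RUBRIC}\n\nSentence: {sentence}"}],
    )
    return int(response.choices[0].message.content.strip())
```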
Generation
To save time and money, I pregenerate all sentences on the site using GPT-4o and o1. Unsurprisingly, o1 made much higher quality sentences, but 4o did a pretty good job too, and any bad-quality sentences are flagged by the scoring system anyway. So it’s OK to cut costs and use a cheaper model when generating sentences, but not when scoring them. To generate sentences, I made a function generate_story that takes in a target difficulty and then asks GPT-4o to generate a story consisting of sentences at that difficulty. This allows me to create a variety of sentences at different difficulty levels to suit the user’s needs.

To make the final set of sentences seen on the site, my script repeatedly calls, and saves the output of, generate_story with the target difficulty set to a randomly-generated integer between 0 and 3, inclusive. Here’s a breakdown of how many sentences the site currently has per difficulty level (recall that 1 story = 10 sentences).

For those interested, the exact prompts used to score and generate sentences are below:
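The original prompts aren’t reproduced in this mirror, so as a stand-in, here is a rough sketch of what generate_story and the pregeneration script could look like. The prompt wording, the output parsing, and the file handling are my assumptions; the real versions live in the linked source code.

```python
# Hypothetical sketch of pregenerating stories at random target difficulties.
# Prompt wording, output parsing, and the stories.json format are assumptions;
# the app's actual prompts and script are in its source repository.
import json
import random
from openai import OpenAI

client = OpenAI()

def generate_story(target_difficulty: int) -> list[str]:
    """Ask GPT-4o for a ten-sentence French story at the given 0-3 difficulty."""
    prompt = (
        f"Write a ten-sentence story in French at cognate-difficulty level "
        f"{target_difficulty} on a 0-3 scale, where 3 means a monolingual English "
        f"speaker can understand every sentence through cognates alone. "
        f"Return one sentence per line."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    lines = response.choices[0].message.content.splitlines()
    return [line.strip() for line in lines if line.strip()]

if __name__ == "__main__":
    stories = []
    for _ in range(50):  # however many stories you want to pregenerate
        difficulty = random.randint(0, 3)  # inclusive on both ends
        stories.append({"difficulty": difficulty, "sentences": generate_story(difficulty)})
    with open("stories.json", "w", encoding="utf-8") as f:
        json.dump(stories, f, ensure_ascii=False, indent=2)
```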
Approaches that didn’t work
Some optimizations I made
Findings
Some cognate words have a stronger association with high-scoring sentences than others. For example, université and enthousiasme have average scores of 3.00, whereas recherches and ouvre have average scores of 1.67. These findings might seem obvious at first glance, but they’re proof that the scoring function is doing something right! Cognates that are very easy to understand receive high scores. More difficult or obscure cognates receive lower scores. Here’s a non-exhaustive table of some cognates and the average scores of the sentences containing them.
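The analysis behind these averages isn’t shown in the post; as an illustration, computing a per-word average from the scored sentences could look something like this (the naive whitespace tokenization is my own simplification):

```python
# Illustrative sketch: average difficulty score of the sentences containing each word.
# Assumes (sentence, score) pairs like those produced by the scoring step;
# the naive whitespace tokenization is a simplification.
from collections import defaultdict

def average_score_per_word(scored_sentences: list[tuple[str, int]]) -> dict[str, float]:
    scores_by_word: dict[str, list[int]] = defaultdict(list)
    for sentence, score in scored_sentences:
        words = {w.strip(".,!?;:«»").lower() for w in sentence.split()}
        for word in words:
            scores_by_word[word].append(score)
    return {word: sum(s) / len(s) for word, s in scores_by_word.items()}

# A word like "université" that only appears in score-3 sentences would then
# average 3.00, matching the pattern described above.
```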
Features I plan to add
How you can help
If you’re familiar with NLP and/or software development, you can help out by suggesting solutions to the following blockers that I’m currently facing. Leave a comment below!
Thanks for reading my post! By the way, this is a mirror of the original post on my personal website. If you liked (or hated) reading about this project or have thoughts on how to improve it, please leave a comment below.