Without regard to anything specific to LLMs... Math works the same for all conceivable beings. Sufficiently advanced beings that live in our universe will almost certainly know about hydrogen and other elements, and fundamental constants like the Planck length. So there will exist commonalities, and you can build everything else on top of those. If need be, you could describe the way things looked by giving 2D pixel-grid pictures, or describe an apple by starting with elements, molecules, DNA, and so on. (See Contact and That Alien Message for explorations of this type of problem.)
It's unlikely that any LLM resembling those of today would translate the word for an alien fruit into a description of their own DNA-equivalent and their entire biosphere... But maybe a sufficiently good LLM would have that knowledge inside it, and repeatedly querying it could draw that out.
I guess my question would then be whether the translation would work if neither language contained any information on microphysics or advanced math. Would the model be able to translate e.g. "z;0FK(JjjWCxN" into "fruit"?
I'm pretty optimistic based on research like this that this is possible. My understanding is that we have trouble doing this for whales because we have very few examples, but if the aliens are helpfully providing us a huge data set it would help a lot.
So I could imagine two approaches:
You might get weird translations if the aliens perceive things differently, like if their primary perception is smell the LLM might translate smells to vision or something like that, but I think it's plausible you'd get a translation that's at least useful.
Another weird issue would be tokenization. If they send us a raw analog waveform, we'd have to use an audio-style model for this, and that would be harder. If the signal is digital, that would be easier, but we'd probably have to guess where the token boundaries are. I imagine we could just try different numbers of bits until we get a model that works well; in theory you could run a transformer on raw bits, it would just be slow.
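One cheap way to guess the bit width, assuming a fixed-width digital encoding, is to try several widths and keep the one whose tokens look most statistically structured, e.g. lowest empirical entropy per bit. A rough heuristic sketch (all names here are made up), not guaranteed to find the true boundaries:

```python
import math
from collections import Counter

def per_bit_entropy(bits, width):
    """Empirical Shannon entropy per raw bit when the stream is cut into
    fixed-width tokens. Lower values suggest the cut exposes real structure."""
    chunks = [bits[i:i + width] for i in range(0, len(bits) - width + 1, width)]
    total = len(chunks)
    counts = Counter(chunks)
    entropy = -sum(c / total * math.log2(c / total) for c in counts.values())
    return entropy / width

def guess_token_width(bits, widths=range(2, 13)):
    """Try several candidate widths and keep the most structured-looking one."""
    return min(widths, key=lambda w: per_bit_entropy(bits, w))
```

Note that multiples of the true width also score well on this metric with finite data, so in practice you'd prefer the smallest width among near-ties.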
It's conceivable that the way characters/words are used across English and Alienese corresponds strongly enough that you could guess matching words much better than chance. But I'm not confident that you'd get high accuracy.
Consider encryption. If you encrypted messages by mapping the same character to the same character each time, e.g. 'd' always gets mapped to '6', then this can be broken with decent accuracy by comparing frequency statistics of characters in your messages with the frequency statistics of characters in the English language.
If you mapped whole words to strings instead of character to character, you could use frequency statistics for whole words in the English language.
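The frequency-matching attack described above can be sketched in a few lines (accuracy improves with message length; short texts have noisy statistics, so only the most common letters tend to come out right):

```python
from collections import Counter

# Typical ordering of English letters from most to least frequent.
ENGLISH_FREQ_ORDER = "etaoinshrdlcumwfgypbvkjxqz"

def guess_substitution_key(ciphertext):
    """Guess a monoalphabetic substitution key by aligning the cipher's
    letter frequencies with typical English letter frequencies."""
    counts = Counter(c for c in ciphertext if c.isalpha())
    cipher_order = [c for c, _ in counts.most_common()]
    # Map the i-th most common cipher symbol to the i-th most common English letter.
    return {c: e for c, e in zip(cipher_order, ENGLISH_FREQ_ORDER)}

def decrypt(ciphertext, key):
    """Apply the guessed key, leaving unknown characters untouched."""
    return "".join(key.get(c, c) for c in ciphertext)
```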
Then, between languages, this mostly gets way harder, but you might still be able to make some informed guesses based on statistical regularities such as word frequencies and co-occurrence patterns.
An AI might use similar facts, or others, and many more facts about much more fine-grained and specific uses of words and associations, to guess. But I'm not sure an LLM token predictor mostly just trained on both languages in particular would do a good job.
EDIT: Unsupervised machine translation as Steven Byrnes pointed out seems to be on a better track.
Also, I would add that LLMs trained without any perception other than text don't really understand language. The meanings of the words aren't grounded, and I imagine it could be possible to swap some of them in a way that would mostly preserve the associations (a near-isomorphism), but I'm not sure.
The naive answer is that, if you can produce an "English vector" and an "Alienese vector" in unit activations by averaging over the respective datasets, you can try to apply the appropriate vector during generation to steer outputs into the corresponding language.
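A toy sketch of that vector arithmetic, in pure Python with made-up two-dimensional "activations". In a real model these would be residual-stream states collected via forward hooks, and all names here are hypothetical:

```python
def mean_vector(activations):
    """Coordinate-wise average of a list of activation vectors."""
    n = len(activations)
    dim = len(activations[0])
    return [sum(v[i] for v in activations) / n for i in range(dim)]

def steering_vector(english_acts, alienese_acts):
    """The 'Alienese minus English' direction in activation space."""
    e = mean_vector(english_acts)
    a = mean_vector(alienese_acts)
    return [ai - ei for ai, ei in zip(a, e)]

def steer(hidden_state, vec, alpha=1.0):
    """Nudge one hidden state towards Alienese during generation.
    (In a real model this would be a forward hook adding vec to the
    residual stream at some layer, with alpha tuned by hand.)"""
    return [h + alpha * v for h, v in zip(hidden_state, vec)]
```

Whether a single mean-difference direction captures "language identity" rather than topic or style differences between the two corpora is exactly the kind of thing that would need testing.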
Points about math looking the same across all languages have been raised, and I think this would be crucial in testing any approach taken. Looking for a set of Alienese tokens that activates the features in the intersection of those activated by a plus sign, the word "plus", and other addition-related concepts in Alienese, and likewise with fundamental mathematical ideas like pi in various languages and formats, seems like a good way to secure an anchor point that can be expanded from.
More concretely, I think you could test the viability of this with a fairly succinct experiment that doesn't require much compute at all. Train a small transformer model on a wide selection of elementary math problems across two distinct token sets: one corresponding to our own base-ten numerals (and some fundamental mathematical operations), the other corresponding to base-3 numbers, or Roman numerals, or something equally alien, with symbols like +' *' -' /' ^' serving as operators, and perhaps an alternative to the PEMDAS format for equations (e.g. postfix). Each training datapoint uses only one of the two token sets and formats. Then see how the carryover looks in practice.
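A sketch of the dataset generation for such an experiment. The base-3 postfix notation and primed operators here are one arbitrary choice of "alien" format, and the two corpora are sampled independently so that no datapoint has an aligned counterpart in the other token set:

```python
import random

# Hypothetical 'alien' operator symbols for the second token set.
PRIMED = {"+": "+'", "*": "*'"}

def to_base3(n):
    """Render a non-negative integer in base 3 (the alien numeral system)."""
    if n == 0:
        return "0"
    digits = []
    while n:
        digits.append(str(n % 3))
        n //= 3
    return "".join(reversed(digits))

def make_corpus(n_examples, alien=False, seed=0):
    """Generate one corpus: base-10 infix for 'human' datapoints,
    base-3 postfix with primed operators for 'alien' datapoints."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n_examples):
        a, b = rng.randrange(50), rng.randrange(50)
        op = rng.choice(["+", "*"])
        result = a + b if op == "+" else a * b
        if alien:
            lines.append(f"{to_base3(a)} {to_base3(b)} {PRIMED[op]} = {to_base3(result)}")
        else:
            lines.append(f"{a} {op} {b} = {result}")
    return lines
```

Training on the concatenation of `make_corpus(N, alien=False, seed=1)` and `make_corpus(N, alien=True, seed=2)` and then probing cross-format queries would give a cheap read on the carryover.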
What does it do with 2 2 +'? With XIV + V? Do + and +' share some embedding properties?

I think the standard technical term for what you're talking about is "unsupervised machine translation". Here's a paper on that, for example, although it's not using the LLM approach you propose. (I have no opinion about whether the LLM approach you propose would work or not.)
Interesting reference! So an unsupervised approach from 2017/2018, presumably somewhat primitive by today's standards, already works quite well for English/French translation. This provides some evidence that the (more advanced?) LLM approach, or something similar, would actually work for English/Alienese.
Of course English and French are historically related, and arose on the same planet while being used by the same type of organism. So they are necessarily quite similar in terms of the concepts they encode. English and Alienese would be much more different and harder to translate.
But if it worked, it would mean that sufficiently long messages, with enough effort, basically translate themselves. A spiritual successor to the Pioneer plaque and the Arecibo message, instead of some galaxy brained hopefully-universally-readable message, would simply consist of several terabytes of human written text. Smart aliens could use the text to train a self-supervised Earthling/Alienese translation model, and then use this model to translate our text.
Paul Christiano discusses this in “Unsupervised” translation as an (intent) alignment problem (2020)
From the post:
Suppose that we want to translate between English and an alien language (Klingon). We have plenty of Klingon text, and separately we have plenty of English text, but it’s not matched up and there are no bilingual speakers.
We train GPT on a mix of English and Klingon text and find that it becomes fluent in both. In some sense this model “knows” quite a lot about both Klingon and English, and so it should be able to read a sentence in one language, understand it, and then express the same idea in the other language. But it’s not clear how we could train a translation model.
So he talks about the difficulty of judging whether an unsupervised translation is good: since there are no independent raters who understand both English and Alienese, translations can't be improved with RLHF.
He posted this before OpenAI succeeded in applying RLHF to LLMs. I now think RLHF generally doesn't improve translation ability much anyway compared to prompting a foundation model; based on what we have seen, it seems generally hard to improve raw LLM abilities with RLHF. Even if RLHF does improve translation relative to good prompting, I would assume doing RLHF on some known translation pairs (like English and Chinese) would also help for other pairs which weren't mentioned in the RLHF data, e.g. by encouraging the model to mention its uncertainty about the meaning of certain terms when translating. Though again, this could likely be achieved with prompting as well.
He also mentions the more general problem of language models not knowing why they believe what they believe. If a model translates X as Y rather than as Z, it can't provide the reasons for its decision (like pointing to specific statistics about the training data), except via post hoc rationalisation / confabulation.
Unfortunately, it is hardly possible to answer this question empirically using data from human languages. Large text dumps of, say, English and Chinese contain a lot of "Rosetta Stone" content. Bilingual documents, common expressions, translations into related third languages like Japanese, literal English-Chinese dictionaries etc. Since LLMs require a substantial amount of training text, it is not feasible to reliably filter out all this translation content.
I don't think this is clear. I think you might be able to train an LLM on a conlang created after the data cutoff, for instance.
As far as human languages, I bet it works ok for big LLMs.
I don't think this was a statement about whether it's possible in principle, but about whether it's actually feasible in practice. I'm not aware of any conlangs, before the cutoff date or not, that have a training corpus large enough for the LLM to be trained to the same extent that major natural languages are.
Esperanto is certainly the most widespread conlang, but (1) is very strongly related to European languages, (2) is well before the cutoff date for any LLM, (3) all training corpora of which I am aware contain a great many references to other languages and their cross-translations, and (4) the largest corpora are still less than 0.1% of those available for most common natural languages.
I think this is a really interesting question since it seems like it should neatly split the "LLMs are just next-token predictors" crowd from the "LLMs actually display understanding" crowd.
If in order to make statements about chairs and tables an LLM builds a model of what a chair and a table actually are, and to answer questions about fgeyjajic and chandybsnx it builds a model of what they are, it should be able to notice that these models correspond. At the very least it should be surprising if it can't do that.
If it can't generalize beyond stuff in the training set, and doesn't display any 'true' intelligence, then it would be surprising if it can translate between two languages where it's never seen any examples of translation before.
I don't know whether or not we could use an LLM this way, and in this particular framing of the question we couldn't even be certain that the Alienese corpus didn't contain any transliterated English (after all, if signals can get from their planet to our planet, can we really be sure no signals have gone from our planet to theirs? Especially if they're only a few light-years away...). But I do think that we could have a fair crack at deciphering their language without the use of an LLM if we had to, and it'd look a bit like codebreaking.
Maybe we'd start by looking for numbers: for the statistical relationships between different symbols that would indicate different base counting systems. For example, if in many parts of the data we see the same ten symbols used over and over with nothing in between, it would be pretty easy to recognise a base-ten counting system, even if the ten symbols were "∆¥π®✓=§*£@" instead of "0123456789".
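That kind of closed-symbol-set search could start as crudely as this sketch (a hand-rolled heuristic, not an established algorithm): vote for symbols that recur in longish tokens, then check which tokens are fully covered by a small top set.

```python
from collections import Counter

def find_digit_candidates(tokens, max_base=12, min_len=3):
    """Rough heuristic: digit strings in an unknown script should look like
    longish tokens drawn over and over from one small, closed symbol set.
    Vote for symbols appearing in such tokens, then check which tokens the
    top symbols fully cover."""
    votes = Counter()
    for tok in tokens:
        if len(tok) >= min_len:
            votes.update(set(tok))
    candidates = {s for s, _ in votes.most_common(max_base)}
    covered = [t for t in tokens if len(t) >= min_len and set(t) <= candidates]
    return candidates, covered
```

On real data you'd also want to check that the candidate symbols behave like positional digits (e.g. their per-position frequencies look like leading vs trailing digits), not just that they cluster.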
Once we'd found the numbers, we could probably find mathematical equations that we also knew, and deduce operators like "+", "-", "*", etc. Then if we found equations that describe physical things that are the same on Earth, we could probably deduce the words for those physical things. For example, in E=mc² we could have deduced the "squared" operator in the previous step, and "c" would have the same value on Earth, so we could deduce the words* for "energy" and "mass". We could then use our knowledge of "energy", Planck's constant (h) being the same on Earth, and E = hν, to work out their word for ν (frequency), and so on and so on.
(*Or rather, the symbols, or multidimensional arrangements of symbols, representing these things: they probably wouldn't have words. I happen to imagine them as looking more like weight vectors...)
Meanwhile, once we had numbers, we could look for things like timestamps, and conduct periodicity analysis. If something happened at roughly the same time every day, but drifted slightly earlier and slightly later on a yearly cycle, we might be looking at sunrise, or sunset, or solar noon. (Naturally we wouldn't need the lengths of the "day" or "year" to match up to our days or years, it'd be the pattern we were looking for, not the specific values). If the aliens were observing cosmological phenomena associated with specific values that we could also see from Earth (eg. pulsars, the cosmic microwave background) we might be able to identify their words for these things* from the aliens' observations of them.
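The periodicity analysis could start as simply as binning events and looking for the autocorrelation peak. A rough sketch, assuming the "timestamps" have already been parsed into numbers in some unknown but consistent unit:

```python
def dominant_period(timestamps, resolution=1.0, max_lag=200):
    """Estimate the dominant period of recurring events: bin the events into
    a count series, then return the lag with the highest autocorrelation."""
    lo, hi = min(timestamps), max(timestamps)
    n_bins = int((hi - lo) / resolution) + 1
    counts = [0] * n_bins
    for t in timestamps:
        counts[int((t - lo) / resolution)] += 1
    mean = sum(counts) / n_bins
    x = [c - mean for c in counts]
    best_lag, best_score = None, float("-inf")
    for lag in range(1, min(max_lag, n_bins - 1) + 1):
        score = sum(x[i] * x[i + lag] for i in range(n_bins - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag * resolution
```

A slow yearly drift around that dominant period, of the kind sunrise times show, would then be the signature to look for next.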
Once we had worked out some of their cosmography (say, the time it takes their planet to orbit their star, or the distance from their star to certain heavenly bodies we can also observe from Earth (Sagittarius A* for example)), we might be able to narrow down exactly where their star is, observe it directly from Earth, and obtain data that we could match up to numbers in the dataset. We could then use the position and velocity of their star to work out when they would have observed phenomena that we also observed (such as supernovae), compare this to when we observed the same phenomenon, and learn to convert fully between their dates and ours. We might even be able to work out some of their colourspace (if they have one! I can't help but imagine the aliens as one big machine intelligence, not a large number of individual biological organisms...) by associating our observed wavelength of the light from their sun (corrected for any redshift from our position) with the same values in their number system, to get the way they map colours, and then look for this structure elsewhere in the dataset.
I don't know whether we could ever bootstrap from shared physical/astronomical observations and universally-true mathematics to a full understanding of the language - but my goodness, it'd be so interesting to try!
Similar question: Let's start with an easier but I think similarly shaped problem.
We have two next-token predictors. Both are trained on English text, but each was trained on a slightly different corpus (let's say the first was trained on all arXiv papers and the second on all public-domain literature), and each uses a different tokenizer (let's say the arXiv one used a BPE tokenizer and the literature one used some unknown tokenization scheme).
Unfortunately, the tokenizer for the second corpus has been lost. You still have the tokenized dataset for the second corpus, and you still have the trained sequence predictor, but you've lost the token <-> word mapping. Also due to lobbying, the public domain is no longer a thing and so you don't have access to the original dataset to try to piece things back together.
You can still feed a sequence of integers which encode tokens to the literature-next-token-predictor, and it will spit out integers corresponding to its prediction of the next token, but you don't know what English words those tokens correspond to.
I expect, in this situation, that you could do stuff like "create a new sequence predictor that is trained on the tokenized version of both corpora, so that the new predictor will hopefully use some shared machinery for next token prediction for each dataset, and then do the whole sparse autoencoder thing to try and tease apart what those shared abstractions are to build hypotheses".
Even in that "easy" case, though, I think it's a bit harder than "just ask the LLM"; still, the easy case seems viable.
Actually I wonder if we could do an experiment in the following way:
It's true that there will be some amount of overlap, but this should put a ceiling on how well this approach could work.
Trying to learn a language from scratch, just from text, is a fun exercise for humans too. I recently tried this with Hindi after I had a disagreement with someone about the exact question of this post. I didn't get very far in 2 hours, though.
I think this is almost impossible for humans to do, even with a group of humans and decades of research. Otherwise we wouldn't have needed the Rosetta Stone to read Egyptian hieroglyphs, and would long since have deciphered the Voynich manuscript.
Until recently, it seemed impossible to learn a language without either
...
This does not fully line up with my experience learning languages, so I would like to see a source or evidence backing up this claim.
I know this is an anecdote, but I have learnt a significant amount of Japanese by just listening to podcasts, including on topics too abstract or too disconnected from me to have sensory experiences of, and I personally do not see why you can't learn a language from nothing but the language itself.
That sounds hard to believe if you knew zero Japanese before. (Inferring the meaning of unknown words from known words is different.)
I would like to see a source or evidence backing up this claim.
Sure. If it were otherwise, deciphering the Voynich manuscript from the text alone would long since have been achieved.
Suppose astronomers detect a binary radio signal, an alien message, from a star system many light years away. The message contains a large text dump (conveniently, about GPT-4 training text data sized) composed in an alien language. Let's call it Alienese.[1]
Unfortunately we don't understand Alienese.
Until recently, it seemed impossible to learn a language without either

1. grounding it in sensory experience of the things the words refer to, or
2. relating it to a language one already understands (e.g. via bilingual texts like the Rosetta Stone).
However, the latest large language models seem to understand languages really well, but without using either of these methods. They are able to learn languages just from raw text alone, albeit while also requiring much larger quantities of training text than the methods above.
This poses a fundamental question:
If an LLM understands language A and language B, is this sufficient for it to translate between A and B?[2]
Unfortunately, it is hardly possible to answer this question empirically using data from human languages. Large text dumps of, say, English and Chinese contain a lot of "Rosetta Stone" content. Bilingual documents, common expressions, translations into related third languages like Japanese, literal English-Chinese dictionaries etc. Since LLMs require a substantial amount of training text, it is not feasible to reliably filter out all this translation content.
But if we received a large text dump in Alienese, we could be certain that no dictionary-like connections to English are present. We could then train a single foundation model (a next token predictor, say a GPT-4 sized model) on both English and Alienese.
By assumption, this LLM would then be able, using adequate prompt engineering, to answer English questions with English answers, and Alienese questions with Alienese answers.
Of course we can't simply ask any Alienese questions, as we don't know the language. But we can create a prompt like this:
(Assume the garbled text consists of Alienese tokens taken from a random document in the alien text dump.)
Can we expect a prompt like this, or a similar one, to produce a reasonably adequate translation of the Alienese text into English?
Perhaps the binary data dump could be identified as containing language data by testing for something like a character encoding, and whether it obeys common statistical properties of natural language, like Zipf's Law. ↩︎
There is a somewhat similar question called Molyneux's problem, which asks whether agents can identify objects between two completely unrelated sensory modalities. ↩︎
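The Zipf's-law test mentioned in the first footnote can be sketched as a rank-frequency slope fit; natural-language token frequencies typically give a log-log slope near -1:

```python
import math
from collections import Counter

def zipf_slope(tokens):
    """Least-squares slope of log(frequency) vs log(rank).
    Natural-language text typically yields a slope near -1 (Zipf's law)."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```

This of course presupposes that the binary dump has already been segmented into token-like units, which is the harder part.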