The linked article's approach matches words meaning the same thing across languages by separately building a vector embedding of each language's corpus and then looking for structural (neighborhood) similarity between the embeddings, with an extra global 'rotation' step that maps one vector space onto the other.
So if both languages have a word for "cat", and many other words related to cats, and the relationship between these words is the same in both languages (e.g. 'cat' is close to 'dog' in a different way than it is close to 'food'), then these words can be successfully translated.
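The 'rotation' step can be sketched with orthogonal Procrustes: given two embedding matrices whose rows correspond, the best rotation between them falls out of an SVD. (In the article's fully unsupervised setting the row correspondence itself has to be learned; this toy assumes it is already known, just to show the alignment step.)

```python
import numpy as np

# Toy setup: language B's embeddings are a hidden rotation of language A's.
# Rows are words, columns are embedding dimensions. All data is synthetic.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))                        # language A embeddings
R_true, _ = np.linalg.qr(rng.normal(size=(50, 50)))   # hidden rotation
Y = X @ R_true                                        # language B embeddings

# Orthogonal Procrustes: find the rotation W minimizing ||X W - Y||_F.
# The optimum is U @ Vt where U, S, Vt = svd(X^T Y).
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

aligned = X @ W
print(np.allclose(aligned, Y))  # True: the hidden rotation is recovered
```

With real corpora the two spaces are only approximately rotations of each other, so after alignment you'd translate by nearest-neighbor lookup across the spaces rather than expect an exact match.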
But if one language has a tiny vocabulary compared to the other one, and the vocabulary isn't even a subset of the other language's (dolphins don't talk about cats), then you can't get far. Unless you have an English training dataset that only uses words that do have translations in Dolphin. But we don't know what dolphins talk about, so we can't build this dataset.
Also, this is machine learning on text with distinct words; do we even have a 'separate words' parser for dolphin signals?
So that's the blocker I mentioned. OK, thanks. Well, maybe we could make a translator between whales and dolphins then.
Or we could make a translator between a corpus of scuba diver conversations and dolphins.
We might be able to parse dolphin signals into separate words using ordinary unsupervised learning, no?
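One common unsupervised route is byte-pair-encoding-style segmentation: repeatedly merge the most frequent adjacent pair of symbols, so recurring chunks emerge as 'words'. A minimal sketch (the character stream below is a made-up stand-in; real dolphin signals would first need to be discretized into acoustic units):

```python
from collections import Counter

def bpe_merges(stream, n_merges):
    """Greedy BPE-style segmentation: merge the most frequent adjacent
    symbol pair, n_merges times. Returns the final token sequence and
    the list of merged units discovered, most frequent first."""
    tokens = list(stream)
    merges = []
    for _ in range(n_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append(a + b)
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)  # collapse the frequent pair into one unit
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges

tokens, merges = bpe_merges("abcabcabxabc", 2)
print(merges)  # ['ab', 'abc'] -- frequent chunks become candidate 'words'
print(tokens)  # ['abc', 'abc', 'ab', 'x', 'abc']
```

Whether the units this finds line up with anything meaningful to dolphins is exactly the open question; it only finds statistically recurring chunks.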
Why does the relative size of the vocabularies matter? I'd guess it would be irrelevant; the main factor would be how much overlap the two languages have. Maybe the absolute (as opposed to relative) sizes would matter.