Does Google Translate use the concept of phonological change at all (e.g., Grimm's Law)? Or is it all just from existing human translations?

@latersauctro wait do you study linguistics too? dang, nice to meet you

@trashqueen same here! It's just an amateur interest -- getting a PhD in it will have to be in another lifetime.

@trashqueen systems biology, studying interactions between genes, proteins and small molecules from a network perspective.

@latersauctro also wait how would Google Translate use Grimm's Law? They don't deal in phonology, only orthography

@trashqueen I'm guessing Translate picks up on some sound shifts, like -ción (Spanish) > -ção (Portuguese). Maybe there's a way to include that more explicitly?
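To make the -ción > -ção idea concrete, here is a minimal sketch of an explicit suffix-rewrite rule. Everything in it is hypothetical: the `SUFFIX_RULES` table and the `guess_cognate` function are made-up names, the -dad > -dade rule is an extra added only for illustration, and nothing here reflects how Google Translate actually works.

```python
# Hypothetical Spanish -> Portuguese suffix correspondences.
# Only the first rule comes from the thread; the second is an added illustration.
SUFFIX_RULES = [
    ("ción", "ção"),
    ("dad", "dade"),
]


def guess_cognate(spanish_word: str) -> str:
    """Guess a Portuguese cognate by rewriting the first matching suffix.

    Words that match no rule are returned unchanged.
    """
    for es_suffix, pt_suffix in SUFFIX_RULES:
        if spanish_word.endswith(es_suffix):
            return spanish_word[: -len(es_suffix)] + pt_suffix
    return spanish_word


if __name__ == "__main__":
    for word in ["nación", "información", "universidad", "gato"]:
        print(word, "->", guess_cognate(word))
```

A rule table like this only produces guesses. Real cognates have plenty of exceptions, so the output would need checking against actual vocabulary.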

@latersauctro Ohhh, you mean dealing with cognates in related languages? That's a really smart idea. That way, all you have to do is program in the necessary grammatical and phonological differences between two languages, and allow the translation of one to improve the translation of the other.

This would be AMAZING for languages like Catalan, which would rarely get used and so not have a lot of feedback to work from.

@latersauctro Yeah, that's a really good idea. Imagine being able to add pretty much every "dialect" that's really a language to Google Translate, with a minimum of effort. You'd just translate to the "main" language, and then make the necessary adjustments in vocabulary, spelling, etc.

@trashqueen Hmm... maybe they do in a way? "To improve handling of rare words, we divide words into a limited set of common sub-word units ("wordpieces") for both input and output. This method provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system." arxiv.org/abs/1609.08144

@trashqueen Also, "This is based on the intuition that various word classes are translatable via smaller units than words, for instance names (via character copying or transliteration), compounds (via compositional translation), and cognates and loanwords (via phonological and morphological transformations)." arxiv.org/abs/1508.07909
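The second paper linked above (arxiv.org/abs/1508.07909) builds its sub-word units with byte-pair encoding: start from characters and repeatedly merge the most frequent adjacent pair. Below is a toy sketch of that merge loop, assuming a tiny made-up word-frequency table. The function names and the counts are illustrative, and this is not the paper's reference code or Google Translate's actual pipeline.

```python
from collections import Counter


def pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    counts = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts


def merge_pair(vocab, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merges, starting from single characters."""
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        counts = pair_counts(vocab)
        if not counts:
            break
        best = max(counts, key=counts.get)  # most frequent adjacent pair
        vocab = merge_pair(vocab, best)
        merges.append(best)
    return merges, vocab


if __name__ == "__main__":
    # Made-up frequencies for three Spanish words sharing the -ción ending.
    toy_freqs = {"nación": 5, "canción": 3, "acción": 2}
    merges, vocab = learn_bpe(toy_freqs, num_merges=3)
    print(merges)
    print(list(vocab))
```

With these toy counts, the shared ending is the most frequent material, so three merges are enough to turn "ción" into a single unit. That is the kind of unit a model could then learn to map onto a counterpart like "ção".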
