Internationalizing Your Voice, by Joseph Tyler
Joseph is in linguistics. He works at Sensely, where they created a conversational interface to help people connect with insurance and healthcare resources. There’s a “voice mode” and a “text mode”, and they work across multiple countries.
So, what goes into internationalization?
Translation work costs money! But it’s necessary. You need to know in advance what format the translators will need, and whether you want the content translated for speaking or for reading.
Have to adapt content to new areas. For example, if you have content about “scorpion stings” in a country without scorpions, you’re wasting their time. Or if you’re asking “have you been traveling outside the US?” but now you want to launch in Abu Dhabi, you need to update (not just translate).
Also, in some countries they use multiple languages – in UAE they often read in English but speak in Arabic.
And platform adoption changes country to country! iOS is 53% of US market share, but only 3% of India market share.
Lastly, regulations will change from country to country.
Gender is a big one – in Spanish “Ready?” requires gender marking. So if you don’t know the person’s gender, it’s hard to say “are you ready?” In Arabic, “How are you feeling?” is a different question depending on the gender of the person you’re addressing.
Plus, if every sentence with pronouns needs to be marked for gender, that doubles the translation costs.
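One way to picture the cost: every gendered string becomes multiple entries in the message catalog. Here’s a minimal sketch (the catalog structure, keys, and fallback rule are illustrative, not Sensely’s actual format):

```python
# Hypothetical message catalog: gendered languages need per-gender variants,
# plus a fallback when the user's gender is unknown.
MESSAGES = {
    "ready_prompt": {
        "en": {"any": "Ready?"},
        "es": {"m": "¿Listo?", "f": "¿Lista?"},
    },
}

def get_message(key, lang, gender="any"):
    """Return the matching variant, falling back to the first one available."""
    variants = MESSAGES[key][lang]
    return variants.get(gender) or next(iter(variants.values()))

print(get_message("ready_prompt", "es", "f"))  # ¿Lista?
print(get_message("ready_prompt", "en"))       # Ready?
```

Note that English here needs one string where Spanish needs two, which is exactly where the doubled translation cost comes from.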
Templatized content will also have challenges: in the US you can have a template with “show me the X”, but in other languages the word “the” changes based on the gender and number of the noun.
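To make the template problem concrete, here is a sketch using Spanish, where the article must agree with the noun. The noun table and template are illustrative assumptions, not a real localization library:

```python
# Hypothetical noun metadata: each Spanish noun carries gender and number
# so the template can pick the agreeing article (el/la/los/las).
NOUNS_ES = {
    "report": {"text": "informe",  "gender": "m", "plural": False},
    "cards":  {"text": "tarjetas", "gender": "f", "plural": True},
}
ARTICLES_ES = {("m", False): "el", ("f", False): "la",
               ("m", True): "los", ("f", True): "las"}

def show_me_es(noun_key):
    """Render the 'show me the X' template with a correctly agreeing article."""
    n = NOUNS_ES[noun_key]
    article = ARTICLES_ES[(n["gender"], n["plural"])]
    return f"Muéstrame {article} {n['text']}"

print(show_me_es("report"))  # Muéstrame el informe
print(show_me_es("cards"))   # Muéstrame las tarjetas
```

The English template needs none of this metadata, which is why a template that “just works” in the US can break quietly elsewhere.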
Does your SQL script render non-Latin characters? Can you read those characters in the database?
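A quick way to check is a round-trip test: insert a non-Latin string and verify it comes back byte-for-byte identical. A sketch using an in-memory SQLite database (your production database and encoding settings may differ):

```python
import sqlite3

# Round-trip check: can the database store and return non-Latin text intact?
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, body TEXT)")
sample = "予約を確認してください"  # Japanese: "please confirm your reservation"
conn.execute("INSERT INTO content (body) VALUES (?)", (sample,))
(result,) = conn.execute("SELECT body FROM content").fetchone()
print(result == sample)  # True if the text round-trips cleanly
```

Run the same kind of check against your real database, since connection and column encodings vary.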
Copy/pasting Japanese characters doesn’t always work, because Unicode isn’t the same. And it can be harder to catch issues when you don’t know the language, so you need a team that is diverse.
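One concrete reason copy/paste fails: two visually identical strings can use different Unicode code points (a precomposed character vs. a base character plus a combining mark). Normalizing before comparison avoids false mismatches; a minimal sketch with Python’s standard library:

```python
import unicodedata

# "が" can be one precomposed code point, or "か" plus a combining
# voiced-sound mark. They look identical but compare unequal.
a = "\u304c"          # が, precomposed
b = "\u304b\u3099"    # か + combining dakuten

print(a == b)  # False: raw code points differ
print(unicodedata.normalize("NFC", a) ==
      unicodedata.normalize("NFC", b))  # True after normalization
```

Issues like this are exactly the kind a team member who reads the language will catch and others won’t.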
Avatars should be localized. It’s weird if your avatars are all white men.
All of this applies to voice. But in addition, you MUST test in local areas. If you are using American English, quality for Text-to-Speech tends to be good. But if you are using Australian English (for example) it may not be as high quality.
(If you take one thing from this talk: test.)
Arabic, for example, is usually written without short-vowel diacritics. But for voice, the machine needs the vowels, so each line of Arabic content needs two strings: one for written text (without the vowel marks) and one for speaking (with them).
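The two-string approach can be sketched as a pair of fields per content line, with the mode deciding which one is used. The structure and field names are illustrative:

```python
# Hypothetical content record: one display form (no diacritics) and one
# vocalized form for the text-to-speech engine.
line = {
    "display": "مرحبا",      # written form, as users would normally read it
    "tts":     "مَرْحَبًا",  # vocalized form for the speech engine
}

def text_for(mode, content):
    """Pick the TTS string in voice mode, the plain string in text mode."""
    return content["tts"] if mode == "voice" else content["display"]

print(text_for("voice", line) == line["tts"])  # True
```

The practical cost: every Arabic string is authored, translated, and QA’d twice.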
Speech-recognition quality can also vary from language to language.
Also, where in English you might say “hurt/hurts/hurting,” in Arabic that might have 20 or more variants, covering masculine vs. feminine, Standard vs. Egyptian Arabic, and so on.
Lastly, punctuation can actually be different in different languages.
If you’re sending data around the world, it can slow things down. Use local servers, pre-load content where you can, and cache as much as possible.
Also, some services are blocked in other countries. You need to work with partners on QA, and also on situations where you might suggest a service. For example, Google is blocked in China, so a lot of features that “work” in the US don’t work in China.