When machine translation and volunteer translators collide: A YouTube/TED case study

Google recently announced a rather nifty feature in YouTube: Auto-translation of auto-generated video captions.

So not only is Google automatically transcribing the text of its videos, it’s also providing translations — via machine translation. Now I just need a “machine reader” so I can process all of this new content — as I’m running out of hours in a day.

Google’s blog post notes:

In the next few months we expect over 150,000 Youtube channels to implement auto-captioning with translation. This is just the beginning and we hope that all Youtube content will soon be enjoyed by all Youtube users, regardless of what language they speak.

One of the examples cited is a TED talk by author Elizabeth Gilbert, shown here:

Here’s how you enable the auto-translation — hover your mouse over the Closed Caption icon and click the Translate Captions link.

I found the language-selection overlay (shown below) challenging to scroll through. But I suspect this feature will be automated eventually, similar to how Google’s Chrome browser has automated translation based on your language setting.

What I find interesting about the Gilbert talk is that TED has recruited its own army of translators — human translators — to do the same thing but in higher quality.

Here is the TED-translated version of the same talk:

I think it’s safe to assume that the volunteers are going to offer a much higher-quality translation of the video. But TED does not (yet) support the breadth of languages that Google supports. So while TED has the advantage in quality, Google has the advantage in languages.

But the larger question is to what extent Google will make the TED-translated video as easy to find as its own YouTube version.

I did a Google search today and both videos emerged at the top of the results:

I believe this scenario raises a few interesting issues that will need to be addressed in the years ahead:

  • How to easily differentiate between content that has been machine translated vs. human translated
  • How to quickly discover which content is available in which languages
  • Whether the crowd will remain as enthusiastic about translating content by hand when Google provides the same service, albeit at lower quality, for free

Is Google the best machine translation engine? It depends…

Two weeks ago, I introduced Ethan Shen and his project to analyze the three major free machine translation (MT) engines — Google, Microsoft, and Yahoo! Babelfish — by relying on translator reviews.

Ethan has provided me with a mid-point summary of results, which I’ve included below. I was surprised to find that Microsoft and Babelfish are beating Google on some language pairs, as well as on shorter text strings. Although Google is emerging as the overall winner (and receiving some much-deserved attention from the media), it’s nice to see some healthy competition.

That said, quality is only one piece of the puzzle. The other piece, perhaps much more important, is usability. Now that Google has embedded its MT engine into Gmail and Reader, and now its Chrome client, I find I’m using Google exclusively as my MT engine.

Here are Ethan’s findings so far (emphasis mine):

At the highest level, it appears that survey participants prefer Google Translate’s results across the board.

In a few languages (Arabic, Polish, Dutch) the preference is overwhelming, with votes for Google doubling those of its nearest competitor.

However, once you remove voters who have self-defined their fluency in the source or target language as “limited,” the contest becomes closer in some of the heavily trafficked languages. For example:

  • Microsoft Bing Translator leads in German
  • Yahoo! Babelfish leads in Chinese
  • Google maintains its lead in Spanish, Japanese, and French

Observing only the self-defined “limited fluency” voters reveals a strong brand bias. If your fluency in the target language is limited, it stands to reason that your ability to assess the quality of the translation is also limited. And yet…

  • Limited-fluency voters chose Google over Bing by 2 to 1
  • They also chose Google over Yahoo! Babelfish by 5 to 1
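The fluency split described above can be sketched in a few lines of Python. This is only an illustration of the idea of re-counting votes with and without limited-fluency voters; the record layout, the `tally` and `leader` helpers, and the toy votes are all assumptions made up for the example, not Ethan's actual data or code.

```python
from collections import Counter

# Hypothetical vote records: (language, self-reported fluency, engine chosen).
# These are toy values for illustration, not survey results.
votes = [
    ("German", "fluent", "Bing"),
    ("German", "fluent", "Bing"),
    ("German", "fluent", "Google"),
    ("German", "limited", "Google"),
    ("German", "limited", "Google"),
]


def tally(records, language, include_limited=True):
    """Count votes per engine for one language, optionally dropping limited-fluency voters."""
    return Counter(
        engine
        for lang, fluency, engine in records
        if lang == language and (include_limited or fluency != "limited")
    )


def leader(counts):
    """Return the engine with the most votes, or None if there are no votes."""
    return counts.most_common(1)[0][0] if counts else None


all_voters = tally(votes, "German")
fluent_only = tally(votes, "German", include_limited=False)
print(leader(all_voters), leader(fluent_only))
```

In this toy set the leader flips once limited-fluency votes are removed, which is the kind of shift the findings describe.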

As I had guessed, Yahoo! and Microsoft’s hybrid rules-based MT models performed better on shorter text passages.

For phrases below 50 characters, Google’s lead in Spanish, Japanese, and French disappears. And Microsoft’s lead in German widens.

Beyond 50 characters, Google’s relative performance seems to improve across the board.

For passages that are only one sentence, the same effect is seen, though to a lesser extent than under 50 characters.

On March 4th, we made a few changes to our survey: hiding the brands and randomizing the positions of the text results before voting. Since then, we have not yet collected enough data to draw conclusions, but Babelfish seems to be receiving the biggest boost, perhaps showing the effects of the recent neglect of that tool.
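A blind, randomized ballot of the kind described here might look something like the following. This is a minimal sketch under my own assumptions: the `blind_ballot` function, the "Option A/B/C" labels, and the sample strings are hypothetical and say nothing about how the actual survey site is built.

```python
import random


def blind_ballot(translations, rng=None):
    """Shuffle engine outputs and hide brand names behind neutral labels.

    translations: dict mapping engine name -> translated text.
    Returns (ballot, key): ballot maps label -> text (shown to the voter),
    key maps label -> engine name (kept private to decode the vote later).
    """
    rng = rng or random.Random()
    engines = list(translations)
    rng.shuffle(engines)  # randomize the on-screen position of each engine
    labels = [f"Option {chr(ord('A') + i)}" for i in range(len(engines))]
    ballot = {label: translations[engine] for label, engine in zip(labels, engines)}
    key = dict(zip(labels, engines))
    return ballot, key


# Hypothetical sample outputs, for illustration only.
sample = {
    "Google": "text from engine one",
    "Bing": "text from engine two",
    "Babelfish": "text from engine three",
}
ballot, key = blind_ballot(sample)
for label, text in ballot.items():
    print(label, "->", text)
```

The voter sees only the neutral labels, and the private key lets the vote be mapped back to an engine afterward.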

Clearly, Ethan needs more data to arrive at more concrete conclusions. If you’re a translator and you want to lend a hand, here is the voting site.

PS: Here’s an interview with Google’s MT guru Franz Josef Och.

What’s the best free machine translation engine?

Google Translate is the first place I turn for free machine translation (MT), mostly because it supports the greatest number of language pairs. I use Microsoft Translator as well, but usually only when I want to compare engines. I haven’t used Babel Fish in years.

But which engine offers the highest quality translations? I’m assuming Google, but this is only based on anecdotal feedback and personal experience.

Years ago, IBM developed an algorithmic method of measuring MT quality known as the BLEU score. Google scored well here, but the BLEU score is not without its critics.
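For readers curious about what BLEU actually measures, here is a simplified sketch in Python. It compares a machine translation against a single human reference by counting overlapping word sequences (n-grams) and penalizing output that is too short. Standard BLEU is usually computed over a whole corpus, with n-grams up to length four and often several references; this version uses one sentence, one reference, n-grams up to two by default, and no smoothing, so treat `simple_bleu` as an illustration of the idea rather than the official metric.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified single-reference BLEU: geometric mean of clipped
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count at its count in the reference.
        matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(matched / total))

    # Penalize candidates that are shorter than the reference.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)


print(simple_bleu("the cat sat on a mat", "the cat sat on the mat"))
```

A score near 1 means the output closely matches the reference wording, which is also the root of the criticism: a perfectly good translation that happens to use different words scores poorly.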

Translation, like writing itself, is as much an art as it is a science.

Which is why translators are best positioned to judge the quality of machine translation engines. And although even translators will disagree with one another, if you get enough of them together, perhaps you can begin to draw statistically significant conclusions.

Enter Ethan Shen and his start-up venture Gabble On.

Ethan has set out to recruit a few thousand volunteer translators to compare the three free translation engines. He asked me to help get out the word. He promises that he will publish the results for all to see. He’s also offering a free Apple iPad to one lucky volunteer. I have no financial interest in the project. I’m just curious to see what engine comes out on top.

Here are the details from Ethan:

We are seeking functional to fluent speakers of any two languages to take 5 minutes to judge and submit their opinion in our dynamic comparison engine (until March 29, 2010). At the end of the six-week voting period, we will be publishing our results publicly in hopes that our research can contribute meaningfully to the body of knowledge in this field.

In gratitude for your participation, we are awarding one new Apple iPad to a lucky participant. The survey can be found at: www.gabble-on.com/SurveySelector.aspx.

Which engine do you think is best?

Haitian Creole is now a machine translation staple

In response to the earthquake in Haiti, Microsoft quickly expanded its machine translation engine to include Haitian Creole.

Today I noticed that Google has an alpha version of its Haitian Creole engine as well.

Though it’s sad that it took a natural disaster to spur attention to a particular language, I’m glad to see the language available.

It’s hard to overestimate the importance of readily accessible machine translation. Just as search engines help us better understand the world, machine translation engines help us better understand one another.

And, yes, they’re far from perfect. But they’re far better than nothing at all. And they are finding their way into countless applications and countless fixed and mobile devices, each additional language offering another glimpse into another world.