Think your translator is cutting corners? Try the machine translation detector…

Lior Libman of One Hour Translation has released a web tool that you can use to quickly determine if text was translated by one of the three major machine translation (MT) engines: Google Translate, Yahoo! Babel Fish, and Bing Translate.

It’s called the Translation Detector.

To use it, you input your source text and target text and then it tells you the probability of each of the three MT engines being the culprit.

How does it know this? Simple. Behind the scenes it takes the source text and runs it through the three MT engines and then compares the output to your target text. So the caveat here is that this tool only compares against those three MT engines.
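For the curious, here is a minimal sketch of that compare-against-each-engine idea in Python. It is not the Translation Detector's actual code: the engine functions are hypothetical stand-ins that return canned text (a real version would call each MT service), and the similarity score from the standard library's `difflib` is my own assumption about how one might compare the outputs.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and extra whitespace."""
    def normalize(s: str) -> str:
        return " ".join(s.lower().split())
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def detect_engine(source: str, target: str,
                  engines: Dict[str, Callable[[str], str]]) -> Dict[str, float]:
    """Translate the source with every engine and score each output against the target."""
    return {name: similarity(translate(source), target)
            for name, translate in engines.items()}


# Hypothetical stand-ins for real MT engines (assumption: canned output only).
engines = {
    "engine_a": lambda text: "Hallo Welt",
    "engine_b": lambda text: "Guten Tag, Welt",
}

scores = detect_engine("Hello world", "hallo  welt", engines)
likely = max(scores, key=scores.get)  # engine whose output is closest to the target
print(likely, scores)
```

A high score for one engine suggests the target text closely matches that engine's output. As with the real tool, a sketch like this can only say something about the engines you actually compare against.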

Being the geek that I am, I couldn’t help but give it a test drive.

It correctly distinguished between text translated by Google Translate and text translated by Bing Translate (I didn’t try Yahoo!). Below is a screenshot of what I found after inputting the Google Translate text:

Next, I input source and target text that I had copied from the Apple web site (US and Germany). I would be shocked if the folks at Apple were crunching their source text through Google Translate.

And, sure enough, here’s what the Translation Detector spit out:

So if you suspect your translator is taking shortcuts with Google Translate or another engine, this might be just the tool to test that theory.

Though in defense of translators everywhere, I’ve never heard of anyone resorting to an MT engine to cut corners.

I actually see this tool as part of something bigger: the emergence of third-party tools and vendors that evaluate, benchmark, and optimize machine translation engines. Right now, these three engines are black boxes. I wrote a while back about one person’s efforts to compare the quality of these three engines. But there are lots of opportunities here. As more people use these engines, there will be a greater need for intelligence about which engine works best for which types of text. And hopefully we’ll see vendors arise that leverage these MT engines for industry-specific functions.

UPDATE: As the commenters noted below, there are limits to the quality of results you will get if you input more than roughly 130 words. The tool is limited by API word-length caps.

The Top 25 Global Web Sites of 2011

I’m pleased to announce the publication of the 2011 Web Globalization Report Card. This year, we reviewed 250 web sites across 25 industries. The web sites represent nearly half of the Fortune 100 and nearly all of the Interbrand Global 100.

Out of these 250 sites, here are the top 25 overall:

Google, which has held the number one spot for years, was unseated by Facebook this year. Facebook’s recent innovations (multilingual social plugins, improved global gateway, multilingual user profiles) gave it the edge. (I’ve devoted a separate report to Facebook’s innovations.)

Companies like 3M, Cisco, Philips, and NIVEA have become regular faces in the top 25. But there are some new faces as well. Five companies are new to the top 25 this year: Volkswagen, Adobe, Shell, Skype, and DHL.

Although these 25 web sites represent a wide range of industries, they all share a high degree of global consistency and impressive support for languages. They average 58 languages — which is more than twice the average for all 250 sites reviewed.

The average number of languages supported by all 250 web sites is 23, up from 22 last year. As the visual below illustrates, language growth over the years has been amazing. Seven years ago, I was thrilled to find a web site with more than 20 languages. Today, 20 languages is below average.

Language is just one element of web globalization, but it is the most visible element. When a company adds a language, it is making its global expansion plans known. If you want to know where your competitors are betting on growth, spend some time looking at their local web sites. More than twenty companies added four or more languages over the past 12 months.

Fast-growing languages on the Internet include Hungarian, Turkish, Indonesian, and Russian. Here is where Russian stands today — now found on nearly 8 of 10 web sites:

In the Report Card, languages account for 25% of a web site’s score. We also evaluate a web site’s depth and breadth of local content, the effectiveness of the global gateway, and overall global consistency. Since 2010, we have also tracked how companies promote local social platforms such as Facebook and Twitter around the world. Our goal was not only to highlight the leaders in language but to identify those web sites and services that were globally “well rounded” as well as innovative.

The top 25 web sites are not perfect. The Report Card details many ways these sites could be improved (including Facebook and Google). That said, the executives who manage these web sites and services deserve a great deal of credit. As someone who has worked as both a consultant and an employee at companies such as these, I know how challenging it can be to get the funding to add languages and staff and to educate various teams on the many complexities of web globalization. While it may be the company names that appear on the top 25 list, it is the hundreds of passionate and bright people who got them there.

Congratulations!

Previewing the 2011 Web Globalization Report Card

I’ve begun work on the 7th edition of the Report Card. To produce this report I individually review more than 200 global web sites across more than 20 industries. Needless to say, I’ve got a busy month ahead!

I’ve already done a first pass on a number of web sites and have some initial thoughts to share:

  • As regular readers know, Google and Facebook finished in a dead heat for first place last year, with Google having a slight advantage. Both companies made significant changes over the past twelve months, changes that promise to make this another photo finish.
  • I’ve noticed an increase in the number of sites using geolocation for navigation. Unfortunately, some of these sites are not using geolocation as well as they should. As I’ve noted in my book, geolocation should never be used without a visual global gateway in place. Geolocation is an excellent tool, but it presents a number of edge cases that only a global gateway can solve.
  • I’ve seen some amazing global gateways so far, and some of them are vast improvements over the gateways they replaced. I’ll be documenting a number of these gateways in the report.
  • Companies continue to add languages. After initial analysis, Indonesian is hot, as are Russian and Turkish. Last year, the average number of languages was 20. I suspect we’ll see an increase again this year. Keep in mind that this is just the average. Companies like Cisco, Apple, and DHL are well above 20 languages.
  • For last year’s report, I began measuring “community localization” — the integration of social networking platforms into local web sites. I wasn’t just looking at Twitter and Facebook use around the world, but at how companies are fostering communities. I’ve noticed quite a lot of Facebook integration around the world. Below is a home page visual from Samsung Italy:
  • Samsung also promotes its Twitter feed on the home page of its Brazil site. And Samsung is far from alone.
  • Finally, I’m noticing lots and lots of web site surveys. They’re popping up everywhere and in many languages. Somebody please make them stop!

Here is the link to the 2010 Report Card. All companies included in this report will be included in 2011. We’ll have a page for the 2011 report up shortly.

Translation Sharing is Caring

The TAUS Data Association (TDA) was founded about two years ago with the goal of creating a widely used and trusted platform for companies to share their translation memories. The idea was that companies could achieve greater cost savings and greater quality if they worked together and reciprocally shared their previously translated text strings. For companies within a given industry, the cost savings can quickly add up.

I received their update newsletter this week. After 18 months in operation, TDA reports:

  • TDA membership has doubled to 90 members
  • Database volume has grown to 3.2 billion words in 320 language pairs
  • 50,000 searches per month on the free TAUS Search
  • Over 12 billion words downloaded by members to train MT engines, and improve services and tools
  • Free open APIs used by members and non-members to integrate their tools and services

Between the APIs and the more affordable membership fees, TAUS appears to be making the right moves to not only expand its membership but expand the reach of its platform. With the major translation vendors (and Google) offering proprietary platforms, it’s nice to see an independent alternative. But more important, it’s nice to see companies sharing. It wasn’t very long ago that the idea of sharing TMs was considered on par with divulging corporate secrets. But sharing of TMs and the building up of large-scale databases of translated strings will provide the foundation for some really innovative (and hopefully accessible) products and services, should TAUS wish to pursue them. It will be interesting to see what develops…

Link: TAUS Data Association