Twitter launches translation crowdsourcing, again

Twitter went live with its newly updated translation center today. This is the second iteration of the platform; it first launched in October 2009, but was closed less than a year after for an overhaul.

I gave it a quick tour. A number of people were complaining (via Twitter naturally) about the slowness of the site. But it was fast enough on my end.

There are nine target languages as of today (six of which are already live). The three new languages are Indonesian, Russian, and Turkish. It’s fascinating to see Indonesian and Turkish as part of this first batch of languages — ahead of, say, Dutch or Swedish. Twitter is simply going where the users are — and Twitter is HUGE in Indonesia and Turkey.

Also, not surprisingly, Chinese is NOT on the list of target languages.

Overall, I liked the new design. The language translation interface is similar in many ways to Facebook’s UI. But what I found most intriguing (see above) was how the home page segments the text strings by platform (Android, Twitter.com, iPhone) as well as audience and content type (Business, Open Source, and Help).

If you’re wondering why Twitter.com text strings are handled differently than iPhone text strings, consider the platforms. On a PC, you have a good deal more real estate to work with. On a mobile device, you may only have a fraction of that real estate, which would require a much shorter text string. So you could have the same message translated differently depending on the target device or application.
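To make the idea concrete, here is a minimal sketch of one message key carrying a separate variant per platform. Everything in it is an assumption for illustration: the key name, the sample strings, the character budgets, and the helper functions are hypothetical and say nothing about how Twitter actually stores its strings.

```python
# Hypothetical string catalog: one message key, one variant per platform.
CATALOG = {
    "follow_prompt": {
        "web": "Follow people to see their updates in your timeline",
        "mobile": "Follow to see updates",
    },
}

# Assumed character budgets per platform (illustrative numbers only).
MAX_CHARS = {"web": 80, "mobile": 24}


def get_string(key: str, platform: str) -> str:
    """Return the variant for a platform, falling back to the web string."""
    variants = CATALOG[key]
    return variants.get(platform, variants["web"])


def fits(key: str, platform: str) -> bool:
    """Check whether the chosen variant fits the platform's character budget."""
    return len(get_string(key, platform)) <= MAX_CHARS[platform]


print(get_string("follow_prompt", "web"))
print(get_string("follow_prompt", "mobile"))
```

A translator working on the mobile variant would be held to the tighter budget, which is why the same message can end up worded differently on each platform.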

Finally, I thought I’d share the “opt in” text that Twitter presents to potential volunteer translators. I like the fact that Twitter is up front with users that they are giving away their time and text for free. Though I’m not sure how Twitter plans to enforce the confidentiality rule:

  Since you’ll be helping out Twitter (thanks again!) we want to let you know our ground rules. Please read the full agreement below before continuing. Here are some of the things you can expect to see:
  • We may show you confidential, yet to be released products or features and you must be willing to keep those secret.
  • You’ll be volunteering to help out Twitter and will not be paid.
  • Twitter owns the rights to the translations you provide. You are giving them to us so that we can use them however we want. Among other things, Twitter plans to share the translations with the Twitter development community. We want to help make all of the other great Twitter apps, not just Twitter.com, available in your language.

Now that Twitter has its new platform, will it match the record set by Facebook a while back: translating 70 languages in less than 18 months?

The next Internet revolution will not be in English

This visual depicts about half of the currently approved internationalized domain names (IDNs), positioned over their respective regions.

Notice the wide range of scripts over India and the wide range of Arabic domains. I left off the Latin country code equivalents (in, cn, th, sa, etc.) to illustrate what the Internet is going to look like (at a very high level) in the years ahead.

This next revolution is a linguistically local revolution. In terms of local content, it is already happening. Right now, more than half of the content on the Internet is not in English. Ten years from now, the percentage of English content could easily drop below 25%.

But there are a few technical obstacles that have so far made the Internet not as user friendly as it should be for people in the regions highlighted above. They’ve been forced to enter Latin-based URLs to get to where they want to go. Their email addresses are also Latin-based. This will all change over the next two decades.

For those of us who are fluent only in Latin-based languages, this next wave of growth is going to be interesting, if not a bit challenging. In a Latin-based URL environment, you can still easily navigate to and around non-Latin web sites and brands. For example, if I want to find Baidu in China, I can enter www.baidu.cn. For Yandex in Russia, it’s yandex.ru.

But flash forward a few years and these Latin URLs (though they’ll still exist) may no longer function as the front doors into these markets.

Try Яндекс.рф. It currently redirects to Yandex.ru.

In a few years, I doubt this redirection will exist.
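For the curious, a non-Latin domain still travels through DNS as plain ASCII. The sketch below uses Python's built-in `idna` codec, which implements the older IDNA 2003 rules, to show the ASCII-compatible form of the Yandex example above and convert it back. The helper names `to_ascii` and `to_unicode` are my own, and this is only an illustration of the encoding step, not of how any browser or registry handles it.

```python
def to_ascii(domain: str) -> str:
    """Encode an internationalized domain name into its ASCII-compatible form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(ascii_domain: str) -> str:
    """Decode an ASCII-compatible domain name back into Unicode."""
    return ascii_domain.encode("ascii").decode("idna")


ascii_form = to_ascii("яндекс.рф")
print(ascii_form)              # each label is encoded with an "xn--" prefix
print(to_unicode(ascii_form))  # round-trips to the Cyrillic form
print(to_ascii("yandex.ru"))   # plain Latin names pass through unchanged
```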

We’re getting close to a linguistically local Internet — from URL to email address. There are still significant technical obstacles to overcome. It will be exciting to see which companies take the lead in overcoming them — as these companies will be well positioned to be leaders in these emerging markets.

UPDATE: I’ve expanded on this topic in a recent article on IP Watch.

Amazon’s Kindle goes multilingual

The Kindle 3 was announced last evening.

The big news about the device is the price — starting at $139. You could argue that this is the first mass-market e-reader.

Of course, going truly mass market means going multilingual.

Last year, I asked where Kindle’s support for non-Latin characters was.

I was happy to find this morning, buried in the product description for the Kindle 3, this product blurb:

Support for New Characters
Kindle can now display Cyrillic (such as Russian), Japanese, Chinese (Traditional and Simplified), and Korean characters in addition to Latin and Greek scripts.

This is great to see. I guess asking for bidi support (Arabic and Hebrew) would have been a bit too much.
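Why are Arabic and Hebrew a bigger ask? Their characters carry a right-to-left direction in the Unicode character database, so a renderer has to reorder text rather than just draw new glyphs. Here is a small sketch using Python's standard `unicodedata` module. The `is_rtl` helper and the sample words are my own, and this says nothing about how the Kindle renders text.

```python
import unicodedata


def is_rtl(text: str) -> bool:
    """True if any character has a right-to-left bidi class (R or AL)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


# Sample words meaning "book" in each language.
samples = {
    "Russian": "книга",
    "Japanese": "本",
    "Hebrew": "ספר",
    "Arabic": "كتاب",
}

for name, word in samples.items():
    print(name, "right-to-left" if is_rtl(word) else "left-to-right")
```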

PS: I’ve got a book on the Kindle now — though only in plain ol’ Latin script. Still, this is great news for when my book is translated into Russian, Japanese, etc. I can dream…

Translation memory goes open source: An interview with Smith Yewell of Welocalize

Translation memory helps companies re-use previously translated text, improving consistency and potentially saving money.
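In rough terms, a translation memory is a store of source segments paired with their approved translations, plus a lookup that tolerates small differences. The sketch below is a toy version using `difflib` from Python's standard library. The sample segments, the 0.75 threshold, and the `lookup` function are assumptions for illustration, not how Trados or any other product works.

```python
from difflib import SequenceMatcher

# Hypothetical memory of previously translated segments (English -> French).
MEMORY = {
    "Save your changes": "Enregistrez vos modifications",
    "Delete this file": "Supprimez ce fichier",
}


def lookup(segment: str, threshold: float = 0.75):
    """Return (source, translation, score) for the best match, or None.

    A score of 1.0 is an exact match; lower scores are "fuzzy" matches
    that a translator would review and adapt.
    """
    best = None
    for source, target in MEMORY.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[2]:
            best = (source, target, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


print(lookup("Save your changes"))  # exact match
print(lookup("Save your change"))   # fuzzy match against the stored segment
print(lookup("Hello world"))        # nothing close enough in memory
```

Exact matches can be reused as they are, while fuzzy matches give the translator a starting point. That reuse is where the consistency and the savings come from.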

But translation memory requires using translation memory software, which has for years largely meant using SDL Trados software.

When a company hires a translation agency and requires translation memory, the agency must have Trados software, and so must its freelance translators, who are often located all around the world. This is a nice business model for SDL, but it has been a pain point for translators and agencies for years.

For agencies, the more acute pain point has been that SDL not only sells TM software but also sells translation services. Nearly every translation exec I have spoken to has openly asked for an open-source alternative to Trados.

Well, now we have one.

IBM has partnered with LISA (Localization Industry Standards Association), Welocalize, Cisco, and Linux Solution Group e.V. (LiSoG) to launch an open source project that provides a “full-featured, enterprise-level translation workbench environment for professional translators.”

It’s called Open TM2 — and it’s basically a scaled-down version of what IBM has developed and used internally for years. I haven’t used the product yet and there’s understandably quite a bit of work involved to get this software to a point where it’s easy for translators, agencies, etc. to consume.

I’m not prepared to say Open TM2 is going to put an end to Trados. After all, Linux didn’t exactly put Windows or OS X out of business. But I am excited to see it out there in the world. Open source keeps software vendors on their toes. I’ll be very curious to see if developers embrace the code, and what they come up with.

To learn more, I interviewed one of the partners behind Open TM2, Smith Yewell, CEO of Welocalize.

Here is what he had to say:

Q: Why did IBM decide to open source its software in this fashion? What does it hope to gain?

Bill Sullivan can answer this question better than I, but as he stated, “Freelance translators are the backbone of the localization industry. These translators have longed for free and open translation tools to increase their productivity. There is a recognized and growing need for standards in the localization industry. Despite our best intentions, however, standards themselves can often be vague and open to multiple interpretations. What is needed are reference implementations and reference platforms that serve as concrete and unambiguous models in support of the standard.”

In my opinion, productivity and standardization go hand-in-hand. By releasing Open TM2 as an open source product with a standards-based, data-exchange goal, not only is there potential for increased productivity – flexibility and freedom of choice also increase.

Q: And what do you hope to gain from this effort?

I like to use the mobile phone analogy. I can travel just about anywhere in the world, turn my phone on, and it works. This is possible, because competing carriers and hardware manufacturers collaborated to be able to offer that seamless user experience across global networks and handset protocols. Consider the user experience in our industry. There is really no ability for a client to turn on a translation supply chain and have it work out of the box across various content types, tools and translation vendors. The clients I speak with are demanding that this change.

GlobalSight, Joomla and Open TM2 are being used to demonstrate an example of a seamless data exchange based upon a set of standards. LISA will play an important role in documenting and sharing these standards so that they can be applied uniformly to other integrations. To put it simply, we need a variety of tools to be able to talk to each other in an automated way. This is where I think we can improve time, cost and quality results and greatly improve the user experience. Ultimately, I expect Welocalize to gain an increase in productivity, interoperability and freedom of choice in configuring the best set of tools for each client’s unique translation supply chain needs. If we can get under the hood, we can tune the engine; otherwise, it is becoming increasingly difficult to gain time, cost and quality advantages from the old way of doing business.

Q: Who is going to use this software? And what software will it replace?

Many translators are already using TM2 in delivering work to IBM. I expect Open TM2, as its features grow, will appeal to more translators as a desktop workbench. This is only an initial release of the open source product, and there is much work to be done. But the potential is there to collaborate and improve. Ultimately, I think Open TM2 has the potential to replace the Trados desktop workbench.

Q: When you talk open source, stability and support are common pain points. Who will be actively supporting this effort?

The members of the Steering Committee are currently supporting the effort, and the goal is to build a community which can support itself. This open source initiative is not unlike others: what one puts into it will determine the benefits one can pull from it. I wouldn’t be surprised to see a company create a business model to offer Open TM2 support. Support, training and customization are typical services that bloom around open source initiatives.

Q: What would stop a technology company from taking the source code and creating a competitive TM product?

It is an open source product, so there is potential for companies to build a business model around the product. However, I doubt that will be a proprietary fork of the code. The appeal is an open source product with growing standards compliance, not yet another proprietary product. What is more likely are support, training and integration services. Anyone investing in the product naturally expects a return, and the better the return, the more healthy and diverse will be the community. I think that is a good thing. Competition drives innovation. However, if we can’t get the standard data-exchange protocols right, productivity across the supply chain will continue to lag the increasing velocity of change in the marketplace. Rapidly evolving time, cost and quality demands already exceed what the traditional translation supply chain can deliver.

Q: The source code is available now, but documentation is lacking. What is your timetable for launching a more translator- and agency-friendly product?

I think the first step for the Steering Committee is to take the feedback that is already coming in about the product, good and bad, and use that to set priorities, responsibilities and a timeline. The idea is sound, but it must be tested in practical use and refined according to what the market really needs. Translators have the answers to many challenges in our supply chain; they are just not asked very often.

Q: How will this software be integrated? Is there a goal of integrating it with the open source GlobalSight CMS?

Content creation, translation, workflow and performance metrics reporting – there are many systems and tools for accomplishing each of these requirements. However, very few of them can pass necessary data in an automated way. A lot can be accomplished with web services and open APIs, but widespread integration possibilities can only be realized with a critical mass actively using an industry-supported data-exchange standard.

In order to demonstrate this possibility in a live use case scenario, Joomla, GlobalSight and Open TM2 will be integrated with the resultant standards published by LISA. I think additional standards organizations will also need to participate to gain wider understanding, agreement and adoption. If enough of the industry’s thought leaders and leading practitioners get behind this standard data-exchange and tools integration challenge, I think all boats will rise. Without it, the industry will never be able to approach the growing volume of content which current production and cost models can’t support.

Link: Open TM2