The next Internet revolution will not be in English

This visual depicts about half of the currently approved internationalized domain names (IDNs), positioned over their respective regions.

Notice the wide range of scripts over India and the wide range of Arabic domains. I left off the Latin country code equivalents (in, cn, th, sa, etc.) to illustrate what the Internet is going to look like (at a very high level) in the years ahead.

This next revolution is a linguistically local revolution. In terms of local content, it is already happening. Right now, more than half of the content on the Internet is not in English. Ten years from now, the percentage of English content could easily drop below 25%.

But a few technical obstacles have so far made the Internet less user-friendly than it should be for people in the regions highlighted above. They’ve been forced to enter Latin-based URLs to get where they want to go. Their email addresses are also Latin-based. This will all change over the next two decades.

For those of us who are fluent only in Latin-based languages, this next wave of growth is going to be interesting, if not a bit challenging. In a Latin-based URL environment, you can still easily navigate to and around non-Latin web sites and brands. For example, if I want to find Baidu in China, I can enter www.baidu.cn. For Yandex in Russia, it’s yandex.ru.

But flash forward a few years and these Latin URLs (though they’ll still exist) may no longer function as the front doors into these markets.

Try Яндекс.рф. It currently redirects to Yandex.ru.

In a few years, I doubt this redirection will exist.
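
For readers curious about what happens behind a name like Яндекс.рф: the DNS itself still carries only ASCII, so software converts each non-Latin label into an "xn--" (Punycode) form before looking it up. Here is a minimal sketch in Python using the built-in "idna" codec, which follows the older IDNA 2003 rules. The helper names to_ascii and to_unicode are my own, chosen for illustration.

```python
def to_ascii(domain: str) -> str:
    """Return the ASCII-compatible ("xn--") form of an internationalized domain."""
    # The built-in "idna" codec lowercases and Punycode-encodes each label.
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Return the Unicode form of an ASCII-compatible domain."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    for name in ("Яндекс.рф", "yandex.ru"):
        ascii_form = to_ascii(name)
        # Plain Latin names pass through unchanged; Cyrillic labels become "xn--..." labels.
        print(name, "->", ascii_form, "->", to_unicode(ascii_form))
```

Any software that handles only the Latin form has to perform this conversion (or delegate it to a library) before it can resolve the name.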

We’re getting close to a linguistically local Internet — from URL to email address. There are still significant technical obstacles to overcome. It will be exciting to see which companies take the lead in overcoming them — as these companies will be well positioned to be leaders in these emerging markets.

UPDATE: I’ve expanded on this topic in a recent article on IP Watch.

Gruber gives up on his ✪ IDN

Tech pundit John Gruber threw in the towel on his domain ✪df.ws.

He writes:

What I didn’t foresee was the tremendous amount of software out there that does not properly parse non-ASCII characters in URLs, particularly IDN domain names. Twitter clients (including, seemingly, every app written using Adobe AIR, which includes some very popular Twitter clients), web browsers (including Firefox), and, for a few months, even the Twitter.com website wasn’t properly identifying DF’s short URLs as links.

Worse, some (but, oddly, not all) of AT&T’s DNS servers for 3G wireless clients choke on IDN domains. This meant that even if you were using a Twitter client that properly supports IDN domains, these links still wouldn’t work if your 3G connection was routing through one of AT&T’s buggy DNS servers.

There is still a lot of heavy lifting left to do among many software and hardware vendors before IDNs can go mainstream. Unless, of course, a country — say Russia or China — mandates their support and pushes the vendors along.
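
The link-detection failure Gruber describes is easy to reproduce. The sketch below is not taken from any real Twitter client; it simply contrasts a hypothetical ASCII-only URL pattern with one that accepts any non-space characters in the host. The pattern names, the find_links helper, and the sample short URL are all made up for illustration.

```python
import re

# Hypothetical "legacy" pattern: assumes host names contain only ASCII letters, digits, dots, hyphens.
ASCII_ONLY = re.compile(r"https?://[A-Za-z0-9.-]+(?:/\S*)?")

# More tolerant pattern: the host is any run of characters up to whitespace, "/", "?" or "#".
HOST_AWARE = re.compile(r"https?://[^\s/?#]+(?:[/?#]\S*)?")

# Sample text; the short URL path is invented for this example.
SAMPLE = "New post: http://✪df.ws/example and http://daringfireball.net/"


def find_links(text: str, pattern: re.Pattern) -> list:
    """Return every substring of text that the given pattern treats as a link."""
    return pattern.findall(text)


if __name__ == "__main__":
    print(find_links(SAMPLE, ASCII_ONLY))  # misses the ✪df.ws link entirely
    print(find_links(SAMPLE, HOST_AWARE))  # finds both links
```

A real linkifier would also need to trim trailing punctuation and validate the host, but the core problem is the same: a character class written with only Latin domains in mind.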

PS: I’ve updated my top-level IDN tracker.


Chinese IDNs have arrived

ICANN gave approval to Chinese IDNs — for China, Taiwan, and Hong Kong.

This is a significant development — particularly since China was one of the major forces pushing ICANN to support IDNs.

To give you an idea of how these new IDNs are poised to change the Internet as we know it, I’ve overlaid the approved IDNs onto my Country Codes of the World map.

You’ll notice both simplified and traditional script IDNs for both China and Taiwan.

Here’s my running list of all IDNs that have passed the string evaluation stage.

Adobe launches translation crowdsourcing in China

Facebook has demonstrated that you can crowdsource translations with high quality and rapid turnaround, leading many other companies to ask how they too can leverage the crowd to translate their content.

Enter Adobe and Lingotek.

Adobe has recently begun leveraging Lingotek’s software platform to enable the crowdsourcing of translations within China. As of now, there are 40 volunteer translators in China translating documentation.

Keeping in mind that this is a new and ongoing effort, I recently conducted a Q&A with Lingotek’s CEO Rob Vandenberg.

Here is the interview:

Q: What incentives did Adobe use to get Chinese users interested in translating content?

Adobe takes a very user-centric approach to volunteer translation. Instead of asking users to translate certain material, Adobe provides the content and tools for users to translate what they are interested in. They went to their user groups, and offered community translation as an opportunity. This allowed them to find people who were already interested in translating – whether because they are a reseller of the software, they want to put Adobe’s name on their résumé, or they are end-users who just want Adobe content in their language.

Q: Does the Lingotek platform stand alone or is it integrated into existing Adobe translation systems?

We have worked with Adobe to provide a number of integration points, including:

  1. Providing an API to allow community members to upload documents from an Adobe Flex application.
  2. Providing a version of our leaderboard that could be placed on the Adobe Groups site, as well as an API to get leaderboard data.
  3. Providing a version of our signup page that could be placed on the Adobe Groups site.

Q: How is quality managed with regard to the volunteers? Even Facebook relies on a vendor to ensure quality.

The primary means of producing quality translations in the Adobe communities is to limit who is allowed to participate. Adobe selects project managers who they can trust, and these people are in charge of determining which translators should be allowed to participate.

Q: Are the project managers Adobe employees in China? And are they effectively the gatekeepers for quality?

As I understand it, there is a Community Manager who is the interface between Adobe and the community, but the project management is all done by community members. The translated content is then given to the community, and they publish it.

In addition, the Lingotek platform provides a number of tools that not only help translators work faster but also improve the quality of the translations, including:

  • Shared Translation Memories
  • Translation Voting
  • Notes on each segment
  • Terminology tools

Q: How does Adobe get rapid turnaround using volunteers? Are deadlines used?

The speed of translation is affected most by letting volunteers translate the things that they want to translate. In addition, Adobe brings attention to the project managers and translators who have done the most work.

Q: How does Adobe deal with customers who assume that they should not be required to translate content themselves?

Adobe focuses on the users who are eager to help them to translate. They don’t try to recruit general end-users, and I think that is why they have avoided most of this criticism.

Q: Why is Adobe doing this exactly?

The main driving factor is that Adobe’s community users are asking for translated content that isn’t in Adobe’s professional translation pipeline. By using Lingotek’s APIs and translation software and Adobe’s existing community to translate content, we’re making new content available to Adobe users more quickly and at a much lower price.

Q: How does Adobe license the Lingotek platform?

Lingotek is licensed on a concurrent user basis. We don’t share pricing information.

Q: Is this limited to only volunteers? That is, will the same platform be used not only for documentation but for product/software loc work?

The Lingotek platform is designed to support many different workflows. Some clients are using their communities to provide the initial translation, and then use internal reviewers to do the final review before publishing. Other clients use a traditional assigned workflow, without using community members. In Adobe’s case, so far they are only using their community members.

For more information, here’s the Lingotek press release.