Translation memory goes open source: An interview with Smith Yewell of Welocalize

Translation memory helps companies re-use previously translated text, improving consistency and potentially saving money.
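To make the idea concrete, here is a toy sketch of a translation memory lookup. It is only an illustration and does not show how Trados or Open TM2 actually match segments. The `TranslationMemory` class is hypothetical. Python's standard `difflib.SequenceMatcher` similarity ratio stands in for the fuzzy-match scoring that commercial tools implement in their own ways, and the 0.75 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy translation memory (hypothetical): stores source/target segment pairs."""

    def __init__(self, threshold=0.75):
        self.segments = {}          # source segment -> stored translation
        self.threshold = threshold  # assumed minimum score for a fuzzy match

    def add(self, source, target):
        """Store a previously translated segment for later re-use."""
        self.segments[source] = target

    def lookup(self, source):
        """Return (translation, score) for the best match, or None if nothing scores high enough."""
        if source in self.segments:
            return self.segments[source], 1.0  # exact match
        best = None
        for stored_source, target in self.segments.items():
            # difflib's similarity ratio stands in for a real TM's fuzzy-match score
            score = SequenceMatcher(None, source, stored_source).ratio()
            if score >= self.threshold and (best is None or score > best[1]):
                best = (target, score)
        return best


tm = TranslationMemory()
tm.add("Click Save to continue.", "Cliquez sur Enregistrer pour continuer.")
print(tm.lookup("Click Save to continue."))  # exact match, score 1.0
print(tm.lookup("Click Save to continue!"))  # fuzzy match, score below 1.0
print(tm.lookup("Delete all files?"))        # no usable match: None
```

An exact repeat comes back with a score of 1.0. A near-identical sentence comes back as a fuzzy match that a translator can review and edit rather than translate from scratch, which is where the consistency and potential savings come from.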

But translation memory requires using translation memory software, which has for years largely meant using SDL Trados software.

When a company hires a translation agency and requires it to use translation memory, not only must that agency have Trados software, but so must its freelance translators, who are often located all around the world. This is a nice business model for SDL, but it has been a pain point for translators and agencies for years.

For agencies, the more acute pain point has been that SDL not only sells TM software but also sells translation services. Nearly every translation exec I have spoken to has openly asked for an open-source alternative to Trados.

Well, now we have one.

IBM has partnered with LISA (Localization Industry Standards Association), Welocalize, Cisco, and Linux Solution Group e.V. (LiSoG) to launch an open source project that provides a “full-featured, enterprise-level translation workbench environment for professional translators.”

It’s called Open TM2 — and it’s basically a scaled-down version of what IBM has developed and used internally for years. I haven’t used the product yet and there’s understandably quite a bit of work involved to get this software to a point where it’s easy for translators, agencies, etc. to consume.

I’m not prepared to say Open TM2 is going to put an end to Trados. After all, Linux didn’t exactly put Windows or OS X out of business. But I am excited to see it out there in the world. Open source keeps software vendors on their toes. I’ll be very curious to see whether developers embrace the code, and what they come up with.

To learn more, I interviewed one of the partners behind Open TM2, Smith Yewell, CEO of Welocalize.

Here is what he had to say:

Q: Why did IBM decide to open source its software in this fashion? What does it hope to gain?

Bill Sullivan can answer this question better than I, but as he stated, “Freelance translators are the backbone of the localization industry. These translators have longed for free and open translation tools to increase their productivity. There is a recognized and growing need for standards in the localization industry. Despite our best intentions, however, standards themselves can often be vague and open to multiple interpretations. What is needed are reference implementations and reference platforms that serve as concrete and unambiguous models in support of the standard.”

In my opinion, productivity and standardization go hand-in-hand. By releasing Open TM2 as an open source product with a standards-based, data-exchange goal, not only does productivity stand to increase, but flexibility and freedom of choice increase as well.

Q: And what do you hope to gain from this effort?

I like to use the mobile phone analogy. I can travel just about anywhere in the world, turn my phone on, and it works. This is possible because competing carriers and hardware manufacturers collaborated to offer that seamless user experience across global networks and handset protocols. Consider the user experience in our industry. There is really no ability for a client to turn on a translation supply chain and have it work out of the box across various content types, tools and translation vendors. The clients I speak with are demanding that this change.

GlobalSight, Joomla and Open TM2 are being used to demonstrate an example of a seamless data exchange based upon a set of standards. LISA will play an important role in documenting and sharing these standards so that they can be applied uniformly to other integrations. To put it simply, we need a variety of tools to be able to talk to each other in an automated way. This is where I think we can improve time, cost and quality results and greatly improve the user experience. Ultimately, I expect Welocalize to gain an increase in productivity, interoperability and freedom of choice in configuring the best set of tools for each client’s unique translation supply chain needs. If we can get under the hood, we can tune the engine; otherwise, it is becoming increasingly difficult to gain time, cost and quality advantages from the old way of doing business.

Q: Who is going to use this software? And what software will it replace?

Many translators are already using TM2 in delivering work to IBM. I expect Open TM2, as its features grow, will appeal to more translators as a desktop workbench. This is only an initial release of the open source product, and there is much work to be done. But the potential is there to collaborate and improve. Ultimately, I think Open TM2 has the potential to replace the Trados desktop workbench.

Q: When you talk open source, stability and support are common pain points. Who will be actively supporting this effort?

The members of the Steering Committee are currently supporting the effort, and the goal is to build a community which can support itself. This open source initiative is not unlike others: what one puts into it will determine the benefits one can pull from it. I wouldn’t be surprised to see a company create a business model to offer Open TM2 support. Support, training and customization are typical services that bloom around open source initiatives.

Q: What would stop a technology company from taking the source code and creating a competitive TM product?

It is an open source product, so there is potential for companies to build a business model around the product. However, I doubt that will be a proprietary fork of the code. The appeal is an open source product with growing standards compliance, not yet another proprietary product. What is more likely are support, training and integration services. Anyone investing in the product naturally expects a return, and the better the return, the healthier and more diverse the community will be. I think that is a good thing. Competition drives innovation. However, if we can’t get the standard data-exchange protocols right, productivity across the supply chain will continue to lag the increasing velocity of change in the marketplace. Rapidly evolving time, cost and quality demands already exceed what the traditional translation supply chain can deliver.

Q: The source code is available now, but documentation is lacking. What is your timetable for launching a more translator- and agency-friendly product?

I think the first step for the Steering Committee is to take the feedback that is already coming in about the product, good and bad, and use that to set priorities, responsibilities and a timeline. The idea is sound, but it must be tested in practical use and refined according to what the market really needs. Translators have the answers to many challenges in our supply chain; they are just not asked very often.

Q: How will this software be integrated? Is there a goal of integrating it with the open source GlobalSight translation management system?

Content creation, translation, workflow and performance metrics reporting – there are many systems and tools for accomplishing each of these requirements. However, very few of them can pass necessary data in an automated way. A lot can be accomplished with web services and open APIs, but widespread integration possibilities can only be realized with a critical mass actively using an industry-supported data-exchange standard.

In order to demonstrate this possibility in a live use case scenario, Joomla, GlobalSight and Open TM2 will be integrated with the resultant standards published by LISA. I think additional standards organizations will also need to participate to gain wider understanding, agreement and adoption. If enough of the industry’s thought leaders and leading practitioners get behind this standard data-exchange and tools integration challenge, I think all boats will rise. Without it, the industry will never be able to approach the growing volume of content which current production and cost models can’t support.

Link: Open TM2

TAUS drops membership fees, finally

The Translation Automation User Society has always been an organization that I’ve admired more in theory than in practice.

That is, I admire the organization’s goal of broadly sharing translation memories (TM).

But I’ve been less than enthusiastic about how this organization operates.

TAUS always felt a bit like a country club, one that only a few large players could afford to join and whose inner workings were kept top secret. TAUS caught some flak a while back for trying to prevent its attendees from tweeting its conference sessions. It’s this culture of secrecy that has always bothered me. For translation memory sharing to go mainstream, you need to raise awareness significantly. You need lots of evangelists embedded in companies large and small.

So I’m happy to see TAUS lowering admission fees for its Data Association. TAUS writes:

The annual associate level fee has come down from €625 to €250. Professional membership has been reduced from €75 to €50 and allows individuals to download 10 times the amount of data that is uploaded.

Many executives that I speak with still do not see the value of sharing previously translated text strings (if this is on their radar to begin with). And if you don’t see the value in sharing TM, you sure as heck aren’t going to throw money at it.

More important, you’re not going to throw money at membership fees for something you don’t understand.

By lowering the fees, not only do you open your organization to smaller players, you also lower the barrier for larger organizations that may not yet see the value of participating.

It will be interesting to see how membership evolves based on this change. The last I checked, current membership stands at roughly 70 corporate members.

Interview: Lionbridge and IBM seek to expand “real time” translation

As readers of this blog well know, I’ve been bullish on machine translation for quite some time. Way back in 2004, I wrote:

Note to translators: I’m not implying you’ll be out of business anytime soon. But I am saying that machine translation (MT) is going to find its niche and this niche will grow exponentially. There are simply not enough translators in the world to handle the content necessary in this increasingly global economy.

Enter Lionbridge and IBM.

In March, the two companies inked a multi-year partnership in which Lionbridge would have exclusive rights to market IBM’s Real Time Translation Service (RTTS) technology.

I recently asked Lionbridge COO Satish Maripuri about the deal.

Here is the interview:

Q: It appears that Lionbridge is trying to replace the legacy term “machine translation” with Real-Time Translation. Why do you think this new term is better?

Real-Time Translation is a more accurate term for the solution. We see machine translation as a technology that enables the solution. Real-time instantaneous translation is the solution. Also, within the localization industry, machine translation typically refers to using productivity tools to offset the cost and time associated with translation, and it usually ends with a heavy post edit (PE) to get the content to publication quality. That is different from instantaneous real-time translation, which delivers “good-enough” translation without post edit if a customer so desires.

Q: When you talk to companies about machine translation, what types of content are they most excited about leveraging through your MT engine?

The customer excitement is remarkable. This one announcement created more inbound interest than any announcement in our history. Organizations are most excited about translating the enterprise content that they aren’t translating today due to the cost and time associated with the traditional localization process. Examples include user-generated content, research reports, eSupport, social media, knowledge bases, website content and real-time instantaneous chat/email communication.

Q: We still live in a “cost per word” translation ecosystem. Do you see real-time translation as the beginning of the end of the per-word pricing model?

Details around the new pricing model will be forthcoming, but it will follow a SaaS model subscription fee and/or seat license for certain applications. This will be different than the traditional per-word pricing model. Time will tell whether this will lead to a change in the way organizations view all translation pricing. But for real-time translation technology, SaaS-based subscription pricing is clearly the right model.

Q: In my view, Google has done more than any language provider to demonstrate that machine translation has a valuable role to play in global communications. Is there any concern at Lionbridge that Google might open up its MT engine to companies via its Apps platform?

Google’s automated translation tool is a highly visible application. And you are right in that it creates awareness of the opportunity for automated translation. There are applications for Google’s technology, specifically in its ability to enhance search. Our focus is on enterprise content, which is a different application for automated translation in that it requires higher levels of quality and utility within the enterprise.

For the last five years, Lionbridge has been using and continuously developing our translation management platform, Translation Workspace. This technology combined with IBM’s real-time translation technology will allow us to customize the engine to our customer-specific domains to provide levels of quality that far surpass any freeware translation technology. This customization combined with cloud availability through Translation Workspace differentiates our tool from freeware tools and creates a highly valuable application for the enterprise.

Q: TAUS has been critical of the Lionbridge/IBM alliance as an attempt to “lock in” users via ownership of the translation resources. TAUS has called on Lionbridge to open up your data. What is Lionbridge’s response?

Customers who use Lionbridge’s real-time translation technology are not locked in to Lionbridge for any service, whether post edit or other traditional managed translation services. We are only providing our customers with a technology application to support real-time multilingual communication. As such, our customers would simply license the technology to support real-time translation. If they choose to post edit the result, they can use any service provider they choose.

Q: What do you estimate will be the ratio of human translated content to machine translated content in a typical company — say from today to five years from now?

As machine translation improves over time, we believe it will be used more frequently, especially on dynamic user-generated content. We also believe that over the next ten years we are going to see a shift from “Just in Case Translation” (translating just in case someone happens to read the content) to “Just in Time Translation” (translating after someone shows interest).

In addition, we believe that over the next five to ten years, there will be more acceptance in the market for “good-enough” translation. Therefore, it would not be unreasonable to see a larger percentage of enterprise content translated using machine translation or Real-Time Translation technology.


Google, Bing and Babelfish: What’s the best translation engine?

Two months ago I wrote about an effort to evaluate the quality of the three major free machine translation (MT) engines:

  • Google Translate
  • Bing (Microsoft) Translator
  • Yahoo! Babelfish

Ethan Shen has wrapped up the project, soliciting input from more than 1,000 reviewers. He summed up his findings here.

Here are the findings that jumped out at me:

  • Google wins, hands down, translating longer text passages. No big surprise here.
  • Bing and Babelfish are competitive translating shorter texts (150 or fewer characters). Bing did quite well with Italian and German, while Babelfish did well with Chinese.
  • Google’s brand trumps all. About halfway through his test, Ethan removed the brand names from the translation engines, so the reviewers did not know which engine was doing which translation. The change in results was significant. Reviewers were 21% more likely to say Google was better than Microsoft when they knew the brand names. And reviewers were 136% more likely to say Google was better than Babelfish.

This last finding is what poses the greatest hurdle for Microsoft and Yahoo!

When it comes to machine translation — perception is (almost) everything. If people think you’re the best translation engine, then you are the best.

Integration is the other key element of success, and Google Translate is doing well here also — I absolutely love the Chrome browser integration.

Ethan is not done with his research. This is only stage one. To help him with stage two, click here.