SignAll is slowly but surely building a sign language translation platform

Translating is difficult work, the more so the further two languages are from one another. French to Spanish? Not a problem. Ancient Greek to Esperanto? Considerably harder. But sign language is a unique case, and translating it uniquely difficult, because it is fundamentally different from spoken and written languages. All the same, SignAll has been working hard for years to make accurate, real-time machine translation of ASL a reality.

One would think that with all the advances in AI and computer vision happening right now, a problem as interesting and beneficial to solve as this would be under siege by the best of the best. Even thinking about it from a cynical market-expansion point of view, an Echo or TV that understands sign language could attract millions of new (and very thankful) customers.

Unfortunately, that doesn’t seem to be the case — which leaves it to small companies like Budapest-based SignAll to do the hard work that benefits this underserved group. And it turns out that translating sign language in real time is even more complicated than it sounds.

CEO Zsolt Robotka and chief R&D officer Marton Kajtar were exhibiting this year at CES, where I talked with them about the company, the challenges they were taking on, and how they expect the field to evolve. (I’m glad to see the company was also at Disrupt SF in 2016, though I missed them then.)

Perhaps the most interesting thing to me about the whole business is just how complex the problem they are attempting to solve turns out to be.

“It’s multi-channel communication; it’s really not just about shapes or hand movements,” explained Robotka. “If you really want to translate sign language, you need to track the entire upper body and facial expressions — that makes the computer vision part very challenging.”

Right off the bat that’s a difficult ask, since it’s a huge volume in which to track subtle movement. The setup right now uses a Kinect 2 more or less at center and three RGB cameras positioned a foot or two out. The system must reconfigure itself for each new user, since just as everyone speaks a bit differently, all ASL users sign differently.

“We need this complex configuration because then we can work around the lack of resolution, both time and spatial (i.e. refresh rate and number of pixels), by having different points of view,” said Kajtar. “You can have quite complex finger configurations, and the traditional methods of skeletonizing the hand don’t work because they occlude each other. So we’re using the side cameras to resolve occlusion.”
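
To make that idea concrete, here is a minimal sketch of the kind of multi-view fusion Kajtar describes: for each hand joint, keep whichever camera’s estimate is most confident, so a finger occluded in the center view can be recovered from a side view. The camera names, the keypoint format and the fusion rule are illustrative assumptions, not SignAll’s actual pipeline.

from typing import Dict, List, Tuple

# A keypoint is (x, y, z, confidence) for one hand joint as seen by one camera.
Keypoint = Tuple[float, float, float, float]

def fuse_views(views: Dict[str, List[Keypoint]]) -> List[Keypoint]:
    """For each joint, keep the most confident estimate across all cameras."""
    n_joints = len(next(iter(views.values())))
    fused = []
    for j in range(n_joints):
        candidates = [keypoints[j] for keypoints in views.values()]
        fused.append(max(candidates, key=lambda kp: kp[3]))
    return fused

# One frame of a two-joint hand seen by the depth camera and two side cameras.
frame = {
    "kinect_center": [(0.10, 0.42, 0.95, 0.90), (0.12, 0.40, 0.94, 0.20)],
    "rgb_left":      [(0.11, 0.41, 0.96, 0.70), (0.13, 0.39, 0.95, 0.85)],
    "rgb_right":     [(0.09, 0.43, 0.94, 0.60), (0.12, 0.41, 0.93, 0.40)],
}
print(fuse_views(frame))  # joint 0 comes from the Kinect, joint 1 from the left camera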

As if that wasn’t enough, facial expressions and slight variations in gestures also inform what is being said, for example adding emotion or indicating a direction. And then there’s the fact that sign language is fundamentally different from English or any other common spoken language. This isn’t transcription — it’s full-on translation.

“The nature of the language is continuous signing. That makes it hard to tell when one sign ends and another begins,” Robotka said. “But it’s also a very different language; you can’t translate word by word, recognizing them from a vocabulary.”

SignAll’s system works with complete sentences, not just individual words presented sequentially. A system that just takes down and translates one sign after another (limited versions of which exist) would be liable to create misinterpretations or overly simplistic representations of what was said. While that might be fine for simple things like asking directions, real, meaningful communication has layers of complexity that must be detected and accurately reproduced.
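
A toy example makes the difference clear. The gloss sequence, the word-for-word dictionary and the sentence mapping below are made-up illustrations, not SignAll’s data or method; they only show why translating one sign at a time loses the grammar a sentence-level system has to recover.

# ASL-style topic-comment ordering of recognized signs ("glosses").
glosses = ["YESTERDAY", "STORE", "I", "GO"]

# Naive approach: look up each sign independently and concatenate.
word_for_word = {"YESTERDAY": "yesterday", "STORE": "store", "I": "I", "GO": "go"}
naive = " ".join(word_for_word[g] for g in glosses)
print(naive)  # "yesterday store I go" -- the structure of the sentence is lost

# Sentence-level approach: map the whole gloss sequence to an English sentence,
# which is what a real system learns from parallel data rather than a lookup table.
sentence_level = {
    ("YESTERDAY", "STORE", "I", "GO"): "I went to the store yesterday.",
}
print(sentence_level[tuple(glosses)])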

Somewhere in between those two options is what SignAll is targeting for its first public pilot of the system, at Gallaudet University. This Washington, D.C. school for the deaf is renovating its welcome center, and SignAll will be installing a translation booth there so that hearing visitors can interact with the deaf staff.

Bilingual? Tarjimly lets you help a refugee or aid worker right now


All over the world, language barriers are limiting the ability of refugees and immigrants to seek help, and of aid workers to provide it. Tarjimly is a new service that connects someone who speaks one language but needs to communicate in another with a volunteer who speaks both — in just a couple of minutes. They’re part of Y Combinator’s latest batch and are gearing up for a proper launch.

The company’s co-founders, Aziz Abdulaziz and Atif Javed, told me how the company emerged from a side project built while they worked at Palantir and Oracle, respectively. It was a year ago, when the tide of refugees streaming out of the Middle East was growing.

“We wanted to do something to help refugees at scale, and decided to use our engineering experience,” said Javed. “We actually announced the first early version of the product during the first Muslim ban a year ago — we got a great response because people were happy to have another way to help other than give money or send emails.”

“We signed up like 1,500 people in two days,” added Abdulaziz. “We decided to build a tech nonprofit to solve this problem, and quit in January.”

The basic problem is simply that there aren’t enough translators to go around, and the work they do can’t be delayed by the days or weeks it might take to find one; sometimes, as in cases where there’s imminent danger or critical logistical issues, it can’t be delayed by even an hour.

“Interpreters are a scarce resource and extremely expensive,” Abdulaziz said. Even for more commonly spoken languages like Spanish and German, they can run $80 per hour. “But then say you’ve got a family from Iraq right in front of you, and they speak Kurdish. Your pool of resources is extremely limited. And then there’s even Kurmanji Kurdish, and Sorani Kurdish, all these dialects.”

You can’t stock every aid site or headquarters with dozens of interpreters, some of whom may only work a few hours a week. And relying on the local community (which some aid workers do) isn’t a good option either, since the populations are by definition the ones who need help, and anyway may not be around for long. That’s where Tarjimly comes in.

Quick turnaround translations

Right now Tarjimly is only on Facebook Messenger, but an independent, multi-platform app is on the way that will allow cross-platform chats, between Messenger or WhatsApp and SMS for instance. Using the chat interface, an aid provider or refugee indicates their own language and the language of the person with whom they need to speak.

Tarjimly scours its database of volunteers and, using a bit of machine learning (naturally), finds the users most likely to respond quickly. When it finds one, it connects the two through the chat interface; to make things easy and anonymous, the messages are relayed through Tarjimly’s servers, which both obscure the users’ IDs and allow cross-platform chats.
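
As a rough illustration of that matching step, here is a minimal sketch: filter the volunteer pool to people who speak both languages, then rank them by how likely they are to answer right now. The Volunteer fields and the scoring heuristic are assumptions standing in for Tarjimly’s actual model.

from dataclasses import dataclass
from typing import List, Set

@dataclass
class Volunteer:
    name: str
    languages: Set[str]
    response_rate: float   # fraction of recent pings this volunteer answered
    online: bool           # whether they are reachable right now

def rank_volunteers(pool: List[Volunteer], lang_a: str, lang_b: str) -> List[Volunteer]:
    """Return bilingual volunteers, most likely responders first."""
    eligible = [v for v in pool if {lang_a, lang_b} <= v.languages]
    # Stand-in for the "bit of machine learning": ping online volunteers with a
    # strong response history before anyone else.
    return sorted(eligible, key=lambda v: (v.online, v.response_rate), reverse=True)

pool = [
    Volunteer("A", {"English", "Arabic"}, 0.9, online=False),
    Volunteer("B", {"English", "Arabic", "Kurdish"}, 0.6, online=True),
    Volunteer("C", {"English", "Spanish"}, 0.8, online=True),
]
print([v.name for v in rank_volunteers(pool, "English", "Arabic")])  # ['B', 'A']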

On the right, the requesting user’s screen (in Arabic), and on the left, the volunteer’s screen.

Once connected, the user can enter text or send voice messages; the volunteer just translates them and sends them back for the user to share with their interlocutor however they please. Audio and video chat can be requested, and documents and images can also be sent to the translators in case a quick consultation is necessary before signing something or waiting in a line.

The idea isn’t to guide people through major processes like immigration — dedicated interpreters are still needed for long interviews, technical language and so on — but to handle time-sensitive matters like distribution of food and water or explaining an event or injury.

“We’re focused on the real-time piece,” said Abdulaziz. “We want to bridge that gap between the refugee and the service provider.”

“Refugees are constantly interacting with aid workers,” said Javed. “These people need this all the time — like literally every single aid worker needs this every day on the ground.”

Right now the service finds a match in an average of 90 seconds, and these acts of “micro-volunteering” usually last only a few minutes. Sixteen languages are currently supported (plus dialect variations), with a focus on those spoken by major refugee populations: English, Arabic, Persian, Pashto, Urdu, Spanish, French, Greek, Italian, Bengali, Turkish, Somali, German, Portuguese, Kurdish, and Burmese.

The over 2,500 translators on the service have already helped over 1,000 refugees ahead of the company’s formal launch.

(Wondering whether machine translation has a role here? The truth is it’s just not good enough in many cases. The co-founders worked and studied in this space during their time at MIT and their previous jobs, and are confident that it’s not ready for an application like this, either in its capabilities or its state of deployment. “Language understanding is still very very early stage,” Abdulaziz said.)

As for the possibility of bad translations, perhaps even intentional ones, Tarjimly does let users rate their experience, but Javed noted in a follow-up email that “We’re cultivating a strong community on FB where translators share feedback, ideas, and call out bad actors. One easy solution we have in mind is to use translators to QA each other.” But it hasn’t been a problem so far, he added.

Free where it matters

Tarjimly’s position as a nonprofit is a deliberate one; the company aims to fund the service through grants and donations in order to keep it free for refugees — a population that, while sizable and motivated, isn’t exactly ethical to monetize directly.

“We’re going to start off with a grant based model, but we want to create something that’s sustainable,” said Javed. “If we end up making a product that NGOs and governments are using, I have no doubt that if we go to them and say, ‘look, we want to keep this going,’ they’ll help.”

They consider the service’s ability to scale quickly at low cost a major asset.

“When you ask for grant money, the first thing they ask about is your efficiency — cost versus impact. And we kill at that,” explained Abdulaziz. “We provide an impact, it’s extremely low cost, and scales exponentially in good created. Not everyone can do that much through technology.”

“We want to approach this the same way a Silicon Valley company would approach building a product — very user first,” added Javed. “Of course we want to form partnerships and so on, but we want to get this in the hands of millions of refugees tomorrow.”

Tarjimly co-founders Aziz Abdulaziz (left) and Atif Javed (right).

Taking part in Y Combinator should help there; both founders were enthusiastic about the resources and feedback they’d already received from the accelerator.

The next step, apart from getting the service out there to attract more users and volunteers, is to continue working with aid organizations and people on the ground. The team has already spent a good deal of time on this side of things but will soon depart for a two-week trip to Greece to chat with and observe refugees and aid workers there.

Millions of people could use something like this, so let’s hope it catches on. If you speak multiple languages, consider signing up as a volunteer; a few minutes of your time could make a serious difference to someone in need of immediate help.

Flitto’s language data helps machine translation systems get more accurate

Simon Lee, founder and CEO of Flitto

Artificial intelligence-powered translation is becoming an increasingly crowded category, with Google, Microsoft, Amazon and Facebook all working on their own services. But the tech still isn’t a match for professional human translations, and machine-generated results are often hit-and-miss. One online translation service, Flitto, is now focused on providing other companies with the language data they need to train their machine translation programs.

Headquartered in Seoul, Flitto launched in 2012 as a translation crowdsourcing platform. It still provides translation services, ranging from a mobile app to professional translators, for about 7.5 million users. About 80% of its revenue, however, now comes from the sale of language data, called “corpus,” to customers such as Baidu, Microsoft, Tencent, NTT DoCoMo and the South Korean government’s Electronics and Telecommunications Research Institute.

When Flitto launched five years ago, its main competition was Google Translate, says founder and chief executive officer Simon Lee. Google Translate delivered mixed results, but professional translation services were inaccessible for most people. Flitto, whose backers include Japanese game developer Colopl, was created to combine the two. It works with 1.2 million human translators who are paid if their translation is picked by the requestor.

Then in 2016, Google introduced its neural machine translation system, which improved the accuracy of Google Translate. Now many big tech companies, including Microsoft, Amazon, Facebook and Apple, are focused on developing their own artificial intelligence translation tools.

Even though results are getting better, they are still imperfect. AI-based translation systems need a ton of data to train, which is where Flitto comes in.

“There are different ways to translate something that gives different meanings in different situations, so there needs to be a huge set of data and a human checking all of that data to see if it is right or wrong,” says Lee.

He adds, “It’s difficult to build up a corpus and IT companies don’t like building corpus because they focus on technology.”

Flitto’s app provides a machine translation first, then crowdsourced translations if requested.

Flitto’s corpus includes sets of human-translated sentences from its crowdsourcing service, which are useful for handling things like slang, pop culture references or dialects that might stymie a machine translation service. Over the last five years, Lee says, Flitto has accumulated more than 100 million sets of translated language data.
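
As a rough picture of what such data looks like, here is a minimal sketch of a parallel corpus: each entry pairs a source sentence with its human-approved translation, plus whatever context a buyer might filter on. The field names and example sentences are illustrative assumptions, not Flitto’s actual schema.

import json

# Two toy entries in a JSON-lines parallel corpus (hypothetical schema).
corpus_jsonl = """\
{"src_lang": "ko", "tgt_lang": "en", "src": "안녕하세요", "tgt": "Hello", "domain": "greeting"}
{"src_lang": "ko", "tgt_lang": "en", "src": "이거 얼마예요?", "tgt": "How much is this?", "domain": "shopping"}
"""

pairs = [json.loads(line) for line in corpus_jsonl.splitlines() if line.strip()]
# A machine translation model is trained or fine-tuned on (src, tgt) pairs like
# these; the human-checked target side is what makes the data worth buying.
print(len(pairs), "sentence pairs loaded")
print(pairs[0]["src"], "->", pairs[0]["tgt"])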

Corpus providers include the Oxford University Press, which gives researchers access to the Oxford English Corpus, and companies like Microsoft and Google that have built corpora to train their own systems. But there is still constant demand for new corpora, because they take a lot of resources to create. While programs like DeepMind’s AlphaGo were able to train themselves with almost no human help, machine translation still needs a human touch.

“In other fields, machines can create their own data, but in language and translation it’s impossible for machines to create translation data by themselves,” says Lee. “So there always have to be human translators who go through all that data.”