Hope to start testing Slate Desktop on Mac soon.
Yesterday I received this information from Tom:
In a couple of months, I'll be rolling out some new ready-made engines for language pairs where public training corpora are available. Then, translators without TMs can use Slate Connect ($49) with the ready-made engines for less than $30 per engine. This configuration also makes it easier to support a native MacOS Slate Connect, but building engines will still be a Linux or Windows task with Slate Desktop or Pro.
One month ago, Tom wrote:
>Hi everyone. I'm trying to be a bit more proactive about keeping you up to date. Jeroen continues working on the OS X build. The C++ details are more than I personally understand, but I know they're real because he's frustrated, stressed and frazzled. Short of a miracle, I'm not expecting a September availability. The best I can offer for now is more frequent updates and my reassurance that this is Jeroen's only work task. We're not giving up. -- Tom
Thanks for the update! Any news?
Compare to SMT (Slate/Moses):
I believe there's a 100% chance these stats will get better. Hardware always gets faster, and algorithms are optimized. Ten years ago, SMT required 1-3 weeks to train a model. Hardware got faster, and academic researchers refined the algorithms. The same will happen over time with NMT: it will get faster and more robust, and maybe its quality advantage will grow beyond the reported 26% over SMT. We'll monitor it and bring it to market when it's ready.
SD supports forced terminology translation (anyone know a better term for this?). You create tab-delimited UTF-8 files with the source term in the left column and the target term in the right. You need to tokenize the terms manually. It's not the best solution, but it will get better. The source and target terms should be naturally cased, so if you want to translate to "LaserJet", that's exactly what goes in the target (right) column.
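For illustration, here's a minimal Python sketch of producing such a file. The naive_tokenize() helper, the file name, and the example term pairs are my own assumptions, not part of SD:

```python
# Sketch only: build a tab-delimited UTF-8 terminology file as described
# above (source term left, target term right, naturally cased, tokenized).
# naive_tokenize() and the file name are illustrations, not part of SD.

def naive_tokenize(text: str) -> str:
    """Crudely separate punctuation with spaces; SD's real tokenizer differs."""
    for ch in ",.;:!?()\"":
        text = text.replace(ch, f" {ch} ")
    return " ".join(text.split())

terms = [
    ("laser printer", "LaserJet"),           # naturally cased target, per the post
    ("the front side", "die Vorderseite"),   # hypothetical example pair
]

with open("my_terms.txt", "w", encoding="utf-8") as f:
    for src, tgt in terms:
        f.write(f"{naive_tokenize(src)}\t{naive_tokenize(tgt)}\n")
```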
There are some discussions on our support forum, hosted on Freshdesk.com. We're not as organized as CafeTran. You have to log in to read the forums, but anyone can create an account. I suggest you read there:
In addition to the terminology files, the traditional SMT community recommends you add your terms to the parallel training data like any other TM. We recently learned how to improve on this recommendation when you have huge terminology glossaries. I'm not sure when I can implement it, but the traditional approach still works.
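As an illustration of that traditional approach, here's a hedged sketch that appends glossary pairs to Moses-style parallel training files. The file names (glossary.txt, train.en, train.de) are assumptions for an English-German pair, not SD's actual layout:

```python
# Sketch: append glossary pairs to Moses-style parallel training files,
# per the "treat terms like any other TM" recommendation. The file names
# (glossary.txt, train.en, train.de) are assumptions for illustration.

with open("glossary.txt", encoding="utf-8") as gloss, \
     open("train.en", "a", encoding="utf-8") as src_out, \
     open("train.de", "a", encoding="utf-8") as tgt_out:
    for line in gloss:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed rows
        src, tgt = line.split("\t", 1)
        src_out.write(src + "\n")
        tgt_out.write(tgt + "\n")
```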
We haven't done any performance tests, but I can't imagine any change in processing speed when importing glossaries. Regarding improved recognition results... I'm not sure. When a term is in the terminology file, it can only have one entry, i.e. we don't support translating from one source term to multiple translation choices. If the glossary terms are added to the training corpus but not to the terminology file, then the statistical frequencies in the language model (i.e. the monolingual target data) have the greatest effect on the final translation suggestion. Our LinkedIn group (Slate Desktop) is running a workshop now that covers this exact topic. You can join any time and participate.
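Because each source term can only have one entry, it may be worth checking a terminology file for duplicates before importing. A minimal sketch, assuming the tab-delimited format described earlier (the file name is illustrative):

```python
# Sketch: warn about source terms that appear more than once in a
# terminology file, since each source term can only have one entry.
from collections import Counter

with open("my_terms.txt", encoding="utf-8") as f:
    sources = [line.split("\t", 1)[0] for line in f if "\t" in line]

for term, count in Counter(sources).items():
    if count > 1:
        print(f"duplicate source term ({count}x): {term}")
```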
Also, on our support forum you'll find Igor Goldfarb's comments about his experiments with training data size; they're relevant to the "big mama" TM approach. Essentially, quality performance breaks down with these large TMs. Pieter Beens reports that his big mama TM works well, but he feels it could be better. Fair enough.
Quality performance might improve if you take extra steps to normalize these TMs, but that's not part of "out of the box" SD. There are too many variables. SD can be customized, but I always recommend you first set a baseline with minimal processing, then make incremental improvements.
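As an example of the kind of minimal, incremental normalization step meant here, one might start with something like this sketch. Unicode NFC and whitespace collapsing are my own example steps, not SD defaults:

```python
# Sketch of two minimal normalization passes one might baseline with:
# Unicode NFC and whitespace collapsing. These are my own example steps,
# not SD's built-in behavior.
import unicodedata

def normalize_segment(text: str) -> str:
    text = unicodedata.normalize("NFC", text)  # unify composed/decomposed forms
    return " ".join(text.split())              # collapse runs of whitespace

print(normalize_segment("Open  the\tfront side"))  # -> "Open the front side"
```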
SD has the ability to split a TMX's TUs into smaller TMX files based on the text/values of any <prop> tag and the values of TU attributes. This is configurable without writing any code. Depending on what values are available in your TMX, you can split into separate TMX files by creationid, client, project, and even the x-document value (i.e. the original document's file name). You can even re-map these values. For example, if "george" used 5 different creationid values, you can merge them all into one "george" set. Once they're split, you can re-combine them in any mix to make smaller, focused engines. Of course, this is all undocumented, like all good leading-edge new software products :)
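To make the idea concrete, here's a rough Python sketch that imitates that split-by-creationid behavior, including the re-mapping. It is an illustration only; SD's actual mechanism is configuration-driven and, as noted, undocumented:

```python
# Sketch: split a TMX into per-creator files keyed on each TU's creationid
# attribute, with optional re-mapping (the "george" example). This only
# imitates the described feature, not SD's actual implementation.
import xml.etree.ElementTree as ET
from copy import deepcopy

REMAP = {"george1": "george", "gsmith": "george"}  # hypothetical aliases

tree = ET.parse("big_mama.tmx")   # file name is illustrative
root = tree.getroot()
body = root.find("body")

buckets = {}
for tu in body.findall("tu"):
    cid = tu.get("creationid", "unknown")
    buckets.setdefault(REMAP.get(cid, cid), []).append(tu)

for cid, tus in buckets.items():
    out_root = deepcopy(root)
    out_body = out_root.find("body")
    for tu in list(out_body):     # empty the copied body...
        out_body.remove(tu)
    out_body.extend(tus)          # ...and keep only this creator's TUs
    ET.ElementTree(out_root).write(f"{cid}.tmx", encoding="utf-8",
                                   xml_declaration=True)
```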
Thanks for the update, Tom.
I was wondering about the usefulness of importing my background glossary (big papa) into Slate Desktop. It contains words and word groups like:
the front side
open the front side
I guess that SD will identify these items itself. However, can importing this glossary:
Well, you have to hurry, or else this whole MT approach will become obsolete ...
Neural Machine Translation Improving Fast, Study Finds
A study published on August 16, 2016 claims that Neural Machine Translation (NMT) outperforms phrase-based MT (PBMT) and provides better translations in the “particularly hard” to translate English-German language pair.
In the past, the researchers say, NMT was considered "too computationally costly and resource demanding" to compete with PBMT; NMT literally needed a lot of electricity. However, this has apparently changed beginning in 2015, and NMT is now becoming more competitive.
The researchers (Luisa Bentivogli, Mauro Cettolo, and Marcello Federico of Fondazione Bruno Kessler, Trento, Italy; Arianna Bisazza of the University of Amsterdam) found that, architecturally speaking, NMT is simpler than traditional statistical MT systems. Interestingly enough, however, they also add that the process is "less transparent" with NMT, saying that "the translation process is totally opaque to the analysis." How NMT does what it does still seems a bit of a black box.
For the study, the researchers built on evaluation data from the IWSLT 2015 (International Workshop on Spoken Language Translation) MT English-German task and compared results using what they call the “first four top-ranking systems”; that is, NMT and three other phrase-based MT approaches.
The researchers sourced translation material from TED talks (transcripts translated from English into German), reasoning that the language used is structurally less complex, more conversational than formal, and requires "a lower amount of rephrasing and reordering."
As to why English and German, the researchers said using the two languages would be interesting because, despite belonging to the same language family, “they have marked differences in levels of inflection, morphological variation, and word order, especially long-range reordering of verbs.”
“The outcomes of the analysis confirm that NMT has significantly pushed ahead the state of the art”—Bentivogli, Cettolo, Federico, Bisazza
And it is in this aspect of better word reordering, particularly in the case of proper verb placement, that NMT shines. To quote, “one of the major strengths of the NMT approach is its ability to place German words in the right position even when this requires considerable reordering.”
Those Misplaced German Verbs
In contrast, the study indicated that “verbs are by far the most often misplaced word category in all PBMT systems,” which the researchers pointed out as a common problem affecting standard phrase-based statistical MT.
In summary, the study's analysis confirmed that NMT reduced overall post-editing effort by 26% compared to PBMT output. In addition, NMT produced 70% fewer verb placement errors, 50% fewer word order errors, 19% fewer morphological errors, and 17% fewer lexical errors.
“Machine translation is definitely not a solved problem”—Bentivogli, Cettolo, Federico, Bisazza
However, despite outperforming PBMT systems on all sentence lengths, NMT's performance degraded faster than its competitors' as input sentences grew longer, an aspect the researchers singled out as an area for future work on improving NMT.
The researchers’ sense of excitement is palpable when they write “machine translation is definitely not a solved problem, but the time is finally ripe to tackle its most intricate aspects.”
We're working on the Mac build. We have our own build environment for the software in Slate Toolkit, and the trick is to get that working for OSX as well. There are a few technical unknowns to be conquered there, so it's not an easy thing to plan. Beyond that, it's a matter of building a Mac installer, and lots and lots of testing!
>I think we're looking at late August or early September for first testing.
That would be great, since CafeTran's ready for Slate Desktop now. On Mac too: