
Replace terms with Glossary before submitting to MT?

It seems so obvious, and has for many years, but why can't any CAT tool do the following:


If you are using e.g. Google Translate on a text, and GT keeps offering you great, useful suggestions, why can't your CAT tool fix the few words GT consistently gets wrong?


This would be WAY more useful to me than the usually useless stuff auto-assembling offers me (in any CAT tool). For years, in every CAT tool (e.g. CafeTran, Déjà Vu), Google Translate, and now DeepL, have offered me way better rough drafts than anything my CAT tool could auto-assemble. So why haven't developers cottoned on yet and started from the MT output, rather than trying to patch up the auto-assembling result, like a few CAT tools now can?

That is:

1. fix the MT output with your glossaries

instead of

2. fixing your auto-assembling result with MT

For example (and this is just an example; that is, GT actually doesn't make this mistake):

The patent application you are translating contains the word "uitvoeringsvormen" zillions of times, but Google Translate consistently translates it as "execution forms", whereas it should be "embodiments". You have it in your glossary: uitvoeringsvormen = embodiments. Why can't CT just change it in the Google Translate output before it hits your target box? Because it can't, you end up having to manually change it a hundred times while working. My Computer is definitely not Assisting me here!
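One simple way a tool could do this is to substitute the glossary targets into the source text before the MT call, so the engine never sees the problem terms. A toy sketch (the glossary, function name, and approach are my illustration, not GT4T's or CafeTran's actual code):

```python
# Toy sketch of glossary-aware MT: swap each source-language term for its
# target-language equivalent before the text is submitted to the MT engine.
# The glossary and function are illustrative, not any tool's real code.

glossary = {"uitvoeringsvormen": "embodiments"}

def pretranslate(source: str, glossary: dict) -> str:
    """Replace source terms with their glossary targets pre-MT."""
    for src_term, tgt_term in glossary.items():
        source = source.replace(src_term, tgt_term)
    return source

print(pretranslate("De uitvoeringsvormen van de uitvinding", glossary))
# De embodiments van de uitvinding
```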


I just checked, and GT4T (which I am currently using for all my MT needs as I translate in CafeTran) has a feature I have not yet tried, called "Pretranslate using Glossary", which is explained in a tooltip as: 


"Replace terms with glossary before submitting to MT" (!!!)

This is exactly what I am talking about. I am going to test it and will report back here.

Although the user can edit GT4T's so-called "Simple Glossary" in Excel (which pops up automatically), the data is stored in a simple, tab-delimited txt file!
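Because it's just a tab-delimited text file, such a glossary could be read with a few lines of code. A sketch, assuming a plain source-TAB-target layout (the function name and layout details are my guess, not GT4T documentation):

```python
import csv

def load_simple_glossary(path: str) -> dict:
    """Read a source<TAB>target glossary from a tab-delimited text file
    (the layout a GT4T-style Simple Glossary appears to use; assumed here)."""
    glossary = {}
    with open(path, encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) >= 2 and row[0]:
                glossary[row[0]] = row[1]
    return glossary
```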


Holy cow, it works, perfectly.

I am translating a patent application with a number of highly specific terms, which are consistently mistranslated by every machine translation provider. I quickly added the specific terms to my GT4T Simple Glossary*, and now every time I come to a new segment in CT and press the special GT4T keyboard shortcut, I am presented with a little dialogue with a list of five different machine translations of my segment. And guess what? Every single one of my difficult terms has now been magically translated correctly in the machine translations!


* Adding new terms to the GT4T Simple Glossary couldn't be easier. You just select the source term in CT and hit the keyboard shortcut. If the term is not already in your Simple Glossary, you can click "a", and a little dialogue will pop up where you can quickly enter the target.



I think that it's already partially present in CafeTran: masking non-translatables. Add to this that these 'non-translatables' will be replaced via AA after insertion, et voilà.

We did have 2 options for MT repair in the past, wasn't one of them what you are suggesting now?

Nope ...


For what it's worth, I don't think that pretranslating is very clever. Interference is a big danger. I think that masking and substituting after MT is cleverer.

And yes, with DeepL's quality (still improving), I think that it'll be a very useful option. One that can make big glossaries (etc.) almost superfluous. Let the MT do the work, while you keep control via your 'Lexicon'.
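The mask-and-substitute idea described above could be sketched roughly like this (the placeholder token format and helper names are my own invention; whether a given MT engine actually leaves such tokens untouched varies and would need testing):

```python
# Rough sketch of mask-and-substitute-after-MT: replace glossary terms
# with opaque tokens the MT engine should leave alone, translate, then
# swap in the target terms. Token format and helpers are assumptions.

def mask(source: str, glossary: dict):
    """Replace each glossary term with a placeholder token; return the
    masked text plus a token -> target-term map for later substitution."""
    slots = {}
    for i, (src_term, tgt_term) in enumerate(glossary.items()):
        token = f"__TERM{i}__"
        if src_term in source:
            source = source.replace(src_term, token)
            slots[token] = tgt_term
    return source, slots

def unmask(translated: str, slots: dict) -> str:
    """After MT, replace each surviving placeholder with the target term."""
    for token, tgt_term in slots.items():
        translated = translated.replace(token, tgt_term)
    return translated
```

The appeal of this over pretranslation is that the source sentence sent to the engine keeps its original grammar around the placeholder, reducing interference.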


Hans, you might be right about:


"For what it's worth, I don't think that pretranslating is very clever. Interference is a big danger. I think that masking and substituting after MT is cleverer. "


...but either way, this is a very handy feature, and sorely missing from CafeTran, and all other CAT tools.

I only just discovered I can do it with GT4T, and boy is it speeding up my patent application this evening.

Jost has basically been saying it for some time already, but no one has really cracked it yet.

Now that all these neural MT systems are getting really good, and Linguee too (which, btw, can also be queried straight from GT4T), I think a lot more focus should be put on #1:

1. fix the MT output with your glossaries 

2. fixing your auto-assembling result with MT

Hey Michael, look here:


Seals on ze beach!

ha ha, you know what they say:

Great seals think alike.

I second this. Has anybody used Team Machine with Auto-assembling and got it right? Even keeping everything out of auto-assembling except one specific resource (a fragments TMX or a glossary containing only the terms the MT consistently gets wrong), the results are disappointing/unusable for me. The MT output is actually worsened: it's not just that a word gets replaced; it sounds like the segment is modified before the query, which leads to a different result. It would be more practical to apply a "Fix Machine Translation with terms" function than the current implementation. GT4T's implementation is interesting.

Hi Hans,

Just gave that statement of yours (below in red) some more thought, and I think you're right. I will ask Dallas (the developer of GT4T) what he thinks about this. 

It does seem better to leave the original source text untouched, and send that, because changing it by inserting target-language terms into it (which is what GT4T is doing, if I understand it correctly) might degrade the final quality.

"For what it's worth, I don't think that pretranslating is very clever. Interference is a big danger. I think that masking and substituting after MT is cleverer. "

I'm not sure it will work, but it makes me think of another possibility:
  1. Take your best TM, and export only the shortest segments (a few hundred) to a tab-delimited file.
  2. Translate the source segments with MT.
  3. Replace the source column of the tab-delimited file with these MT translations.
  4. Sort the file by string length (from the longest to the shortest).
We can now use it (Michael in his Excel file, the Mac guys in the blacklist file).
We then have a few hundred chunks of DeepL translations aligned with (sub)segments from the original, good quality TM, with the longest chunks treated first, thus reducing the chances of getting grammatically poor results (which I get in French when I simply replace a masculine noun with a feminine one, for example, with the "Glossary only" approach).

It should work. Or am I missing something?
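The four steps above could be sketched like this (the `mt` stub and the sample data are invented; a real run would query an actual engine such as DeepL):

```python
# Sketch of the proposed TM-alignment trick: MT-translate trusted TM
# sources, pair each MT rendering with the known-good TM target, and
# sort longest-first so long chunks are substituted before short ones.
# The mt() function is a stand-in for a real engine.

def build_fix_table(tm_pairs, mt):
    """tm_pairs: (source, good_target) segments from your best TM."""
    rows = [(mt(src), good_tgt) for src, good_tgt in tm_pairs]  # steps 2-3
    rows.sort(key=lambda r: len(r[0]), reverse=True)            # step 4
    return rows

def fix_mt_output(mt_text: str, table) -> str:
    """Replace each MT-rendered chunk with the trusted TM target."""
    for mt_chunk, good_tgt in table:
        mt_text = mt_text.replace(mt_chunk, good_tgt)
    return mt_text
```

Substituting the longest chunks first is what reduces the risk of a short replacement breaking agreement inside a longer match.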

I guess that I’ll just wait for Igor’s solution ;).
Hey M, a Lilt employee on FB: Lilt is not PEMT, it's the opposite. The translator can use the MT suggestions or ignore them, and the MT will react and adapt to the translator's input. In a traditional PEMT setting, the translator is locked into MT output which s/he is then stuck trying to rework. Lilt uses MT to enable the translator rather than impede their work. Have you tried Lilt?
And about FB itself: Facebook is designed to get you addicted to it. That is how we are constantly distracted. But if nobody can concentrate, no one comes up with a well-thought-out idea. Time to demand our attention back.