It seems so obvious, and has for many years, but why can't any CAT tool do the following:
If you are using e.g. Google Translate on a text, and GT keeps offering you great, useful suggestions, why can't your CAT tool fix the few words GT consistently gets wrong?
This would be WAY more useful to me than the usually useless stuff auto-assembling offers me (in any CAT tool). For years, in every CAT tool (e.g. CafeTran, Déjà Vu), Google Translate, and now DeepL, have offered me way better rough drafts than anything my CAT tool could auto-assemble. So why haven't developers cottoned on yet and started from the MT output, rather than trying to patch up the auto-assembling result, like a few CAT tools now can?
1. fix the MT output with your glossaries
2. fix your auto-assembling result with MT
For example (and this is just an example; that is, GT actually doesn't make this mistake):
The patent application you are translating contains the word "uitvoeringsvormen" zillions of times, but Google Translate consistently translates it as "execution forms", whereas it should be "embodiments". You have it in your glossary: uitvoeringsvormen = embodiments. Why can't CT just change it in the Google Translate output before it hits your target box? Because it can't, you end up having to manually change it a hundred times while working. My Computer is definitely not Assisting me here!
I just checked, and GT4T (which I am currently using for all my MT needs as I translate in CafeTran) has a feature I have not yet tried, called "Pretranslate using Glossary", which is explained in a tooltip as:
"Replace terms with glossary before submitting to MT" (!!!)
This is exactly what I am talking about. I am going to test it and will report back here.
Although the user can edit GT4T's so-called "Simple Glossary" in Excel (which pops up automatically), the data is stored in a simple, tab-delimited txt file!
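To make the "replace terms with glossary before submitting to MT" idea concrete, here is a minimal sketch in Python. It is not GT4T's actual code: the file layout (one `source<TAB>target` pair per line) and the function names `load_glossary` and `pretranslate` are assumptions for illustration only.

```python
import re

def load_glossary(text):
    """Parse tab-delimited lines (assumed layout: source<TAB>target) into a dict."""
    glossary = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) >= 2 and parts[0].strip():
            glossary[parts[0].strip()] = parts[1].strip()
    return glossary

def pretranslate(source, glossary):
    """Swap whole-word source terms for their glossary targets before MT.

    Longer terms are tried first so a short term cannot clobber a longer one.
    """
    if not glossary:
        return source
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    lookup = {t.lower(): glossary[t] for t in terms}
    # Fall back to the matched text if case folding produces an unknown key.
    return pattern.sub(lambda m: lookup.get(m.group(0).lower(), m.group(0)), source)

glossary = load_glossary("uitvoeringsvormen\tembodiments\n")
print(pretranslate("Verschillende uitvoeringsvormen worden beschreven.", glossary))
```

The mixed-language string this produces is what would then be sent to the MT engine. Whether the engine leaves the already-translated term alone is up to the engine.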
And yes, with DeepL's quality (still improving), I think that it'll be a very useful option. One that can make big glossaries (etc.) almost superfluous. Let the MT do the work, while you keep control via your 'Lexicon'.
Please keep off topic talks in the off topic section, Hans.
‘Hey M, Lilt employee on FB: Lilt is not PEMT, it's the opposite. The translator can use the mt suggestions or ignore them, the mt will react and adapt to the translator's input. In a traditional pemt setting, the translator is locked into mt output which s/he is then stuck trying to rework. Lilt uses mt to enable translators rather than impede their work. Have you tried Lilt?’
Here's my insightful and succinct (ha ha ha) opinion on the matter of Lilt:
1. its UI is crap
2. it’s online, which is ALWAYS shit
3. its MT engine is … crap
The one thing that is good about it is the concept of ‘Adaptive MT’. However, because of the above three reasons, there is no point using it.
I believe the same applies to SDL Studio's recent implementation of Adaptive MT: their base engines are terrible, so no matter how good the system is, what's the point? Plus, you have to use it inside Studio.
The idea of adaptive MT is of course very interesting. However, the only way I would want to use it would be if the base engine it uses is good to start with. Otherwise, it's just a waste of time. For an adaptive MT system to be any good, I believe it needs to be MT agnostic, meaning: it needs to be possible to use it with any underlying MT engine. You need to be able to choose your favorite MT engine (which is always changing!), and use that, and have the system learn on-the-fly from your edits.
Anyway, back to the topic:
Dallas (the developer of GT4T) just released an updated version of his VERY cool idea of ‘Fixing MT results using your own Glossaries’, which I am going to try immediately on the rest of my patent for this evening. I think GT4T's implementation is already way better than either Lilt or SDL's ‘AdaptiveMT’, which is pretty amazing, but mirrors my experience with CafeTran: one guy manages to produce a CAT tool which is way better than anything produced by companies with millions of euros at their disposal and vast development teams.
@Alain: to be honest, I couldn't figure out your system! ;-)
@Dallas: thanks for your hard work and amazing tool!
Hi Igor, here is an idea:
### Adaptive MT idea: ###
CT records what is entered into the target box, in two stages:
STAGE #1.: CT records what is initially inserted. For the purpose of my idea, this will be raw MT output.
STAGE #2.: CT records any changes the user makes, manually, to specific terms. These may or may not be present in a Glossary/TM.
CT then automatically makes the same exact edits in any following segments, to further raw MT results.
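The two stages can be sketched roughly as follows. This is only an illustration of the idea, not anything CT does today: the function names `learn_edits` and `apply_edits` are made up, and the word-level diff uses Python's standard `difflib`.

```python
import difflib

def learn_edits(raw_mt, edited):
    """Compare raw MT (stage 1) with the user's edited text (stage 2).

    Returns {wrong_phrase: corrected_phrase} for every word span the
    user replaced. Pure insertions and deletions are ignored here.
    """
    a, b = raw_mt.split(), edited.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    edits = {}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            edits[" ".join(a[i1:i2])] = " ".join(b[j1:j2])
    return edits

def apply_edits(raw_mt, edits):
    """Repeat the learned replacements on the raw MT of a later segment."""
    for wrong in sorted(edits, key=len, reverse=True):
        raw_mt = raw_mt.replace(wrong, edits[wrong])
    return raw_mt

learned = learn_edits(
    "In some execution forms the valve is closed.",
    "In some embodiments the valve is closed.",
)
print(learned)
print(apply_edits("Further execution forms are shown in Fig. 2.", learned))
```

A real implementation would need to be pickier than this: splitting on whitespace leaves punctuation glued to words, and a one-off stylistic edit should probably not be repeated until the same change has been seen a few times.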
I am not sure when STAGE #2 should be done. Since this idea relies on recording 2 stages, I think a final KBS, to be clicked right before confirming the segment and moving to the next segment, would be a good idea, as there is no way for CT to know when the user is finished editing the MT results. Or maybe add another KBS. For my workflow, I would need something like:
"Make any Adaptive MT changes in target text, add checked segment to memory, and go to next unchecked segment "
I am currently doing things in a somewhat similar way, but not quite as automated.
When I arrive in a new segment, I use a keyboard shortcut to translate it using GT4T (this could be any MT provider, including, obviously, CT's built-in ones). In certain kinds of texts, for example the patent application I am currently working on, the MT results will be almost perfect. However, the MT engine will usually get certain specific terms consistently wrong, which I will then have to manually change to the correct forms each time. What I am currently doing is:
I add a Glossary term pair for each of these, so I add: source term TAB DESIRED target term; INCORRECT target term (offered by MT engine)
Once I add this Glossary entry once, all further instances of the incorrect term can be easily changed, either by selecting the incorrect terms, and right-clicking, and selecting the correct term, or via my AHK script/KBS*, which does the same. Usually, only one term will need to be changed. However, if more terms need to be changed in each segment, the idea I sketched above would obviously be handier. Actually, it would be handier in every case, as it requires no user input/active thinking: CT just watches what you are doing and copies any recurring changes you make.
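For what it's worth, the entry format described above (source term TAB DESIRED target; INCORRECT target) is enough to automate the fix. Below is a minimal sketch, assuming the first alternative is always the desired one and everything after it is a known-bad MT rendering. The names `parse_entry` and `fix_mt` are hypothetical, not part of CT or GT4T.

```python
import re

def parse_entry(line):
    """Split 'source<TAB>desired;incorrect1;incorrect2' into its parts."""
    source, targets = line.rstrip("\n").split("\t", 1)
    desired, *incorrect = [t.strip() for t in targets.split(";") if t.strip()]
    return source.strip(), desired, incorrect

def fix_mt(source_segment, mt_output, entries):
    """Replace known-bad MT renderings with the desired target term.

    An entry is only applied when its source term occurs in the source
    segment, so unrelated uses of the 'incorrect' words are left alone.
    """
    for source, desired, incorrect in entries:
        if source.lower() not in source_segment.lower():
            continue
        for wrong in sorted(incorrect, key=len, reverse=True):
            pattern = r"\b" + re.escape(wrong) + r"\b"
            mt_output = re.sub(pattern, lambda _m: desired, mt_output,
                               flags=re.IGNORECASE)
    return mt_output

entries = [parse_entry("uitvoeringsvormen\tembodiments;execution forms")]
print(fix_mt("In sommige uitvoeringsvormen is de klep gesloten.",
             "In some execution forms the valve is closed.",
             entries))
```

This is the manual right-click step done in bulk. It still relies on the user having entered the incorrect rendering once.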
Actually, I remember you said you had added something like this already to CT recently. Is this true?
*see my AHK script/KBS @ https://cafetran.freshdesk.com/support/discussions/topics/6000051595
This is very smart. I should have come up with this myself ...
CafeTran can already detect fuzzy matches and indicate different words. Now it'll have to store the different term pairs. For a segment with similar syntax for SL and TL and only one difference this will be easier than for a segment with different (or opposite) syntax for SL and TL and multiple differences.
Perhaps this is possible with a TM4T too (I don't know), but I can see that it's possible with a glossary with alternative translations for administrator = administrateur;administrator;beheerder;systeembeheerder.
About this part:
Why would an extra key be necessary? I'd rather see this as an extra feature that can be activated in the Prefs. The default keyboard shortcut to go to the next segment and add the translation to the TM will do. Unless I'm missing something.
>Now it'll have to store the different term pairs
Now it will have to remember which alternative translation has been selected by the TR to override the default MT suggestion.
@Hans: yeah, you're right: that 2nd KBS isn’t really needed.
I suppose its only value would be that it would allow the user to have one last look at the fixed MT before confirming it. Without such a KBS, the user wouldn't get to see any of the magic changes before leaving the segment.
Holy cow, it works, perfectly.
I am translating a patent application with a number of highly specific terms, which are consistently mistranslated by every machine translation provider. I quickly added the specific terms to my GT4T Simple Glossary*, and now every time I come to a new segment in CT and press the special GT4T keyboard shortcut, I am presented with a little dialogue with a list of five different machine translations of my segment. And guess what? Every single one of my difficult terms has now been magically translated correctly in the machine translations!
* Adding new terms to the GT4T Simple Glossary couldn't be easier. You just select the source term in CT and hit the keyboard shortcut. If the term is not already in your Simple Glossary, you can click "a", and a little dialogue will pop up where you can quickly enter the target.
We did have 2 options for MT repair in the past, wasn't one of them what you are suggesting now?
For what it's worth, I don't think that pretranslating is very clever. Interference is a big danger. I think that masking and substituting after MT is cleverer.
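A rough sketch of what masking and substituting after MT could look like: glossary terms are swapped for neutral placeholder tokens before the MT call, and the tokens are swapped for the glossary targets afterwards. Everything here is an assumption for illustration: the token shape `XTERM0X`, the function names, and the `fake_mt` stand-in that replaces a real MT request. Whether a given engine passes such tokens through untouched would have to be tested per engine.

```python
import re

def mask_terms(source, glossary):
    """Replace glossary source terms with placeholder tokens.

    Returns the masked text and a {token: target_term} map.
    """
    mapping = {}
    masked = source
    for i, term in enumerate(sorted(glossary, key=len, reverse=True)):
        token = f"XTERM{i}X"
        masked, count = re.subn(r"\b" + re.escape(term) + r"\b", token,
                                masked, flags=re.IGNORECASE)
        if count:
            mapping[token] = glossary[term]
    return masked, mapping

def unmask(mt_output, mapping):
    """Put the glossary targets back where the tokens survived MT."""
    for token, target in mapping.items():
        mt_output = mt_output.replace(token, target)
    return mt_output

def fake_mt(text):
    # Hypothetical stand-in for a real MT call; it only knows this sentence.
    return (text.replace("Verschillende", "Various")
                .replace("worden beschreven", "are described"))

masked, mapping = mask_terms(
    "Verschillende uitvoeringsvormen worden beschreven.",
    {"uitvoeringsvormen": "embodiments"},
)
print(masked)
print(unmask(fake_mt(masked), mapping))
```

Compared with pretranslating, the engine never sees a target-language word inside a source-language sentence, which is the interference risk mentioned above. The price is that the engine also loses the term as context.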
Hans, you might be right about:
"For what it's worth, I don't think that pretranslating is very clever. Interference is a big danger. I think that masking and substituting after MT is cleverer. "
...but either way, this is a very handy feature, and sorely missing from CafeTran, and all other CAT tools.
I only just discovered I can do it with GT4T, and boy is it speeding up my patent application this evening.
Jost has basically been saying it for some time already, but no one has really cracked it yet.
Now that all these neural MT systems are getting really good, and Linguee too (which, btw, can also be queried straight from GT4T), I think a lot more focus should be put on #1:
1. fix the MT output with your glossaries
2. fix your auto-assembling result with MT