
Replace terms with Glssary before submitting to MT?

It seems so obvious, and has for many years, but why can't any CAT tool do the following:

 

If you are using e.g. Google Translate on a text, and GT keeps offering you great, useful suggestions, why can't your CAT tool fix the few words GT consistently gets wrong?

 

This would be WAY more useful to me than the usually useless stuff auto-assembling offers me (in any CAT tool). For years, in every CAT tool I have used (e.g. CafeTran, Déjà Vu), Google Translate, and now DeepL, has offered me way better rough drafts than anything my CAT tool could auto-assemble. So why haven't developers cottoned on yet and started from the MT output, rather than trying to patch up the auto-assembling result with MT, like a few CAT tools now can?


That is:

1. fix the MT output with your glossaries

instead of

2. fixing your auto-assembling result with MT


For example (and this is just an example; that is, GT actually doesn't make this mistake):


The patent application you are translating contains the word "uitvoeringsvormen" zillions of times, but Google Translate consistently translates it as "execution forms", whereas it should be "embodiments". You have it in your glossary: uitvoeringsvormen = embodiments. Why can't CT just change it in the Google Translate output before it hits your target box? Because it can't, you end up having to manually change it a hundred times while working. My Computer is definitely not Assisting me here!


NEWSFLASH!!!

I just checked, and GT4T (which I am currently using for all my MT needs as I translate in CafeTran) has a feature I have not yet tried, called "Pretranslate using Glossary", which is explained in a tooltip as: 

 

"Replace terms with glossary before submitting to MT" (!!!)


This is exactly what I am talking about. I am going to test it and will report back here.


Although the user can edit GT4T's so-called "Simple Glossary" in Excel (which pops up automatically), the data is stored in a simple, tab-delimited txt file!
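
To make the "replace terms with glossary before submitting to MT" idea concrete, here is a minimal Python sketch. It is not GT4T's actual code, and I am only assuming that the tab-delimited file holds one "source term TAB target term" pair per line; the function names `load_glossary` and `pretranslate` are made up for illustration.

```python
import csv
import io
import re

def load_glossary(tsv_text):
    """Parse tab-delimited 'source<TAB>target' lines into a dict (assumed layout)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[0].strip(): row[1].strip() for row in reader if len(row) >= 2}

def pretranslate(source, glossary):
    """Replace whole-word glossary terms in the source text, longest term first."""
    if not glossary:
        return source
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    lookup = {t.lower(): glossary[t] for t in terms}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], source)

# The source segment goes to the MT engine with the term already in the target language.
glossary = load_glossary("uitvoeringsvormen\tembodiments\n")
prepared = pretranslate("In andere uitvoeringsvormen is de klep gesloten.", glossary)
print(prepared)
```

Whether the MT engine then leaves the pre-inserted target term alone is something you would have to test per engine and language pair.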


Michael



Matthew B. Crawford, from 2015, titled The World Beyond Your Head: On Becoming An Individual In An Age Of Distraction.
If you can read Dutch or DeepL is your uncle: https://decorrespondent.nl/7483/wat-je-terugkrijgt-als-je-van-facebook-gaat/953614707738-139d18dd nice article about distraction oh there is a fly on the wa
No offense, but I had to turn off Reply notifications for this topic, since it was too disturbing.

Please keep off topic talks in the off topic section, Hans.


Boys, this is not just about FB but about all the distractions that modern technology causes! Very interesting to read.
Oops, sorry Alain, you’re right of course.

@Hans:

 

regarding your:

 

‘Hey M, Lilt employee on FB: Lilt is not PEMT, it's the opposite. The translator can use the mt suggestions or ignore them, the mt will react and adapt to translator's input. In traditional pemt setting, the translator is locked into mt output which s/he then is stuck trying to rework. Lilt uses mt to enable translator rather than impede their work. Have you tried Lilt?’

 

Here's my insightful and succinct (ha ha ha) opinion on the matter of Lilt:

 

1. its UI is crap

2. it’s online, which is ALWAYS shit

3. its MT engine is … crap

 

The one thing that is good about it is the concept of ‘Adaptive MT’. However, because of the above three reasons, there is no point using it.

I believe the same applies to SDL Studio's recent implementation of Adaptive MT: their base engines are terrible, so no matter how good the system is, what's the point? Plus, you have to use it inside Studio.

 

The idea of adaptive MT is of course very interesting. However, the only way I would want to use it would be if the base engine it uses is good to start with. Otherwise, it's just a waste of time. For an adaptive MT system to be any good, I believe it needs to be MT agnostic, meaning: it needs to be possible to use it with any underlying MT engine. You need to be able to choose your favorite MT engine (which is always changing!), and use that, and have the system learn on-the-fly from your edits.

 

Anyway, back to the topic:

 

Dallas (the developer of GT4T) just released an updated version of his VERY cool idea of ‘Fixing MT results using your own Glossaries’, which I am going to try immediately on the rest of my patent for this evening. I think GT4T's implementation is already way better than either Lilt or SDL's ‘AdaptiveMT’, which is pretty amazing, but mirrors my experience with CafeTran: one guy manages to produce a CAT tool which is way better than anything produced by companies with millions of euros at their disposal and vast development teams.


Michael

@Alain: to be honest, I couldn't figure out your system! ;-)


@Dallas: thanks for your hard work and amazing tool!

Michael: @Alain: to be honest, I couldn't figure out your system! ;-)
Don't worry. The idea was quite simple, but my explanation was, well, inferior to average MT. ;-)

Hi Igor, here is an idea: 


### Adaptive MT idea: ### 


CT records what is entered into the target box, in two stages:

 

STAGE #1: CT records what is initially inserted. For the purpose of my idea, this will be raw MT output.

STAGE #2: CT records any changes the user makes, manually, to specific terms. These may or may not be present in a Glossary/TM.

 

CT then automatically makes the same exact edits in any following segments, to further raw MT results.

 

I am not sure when STAGE #2 should be done. Since this idea relies on recording 2 stages, I think a final KBS, to be clicked right before confirming the segment and moving to the next segment, would be a good idea, as there is no way for CT to know when the user is finished editing the MT results. Or maybe add another KBS. For my workflow, I would need something like:

 

"Make any Adaptive MT changes in target text, add checked segment to memory, and go to next unchecked segment "
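
Here is a rough Python sketch of the two stages, just to show the idea is not complicated. It is not how CT (or any other tool) actually works: the function names `learn_edits` and `apply_edits` are hypothetical, and it simply diffs the raw MT against the confirmed target word by word using the standard `difflib` module.

```python
import difflib
import re

def learn_edits(raw_mt, edited):
    """STAGE #1 vs STAGE #2: return {mt_phrase: user_phrase} for each replaced word span."""
    a, b = raw_mt.split(), edited.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    edits = {}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":  # ignore pure insertions and deletions
            edits[" ".join(a[i1:i2])] = " ".join(b[j1:j2])
    return edits

def apply_edits(raw_mt, edits):
    """Make the same replacements in the raw MT of a following segment."""
    for wrong, right in edits.items():
        raw_mt = re.sub(r"\b" + re.escape(wrong) + r"\b", lambda _m, r=right: r, raw_mt)
    return raw_mt

# Segment 1: the user fixes the MT by hand, and the change is recorded on confirm.
learned = learn_edits(
    "In some execution forms the valve is closed.",
    "In some embodiments the valve is closed.",
)
# Segment 2: the recorded change is applied to fresh raw MT automatically.
fixed = apply_edits("The execution forms described above are examples.", learned)
print(learned, fixed)
```

A real implementation would need to be much pickier about what counts as a recurring term edit (as opposed to a one-off rewording), and this sketch does not handle edits that touch punctuation.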

 

*****************************

 

I am currently doing things in a somewhat similar way, but not quite as automated.


When I arrive in a new segment, I use a keyboard shortcut to translate it using GT4T (this could be any MT provider, including, obviously, CT's built-in ones). In certain kinds of texts, for example the patent application I am currently working on, the MT results will be almost perfect. However, the MT engine will usually consistently get certain specific terms wrong, which I will then have to manually change to the correct forms each time. What I am currently doing is:


I add a Glossary term pair for each of these, so I add: source term TAB DESIRED target term; INCORRECT target term (offered by MT engine)


Once I add this Glossary entry once, all further instances of the incorrect term can be easily changed, either by selecting the incorrect terms, and right-clicking, and selecting the correct term, or via my AHK script/KBS*, which does the same. Usually, only one term will need to be changed. However, if more terms need to be changed in each segment, the idea I sketched above would obviously be handier. Actually, it would be handier in every case, as it requires no user input/active thinking: CT just watches what you are doing and copies any recurring changes you make.
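
For what it's worth, the fully automatic version of this glossary trick could look something like the Python sketch below. It assumes the entry layout described above (source term TAB desired target; incorrect target) and uses made-up function names (`parse_entry`, `fix_mt`); it is not my AHK script and not anything CT or GT4T ships.

```python
import re

def parse_entry(line):
    """Split 'source<TAB>desired;incorrect1;incorrect2' into its parts."""
    source, targets = line.rstrip("\n").split("\t", 1)
    desired, *incorrect = [t.strip() for t in targets.split(";")]
    return source.strip(), desired, [t for t in incorrect if t]

def fix_mt(source_segment, mt_output, entries):
    """Swap known-incorrect MT terms for the desired term, but only when
    the source term actually occurs in the source segment."""
    for source, desired, incorrect in entries:
        if source.lower() not in source_segment.lower():
            continue
        for wrong in incorrect:
            mt_output = re.sub(
                r"\b" + re.escape(wrong) + r"\b",
                lambda _m, d=desired: d,
                mt_output,
                flags=re.IGNORECASE,
            )
    return mt_output

entries = [parse_entry("uitvoeringsvormen\tembodiments;execution forms")]
result = fix_mt(
    "In sommige uitvoeringsvormen is de klep gesloten.",
    "In some execution forms the valve is closed.",
    entries,
)
print(result)
```

Checking the source segment first avoids "fixing" a phrase that the MT engine produced for some other, unrelated source word.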


Actually, I remember you said you had added something like this already to CT recently. Is this true?


*see my AHK script/KBS @ https://cafetran.freshdesk.com/support/discussions/topics/6000051595

This here:


image



Is very smart. I should have come up with this myself ...


CafeTran can already detect fuzzy matches and indicate different words. Now it'll have to store the different term pairs. For a segment with similar syntax for SL and TL and only one difference this will be easier than for a segment with different (or opposite) syntax for SL and TL and multiple differences.


image


Perhaps this is possible with a TM4T too (I don't know), but I can see that it's possible with a glossary with alternative translations for administrator = administrateur;administrator;beheerder;systeembeheerder.


About this part:


image



Why would an extra key be necessary? I'd rather see this as an extra feature that can be activated in the Prefs. The default keyboard shortcut to go to the next segment and add the translation to the TM will do. Unless I'm missing something.

>Now it'll have to store the different term pairs


Now it will have to remember which alternative translation has been selected by the TR to override the default MT suggestion.
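
A tiny Python sketch of what "remembering the selected alternative" could mean, using the administrator example. The class name `AlternativeMemory` and its methods are invented for illustration; I have no idea how CafeTran would store this internally.

```python
class AlternativeMemory:
    """Remember which glossary alternative the translator picked per source term."""

    def __init__(self, glossary):
        self.glossary = glossary  # source term -> list of alternative translations
        self.chosen = {}          # source term -> alternative picked by the translator

    def select(self, source_term, alternative):
        """Record the translator's pick; it must be one of the glossary alternatives."""
        if alternative not in self.glossary.get(source_term, []):
            raise ValueError(f"{alternative!r} is not an alternative for {source_term!r}")
        self.chosen[source_term] = alternative

    def preferred(self, source_term):
        """Return the remembered pick, else the first alternative, else None."""
        alternatives = self.glossary.get(source_term, [])
        return self.chosen.get(source_term, alternatives[0] if alternatives else None)

memory = AlternativeMemory(
    {"administrator": ["administrateur", "administrator", "beheerder", "systeembeheerder"]}
)
before = memory.preferred("administrator")
memory.select("administrator", "beheerder")
after = memory.preferred("administrator")
print(before, after)
```

The remembered choice would then be the one used to override the MT suggestion in later segments.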

@Hans: yeah, you're right: that 2nd KBS isn’t really needed.


I suppose its only value would be that it would allow the user to have one last look at the fixed MT before confirming it. Without such a KBS, the user wouldn't get to see any of the magic changes before leaving the segment.
