
Glossary (TM) property: Use for MT improvement

Would it be possible to limit which glossaries/TMs are used to improve the MT result, so that only specific glossaries/TMs (very short ones, actually) are applied to it?


E.g. Google Translate does a fine job, but it always translates specific words (like the quite frequent 'operator') wrong.


I don't want to spoil Google Translate's good results with my glossary in statu nascendi -- but I do want to have the 'operator' translated correctly (with 'operator' instead of 'exploitant').


Thank you Sir!



I second that. In my experience, the current implementation for improving MT rarely yields good results, at least for my language pair (EN-FR; for EL-FR I don't even try).

MT errors tend to be repetitive, so assigning one fragments TM or glossary to it would help replace the translation for specific terms or correct recurrent errors more efficiently, and as they occur.

Ideally, if you correctly translate "operator" once, by manually typing it, this would be remembered by CT and used to fix the GT output next time. Ha ha, guess what this is called? You guessed it: "Adaptive MT" ;-)

Instead of using corpora and training a custom (or even Adaptive) MT, this can provide a good "ad hoc" solution for MT APIs, since individual projects may call for different terminology choices and handling.
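To make the "type it once, have it remembered" idea concrete, here is a rough Python sketch. It is not how CafeTran, Google Translate or any Adaptive MT product works internally. It only compares the raw MT output with the text the translator finally confirmed and keeps one-word replacements that recur. The names `learn_replacements` and `CorrectionMemory`, the `min_count` threshold and the example sentences are all made up for illustration.

```python
import difflib
from collections import Counter


def learn_replacements(mt_output: str, final_text: str) -> list[tuple[str, str]]:
    """Return (mt_word, translator_word) pairs for one-word substitutions."""
    mt_tokens = mt_output.split()
    final_tokens = final_text.split()
    matcher = difflib.SequenceMatcher(a=mt_tokens, b=final_tokens, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep only the simplest case: exactly one word swapped for one word.
        if tag == "replace" and i2 - i1 == 1 and j2 - j1 == 1:
            pairs.append((mt_tokens[i1], final_tokens[j1]))
    return pairs


class CorrectionMemory:
    """Hypothetical store that trusts a replacement once it has recurred."""

    def __init__(self, min_count: int = 2) -> None:
        self.min_count = min_count  # assumed threshold, tune to taste
        self.counts: Counter = Counter()

    def observe(self, mt_output: str, final_text: str) -> None:
        self.counts.update(learn_replacements(mt_output, final_text))

    def trusted(self) -> dict[str, str]:
        return {
            wrong: right
            for (wrong, right), n in self.counts.items()
            if n >= self.min_count
        }


memory = CorrectionMemory(min_count=2)
# Invented segments: MT output first, translator's confirmed text second.
memory.observe("De exploitant start de machine", "De operator start de machine")
memory.observe("Bel de exploitant eerst", "Bel de operator eerst")
print(memory.trusted())
```

Whitespace tokenisation is deliberately naive here: a word followed by punctuation counts as a different token, and multi-word or inflected terms are ignored.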

Well, if you keep a given resource out of auto-assembling, it will not be considered for the MT improvement either. It would be really strange if the user accepted the term usage in auto-assembling, but not in MT. 


> Ideally, if you correctly translate "operator" once, by manually typing it, this would be remembered by CT and used to fix the GT output next time.


In CT, you can achieve the same by right-click overriding the term with its synonym.

I think fine tuning resources for improving MT with auto-assembling requires a different approach than for leveraging fragments/glossaries in the standard CT auto-assembling usage.


The way I see it:


- I need to add/have many glossary/fragment entries in standard usage.

- I need to minimise the entries for MT improvement only to those necessary, making ad hoc adjustments.


For MT improvement, I find that too many entries (not just resources) do not play well, and the same entries do not play equally well in the two scenarios: they tend to render the result unusable. In any case, auto-assembling in French seems especially difficult (articles, pronouns, conjugations, declensions and so on mess up the results).


So, I'll try fine tuning a resource for MT improvement and have only that one enabled for auto-assembling, and report back.

>> Ideally, if you correctly translate "operator" once, by manually typing it, this would be remembered by CT and used to fix the GT output next time.


> In CT, you can achieve the same by right-click overriding the term with its synonym.


If I'm not mistaken, this requires that the wrong translation suggested by the MT is present in your glossary. And having to add a wrong translation to a glossary might be a huge barrier for linguists.


Like 'seal' = [image] instead of [image].


Google Translate does it wrong, always.


The correctional list for the MT would be short. So I'd like to repeat my request (as if that would increase the chance that it's honoured ;)).


BTW: Why is it that MT engines translate seals the wrong way in sentences that contain other mechanical parts like bolts and screws, and/or verbs like mount, tighten, etc.?

@idimitriadis/Hans: That seems to be the best approach, at least for my language pair and Google Translate: instead of using AA at all, use pure GT output, tweaked with a small glossary for specific terms that GT keeps getting wrong.
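For anyone who wants to try this outside CT, the "pure MT output plus a short correction list" idea could look roughly like the Python sketch below. This is only an illustration under stated assumptions: the `Correction` record, `apply_corrections` and the example sentences are invented, and no MT service is actually called; the MT output is passed in as a plain string.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    source_term: str  # term as it appears in the source segment
    wrong: str        # translation the MT engine keeps producing
    right: str        # translation wanted for this client/job


def _match_case(replacement: str, original: str) -> str:
    """Capitalise the replacement if the word it replaces was capitalised."""
    if original[:1].isupper():
        return replacement[:1].upper() + replacement[1:]
    return replacement


def apply_corrections(source: str, mt_output: str, corrections: list[Correction]) -> str:
    result = mt_output
    for c in corrections:
        # Leave the MT output alone unless the source term really occurs.
        if not re.search(rf"\b{re.escape(c.source_term)}\b", source, re.IGNORECASE):
            continue
        result = re.sub(
            rf"\b{re.escape(c.wrong)}\b",
            lambda m, right=c.right: _match_case(right, m.group(0)),
            result,
            flags=re.IGNORECASE,
        )
    return result


# Deliberately short list, as suggested in this thread (invented example).
CORRECTIONS = [Correction("operator", "exploitant", "operator")]

print(apply_corrections("Ask the operator.", "Vraag het aan de exploitant.", CORRECTIONS))
```

The whole-word match means inflected forms of the wrong term are left untouched, so a real correction list would need one entry per form (or smarter matching). That is also why keeping the list short matters.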

 

Michael, thank you for confirming this approach!

Seals: Volk explains: “When Google Translate has to translate a text as someone is inputting it, the system has to split up the text by the sentence and send each sentence to different processing units. Due to the speed at which the translation has to be served up, those units have no way of communicating with each other and, hence, are unable to process context that goes beyond the sentence level.” https://slator.com/academia/swiss-science-foundation-grants-usd-0-5m-to-take-neural-mt-beyond-the-sentence/

I was looking at this again, while trying out the DeepL integration.


Being able to enforce terms into the MT output from a specific glossary would be REALLY USEFUL!


At the moment, I just add a note to remind me to change all the instances of a specific term, but then DeepL isn't consistent, so I have to watch out for a 'new' translation of each term.


Teaming with Auto-assembly is just not very useful - it almost always messes up the sentence structure, which then requires far too much editing time.


What if you use the option "Keep out of auto-assembling" on all other resources except the glossary or fragments TM that you intend to use with MT?


Alternative approach: enable "Team high priority fragments only" as well in Preferences > MT services, and set your glossary as the only resource with High priority.


Does "Team auto-assembling with machine translation" work better for MT output with any of these methods?


Jean

I'm afraid changing these settings does not improve the output at all.

CT seems to use the auto-assembled output for MT, even for the resources where I've checked "Keep out of auto-assembling".


I really need an option that will enforce terms from a termbase and not try to assemble the segment - the output from DeepL is OK in terms of sentence structure etc. - I just need to 'enforce' client-/job-specific terms.
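Until something like that exists, a crude check outside CT can at least flag the segments where a required term is missing, so you don't have to watch for every 'new' translation by eye. A minimal Python sketch follows. The function name `missing_terms`, the termbase format (a plain source-to-target dict) and the example sentences are assumptions for illustration only, not a CafeTran or DeepL feature.

```python
import re


def missing_terms(source: str, target: str, termbase: dict[str, str]) -> list[tuple[str, str]]:
    """List (source_term, required_target_term) pairs violated in one segment."""
    missing = []
    for src_term, tgt_term in termbase.items():
        # Prefix match (no trailing \b) so simple suffixes such as plurals still count.
        in_source = re.search(rf"\b{re.escape(src_term)}", source, re.IGNORECASE)
        in_target = re.search(rf"\b{re.escape(tgt_term)}", target, re.IGNORECASE)
        if in_source and not in_target:
            missing.append((src_term, tgt_term))
    return missing


# Invented client termbase and segments.
termbase = {"seal": "joint"}
print(missing_terms(
    "Replace the seal and tighten the bolts.",
    "Remplacez le phoque et serrez les boulons.",
    termbase,
))
```

This only reports problems; it does not rewrite the MT output, so the sentence structure stays exactly as the engine produced it.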

I see… I think Igor might need to step in, as I have no further suggestions.
