
Help with glossary fuzziness

I'm still struggling to understand how CTE handles fuzziness. Here is a very simple example:


My glossary contains "device", whose Italian translation is "dispositivo". While translating, when CTE detects "device" in the source editor, it puts "dispositivo" in the target (correct).


But when it detects "devices", it puts the Italian translation of "devise" in the target segment. Why?


Note that "Prefix matching" is selected and "Minimal prefix length" is set to 4. "Look up word stems" is unticked.


In other words, what settings are necessary so that CTE picks up the "device" part of "devices"?


Also, although "Auto-assembly" is unticked, CTE automatically puts the translated term in the target editor. Why?



I'd rather not put both "device" and "devices" in the glossary, unless it's the only possible solution.

Try disabling "Prefix matching" and only use "Look up word stems", which is the recommended setting.


"Look up word stems" uses a Hunspell dictionary (make sure you have one installed for the source language), which should pick up the plural of "device".


As you have experienced, minimal prefix length tends to introduce too much fuzziness.


---


Auto-assembling: when you say "auto-assembly is unticked", what do you mean exactly?


Also, what settings do you use to have auto-assembling populated directly in the target segment editor?


Consider adding a screenshot.

I agree with Jean. Note that the quality and coverage of Hunspell's algorithms vary by language. For German, for example, some plurals and inflections are found, whereas similar ones aren't. It is possible to optimize these algorithms yourself, but this requires thorough knowledge.

Thank you Jean.


"Prefix matching" is disabled and "Look up word stems" is enabled:

image


> Auto-assembling: when you say "auto-assembly is unticked", what do you mean exactly?

I'm not using Auto-assembly. This is my setting:


image


> Also, what settings do you use to have auto-assembling populated directly in the target segment editor?

As far as I can tell (from the Preferences file and the menus), my CTE is not set to populate the target segment editor directly. I certainly don't want it to do that. Where should I look in particular to make sure that auto-population is not enabled?


image


Back to my main question: from this screenshot you can see that CTE is finding "dispositiv" in the glossary, which is correct. However it doesn't find the plural of "device" (although Hunspell's EN dictionary is installed). Instead, it finds "vice" in "device" and adds "morsa", which is the Italian translation for "vice".


The glossary contains two instances of the English "device" (as is, no pipe symbol).


Any further help would be much appreciated.


Thanks


Mario



I would recommend keeping the "Prefix matching" option on. Also, make sure that the "Automatic fragments adjustment" option is enabled in Preferences > Auto-assembling tab. This way, CafeTran learns your first choice among matched terms and synonyms that share a prefix, and applies it for the rest of the current session.


Of course, if you increase the minimal prefix length to 5, the unwanted "devise" term will not be matched in this specific case.
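To illustrate why the prefix length matters here, below is a minimal sketch of shared-prefix matching. It is not CafeTran's actual code: the function names and the three-entry glossary are made up for the example, and the matching rule (exact match, or at least `min_prefix` identical leading characters) is an assumption about how such an option typically behaves.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Count the leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_matches(word: str, terms: list[str], min_prefix: int) -> list[str]:
    """Return terms equal to `word` or sharing >= min_prefix leading characters.

    Hypothetical rule for illustration only, not CafeTran's implementation.
    """
    w = word.lower()
    hits = []
    for term in terms:
        t = term.lower()
        if t == w or shared_prefix_len(w, t) >= min_prefix:
            hits.append(term)
    return hits


# Hypothetical mini-glossary (source terms only).
glossary = ["device", "devise", "vice"]

print(prefix_matches("devices", glossary, 4))  # ['device', 'devise']
print(prefix_matches("devices", glossary, 5))  # ['device']
print(prefix_matches("vice", glossary, 5))     # ['vice']
```

With a length of 4, "devices" and "devise" share "devi" and both match. With 5, "devic" and "devis" differ, so only "device" remains. A term shorter than the threshold, such as "vice", can still be found, but only as an exact match.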

Thank you. This seems to have fixed the problem. I'll keep experimenting.

Just one question: if I set the minimal prefix length to 5, what happens to words shorter than 5 characters?


> what happens to words shorter than 5 characters?


Nothing. It will be treated as just the word, that is, matched only as a whole.

I wonder if a percentage value can be used instead of an absolute value.

You can try it. However, a percentage value is not as consistent across terms of varying length as a fixed one. I wouldn't recommend it.

> You can try it. However, the percentage value...


So, can percentage values be used already instead of absolute values? What to insert exactly? Is 80% OK?


Also, does the issue explained above mean that the Hunspell dictionary is not providing fuzziness functionality?

> So, can percentage values be used already instead of absolute values? What to insert exactly? Is 80% OK?


Unlike a fixed prefix length, any percentage value introduces an element of fuzziness that you might not be willing to accept. So I don't know if, say, 80% will work for you. Hunspell-based stemming can also be quite imprecise, as you noticed in your example (the word "device" gives the stem "vice", whose translation is unrelated). In my opinion, Prefix matching with a fixed prefix length seems to provide the optimal results.
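As a rough illustration of why a percentage behaves less consistently than a fixed length, here is a small sketch. It is purely hypothetical: the function name and the rounding rule (round up) are assumptions made for the example and do not describe any CafeTran setting.

```python
import math


def required_prefix(term: str, percent: int) -> int:
    """Leading characters that must match if the threshold were a percentage
    of the term's length (hypothetical rule, rounded up)."""
    return math.ceil(len(term) * percent / 100)


for term in ["vice", "device", "dispositivo"]:
    need = required_prefix(term, 80)
    # Characters at the end of the term that are allowed to differ.
    slack = len(term) - need
    print(f"{term}: match first {need} of {len(term)} chars, {slack} may differ")
```

Under these assumptions, 80% leaves no slack for "vice", one character for "device" and two for "dispositivo", so the tolerance grows with the length of the term. A fixed value applies the same threshold to every term.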

Thank you, I accept your explanation. However, just for the sake of trying, when I entered 80% in Prefix matching, CTE didn't let me save the Preferences file upon clicking OK.

Maybe there has been a misunderstanding since the start of this post. I meant the ability to use percentage values instead of integers in Glossary > Minimal Prefix Length.


Now I know that it's not possible, for CTE doesn't accept percentage values. Hence I wondered whether it might be a future possibility if it makes sense for most language pairs.

OK, I'm sold.


I've been struggling since the beginning to find the perfect solution for my trilingual glossary, but it's very difficult to arrive at a good compromise given the different needs of my working language pairs (singular and plural, masculine and feminine, and the like) without being plagued by high fuzziness. Regular expressions seemed to me the best solution until now, but since I'm not using auto-assembly and I rely on a single, large glossary only, as in old-school translation, I feel that I should reconsider my early conviction.


I translate from Japanese and English, and only rarely into English. Therefore, for me it's particularly important to find the right Italian word without too much fuzziness.


So I have two basic questions:

1) Should singular and plural forms for English and Italian as source languages be placed on a single glossary line?

2) If the answer to 1) is yes, the glossary would become very large. Is this a problem in terms of speed?


Thank you
