
Help with glossary fuzziness

I'm still struggling to understand how CTE handles fuzziness. Here is a very simple example:


My glossary contains "device", whose Italian translation is "dispositivo". While translating, when CTE detects "device" in the source editor, it puts "dispositivo" in the target (correct).


But when it detects "devices", it puts the Italian translation of "devise" in the target segment. Why?


Note that "Prefix matching" is selected and "Minimal prefix length" is set to 4. "Look up word stems" is unticked.


In other words, what settings are necessary so that CTE picks up the "device" part of "devices"?


Also, although "Auto-assembly" is unticked, CTE automatically puts the translated term in the target editor. Why?



Actually, it's very simple with AutoHotkey:


Copy content of Source Term field:

- Send keystroke Ctrl+A.

- Send keystroke Ctrl+C.

Send keystroke Enter.

Paste content of Source Term field.

Send keystroke Tab.

Copy content of Target Term field:

- Send keystroke Ctrl+A.

- Send keystroke Ctrl+C.

Send keystroke Enter.

Paste content of Target Term field.


(I can record this, next weekend.)
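
The keystroke sequence above duplicates an entry inside the CafeTran dialog. A comparable result could be sketched offline in Python instead of AutoHotkey. This sketch assumes the glossary is a plain-text file with one tab-separated entry per line (source term, then target term); the function name `duplicate_entries` and the sample rows are made up for illustration.

```python
def duplicate_entries(lines):
    """Return the glossary rows with each non-empty row repeated once."""
    out = []
    for line in lines:
        row = line.rstrip("\r\n")
        if not row.strip():
            continue  # skip blank lines
        out.append(row)  # original entry, e.g. the singular
        out.append(row)  # copy to edit by hand (add or remove the plural -s)
    return out


# Hypothetical sample rows, assuming a tab-separated glossary layout.
sample = ["device\tdispositivo\n", "\n", "printer\tstampante\n"]
for row in duplicate_entries(sample):
    print(row)
```

Each entry then appears twice, so only the plural ending has to be typed or removed on the second copy.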

OK, it's a lot of work, but I hope it's more effective and rational than with REs. As for macros, one day I'll have to try them.

Yes, I add both forms to my glossary. Even with articles.


BTW: I use macros to duplicate the singular or plural to separate lines, so that I only have to type the plural -s or remove it.



Thank you. My glossary is about one eightieth of yours, so I shouldn't expect problems in this area.


But are you registering both singular and plural forms for English? I don't know what to ask about German and Dutch.

1) Not sure, since you don't use AA

2) My glossary has 1234907 lines at the moment, no delay.

OK, I'm sold.


I've been struggling since the beginning to find the perfect solution for my trilingual glossary, but it's very difficult to arrive at a good compromise, given the different needs of my working language pairs (singular and plural, masculine and feminine, and the like), without being plagued by high fuzziness. Regular expressions seemed to me the best solution until now, but since I'm not using auto-assembly and I rely on a single, large glossary only, as in old-school translation, I feel that I should reconsider my early conviction.


I translate from Japanese and English, and only rarely into English. Therefore, for me it's particularly important to find the right Italian word without too much fuzziness.


So I have two basic questions:

1) Should singular and plural forms for English and Italian as source languages be placed on a single glossary line?

2) If the answer to 1) is yes, the glossary would become very large. Is this a problem in terms of speed?


Thank you

Maybe there has been a misunderstanding since the start of this post. I meant the ability to use percentage values instead of integers in Glossary > Minimal Prefix Length.


Now I know that it's not possible, for CTE doesn't accept percentage values. Hence I wondered whether it might be a future possibility if it makes sense for most language pairs.

Thank you, I accept your explanation. However, just for the sake of trying, when I entered 80% in Prefix matching, CTE didn't let me save the Preferences file upon clicking OK.

> So, can percentage values be used already instead of absolute values? What to insert exactly? Is 80% OK?


Unlike a fixed prefix length, a percentage value introduces an element of fuzziness that you might not be willing to accept. So I don't know if, say, 80% will work for you. Hunspell-based stemming can also be quite imprecise, as you noticed in your example (the word "device" yields the stem "vice", whose translation is unrelated). In my opinion, Prefix matching with a fixed prefix length seems to give the best results.



So, can percentage values be used already instead of absolute values? What to insert exactly? Is 80% OK?


Also, does the issue explained above mean that the Hunspell dictionary is not providing fuzziness functionality?

You can try it. However, a percentage value is not as consistent across terms of various lengths as a fixed one. I wouldn't recommend it.
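
To see why a percentage behaves differently across term lengths, here is a small Python sketch. It assumes, purely for illustration, that a percentage would be converted into a required prefix length by rounding up. `prefix_len_from_percent` is a hypothetical helper, not a CafeTran function, and how CafeTran would interpret a percentage is not known from this thread.

```python
def prefix_len_from_percent(word, percent):
    """Required prefix length if `percent` of the word must match (rounded up).

    Assumption for illustration only; not CafeTran's actual behaviour.
    """
    return max(1, (len(word) * percent + 99) // 100)  # integer ceiling


for term in ["vice", "device", "dispositivo"]:
    print(term, len(term), prefix_len_from_percent(term, 80))
```

Under this assumption, 80% forces a four-letter term to match in full, while an eleven-letter term only needs its first nine characters, so the tolerated variation grows with the length of the term.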

I wonder if a percentage value can be used instead of an absolute value.

> what happens to words shorter than 5 characters?


Nothing. It will be treated as just the word.

Thank you. This seems to have fixed the problem. I'll keep experimenting.

Just one question: if I set the minimal prefix length to 5, what happens to words shorter than 5 characters?


I would recommend keeping the Prefix matching option on. Also, make sure that the "Automatic fragments adjustment" option is enabled in Preferences > Auto-assembling tab. This way, CafeTran learns your first choice among the matched terms and synonyms that share a prefix and applies it for the rest of the current session.


Of course, if you increase the minimal prefix length to 5, the unwanted "devise" term will not be matched in this specific case.
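
The effect of the minimal prefix length can be illustrated with a rough Python sketch, using the "device"/"devise" pair from the original question. This is only an assumed model of prefix matching (two words match when they share at least the minimal number of leading characters, and shorter words match only as whole words); it is not CafeTran's actual implementation, and the function names and the Italian rendering of "devise" are made up for the example.

```python
def common_prefix_len(a, b):
    """Number of leading characters that two strings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_matches(word, glossary, min_prefix_len):
    """Glossary terms that match `word` under the assumed prefix rule."""
    word = word.lower()
    hits = []
    for term in glossary:
        t = term.lower()
        if len(word) < min_prefix_len or len(t) < min_prefix_len:
            # Words shorter than the minimum match only as whole words.
            if word == t:
                hits.append(term)
        elif common_prefix_len(word, t) >= min_prefix_len:
            hits.append(term)
    return hits


# Illustrative glossary; the translation of "devise" is an assumption.
glossary = {"device": "dispositivo", "devise": "escogitare"}

print(prefix_matches("devices", glossary, 4))  # ['device', 'devise']
print(prefix_matches("devices", glossary, 5))  # ['device']
```

With a minimum of 4, "devices" shares "devi" with both entries, so both are candidates. With a minimum of 5, only "device" still qualifies, because "devic" and "devis" differ at the fifth character.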
