
Term recognition (reloaded)

Simple detail question:

How can I make CT recognize this term?

image


Please note that the term occurs in different variants:

  • l’Amérique du Sud
  • l'Amérique du Sud
  • l`Amérique du Sud

Note the difference in the apostrophes (the 3rd case is quite rare). Which apostrophe is used depends on the program and some other factors.
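To make the difference between the three variants visible, here is a minimal Python sketch that prints the code point and Unicode name of the apostrophe in each of them. The function name `describe_apostrophe` is made up for illustration; the apostrophes are written as escapes so that no editor can silently change them.

```python
import unicodedata

# The three variants from the list above, with the apostrophe written as an
# explicit escape: typographic apostrophe, typewriter apostrophe, grave accent.
variants = [
    "l\u2019Amérique du Sud",
    "l\u0027Amérique du Sud",
    "l\u0060Amérique du Sud",
]

def describe_apostrophe(text):
    """Return (code point, Unicode name) of the character after the leading 'l'."""
    ch = text[1]
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

for variant in variants:
    print(variant, describe_apostrophe(variant))
```

The three characters are U+2019 (RIGHT SINGLE QUOTATION MARK), U+0027 (APOSTROPHE) and U+0060 (GRAVE ACCENT), so a character list that only contains one of them will not cover the other two.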


If I understand correctly, I have the following options:

  • Prefix matching: this produces many false matches, depending on the glossary. That would be acceptable as a fallback, but as far as I can see it does not work for multi-word terms such as "Amérique du Sud".
  • A pipe character at the start of the word. Really? For every French word starting with a vowel?
  • Entering the term with the "l'" (in all two or three apostrophe flavors). Seriously?
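What I would expect instead can be sketched in a few lines of Python. This is only an illustration of the desired behaviour, not how CafeTran works internally: the elision list, `strip_elision` and `contains_term` are all hypothetical, and punctuation handling is left out.

```python
import re

# Hypothetical list of French elided forms followed by any of the three
# apostrophe variants (U+2019, U+0027, U+0060). Extend as needed.
ELISION = re.compile("^(?:l|d|j|m|n|s|t|c|qu)[\u2019'`]", re.IGNORECASE)

def strip_elision(token):
    """Remove a leading elided article or pronoun such as l' or d' from a token."""
    return ELISION.sub("", token, count=1)

def contains_term(segment, term):
    """Exact multi-word lookup after stripping elisions from the segment tokens."""
    seg_tokens = [strip_elision(t) for t in segment.split()]
    term_tokens = term.split()
    n = len(term_tokens)
    return any(
        seg_tokens[i:i + n] == term_tokens
        for i in range(len(seg_tokens) - n + 1)
    )

for apostrophe in "\u2019'`":
    segment = f"Il a visité l{apostrophe}Amérique du Sud en 2010"
    print(contains_term(segment, "Amérique du Sud"))
```

With such a preprocessing step, a single glossary entry "Amérique du Sud" would cover all three apostrophe flavors, and also "en Amérique du Sud", without prefix matching.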


Perhaps I overlooked something?


The handling of terms with apostrophes concerns users translating from French, Italian, Catalan and many more languages (that I may not be aware of). I do not think this is an exotic problem.



It's all the fault of that list:


Edit > Preferences > memory > Do not match


Remove the characters that are causing the problem from this list, and your problem will be solved (I think), but in doing so, you will cause new problems ;-)

Hm, I prefer to keep my hands off this list, as it will indeed create new problems (lowering the match rates of 100 % matches and some other surprises).

It is only a minor but permanent annoyance, though it could be considered a basic feature that is missing.

Yeah, same here, I try not to touch that list, as it is a Pandora's box.

You can tick the "Look up word stems" glossary feature, provided that you have the Hunspell spell checker dictionaries installed for your language.

"Look up word stems" is activated, but no term is recognized. Hunspell is installed (for German, of course, not for French).

 

Since you want to look up stems of the source terms, the French Hunspell dictionary needs to be installed too.

The CafeTran GUI strongly suggests that only one spell checker can be installed (or rather that any other one will be overwritten).
But as it happens, I already have it installed.

image

No term recognition.


Apparently, the Hunspell spell checker does not extract a stem from l’Amérique.

I will check this again later. Am I right in assuming that Hunspell only extracts stems it knows? E.g. these super-complicated chemistry terms, such as Ammoniumtetraphenylcyclopentadienone, won't be extracted, as Hunspell does not know them?

Some kind of GUI telling the user which spell checkers are installed and which are not would be nice (or at least an "open folder where the spell check files reside" command). And is the presence of the spell check file in the corresponding folder a guarantee that the spell checker is installed?

Am I right in assuming that Hunspell only extracts stems it knows?


Probably yes. You may need to ask the Hunspell developers about the exact functioning of its algorithms.


And is the presence of the spell check file in the corresponding folder a guarantee that the spell checker is installed?


Yes, if you install the right file for your language. You may need to activate the Look up stems function in Preferences > Glossary tab.


 


Okay, these are my next results:

a) I added "Amérique" as a one-word glossary entry. It indeed works like a charm: "Amérique" is recognised. Apparently the "Look up word stems" glossary feature (provided that Hunspell is installed) only works with one-word terms (I assume that for French, Hunspell only has one-word entries).
b) I added "U+0041" (this is the apostrophe concerned) to Additional space characters, with a comma, as prescribed (the only other entry is the non-breaking space). But now neither Amérique nor Amérique du Sud is recognised (NB: I did not restart CT, but only moved up and down to get this result). Can we call this Pandora's box?

And now?
No chance to recognise two- or more-word glossary entries with apostrophes?

 

I forgot the last point:

c) After adding U+0041 (the apostrophe concerned) to Additional space characters, CT does not even recognise "en Amérique du Sud". After deleting it from this field again, it is of course recognised. Does that make sense?
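One possible reading of results b) and c), sketched in Python: U+0041 is the code point of the Latin capital letter A, while the apostrophes are U+2019 and U+0027. If the value entered really was U+0041, and if CT splits segments on whitespace plus the additional space characters (an assumption, the `tokenize` function below is only a hypothetical stand-in for that behaviour), then the "A" of "Amérique" itself becomes a separator.

```python
import re
import unicodedata

def tokenize(segment, extra_separators=""):
    """Split a segment on whitespace plus the given extra separator characters.

    This only models the assumed effect of "Additional space characters";
    it is not CafeTran's actual tokenizer.
    """
    pattern = "[\\s" + re.escape(extra_separators) + "]+"
    return [token for token in re.split(pattern, segment) if token]

print(unicodedata.name("\u0041"))  # LATIN CAPITAL LETTER A

# U+0041 as separator: the word "Amérique" loses its first letter.
print(tokenize("en Amérique du Sud", "\u0041"))

# U+2019 as separator: the elided article is split off and "Amérique" survives.
print(tokenize("l\u2019Amérique du Sud", "\u2019"))
```

Under these assumptions the first call yields the tokens "en", "mérique", "du", "Sud", so neither "Amérique" nor "Amérique du Sud" can be found, whereas with U+2019 the tokens are "l", "Amérique", "du", "Sud".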

 

Why do you do b) if a) works "like a charm" for you? I wonder what you wish to achieve by making CT treat your apostrophe as a space.


No chance to recognise two- or more-word glossary entries with apostrophes?


It's not clear to me either. If you have a two- or three-word phrase in the glossary, you should expect exact matching for such multi-word glossary phrases. If you expect fuzzy matching, use translation memories instead.

Next round, perhaps?
I added U+002D to play around with the hyphen.
This is what it recognised:

image

This had not been recognised before, and I would not even expect it (this tag is bad luck, IMHO).

At least in this case "France-Amérique du Sud" (without tag) is recognised, which was my intention.

 
