
Term recognition (reloaded)

Simple detail question:

How can I make CT recognize this term?



Please note that the term occurs in different forms:

  • l’Amérique du Sud
  • l'Amérique du Sud
  • l`Amérique du Sud

Note the different apostrophes (the third case is quite rare). Which apostrophe is used depends on the program and on some other factors.
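
To make the three variants concrete, here is a minimal Python sketch that folds the apostrophe flavors listed above into the plain ASCII apostrophe, so that all three spellings compare equal. It is only an illustration, not a description of what CafeTran does internally. The names `APOSTROPHE_VARIANTS` and `normalize_apostrophes` are made up for this example, and the left single quotation mark is an extra assumption that is not among the three cases above.

```python
# Map apostrophe look-alikes to the plain ASCII apostrophe (U+0027).
APOSTROPHE_VARIANTS = {
    "\u2019": "'",  # ’ right single quotation mark (typographic apostrophe)
    "\u0060": "'",  # ` grave accent (the rare third case)
    "\u2018": "'",  # ‘ left single quotation mark (assumption, not in the list above)
}

# Translation table built once and reused for every call.
_TABLE = str.maketrans(APOSTROPHE_VARIANTS)


def normalize_apostrophes(text: str) -> str:
    """Return text with all known apostrophe variants replaced by "'"."""
    return text.translate(_TABLE)


samples = ["l’Amérique du Sud", "l'Amérique du Sud", "l`Amérique du Sud"]
normalized = {normalize_apostrophes(s) for s in samples}
print(normalized)  # {"l'Amérique du Sud"}
```

After normalization only one spelling is left, so a single glossary entry could in principle cover all three cases.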


If I understand correctly, I have the following options:

  • Prefix matching: this produces many false matches, depending on the glossary. As a fallback that would be acceptable, but as far as I can see it does not work for multi-word terms such as "Amérique du Sud".
  • Pipe character at the start of the word. Really? For every French word starting with a vowel?
  • Enter the term with the "l'" prefix (in all two or three apostrophe flavors). Seriously?


Perhaps I overlooked something?


The handling of terms with apostrophes concerns users translating from French, Italian, Catalan and many more languages (which I may not be aware of). I do not think this is an exotic problem.



@Igor: I was doing b) because a) only recognizes one-word terms, but not three-word terms.

 

> It's not clear to me either. If you have a two- or three-word phrase in the glossary, then you should expect exact matching for such multiword glossary phrases.

 

But this is exactly the point. They are not recognized when they follow an apostrophe (as said above, the kind of apostrophe may differ, depending on several factors).

 

Yeah, I have been getting that too: entries in my glossary are not highlighted if they touch a comma, an apostrophe, or a number of other characters, which is a pain in the ass.


Michael



If you want a character to be skipped during matching, add it to the "Do not match" list in Preferences. That's a quick way to remove any characters that might interfere with the matching of words or fragments.

Hmm, I checked, and the comma is in that list. I will try to see if it happens today and send in some screenshots. Here is my current list:


,.。:;!¡?¿[]"«»‘’“”„'’()


and written differently:

,

.

:

;

!

¡

?

¿

[

]

"

«

»

'

(

)

This might help with commas, but it does not help with the apostrophe, not even after a restart (for the record: in the cases above, a restart was done as well).

 

So any terms with a comma should be recognized and displayed in the Matchboard. They should also be highlighted in the source segment. There is currently only one limitation to the highlighting: CafeTran does not highlight the phrase in the source segment if such a character is in the middle of the matched phrase. The Matchboard shows such a phrase just fine.

Hm, after re-opening the project (sure, a childish way to see if there might be a Christmas present), first with the old and then with the new build, I see that „Amérique“ (as a one-word glossary entry) is no longer recognised, despite French Hunspell being installed and "Look up word stems" being activated. This is pretty awkward (on top of all the Pandora's box games with "Additional space character" above).

And no, "l'Amérique du Sud" is still not displayed.
FYI: the Frequent words feature displays this:

image

Well, it shouldn't, should it?

No other settings have actually been changed; "Additional space character" and "Do not match" are at their default settings.

 

And clicking on the count for "lAmérique du Sud" gives zero results (maybe it would have found "l'Amérique du Sud"). This should not be the case, however the issue ends up being resolved.

 

1. Remove the apostrophe from the "Do not match" list.

2. Deactivate "Prefix matching" and activate "Look up word stems" for glossaries. You may have too much fuzziness with the two options on at the same time.

3. Restart CafeTran.


With the "Do not match" option, CafeTran removes the listed characters from the matched segments to increase the chance of finding a match. However, "Look up word stems" needs the apostrophe to determine the stem of the word in this case.

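The effect described above can be modelled with a small Python sketch. This is a toy model under my own assumptions, not CafeTran's actual code: `strip_chars` simply deletes every listed character, and `has_whole_word_match` is a hypothetical stand-in for exact whole-word term matching. The character list is the one quoted earlier in this thread.

```python
import re

# "Do not match" list as quoted earlier in the thread.
DO_NOT_MATCH = ",.。:;!¡?¿[]\"«»‘’“”„'’()"


def strip_chars(text: str, chars: str) -> str:
    """Delete every character of `chars` from `text` (toy model of "Do not match")."""
    return "".join(ch for ch in text if ch not in chars)


def has_whole_word_match(term: str, text: str) -> bool:
    """True if `term` occurs in `text` delimited by word boundaries."""
    return re.search(r"\b" + re.escape(term) + r"\b", text) is not None


segment = "Il a visité l'Amérique du Sud, puis l’Asie."

# Case 1: apostrophes are in the list, so the article fuses with the noun.
stripped_all = strip_chars(segment, DO_NOT_MATCH)
print(stripped_all)  # Il a visité lAmérique du Sud puis lAsie

# Case 2: both apostrophe flavors removed from the list first.
without_apostrophes = DO_NOT_MATCH.replace("'", "").replace("’", "")
stripped_keep = strip_chars(segment, without_apostrophes)

print(has_whole_word_match("Amérique du Sud", stripped_all))   # False
print(has_whole_word_match("Amérique du Sud", stripped_keep))  # True
```

In this model, deleting the apostrophe turns "l'Amérique" into the single token "lAmérique", which would be consistent with the "lAmérique du Sud" entry reported by Frequent words. Whether CafeTran behaves exactly like this is an assumption.
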
  • Do not match content: ,.。:;!¡?¿[]{}()"«»‘’“”„‚ (see also here, BTW)
  • Prefix matching deactivated (it always is)
  • Look up word stems activated (and French Hunspell installed)
  • CT restarted


Nope.

What's the glossary word you are trying to match, and to what word in the source segment?

Amérique du Sud => Südamerika

 

For multiword terms, as in your example, just add the article to your term so that it matches the same form in the source segment.
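
If the article has to be part of the glossary entry, the extra entries could at least be generated rather than typed by hand. The following Python sketch is one hypothetical way to do that outside CafeTran; `elided_variants` and the constants are invented for this example, the `d` prefix is an assumption beyond the `l'` case discussed here, and the vowel check is deliberately rough (it ignores the mute h, for instance).

```python
# Apostrophe flavors seen in this thread.
APOSTROPHES = ["'", "’", "`"]

# Elided forms to prepend; "d" is an assumption, only "l" is discussed above.
ELIDED_PREFIXES = ["l", "d"]

# Rough set of initial letters that trigger elision (mute h is ignored).
VOWELS = "aeiouàâäéèêëîïôöùûü"


def elided_variants(term: str) -> list[str]:
    """Return the term plus all prefix/apostrophe combinations, if it starts with a vowel."""
    if not term or term[0].lower() not in VOWELS:
        return [term]
    variants = [term]
    for prefix in ELIDED_PREFIXES:
        for apostrophe in APOSTROPHES:
            variants.append(prefix + apostrophe + term)
    return variants


for source_term in elided_variants("Amérique du Sud"):
    # Tab-separated source/target pair; the target is taken from this thread.
    print(source_term + "\t" + "Südamerika")
```

For "Amérique du Sud" this yields seven source entries (the bare term plus six elided forms) that all map to the same target, which also shows how quickly the glossary grows with this workaround.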

As a really very, very provisional workaround, okay.


But then again, it turns glossary work into a real pain (see here and look for acheter). I have not yet tested converting the glossary into TMX; would that help?
