
Apostrophes in source and recognition of glossary terms/TM fragments

Hello,


I’m opening this discussion here in relation to yesterday’s post in the CafeTran support section of the ProZ forums: https://www.proz.com/forum/cafetran_support/332131-glossary_problem_regarding_words_with_apostrophe.html


In short, the question is: In CafeTran, what are the requirements for ensuring the systematic recognition of terms (glossary entries/fragments) that appear with an apostrophe in the source text?


This is especially important for those who translate FROM French, a language which uses apostrophes in abundance. This has been discussed in the past, but I prefer to create a new post since related changes have been introduced in CafeTran.


I understand that one of the methods to achieve this is via the “Prefix matching” feature, as modified in build 2017122101:


- normalization of source segment apostrophes for the “Prefix matching” feature (all the major variants of apostrophes appearing in the source segments are internally treated as the straight apostrophe to increase the probability of finding the match).
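To make sure I read that changelog entry correctly, here is a minimal sketch of what such a normalization could look like. It is purely illustrative and not CafeTran's actual code: the list of apostrophe variants and the names `normalize_apostrophes` and `find_terms` are my own assumptions.

```python
# Hypothetical illustration only; this is NOT CafeTran's implementation.
# Assumed set of "major variants": ’ ‘ ʼ ` ´
APOSTROPHE_VARIANTS = "\u2019\u2018\u02BC\u0060\u00B4"

# Translation table mapping every variant to the straight apostrophe.
_TO_STRAIGHT = str.maketrans({ch: "'" for ch in APOSTROPHE_VARIANTS})


def normalize_apostrophes(text: str) -> str:
    """Replace each apostrophe variant with the straight apostrophe."""
    return text.translate(_TO_STRAIGHT)


def find_terms(segment: str, glossary: list[str]) -> list[str]:
    """Return glossary terms found in the segment after normalizing both sides."""
    normalized_segment = normalize_apostrophes(segment).lower()
    return [
        term
        for term in glossary
        if normalize_apostrophes(term).lower() in normalized_segment
    ]
```

With this kind of normalization, a glossary entry typed with a straight apostrophe (l'offre) would still be found in a source segment that uses the curly one (l’offre).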


What settings should be used for this to work?


Should the various types of single apostrophes be removed from the “Do not match” option in Preferences > Memory? 


Could another method make use of the “Look up word stems” feature, which can be enabled for Fragments or Glossaries in the Memory or Glossary tab of the Preferences?


In that case, what settings should be used?


If “Look up word stems” is enabled, CafeTran uses word stemming when querying for fragments or glossary matches respectively. CafeTran provides stemming based on the Hunspell dictionary.
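As I understand it, stem-based lookup works roughly like the sketch below. This is only a toy illustration: the crude suffix stripper `toy_stem` stands in for the Hunspell-based stemming, and the suffix list and function names are assumptions of mine, not anything taken from CafeTran.

```python
# Toy stand-in for a Hunspell-based stemmer; the suffix list is an assumption.
FRENCH_SUFFIXES = ("es", "s", "x")


def toy_stem(word: str) -> str:
    """Lower-case the word and strip one plural-like suffix, keeping at least 3 letters."""
    word = word.lower()
    for suffix in FRENCH_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lookup_by_stem(segment_words: list[str], glossary: dict[str, str]) -> dict[str, str]:
    """Match source words to glossary entries by comparing stems, not surface forms."""
    index = {toy_stem(source): target for source, target in glossary.items()}
    return {
        word: index[toy_stem(word)]
        for word in segment_words
        if toy_stem(word) in index
    }
```

Here the plural "contrats" in a source segment would match a glossary entry stored as "contrat", because both reduce to the same stem. Note that the stemming is applied to source-language words only.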


If I understand correctly, since the stemming applies to the source text, not the target text, one needs to have a spellchecker installed for the source language, not just the target language, for this feature to work. Is that correct?


Thanks in advance!


Jean


Please see the corresponding threads in this forum. It took me months and many, many nerves to convince Igor of this necessity. If I am not mistaken, you need to delete the apostrophes from the Do not match field. And: It does not work for all apostrophes. It is a pity that this has not been documented yet.


But hey, it's only French. 

Thanks tre, and also thanks for your efforts to highlight this issue.


I skimmed through the corresponding threads yesterday (I also participated in some of them, although I don't generally translate from French), but I have not been able to systematically reproduce a solution that I could then explain to a fellow translator.


Furthermore, while checking the ChangeLog, I see three modifications in relation to apostrophes:


The latest build (2017122101) also features the following improvements:

[…]

- normalization of source segment apostrophes for the “Prefix matching” feature (all the major variants of apostrophes appearing in the source segments are internally treated as the straight apostrophe to increase the probability of finding the match).


The new build (2018031501) of update 2 is available with the following: Automatic recognition of segment words preceded by the curly apostrophe. The user needs to remove the curly apostrophe from the "Do not match" list in Edit > Preferences > Memory tab to have it matched.


The new build of CTE 2019 Forerunner (2018092401) is available for download, with the following improvements to the spell-checker function:

[…]

- recognition of words with the curly apostrophe (instead of the typewriter/straight apostrophe) inside.


I would certainly like to document this in my reference files as well, but I cannot only do so once I understand exactly what steps are involved.


As far as I have understood, the apostrophes need to be removed from the Do not match field in Preferences > Memory.


I also understand that "Prefix matching" and "Look up word stems" should not be used together. Can either of these settings be used to achieve the desired result?


Am I missing any other requirements for achieving systematic recognition of terms in conjunction with apostrophes in the source text?


I hope Igor can provide the steps involved, so that they can then be documented and reused by anyone.

*but I CAN only do so once I understand exactly what steps are involved.

Sorry, when speaking of documentation I referred to Igor's docs, not yours (and you do a great job, thanks, Jean!).


Please see the ProZ.com thread for my solution.

