
REQ: Recognition of terms with apostrophes?

About 3 months ago, we had a discussion here, with no result. To keep it short:

  • Terms with one word behind a straight apostrophe are sometimes recognized*
  • Terms with several words behind a straight apostrophe are not recognized
  • Terms with one word behind a curly apostrophe are not recognized
  • Terms with several words behind a curly apostrophe are not recognized

*You need to install the source language spell checker and the word must be inside the dictionary.
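To illustrate the underlying issue: the straight apostrophe (U+0027) and the curly apostrophe (U+2019) are distinct Unicode code points, so any exact string comparison fails as soon as the glossary and the source segment use different variants. A minimal sketch (my own illustration, not CafeTran's actual matching code):

```python
# Sketch: why exact term matching fails across apostrophe variants.
APOSTROPHES = {"\u0027", "\u2019", "\u02BC"}  # straight, curly, modifier letter

def normalize_apostrophes(text: str) -> str:
    """Map every apostrophe variant to the straight one (U+0027)."""
    for ch in APOSTROPHES:
        text = text.replace(ch, "'")
    return text

segment = "Les conditions de l\u2019expérience"  # curly apostrophe, as Word inserts it
term = "l'expérience"                            # straight apostrophe in the glossary

print(term in segment)  # False: the code points differ
print(normalize_apostrophes(term)
      in normalize_apostrophes(segment))  # True after normalization
```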


Up to now, no really suitable workarounds have been presented. Is there any hope that this very basic point will work in the near future?



Add the apostrophes to the "Do not match" list in Edit > Preferences > Memory tab. No spellchecker dictionary is needed.

No, this does not work (including – of course – a restart after adding the straight and the curly apostrophes).

Looking for the term with the quick search bar works (so it is there in the glossary), but this is not a feasible workaround, neither for translation nor for proofreading/checking. To be absolutely sure, I even copied the apostrophe in question from the source segment (just to check with the correct curly apostrophe).

It works in the CTE 2018 Akua version during automatic matching, both with single word and multi-word entries. 

I cannot rule out that some other setting is preventing the automatic recognition, which at the very least does not work reliably.

Check with straight apostrophes

image

Here, "accise" is recognized, "autres" not.

And here with another flavor (this apostrophe is also in the DNM field):

image

Here, "experience" is not recognized.

In both cases, the two missing terms are not displayed in the glossary pane. And they are not isolated cases. Some more examples:

image

"humanité"

image

"infraction"

Correction: the apostrophe before "expérience" (with and without accent in the TB) is a "correct" curly one, I was unsure about that.

 

You mean the apostrophe characters in the middle of words – not quotes. In this case, the straight apostrophe works with the "Prefix matching" option for the glossary enabled.

This works sometimes, but not always.


image

The same problems as in the first posting remain, only that no spellchecker is necessary, plus:

  • prefix matching causes a lot of noise and a big number of unwanted term matches
  • but even with Prefix matching activated, the same terms are sometimes recognized and sometimes (most of the time) not
  • the curly apostrophe is the more common apostrophe, at least in Word documents. Some files even mix both of the most common apostrophes, depending on where text chunks come from (copied from text files, Excel files etc.)
  • ... and we are not even speaking of two (or more) word matches



If you wish to keep exactness of your glossary terms, that is, no prefix matching,  just provide an exact term (with the apostrophe too) in your glossary, as a standalone term or source side synonym.

This is a viable solution from the theoretical point of view; from the practical point of view, it isn't at all. It would imply:
  • at least nine variants to fetch the most common variants of one term (see here: three apostrophe variants with the three most common letters)
  • not to talk about some verbs that might imply more letters to come (see here, look for acheter)
  • even then, a quite basic task such as working with client glossaries – if you do not want to rework (in many cases regularly changed and updated) glossaries that can contain several hundred or thousands of terms (the case of Volkswagen) – cannot be accomplished with CT. Is this right?
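To make the combinatorics concrete, here is a hypothetical generator for the prefixed variants a glossary entry would need. The letter and apostrophe sets are assumptions based on the discussion above (three common elision letters, three apostrophe forms):

```python
import itertools

LETTERS = ["l", "d", "j"]  # the three most common French elision letters
APOSTROPHES = ["\u0027", "\u2019", "\u02BC"]  # straight, curly, modifier letter

def elision_variants(term: str) -> list[str]:
    """All prefixed spellings one glossary term would need as synonyms."""
    return [letter + apo + term
            for letter, apo in itertools.product(LETTERS, APOSTROPHES)]

print(len(elision_variants("expérience")))  # 9 extra entries for a single term
```

And that is before less common prefixes ("m’", "t’", "s’", "qu’" …) or verb forms are taken into account, so maintaining this by hand across a large client glossary quickly becomes unrealistic.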

In the end, we're not talking about an exotic Zulu dialect. French is, besides English and German, one of the most common working languages in the EU, while another affected language, Italian, is at least an official language of the EU. Perhaps I take too naive a view of finding a simple string (the term) inside another string (the segment) and this point is really hard to fix, but on the other hand, this apostrophe problem is rather unique.

When I provide a prefix matching solution to increase the fuzziness, you complain about the fuzziness. When I suggest keeping the glossary terms exact, you complain about them being just exact.


Perhaps another solution will be figured out in the future, although it might require checking each segment word for all the variants of apostrophes. I am not sure other users would accept the speed penalty for such a complex word analysis. Probably some neural network approaches would solve your problem.

> When I suggest keeping the glossary terms exact, you complain about them being just exact.

I don't complain about them being exact, but about entering nine variants for just one term (with the risk that some more get lost) and the impossibility of working with many client glossaries. In the end it is "just" about separating words with apostrophes between them and counting/considering them as two (CT counts them as one, BTW).

 

> In the end it is "just" about separating words with apostrophes between them and counting/considering them as two (CT counts them as one, BTW).


So how would you separate and count the words such as can't, don't, shouldn't etc? 

The question of counting words belongs in another thread, but I'd suggest having a look at other office programs and tools (quick check: memoQ indeed counts them as one; I do not have any Studio project at hand).

However, in EN there are only about a dozen words with the single letter "t" concerned (do, does, has, have, had, must, should, could, will, did, can etc.). How much harm would it do to treat them as two words?
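The distinction asked about above can in fact be drawn mechanically: French elision prefixes sit at the start of a word, while English "n't" contractions have the apostrophe word-internally. A rough sketch of such a tokenizer (the prefix set is my own illustrative assumption, not an exhaustive grammar):

```python
import re

# Split French elisions (l', d', j', s', qu' …) into two tokens while
# keeping English contractions (can't, don't) intact: the elision letter
# is preceded by a word boundary, the "n" in "can't" is not.
FRENCH_ELISION = re.compile(r"\b([ljdmtsnc]|qu)[\u0027\u2019](?=\w)", re.IGNORECASE)

def count_words(text: str, split_elisions: bool) -> int:
    if split_elisions:
        # "l’expérience" -> "l expérience": the elided article becomes its
        # own token and the noun can then be matched/counted on its own.
        text = FRENCH_ELISION.sub(r"\1 ", text)
    return len(re.findall(r"[\w\u0027\u2019]+", text))

print(count_words("can't don't shouldn't", split_elisions=True))           # 3: contractions stay whole
print(count_words("l\u2019expérience de l\u2019art", split_elisions=True)) # 5: elisions are split
```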

 

A very prominent Dutch user pointed me to a possible solution:

• Add the corresponding characters to the field "Additional space characters (Unicode)" under Preferences > Memory, e.g. "U+006CU+2019" for "l’".
• Now the terms are recognized.

What are the disadvantages of this provisional workaround?
• the feature mentioned above is not documented at all (see https://cafetran.freshdesk.com/support/search/solutions?term=additional+space)
• it will most probably have a negative impact on match values (and even on TM content?)
• there are three most probable letters; combined with the most probable apostrophes (straight and curly), that means at least six more entries in a relatively small field. Entering every possible combination of letters (including the less common ones, e.g. "m’", "t’", etc.) and apostrophes seems nearly impossible
• I am unsure if this workaround really fetches all concerned terms
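As a very rough model of what this workaround appears to do (an assumption about CafeTran's behaviour, not verified against its code): each entry in the field seems to act as an extra separator sequence for the tokenizer, which would also explain the suspected side effects.

```python
# Assumed behaviour: each "Additional space characters" entry is a
# sequence the tokenizer treats as a separator, so the term after the
# elision becomes a matchable token of its own.
EXTRA_SEPARATORS = ["l\u2019", "d\u2019", "j\u2019"]  # l’, d’, j’ (curly only)

def tokenize(segment: str) -> list[str]:
    for sep in EXTRA_SEPARATORS:
        segment = segment.replace(sep, " ")
    return segment.split()

print(tokenize("Les conditions de l\u2019expérience"))
# ['Les', 'conditions', 'de', 'expérience'] – "expérience" is now a token,
# but the elided "l" is silently dropped, which may distort match values
```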

So it cannot be more than a very provisional workaround. From the postings above I see there won't be another solution. That's a pity.

 
