
REQ: Recognition of terms with apostrophes?

About three months ago we had a discussion here, with no result. To keep it short:

  • Terms with one word behind a straight apostrophe are sometimes recognized*
  • Terms with several words behind a straight apostrophe are not recognized
  • Terms with one word behind a curly apostrophe are not recognized
  • Terms with several words behind a curly apostrophe are not recognized

*You need to install the source language spell checker and the word must be inside the dictionary.
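
To make the request concrete, here is a minimal Python sketch of the behaviour being asked for. It is not a description of how CafeTran works internally. It assumes that the straight apostrophe, the curly apostrophe and the grave accent are all treated as word boundaries before the glossary lookup; the names `tokenize` and `contains_term` are made up for this illustration.

```python
import re

# Apostrophe-like characters mentioned in this thread:
# straight ('), curly (U+2019) and the grave accent (`) from some OCR output.
APOSTROPHES = "'\u2019`"
_TO_SPACE = str.maketrans({ch: " " for ch in APOSTROPHES})

def tokenize(text):
    """Lowercase the text and split it into words, treating every
    apostrophe variant as a word boundary."""
    return re.findall(r"\w+", text.lower().translate(_TO_SPACE))

def contains_term(segment, term):
    """Return True if the words of `term` occur consecutively in `segment`."""
    seg, words = tokenize(segment), tokenize(term)
    n = len(words)
    return n > 0 and any(seg[i:i + n] == words for i in range(len(seg) - n + 1))

# The example from this thread: a multi-word term behind a curly apostrophe.
print(contains_term("...pourvu d\u2019au moins une ...", "au moins une"))
```

With a lookup of this kind, a glossary entry such as "au moins une" would be found regardless of which apostrophe precedes it, and no extra glossary variants would be needed.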


Up to now, no really suitable workarounds have been presented. Is there any hope that this very basic point will work in the very near future?



I do not remember exactly, but the issue is about deleting the apostrophe, not adding it. If I remember correctly, with a later update this was not necessary any more.


It must be an apostrophe (straight and/or curly). Some OCR software even uses a grave accent instead: "pourvu d`au moins une" ...

I mean apostrophes, not semicolons, of course! :D

Hi all, I am having the same issue with a French source text. I use a glossary containing the term "au moins une", and in the source sentence, "...pourvu d’au moins une ...", the term is not recognized. Tre, you mentioned that the issue is resolved, but does it require any special setting to work? I have all possible flavors of semicolons in the "Do not match" field, but does this actually help, or does it only apply to translation memories? I am still new to CafeTran and feeling a little lost at the moment... Thanks in advance for any help.

Praise the Lord: the newest update resolves this issue. Thanks, Igor.

 


Addendum: When using the solution above, any term containing an "l" is no longer recognized ...

 

Indeed, this is only a rhetorical question. I think finding a solution to this should be in the foreground.

 

> Is this deemed as acceptable?


Are you asking a developer (he doesn't know the answer) or the person who posted this workaround? Or is this a rhetorical question? 

Just as predicted above, this "workaround" is a choice between plague and cholera. The segments should read "35 ml d'huile" and "-niveau d'huile". Is this deemed as acceptable?

 

A very prominent Dutch user pointed me to a possible solution:

• Add the corresponding characters to the field "Additional space characters (Unicode)" under Preferences > Memory, e.g. "U+006CU+2019" for "l’".
• Now the terms are recognized.
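
Here is a small Python sketch of what this setting might be doing. This is purely an assumption, since the field is undocumented: the sketch supposes that every code point listed in the field is replaced by a space on its own, rather than the two-character sequence "l’" being treated as a unit. The function name `apply_space_chars` is hypothetical.

```python
import re

def apply_space_chars(text, field):
    """Replace every code point listed in `field` (e.g. "U+006CU+2019")
    with a space. Assumed behaviour, not confirmed for CafeTran."""
    # Extract the hexadecimal code points from the "U+XXXX" notation.
    chars = [chr(int(h, 16)) for h in re.findall(r"U\+([0-9A-Fa-f]{4,6})", field)]
    return text.translate(str.maketrans({c: " " for c in chars}))

# "l" (U+006C) and the curly apostrophe (U+2019) are each turned into a space.
segment = "35 ml d\u2019huile"
print(apply_space_chars(segment, "U+006CU+2019").split())
```

Under that assumption the apostrophe no longer glues "d" to the following word, but every "l" in the segment is lost as well, which would be consistent with the side effect reported elsewhere in this thread.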

What are the disadvantages of this provisional workaround?
• the feature mentioned above is <a href ="https://cafetran.freshdesk.com/support/search/solutions?term=additional+space&authenticity_token=rIuj6iqSnyNPoYMrmyb1HYyelPIUaoOYzKDUQUN8W1w%3D">not documented at all</a>
• it will most probably have a negative impact on match values (and even on TM content?)
• with the three most probable letters and the two most probable apostrophes (straight and curly), at least six more entries are needed in a relatively small field. Covering every possible combination of letters (including the less common ones, e.g. "m’", "t’", etc.) and apostrophes seems nearly impossible
• I am unsure whether this workaround really catches all affected terms

So it can only be a very provisional workaround. From the postings above I gather there won't be another solution. That's a pity.

 

The question of counting words belongs in another thread, but I'd suggest having a look at other office programs and tools (quick check: memoQ does indeed count them as one; I do not have a Studio project at hand right now).

However, in EN only about a dozen words are affected, all with the single letter "t" (do, does, has, have, had, must, should, could, will, did, can etc.). How much harm would it do to treat them as two words?

 

> In the end it is „just“ about separating words with apostrophes between  them and counting/considering them as two (CT counts them as one, BTW).


So how would you separate and count the words such as can't, don't, shouldn't etc? 

> When I suggest keeping the glossary terms exact, you complain about them being just exact.

I don't complain about them being exact, but about having to enter nine variants for just one term (with the risk that some more get lost) and the impossibility of working with many client glossaries. In the end it is „just“ about separating words with apostrophes between them and counting/considering them as two (CT counts them as one, BTW).

 

When I provide a prefix matching solution to increase the fuzziness, you complain about the fuzziness. When I suggest keeping the glossary terms exact, you complain about them being just exact.


Perhaps another solution will be figured out in the future, although it might require checking each segment word for all the variants of apostrophes. I am not sure other users would accept the speed penalty for such a complex word analysis. Probably some neural network approaches would solve your problem.

This is a viable solution from a theoretical point of view; from a practical point of view, it isn't at all. It would imply
  • at least nine variants to cover the most common forms of one term (see here = three apostrophe variants with the three most common letters)
  • not to mention some verbs that might involve even more letters (see here, look for acheter)
  • even then, a quite basic task such as working with client glossaries – if you do not want to rework glossaries that can contain several hundreds or thousands of terms (case of Volkswagen) and that are in many cases regularly changed and updated – cannot be accomplished with CT. Is this right?
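
To illustrate the arithmetic, the following Python sketch lists the variants that a single glossary entry would need. The letters l, d and n are only an assumed choice for the "three most common" elided letters (which ones actually apply depends on the term), and the helper name `elided_variants` is hypothetical.

```python
from itertools import product

APOSTROPHES = ["'", "\u2019", "`"]  # straight, curly, grave accent (OCR)
LETTERS = ["l", "d", "n"]           # assumed "three most common" elided letters

def elided_variants(term, letters=LETTERS, apostrophes=APOSTROPHES):
    """Return every letter + apostrophe + term combination."""
    return [f"{letter}{apo}{term}" for letter, apo in product(letters, apostrophes)]

variants = elided_variants("huile")
print(len(variants))  # 9
for variant in variants:
    print(variant)
```

Three letters times three apostrophes gives nine entries for one term, before any rarer letters are considered, and the same expansion would have to be repeated for every affected term in a client glossary.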

In the end, we're not talking about an exotic Zulu dialect. French is, besides English and German, one of the most common working languages in the EU, while another case, Italian, is at least an official language of the EU. Perhaps my view of finding a simple string (term) inside another string (the segment) is too naive and this point is really hard to fix, but on the other hand this apostrophe problem is rather unique.

If you wish to keep your glossary terms exact, that is, without prefix matching, just provide the exact term (with the apostrophe too) in your glossary, as a standalone term or as a source-side synonym.
