
Question: "Do Not Match" option

 Hi,


Does anybody understand the logic of the "Do Not Match" option in the Preferences?




I just wonder what the advantages are of using (or not using) these characters: ,.。:;!¡?¿[]{}()"«»‘’“”„‚.
I sometimes feel that CT would give better matches (less false matches) if I simply erased all of them. Any opinion or experience on that matter?

>I sometimes feel that CT would give better matches (less false matches) if I simply erased all of them.


'less false matches' as in matches automatically inserted into the target segment pane that are textually correct but have too many or too few DNM characters?

Yes, that's what I meant. I thought there would be some penalty or warning when this happens.

Or there is simply something I don't understand (like: What is the usefulness or merit of this "Do not Match option"? Why is it there?).

 

>Yes, that's what I meant. I thought there would be some penalty or warning when this happens.


No, it's not there (yet). You can find an old posting of mine here, with some examples. It would be nice if CafeTran could correct these DNM characters in both directions some day. Currently, it's not possible.


There is some kind of penalty: you can get a warning when the characters (or formatting) differ, while the textual content is identical.
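To make that warning condition concrete, here is a minimal sketch of the check it implies: the two strings differ, yet they become identical once every DNM character is stripped. This is an illustration only, not CafeTran's code; the names `DNM_CHARS`, `strip_dnm` and `dnm_mismatch_warning` are hypothetical, the character set is simply the list quoted in this thread, and formatting differences (which CafeTran can also flag) are not modelled.

```python
# Hypothetical DNM set, copied from the list quoted in this thread.
DNM_CHARS = set(',.。:;!¡?¿[]{}()"«»‘’“”„‚')


def strip_dnm(text: str) -> str:
    """Remove every Do Not Match character from text."""
    return "".join(ch for ch in text if ch not in DNM_CHARS)


def dnm_mismatch_warning(segment: str, tm_source: str) -> bool:
    """True when the textual content is identical but the DNM characters differ."""
    return segment != tm_source and strip_dnm(segment) == strip_dnm(tm_source)


if __name__ == "__main__":
    print(dnm_mismatch_warning("Warning!", "Warning:"))  # same words, other punctuation
    print(dnm_mismatch_warning("Warning", "Caution"))    # different words, no DNM warning
```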


>Or there is simply something I don't understand (like: What is the usefulness or merit of this "Do not Match option"? Why is it there?).


It's there to get more matches at the segment and subsegment level. And indeed you get a lot of them in CafeTran. The price you pay for this is that you often have to adjust the DNM characters.
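Both the benefit and the price can be seen in a simplified sketch: if TM entries are looked up by their DNM-stripped source text, "Warning:" finds the entry stored for "Warning!", but the target that comes back still carries the punctuation of the stored entry. This assumes an exact lookup on a normalised key, a deliberate simplification of CafeTran's real fuzzy matching; `build_index` and `lookup` are hypothetical names.

```python
from __future__ import annotations

# Hypothetical DNM set, copied from the list quoted in this thread.
DNM_CHARS = set(',.。:;!¡?¿[]{}()"«»‘’“”„‚')


def strip_dnm(text: str) -> str:
    """Remove every Do Not Match character from text."""
    return "".join(ch for ch in text if ch not in DNM_CHARS)


def build_index(tm: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Key each TM entry by its DNM-stripped source (later duplicates overwrite earlier ones)."""
    return {strip_dnm(src): (src, tgt) for src, tgt in tm.items()}


def lookup(segment: str, index: dict[str, tuple[str, str]]) -> tuple[str, str] | None:
    """Return (status, stored target) or None when nothing matches."""
    hit = index.get(strip_dnm(segment))
    if hit is None:
        return None
    src, tgt = hit
    # The target is returned untouched: its DNM characters may not fit the new segment.
    status = "exact" if src == segment else "exact-ignoring-DNM"
    return status, tgt


if __name__ == "__main__":
    index = build_index({"Warning!": "Avertissement !"})
    print(lookup("Warning:", index))  # a match, but the "!" must be fixed by hand
```

Keying on the stripped text is what produces the extra matches; returning the stored target unchanged is what leaves the DNM adjustment to the translator.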


Another reason why I store everything in my glossary: if a fuzzy match has too many differences in DNM (too many or too few of them), I just press F1 and let CafeTran insert the auto-assembled translation of the segment -- which has all DNM characters at the right place.

One more note: there's a simple solution for the Too Many DNM scenario: just add those characters to Characters to remove and press that keyboard shortcut whenever necessary.
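As a rough model of what such a removal shortcut does to the target text, the sketch below deletes every listed character and tidies up the doubled spaces this can leave behind. It is an assumption about the behaviour, not CafeTran's implementation; `remove_characters` is a hypothetical name. Note that it removes every occurrence, so it only suits cases where none of those characters belong in the segment.

```python
import re


def remove_characters(target: str, to_remove: str) -> str:
    """Delete every character listed in to_remove, then collapse doubled spaces."""
    cleaned = target.translate({ord(ch): None for ch in to_remove})
    return re.sub(r" {2,}", " ", cleaned).strip()


if __name__ == "__main__":
    # A fuzzy match brought in German quotation marks the new source does not have.
    print(remove_characters("Drücken Sie „EIN“.", "„“"))
```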


There's no simple solution for the Too Few DNM scenario, since this would require comparing the textual content of the fuzzy match (FM). If "ON" is enclosed in quotation marks in the source, its translation (EIN) should be enclosed in quotation marks too.
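The quotation-mark case can be sketched as below, but only by assuming the hard part is already solved: the caller must know that "ON" in the source corresponds to "EIN" in the target (for example, from a glossary pair). The function `transfer_quotes`, the quote pairs it checks, and the choice to reuse the source's quote style in the target are all hypothetical.

```python
import re

# Hypothetical opening/closing quote pairs to look for in the source.
QUOTE_PAIRS = (('"', '"'), ("«", "»"), ("“", "”"))


def transfer_quotes(source: str, target: str, src_term: str, tgt_term: str) -> str:
    """Quote tgt_term in target if src_term is quoted in source (first occurrence only)."""
    for open_q, close_q in QUOTE_PAIRS:
        if f"{open_q}{src_term}{close_q}" not in source:
            continue
        if f"{open_q}{tgt_term}{close_q}" in target:
            return target  # already quoted, nothing to add
        pattern = rf"\b{re.escape(tgt_term)}\b"
        # A function replacement avoids backslash handling in the quoted text.
        return re.sub(pattern, lambda m: f"{open_q}{m.group(0)}{close_q}", target, count=1)
    return target  # source term not quoted: leave the target alone


if __name__ == "__main__":
    print(transfer_quotes('Press "ON" to start.', "Drücken Sie EIN, um zu starten.", "ON", "EIN"))
```

A real implementation would also need target-language quote conventions (German „…“, French « … »), which this sketch ignores.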


Quite complex to code, I'd say. If it were easy, it would already be in CafeTran.

@cafetran-training: I just press F1 and let CafeTran insert the auto-assembled translation of the segment -- which has all DNM characters at the right place.

So we're back to AA (auto-assembling), which is not a solution for me (JP-FR or EN-FR (Canada)), since AA results are always very poor with these language pairs. (If I'm right, AA is mostly useful when you have a huge Big Mama like the DGT.)

Anyway, I will do some tests without the ,.。:;!¡?¿[]{}()"«»‘’“”„‚. and see what happens...
Aaaargh, I remember. If I remove all the ,.。:;!¡?¿[]{}()"«»‘’“”„‚, I don't get any fuzzy matches, none at all (but of course concordance search gives results). Why...?

 

>If I'm right, AA is mostly useful when you have a huge Big Mama like the DGT.


There are different views on that. Some say: store everything in your Memory to make AA useful; others say: store everything in your glossary to get the best result from AA.


It all depends on a lot of things:


  • SL
  • TL
  • Text type
  • etc.

I don't seem to have that problem, although I remember I once complained about one-word matches from a TM for segments that ALWAYS went wrong. As if on purpose. Like: Warning | WARNING | Warning! | Warning: | WARNING: | WARNING! And then Igor did something with the metaphysics of the quantum-mechanical polarisation of the nothingness, and it was OK.


H.

Alain: AA is mostly useful when you have a huge Big Mama like the DGT.


AA is always useful, because it proposes the best solution based on your settings and CT's algorithms, no matter the size of your resources.  However, insertion of AA matches can be a tremendous pain in the arse, even for languages that are very suitable for it (e.g. because of word order differences). On the other hand, it takes exactly one keystroke, any keystroke, to get rid of the inserted AA.


H.

>I sometimes feel that CT would give better matches (less false matches) if I simply erased all of them.


I wouldn't count them as false matches. They are very high fuzzy matches that get the status of exact matches, so auto-assembling picks them and automatically offers them as the translation of the target segment. The translator usually corrects the punctuation in such matches and moves on.

