Some days ago I made the following discovery:
Left side: my TM
Right side: Trados TM (customer)
Both with the same priority.
Only 58 %? Why? Only one word differs, and most of its letters are the same.
For Latin-based languages, CafeTran compares whole words in a segment, not single characters within words, which would be much slower. This is a compromise between speed, efficiency, and accuracy and, most important of all, it preserves the overall speed of the dynamic matching features such as finding hits/fragments and auto-assembling them. Switching to character-based calculation would come at the cost of the overall responsiveness of the matching results.
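To illustrate the difference between the two approaches, here is a minimal Python sketch. It is not CafeTran's actual algorithm or formula; it simply uses the standard library's difflib.SequenceMatcher as a stand-in, and the function names and sample segments are made up for the illustration.

```python
from difflib import SequenceMatcher


def word_based_ratio(a: str, b: str) -> float:
    """Similarity computed over whole words (hypothetical stand-in)."""
    return SequenceMatcher(None, a.split(), b.split(), autojunk=False).ratio()


def char_based_ratio(a: str, b: str) -> float:
    """Similarity computed over single characters (hypothetical stand-in)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


if __name__ == "__main__":
    old = "Tighten the screw M8"
    new = "Tighten the screw M10"
    # One changed word out of four weighs heavily in the word-based score,
    # while the character-based score barely notices the change.
    print(f"word-based: {word_based_ratio(old, new):.2f}")
    print(f"char-based: {char_based_ratio(old, new):.2f}")
```

In a short segment a single differing word is a large share of all words, so a word-based score drops sharply, whereas a character-based score stays high because most letters still line up.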
> Is the penalty for a missing word the same as for a different word, such as a number, even one with the same number of decimals (this would be the most plausible explanation)?
Yes, the penalty is the same, as the different number is treated like a different word.
> Should these propagated segments with different numbers not have a much higher fuzzy match rate?
You may be right, and halving the penalty for differing numbers is something to consider. My only worry is keeping it fast. Each word in a segment would need to be checked to see whether it is a number. For short segments that would be okay, but imagine the never-ending sentences in legal translations.
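A rough Python sketch of what a halved penalty for numbers could look like. This is only an assumption about one possible approach, not how CafeTran works or will work: the scoring formula, the NUMBER pattern, and the names is_number and fuzzy_rate are all invented for the example.

```python
import re
from difflib import SequenceMatcher

# Assumed definition of "a number": digits with optional . or , groups.
NUMBER = re.compile(r"^\d+(?:[.,]\d+)*$")


def is_number(token: str) -> bool:
    return NUMBER.match(token) is not None


def fuzzy_rate(source: str, candidate: str, number_penalty: float = 0.5) -> float:
    """Word-based match rate; a number replaced by a number costs less."""
    a, b = source.split(), candidate.split()
    if not a and not b:
        return 1.0
    penalty = 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        left, right = a[i1:i2], b[j1:j2]
        # Words paired one-to-one: reduced cost only for number vs. number.
        for x, y in zip(left, right):
            penalty += number_penalty if is_number(x) and is_number(y) else 1.0
        # Words left over (missing or extra) always cost the full penalty.
        penalty += abs(len(left) - len(right))
    return max(0.0, 1.0 - penalty / max(len(a), len(b)))


if __name__ == "__main__":
    print(fuzzy_rate("Torque 25 Nm", "Torque 30 Nm"))        # halved penalty
    print(fuzzy_rate("Torque 25 Nm", "Torque 30 Nm", 1.0))   # full penalty
    print(fuzzy_rate("Use screw M8", "Use screw M10"))       # not a plain number
```

The extra cost is one regex check per differing word, so words that already match are never tested. Letter-number combinations such as M8 are deliberately not treated as numbers here and keep the full penalty.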
See the following screenshots with longer and short sentences:
The screw example shows that for letter-number combinations (mostly product names or, here, screw sizes) a higher match rate might not be welcome (the Volvo segment is for our Dutch superuser).
The test I did was with a simple Word document, so there were no tags inside the text.
Before coming up with a tuned-up solution for numbers, why not simply lower the fuzzy match display percentage for such short technical segments (e.g. to 33%)?
> were automatically propagated.
Yes, the propagation is not related to the fuzzy percentage accuracy in any way.
Even with a fuzzy rate of 33 % this propagated segment does not show up. Yes, I understand that propagation and fuzziness are unrelated, and the propagate feature works like a charm in many cases (in some others not, e.g. when the order of numbers has been changed).
By defining more complex numbers as non-translatables, CT should be able to propagate such segments when the project is reloaded. A non-translatable regular expression:
catches the numbers in your example.