Total Recall vs. Preliminary matching for Japanese


This is a test report on the comparison of a Total Recall TM ("TR-TM") and a preliminary matching TM ("Pre-TM") as applied to Japanese as the source language.

The results are interesting and not quite what I expected.

The source document (8 segments) is attached. All the sentences were taken from the original TM ("Org-TM"), from which TR-TM and Pre-TM were created, and were partly modified for this test.


1. Basic information

  • Size of Org-TM: Just below 170,000 TUs
  • Fuzzy match threshold: 50%
  • Fuzzy match display limit: 6 (max)
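To make the two settings above concrete, here is a minimal sketch of how a 50% threshold and a display limit of 6 could interact for a single segment. It is only an illustration: the function name, the candidate IDs, and the scores are hypothetical, the inclusive comparison at exactly 50% is an assumption, and nothing here reflects how the tool actually scores or ranks matches.

```python
# Hypothetical illustration of a fuzzy threshold plus a display limit.
# The scores are made up; the real tool's scoring is not modeled here.

FUZZY_THRESHOLD = 0.50  # 50%, as in the settings above
DISPLAY_LIMIT = 6       # at most 6 matches shown per segment


def displayed_matches(scored_candidates, threshold=FUZZY_THRESHOLD, limit=DISPLAY_LIMIT):
    """Keep candidates at or above the threshold, best first, capped at the limit.

    Assumption: the threshold is inclusive (a score of exactly 0.50 is kept).
    """
    kept = [(tu, score) for tu, score in scored_candidates if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]


# Made-up candidates for one source segment.
candidates = [
    ("TU-A", 0.92), ("TU-B", 0.51), ("TU-C", 0.49), ("TU-D", 0.75),
    ("TU-E", 0.66), ("TU-F", 0.58), ("TU-G", 0.53), ("TU-H", 0.81),
]

shown = displayed_matches(candidates)
for tu, score in shown:
    print(f"{tu}: {score:.0%}")
```

In this made-up case seven candidates clear the threshold, but only the top six are shown, so a borderline candidate such as TU-B at 51% drops out even though it qualifies.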

2. The results

Pre-TM: 58 TUs

# of Fuzzy matches: 17

* These results are the same as those obtained from Org-TM.

TR-TM (hits per word = default 100): 9,517 TUs

# of Fuzzy matches: 6

TR-TM (hits per word = 1,000): 62,434 TUs

# of Fuzzy matches: 28
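For a sense of scale, the figures above can be put side by side as a share of Org-TM. This is a rough sketch only: it assumes an Org-TM size of 170,000 TUs, whereas the actual size is just below that, so the real shares are slightly higher than what the script prints.

```python
# Rough comparison of the three TMs reported above.
# Assumption: Org-TM is taken as 170,000 TUs (the report says "just below 170,000").

ORG_TM_SIZE = 170_000

# name -> (number of TUs, number of fuzzy matches), copied from the results above
results = {
    "Pre-TM": (58, 17),
    "TR-TM (hits per word = 100)": (9_517, 6),
    "TR-TM (hits per word = 1,000)": (62_434, 28),
}


def share_of_org(tus, org_size=ORG_TM_SIZE):
    """Return the TM size as a percentage of the (approximate) Org-TM size."""
    return 100.0 * tus / org_size


for name, (tus, fuzzy) in results.items():
    print(f"{name}: {tus:,} TUs, about {share_of_org(tus):.2f}% of Org-TM, "
          f"{fuzzy} fuzzy matches")
```

Under that assumption, Pre-TM is a tiny fraction of Org-TM, while TR-TM at 1,000 hits per word carries over roughly a third of it.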

3. Observations

  • I expected that Pre-TM would be the best performer, but TR-TM with a hits-per-word of 1,000 ("1,000-TR-TM") returned more fuzzy matches.
  • The TUs that are included in 1,000-TR-TM but not in Pre-TM have match rates just above 50% (i.e., they are borderline).
  • Fragments were not completely the same (approx. 70%-80% identical).

This is a Tips and Tricks topic, but I can't give you any, at least for now.

