Yesterday I received a file with 129 translation units, an addendum to a recently completed translation of a manual. Looking at it, I realised that more than 50% had already been translated in the original translation of a manual from the same customer. A colleague had done that translation, and just the week before I had aligned the English original with the German translation and stored the resulting memory in Total Recall.
I was pretty amazed that, after setting the new file up as a new project, only about 5 segments were shown as matches in the Matchboard. I was forced to manually copy the translations from the translated manual into the corresponding segments of the new translation.
I had already noticed this with a translation for a Bluetooth device.
How can I improve the recognition of matches?
If you remove the tick (I circled it in red) it says everything is loaded into memory. I like your idea with the test job. I'll just copy 4 or 5 of the segments it didn't recognize earlier into the fake job and see what happens. I'll let you know.
Well, I renamed the 800-word file and set it up as a new project. I removed the tick on Recall in context so that the whole database for Total Recall was available, then ran the preliminary translation and after that the insertion of all 100% matches. I also disabled the "Big Mama" in Translation Memory (TMX). The recognition rate was better but still not 100%, which it ought to have been. I then ran the same test with the "Big Mama" enabled instead. The recognition rate was 100%. Looks like Igor still has some work to do with regard to Total Recall.
Joachim: The recognition rate was better but still not 100%
That's almost impossible. Almost, because if you have a huge database table, and all of the words in the segment that's not being recognised occur more than 1,000 times* in the table, it is possible. Highly unlikely.
*That's your setting. The default is 100; I lowered it to 50. When Recall Segments was introduced, I managed to "prove" it didn't work by using, in a segment, a term consisting of three very common words taken from the end of a 2.5-million-segment table.
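To make the per-word limit concrete, here is a toy Python sketch. It is not CafeTran's code: the index layout, the function names and the "first N hits in table order" rule are all assumptions made for illustration. It only shows how a segment near the end of a table can be missed when every one of its words is very common.

```python
from collections import defaultdict

def build_index(segments):
    """Map each lowercased word to the ids of the segments containing it."""
    index = defaultdict(set)
    for seg_id, text in enumerate(segments):
        for word in set(text.lower().split()):
            index[word].add(seg_id)
    return index

def recall_candidates(query, index, max_hits_per_word):
    """Collect candidate segment ids, keeping at most max_hits_per_word
    segments (in table order) for each word of the query.
    This cut-off rule is an assumption, not documented behaviour."""
    candidates = set()
    for word in set(query.lower().split()):
        hits = sorted(index.get(word, ()))
        candidates.update(hits[:max_hits_per_word])
    return candidates

# Toy table: common words dominate, and the wanted segment is the last one (id 5).
table = ["press the power button"] * 5 + ["press the button"]
index = build_index(table)

low_cap = recall_candidates("press the button", index, max_hits_per_word=3)
high_cap = recall_candidates("press the button", index, max_hits_per_word=10)

print(5 in low_cap)   # False: every word hit its limit before reaching segment 5
print(5 in high_cap)  # True: the higher limit lets segment 5 through
```

In this model, raising the limit widens the net at the cost of a larger working set, which is the trade-off discussed in the thread.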
I unchecked the field "Recall in context", so there is no longer a limit to the number of segments containing a word. As a result, all segments containing a specific word are entered into the temporary TMX file, if I understand item 3 of "Recalling Segments..." correctly.
I used Across as my CAT tool before switching to CafeTran, and Across didn't use TMX at all. Everything was saved in an MS-SQL database. TMX was merely used for exchanging translation memories with other CAT tools. Using a database does have a lot of advantages, provided the retrieval works. At present I have the impression that with CafeTran the retrieval from TMX files ("Big Mama") works better than from the database. With Across the retrieval worked fine looking not for words but only for segment matching. The user could select the percentage of the matches he would like to see. But Across is expensive and not as user-friendly as CafeTran.
Joachim: ...if I understand item 3 of "Recalling Segments..." correctly.
I think you are right. I never tried that, because you can end up with a huge TMX file, almost defeating the purpose of Recall Segments.
Using a database does have a lot of advantages
I tend to agree (as an old DV hand). Igor doesn't.
But there must be something else going on with your approach, since I don't face your problems.
Igor. Igor! IGOR!
Joachim: With Across the retrieval worked fine looking not for words but only for segment matching.
That's probably why His Igorness went for TMX. Unless you use mark-up in the database (as Studio does, creating a monster), you can only get exact matches from the database, no fuzziness. However, I still don't understand why you don't get them exact matches using Recall Segments. IGOOOOR!
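The difference between an exact key lookup and a fuzzy comparison can be sketched in a few lines of Python. This is only an illustration of the general point: the `memory` dictionary and both function names are hypothetical, and nothing here describes how CafeTran, Across or Studio actually store or search segments.

```python
import difflib

# Hypothetical memory: source segment -> target segment.
memory = {
    "Press the power button.": "Drücken Sie die Ein/Aus-Taste.",
    "Switch off the device.": "Schalten Sie das Gerät aus.",
}

def exact_lookup(query):
    """Key lookup, as a plain database index would do: all or nothing."""
    return memory.get(query)

def fuzzy_lookup(query, cutoff=0.75):
    """Compare the query with every stored source and keep the closest one."""
    best = difflib.get_close_matches(query, list(memory), n=1, cutoff=cutoff)
    return (best[0], memory[best[0]]) if best else None

query = "Press the power button twice."
print(exact_lookup(query))  # None: one extra word is enough to miss
print(fuzzy_lookup(query))  # the stored "Press the power button." pair
```

The fuzzy variant has to compare the query with many stored segments, which is why narrowing the candidates first (by shared words, for instance) matters on large tables.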
Well, that makes two of us who don't understand it.
Tags are a possible explanation for Joachim not getting the same accuracy from segments retrieved from Total Recall as from his TMX memory file. When storing segments in Total Recall, CafeTran removes the mark-up/formatting tags, both for simplicity and so that the segments are better recognised and reused in other projects. When the user matches the recalled segments against the same project with tagged segments, he should still get the exact match on reaching the segment and can then insert the tags. However, the automatic insertion of all exact matches in "one go" will skip the segments that need tag reinsertion.
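A rough Python sketch of the behaviour described above, under stated assumptions: the numbered tag syntax (`<1>`, `</1>`), the `memory` dictionary and the `classify` helper are all hypothetical and are not CafeTran's implementation. The idea is that a tagged source segment still matches the tag-free stored text, but is held back from batch insertion because its tags must be placed by hand.

```python
import re

TAG = re.compile(r"</?\d+>")  # hypothetical tag syntax such as <1> ... </1>

def strip_tags(segment):
    """Remove tags and collapse whitespace."""
    return " ".join(TAG.sub("", segment).split())

def classify(source, memory):
    """Return 'insert', 'needs_tags' or 'no_match' for a source segment.

    memory maps tag-free source text to tag-free target text.
    """
    if strip_tags(source) not in memory:
        return "no_match"
    # An exact match exists, but a tagged source needs manual tag placement.
    return "needs_tags" if TAG.search(source) else "insert"

memory = {"Press the power button.": "Drücken Sie die Ein/Aus-Taste."}

print(classify("Press the power button.", memory))         # insert
print(classify("Press the <1>power</1> button.", memory))  # needs_tags
print(classify("Release the button.", memory))             # no_match
```

A batch "insert all exact matches" pass would then act only on the `insert` results and leave the `needs_tags` segments for the translator.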
BTW, unchecking contextual retrieval from Total Recall is not a good idea, as it trashes your working memory with thousands of unneeded segments. Uncheck it only with small Total Recall tables.
Yes, tags might be the reason. The tags in the aligned file are not necessarily located at the same spots as in the second file. But then shouldn't the segments turn up as fuzzy matches?
I agree that unchecking contextual retrieval is not a good idea. Setting it to 1000 did improve the recognition though. Looks like with the default 100 the number of segments is too low and quite a number of segments containing matches are skipped.
> But then shouldn't the segments turn up as fuzzy matches?
Yes, the segments should even appear as "exact ones" as you go along, so that you only need to reinsert the tags at the correct positions. Note that some tags, especially those from non-CafeTran projects, may cause tag-adjacent words to merge when there is no space in between (e.g. "WordTagWord"). You can inspect such segments when you open the Total Recall table for a manual search.
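The "WordTagWord" case is easy to reproduce in a small Python sketch. The tag syntax and both helper names are assumptions for illustration only; the point is just that deleting a tag that was the sole separator between two words glues them together, so the stored text no longer matches.

```python
import re

TAG = re.compile(r"</?\d+>")  # hypothetical tag syntax

def strip_naive(segment):
    """Delete tags outright; words separated only by a tag get merged."""
    return TAG.sub("", segment)

def strip_spaced(segment):
    """Replace each tag with a space, then collapse runs of whitespace."""
    return " ".join(TAG.sub(" ", segment).split())

segment = "Battery<1>Status"
print(strip_naive(segment))   # BatteryStatus
print(strip_spaced(segment))  # Battery Status
```

Replacing tags with spaces is not a free fix either: a tag that sits directly before punctuation or inside a word would then leave a stray space, so either choice can cost an exact match.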
Well, they didn't in the first round. After renaming the file and increasing the value from 100 to 1,000 for the contextual retrieval, more segments were recognized.
Igor, it looks like tags might also influence auto-propagation. I had a text yesterday in which the same sentence in the source language appeared three times, but with a differing number of tags. Auto-propagation didn't work for this sentence.
Yes, that's expected behavior. CafeTran has no way of knowing the tag positions if their number and positions do not match. Such segments should not be auto-propagated, as they need the user to correct the tag positions.
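The rule can be pictured with a short Python sketch. It rests on assumptions: the tag syntax, the helper names and the three-way result are invented for illustration and do not reflect CafeTran's internals. Two source segments propagate only when they are identical including tags; when the wording is the same but the tags differ, the segment is left for the translator.

```python
import re

TAG = re.compile(r"</?\d+>")  # hypothetical tag syntax

def plain(segment):
    """Tag-free text with whitespace collapsed."""
    return " ".join(TAG.sub("", segment).split())

def propagation_decision(translated_source, other_source):
    """Decide what to do with other_source given an already translated twin."""
    if translated_source == other_source:
        return "propagate"  # same words, same tags, same positions
    if plain(translated_source) == plain(other_source):
        return "manual"     # same words, but tags differ: user places them
    return "different"      # not a repetition at all

a = "Press <1>OK</1> to confirm."
print(propagation_decision(a, "Press <1>OK</1> to confirm."))         # propagate
print(propagation_decision(a, "<1>Press</1> <2>OK</2> to confirm."))  # manual
print(propagation_decision(a, "Press Cancel to abort."))              # different
```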