
Total Recall, subsegment matching, fuzziness and Kevin Flanagan’s ‘Lift’.

Hi Igor,

OK, I have a question, which came about in connection with a discussion that is currently going on over at Proz about the new Lift technology, and Total Recall.

It basically boils down to this: how does subsegment matching in CafeTran relate to, or work in (if it does), Total Recall?

Hans (van den Broek) and I have been trying to understand it in the forum discussion mentioned above. Hans says that since the TR database is SQLite, it cannot apply fuzziness in its operations. I'm not sure about that; to be honest, I don't have a clue.

If you look at my post titled "here’s an example of what I mean (CafeTran LIFTing)", you'll see that there seems to be some subsegment matching (and hence fuzziness) in my example screenshots. How is this possible?

Sorry if my question is not very well posed. I'm a bit short on time, as usual.


A couple of weeks ago, I tested the lot. I still have the test, but I don't know exactly what I did anymore (it wasn’t for publication…), so I repeated it. That's "test2" in the ZIP. The earlier files are also in the ZIP.

I wrote a document with only four words in it - raises animals and processes - copied from a real-life EU document, opened it in CT, opened the EN-NL DGT as a table, and ran Total Recall. There's obviously no segment match, so CT starts mining the three remaining words (I take it CT treats "and" as a stopword, Igor can explain), and this results in the expected 300 segments (minus a few, probably because of duplicates, Igor can explain).
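For anyone who wants to see the arithmetic, the mining step described above can be sketched roughly like this. It is only an illustration under assumptions: the table name `segments`, its columns, the stopword list and the `recall_by_words` helper are made up for the example and are not CafeTran's actual schema or code. It shows why three mined words at up to 100 hits each give at most 300 segments, and fewer once duplicates are dropped.

```python
import sqlite3

# Hypothetical stopword list; which words CafeTran really skips is an assumption.
STOPWORDS = {"and", "the", "of", "a"}

def recall_by_words(conn, text, hits_per_word):
    """Collect up to hits_per_word segments per mined word, dropping duplicates."""
    words = [w for w in text.lower().split() if w not in STOPWORDS]
    recalled = {}  # segment id -> (source, target); the dict removes duplicates
    for word in words:
        rows = conn.execute(
            "SELECT id, source, target FROM segments "
            "WHERE ' ' || lower(source) || ' ' LIKE ? ORDER BY id LIMIT ?",
            (f"% {word} %", hits_per_word),
        )
        for seg_id, source, target in rows:
            recalled[seg_id] = (source, target)
    return words, list(recalled.values())

# Tiny in-memory stand-in for a Total Recall table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE segments (id INTEGER PRIMARY KEY, source TEXT, target TEXT)")
conn.executemany(
    "INSERT INTO segments (source, target) VALUES (?, ?)",
    [
        ("He raises animals on a farm", "NL 1"),
        ("The plant processes milk", "NL 2"),
        ("She raises cattle and processes meat", "NL 3"),
        ("Nothing relevant here", "NL 4"),
    ],
)

words, segments = recall_by_words(conn, "raises animals and processes", hits_per_word=100)
print(words)          # the words that are actually mined
print(len(segments))  # distinct segments recalled across all words
```

In this toy table, two of the segments are hit by more than one word, so the number of distinct segments is lower than the sum of the per-word hits. That would be consistent with the "300 minus a few" result, though whether CT deduplicates this way is a guess.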

AA enabled (as always):

I don't see any "virtual matches," but it may have something to do with it. Igor can explain.


By the way, I also tested subsegment matching using TR a couple of weeks ago. I never saved that test, because it showed the expected result: It doesn't work.

I opened the DGT, and selected a phrase near the end of it, consisting of words that are common enough to yield the required number of hits (50 per my settings then). I created a document with three entries containing that phrase: one starting with it, one ending with it, and one with it in the middle. No subsegment results, as expected. However, I was a little confused to see a (sub)segment result for part of the phrase, which later turned out to be a segment match. No contradiction there. I'm not going to repeat that test, but feel free to try it yourself.


Two more articles on Total Recall which explain the storage and retrieval of the segments to the working memory:




The recalled segments take part in the subsegment matching, which produces hits. Based on the frequency of these hits, CafeTran creates virtual matches. Two important options determine the accuracy of the hits:

1. Edit > Options > Subsegment to auto threshold (when the subsegment is used for auto-assembling).

2. Edit > Options > Subsegment to virtual threshold (when the subsegment is used for auto-assembling and placed in a separate virtual map which holds the virtual subsegment matches).
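To make the idea of a frequency threshold concrete, here is a minimal sketch. It is not CafeTran's algorithm: the helper names, the n-gram approach and the threshold values are assumptions for illustration, and it only looks at the source side (it does not try to find the matching target subsegment). It counts in how many recalled segments each subsegment of the current segment occurs, and keeps those that reach a given threshold.

```python
from collections import Counter

def ngrams(words, n):
    """All runs of n consecutive words, joined back into strings."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def subsegment_hits(segment, recalled_sources, max_len=4):
    """Count in how many recalled segments each subsegment of `segment` occurs."""
    words = segment.lower().split()
    hits = Counter()
    for n in range(2, max_len + 1):
        for sub in set(ngrams(words, n)):
            hits[sub] = sum(
                f" {sub} " in f" {src.lower()} " for src in recalled_sources
            )
    return hits

def above_threshold(hits, threshold):
    """Subsegments whose hit count reaches the (hypothetical) threshold."""
    return sorted(sub for sub, count in hits.items() if count >= threshold)

recalled = [
    "He raises animals on a farm",
    "She raises animals for meat",
    "The firm raises animals and processes milk",
    "The plant processes milk",
]
hits = subsegment_hits("raises animals and processes", recalled)
print(above_threshold(hits, 3))  # strict threshold: only frequent subsegments
print(above_threshold(hits, 1))  # loose threshold: every subsegment seen at least once
```

A higher threshold keeps only subsegments that recur across many recalled segments, so fewer but more reliable hits; a lower one lets through subsegments that were seen only once.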

I started a series of articles on Total Recall. Please see the first article here:


Thanks, Igor. The first KB article!

And I'm beginning to see the light. And why I shouldn't have lowered the number of matches for Recall.

Everything is still as I claimed it was: Total Recall doesn't recall fuzzy matches and subsegments, only complete segments and words. However, if you have enough "word hits" (segments with the word), you will most likely get a segment with the word in the subsegment in the resulting TMX file where subsegment matching does work. Bad luck if you lowered the number of matches (like I did, from 100 to 50), and if the subsegment you're looking for is at the end of the table (as I selected in my deleted test). Correct?
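Here is a small sketch of the "bad luck" case, to check my own reasoning. It assumes (and this is only an assumption) that the word hits are collected in table order up to the configured number of matches; the table and helper names are made up. The one segment containing the wanted phrase sits at the end of the table, behind 80 other segments that contain the same word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE segments (id INTEGER PRIMARY KEY, source TEXT)")

# 80 ordinary segments containing the word "animals" ...
rows = [(f"Regulation {i} concerns animals",) for i in range(80)]
# ... and, at the very end of the table, the one segment with the wanted phrase.
rows.append(("The holding raises animals and processes milk",))
conn.executemany("INSERT INTO segments (source) VALUES (?)", rows)

def recalled_sources(word, limit):
    """Word hits in table order, cut off at `limit` (assumed retrieval order)."""
    cur = conn.execute(
        "SELECT source FROM segments WHERE source LIKE ? ORDER BY id LIMIT ?",
        (f"%{word}%", limit),
    )
    return [r[0] for r in cur]

def phrase_available(phrase, word, limit):
    """True if any recalled segment contains the phrase for later subsegment matching."""
    return any(phrase in s for s in recalled_sources(word, limit))

phrase = "raises animals and processes"
print(phrase_available(phrase, "animals", 50))   # limit lowered to 50
print(phrase_available(phrase, "animals", 100))  # limit left at 100
```

With the limit at 50 the segment at the end of the table is never recalled, so subsegment matching has nothing to work with; with 100 it makes it into the recalled set.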


So I tried to add a TMX memory to the database, thinking it would add a table to it, but it doesn't. You can add the TM to an existing table, or you can add a table, of course. Great! Excellent!

