
Remove duplicates from Total Recall databases or tables?

Hi guys and gals,


Is there any way to remove duplicates from Total Recall databases or tables?


Michael   


>Is there any way to remove duplicates from Total Recall databases or tables?


Nope.


Only one way: export everything, load it as a memory, and run Tasks to remove the duplicates. But hey, does it really matter, considering the price of disk space, even in military-grade laptop tanks?

Hmm, OK. I'm less worried about disk space than the possible effect of having tons of duplicates on match performance in CT.

 

Hmm, I suppose I could also open the .db in TMLookup, and use TMLookup's duplicate removal tool on it.

 

>I could also open the .db in TMLookup, and use TMLookup's duplicate removal tool on it.


I kind of remember that Igor said that the removal of duplicates wasn't so simple. So, perhaps Farkas has found the goose that lays the golden eggs? Something that he'd be willing to share? Give some inspirational hints? I know that you have the best relations with him, hint, hint. 



Hello Michael, I think the safest bet would be to run maintenance tasks on the recalled TMX (or simply create a new TR table).

The problem is, my TMLookup SQLite .db is 27.5 GB :-)

Although I'm not planning to let my new CT Total Recall db get out of hand like that, I do expect it to eventually grow too large to Recall in its entirety inside CT.

 

Huge!


Keeping TMs well organized helps you Join them ad hoc if you need it or rebuild your TR tables anytime with no fuss.


I guess you keep separate TRs for your own TMs and EU DGT, Opus stuff etc., or do you use one big fat db to rule them all?


idim: I think the safest bet would be to run maintenance tasks on the recalled TMX


How very true. Not only the safest, but also the fastest solution. In fact, I suggested it when a "Japanese user" ended up with more TR TM segments than the number of segments in the table. Not sure if a TR TM is useful for Japanese as a ST, though.


Using TMLookup might be a problem. I've never used TML, but it seems it creates a .db for each and every table. I don't know what happens if you open a CT DB with multiple tables in TML. You can of course open the table in a SQLite browser and execute an SQL command. For more information, see Lenting. He knows everything.
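For anyone who wants to try the SQL route, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names ("segments", "source", "target") are assumptions for illustration only; a real Total Recall table has its own schema, so check it in a SQLite browser first and adapt the names.

```python
import sqlite3

# Demo database standing in for a TM .db file. For a real file,
# replace ":memory:" with its path.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE segments (source TEXT, target TEXT)")
con.executemany(
    "INSERT INTO segments VALUES (?, ?)",
    [("Hallo", "Hello"), ("Hallo", "Hello"), ("Welt", "World")],
)

# Keep the first occurrence (lowest rowid) of each source/target pair
# and delete the rest.
con.execute("""
    DELETE FROM segments
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM segments GROUP BY source, target
    )
""")
con.commit()

print(con.execute("SELECT COUNT(*) FROM segments").fetchone()[0])  # 2
con.close()
```

On a 27.5 GB database this will be slow without an index on (source, target), and the freed pages are only returned to the OS after a VACUUM, so back up the file before experimenting.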


H.

idim: Keeping TMs well organized helps you Join them ad hoc if you need it or rebuild your TR tables anytime with no fuss.


If you use TR as His Igorness intends it to be used (I don't), that is, adding TMs all the time, after each and every job, duplicates are inevitable. And that's no problem, in view of the solution you yourself provided above: it's the TR TM that counts.


H.

Analyzing the rows of a very large SQL database for duplicates would take ages. However, keeping such duplicates does no harm in terms of performance, because they are eliminated by default during the recall to the working translation memory. 


OK, thanks everyone!

I'll just ignore them for the time being, as they don't seem to be doing any harm. The harm is more psychological (I just don't like the idea of duplicates).


 

IK: ...because they are eliminated by default during the recall to the working translation memory.


That didn't seem to be the case with the Japanese user I mentioned above, who rather recently ended up with more segments in the TR TM than in the original table. Please explain.


H.

The Total Recall progress bar (in the Windows look and feel, showing the numbers) was misleading until a few updates ago: it showed the number of segments analyzed by TR, not the number of segments actually loaded. Or the user may have changed the TR option to keep all the duplicates.

Slightly different topic. I noticed TMXs take a long, long time to import into a TR table. Am going to try to import a Glossary next. Just importing a TMX of around 900 MB looks like it is going to take hours. No tragedy, of course, but if you need to restart CT in between for whatever reason, you have to start all over again, and of course end up with even more duplicates ;-)

@Igor: is importing from a Glossary any faster?

 
