
TM handling for a huge file?

I can't find any instructions on how to handle TMs for huge files (in this case a 55 K word file).

  • the TM contains only the file's segments
  • IMHO this means that there is no advantage when using Total Recall
  • the TM engine gets slower as the project advances (currently 35 K words translated)
  • in the meantime, I created two TMs: the first one with most of the segments, set to 'Preliminary memory matching' and Read-only, the second as a normal TM for current segments (am I right that this is the best procedure?)
  • but even now, after confirming a segment, CT is unresponsive for about 10, sometimes even 30 seconds, with the progress bar at the bottom moving

Are there any settings I should change?





Torsten: Are there any settings I should change?


I think the PMM is the culprit. The TM cannot possibly contain more than 55k words SL, and that shouldn't be a problem for CT (200-300k words is still fine here, 600k words causes a delay of a few seconds).


H. (not a PMM fan anyway)

The slow behaviour is the same with the "Automatic" workflow (this is why I switched to 'Preliminary memory matching').

After checking the file with Hans (who does not have these problems), I can give at least two hints that help a bit:
  • Set the TM to 'Low priority' instead of 'High priority'
  • Filter for untranslated segments


Torsten: Set the TM to 'Low priority' instead of 'High priority'


But then again, it should be set to the highest possible priority. After all, "the TM contains only the file's segments", so any other TMs should not override it.


I made a mistake above when I said "200-300k words is still fine here". That should have been 200-300k entries. That's a huge difference. The TM of your 55k-word file cannot exceed around 10k entries, and CT should be able to handle that easily.
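
For anyone who wants to redo that back-of-the-envelope estimate with their own numbers, here is a rough sketch. It is not a CafeTran feature: the class and method names are made up, and the average of 5.5 words per segment is only an assumption, chosen because it is what 55k words and roughly 10k entries imply.

```java
public class TmSizeEstimate {

    /**
     * Rough upper bound on TM entries for a file, assuming every segment
     * becomes exactly one entry. avgWordsPerSegment is an assumption.
     */
    public static long estimateEntries(long sourceWords, double avgWordsPerSegment) {
        if (avgWordsPerSegment <= 0) {
            throw new IllegalArgumentException("avgWordsPerSegment must be positive");
        }
        return (long) Math.ceil(sourceWords / avgWordsPerSegment);
    }

    public static void main(String[] args) {
        long words = 55_000;            // size of the file discussed in this thread
        double avgWords = 5.5;          // assumed average segment length
        long comfortable = 200_000;     // lower end of the "still fine" range, in entries

        long entries = estimateEntries(words, avgWords);
        System.out.println("Estimated entries: " + entries);
        System.out.printf("Share of a 200k-entry TM: %.1f%%%n",
                100.0 * entries / comfortable);
    }
}
```

Even with much shorter segments the estimate stays far below the sizes at which delays show up here.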


There's something very wrong here. I expect His Igorness to solve it, I can't (though I still think it's that PMM...).


H.

An option might be to split the file, and/or to disable all propagate options.


H.

CT is unresponsive for about 10, sometimes even 30 seconds, with the progress bar at the bottom moving


The Preliminary memory matching in the background takes some processing time. Letting it complete, or at least get ahead of the segment you are currently translating, should speed things up considerably.


Anyway, I always recommend using Total Recall to filter out any unused segments from the working TM.

IK: I always recommend using Total Recall to filter out any unused segments from the working TM.


Using Total Recall for a TM that counts 55k words SL at most seems silly to me.


H.

There is also an option to perform the Preliminary memory matching on untranslated segments only or from the current segment. See the Translate menu. Then CT will skip any translated segments during processing. To me, it looks like the user is well ahead in the translation and the PMM starts all over from the very first segment, hence the slowdown.

Hm, I forgot to mention that I switched back to 'Automatic' in the meantime.
Just to give you a better idea:

[image not available]


Does it really make sense to use Total Recall here (empty TM at the start, only one file to process)? Or in other words: would CT really become that much faster, also considering the additional processing steps?


 

Total Recall is very fast as it uses indexing. Of course, you should only use it to recall existing segments from the database for the current project context. You might try to check if it is a TM issue at all by loading the project without any TMs and moving to the next segment. 

Hm, I unchecked all TMs and TBs and online windows (without ProZ.com) and CT jumps from one segment to another with a delay of about 3 seconds, but not always (about 30 to 60 % of the segments).


IK: Of course, you should only use it to recall existing segments from the database for the current project context.


The friggin' TM is the project file, if Torsten's claim "the TM contains only the file's segments" is true. Since it must be fairly small:


  • It must be assigned the highest priority
  • Since it's "evolving", it shouldn't be imported into a Total Recall Database, let alone be "Recalled to segments/Memory"
  • There's something wrong. Don't know what.

H.

The 3-second delay might involve either automatic propagation with numerous non-translatables or insufficient Java memory (e.g. 32-bit Java instead of the recommended 64-bit Java for larger projects).
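
A quick way to rule out the Java side is to ask the JVM itself. The sketch below is not part of CafeTran; `JvmMemoryCheck` and its helper names are made up for illustration. It only reports on the Java installation that runs it, so it has to be launched with the same Java that CT uses. `sun.arch.data.model` is a vendor-specific property and may be missing, hence the fallback to `os.arch`.

```java
public class JvmMemoryCheck {

    /**
     * Guesses whether the JVM is 64-bit.
     * dataModel: value of the vendor-specific "sun.arch.data.model" property (may be null).
     * osArch:    value of the standard "os.arch" property, used as a fallback.
     */
    public static boolean is64Bit(String dataModel, String osArch) {
        if ("64".equals(dataModel)) {
            return true;
        }
        if ("32".equals(dataModel)) {
            return false;
        }
        // Fallback heuristic: amd64, x86_64 and aarch64 all contain "64".
        return osArch != null && osArch.contains("64");
    }

    /** Converts a byte count to whole megabytes. */
    public static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        String dataModel = System.getProperty("sun.arch.data.model");
        String osArch = System.getProperty("os.arch");
        // Upper limit of the heap this JVM will try to use.
        long maxHeap = Runtime.getRuntime().maxMemory();

        System.out.println("Java version: " + System.getProperty("java.version"));
        System.out.println("Architecture: " + osArch);
        System.out.println("64-bit JVM:   " + is64Bit(dataModel, osArch));
        System.out.println("Max heap:     " + toMegabytes(maxHeap) + " MB");
    }
}
```

If this reports a 32-bit JVM or a small maximum heap, the memory explanation becomes more plausible; if not, automatic propagation is the more likely suspect.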

IK: ...might involve either automatic propagation


Now that makes sense. Not only because I mentioned it first, but also because you made changes to the propagation feature recently. No more changes please, I'm a Conservative.


H.
