
CafeTran and the new multi-core processors

This question is primarily for Igor, but thought it might be an interesting discussion.

I am seriously thinking about getting one of the new CPUs with 10-18 cores, just because it seems like it would be a great boost in performance. I haven't decided on Intel vs. AMD or a specific model yet.

Anyway, my big question is whether having such a CPU will speed up the time it takes CafeTran to finish going through my memory between segments.

I currently have about 160,000 entries in my personal memory, so when I transition between segments, all the other resources are checked nearly instantly, and then I have to wait anywhere from 3-8 seconds for CafeTran to finish going through my memory.

Please don't bring up Total Recall though. It doesn't work for Japanese. The memory Total Recall creates ends up having over 200,000 entries.

Actually, I guess it isn't always a minimum of 3 seconds.

Sometimes it is pretty quick and just a few hundred milliseconds behind the other resources, though that seems to only be the case for very short sentences.

Just a guess: Is it possible that the delay depends on whether or how many tags there are?

My experience is that files with many tags tend to be notably slower than files without (more than you might assume). The presence of (many) tags might be another strong argument against Total Recall in this case.

2nd guess: Are you working with an SSD or with a conventional HD? Since a TMX is a plain text file, an SSD may in the end matter more than some processor boost.

3rd, minor guess: Are you regularly optimizing the TM, i.e. at least removing duplicate entries? It depends on your workflows, and I still haven't recognized the pattern, but CT tends to add duplicates here and there (though rarely very many).
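Removing exact duplicates boils down to keeping one copy of each identical source/target pair. A minimal sketch of the idea in Java (this is not CafeTran's actual implementation; the `TU` record and the sample data are made up for illustration):

```java
import java.util.*;

public class DedupTm {
    // A translation unit, reduced to its source and target text.
    // Records compare by value, so identical pairs hash the same.
    record TU(String source, String target) {}

    // Keep only the first occurrence of each identical source/target pair.
    static List<TU> dedup(List<TU> units) {
        // LinkedHashSet preserves insertion order while dropping repeats.
        return new ArrayList<>(new LinkedHashSet<>(units));
    }

    public static void main(String[] args) {
        List<TU> tm = List.of(
            new TU("こんにちは", "Hello"),
            new TU("ありがとう", "Thank you"),
            new TU("こんにちは", "Hello")   // exact duplicate
        );
        System.out.println(dedup(tm).size()); // prints 2
    }
}
```

A real TMX cleanup would of course have to parse the XML and decide what to do with near-duplicates (same source, different target), which is where the judgment calls start.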

160,000 entries is indeed quite a lot. I have customers for whom I have been translating for 20 years, and still don't have that many TUs.

Keep the TMs small by removing all garbage and you'll rarely need TR. I've described ways to compact TMs at length, both at this Freshdesk and at Proz.


Yeah, tags might have something to do with the range in times.

Yeah, I should try to optimize the memory. I have done so once or twice.

Yeah, I already have an SSD, and a top-tier one at that (Samsung 850 PRO 512GB).

However, it is just something I have noticed as it continually gets larger.

The larger the memory, the longer the delay.

I don't keep separate memories for each client.

I just have one master TM for myself.

Actually, I have one from the 18 months or so after I started using CafeTran.

The current one I started just about 2 years ago to this day.

I bet you can reduce the size to about 50%. But this will have a negative impact when you want to re-translate previously translated material via Insert all EMs. Of course, you could do this:

  • Keep one central, lean TM, containing all TMs for all clients, for AA and concordancing.
  • Use one client TM for Insert all EMs. Detach it after "pre-translating".

I don't enjoy such a powerful machine yet so it is hard to tell. Theoretically, Java is able to utilize multiple cores so it might be faster. CafeTran has two options to deal with large TMs.

Total Recall (try lowering the "Recall in context" option to, say, 100 for your language). If you are reluctant to use Total Recall, just perform Preliminary memory matching. It should produce results instantly after you give it a few minutes' head start.

IK: Theoretically, Java is able to utilize multiple cores

Fine, but doesn't CT search the resources one by one? Or do priority settings work in a different, more incomprehensible, way?

I'm still working on my dual core 2008 Mac, and switching to a much more recent quad core one doesn't seem to yield any speed benefits. Or maybe I'm just sloooow...


doesn't CT search the resources one by one? Or do priority settings work in a different, more incomprehensible, way?

No, searching in multiple resources is done via multiple processing threads. Auto-assembling (and priorities) comes into action once they are all done with the search.
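The pattern Igor describes, searching every resource on its own thread and assembling only after all of them finish, might look roughly like this in Java. This is a toy sketch, not CafeTran's code; the `Resource` record, the pool sizing, and the sample data are all assumptions for illustration:

```java
import java.util.*;
import java.util.concurrent.*;

public class ParallelSearch {
    // Toy "resource": a name plus a list of entries to scan.
    record Resource(String name, List<String> entries) {}

    // Search every resource on its own thread; collect results only
    // after all searches have finished, then assembly can begin.
    static Map<String, List<String>> searchAll(List<Resource> resources, String query)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(resources.size());
        try {
            Map<String, Future<List<String>>> futures = new LinkedHashMap<>();
            for (Resource r : resources) {
                futures.put(r.name(), pool.submit(() ->
                    r.entries().stream().filter(e -> e.contains(query)).toList()));
            }
            Map<String, List<String>> hits = new LinkedHashMap<>();
            for (var e : futures.entrySet()) {
                hits.put(e.getKey(), e.getValue().get()); // waits for that thread
            }
            return hits; // only now are all resources done: assemble
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        var resources = List.of(
            new Resource("glossary", List.of("cat", "dog")),
            new Resource("big TM", List.of("catalogue", "house")));
        System.out.println(searchAll(resources, "cat"));
        // prints {glossary=[cat], big TM=[catalogue]}
    }
}
```

This also explains why one huge TM dominates the wait: the slowest resource sets the pace, since assembly starts only after every thread has returned.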

Jason: The memory Total Recall creates ends up having over 200,000 entries.

I was wondering about that. If the TR TM ends up with more entries than the original resource, it must contain duplicates. Wouldn't it be easy for Igor to delete those duplicates?


IK: ...searching in multiple resources is done via multiple processing threads

Praise the Lord!



The type of RAM should also matter, as the CT resources are loaded into it. The faster and larger, the better.

Now I'm using DDR3 1600 MHz plus a Core i7 6700K (4.0 GHz) plus an SSD. CT can finish searching a Japanese-English TM with approx. 250K TUs in 3-5 seconds. Preliminary matching on the fly seems to reduce that to 1-3 seconds, depending on the length of the source text.

DDR4 2400 MHz is already on the market, and it should be a worthwhile buy.


The last time I tried preliminary matching, we ran into a problem with the memory not updating in real time while progressing through the project, which, as we discussed, basically nullifies the point of using memories in the first place. Has this been resolved?

Yeah, I have 32 GB of 1866 MHz DDR3 as well.

It isn't very often that it takes 8 seconds between segments. I would say 2-4 on average.

But yeah, upgrading the CPU also means having to upgrade the motherboard. And if you are going to upgrade the CPU and motherboard, might as well upgrade the RAM!
