
Matching tags is unreliable

I have been struggling with tags in CafeTran for a long time. 


As far as I understand, tags are not stored in CafeTran TMs, but rather their positions are remembered. 


Here is an extract for the Wiki:


Other CAT tools save inline formatting in the TUs of a TMX file. CafeTran only stores the position of formatting, in a property. Both approaches have pros and cons.


Here is an SDL TMX file:

(image)


And this is how CafeTran stores the positions of the character changes:

(image)


When you save a third-party TMX file in CafeTran, inline tags will be removed but their positions will be mapped to the TU's properties so that Exact Matches are possible.


http://beijer.uk/cafetranhelp.com_ARCHIVED/cafetran.wikidot.com/tmx.html

(The entry is about TMX, in case the link does not work some time later)
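To make the difference between the two approaches concrete, here is a minimal Python sketch. It is not CafeTran's actual code or property format (the screenshots above are not available, so the representation below is purely an assumption). The function names `strip_inline_tags` and `reinsert_placeholders` are hypothetical. It takes a TMX-style `<seg>` with inline `<bpt>`/`<ept>` elements, drops the tags, and remembers only the character offsets where they occurred.

```python
import xml.etree.ElementTree as ET


def strip_inline_tags(seg_xml):
    """Return (plain_text, positions) for a TMX-style <seg> given as a string.

    positions is a list of (offset, tag_name) pairs; offset is the index in
    plain_text at which the inline element stood. This layout is an
    illustrative assumption, not CafeTran's real property format.
    """
    seg = ET.fromstring(seg_xml)
    parts = []
    positions = []
    offset = 0

    def add_text(s):
        nonlocal offset
        if s:
            parts.append(s)
            offset += len(s)

    add_text(seg.text)
    for child in seg:
        # Remember where the inline element was; its native code is dropped.
        positions.append((offset, child.tag))
        add_text(child.tail)
    return "".join(parts), positions


def reinsert_placeholders(plain_text, positions):
    """Put numbered placeholders back at the stored offsets."""
    out = []
    last = 0
    for n, (offset, _tag) in enumerate(positions, start=1):
        out.append(plain_text[last:offset])
        out.append("{%d}" % n)
        last = offset
    out.append(plain_text[last:])
    return "".join(out)


seg = ('<seg>Press <bpt i="1">&lt;b&gt;</bpt>Start'
       '<ept i="1">&lt;/b&gt;</ept> now.</seg>')
text, pos = strip_inline_tags(seg)
print(text)                               # Press Start now.
print(pos)                                # [(6, 'bpt'), (11, 'ept')]
print(reinsert_placeholders(text, pos))   # Press {1}Start{2} now.
```

In a scheme like this the stored text stays clean, but the offsets are only valid for exactly that text. Source and target need separate offsets, and any edit to the segment shifts them, which is one plausible reason why position-based tag placement can be fragile.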


As it says, both approaches have pros and cons. I wonder what the pros of this approach are.


More importantly, such tags are not inserted reliably. I work only with external projects from memoQ, and whenever I insert an exact match from a previous file, or simply delete a translation and insert it again with auto-assembly or by clicking on the Matchboard, the tags are almost never inserted correctly. Moreover, clicking on the Matchboard and using auto-assembly produce different results. Some tags are misplaced, some are repeated, some are missing.


These are all red tags (I'm not talking about purple tags here). I have processing of tags enabled, and I also import segments from the project into a native CafeTran memory. But shouldn't it work with any memory? The TMX memories I import from memoQ into CT carry no indication that they are treated any differently.


This is a major issue that prevents exact matching and does not allow reusing previous translations without manual operations. E.g., if I have a file with several thousand segments and lots of tags, and only some of the segments are new, I would still need to manually place all the tags in all the segments I had already translated, confirmed and checked before. Frankly, I do not understand how to get such a basic feature as exact matching from CafeTran.


I wonder how other users are dealing with this? 


Can this issue be addressed?


How about pretranslating the mqxliff in memoQ?
Use these files to create a CafeTran memory.

If you constantly receive totally different source files, you'll have a problem.


When I made the transition to CafeTran Espresso, I had to insert many tags (tag positions). Over time, the tagging work became less and less. Nowadays, nearly all tags in old segments are inserted correctly.


So it was an investment in labour that I had to make. I did it, because I saw the benefits of CafeTran Espresso, compared to previous tools.

>How about pretranslating the mqxliff in memoQ?


It would not work, as the file itself can contain full/fuzzy matches further down.


>When I made the transition to CafeTran Espresso, I had to insert many tags (tag positions). Over time, the tagging work became less and less. Nowadays, nearly all tags in old segments are inserted correctly.


If I insert them manually, add the segment to memory and then run a test (delete the target, move to another segment and come back), I get very strange and inconsistent results. No matter how many times I insert them, I get the same wrong results. I do not know how it is supposed to work or what the mechanism behind it is. Why was such a design approach even taken? I mean, matching is what a translation tool is made for. So why is it implemented in such a strange way? Tags are as important as words. And the inconsistent results suggest there are also bugs involved here.


If I use, say, a memory from Memsource in memoQ, I do not notice any difference, even though Memsource tends to replace more things with tags. With CafeTran, you simply lose your work.


I wonder how many users experience this and why it has not been brought up before. I have seen only one similar thread where a user had troubles with Trados tags: https://cafetran.freshdesk.com/support/discussions/topics/6000058903 This solution did not work for me.


I have many other issues with tags as well: tags unintentionally copied to the target, tags shrinking when I use a capitalize shortcut, purple tags not saved in memory and thus lost, and CafeTran changing tags after I finish editing a segment.


However, this issue is the most important, and I do not know how to use a program that behaves this way, no matter how many advantages it has. 


I believe something should be done about this. Why not save tags in memory?


CafeTran takes a different approach in everything, but why not use general conventions in basic things?

Yes, FMs can be a pain. EMs normally are no problem.


I cannot confirm the inconsistent placing once you have manually added a segment with tags to a CafeTran Espresso memory and return to that segment.


Why not save tags in memory: to keep the memories small and 'clean'. CafeTran Espresso places everything in RAM, so small memories are important.


Perhaps you should send a demo project to Igor?

Please note that I agree with you that tag placing can be improved. Recently, I've made suggestions in these forums (consideration of hi-prio glossary terms, non-translatables, brackets, quotes, etc.). If I'm not mistaken, Igor is musing over a solution.


It is my feeling that once this has been solved, tag placing will improve significantly. For technical texts, that is. I cannot reflect on marketing texts etc.


What kind of texts do you translate mostly?


I sent a video to Igor. I could send him a project as well. I experience the same with at least two different projects. One project seems fine, but I did not test it too much.


>Why not save tags in memory: to keep the memories small and 'clean'. CafeTran Espresso places everything in RAM, so small memories are important.


Perhaps an option could be added for powerful computers to save everything in memory? I have 40 GB of RAM and can easily upgrade to 64 GB.

>What kind of texts do you translate mostly?


I translate marketing with very long segments: sometimes these segments are a paragraph or up to a half page long. They have tags like bold, italics, and line breaks.


I also do games that have lots of tags. These are also often long segments: every Excel sheet is given as one segment. Sometimes I can hardly fit a segment to the page.


>Please note that I agree with you that tag placing can be improved. Recently, I've made suggestions in these forums (consideration of hi-prio glossary terms, non-translatables, brackets, quotes, etc.). If I'm not mistaken, Igor is musing over a solution.


Hopefully, Igor can figure out the solution. I've sent many emails to him on the issues I have.





I also second a priority glossary — a sort of project lexicon.

We already have that. You can set three priorities to glossaries. And set that only the high one is used for MT!

I never use MT. 


I would like only this glossary to be used in QA. I also wish different colors on the Matchboard for different priority glossaries: dark green, green, light green.

>They have tags like bold, italics, and line breaks.


Simple formatting that can be represented by characters, like _bold_, /italics/ and NwLn.


And there is CafeTran Espresso's own system to add BUMI marking.


I'm sure there must be a system to have this simple formatting stored in the memory. I'll muse on this and do some experiments.
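As a rough illustration of the idea, here is a small Python sketch that rewrites simple inline formatting as plain character markers, so that it could be stored as ordinary text in a memory. The marker choices (`_`, `/`, `NwLn`) follow the post above. The HTML-like input tags and the function name `tags_to_markers` are assumptions for the example only, not anything CafeTran Espresso or memoQ actually produces.

```python
import re

# Hypothetical mapping from simple inline tags to character markers.
MARKERS = {"b": "_", "i": "/"}


def tags_to_markers(text):
    """Replace <b>, <i> and <br/> style tags with plain-text markers."""
    # Line breaks first, so that <br/> is gone before handling <b>.
    text = re.sub(r"<br\s*/?>", "NwLn", text)
    for tag, mark in MARKERS.items():
        # Opening and closing tag both become the same marker character.
        text = re.sub(r"</?%s>" % tag, mark, text)
    return text


print(tags_to_markers("Buy <b>now</b> and <i>save</i>.<br/>Today"))
# Buy _now_ and /save/.NwLnToday
```

A real solution would also need the reverse mapping and a way to cope with marker characters that already occur in the text (a literal slash or underscore), which this sketch ignores.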


Ideally, this would also include a way to mark up MQXLIFF with these formatting tags, so that CafeTran Espresso places them right.

>I never use MT. 

Never ever in your whole life? Not even on holidays?


I'm quite sure you'll be confronted with the results of MT in texts around you, in your daily life.


>I would like only this glossary to be used in QA. 


You can already set that in the resource definitions of glossaries.


>I also wish different colors on the Matchboard for different priority glossaries: dark green, green, light green.


Not sure if that isn't possible already. You get different colouring in the source segment, depending on the priority. But that's a different subject. So I don't mention it here.
