
REQ: Converting memsource tags upon import

 Hello,


I would like to request that CT convert Memsource file (.mxliff) tags into CT types upon import.


Now, an mxliff file appears like this in CT.


image


Memsource tags can be properly handled by SDL Trados, and an sdlxliff file created by it (with the tags converted into serial numbers) looks like this in CT.


image



My request is to get this result without the intervention of SDL Trados.


I'm willing to send you a sample mxliff file if needed.


They appear to be internal markings that relate to the source document being translated rather than to the xliff format itself. In CafeTran, you might add them to your non-translatables in the form of a regular expression such as:


|[{<]\d+[>}]


Then, they will be highlighted and transferred easily as nontranslatables.



Thanks for the advice.

>> They appear to be some internal markings unrelated to the xliff format itself but rather to the source document being translated.

I don't know. The relevant portion of the xliff file is:

<source>Whereas, XXX is engaged in {1&gt;the business of &lt;1}{2&gt;design, specification, marketing, sales and services of promotional and advertising materials&lt;2}{3&gt; and,&lt;3} in connection with such business, would benefit from receiving the services that are {4&gt;agreed from time to time between the parties &lt;4}hereto;</source>

 Anyway, at least for the time being, SDL Trados Studio can be a good help.
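
For illustration, here is a minimal Java sketch of what the suggested expression picks out of a segment like the one quoted above. It assumes that the leading | of the glossary entry is CafeTran's marker for a regular expression rather than part of the pattern, and that &gt; and &lt; have already been unescaped to > and <. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumericTagDemo {
    // Assumption: the leading "|" of the glossary entry is dropped here.
    static final Pattern NUMERIC_TAG = Pattern.compile("[{<]\\d+[>}]");

    // Collects every substring of the segment that the pattern matches.
    static List<String> findTags(String segment) {
        List<String> tags = new ArrayList<>();
        Matcher m = NUMERIC_TAG.matcher(segment);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        // Shortened version of the sample segment, with entities unescaped.
        String segment = "XXX is engaged in {1>the business of <1}{2>design<2}{3> and,<3} in connection";
        System.out.println(findTags(segment));
    }
}
```

Every numbered opening and closing tag in the segment is found as a separate match, which is what lets each one be treated as a non-translatable fragment.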

Igor,


Just to learn: why are you enclosing {< and >} with brackets here?


image

And regarding my constant request for a regular expression tagger: perhaps this could be made superfluous by allowing non-translatables to be collapsed? I don't know.



Apart from a Regex Tagger feature, which looks nice, there is concern that these non-translatables may considerably lower TM matching rates just because of their presence/absence. Can I exclude them from matching by inserting your regex in the "Do not match" section of the memory settings panel?
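
The concern can be illustrated with a rough Java sketch. This is not CafeTran's matching algorithm, only a demonstration that two segments differing solely in their tags become identical once the tags are removed. The class and method names are hypothetical.

```java
class TagStripDemo {
    // Removes numeric Memsource-style tags such as {1> and <1}.
    static String stripTags(String segment) {
        return segment.replaceAll("[{<]\\d+[>}]", "");
    }

    public static void main(String[] args) {
        String tagged = "engaged in {1>the business of <1}{2>design<2}";
        String plain = "engaged in the business of design";
        // Compared as-is, the two segments differ because of the tags.
        System.out.println(tagged.equals(plain));
        // With the tags stripped, they are the same text.
        System.out.println(stripTags(tagged).equals(plain));
    }
}
```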

 

>concern that these non-translatables may considerably lower TM matching rates just because of their presence/absence


So true!

> why are you enclosing {< and >} with brackets here?


See Character classes here: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
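
In short, square brackets define a character class, which matches exactly one character out of the listed set. A small sketch in plain Java (the class and method names are invented for the example):

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // True if the whole input is exactly one character, either "{" or "<".
    static boolean isOpeningDelimiter(String s) {
        return Pattern.matches("[{<]", s);
    }

    public static void main(String[] args) {
        System.out.println(isOpeningDelimiter("{"));   // true
        System.out.println(isOpeningDelimiter("<"));   // true
        System.out.println(isOpeningDelimiter("{<"));  // false: two characters
        // Without the brackets, the pattern means "{" followed by "<", in that order.
        System.out.println(Pattern.matches("\\{<", "{<")); // true
    }
}
```

So [{<]\d+[>}] reads as: one of { or <, then one or more digits, then one of > or }.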


> Can I exclude them from matching by inserting your regex in the "Do not match" section of the memory settings panel?


Yes you can, except for numbers.


I'm resuscitating this old discussion to see whether in the meantime CTE has improved the handling of mxliff files re. Memsource tags. I tried adding the |[{<]\d+[>}] regex above, but CTE couldn't import the test mxliff file.

https://github.com/idimitriadis0/TheCafeTranFiles/wiki/4-File-formats#memsource


To make Memsource tags easier to handle and insert in your target segments, you can add the following Regular Expression (regex) as a non-translatable fragment (Resources > Non-translatable fragments > Add selection to non-translatable fragments):


|[{<]\d+[>}]


Then, you will be able to easily place these tags with the F4 keyboard shortcut for inserting non-translatables.


To exclude these tags from memory matching (so that they don’t hurt the TM fuzzy matching algorithm), you can also insert the above regex in the “Do not match” section of Preferences (Options) > Memory.


Does this not apply anymore? I am following this.

The only person who can confirm this is Igor.


Igor, Memsource is becoming more and more popular as the number of agencies requiring it is increasing.


It's very good that we can process mxliff files with CTE, but correct tag handling is an important issue. Is the procedure explained above by Jean still valid?



Thank you Jean.


I've followed your instructions, but I'm still getting pieces like {b> etc.


I've also put |[{<]\d+[>}] in my non-translatable glossary, just in case.


Do you have any idea?


I know absolutely nothing about regular expressions, but perhaps |[{<]\d+[>}] doesn't catch the following Memsource tags, right?


image

 

In the Dejavu forum I found this expression, which converts Memsource tags to DVX3 tags:


(\{([ibu0-9]{1,3})>)|(<([ibu0-9]{1,3})\}|\{.*?\})


It works well, except sometimes adding a few extra tags in the target, which is not a big deal since they can be deleted.
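
To see the difference between the two expressions, here is a small Java sketch that runs both against a made-up segment containing a letter tag and a numbered tag. The leading | of the first expression is dropped on the assumption that it is CafeTran's regex marker; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagPatternComparison {
    // The numeric-only expression from earlier in the thread (leading "|" dropped).
    static final String NUMERIC_ONLY = "[{<]\\d+[>}]";
    // The expression quoted from the Dejavu forum, copied unchanged.
    static final String DEJAVU = "(\\{([ibu0-9]{1,3})>)|(<([ibu0-9]{1,3})\\}|\\{.*?\\})";

    // Returns every match of the regex in the text, in order.
    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String segment = "{b>Bold<b} and {1>numbered<1} text";
        System.out.println(findAll(NUMERIC_ONLY, segment)); // only the numbered pair
        System.out.println(findAll(DEJAVU, segment));       // letter and numbered pairs
    }
}
```

The numeric-only expression skips {b> and <b} because \d+ requires digits, while the Dejavu expression also accepts the letters i, b and u. Its last alternative, \{.*?\}, additionally matches any brace-enclosed text such as {j}.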


To convert the sequence of {bla> and <bla} characters into CafeTran's nontranslatables and hide it, you can add the following regular expression to your glossary of non-translatable fragments:


 |\{.?+>|<.?+\}^


If you skip the last ^ character, they will not be hidden. Of course, you should transfer them all to the target segment via the F3 shortcut.
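
As a rough check of what the core of this expression matches, here is a Java sketch. It assumes that the leading | and the trailing ^ are CafeTran-specific markers (regex entry and hiding, respectively) and leaves them out; the class and method names are invented, and CafeTran's own handling may differ from plain java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HiddenTagDemo {
    // Core of the expression, without the assumed CafeTran markers "|" and "^".
    static final Pattern TAG = Pattern.compile("\\{.?+>|<.?+\\}");

    // Returns every tag-like match in the segment, in order.
    static List<String> findTags(String segment) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(segment);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(findTags("{b>Bold<b} and {1>numbered<1} text"));
        // ".?+" allows at most one character between the delimiters.
        System.out.println(findTags("{12>two digits<12}"));
    }
}
```

Under plain Java semantics, the first call finds {b>, <b}, {1> and <1}, while the second finds nothing, because .?+ accepts at most one character between the delimiters. Tags with two or more characters inside, such as {12>, may therefore need a wider expression.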



Many thanks Igor, your solution works very well. 
