These appear to be internal markings that belong to the source document being translated rather than to the XLIFF format itself. In CafeTran, you can add them to the nontranslatables as a regular expression such as:
|[{<]\d+[>}]
Then, they will be highlighted and transferred easily as nontranslatables.
Anyway, at least for the time being, SDL Trados Studio can be a good help.
Igor,
Just to learn: why are you enclosing {< and >} with brackets here?
>concern that these non-translatables may considerably lower TM matching rates just because of their presence/absence
So true!
> why are you enclosing {< and >} with brackets here?
See Character classes here: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
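As a small illustration of character classes, here is a sketch in Python, whose `re` module handles this particular pattern the same way as Java's for plain ASCII input. The sample segment is made up, and the leading `|` is left out on the assumption that it is CafeTran's own marker for a regex entry rather than part of the pattern:

```python
import re

# [{<] matches exactly one character: either "{" or "<".
# \d+  matches one or more digits.
# [>}] matches exactly one character: either ">" or "}".
TAG = re.compile(r"[{<]\d+[>}]")

sample = "Click {1>here<1} to open {2}."  # hypothetical segment
print(TAG.findall(sample))  # ['{1>', '<1}', '{2}']
```

Without the square brackets, `{<` would be read as the literal two-character sequence `{<` instead of "one of these two characters".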
> Can I exclude them from matching by inserting your regex in the "Do not match" section of the memory settings panel?
Yes you can, except for numbers.
I'm resuscitating this old discussion to see whether CTE has meanwhile improved its handling of Memsource tags in mxliff files. I tried adding the |[{<]\d+[>}] regex above, but CTE couldn't import the text mxliff file.
https://github.com/idimitriadis0/TheCafeTranFiles/wiki/4-File-formats#memsource
To make Memsource tags easier to handle and insert in your target segments, you can add the following Regular Expression (regex) as a non-translatable fragment (Resources > Non-translatable fragments > Add selection to non-translatable fragments):
|[{<]\d+[>}]
Then, you will be able to easily place these tags with the F4 keyboard shortcut for inserting non-translatables.
To exclude these tags from memory matching (so that they don’t hurt the TM fuzzy matching algorithm), you can also insert the above regex in the “Do not match” section of Preferences (Options) > Memory.
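To see why ignoring tags can help fuzzy matching, here is a rough sketch in Python. It is not CafeTran's matching algorithm: `difflib.SequenceMatcher` is only a stand-in similarity measure, and the `similarity` helper and both segments are hypothetical.

```python
import re
from difflib import SequenceMatcher

TAG = re.compile(r"[{<]\d+[>}]")  # the regex above, without the leading "|"

def similarity(a: str, b: str, ignore_tags: bool = False) -> float:
    """Return a 0..1 similarity score, optionally stripping tags first."""
    if ignore_tags:
        a, b = TAG.sub("", a), TAG.sub("", b)
    return SequenceMatcher(None, a, b).ratio()

tm_source = "Press the {1>Start<1} button."   # segment stored in the memory
new_source = "Press the Start button."        # segment being translated

print(similarity(tm_source, new_source))                    # below 1.0
print(similarity(tm_source, new_source, ignore_tags=True))  # 1.0
```

With the tags stripped, the two segments are identical, so the score reaches 1.0. With the tags left in, the same text scores lower only because of the markup.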
Does this not apply anymore? I am following this.
The only person who can confirm this is Igor.
Igor, Memsource is becoming more and more popular as the number of agencies requiring it is increasing.
It's very good that we can process mxliff files with CTE, but correct tag handling is an important issue. Is the procedure explained above by Jean still valid?
Thank you Jean.
I've followed your instructions, but I'm still getting pieces like {b> etc.
I've also put |[{<]\d+[>}] in my non-translatable glossary, just in case.
Do you have any idea?
I know absolutely nothing about regular expressions, but perhaps |[{<]\d+[>}] doesn't catch the following Memsource tags, right?
In the Dejavu forum I found this expression, which converts Memsource tags to DVX3 tags:
(\{([ibu0-9]{1,3})>)|(<([ibu0-9]{1,3})\}|\{.*?\})
It works well, except sometimes adding a few extra tags in the target, which is not a big deal since they can be deleted.
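The difference between the two expressions can be checked outside any CAT tool. The following is a sketch in Python, not a test of how CafeTran or Dejavu apply them. The sample segment is invented, and the leading `|` of the CafeTran entry is omitted on the assumption that it is not part of the regex proper:

```python
import re

# Digits-only pattern from earlier in the thread.
NUMERIC_TAG = re.compile(r"[{<]\d+[>}]")

# Expression quoted from the Dejavu forum, copied unchanged.
DVX_TAG = re.compile(r"(\{([ibu0-9]{1,3})>)|(<([ibu0-9]{1,3})\}|\{.*?\})")

sample = "{b>Warning:<b} see {1>section 2<1}."  # hypothetical segment

# \d+ requires at least one digit, so {b> and <b} are skipped.
print([m.group(0) for m in NUMERIC_TAG.finditer(sample)])
# ['{1>', '<1}']

# [ibu0-9]{1,3} also accepts the letters i, b and u.
print([m.group(0) for m in DVX_TAG.finditer(sample)])
# ['{b>', '<b}', '{1>', '<1}']
```

So the digits-only regex does miss formatting tags such as `{b>`, which is consistent with the leftovers reported above.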
To convert the sequence of {bla> and <bla} characters into CafeTran's nontranslatables and hide it, you can add the following regular expression to your glossary of non-translatable fragments:
|\{.?+>|<.?+\}^
If you skip the last ^ character, they will not be hidden. Of course, you should transfer them all to the target segment via the F3 shortcut.
Many thanks Igor, your solution works very well.
masato
Hello,
I would like to request that CT convert Memsource file (.mxliff) tags into CT tag types upon import.
Now, an mxliff file appears like this in CT.
Memsource tags can be properly handled by SDL Trados, and an sdlxliff file created by it (with the tags converted into serial numbers) looks like this in CT.
My request is to get this without intervention of SDL Trados.
I'm willing to send you a sample mxliff file if needed.