
Help on segmentation, please

I've just downloaded the Omega segmentation rules and applied them to an EN-IT job right away, after replacing the default Rules.srx file with the new one. However, although the new segmentation file includes the "no break" rule for "Mr." (please see it below), CTE keeps breaking the sentence right after "Mr."


In CTE, both the new segmentation rule file and "Default" are currently selected. Should I have selected "English" instead? If so, does that mean the corresponding language has to be selected instead of "Default" each time before importing a new file?


Rule in question:


<rule break="no">
  <beforebreak>Mr\.</beforebreak>
  <afterbreak>\s</afterbreak>
</rule>




Select the rules specific to the language you wish to use. CTE will then apply the language-specific rules on top of the Default rules.

As long as you don't change the language pair between projects, "English" will stay selected instead of "Default".
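To illustrate why the language selection matters, here is a minimal Python sketch of SRX-style rule evaluation. It is not CTE's actual implementation; it assumes that at each position the first matching rule decides, and that the language-specific rules are checked before the Default ones. The names `DEFAULT_RULES`, `ENGLISH_RULES` and `segment` are made up for this example.

```python
import re

# Each rule is (is_break, before_pattern, after_pattern), mirroring <rule break="...">.
# Both rule lists are illustrative assumptions, not the contents of any real .srx file.
DEFAULT_RULES = [(True, r"[.?!]+", r"\s")]
ENGLISH_RULES = [(False, r"Mr\.", r"\s")]


def segment(text, rules):
    """Split text wherever the first matching rule at a position is a break rule."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in rules:
            # The before pattern must end exactly at pos; the after pattern must start there.
            if re.search(f"(?:{before})$", text[:pos]) and re.match(after, text[pos:]):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # the first matching rule decides; later rules are ignored
    segments.append(text[start:])
    return segments


sample = "Mr. Rossi arrived. He sat down."
only_default = segment(sample, DEFAULT_RULES)
english_on_top = segment(sample, ENGLISH_RULES + DEFAULT_RULES)
```

With only the Default rules, the sample is split right after "Mr."; with the English rules placed on top, the no-break rule matches first at that position and the split is suppressed.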

Thank you, Jean,

you beat me to it. I tried again after selecting English instead of Default, and now the rule works. However, I seem to remember someone on this forum writing that CTE selects the rule language automatically based on the actual source language of the document. Clearly, that is not the case here.


Igor, any chance of having CTE select the rule language automatically, please?

Nothing has changed in the automatic selection of the segmentation language: it is applied if the source language is defined in the .srx file. It certainly works with CafeTran's own segmentation file. I believe there are many flavors of .srx files created by various tools, and CTE may not work with all of them.

Igor, I'm using the highly acclaimed Omega srx file right now. Do you suggest importing the contents of the Omega file into the CTE default Rules.srx file and using that instead? With Ratel it seems quite easy, although a bit tedious. But what's more tedious than some largish translation jobs like the ones I have been receiving lately?

I suggest you hang on until the next update. There may be an upper/lower case difference in the language codes. CTE could make the language code mapping case-insensitive so that it switches languages automatically for external segmentation rules as well.
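As a rough illustration of the suspected case mismatch, here is a small Python sketch. It is only an assumption about the cause, not a description of how CTE maps languages: the pattern list imitates SRX `<languagemap>` entries, and `LANGUAGE_MAPS` and `pick_rule_set` are hypothetical names.

```python
import re

# Hypothetical (languagepattern, languagerulename) pairs, in the style of SRX <languagemap>.
LANGUAGE_MAPS = [("EN.*", "English"), ("IT.*", "Italian"), (".*", "Default")]


def pick_rule_set(lang_code, case_insensitive):
    """Return the name of the first rule set whose pattern matches the whole language code."""
    flags = re.IGNORECASE if case_insensitive else 0
    for pattern, name in LANGUAGE_MAPS:
        if re.fullmatch(pattern, lang_code, flags):
            return name
    return None


strict = pick_rule_set("en-US", case_insensitive=False)
relaxed = pick_rule_set("en-US", case_insensitive=True)
```

Under these assumptions, a lower-case project code such as "en-US" falls through to "Default" when matching is case-sensitive, and picks "English" once matching ignores case.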

