space after colon/semicolon - need help with segmentation

Hi all,

I have been testing CafeTran for a few days now, and I have successfully made CT segment the source text after a colon/semicolon. The problem is that the segment following the colon/semicolon now starts with a space.

For example, the sentence:

"This is the first part; and this the second"

would result in the second segment being shown as " and this the second" in the CT source segment window, with the space before the "and".

How do I get rid of this leading space? Unfortunately I am no regex wizard, and all my attempts at editing the EN language rule in the GUI and in the rules.srx file were unsuccessful.

Does anybody have an idea what settings to use for "beforebreak" and "afterbreak" to make the space disappear?

Many thanks in advance!

Hello Stefan,

I'm using OmegaT's SRX and have added the colon (not the semi-colon) rules in Language rules > Default.

Here's what my rules look like:


beforebreak: [\.\?\!:]+

afterbreak: \s

You can try adding a semi-colon after the colon, inside the square brackets of the beforebreak pattern.

This was explained in the "Suggestion" section of Preferences > General > Segmentation, but for some reason the backslashes were not displayed. I have just corrected this in the Reference document.

Hi Jean,

Many thanks for your suggestion. I haven't tried it, as I figured it out on my own before I read your reply. This is what worked for me:

beforebreak: \;

afterbreak: \s+
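
For anyone editing rules.srx by hand rather than through the GUI, the two fields above would presumably correspond to a rule element like the one below. This is only a sketch, assuming the file follows the standard SRX 2.0 layout; the surrounding languagerule block (and its name) depends on your own file and is not shown here.

```xml
<!-- Sketch only: assumes a standard SRX 2.0 rules file.
     Place it inside the languagerule block used for your source language. -->
<rule break="yes">
  <!-- text that must appear immediately before the break: a semicolon -->
  <beforebreak>\;</beforebreak>
  <!-- text that must appear immediately after the break: one or more whitespace characters -->
  <afterbreak>\s+</afterbreak>
</rule>
```

To break after colons as well, the beforebreak pattern could be widened to a character class such as [:;], in the same way as in the rule suggested earlier in this thread.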

It took quite some research and trial and error though, and in the end I needed a break myself :P

Now everything is fine again.

Best regards,

Nice! Kudos for your perseverance :-)
