Segmentation issue with superscript/subscript characters

Hi all, whenever I create a project from a Word file that contains superscript/subscript characters, CafeTran splits the source text into separate segments directly before and after those characters. I've attached two pictures: one shows the source text in Word, the other shows what the segmentation looks like. Is this normal, or is something wrong with my settings? Of course I can join the segments by pressing Alt+Up, but I wonder whether this can be improved. Many thanks in advance for your help!

Hello Stefan,

Is "Segment at all tags" enabled in Edit > Preferences > General? If so, it could be causing the segmentation, since subscript characters are typically marked with tags in CafeTran.

Also, if you use a custom SRX file, can you reproduce this with the built-in "Sentence" segmentation rule?

*Correction: by "subscript characters" I meant superscript/subscript characters.

Hello Jean,

Yes, the "Segment at all tags" checkbox was the cause. Thank you for pointing this out, you are the best! Have a wonderful Christmas!