Segmentation issue with superscript/subscript characters

Hi all, whenever I create a project from a Word file containing superscript/subscript characters, CT segments the source text directly before and after those characters. I've attached two pictures: one shows the source text in Word, the other shows what the segmentation looks like. Is this normal, or is something wrong with my settings? Of course I can join all the segments by pressing ALT + up, but I am wondering whether this can be improved. Many thanks in advance for helping!

Hello Jean,

Yes, the "Segment at all tags" checkmark was the cause, thank you for pointing this out. You are the best!!! Have a wonderful Christmas!

*subscript = superscript/subscript

Hello Stefan,

Is "Segment at all tags" enabled in Edit > Preferences > General? This could be causing the segmentation, since superscript/subscript characters are typically marked with tags in CafeTran.

Also, if you use a custom SRX, can you reproduce this with the built-in "Sentence" segmentation rule?
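
To illustrate why that option would split the text around a subscript, here is a rough Python sketch. It is not CafeTran's actual code: the `<sub>` tag syntax, the `segment` function and the naive sentence rule are all assumptions made up for the example. It only shows the general idea that treating every inline tag as a boundary breaks one sentence into several segments.

```python
import re

# Hypothetical inline-tag syntax; CafeTran's real internal markup may differ.
TAG = re.compile(r"</?\w+>")

def segment(text, segment_at_all_tags):
    """Split text into segments (illustrative only).

    With segment_at_all_tags=False, inline tags stay inside their sentence.
    With segment_at_all_tags=True, every tag also acts as a segment boundary.
    """
    # Naive sentence rule: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    if not segment_at_all_tags:
        return sentences
    segments = []
    for sentence in sentences:
        # Breaking at each tag leaves only the text pieces between tags.
        pieces = TAG.split(sentence)
        segments.extend(p.strip() for p in pieces if p.strip())
    return segments

source = "Water is H<sub>2</sub>O. It boils at 100 degrees."
print(segment(source, False))  # two segments, one per sentence
print(segment(source, True))   # first sentence is broken at each tag
```

With the option off, the subscript stays inside its sentence; with it on, the text before, inside and after the tag pair each become their own segment, which matches the behaviour described in the question.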