
Small bug: segmentation of Japanese sentences between brackets


With the default segmentation rules, CT will cut a sentence before the end if it's between brackets, like this:

(Some Japanese text。)

CT cuts after the period ("。"), leaving the closing bracket alone in the next segment.

Yes, it does.
I know this is not really a bug; it comes from the preset segmentation rules.
What I would very much like to know is how to avoid it.
I would like to request some kind of segmentation rule editor.


By default, CT uses a simple set of preset segmentation rules. However, you can switch to the much more advanced segmentation option by choosing Rules.srx in the Edit > Preferences > Segmentation box. Once you start a new project, the closing bracket that follows the period should no longer be cut off in the Japanese source.

When this advanced segmentation option is selected, CT also lets you edit Rules.srx in its Segmentation editor, which becomes available next to the Segmentation box. Note, however, that the advanced, editable segmentation rules use regular expression syntax, so some basic knowledge of regular expressions may be required.
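To give an idea of what such a rule might look like, here is a rough Python sketch of how SRX-style rules work in general: each rule pairs a "before break" pattern with an "after break" pattern, and the first rule that matches at a position decides whether to cut there. The rule list, the set of closing brackets, and the segment function are assumptions made for illustration only; they are not the actual contents of CT's Rules.srx, and the regex flavor CT uses may differ slightly from Python's.

```python
import re

# Hypothetical rules, checked in order; the first rule that matches wins.
# Each rule is (is_break, before_pattern, after_pattern).
CLOSERS = r"[)）」』】]"  # assumed set of closing brackets

RULES = [
    # Exception: do not break between "。" and a closing bracket.
    (False, r"。", CLOSERS),
    # Break after "。" plus any closing brackets that follow it.
    (True, r"。" + CLOSERS + r"*", r""),
]

# What a simple preset might do: always break right after "。".
NAIVE_RULES = [(True, r"。", r"")]


def segment(text, rules=RULES):
    """Split text at every position where the first matching rule is a break rule."""
    compiled = [
        (is_break, re.compile(f"(?:{before})$"), re.compile(after))
        for is_break, before, after in rules
    ]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            # "before" must match the text ending at pos, "after" the text starting at pos.
            if before.search(text[:pos]) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # first matching rule decides, as in SRX
    if text[start:]:
        segments.append(text[start:])
    return segments


print(segment("(日本語の文。)", NAIVE_RULES))  # bracket ends up alone
print(segment("(日本語の文。)"))               # bracket stays with its sentence
```

In an SRX file, the same idea would be written as a no-break rule placed ahead of the general break rule, so the order of the rules matters.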
