Wrong tags in source text


I'm still exploring the software and I encountered this issue.

Why is the source text segmented like this? The dots do not even exist in the source text, and I don't understand why they are treated as segments in the first place.

Could the dots represent non-breaking spaces? You can check by turning on the pilcrow symbol. Also, what is the source of the document? Web? Scan? You could try whether the MS Word OCR gives a better import.
By segmenting we mean how text is chopped into pieces. Your screenshot seems to show a correctly segmented sentence.


The source is a Word file. I will try the OCR solution.

But how is the segmentation correct?

segment 1: the white house said it believes Iran is

segment 2: .

segment 3: planning to supply Russia

segment 4: .

segment 5: "with several hundred" ..........

What you see in the box is one segment. The dot represents a fixed space.

Thank you so much! 

Can the source text & its tags be edited?

You have to activate editing via Edit > Edit source segments


  1. You can only edit the text (e.g. a typo)
  2. Never (ever) remove a tag or change the order of the tags

If you want to have segments without these non-breaking spaces (at least, that is what I guess they are):

Remove them in the source, as far as this is possible.
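
If the text can be exported as plain text first, the clean-up could look roughly like the sketch below. It assumes the dots really are non-breaking spaces (U+00A0), which is only a guess, and it works on a plain string, not on the Word file itself. The function names and the sample sentence (modeled on the segments quoted above) are made up for illustration.

```python
# Assumption: the "dots" are non-breaking spaces (U+00A0).
NBSP = "\u00a0"


def count_nbsp(text: str) -> int:
    """Return how many non-breaking spaces the text contains."""
    return text.count(NBSP)


def replace_nbsp(text: str) -> str:
    """Replace every non-breaking space with an ordinary space."""
    return text.replace(NBSP, " ")


# Hypothetical sample with two non-breaking spaces in it.
sample = "Iran is\u00a0planning to supply Russia\u00a0with several hundred"

print(count_nbsp(sample))    # number of non-breaking spaces found
print(replace_nbsp(sample))  # same sentence with ordinary spaces
```

This only touches the characters themselves. Any formatting or tags wrapped around the spaces in the original document would still have to be cleaned up in Word.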

Note: there must be some extra formatting around these spaces, since normally they aren't surrounded by tags.

Perhaps you can use another (cleaner) MS Word document for your learning?

