In an Excel file, I have the problem that soft returns are being propagated as hard returns.
The segment below has been propagated.
You might assume that this does not make any difference, but I am pretty sure that it does.
Another question is whether these returns shouldn't be filtered out during segmentation. In most cases keeping them offers no advantage, and it interferes considerably with fuzzy matching. Example:
This car is red.
This car is blue.
¶This car is blue.
For the last sentence, CT will offer sentence 1 as the first fuzzy match and only then sentence 2 (if at all).
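To illustrate why filtering out the returns would help, here is a minimal sketch in Python. It is not how CT ranks matches internally (that is unknown to me); it only uses the standard difflib module as a stand-in similarity measure. The names RETURN_CHARS, normalize and similarity are made up for this example, and the set of characters treated as returns is an assumption.

```python
from difflib import SequenceMatcher

# Characters treated as returns at segment edges (assumption for this sketch):
# LF, CR, the Unicode line separator, and the pilcrow used as a visible marker.
RETURN_CHARS = "\n\r\u2028\u00b6"


def normalize(segment: str) -> str:
    """Strip return characters and spaces from both ends of a segment."""
    return segment.strip(RETURN_CHARS + " ")


def similarity(a: str, b: str) -> float:
    """Similarity ratio of two segments after normalization (0.0 to 1.0)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# The three sentences from the example above.
memory = ["This car is red.", "This car is blue."]
new_segment = "\u00b6This car is blue."

# Rank the stored segments by similarity to the new one, best match first.
ranked = sorted(memory, key=lambda s: similarity(new_segment, s), reverse=True)
```

With the leading return stripped before comparison, sentence 2 is identical to the new segment and is ranked ahead of sentence 1, which is the order I would expect.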
Note: Of course the very first source segment above ("EN ISO") has a soft return at its end.
And indeed, the behavior changes, apparently depending on whether the last character is a number, a semicolon, or a letter. However, I could not establish any rule.