
QA glitch with non-breaking space

Hi Igor,

I spotted a new segmentation and QA glitch:


I have a source segment which starts with a non-breaking space.

1. This was not treated as a space during segmentation, so the segment was not split at this point. I therefore split it manually and included a space at the start of the target segment.

2. QA now shows a) "Inconsistent leading space", which as far as I'm concerned is wrong (though I concede that there might be some extremely obscure case where you would want this behaviour), and b) "Letter case does not match", which is definitely wrong.
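
To make the expected behaviour concrete, here is a rough Python sketch of the two checks, written so that a non-breaking space (U+00A0) counts as a leading space. This is only an illustration and not CafeTran's actual code; the function names and the sample segments are hypothetical.

```python
def has_leading_space(segment: str) -> bool:
    """True if the segment starts with any whitespace character.

    str.isspace() is True for U+00A0 (non-breaking space) as well as
    for an ordinary space, so both kinds are treated alike here.
    """
    return bool(segment) and segment[0].isspace()


def leading_space_consistent(source: str, target: str) -> bool:
    """Hypothetical 'leading space' check: both sides start with a space, or neither does."""
    return has_leading_space(source) == has_leading_space(target)


def first_letter_case_matches(source: str, target: str) -> bool:
    """Hypothetical 'letter case' check that skips leading whitespace first.

    str.lstrip() with no argument also removes U+00A0, so the comparison
    is made on the first real character of each segment.
    """
    s, t = source.lstrip(), target.lstrip()
    if not s or not t or not s[0].isalpha() or not t[0].isalpha():
        return True  # nothing to compare, so do not raise a warning
    return s[0].isupper() == t[0].isupper()


# Made-up example: source starts with a non-breaking space, target with a normal space.
source = "\u00a0Das ist ein Satz."
target = " This is a sentence."
print(leading_space_consistent(source, target))
print(first_letter_case_matches(source, target))
```

With checks along these lines, neither warning would be raised for a segment pair like the one I described.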

One for your bug tracker.

Jeremy


> I have a source segment which starts with a non-breaking space.


The purpose of the non-breaking space, as the name suggests, is not to break: it keeps the content on either side of the space together as one chunk. Imagine CafeTran breaking segments where the author of the document clearly meant not to break them by inserting a non-breaking space. That would be wrong.


The QA glitch is triggered by a glitch in the document's contents. I am not sure whether the QA functions should treat both types of space the same way, as consistent leading spaces. If the non-breaking spaces are irrelevant (e.g. inserted by some kind of OCR converter), it would be good to replace them in the source segments with normal spaces via CafeTran's Find and Replace function at the start of translation.
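
As an illustration of what such a replacement does, here is a minimal Python sketch that swaps every non-breaking space (U+00A0) for an ordinary space. It is not CafeTran code and does not use any CafeTran API; the function name and the sample segment are made up for the example.

```python
NBSP = "\u00a0"  # Unicode NO-BREAK SPACE


def normalize_spaces(segment: str) -> str:
    """Return the segment with every non-breaking space replaced by a normal space."""
    return segment.replace(NBSP, " ")


# Made-up source segment that starts with a non-breaking space.
source = "\u00a0Second sentence."
print(repr(normalize_spaces(source)))
```

This only makes sense when the non-breaking spaces carry no meaning in the document; where they are intentional (for example between a number and its unit), they should be left alone.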

> Imagine CafeTran breaking at segments where the author of the document clearly meant not to break it by putting the non-breaking space.


The author of the document didn't place this to prevent segmentation.

They almost certainly placed it either a) accidentally or b) because they don't know how to justify their text properly. There is no legitimate reason for placing an nbsp between two sentences.

I really don't think your argument holds water.


> If the non-breaking spaces are irrelevant

I would suggest that an nbsp between sentences is always irrelevant. Do you have a specific use case in mind, where this would be intentional and rational? Because I'm extremely sceptical that such a use case exists.



Oh, and you didn't address the 'Letter case doesn't match' point.

> They almost certainly placed it either a) accidentally or b) because they don't know how to justify their text properly.


CafeTran does not know the intention of the author of the source document. It follows the intention of the author who designed the non-breaking space.


> I really don't think your argument holds water.


> didn't address the 'Letter case doesn't match' point.


I usually address issues not with arguments but with code, if I am able to. It may be fixed in a future update.


