Better numbers

This is nothing new, only to illustrate:

  • Would it be possible to have numbers that are separated only by a space (whether a normal space, as in case 2, or a non-breaking space, as in case 1) regarded and treated as one number? Are there any use cases where such a differentiation would make sense?
  • Am I right to assume that it makes no difference whether the number is non-translatable or not (just to be sure)?

Next stop, then: numbers with a percent sign.

Just out of curiosity:
Example 1 has 6 words, 2 of which differ. Example 2 has 8 words, 2 of which differ. Shouldn't that be a 66 % match and a 75 % match, respectively?
1. It's a good idea to treat such spaced numbers as one.
2. CT compares the length of the segments (not the number of words).
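
To illustrate the difference between the two ways of counting, here is a minimal sketch. It is not CT's actual algorithm: `word_match` reproduces the word-based arithmetic from the question, and `char_match` uses Python's `difflib` as a stand-in for a length-based comparison. The two sample segments are hypothetical.

```python
from difflib import SequenceMatcher


def word_match(total_words: int, differing_words: int) -> float:
    """Share of unchanged words, as a percentage (the question's arithmetic)."""
    return 100.0 * (total_words - differing_words) / total_words


def char_match(a: str, b: str) -> float:
    """Character-level similarity as a percentage.

    Assumption: difflib's ratio is only a stand-in for a length-based
    comparison; it is not claimed to be what CT computes.
    """
    return 100.0 * SequenceMatcher(None, a, b).ratio()


# Word-based view from the question: 6 words / 2 differ, 8 words / 2 differ.
print(round(word_match(6, 2), 1))
print(round(word_match(8, 2), 1))

# Hypothetical segments that differ only in a spaced number.
seg_a = "Add 10 000 units to the order"
seg_b = "Add 20 500 units to the order"
print(round(char_match(seg_a, seg_b), 1))
```

Short tokens such as digit groups weigh little in a character-based comparison, so the resulting percentage can be noticeably higher than the word-based one.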


Really appreciated. As I mentioned above, there might be some exceptions:
  1. In the year 2017 400 journalists were killed.
  2. In the year 2017, 400 journalists were killed.
  3. In department 36 400 people are employed.
  4. In department 36, 400 people are employed.

While 1 and 3 are bad style and deserve a false positive (if the numbers are separated in the target text, as they should be), cases 2 and 4 are a bit different because of the comma. Would a rule be necessary here, e.g. only treat a number followed by one to three further numbers, each separated by a space (and perhaps additionally a comma), as a single number? On the other hand, many OCRed docs tend to have such numbers.
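
One possible reading of such a rule, sketched as a regular expression: a number counts as "one number" only if it starts with 1 to 3 digits and continues with groups of exactly three digits, each preceded by a normal or non-breaking space. The pattern and the function name `find_grouped_numbers` are hypothetical and only meant to show how the four cases would behave.

```python
import re

# Hypothetical rule: 1-3 leading digits, then one or more groups of exactly
# three digits, each preceded by a normal space or a non-breaking space.
# The lookarounds keep the match from starting or ending inside a longer
# run of digits (so "2017 400" is not read as "017 400").
GROUPED_NUMBER = re.compile(r"(?<!\d)\d{1,3}(?:[ \u00A0]\d{3})+(?!\d)")


def find_grouped_numbers(text: str) -> list[str]:
    """Return all space-grouped numbers found in text."""
    return GROUPED_NUMBER.findall(text)


cases = [
    "In the year 2017 400 journalists were killed.",
    "In the year 2017, 400 journalists were killed.",
    "In department 36 400 people are employed.",
    "In department 36, 400 people are employed.",
]
for sentence in cases:
    print(find_grouped_numbers(sentence), "<-", sentence)
```

Under this reading, cases 1, 2 and 4 yield no grouped number, because 2017 has four digits and the comma breaks the group. Case 3 is still read as the single number "36 400", so a pattern alone cannot settle it; that one remains a question of what is more probable.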

These cases should perhaps be kept in mind, at least that they may happen. I did not want to create more problems (maybe there is a very simple solution to resolve all this, maybe a question of "what is more probable?"). Happy holidays.

