
FEATURE REQUESTS: Translation progress bar modes and language-dependent decimal delimiter handling in QA

Hello!

I'd like to offer two ideas:

1. It would be great to have several modes for the translation progress bar, switchable via the right-click menu (or otherwise):

- segments (this is how it works now);

- words;

- characters.

The bar would still be updated when a segment is finished, as it is now, but in the two new modes it should measure progress against the total number of words/characters in the original rather than just the number of segments. In some cases that may prove useful, at least for motivation (for example, in a large document with tables plus a lot of text).
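To illustrate the difference between the three modes, here is a minimal Python sketch. It is not CafeTran code: the Segment class, the progress function and the sample segments are all made up for illustration, and words are simply counted by splitting on whitespace.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    source: str   # source text of the segment
    done: bool    # True once the segment is finished


def progress(segments, mode="segments"):
    """Return the finished share of the project as a percentage."""
    def weight(seg):
        if mode == "segments":
            return 1
        if mode == "words":
            return len(seg.source.split())
        if mode == "characters":
            return len(seg.source)
        raise ValueError(f"unknown mode: {mode}")

    total = sum(weight(s) for s in segments)
    finished = sum(weight(s) for s in segments if s.done)
    return 100.0 * finished / total if total else 0.0


# Three short table cells are done, one long paragraph is not.
segments = [
    Segment("12.5", done=True),
    Segment("7.25", done=True),
    Segment("3.0", done=True),
    Segment("This long paragraph still has to be translated by hand.", done=False),
]

for mode in ("segments", "words", "characters"):
    print(mode, round(progress(segments, mode), 1))
```

With these sample segments the segment-based figure is 75%, while the word-based and character-based figures are roughly 23% and 17%, which is the table effect I mean.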

2. Currently, the QA does not take into account that the decimal delimiter differs between languages. In my case (and perhaps some others), the decimal point and the thousands comma in English are replaced in Russian with a comma and a space respectively. This produces a slew of QA "inconsistent number" errors for numbers that are in fact correctly localized.

Can the QA be adjusted so that it interprets decimal delimiters according to the source and target languages? I mean, if it sees English as the source language and Russian as the target, it should know that 12,345.678 = 12 345,678.
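A rough Python sketch of the kind of check I have in mind. Everything here is an assumption for illustration: the NUMBER_FORMATS table, the function names, and the idea that both numbers are parsed into a plain value before comparing. It says nothing about how the QA is actually implemented.

```python
from decimal import Decimal, InvalidOperation

# Assumed separators per language: (decimal separator, thousands separators).
# Russian grouping may use a normal, non-breaking or narrow non-breaking space.
NUMBER_FORMATS = {
    "en": (".", (",",)),
    "ru": (",", (" ", "\u00a0", "\u202f")),
}


def parse_localized(text, lang):
    """Parse a number written in the conventions assumed for `lang`."""
    decimal_sep, group_seps = NUMBER_FORMATS[lang]
    for sep in group_seps:
        text = text.replace(sep, "")
    return Decimal(text.replace(decimal_sep, "."))


def same_number(source, source_lang, target, target_lang):
    """True if both strings denote the same value in their own language."""
    try:
        return parse_localized(source, source_lang) == parse_localized(target, target_lang)
    except InvalidOperation:
        # One of the strings is not a valid number in its language.
        return False


print(same_number("12,345.678", "en", "12 345,678", "ru"))   # localized correctly
print(same_number("12,345.678", "en", "12.345,678", "ru"))   # wrong separators for ru
```

The parser is deliberately loose (it does not check that groups have three digits), so it only shows the comparison idea.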


@1: It doesn't really matter what mode you choose. It will always deviate from the actual situation. In a positive way.


@2: comma<>dot conversion is nasty. And very complicated. Paragraph 1.2 shouldn't become Paragraph 1,2, but 1.2 km should become 1,2 km. Tell that to the software.


H.



H.

@1 That may be a matter of convenience, but it would certainly be great to have the option to switch.


@2 There must be a way. For instance, analyze the text in the segment for keywords and make a context-aware decision about whether the conversion is correct. Or something like that. Or, if it is an Excel table, look at the cell type (date, number, etc.). There ought to be a more or less acceptable solution.
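Something along these lines, as a very rough Python sketch. The keyword and unit lists and the function names are hypothetical and far too short for real use; the point is only to show the kind of decision I mean, including the "don't know" outcome.

```python
import re

# Hypothetical keyword lists; a real tool would need far longer ones.
REFERENCE_WORDS = {"paragraph", "section", "chapter", "figure", "table"}
UNITS = {"km", "m", "cm", "mm", "kg", "g", "l", "ml", "%"}

NUMBER = re.compile(r"\d+\.\d+")


def should_localize(segment, match):
    """Guess whether the decimal number at `match` is a quantity.

    Returns False for references (e.g. "Paragraph 1.2"), True for
    measured values (e.g. "1.2 km") and None when there is no clue.
    """
    before = segment[:match.start()].split()
    after = segment[match.end():].split()
    prev_word = before[-1].lower().strip(".,:;") if before else ""
    next_word = after[0].lower().strip(".,:;") if after else ""
    if prev_word in REFERENCE_WORDS:
        return False
    if next_word in UNITS:
        return True
    return None  # no indication in the segment


def classify(segment):
    """Return (number, decision) pairs for every decimal number found."""
    return [(m.group(), should_localize(segment, m)) for m in NUMBER.finditer(segment)]


print(classify("Paragraph 1.2 covers the first 1.2 km."))
print(classify("2.6"))
```

For the first sentence the sketch treats the first 1.2 as a reference and the second as a quantity; for a bare "2.6" it gives up and returns None, which would have to be shown to the translator.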


I actively use auto-conversion and don't mind correcting it if I need to. Having the numbers QA'ed properly would be a killer feature though.

@1 What I'm trying to say (and I hope it's true) is that whatever mode you choose, it'll only give an indication, and a pretty useless one. If you have been translating a document for two hours, and the progress bar says 50% (in any mode), the only thing that's probable (but not even sure) is that the next 50% will take less than 2 hours. Like 1 hour and 50 minutes. Or 5 minutes. That's why I think it's useless.


@2 Sure there's a way, but it will be incredibly difficult to catch all candidates for conversion, and exclude all the others. Just think of all possible SI units and non-SI units, before or after the number, and all the ways to indicate a section or a paragraph. And what if there's no indication in the segment at all?


H.

@1 Actually, the progress bar above the project segment shows only the location of the current segment in the project. It does not show the actual percentage of translated segments. For that, there is the Statistics function (in the menu Project > Statistics), which can be updated at any time via the keyboard shortcut. I am planning to add real-time automatic updating of the statistics there.


@2 This is a good request. Perhaps letting the user specify in Options which source and target number formats correspond to each other could be a solution?
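As a minimal sketch of what such a user-specified correspondence could look like, here is a Python example. NumberFormat and convert_number are hypothetical names used only for illustration, not existing CafeTran options, and the sketch handles unsigned numbers only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumberFormat:
    decimal: str     # decimal separator, e.g. "." or ","
    thousands: str   # thousands separator, e.g. "," or " "


def convert_number(text, source, target):
    """Rewrite an unsigned number from the source format into the target format."""
    integer, _, fraction = text.partition(source.decimal)
    integer = integer.replace(source.thousands, "")
    # Regroup the integer part in threes, starting from the right.
    groups = []
    while len(integer) > 3:
        groups.insert(0, integer[-3:])
        integer = integer[:-3]
    groups.insert(0, integer)
    result = target.thousands.join(groups)
    return result + (target.decimal + fraction if fraction else "")


# Formats the user would enter for an English > Russian project (assumed values).
ENGLISH = NumberFormat(decimal=".", thousands=",")
RUSSIAN = NumberFormat(decimal=",", thousands=" ")

print(convert_number("12,345.678", ENGLISH, RUSSIAN))
print(convert_number("0.5", ENGLISH, RUSSIAN))
```

The QA could then compare the converted source number with the number found in the target instead of comparing the raw strings.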


Igor  

@1 

2 woorden

Right now, the "progress" can be heavily inflated when the file has several large tables with a lot of numbers; counted on a word basis, the indication should become much more accurate. I understand your point, but for me at least such an indication would be a great feature. I enjoyed it very much in OmegaT, for instance (to be fair, it was based on the number of segments, but it was still fairly accurate).

2 Igor

The Statistics function is great for a detailed look at the file, but actually monitoring progress in real time would be great.


@2

2 Igor: this may very well be a solution. Moreover, you can determine the format by taking the source and target languages from the project properties.

2 woorden

The context can be determined in the vast majority of occurrences, it seems to me. Can you provide some cases of "there is no indication"?

@1: I think it's extremely difficult to get an indication that's not too far off. And I'm happy with it. You think you're only halfway, and shortly afterwards, it turns out that CT did all the work for you.


@2: Both numbers in columns, and numbers of sections/paragraphs. Like


2.6

In this section, we will discuss....


Igor's idea is workable, of course, although I'd say you should specify the formats in the project settings. Somehow. There are too many possible "determiners" for the Options, I think, whereas one look at the document will probably tell you what to exclude/include.


H.


Living on

Jl. Bantul km 12.5

2 woorden

@2 

>>Both numbers in columns, and numbers of sections/paragraphs Like

>>2.6

>>In this section, we will discuss....


In fact this case has some context to it:

  • For a plain text file, the number should be preceded by a return character and some text and followed by (in your example) a return character and some text too. Hence, this one is interpreted as a paragraph number and is converted/checked accordingly.
  • For a rich text file (MSOffice XML in particular), you look at the tags first, and, should there be none, perform the above-mentioned check.
Everything depends on how much the text can be formalized.
Can you please elaborate on the "numbers in columns"? If it is just number-return-number-return-..., you can apply the similar logic.
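For the plain-text case, the check could look roughly like this Python sketch. The function name, the labels and the rules are my own assumptions for illustration, and blank lines are assumed to have been removed beforehand.

```python
import re

# A line that consists of nothing but a dotted number, e.g. "2.6" or "1.2.3".
BARE_NUMBER = re.compile(r"\d+(\.\d+)+")


def classify_lines(lines):
    """Label every line that holds only a dotted number.

    A bare number next to another bare number is taken as a column value;
    a bare number followed by text is taken as a section number.
    """
    labels = {}
    for i, line in enumerate(lines):
        if not BARE_NUMBER.fullmatch(line.strip()):
            continue
        next_line = lines[i + 1].strip() if i + 1 < len(lines) else ""
        prev_line = lines[i - 1].strip() if i > 0 else ""
        if BARE_NUMBER.fullmatch(next_line) or BARE_NUMBER.fullmatch(prev_line):
            labels[i] = "column value"
        elif next_line:
            labels[i] = "section number"
        else:
            labels[i] = "unknown"
    return labels


lines = [
    "2.6",
    "In this section, we will discuss....",
    "1.5",
    "2.25",
    "3.75",
]
print(classify_lines(lines))
```

Here line 0 is labelled as a section number and lines 2 to 4 as column values, so only the latter would be converted/checked as localized numbers.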


AS: Can you please elaborate on the "numbers in columns"? If it is just number-return-number-return-..., you can apply the similar logic.


I can, but I won't. I completely agree with you: You can apply logic. The trouble is, that you will have to specify that logic to an enormous extent. To some humans, you can say, "A number followed by an `m' for meter will have to be converted", and most people will conclude that this will also go for `cm'. A computer isn't that smart, yet. And you'll also have to "tell" CT that in some - but not all - cases, info in previous or following segments will have to be taken into account. It's a worry. And the illogical part is the most worrisome as far as I'm concerned: The better the algorithm, the more worries. If the dot<>comma conversion only works for 80%, you'll thoroughly check everything. If it works for 99%, looking for that mistake becomes a nightmare.


H.

2 woorden

>>The better the algorithm, the more worries. If the dot<>comma conversion only works for 80%, you'll thoroughly check everything. If it works for 99%, looking for that mistake becomes a nightmare.


Then it is unclear to me what the point is of being able to QA numbers here if you're just supposed to copy/paste them so that the QA does not fail miserably (I mean, if the possibility of their localization is not taken into account). If it is only about typos in numbers, well, that makes it quite a niche (and useless) feature.

Hi Andrii,


There is a second progress bar in the latest update (see the new option in Project > Statistics > Automatic update of project statistics). I improved the localized formatting of numbers too:


https://cafetran.freshdesk.com/discussions/topics/6000022389


Igor

Hi, Igor!


Thanks a lot! I will test it for several days and post feedback here.

I've switched on the option "Automatic update of project statistics". Now there are two progress bars, but what do they actually indicate? If I understand correctly, the first one shows progress based on the number of finished segments, and the other one shows the "percentage position" of the active segment, no matter how much of the translation has been finished?


I agree with Andrii that it should be possible to choose how progress is measured, as in other CAT tools (words, characters, characters with spaces, segments).


If I am not mistaken, Wordfast Pro indicates progress with a decimal place, which is good.


Regards

Wojciech
