I am not sure if this helps, but there is a setting that you can toggle on/off or tweak that could help.
It is found in Preferences > Workflow > Replace characters at source transfer.
These field pairs allow you to set characters you wish to replace during source transfer.
Note: This replacement option is a helper to the "Transfer numbers to matches" feature. It lets you replace the defined characters in a numerical expression during the transfer from the source to the target segment.
https://github.com/idimitriadis0/TheCafeTranFiles/wiki/1-Preferences#workflow
Transfer numbers to matches can be found in Preferences>Auto-assembling.
If the option is ON, CafeTran automatically inserts the numbers present in the source segment into the target suggestion, replacing those from fuzzy matches.
Auto-assembling settings are also used when CafeTran handles TM matches.
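To make the "Replace characters at source transfer" idea more concrete, here is a minimal sketch of a character-pair replacement that only touches numerical expressions. This is not CafeTran's code: the `NUMBER` pattern and the `replace_in_numbers` function are hypothetical names chosen for illustration.

```python
import re

# Hypothetical pattern: digit runs joined by single '.' or ',' separators.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")

def replace_in_numbers(text, pairs):
    """Apply the character pairs only inside numeric expressions."""
    table = str.maketrans(pairs)  # every pair is applied in a single pass
    return NUMBER.sub(lambda m: m.group(0).translate(table), text)

print(replace_in_numbers("Version 2.5, priced at 12.4 units.", {".": ","}))
# Version 2,5, priced at 12,4 units.
```

Because the pairs are applied in one pass, a swap such as `{".": ",", ",": "."}` turns `1,234.5` into `1.234,5` in this sketch. Whether the real setting behaves the same way is not something I can confirm.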
Hi Igor,
Can you please elaborate on your previous reply on this? I am also interested to know. You said F4 should turn e.g. 12.4 into 12,4 if your languages use those respective number formats.
How does CafeTran work out the difference in decimal and number formats based on the project languages?
Thanks!
> How does CafeTran work out the difference in decimal and number formats based on the project languages?
It checks the language codes of the project and then converts numbers from the number format of the source language to that of the target language. It relies on helper methods built into the programming language itself, which may not cover complex cases.
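As a rough sketch of the general idea (look up the separators for each language code, then rewrite the number), something like the following could work. The `SEPARATORS` table and `convert_number` function are assumptions made up for this example. They are not CafeTran's implementation, and real locale data covers far more cases.

```python
# Hypothetical table of (grouping, decimal) separators per language code.
# These three entries are illustrative assumptions, not real locale data.
SEPARATORS = {
    "en": (",", "."),
    "fr": ("\u00a0", ","),  # non-breaking space as grouping separator
    "pt": (".", ","),
}

def convert_number(number, source_lang, target_lang):
    """Rewrite a number string from source-language to target-language separators."""
    src_group, src_dec = SEPARATORS[source_lang]
    tgt_group, tgt_dec = SEPARATORS[target_lang]
    # Split at the source decimal separator, then swap the grouping separators.
    integer, _, fraction = number.partition(src_dec)
    integer = integer.replace(src_group, tgt_group)
    return integer + (tgt_dec + fraction if fraction else "")

print(convert_number("12.4", "en", "fr"))      # 12,4
print(convert_number("1.234,56", "pt", "en"))  # 1,234.56
```

The sketch works on the string and never parses it into a numeric value, so digits such as trailing zeros are carried over unchanged.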
Hi Igor,
Thank you for your answer. I assume this applies when the "Format numbers" option is used.
Is there a way to (re)view the rules used per language code?
For example, 1,000,000.00 is usually written as 1 000 000,00 (with spaces, or preferably non-breaking spaces, and a comma for decimals) in French.
Is CafeTran supposed to do this conversion when translating from English to French?
The specific locale implementation for a given language is buried in the Java code. Oracle provides only very general help documents that do not specify the implementation details for individual languages. For example, it looks like the grouping (thousands) separator implemented for French in Java is not the normal non-breaking space but the narrow non-breaking space. Please expect some improvements to handling (converting to and from) numbers with the grouping (thousands) separator in the next update.
Hi Igor,
Thank you for providing more information about this.
Indeed, in French printed works, the grouping separator to be used is what we call an "espace insécable fine" (narrow non-breaking space, or U+202F), but otherwise the simple non-breaking space is expected and used.
In five years, I have not had a single translation project where the narrow non-breaking space was to be used.
This is a good example to highlight the need for the user to be able to view/access and if necessary set/customize common numbering format rules (like decimal and grouping separators).
Plus, not knowing the rules that CafeTran reads, the user does not know what to expect and which types of numbers can be correctly converted into the target language.
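For anyone who wants to check which of the two spaces ended up in a target segment, here is a small sketch. The helper name `use_plain_nbsp` is my own, and the replacement is a workaround applied to the text afterwards, not a CafeTran setting.

```python
import unicodedata

NBSP = "\u00a0"         # the simple non-breaking space
NARROW_NBSP = "\u202f"  # the "espace insécable fine"

# Print the code point and official Unicode name of each character.
for ch in (NBSP, NARROW_NBSP):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

def use_plain_nbsp(number):
    """Replace narrow non-breaking spaces with simple non-breaking spaces."""
    return number.replace(NARROW_NBSP, NBSP)
```

Running it prints `U+00A0 NO-BREAK SPACE` and `U+202F NARROW NO-BREAK SPACE`. The two characters look almost identical on screen, which is why checking the code point is more reliable than eyeballing the segment.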
---
But even before considering that, there is a more pressing question.
According to the description of "Format numbers", with this setting enabled, "CafeTran formats numbers to the target language numbering system if it is different from the source language system".
How can I see the number format conversion in action?
With "Transfer numbers to matches" and "Format numbers" enabled in Preferences>Auto-assembling (but not "Replace characters at source transfer", which I find too crude/sweeping for my needs, and which cannot handle both periods to commas and commas to a non-breaking space or a period), I am unable to see what these numbering rules do.
There is simply no conversion occurring.
Numbers stay the same in English to French (with F4, source transfer or fuzzy match/auto-assembling), but also from English to Portuguese and vice versa, for example. I think the issue is general and not related to a single target language.
I understand there will be an improvement when the grouping separator is used, but even for decimals, I cannot reproduce the conversion that is supposed to happen when the setting is enabled. Am I missing something?
---
At least for numbers which include grouping and/or decimal separators, I think there should be an easy way (F4, if they can be recognized as non-translatables) to transfer these and have them converted into the target language formatting, maybe with more than one formatting option (at least source and target formatting). For example, a year does not take the grouping separator in French: it is 2021, not 2 021.
Other common numbering formats could include dates, currency amounts (including the currency symbol: for example, €1 is written 1 € in French, with a non-breaking space, and the same applies if the euro is written as EUR), etc. This calls for CafeTran to recognize that a given number is a date (and that the / separators are part of the number), or that the currency symbol or shortened currency name is part of the number too, even where there is no space before or after the number.
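To show what I mean for currency amounts, here is a minimal sketch that treats a leading currency symbol or code as part of the number and moves it behind the amount with a non-breaking space. The pattern, the short list of currencies, and the function name are all hypothetical, and the sketch deliberately leaves the separators inside the number untouched.

```python
import re

NBSP = "\u00a0"

# Hypothetical pattern: a currency symbol or code directly before a number,
# with or without a space in between. The currency list is only an example.
PREFIXED_AMOUNT = re.compile(
    r"(?P<cur>[$€£]|USD|EUR|GBP)\s?(?P<num>\d[\d.,]*\d|\d)"
)

def move_currency_after(text):
    """Rewrite '€1' or 'EUR 1' as '1' + non-breaking space + currency."""
    return PREFIXED_AMOUNT.sub(lambda m: f"{m['num']}{NBSP}{m['cur']}", text)

print(repr(move_currency_after("€1")))
print(repr(move_currency_after("Price: $1,000.50 today")))
```

A real implementation would also need the date case, and would have to chain this with the separator conversion for the target language.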
For translators who are routinely dealing with numbers and need to rely on the ability of a CAT tool to quickly transfer these numbers and have them accurately converted according to target language rules, improving how CafeTran deals with this aspect could make a big difference!
Let's see what improvements the next CTE update will bring in this regard.
Hi Igor,
I have just made an EN>FR test in the latest version (10.8.3) and here is what I see:
- When transferring a number with F4, the comma (thousands grouping) is now converted to a simple (not non-breaking) space.
Before that version, there was no conversion at all, so this is an improvement. However, I would like, for example, to use a non-breaking space instead of a plain space. How do I do that? Where can the user review and change the number conversion settings? If these settings are readable by CafeTran, they could in theory be shown to the user and made editable on a per-target-language basis.
- The decimal conversion from a period to a comma seems to work, unless the decimal separator is followed only by zeros (.05 becomes ,05 but .00 stays .00).
- The F4 drop-down does not show a preview of what the number will look like when transferred. For me, this is not the desired behavior. I want to see exactly what will be inserted…
I also want the option to insert the number as found in the source text, without having to resort to Translate>Transfer source segment (to transfer the selected number from source as is). Maybe the F4 panel could be improved by offering more than one option?
- If Preferences>Auto-assembling>Transfer numbers to matches is enabled (with or without Format numbers, which does not change anything!), when a fuzzy match is automatically inserted (or inserted from the Matchboard, or via the F1 panel)… the number is being distorted. In my view, this makes the setting "Transfer numbers to matches" quite unusable and even dangerous to unsuspecting users, especially since it is enabled by default. Please consider at least having it disabled by default.
Example:
Number 124,120.23 becomes 124 000,23 120,23.
- The same happens when the MT suggestion is inserted from the Matchboard, even when Preferences>MT services>Team auto-assembling with Machine translation is disabled. This does not occur when the shortcut is being used to insert the suggestion, or if I click on the suggestion in the specific MT tab.
- In percentages (numbers with the % sign), the percentage sign is not recognized as part of the non-translatable and is not handled by the conversion rules. If such a rule exists in the Java code, could the percentage sign be handled as well?
- For currencies, when the currency sign is placed just before the number (as usual in English), it is not recognized as part of the non-translatable and is not handled by the conversion rules (if they exist) for correct placement (in French, it normally follows the number, separated by a non-breaking space). Oddly enough, if the currency sign follows the number (which is not usual in English), the non-translatable includes the currency sign and just transfers it at the same location. The same goes for the standard three-letter currency code "USD".
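Regarding the 124,120.23 example above, here is a sketch of how I would expect "Transfer numbers to matches" to behave: a number with grouping and decimal separators is treated as one token, and the numbers in the fuzzy match are replaced one for one. The `NUMBER` pattern, `transfer_numbers` and `to_french` are hypothetical and only illustrate the expectation. I do not know how CafeTran tokenizes numbers internally.

```python
import re

# Hypothetical token pattern: either a grouped number (groups of three digits
# after ',', '.', NBSP or narrow NBSP, plus an optional decimal part) or a
# plain number with an optional decimal part. "124,120.23" is ONE token.
NUMBER = re.compile(
    r"\d{1,3}(?:[,.\u00a0\u202f]\d{3})+(?:[.,]\d+)?|\d+(?:[.,]\d+)?"
)

def transfer_numbers(source, match_target, convert=lambda n: n):
    """Replace the numbers of a fuzzy-match target with the source numbers, in order."""
    source_numbers = NUMBER.findall(source)
    target_numbers = NUMBER.findall(match_target)
    # If the counts differ, pairing by position is unsafe: leave the match alone.
    if len(source_numbers) != len(target_numbers):
        return match_target
    replacements = iter(source_numbers)
    return NUMBER.sub(lambda m: convert(next(replacements)), match_target)

def to_french(number):
    """Naive EN>FR separator conversion, for this example only."""
    return number.replace(",", "\u00a0").replace(".", ",")

source = "Revenue reached 124,120.23 in 2021."
fuzzy = "Le chiffre d'affaires a atteint 98\u00a0500,10 en 2020."
print(repr(transfer_numbers(source, fuzzy, to_french)))
```

Leaving the match untouched when the number counts differ is a deliberately cautious choice: a visibly outdated number is easier to spot than a distorted one.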
Thanks!
Added note: What does Format numbers actually do? In my test, in this version as well, I don't see what the setting does.
>I also want the option to insert the number as found in the source text, without having to resort to Translate>Transfer source segment (to transfer the selected number from source as is). Maybe the F4 panel could be improved by offering more than one option?
Me too. Especially with regard to the new feature for modifying non-translatables, it would be nice to see a simple F4 menu that shows the original string and the modified one, allowing either of them to be inserted.
dekka
Coming from WFP, I'm having a hard time with the following (using the demo version):
- I exported my TMs (both big and small ones so the demo works), loaded them in the dashboard. On WFP, for this specific document, the analysis shows 3482 words out of 6111 that are no matches. Using the same TM in CTE, it gives me 4133/6089. I would expect some difference between the apps, but maybe this is too much? (This specific TM is larger than the demo limit, but the app is reading it, not sure it would save to it)
- I can't figure out how to handle numbers easily. I need the dot/comma conversion, and I need to transfer these numbers to the target without having to type them (I work with financial docs, too dangerous to type numbers and make a mistake). I see the numbers highlighted in pink, then I can use the F4 shortcut, but then there is no dot/comma conversion. I found out I can also type three digits (my prompter starts at 2), delete the last digit, and then a suggestion pops up for a number that does not exist in the source:
- Still numbers. I asked CTE to Insert all exact matches, just to see what it would do. It filled the target segments that were 100%, but did not change the numbers:
My current configs that I believe are for numbers:
- Workflow > Replace characters at source transfer: unchecked (have tried checked, no change, even with commas and dots in the correct boxes)
- Prompter: checked - Prompt phrases, Two-word, Auto case adjustment; Prompting starts (2), Minimal word length (3)
- Auto-assembling: checked - Transfer numbers, Format numbers
- CTE has also inserted 100% matches that are not exact matches. In one example, the source in the TM was "Em 2019, foi criado o Comitê de Engajamento, que envolve diferentes áreas, percepções e ideias para assegurar um ambiente de trabalho saudável, com equipes e colaboradores comprometidos", and the source in the new document was "Em 2020, o acompanhamento do engajamento da organização foi realizado através de reuniões do Comitê de Engajamento, que envolve diferentes áreas, percepções e ideias para assegurar um ambiente de trabalho saudável, com equipes e colaboradores comprometidos" (underlined is different text), but the segment was marked as 100% and filled with the wrong translation, and the TM tab was not marking the differences in red.
I think these are the most pressing issues for now :-) TIA