After my last rant some weeks ago I was notified that I had a wrong (let's say legacy) setting.
Now I wanted CT to recognize a term that raises some issues: Maître d'œuvre. I need to add a whole bunch of entries to cover the possible (and common) spelling errors:
Maître d'œuvre (Standard)
Maître d’œuvre ("wrong" apostrophe, very common in FR)
Maitre d'œuvre (missing circumflex)
Maître d'oeuvre (the œ)
In French there are dozens of quite typical errors.
So wouldn't it be nice to have CT ignore the difference between, e.g., different apostrophes, vowels with and without accents, or any other similar problem?
The only 2 workarounds are:
- making big TB entries (in my example above you need to multiply the mistakes)
- fiddling around with RegEx (after becoming your own RegEx master).
Any other idea?
I assumed the TMX format for glossaries was no longer supported.
And even if: How do I proceed?
Sorry for my ignorance, it is Friday ...
The current solution to catch the numerous variants of the phrase is a regular expression in your glossary. For example:
The pipe at the start tells the glossary that the entry is a regex.
The dot matches any single character.
(œ|oe) = œ or oe
The regex above could still be optimized, but it should work fine.
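To see how the dot and the (œ|oe) alternation behave outside CT, here is a minimal Python sketch. The pattern is a hypothetical illustration built from the explanations above, not the exact glossary entry from the post, and it leaves out the leading pipe, which is specific to CT glossaries. It also assumes the accented letters are stored as single precomposed characters.

```python
import re

# Hypothetical pattern for illustration only (not the entry from the post).
# Each dot stands for exactly one character, so it covers î/i and '/’.
PATTERN = re.compile(r"Ma.tre d.(œ|oe)uvre")

# The four spellings listed at the top of the thread.
VARIANTS = [
    "Maître d'œuvre",
    "Maître d\u2019œuvre",  # typographic apostrophe
    "Maitre d'œuvre",
    "Maître d'oeuvre",
]


def matches(text: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for variant in VARIANTS:
        print(variant, matches(variant))
```

The flip side of the shortcut is that a dot also accepts characters that are plain typos, for example "Mastre", which is why the fully enumerated version would be stricter but longer.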
This is great, thanks. I assumed that it would be much more complicated.
The optimized solution which lists all the possibilities of the apostrophes and accents would be much more complicated. I took a shortcut using a dot (meaning all possible characters).
Personally, I'd prefer legible source terms at all times. So I'd go for adding all possible (well, most of them) variants at the source side:
Maître d'œuvre;Maître d’œuvre;Maitre d'œuvre;Maître d'oeuvre TAB Project Manager;Projektleiter
And then I'd create a script or macro to generate the likely variants of the source term and paste them at the end. This may sound complicated, but it doesn't have to be.
With the macro, I'd:
Additional advantage: Once you've created and optimised the macro (you'll be adding extra F/R actions in time), you can fully concentrate on translating and don't have to get distracted by the need to create complicated regular expressions during your work.
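As a rough idea of what such a variant generator could look like, here is a minimal Python sketch. It is not the macro described above; the function names (`strip_accents`, `variants`, `glossary_line`) and the choice of three substitutions (apostrophe style, accents, œ/oe) are assumptions made for illustration. The output follows the "sources TAB targets" layout with semicolons shown earlier.

```python
import unicodedata
from collections import deque


def strip_accents(text: str) -> str:
    """Drop combining marks (î -> i). œ has no decomposition, so it is kept."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(c for c in decomposed if not unicodedata.combining(c))
    return unicodedata.normalize("NFC", kept)


# Each function maps a spelling to one likely misspelling (or back again).
TRANSFORMS = [
    lambda s: s.replace("'", "\u2019"),   # straight -> typographic apostrophe
    lambda s: s.replace("\u2019", "'"),   # typographic -> straight apostrophe
    strip_accents,                        # missing circumflex and other accents
    lambda s: s.replace("œ", "oe").replace("Œ", "Oe"),
]


def variants(term: str) -> list[str]:
    """Apply the transforms repeatedly until no new spelling appears."""
    found = [term]
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for transform in TRANSFORMS:
            candidate = transform(current)
            if candidate not in found:
                found.append(candidate)
                queue.append(candidate)
    return found


def glossary_line(term: str, targets: list[str]) -> str:
    """Build one tab-delimited line: source variants, TAB, target terms."""
    return ";".join(variants(term)) + "\t" + ";".join(targets)


if __name__ == "__main__":
    print(glossary_line("Maître d'œuvre", ["Project Manager", "Projektleiter"]))
```

Further find/replace rules can be added to `TRANSFORMS` over time, and the source side of every entry stays legible.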
The TB is shown as it appears in the text, no matter what RegEx has been entered. But there are three good arguments for Hans:
This is what the French Hunspell (the LO extension, I assume they correspond) gives out:
It accepts maitre (there is no such spelling in FR) and the "wrong" apostrophe, but not oeuvre.
>It accepts maitre (there is no such spelling in FR) and the "wrong" apostrophe, but not oeuvre.
This is what I was expecting. And Hunspell can probably be taught to accept oeuvre for œuvre too, by modifying the human-readable Hunspell files (not as complicated as one might think).
Okay, this is why I asked:
You could ask Igor to have Hunspell run in the source pane too, with the French dictionary. I'm not sure whether Hunspell can run two instances at the same time. If not, you could perhaps settle for running it for the source language during the translation phase and for the target language during the review phase.
Once this is operational, the next step would be to allow stemming via Hunspell for the source language too.
Et voila, Bob es ton oncle.
I've been talking nonsense here: this stemming is already present in the source box :). Must have been a bloedpropje.