
Using MateCat as an additional file filter for CTE

I'm copying this from the document CafeTran Espresso - File formats for reference here in the forum as well.


MateCat is a free online CAT tool that handles no fewer than 70 different formats. It offers an excellent and easy way to translate file types for which CafeTran does not yet offer native support, acting as an additional file filter.


MateCat builds upon Okapi Framework file filters and represents probably the easiest way to use these filters.


Here is the MateCat round trip procedure (preparation in MateCat, translation in CafeTran, final conversion in MateCat):


  1. (Optional) Create a MateCat account or login with your Google credentials, for easier access and management of your projects.
  2. Create a project in MateCat by giving it a name, selecting the language pair (and the subject if desired) and uploading the files.
  3. Once you click Analyze, you will see a breakdown of the word count. You can access the job itself by clicking Translate in the bottom right, then Open job.
  4. On the Translation Editor page, click the arrow next to the Preview button in the top-right corner of the screen and select the option Export XLIFF from the drop-down menu to download the translatable XLIFF file, which you can save on your PC. The XLIFF will be pre-translated with suggestions from your private TMs, the public TM and machine translation.
  5. Translate and Finalize the XLIFF file in CafeTran. [You may need to use Project > Documents to view/glue all documents, some text sections of each file being held in separate .xml documents within the XLIFF itself.]
  6. Use this MateCat tool to convert the translated XLIFF back into the original file format (DOCX, if OCR or PDF conversion is used). Just drag and drop, the download will start!
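Before opening the exported file in CafeTran, it can help to check what it actually contains. The following is only a minimal sketch, not part of MateCat or CafeTran: it assumes the export is a standard XLIFF 1.2 file (namespace urn:oasis:names:tc:xliff:document:1.2, with trans-unit, source and target elements). The function name summarize_xliff is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the export is XLIFF 1.2 and uses this namespace.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def summarize_xliff(root):
    """Count files, translation units, and units with a non-empty target.

    `root` is the parsed <xliff> root element.
    """
    files = root.findall("x:file", XLIFF_NS)
    units = root.findall(".//x:trans-unit", XLIFF_NS)
    with_target = 0
    for unit in units:
        target = unit.find("x:target", XLIFF_NS)
        # itertext() also picks up text nested inside inline tags such as <g>.
        if target is not None and "".join(target.itertext()).strip():
            with_target += 1
    return {"files": len(files), "units": len(units), "with_target": with_target}
```

For a downloaded file you would call it as summarize_xliff(ET.parse("job.xlf").getroot()), where job.xlf is a placeholder file name. A "files" count above 1 is a hint that you will want Project > Documents in CafeTran to view or glue the documents.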

 

Additional notes:

  • Added bonus: the XLIFF will be pre-translated with suggestions from your private TMs (you can import them prior to or at project creation), the public MyMemory TM and machine translation (Google/Microsoft translate). Other MT options are also available.
  • If you wish, you can disable MT (Settings > Machine Translation), although public MyMemory TM matches will still appear in the XLIFF file you download, and target segments with no matches will be populated with source text. To start with a clean slate, you can use Task > Remove target segments in CafeTran. This will also set the progress statistics to 0%.
  • Highly recommended steps if you intend to perform at least part of the translation in MateCat:
    - Create a private TM resource: On the Project creation page, click on Settings (alternatively, in the TM and glossary field, expand the drop-down menu and select Create resource).
    - Click the + New resource button in the dialog that opens. Give the TM an optional name. Hit Confirm. You will see that the “MyMemory: Collaborative translation memory” resource is Enabled for Lookup, but no longer set to be Updated. That way, translated segments will only be stored in your private resources.
  • Be sure to check MateCat’s Privacy policy to confirm this solution is appropriate for you. If you prefer offline solutions, you’ll be better off using other options, such as the Wordfast Pro 3 demo version or, if you feel comfortable, Okapi Framework.
  • The same goes if you wish to configure additional filter options, since MateCat does not seem to offer any for its supported file formats.
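Regarding the targets that are populated with source text: Task > Remove target segments in CafeTran is the straightforward way to clear everything. If you only want to empty the targets that merely repeat the source and keep real TM/MT suggestions, the idea can be sketched as below. This is an illustration under assumptions, not a tested workflow: it assumes a standard XLIFF 1.2 export, and the names flat_text and clear_copied_targets are invented for this example.

```python
import xml.etree.ElementTree as ET

XLIFF_URI = "urn:oasis:names:tc:xliff:document:1.2"  # assumed: XLIFF 1.2
NS = {"x": XLIFF_URI}
# Keep the default namespace on output instead of generated "ns0:" prefixes.
ET.register_namespace("", XLIFF_URI)


def flat_text(element):
    """Return the element's text, including text inside inline tags."""
    return "".join(element.itertext()).strip()


def clear_copied_targets(root):
    """Empty every <target> whose text is identical to its <source>.

    Returns the number of targets that were cleared. Attributes on
    <target> (for example a state) are left untouched.
    """
    cleared = 0
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.find("x:source", NS)
        target = unit.find("x:target", NS)
        if source is None or target is None:
            continue
        text = flat_text(target)
        if text and text == flat_text(source):
            # Remove inline tags and text, leaving an empty target element.
            for child in list(target):
                target.remove(child)
            target.text = None
            cleared += 1
    return cleared
```

Typical use would be to load the file with tree = ET.parse("job.xlf"), call clear_copied_targets(tree.getroot()), and save with tree.write("job_clean.xlf", encoding="utf-8", xml_declaration=True); both file names are placeholders. Note that segments which are legitimately identical in both languages (numbers, product names) would be cleared too, and that ElementTree may rewrite the prefixes of any additional namespaces in the file, so work on a copy and check that the result still converts back in MateCat.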

 

Related links:

MateCat - Supported file formats and languages
MateCat - online support documentation
MateCat - Translating offline
MateCat - Philosophy and Terms of service (includes Privacy policy)


Hi Jean,
many thanks for your detailed observations.

I agree that their very casual approach to user privacy is a big turn-off. I researched privacy in detail before I even touched MateCat, but I still managed to inadvertently donate an old TM to MyMemory on my first test run (despite having created a private TM). Fortunately it was nothing confidential, but that was more a matter of luck than judgement.

Otherwise though, I was impressed with their MT results, though in fairness my test text lent itself very well to MT, and I'm sure the results for the sort of medical texts I usually translate would be far less impressive.
For me, poor configurability that works beats amazing configurability that's undocumented and often fails to work, but I appreciate that that's a matter of taste.
I also like the fact that it's so lightweight and I could in theory do my work on any computer.

I stopped using CT's internal browser long ago, as it occasionally crashed CT. I know Igor says this has been fixed, but he said that quite a lot when I first started out too. I just use AutoKey instead (thanks, if I remember rightly, again to one of your previous posts), which I find excellent. (Plus, of course, it works everywhere.) Indeed, I find AutoKey is great for lots of things (like converting dates, legal citations, etc.).

Cheers,
Jeremy

 

Oh, and I love the inline QA.

 

Hello Jeremy,


Thank you for these details!


Inline QA: MateCat uses lexiQA and it is good indeed. I like CafeTran’s QA and filtering, and it works fine along with a step including Antidote (for English and French). MateCat does not support bilingual file export/import (although it has a nifty Revision feature with Changes comparison and error typology).


MateCat’s MT results come from Google Translate, DeepL and Microsoft Translator, plus suggestions from MyMemory. All these MT providers can be queried from within CTE, so I don’t think there is a real advantage there (and in CTE you can compare the MT results, plus toy with DeepL’s suggestions before inserting them).


The internal browser works very well now on my GNU/Linux machine, no crashes. Almost all resources can be queried from there, which offers the added benefit of keeping your text visible at all times. Very practical. If you have enough RAM (8 GB or more), you should try it!


I’m interested to know more about your AutoKey workflow as an alternative to the internal browser. Would you like to add a separate post for this in Linux Topics and Questions or External Tools (macros, scripts, regular expressions, etc.)? I use AutoKey myself and find it excellent, but have not set it up for querying resources.


Cheers,


Jean

I know this is an old thread, but thanks! I'm curious to know what you think of MateCat in general - I hadn't come across it until yesterday (thanks to another of your posts), but (assuming you're on a fast, reliable internet connection) it looks pretty interesting as a CAT tool in its own right.


I've now started an AutoKey thread here.

Hello Jeremy,


You’re very welcome!


I was initially drawn to MateCat because it has open source at its core, and have done a number of projects with a company that uses it, along with some personal ones. Later on, when I saw the announcement about the open-sourced MateCat Filters (which build upon the Okapi Framework ones), I tried to make use of them, but could not figure out how to go about it. But the idea stayed in the back of my head. While working on the reference documents, I revisited MateCat on purpose and discovered there IS now a way to easily export the XLIFF files and convert them back after performing the translation in a third-party tool, making this an excellent solution for a round trip with CafeTran.


I find it an excellent online CAT tool with many nice features. Because it is so easy to learn and use, it is what I may actually recommend to those starting out.


As for me, I prefer to use CafeTran all the way through and only resort to MateCat occasionally.


Some not so good points:

- By default, the TM is public (your segments go to MyMemory); you need an additional step to create a private TM. This is a big turn-off: I expect many unsuspecting users to fall for this and even put their clients' content at risk. It is easily avoidable, but a public default is not the way to go.

- MateCat has some nice export features, and being able to export a native XLIFF, translate it in another tool (CafeTran) and then export the document in the original file format is excellent per se. But I would also like to be able to reimport the XLIFF into MateCat itself. Not being able to do so is quite limiting.

- While managing TMs in MateCat is quite good, uploading huge TM resources can be a pain (and I don’t know about the file size limit).

- Of course, in many ways, it is less configurable than CafeTran.

- Being able to access various web resources from within CafeTran is priceless, as you keep your text in front of you at all times. In MateCat, the lack of this can be mitigated with a configurable online resource such as magicsearch.com, which allows you to launch terminology/bilingual concordance searches in different language pairs (and even perform monolingual searches) in one place, in just one browser tab or window. This is what I use when I absolutely need to perform a translation online and not in CafeTran, and it makes the process less painful.


Jean
