
Easiest way to convert CafeTran .txt glossary with synonyms into a TMX?

Does anyone here have any tips on easy ways to convert CafeTran TXT glossaries with synonyms into TMX files?


Michael



I was thinking more in "terms" of getting rid of those synonyms and regular expressions, and then turning them into TMX for fuzziness and more control. For exchange purposes, a one-click conversion is already available.


MB: but I doubt it will be in TMX format


I don't think the dictionaries are, but I do think you can import TMX files (that will then be converted, like in Total Recall). We'll see. You'll see. I won't.


As far as I can see, a Windows "package" already exists. It isn't supported by the Moses project, but I don't think Tom's Slate Desktop is either.


http://www.statmt.org/moses/?n=Moses.Packages


H.

And because of this one-man and possibly one-time experiment you request a feature right away? How about sharing your findings first? Perhaps then you'll get some support from other experienced, yet critical, users.

Hans (woorden) = black

Michael B = green


--------------------------------------


I can think of a few reasons:


• He needs to share the terms with a colleague (if I wanted to share terms, I would definitely not send someone a TMX; I would probably send an Excel file, since most translators don't even have a TMX editor installed or know what one is)

• He needs to use them in another CAT tool (most other CAT tools prefer to import terminology from an Excel file or a delimited text file (usually .txt or .csv))

• He wants to use them in Recall (nope)

• He wants to see if he can benefit from them in Slate Desktop (I don't yet know what format Slate's dictionaries will be in, or if it will even have the ability to use dictionaries/glossaries, but I doubt it will be in TMX format; however, who knows, it might)

• He finally came to the conclusion that I am right (which is of course actually the prime reason) (definitely nope)


Speaking more generally though, I think that having a converter that can convert between terminology containers and translation memory containers, and one that can respect synonyms, would be a valuable asset to CT. The ability to convert a glossary containing synonyms into a TMX (or vice versa) is actually something that users of other CAT tools may also find useful. They would of course have to fiddle around a bit to get their format into the CafeTran format, or vice versa (e.g., memoQ separates its synonyms differently in its delimited CSV terminology containers, but it isn't too difficult to convert that into the CafeTran glossary format), but I can see how this could be a selling point. Currently, there don't seem to be any good converters that can do this.
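In the memory-to-glossary direction, a converter that respects synonyms has to decide which term pairs belong in one entry. A minimal Python sketch of one possible heuristic (an assumption, not how CafeTran or any other tool is documented to work): treat every source/target pair as a link, merge all pairs that share a source or a target term using union-find, and write each group as one tab-delimited line with semicolon-separated synonyms. The function name `pairs_to_glossary` is hypothetical, and the output leaves out any header line the CafeTran glossary format may expect.

```python
from collections import defaultdict


def pairs_to_glossary(pairs, sep=";"):
    """Group (source, target) term pairs into tab-delimited glossary lines.

    Hypothetical helper: pairs that share a source term or a target term
    are merged into one entry (connected components found via union-find).
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        parent[find(a)] = find(b)

    pairs = list(pairs)
    for src, tgt in pairs:
        # Tag nodes by side so a source word and an identically spelled
        # target word are not treated as the same term.
        union(("src", src), ("tgt", tgt))

    # Collect synonyms per component, keeping first-seen order.
    groups = defaultdict(lambda: ([], []))
    for src, tgt in pairs:
        sources, targets = groups[find(("src", src))]
        if src not in sources:
            sources.append(src)
        if tgt not in targets:
            targets.append(tgt)

    return [sep.join(s) + "\t" + sep.join(t) for s, t in groups.values()]
```

For example, the pairs car/auto, automobile/auto, car/wagen and bike/fiets become two lines: `car;automobile<TAB>auto;wagen` and `bike<TAB>fiets`. Note that this merges everything reachable through a shared term, so a polysemous word (English "bank" paired with both a riverbank and a financial translation) would pull unrelated senses into one entry; such entries would still need a manual check.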



I was experimenting with importing a project-specific TXT glossary into my project TMX, to see what the difference would be like in practice.

Hans CafeTran Wiki: Why are you using a TM for terms, Michael?


I can think of a few reasons:

  • He needs to share the terms with a colleague
  • He needs to use them in another CAT tool
  • He wants to use them in Recall
  • He wants to see if he can benefit from them in Slate Desktop
  • He finally came to the conclusion that I am right (which is of course actually the prime reason)
H.
Why are you using a TM for terms, Michael?

MB: I meant a built-in converter in CT that also respects synonyms.


No comment.


H.

Yeah, I forgot that CT can of course also convert between glossaries and memories for terms (e.g. via Memory > Import ...). However, that's not what I meant: I meant a built-in converter in CT that also respects synonyms.

MB: I actually managed to solve it differently


Good.


MB: ... automatic (i.e. quick & easy) converter for this (Convert Glossary <-> Memory for terms), built into CT...


But, but... it's already there. Just import/export them files. Better than using TMXEditor actually, considering your love for BIG things (TMXEditor can't handle large TMX files without splitting them first).


H.

Thanks for the Perl link! I used to have Perl running using some other Windows thingee, something like Active Perl, or ‘Active’ something or other, can't remember. Anyway, got side-tracked (as usual), and am working on something else at the moment ... 


However, I actually managed to solve it differently: I did it in Ron's Editor, which has a handy Column > Split function. I split the src and trgt columns at the semicolons so that each synonym got its own column, and then, with a little copy-pasting, created a tab-delimited file with all the synonyms listed one below the other. Then I converted this to a TMX in Heartsome's TMX editor via Convert to TMX.
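Roughly the same two steps (split the semicolon-separated synonyms into one pair per line, then convert to TMX) can also be scripted. A minimal Python sketch, assuming the glossary is UTF-8, tab-delimited text with the source term(s) in column 1 and the target term(s) in column 2, one header row, and `;` as the synonym separator. It pairs every source synonym with every target synonym, which may or may not be what you want. The function names, the language codes and the `creationtool` value are hypothetical, and this is not what Ron's Editor or Heartsome do internally.

```python
import itertools
import xml.etree.ElementTree as ET

# ElementTree serializes this Clark-notation name as the attribute "xml:lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def expand_synonyms(glossary_text, sep=";", has_header=True):
    """Yield one (source, target) pair per source/target synonym combination.

    Assumed layout: tab-delimited, column 1 = source term(s), column 2 =
    target term(s), synonyms separated by `sep`; extra columns are ignored.
    """
    lines = glossary_text.splitlines()
    if has_header:
        lines = lines[1:]  # assumed header row (language names/codes)
    for line in lines:
        cols = line.split("\t")
        if len(cols) < 2:
            continue  # skip blank or malformed lines
        sources = [s.strip() for s in cols[0].split(sep) if s.strip()]
        targets = [t.strip() for t in cols[1].split(sep) if t.strip()]
        yield from itertools.product(sources, targets)


def pairs_to_tmx(pairs, src_lang, tgt_lang):
    """Return a minimal TMX 1.4 document (as a string), one <tu> per pair."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "glossary2tmx-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "phrase",
        "o-tmf": "tab-delimited glossary",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    seen = set()
    for src, tgt in pairs:
        if (src, tgt) in seen:
            continue  # drop duplicate pairs
        seen.add((src, tgt))
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text  # ElementTree escapes &, <, >
    return ET.tostring(tmx, encoding="unicode")
```

For example, `pairs_to_tmx(expand_synonyms(text), "en-GB", "nl-NL")` turns a glossary line `car;automobile<TAB>auto;wagen` into four translation units. To save the result, prepend `<?xml version="1.0" encoding="UTF-8"?>` and write the string to a UTF-8 file.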


However, I think it would be really useful if there was an automatic (i.e. quick & easy) converter for this (Convert Glossary <-> Memory for terms), built into CT. I know how much you love features, so I requested it from Igor.

No success?


I of course applaud what you're trying to achieve. Since most scripts/apps that are useful for text manipulation (Perl, sed, awk, grep) seem to have been developed for UNIX, I thought I'd point you to StrawberryPerl. All of those scripts can run under Windows, but not straight from the command line (I think). If it's not working, I'll be happy to try running the scripts, but as you'll understand, I don't have any tab-dels with synonyms, so you'll have to send me an example that reflects the structure of your files.


H.

Try installing http://strawberryperl.com


H.

I tried the awk script mentioned here: http://cafetran.wikidot.com/using-source-side-synonyms (which I found thanks to Masata's post here: https://cafetran.freshdesk.com/support/discussions/topics/6000013801), but am getting an error.


I don't know how to run the Perl script.


Michael
