
Easiest way to convert CafeTran .txt glossary with synonyms into a TMX?

Does anyone here have any tips re easy ways of converting CafeTran TXT glossaries with synonyms into TMXs?
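A minimal Python sketch of such a conversion. It assumes the glossary is a tab-delimited UTF-8 text file whose first line holds the source and target language codes, with synonyms separated by semicolons and an optional note in the third column; check these assumptions against your own files. Each source synonym is paired with each target synonym, so an entry with two synonyms on each side becomes four translation units. Regular-expression entries are not handled. The function names and the TMX header values are illustrative only.

```python
import itertools
import xml.etree.ElementTree as ET

# ElementTree serializes this key as the xml:lang attribute that TMX uses on <tuv>.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def split_synonyms(field, sep=";"):
    """Split one glossary field into its synonyms (the separator is an assumption)."""
    return [term.strip() for term in field.split(sep) if term.strip()]


def glossary_to_tmx(text, sep=";"):
    """Convert tab-delimited glossary text into a TMX 1.4 string."""
    lines = text.splitlines()
    # Assumed layout: the first line holds the source and target language codes.
    src_lang, tgt_lang = lines[0].split("\t")[:2]

    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "glossary_to_tmx", "creationtoolversion": "0.1",
        "segtype": "phrase", "o-tmf": "CafeTran glossary", "adminlang": "en",
        "srclang": src_lang, "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")

    for line in lines[1:]:
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        note = fields[2].strip() if len(fields) > 2 else ""
        sources = split_synonyms(fields[0], sep)
        targets = split_synonyms(fields[1], sep)
        # One translation unit per (source synonym, target synonym) pair.
        for src, tgt in itertools.product(sources, targets):
            tu = ET.SubElement(body, "tu")
            if note:
                ET.SubElement(tu, "note").text = note  # <note> precedes <tuv> in TMX
            for lang, term in ((src_lang, src), (tgt_lang, tgt)):
                tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
                ET.SubElement(tuv, "seg").text = term
    return ET.tostring(tmx, encoding="unicode")
```

To convert a file, read it with `open(path, encoding="utf-8")`, pass the text to `glossary_to_tmx`, and write the result to a `.tmx` file, also as UTF-8.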


Michael



Hello, everybody

If you are interested in my macro, please see the Tips and Tricks section.

M,

 

MB: This would allow people who hate to bother Igor for new features (not me ;)) to suggest or create their own functionality,

 

Good idea. It should be a new/separate section of Freshdesk, though. I'll contribute, for example, my (Mac-only) solution for dropping files on the Dashboard. If you run CafeTran, the folder that contains the current job opens and stays frontmost, solving the problem of resizing the CT window or opening a folder "manually."

 

 

 

Will record a new screencast, of course.

 

H.

 

Hi Masato,


Cool! However, you are aware that Igor added something similar to the latest version, right? Your supplemental macro for statistical analysis sounds very interesting though!


Incidentally, your macro reminded me of an idea I had a while back, which others here have also had I'm sure: it would be cool (sorry, I know I probably shouldn’t say "cool" twice in one post) if CafeTran had some kind of plug-in system that would allow third-party developers to contribute macros etc. to CafeTran. Basically kind of like the SDL exchange thing. This would allow people who hate to bother Igor for new features (not me ;)) to suggest or create their own functionality, which could then be shared with all of us. Just an idea.


@Igor: any news re the updated version of this feature that properly handles all synonyms?


Michael



Hi All,

I've successfully made an Excel macro for splitting synonyms, retaining all subsequent fields.

Now I'm writing a supplemental macro for statistical analysis, so I hope to be able to offer you the package in a week or so.

Have a nice day!
Masato

 

Wow, thanks for trying Hans! I'm so sorry you had to create a Beijer Glossary. It must have been painful ;)

UPDATE:


  • I tried the Perl script. I don't understand Perl, but I should be able to run it anyway. It won't run, though.
  • The awk script does run, and it runs successfully. But I think you'll have to run it several times (when to stop?).
  • A regex looks OK, but again, I think you'll have to run it several times.
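On the "when to stop?" question: a substitution that splits off one synonym per line can simply be repeated until it makes no more replacements. A minimal Python sketch of that idea (not a reconstruction of the Perl or awk scripts), assuming tab-delimited lines with semicolon-separated synonyms in the first (source) column only; the names are hypothetical.

```python
import re

# A line whose first (source) field contains a semicolon:
#   group 1 = first synonym, group 2 = remaining synonyms, group 3 = rest of the line.
SPLIT_FIRST_SYNONYM = re.compile(r"^([^\t\n;]*);([^\t\n]*)(\t.*)$", re.MULTILINE)


def split_source_synonyms(text):
    """Apply the substitution repeatedly; return (new_text, number_of_passes)."""
    passes = 0
    while True:
        # Each pass moves one synonym per matching line onto its own line,
        # copying the rest of the line (target, note, ...) unchanged.
        text, count = SPLIT_FIRST_SYNONYM.subn(r"\1\3\n\2\3", text)
        if count == 0:
            return text, passes  # nothing left to split: time to stop
        passes += 1
```

An entry with n source synonyms needs n - 1 passes, so the loop runs one pass fewer than the largest synonym count in the file, plus a final pass that finds nothing.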


In other words, it's not particularly simple, and I've only been dealing with source-side synonyms. An MB glossary (which, for obvious reasons, I had to create; I called it Michael Syns) can also contain target-side synonyms, regular expressions, and sentence patterns. You should be able to delete the latter two quite easily, but I'm afraid there's little else you can do. Aspirin.

H.

>Do check the latter, as I don't use macros.


You can do it with a formula too, but you have to drag it down all the way.

MB: I know that Excel can be a bit suboptimal


That's why I switched to LibreOffice Calc recently (and I think I mentioned it here). Calc seems to have it all: it handles UTF-8, can save to tab-delimited text (Numbers can't), and accepts macros. Do check the latter, as I don't use macros.


H.

Hi, Michael

You are right. In my experience, you'd better convert your glossary to UTF-16 (or another encoding) somehow; otherwise (in UTF-8), not all the data may be imported successfully, though I don't know why.
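If the spreadsheet only imports the glossary correctly as UTF-16, the re-encoding can be done outside the spreadsheet. A minimal Python sketch, assuming the glossary is currently UTF-8 (with or without a byte order mark); the function names and file names are hypothetical.

```python
from pathlib import Path


def utf8_to_utf16(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as UTF-16."""
    # "utf-8-sig" decodes UTF-8 and drops a leading byte order mark if present;
    # it raises UnicodeDecodeError if the input is not valid UTF-8.
    text = data.decode("utf-8-sig")
    # The "utf-16" codec writes a byte order mark in the machine's native byte order.
    return text.encode("utf-16")


def convert_file(src_path, dst_path):
    """Write a UTF-16 copy of a UTF-8 glossary file."""
    Path(dst_path).write_bytes(utf8_to_utf16(Path(src_path).read_bytes()))
```

For example, `convert_file("glossary.txt", "glossary-utf16.txt")` leaves the original untouched and writes the UTF-16 copy next to it.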

Peace,
Masato

Thanks Masato!


One small question before you begin: will your solution in Excel respect the UTF-8 encoding of the CT resources? I know that Excel can be a bit suboptimal when it comes to UTF-8 (and ends up trashing all manner of special characters), which is why I always use a CSV editor like Ron's Editor (which, however, of course doesn't have macro/VBA capabilities).
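One way to tell whether a round trip through a spreadsheet has damaged a UTF-8 glossary is to check that the saved file still decodes as UTF-8. A minimal Python sketch; the function name is hypothetical. It only catches byte sequences that are invalid UTF-8, such as an "é" saved as Windows-1252 or Latin-1; mojibake like "Ã©" is itself valid UTF-8 and passes unnoticed.

```python
def first_invalid_utf8(data: bytes):
    """Return (line_number, byte_offset) of the first invalid UTF-8 byte, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the byte offset where decoding failed; count newlines before it.
        line_number = data.count(b"\n", 0, err.start) + 1
        return line_number, err.start
    return None
```

Run it on the bytes of the saved file, e.g. `first_invalid_utf8(open(path, "rb").read())`.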


Michael



Hi,

I'm writing an Excel macro (VBA) for users who aren't familiar with "scripts". It converts a glossary into one with the synonyms in both the source and target segments split, while retaining (duplicating) all the subsequent fields (context, note, et cetera).
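For readers who don't mind a script, the same row-splitting can be sketched in Python. This is not Masato's macro; it assumes a tab-delimited glossary with a header line, semicolon-separated synonyms in the first two columns, and any further columns (context, note, ...) copied to every new row. Once every line holds a single source and a single target term, the txt>tmx conversion should no longer have synonyms to trip over. The function names are hypothetical.

```python
import itertools


def split_row(fields, sep=";"):
    """Expand one glossary row into one row per (source, target) synonym pair.

    Columns after the first two (context, note, ...) are duplicated unchanged.
    """
    fields = list(fields) + [""] * max(0, 2 - len(fields))  # pad short rows
    sources = [s.strip() for s in fields[0].split(sep) if s.strip()] or [""]
    targets = [t.strip() for t in fields[1].split(sep) if t.strip()] or [""]
    rest = fields[2:]
    return [[src, tgt, *rest] for src, tgt in itertools.product(sources, targets)]


def split_glossary(text, sep=";"):
    """Split synonyms on both sides of every entry; the header line is kept as-is."""
    lines = text.splitlines()
    out = lines[:1]
    for line in lines[1:]:
        if not line.strip():
            continue  # drop empty lines
        out.extend("\t".join(row) for row in split_row(line.split("\t"), sep))
    return "\n".join(out) + "\n"
```

Pairing every source synonym with every target synonym means an entry with two synonyms on each side becomes four lines; if that is more than you want, the `itertools.product` call is the place to change it.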

I hope I will succeed.

Cheers,
Masato

 



MB: Leaving aside regexes for the moment


I don't think you can do that, but anyway, if it's that simple, I think you can explain the process in the Wiki, or even write a script for it. I think I can write one for the Mac users (not for the regexes, though).


H.

@woorden: Leaving aside regexes for the moment, I think creating a Converter that respects source- and target-side synonyms would be trivial for Igor. I mean, if I can do it in five to ten minutes using Ron's Editor and a text editor, how hard could it really be?

MB: They look around in the menus for a Converter, but cannot find one.


You can easily convert between the two kinds of termbases; it's just that the txt>tmx conversion can't handle source-side (ss) and target-side (ts) synonyms or regexes. And I don't think that's a problem you can solve easily, especially not the regexes.


H.

@Hans CafeTran Wiki:


It is not a ‘one-man, one-time experiment’ that leads me to believe that this feature would be useful, but the fact that it is something general, and very important. 


CT has two formats to store terminology in. It would therefore not be so strange if CT also had a way to convert back and forth between these two. This seems pretty reasonable to me, and something that users (new and old) might actually expect and find useful.
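For the other direction, a minimal Python sketch of a Memory-for-Terms-to-glossary conversion, assuming a TMX 1.4 file whose `<tuv>` elements carry `xml:lang` codes matching the ones you pass in. Translation units that share a source term are merged into one glossary line with semicolon-separated target synonyms (the separator is an assumption); the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_glossary(tmx_text, src_lang, tgt_lang, sep=";"):
    """Convert TMX text into tab-delimited glossary text, merging target synonyms."""
    root = ET.fromstring(tmx_text)
    merged = {}  # source term -> list of target synonyms, in first-seen order
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            lang, seg = tuv.get(XML_LANG), tuv.find("seg")
            if lang and seg is not None:
                # itertext() concatenates all text inside <seg>, including inline elements.
                segs[lang.lower()] = "".join(seg.itertext()).strip()
        src = segs.get(src_lang.lower())
        tgt = segs.get(tgt_lang.lower())
        if src and tgt and tgt not in merged.setdefault(src, []):
            merged[src].append(tgt)
    lines = [f"{src_lang}\t{tgt_lang}"]
    lines += [f"{src}\t{sep.join(tgts)}" for src, tgts in merged.items()]
    return "\n".join(lines) + "\n"
```

Only source terms are grouped here; entries that share a target but differ in source stay on separate lines.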


A possible use case: after listening to the older users argue endlessly about the old Glossaries vs Memories for Terms issue, a new user decides he/she would like to try one of them out, but can't figure out how to convert their current choice of terminology format into the other format. They look around in the menus for a Converter, but cannot find one.


This is not some crazy feature that no one will ever use or understand, but basic (missing) functionality.


