
Add an existing glossary file to CafeTran

The automatic creation of the ProjectTerms glossary by CafeTran, the manual creation of an empty (new) glossary (Glossary > New glossary...) and the addition of an existing (e.g. downloaded from the internet or supplied by your client) glossary (Glossary > Add glossary...) are three completely different workflows.


In this soundless video I'll show you how to prepare a specialised glossary from the web, for use in CafeTran.


Also featured:

  • Creation of a PDF from a webpage, using your browser's Reading mode
  • Conversion of a PDF to a Word document, for translation in CafeTran

Search the web for a specialised glossary, download it, convert it to tab-delimited format, and add the file to CafeTran as a glossary.
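The "convert it to tab-delimited" step can be sketched in Python as follows. This is a minimal sketch, not CafeTran's own tooling: it assumes the downloaded glossary is comma-separated text, and the function name `csv_to_tab_delimited` and the sample terms are made up for illustration. Whether your glossary needs a header row or particular columns is not covered here.

```python
import csv
import io


def csv_to_tab_delimited(csv_text, delimiter=","):
    """Convert delimited glossary text to tab-delimited text, one entry per line."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    lines = []
    for row in reader:
        # Tabs inside a cell would break the column layout, so replace them.
        cells = [cell.strip().replace("\t", " ") for cell in row]
        if not any(cells):
            continue  # skip blank rows
        lines.append("\t".join(cells))
    return "\n".join(lines) + "\n"


# Hypothetical sample data; note the quoted cell that contains a comma.
sample = 'source,target\n"pump, centrifugal",centrifugaalpomp\n\nvalve,klep\n'
print(csv_to_tab_delimited(sample), end="")
```

To use it on a real file, read the file's text (UTF-8 is an assumption), pass it through the function, and write the result to a .txt file that you then add via Glossary > Add glossary...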




I just realised that there's even a fourth way to add a glossary to CafeTran. Well, actually it is a way to open an empty glossary template for a certain language combination.


Currently there are 4 glossary templates stored in the CafeTran package. They contain some example term pairs for a certain example text (the Universal Declaration of Human Rights). They do not contain the 5 column headers.


You can call the glossaries this way:



Here's the English <> Polish one:



A glossary template for DE <> EN would look like this:




No content is required (though at a later stage some basic content from the public domain could be added; there are lots of free term lists for starters available on the web).


Per language combination, only one template is needed, since CafeTran can query glossaries from left to right (the default situation, I guess), from right to left, or bidirectionally.


Still, the number of language combinations is huge (though limited), even when we only offer templates for the most 'common' (frequent?) combinations. The manual creation of these glossary templates is simple but requires some work. So it would be nice if someone could automate that. Perhaps even Igor?


The idea would be to pack the templates with the CafeTran installation package. New users start using two different glossaries right away: the ProjectTerms and the AB-CD glossary for their lingo combo. Hey man, I even realise now that CafeTran could preselect the template itself, because the languages are already set in the project config.


Okay, so which lingo combos should be offered?


Let me start by listing the most 'common' languages:


Arabic (supported by CafeTran?)

Dutch

English

French

German

Greek

Italian

Japanese

Javanese

Polish

Portuguese

Russian

Spanish

Thai

Turkish


Of course the languages of the installed user base should be covered.


https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers

https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
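To get a feel for how many templates the language list above implies, here is a minimal Python sketch that enumerates one template per language combination (one is enough, since glossaries can be queried in both directions). The ISO 639-1 codes are the standard ones for the listed languages; the `xx-yy.txt` file naming and the function name `template_names` are only assumptions for illustration, not how CafeTran names its templates.

```python
from itertools import combinations

# The languages listed above, with their ISO 639-1 codes.
LANGUAGES = {
    "Arabic": "ar", "Dutch": "nl", "English": "en", "French": "fr",
    "German": "de", "Greek": "el", "Italian": "it", "Japanese": "ja",
    "Javanese": "jv", "Polish": "pl", "Portuguese": "pt", "Russian": "ru",
    "Spanish": "es", "Thai": "th", "Turkish": "tr",
}


def template_names(codes):
    """Return one hypothetical template file name per unordered language pair."""
    return [f"{a}-{b}.txt" for a, b in combinations(sorted(codes), 2)]


names = template_names(LANGUAGES.values())
print(len(names))  # 15 languages give 15 * 14 / 2 = 105 combinations
print(names[:3])
```

So even this short list already needs 105 templates, which is why automating their creation looks attractive.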


I am very interested in this. Has anything happened since? Thanks!

Hi Jessica, what exactly are you looking for?

Hi Hans, 


Sorry I wasn't clear. I was really interested in the glossary packages idea. I was wondering if anything ever happened since you mentioned it in 2015. If so, I would love to learn more, and save time not having to add so many fragments as I'm starting out with CT. :). Thanks!



Hello Jessica,


No, nothing happened.


Two remarks:


  • You can search for a list of the most frequent 5000 or 50000 words in your language and run this through Google Translate to get a starting glossary.
  • Glossaries will be of less importance in future because of DeepL etc. You can use their suggestions for terminology purposes, adding only your own preferred target terms to a (small) glossary to repair (overrule) MT suggestions with your own terms/to avoid repeating dumb errors ('Seals on the Beach' syndrome). See: 

image


Hans--brilliant points! Thanks!

Jessica, what language combinations do you offer? There are quite a few free/cheap glossaries available for a large number of languages, and especially specialised glossaries are very useful, if not indispensable.


My take on CafeTran's resources:


Resources, an Overview

TMX Files, an Approach

The Big Mama, an Approach


H.




Woorden,


Thanks so much for those links. I do Spanish into English. I would love to get my hands on some of the glossaries you're talking about to help me benefit from CafeTran as quickly as possible. 

Jessica: I do Spanish into English


That should be an easy one. You can start with IATE, the EU terminology. It offers some 750,000 terms (general/legalese). You can download them for free, but then you'll have to create your resource yourself, which can take several hours/days (I know, I was stupid enough to reinvent the wheel). A Henk Sanderson offers IATE terminology for € 8.50, and you can tell him the format you prefer. I'd strongly suggest TMX (TM for Fragments). See https://www.tm-town.com/terminology-marketplace/en_es (I know that's en-es, but it's not even 6 AM here...).


I'll try to find other resources later.


H.

I attached a rather small termbase of frequent words (below the TMX file in the CT edit mode):


image


With the settings:


image


I stole it from a list, and adapted it.


image



Since I don't know any Spanish, please accept it "as is." The list of English terms is a lot more comprehensive, so I attached it as well. You can try to turn it into a more complete ES-EN termbase.


H.

[Attachments: tmx, xlsx]

Woorden,


What a fantastic idea! I had seen the IATE glossary for sale, but it hadn't clicked that it would already be ready for CafeTran. I fumbled around for a few hours trying to create a glossary with the top 1000 words in Spanish and found that exercise tiring. I am feeling pretty clueless about how to use that glossary. I added it to the Glossary dashboard. Then, to see if it worked, I thought I'd open up a NEW PROJECT with just the Spanish words to see if I could auto-populate the translation with the glossary (perhaps this was stupid, but I'm just trying to get used to everything). Needless to say, it didn't work. 


I am now the proud owner of my very own Spa-Eng IATE terminology files. They come with instructions on how to get them into just about every program other than CafeTran, although they do include .csv files for CafeTran. They are also broken into 6 different fields so that they can be used by industry rather than all at once. I am not seeing a help video for what to do...


As for your TMX file (thanks!), it looks like that would be a simple "IMPORT TMX" task on the dashboard, right? But it sounds like I would need to take a quick look at it first to make sure the words are right. How would I do that?


Basically, you are 1000x savvier than I am on any of this! Also, I did run across a post between you and Hans stating that the IATE terminology wouldn't be very helpful for you. Is it because it's really better for someone starting to build up the termbanks and TMs, rather than you with years of things already stored? 


Thanks so much for helping me work on this problem! I think that this could really make a difference for me, if only I could get these files into CafeTran!


Take care,

Jessica


Jessica: They do come with .csv files for CafeTran


That's why I suggested asking for TMX files. CSV files don't fit in my workflow, so I'm afraid I can't help you any further.


H.

I now have TMX files for the IATE that I am able to pull in as part of my Translation Memories. 


Is there anything wrong with this strategy?

No, just make sure that when opening the TM, you tick the Read-only checkbox.
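On the earlier question of how to take a quick look at a TMX file before using it: a TMX file is plain XML, so one option outside CafeTran is a small Python script. The sketch below assumes a standard TMX layout (`tu` elements containing `tuv` elements with an `xml:lang` attribute and a `seg` child); the function name `preview_tmx` and the sample entry are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def preview_tmx(source, limit=10):
    """Return up to `limit` translation units as {language: segment text} dicts.

    `source` may be a file name or a binary file object.
    """
    units = []
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tu":
            continue
        unit = {}
        for tuv in elem.findall("tuv"):
            lang = tuv.get(XML_LANG) or tuv.get("lang", "?")
            seg = tuv.find("seg")
            # itertext() also collects text that sits inside inline tags.
            unit[lang] = "".join(seg.itertext()).strip() if seg is not None else ""
        units.append(unit)
        elem.clear()  # free memory; useful for large files such as IATE exports
        if len(units) >= limit:
            break
    return units


# Hypothetical one-entry TMX document for demonstration.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header srclang="es"/>
  <body>
    <tu>
      <tuv xml:lang="es"><seg>casa</seg></tuv>
      <tuv xml:lang="en"><seg>house</seg></tuv>
    </tu>
  </body>
</tmx>"""

for unit in preview_tmx(io.BytesIO(sample)):
    print(unit)
```

For a real file, pass its path instead of the `io.BytesIO` object, for example `preview_tmx("my_file.tmx", limit=20)` (the file name is a placeholder), and skim the printed pairs.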