
REQ: Glossary accepting more than one pipe in multi-word entries

 Hi,


it seems that the CT glossary does not respect more than one pipe in multi-word entries.

Would it be possible to change this?

If I, for example, make an entry polis|a ubezpieczeniow|a, CT recognises only the inflections of "polisa" and ignores the inflections of the second word.

As far as I know, fragment memory does not accept pipes at all. So such a development of glossary flexibility would be a great asset for inflected source languages.


                                                             Regards


                                                              Wojciech


Wojciech: As far as I know fragment memory does not accept pipes at all. 


Actually, I think it does. But you won't need it because CafeTran will recognise (parts of) separate words of a Memory for Fragments entry anyway, unlike a tab del glossary.


H.

CafeTran can do it automatically with the Prefix matching function which works for glossaries too. In principle, it places a virtual/invisible pipe at each word in a longer phrase automatically. The length of the prefix is determined in Memory options. To activate it for glossaries, check this option in Edit > Preferences > Glossary tab > Prefix matching.
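As a rough illustration of the "virtual pipe at each word" idea, here is a minimal Python sketch. It is not CafeTran's actual code: the function names, the fixed prefix length of 5 and the whitespace tokenisation are all assumptions made for the example.

```python
def word_prefixes(phrase, prefix_len):
    """Cut every word of the phrase down to at most prefix_len characters."""
    return [word[:prefix_len].lower() for word in phrase.split()]


def phrase_matches(entry, text, prefix_len=5):
    """Return True if some run of consecutive words in text starts with
    the prefixes of the entry's words (hypothetical matching rule)."""
    prefixes = word_prefixes(entry, prefix_len)
    words = text.lower().split()
    for start in range(len(words) - len(prefixes) + 1):
        window = words[start:start + len(prefixes)]
        if all(w.startswith(p) for w, p in zip(window, prefixes)):
            return True
    return False


# The inflected form is found although the entry holds the base form.
print(phrase_matches("polisa ubezpieczeniowa",
                     "warunki polisy ubezpieczeniowej"))  # True
```

In this sketch, a shorter prefix finds more inflected forms but also lets unrelated words through: with `prefix_len=3`, "polityka ubezpieczeniowa" matches as well.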

IK: CafeTran can do automatically with the Prefix matching function which works for glossaries too


Yes. By now we know you spent/wasted heaps of time and coding energy trying to give those tab dels all the TMX features at the request of certain [moderated], and you didn't succeed.


However, Wojciech asked if that would work for multiple word entries, and since tab dels don't feature fuzziness, the answer should be NO. I think the answer could be YES for TMX resources, but I never tested it. And I'm not going to test it.


H.

 for multiple word entries


The answer is YES. The prefix matching works both for memories and glossaries.

The information on prefix matching is missing here; it can only be found in a few threads scattered around here.

I can understand the pragmatic approach of implementing a setting in the TM first rather than in the TM and the TB at the same time, but maybe it would make sense to separate these settings later. As it stands, doesn't it mean that you need to readjust any TM you use with a TB (TBs are more often general than client- or project-specific) just for the terms? And what happens when you use two TMs with different settings?

However, it returns many false hits, and it does not help in this case, not even when set to 90%.

 

Thank you tre, I have added this info as a Note to the CafeTran Espresso - Preferences reference document (Glossary options > Prefix matching), as well as in the TM options document.


Jean

Jean: I have added this info as a Note to the CafeTran Espresso - Preferences reference document (Glossary options > Prefix matching), as well as in the TM options document.


And according to his Igorness himself in his CafeTran Handbook 2012 (but the feature is likely older):


4. Prefix matching

When this option is selected, CafeTran will analyze the beginnings of words (here called prefixes) and discard any endings responsible for inflection of words. It is an option which increases significantly the number of hits for highly inflected languages. The length of prefixes is set by a percentage number. The bigger the percent number the longer the prefix of words which the program will analyze. The minimal prefix length option (menu Edit | Options | Memory | Minimal prefix length) lets you set the minimal allowed length of prefixes. The length can also be fixed, when the "fixed" option selected, instead of a set percentage length. It means that all the words will have the minimal prefix length, no matter their actual length.
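The three settings described in this handbook passage (percentage, minimal prefix length, "fixed") can be sketched as a small Python function. This is only an illustration: the function name, the default values and the rounding up are assumptions, since the handbook does not say how CafeTran rounds or which defaults it uses.

```python
import math


def prefix_length(word, percent=70, minimal=4, fixed=False):
    """Number of leading characters kept as the prefix of word.

    percent, minimal and fixed mirror the options named in the handbook;
    the defaults and the ceiling rounding are assumptions.
    """
    if fixed:
        # "fixed": every word gets the minimal prefix length.
        length = minimal
    else:
        # Percentage of the word length, but never below the minimum.
        length = max(minimal, math.ceil(len(word) * percent / 100))
    # A prefix cannot be longer than the word itself.
    return min(length, len(word))


word = "ubezpieczeniowa"  # 15 characters
print(word[:prefix_length(word, percent=50)])  # ubezpiec
print(word[:prefix_length(word, fixed=True)])  # ubez
```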

Source: http://www.cafetran.republika.pl/handbook.html (CafeTran Computer Aided Translation Software), page 11 of 53, 17/12/11

5. Custom prefixes

If the inflection of a word is too high for automatic prefix matching you can enter your terms to the memory determining the prefix of a word manually. This is done by inserting the pipe character | at the end of a prefix in a word. For example, the Polish phrase "piękny dzień" (a beautiful day) has a highly inflected word "dzień" occurring in a number of various cases (dnia, dni, dniom). If you insert the pipe characters at the following positions - "pięk|ny d|zień", CafeTran will also recognize other forms of the phrase (pięknego dnia, pięknych dni etc.). Note that inserting the pipe character at the first word in the phrase - "pięk|ny" is optional since its inflection is quite regular and CafeTran should recognize its prefix automatically.
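To make the pipe notation concrete, here is a minimal Python sketch of how an entry with one pipe per word could be matched. It is not CafeTran's implementation: the function names are hypothetical, and requiring an exact match for words without a pipe is an assumption (the handbook says CafeTran may detect regular prefixes automatically).

```python
def parse_entry(entry):
    """Split an entry such as 'pięk|ny d|zień' into (prefix, has_pipe) pairs."""
    parsed = []
    for word in entry.split():
        prefix, sep, _ending = word.partition("|")
        # Without a pipe, partition returns the whole word as prefix.
        parsed.append((prefix.lower(), sep == "|"))
    return parsed


def entry_matches(entry, text):
    """Return True if consecutive words in text fit the entry word by word."""
    parsed = parse_entry(entry)
    words = text.lower().split()
    for start in range(len(words) - len(parsed) + 1):
        window = words[start:start + len(parsed)]
        if all(w.startswith(p) if piped else w == p
               for w, (p, piped) in zip(window, parsed)):
            return True
    return False


print(entry_matches("pięk|ny d|zień", "życzę pięknego dnia"))  # True
print(entry_matches("piękny dzień", "życzę pięknego dnia"))    # False
```

Here each word carries its own pipe, so both words of the phrase may inflect independently.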


The above goes for TMX memories, of course. Not for tab dels.


H.


Thank you, Hans.


I wasn't around when the handbook was online, do you have it? It might be useful to check if anything else can be used.


The information on "Prefix matching" was already present in the reference document on TM Options. I had copied that part from the old Wiki, and now I see that its original source was the Handbook.


I've now added the note on Custom prefixes for TMs.

 

Cheers!


Jean

 

Jean: I wasn't around when the handbook was online, do you have it


Yep, and also the 2010 one and the last official one. Attached.


Suggestion: Your "Documents" are very complete. Great. However, that also means that they take quite some time to read. Could you use another font (size, type) or pop-ups or something to set apart the less relevant things (less relevant for beginners)? I'm thinking of file types and their conversion and the like, things you can skip at first reading.


H.

Hi,

I experimented with multiple-word entries both for the glossary and for fragment memory.
It still seems to me that CT cannot cope with more than one manually inserted pipe. But automatic prefix matching, with low settings, works even for highly inflected phrases as far as glossaries are concerned. I tried it with fragment memory (manually created via ALT + M) but to no avail. I guess that prefix matching applies only to subsegments, but not to fragments. If I'm wrong, please correct me.

P.S. Every time I tried to modify an entry in the New term window (through the yellow link), CT deleted it altogether, as if I had used the Delete button instead of OK. Is this intended or a bug?

                                                                  Regards

                                                                    Wojciech

 

1. Make sure that your memory has Prefix matching turned on in its Options. It works the same way as with the glossaries.

2. It does not delete the term but the edited term might not be matched to the terms in your project anymore.


Many thanks, Woorden. I will study the Handbooks at the next Documents update.


The current documents are mostly meant as reference documents for both beginners and more experienced users. They are not specifically beginner guides; I would have used a different structure if that were the case, and maybe I will in some "usage documents" I have in mind.


Every (sub)section is clearly defined (allowing the user to decide if it is useful to them), and some additional information (that may be skipped by beginners) is labelled "Note" or "Suggestion".


Do you have a specific document in mind (File formats, maybe?) or can you provide a concrete example of content that could be displayed differently, so that I can better understand your suggestion?


Jean
