Controlling glossary matching

I have a question about what can be called "partial" glossary matches.


Partial match Type 1

When a glossary contains the term "all," the "all" inside the word "usually" is highlighted.


Partial match Type 2

When a glossary contains the term "merchandise item;merchandise items" (two alternative entries), then, for a source sentence like "a number of merchandise items," both entries (singular and plural) are separately displayed on the glossary pane.


My question is whether it is possible to control this (e.g., elect not to display these partial matches). It seems prefix matching has something to do with this, but I'm not sure.


Masato


Hello Hans,


Sure, CT could hook into the Lucene engine directly, but that might shift the emphasis from the user's control of terms for auto-assembling purposes to the fuzziness that the Lucene engine provides. By searching glossaries via TM-Town, you can have both types of search available (CafeTran's and TM-Town's search engines) without any conflicts.


Igor

Hello Kwang,


Please see Kevin's reply here: https://cafetran.freshdesk.com/discussions/topics/6000006272


Igor

Hello Igor,


What is your reason not to add this tokenizer to CT directly? I can see disadvantages to uploading glossaries to the cloud.


Hans

Hi Igor,


What about the semicolon and pipe characters in the glossary (on both the source and target sides)?

If I upload my glossaries to TM-Town, apart from the Lucene thing, will it work the same way as CT (e.g., giving/displaying matches, auto-assembling, regex, etc.)?


Kwang


Hello Hans,


The Lucene search engine implementation is a feature provided by TM-Town, and CafeTran makes use of it via the available APIs, the same way it uses the APIs provided by the Google Translate and Microsoft Translator MT services. CafeTran's path of development is independent, but when there are APIs available that enhance the functionality of the program, it is naturally enjoyable and practical to implement the connection between the tools. TM-Town gives translators very nice cloud-based search and management tools for TMs and glossaries. CafeTran has always given translators a choice. It does not force you to use the Google Translate, MS Translator or TM-Town services. You can use them in your workflow only if you find them useful.


Igor 

Hi Igor,


Does this mean that you'll be providing new features primarily via the TM-Town platform, requiring an online connection? What is your path of development?


Hans

You're welcome! By the way, for users who wish to try out another matching algorithm (based on the Lucene engine), which can find all forms of a word, CafeTran offers integration with the TM-Town web service. After you upload your glossary there, CT will show you the matches automatically in its interface.


Igor


Dear Igor


Thanks for your clarification.


I'm very happy now!


Masato


It is not recognized unless you have it in the glossary either as a regular expression such as |device.* (which will recognize both forms) or as the plural form.


Igor
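
The difference between a plain entry and a regular-expression entry can be sketched in a few lines of Python. This is only an illustration, not CafeTran's actual code: the function name term_matches, the way the segment is split into words, and the choice to test the pattern against each whole word are all assumptions. The leading pipe is treated as the marker for a regular-expression entry, as the example |device.* suggests.

```python
import re
from typing import List


def term_matches(entry: str, segment: str) -> List[str]:
    """Return the words of `segment` that a glossary entry would match.

    Hypothetical model: a plain entry must equal a whole word, while an
    entry starting with "|" is read as a regular expression that must
    match a whole word.
    """
    words = re.findall(r"\w+", segment)
    if entry.startswith("|"):
        # Assumed convention: strip the pipe, treat the rest as a regex.
        pattern = re.compile(entry[1:])
        return [w for w in words if pattern.fullmatch(w)]
    return [w for w in words if w == entry]


# A plain "device" entry does not see the plural on its own.
plain = term_matches("device", "Two devices were found")
# The regex entry covers both the singular and the plural.
regex = term_matches("|device.*", "One device and two devices")
```

Under these assumptions, the plain entry finds nothing in a segment that contains only "devices," while the regex entry finds both forms.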

I'm beginning to understand.


In this case, "device" is an exact match, so "devices," which appears later in the same segment, is highlighted, too.


This means that when only "devices" appears, it is not recognized as a match for "device"?


Masato

Matches are only for whole words, as expected, but highlights in the source segment can also show those matches as parts of longer words.


Igor

Thanks, Igor


So, I should be expecting matches for a certain string of characters, rather than for a whole word in its ordinary sense.


Masato

The match is actually only for "device," but CT highlights all occurrences of the match in the current source segment, including inside the plural form.


Igor
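
A rough Python sketch of this two-step behavior may help: matching decides whether a term is hit at all, and highlighting then marks every occurrence of the matched string. This is not CafeTran's implementation. The function names whole_word_hit and highlight_spans are made up for illustration, and the use of word boundaries for matching and plain substring search for highlighting is an assumption based on the description above.

```python
import re
from typing import List, Tuple


def whole_word_hit(term: str, segment: str) -> bool:
    """Assumed matching rule: the term must appear as a whole word."""
    return re.search(rf"\b{re.escape(term)}\b", segment) is not None


def highlight_spans(term: str, segment: str) -> List[Tuple[int, int]]:
    """Assumed highlighting rule: once the term is matched, mark every
    occurrence of its characters, even inside longer words."""
    if not whole_word_hit(term, segment):
        return []  # no whole-word match, so nothing is highlighted
    return [(m.start(), m.end()) for m in re.finditer(re.escape(term), segment)]


# "device" is matched as a whole word, so the "device" in "devices" is marked too.
both = highlight_spans("device", "The device talks to other devices.")
# With only the plural present there is no match and no highlight.
none = highlight_spans("device", "Only devices here.")
```

In this model, the first call returns two spans (the standalone word and the start of "devices"), and the second returns an empty list, which mirrors the case where only "devices" appears in the segment.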

Here is an example.



"device" in "devices" is recognized.


This glossary is not a regex one. No other resources are used.


Actually, I don't mind if this type of match occurs, but what really annoys me is that it occurs only in some cases (maybe only for some terms). So I want to know what makes it happen.


Peace,

Masato

Hello Masato,


Can you provide a screenshot of such a partial match? If your source language has a word separator and your tab-delimited glossary is not set as a regular expressions glossary, you should see only exact matches for the source terms. Partial matches are controlled via regular expressions. If you have source-side synonyms in your glossary, the one appearing in the source segment should be displayed.


Igor
