Using Tagging Glossaries to save time/avoid dull tag insertion work
alwayslockyourbike
started a topic
over 2 years ago
I define Tagging Glossaries as bilingual word lists that can be used to match the tagging of target terms to that of the corresponding source terms.
Example:
Source:
An American cowboy rides a 1horse2 and not a 3bike4.
Target:
Een boerenzoon uit Holland rijdt vaker op een 3fiets4 dan op een 1paard2. (Literally: a farmer's son from Holland more often rides a 3bike4 than a 1horse2.)
Content of the tagging glossary (one tab-separated source/target pair per line):
horse\tpaard
bike\tfiets
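To make the format concrete, here is a minimal sketch of loading such a glossary, assuming a plain UTF-8 text file with one tab-separated source/target pair per line (the function name and file name are my own hypothetical choices):

# Hypothetical sketch: load a tab-separated tagging glossary into a dict.
# Expected file content, one pair per line, e.g.:
#   horse<TAB>paard
#   bike<TAB>fiets
def load_tagging_glossary(path):
    glossary = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line or "\t" not in line:
                continue  # skip empty or malformed lines
            source_term, target_term = line.split("\t", 1)
            glossary[source_term] = target_term
    return glossary

glossary = load_tagging_glossary("tagging_glossary.txt")
# -> {"horse": "paard", "bike": "fiets"}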
Since this will be a major automation project, good planning is essential.
I see two different approaches, both with their pros and cons:
1. Read all words of the source segment and match them against the source terms of the tagging glossary.
2. Read all source terms of the tagging glossary and match them against the source segment.
Pros:
For 1: On average, there will be fewer words in the source segment than there are source terms in the glossary, so this approach should be faster. However, is this relevant, given the immense speed of modern computers?
For 2: Multi-word source terms can be handled.
Cons:
For 1: Multi-word source terms cannot be handled straight away.
For 2: For a larger glossary, a lot of comparing has to be done. However, is this relevant, given the immense speed of modern computers? And how many terms will an average Tagging Glossary contain? I think it is safe to assume that this kind of glossary will be used for specific types of projects: documents with a lot of indexed terms, or software translations with many UI components that have to be set in italics, bold, etc.
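To show what approach 2 could look like in practice, here is a minimal sketch. It assumes tags are plain numeric markers written directly before and after a term, as in 1horse2, and that each target term appears untagged exactly once in the target segment; the function and variable names are hypothetical.

import re

# Approach 2 sketch: walk through the glossary's source terms, look for each one
# between a pair of tag markers in the source segment, and wrap the corresponding
# target term in the target segment with the same markers.
def transfer_tags(source_segment, target_segment, glossary):
    for source_term, target_term in glossary.items():
        # Capture the opening and closing tag numbers around the source term.
        match = re.search(r"(\d+)" + re.escape(source_term) + r"(\d+)", source_segment)
        if not match:
            continue  # term not present or not tagged in this segment
        opening, closing = match.group(1), match.group(2)
        # Wrap the (untagged) target term with the same pair of tags.
        target_segment = re.sub(
            r"\b" + re.escape(target_term) + r"\b",
            opening + target_term + closing,
            target_segment,
            count=1,
        )
    return target_segment

source = "An American cowboy rides a 1horse2 and not a 3bike4."
target = "Een boerenzoon uit Holland rijdt vaker op een fiets dan op een paard."
print(transfer_tags(source, target, {"horse": "paard", "bike": "fiets"}))
# Een boerenzoon uit Holland rijdt vaker op een 3fiets4 dan op een 1paard2.

Because this loops over the glossary rather than over the individual words of the segment, multi-word source terms come along for free, which is exactly the pro listed under 2.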