Cool, thanks!
It does sort of break my heart though to see we are now using two completely arbitrary (and misleading) names for them, based on HvdB's decision to start calling them "Termbases". Anyone from the outside world will look in, and think: "Termbases"? "Glossaries"? What? How does that relate to .tmx versus tab-delimited?
TMX termbases vs TXT termbases ... Makes sense.
Glossaries vs TMs for Terms ... Makes sense.
even
Glossaries vs Memories for Terms ... Makes sense.
but
Glossaries vs Termbases ... Doesn't make sense.
Since when is a "termbase" related to storing terminology in a TMX file?
Dear Michael,
Let me start my response by thanking you wholeheartedly for taking an interest in my posting.
Thanks again!
Arbitrary, yes. But misleading, no.
I think that in many CAT tools the 'device' for storing terms is called a termbase. And nearly always, a termbase will allow fuzzy matching.
That's with CafeTran too.
Why not use the word termbase for a fuzzy term matching device? The fact that they are organised internally as TMX files is only relevant on a second level (once you start editing them for instance).
And use the word glossary for those two-column things, the digital equivalent of the folded paper with words that you had to learn in high school.
Any naming is arbitrary.
We could create new names like: exact word lists <> fuzzy word lists. But Glossaries and Termbases are fine words. Even when a Termbase isn't an indexed database filled with term pairs. I don't think that's relevant here.
It's also possible to introduce fancy names like 'worter list' (american for Wörterliste):
http://home.wtnet.de/~hekrueger/sprachen/index-fr.htm
But please take this seriously.
You said: "I think that in many CAT tools the 'device' for storing terms is called a termbase. [true!] And nearly always, a termbase will allow fuzzy matching. [really?]"
A very tenuous link (and reason to call the Memories for Terms in CT "Termbases"), if you ask me.
However, if you guys want to use it, who am I to stop you? I'm just telling you (as a native English speaker, and terminologist) what I think. It sounds terribly confusing to me, and I dare say will confuse many newbies.
In my view, the way to distinguish Glossaries from Memories for Terms is the format (.txt vs .tmx), not a single feature of one of them.
And now time for a story:
Imagine that you have met an alien on Mars, and you are trying to distinguish earth men from earth women. It would be sort of strange if you based the two names on the fact that the man likes to eat pizza (so you call him a "Pizza Eater"), whereas the woman has no driving licence (so you call her a "No Driving Licence Holder"). Far better to use the fact that they differ sexually in your naming scheme.
When creating names, there are several ways to approach this/things to consider.
Things to consider: users' expectations/knowledge.
Termbases (term bases) in memoQ, Studio etc. are rather complex structured things that allow fuzzy matching of terms during the translation process. Database files filled with terms. One file (or sometimes more, when the index is internal), and you are not very likely to modify this file in a text editor.
All this applies to CafeTran's incarnation of termbases too (note that this term is already used in CafeTran).
Personally I associate TMX with TMs. For me a TM is a concept. I once knew that TM means Translation Memory, but now I only talk about TMs, things to store segments in. TMs are related to the Grid. What I see there will be stored in a TM. And yes: the M is related to Memory, just fine. SM is already taken, so Segment Memory wasn't available. Let's just forget about the whole 'memory' concept.
TMs for segments. For Grid content.
Memories for terms? Mwah, why link a memory to a term list?
Termbases. Everyone knows what they are.
And then: Glossaries, those old-school two-column word lists.
Memory for segments
Memory for terms
Please not.
Glossary <> TM <> Termbase
The underlying technology isn't so relevant to the user. Okay, CafeTran uses TMX for TMs. Fine. Other tools use tab-delimited files for TMs. A Termbase is TMX too? Who cares? It's just relevant that they can be used for fuzzy matching.
Must be used for fuzzy matching. Because that's their primary reason for existence. Let's face it: if you are not going to use a Termbase for fuzzy matching, why bother with the complex file format and all the overhead? Better to use small tab-delimited files.
Tab-dels for glossaries? Yes they are tab-del, but why go for a name that relates to the technology behind the surface? Glossary is a beautiful word and everyone knows what it refers to. It's also known by people who don't know what a termbase is.
Creating new words that perfectly cover the use and technical background of these two term resources? Why bother.
>In my view, the way to distinguish Glossaries from Memories for Terms is the format (.txt vs .tmx), not a single feature of one of them.
I don't agree anymore. I don't think that users should be bothered with the 'technology behind the surface' (tab-del vs. TMX). It's fuzzy or non-fuzzy that's relevant here. And yes that's 'one single feature', but as far as I'm concerned, it's the crucial one.
Exact term matches or 'limited-variation' matches: glossaries and regular expression glossaries
Fuzzy term matches: termbases
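To make the exact/fuzzy distinction concrete, here is a minimal sketch in Python. It is not how CafeTran implements either resource: the dictionaries, the Dutch target terms, the function names and the 0.8 threshold are all hypothetical, and `difflib.SequenceMatcher` simply stands in for whatever fuzzy matching a real termbase uses.

```python
from difflib import SequenceMatcher

# Hypothetical glossary: source term -> target term, looked up exactly.
GLOSSARY = {"termbase": "termbank", "glossary": "woordenlijst"}

# Hypothetical termbase: the same kind of pairs, but looked up fuzzily.
TERMBASE = {"translation memory": "vertaalgeheugen", "termbase": "termbank"}


def glossary_lookup(word):
    """Exact match only: 'termbases' will not find 'termbase'."""
    return GLOSSARY.get(word.lower())


def termbase_lookup(word, threshold=0.8):
    """Fuzzy match: return (source, target, score) for every entry whose
    similarity ratio reaches the (hypothetical) threshold, best first."""
    hits = []
    for source, target in TERMBASE.items():
        score = SequenceMatcher(None, word.lower(), source).ratio()
        if score >= threshold:
            hits.append((source, target, round(score, 2)))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


print(glossary_lookup("termbases"))  # None: no exact entry
print(termbase_lookup("termbases"))  # the 'termbase' entry, score about 0.94
```

The plural "termbases" misses in the exact lookup but still finds the "termbase" entry in the fuzzy one, which is the usage difference the naming is meant to signal.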
I think that the usage should be relevant, not the technical implementation (TMX <> Tab-del).
Of course the usage isn't represented in a word like 'termbase'; one could argue that 'glossary' is clearer to everyone.
Why not just coin 'termbase' as a name for 'any other terminology holder than the glossary'?
I don't think that the arguments 'I'm a native speaker' (we all are, each and every one of us in her own language) and 'I'm a terminologist' (aren't we all constantly coining words?) are strong ones. They aren't related to the expectations of new users, some of them coming from other CAT tools.
Fine, if you think the main difference between Glossaries and Termbases is the fuzzy/no fuzzy matching. However, if you want to stress exactly this feature (namely fuzzy matching), then I'm afraid Termbase is the wrong word. When I think of other CAT tools, and I think of the word Termbase, I definitely don't think: Oh yeah, they obviously all offer fuzzy matching. They are called termbases because it's where you store your terms. Some CAT tool termbases now offer fuzzy matching, most didn't to start with, but added it later. Fuzzy matching is definitely not the defining characteristic of termbases in other CAT tools.
Incidentally, fuzziness isn’t the only difference. Another important difference is: synonyms. TMXs don't allow for synonyms, tab-del txt files do. In my opinion just as, if not more important than the fuzzy/non-fuzzy difference.
And yet, this is also just another feature (of many). Better to look for a more general differentiating characteristic: the file format.
However, as I said, go ahead and use the word "Termbase". I'm just not going to. ;)
I mean, I will, but every time some puzzled person asks me what these two terms mean in CafeTran (which they will, I am sure), I'm going to have to explain this whole mess to them, and add that I think that the terms are ill chosen.
>You said: "I think that in many CAT tools the 'device' for storing terms is called a termbase. [true!] And nearly always, a termbase will allow fuzzy matching. [really?]"
Er, yeah, I think so? Studio's MultiTerm: fuzzy matching, Transit's TermStar: fuzzy term matching, Swordfish allows fuzzy searching in the database containing terms, memoQ's termbases: fuzzy term matching.
>When I think of other CAT tools, and I think of the word Termbase, I definitely don't think: Oh yeah, they obviously all offer fuzzy matching.
There's indeed no need to make this fuzzy matching feature of termbases active in your mind whenever you say/read the word 'termbase'. They are fuzzy, but no need to be bothered with that characteristic every time.
>And yet, this is also just another feature (of many). Better to look for a more general differentiating characteristic: the file format.
That's a fundamental discussion: name things based on the technology behind it or name things based on the usage or name things based on expectations of users etc. etc.
New things are often named based on the technology behind them, and often in the language used by the inventors. When things become familiar, the wish to refer to the technical background vanishes, whereas the wish to emphasise the usage increases. And this differs from language to language (French and German will try to coin their own words, more likely than e.g. Dutch: we take everything from anywhere ;)).
A TM was once called a Translation Memory because, at the time of introduction, it referred to a 'neurales Netz' (neural network) in Transit 2, and the metaphor of the human mind was a nice means to explain the way these things work. Nowadays everyone (96.5% of the users!) calls them TMs.
Let us just call fuzzy term thingees 'termbases'.
WFP has fuzziness in glossaries. Just fine. Based on the technical background.
The thing is that in CafeTran there are two completely different ways to store terms. And fuzziness is assigned to one of them by design.
That distinction holds. As in: a distinction has to be made, and represented by different names, hopefully simple ones.
This is how I think most newbies will interpret the terms "Glossary" and "Termbase".
Either:
(1) they will think they are just synonyms (which they actually are), but then they will realise that we are using them for two different things (arbitrarily), and then they will get confused. Not good.
Or:
(2) they will think that maybe the "Termbases" have more fields and can hold more complex information than the possibly simpler "Glossaries". Not true.
Fuzziness has absolutely nothing to do with the term "Termbase".
But fine, use it. However, I'm afraid I am going to have to be like Mr Woorden on this point and refuse to use the term out of mild disgust ;) To me personally, "glossary" and "termbase" are 100% synonymous, and should not be used to pretend otherwise.
Igor in his 2010 Handbook: Check in "For segments" field if it is the segments memory or/and "For terms" field if you create a terminology base.
He called "glossaries" glossaries, and "termbases" terminology bases. I think "termbase" is a suitable abbreviation for "terminology base."
Michael: TMXs don't allow for synonyms
Yes they do, though not in one entry.
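The point that TMX can hold synonyms, though not in one entry, can be illustrated with a small Python sketch. It assumes each synonym is stored as its own `<tu>` sharing the same source term; the sample content, the `en`/`nl` language codes, the placeholder header values and the `synonyms_by_source` helper are hypothetical, and this says nothing about how CafeTran itself writes or reads such entries.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical TMX content: two separate entries for one English term.
TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1" segtype="phrase"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu><tuv xml:lang="en"><seg>glossary</seg></tuv>
        <tuv xml:lang="nl"><seg>woordenlijst</seg></tuv></tu>
    <tu><tuv xml:lang="en"><seg>glossary</seg></tuv>
        <tuv xml:lang="nl"><seg>glossarium</seg></tuv></tu>
  </body>
</tmx>"""


def synonyms_by_source(tmx_text, src="en", tgt="nl"):
    """Group target terms from separate <tu> entries by shared source term."""
    root = ET.fromstring(tmx_text)
    grouped = defaultdict(list)
    for tu in root.iter("tu"):
        segs = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        if src in segs and tgt in segs:
            grouped[segs[src]].append(segs[tgt])
    return dict(grouped)


print(synonyms_by_source(TMX))
```

Each synonym lives in its own entry, and grouping by the shared source term is what turns them back into a synonym list.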
...they will think that maybe the "Termbases" have more fields and can hold more complex information than the possibly simpler "Glossaries"
Not necessarily more "complex" information, but "structured" information.
H.
Hans CafeTran Wiki
I have updated my article about termbases.