
REQ: Fragment vs. Subsegment

Both "fragment" and "subsegment" are used now, and I think they have the same meaning.

As I'm writing a quick start guide for newbies, I'm wondering which term is clearer to them. This question arises especially when "fragment" or "subsegment" is discussed in relation to a "segment" (as in "segments memory" and "fragments memory").

Some newbies may not realize that "segment" is different from "fragment" (unless somebody explains it to them), because according to the dictionary both simply mean a part of something.

On the other hand, the combination of "segment" and "subsegment" can indicate a clear hierarchy (document > segment > subsegment).

I personally vote for "subsegment" (which is used more frequently in CT). Either way, if "subsegment" and "fragment" mean the same thing, please consider dropping one of them for consistency.


My small brain is confused. If I want (or prefer) to create a TMX glossary instead of a TAB one, should I use Segments memory or Fragments memory? I would bet on Fragments, but things are getting very complex here ;-)

Alain: ... I would bet on Fragments

So would I.

...but things are getting very complex here

That's because Igor is in the process of simplifying things, mainly by adding features.


Thank you, woorden. I will stick to Fragments and see...

Since Igor set us a good example by replacing "pre-translation" with "preliminary memory matching," I'm also trying to take a fresh look at the two types of memories: not in terms of what they have traditionally been called or what other CAT tools call them, but in terms of how they work and what they do for the user.

The whole current segment is added to one type of memory every time you move to another segment. To the other type of memory, on the other hand, you can add anything (not necessarily fragments of the current segment).
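Here's a rough sketch of those two behaviors in a much-simplified model. The class and method names (`Editor`, `go_to_segment`, `add_fragment`) are hypothetical and are not CafeTran's API; the point is only that one memory is filled automatically with whole segment pairs, while the other takes whatever pair you hand it.

```python
from dataclasses import dataclass, field


@dataclass
class Editor:
    """Hypothetical, much-simplified model of the two memory types."""
    segments: list[str]                                  # source segments of the document
    targets: dict[int, str] = field(default_factory=dict)  # translations typed so far
    segment_memory: dict[str, str] = field(default_factory=dict)
    fragment_memory: dict[str, str] = field(default_factory=dict)
    current: int = 0

    def go_to_segment(self, index: int) -> None:
        # Leaving a segment commits the *whole* current source/target pair
        # to the segment memory, provided it has been translated.
        if self.current in self.targets:
            source = self.segments[self.current]
            self.segment_memory[source] = self.targets[self.current]
        self.current = index

    def add_fragment(self, source: str, target: str) -> None:
        # The fragment memory accepts any pair the user chooses,
        # not necessarily a piece of the current segment.
        self.fragment_memory[source] = target
```

For example, after translating "Me gusta ir al cine." and moving on, that full pair lands in `segment_memory`, whereas `add_fragment("me gusta", "I like")` only ever writes to `fragment_memory`.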


For the sake of consistency and ease of remembering, I would recommend:

segment = what the CAT tool segments your text into (one src and one trgt segment together form a TU, or translation unit)

sub-segment = a piece (or fragment) of a segment

I think segment + sub-segment is easier to remember and make sense of than segment + fragment.

Hi Michael,

>> segment = what the CAT tool segments your text into (one src and one trgt segment together form a TU, or translation unit)

What a coincidence with part of my writing under way:

>>> A translation memory is a file that stores your translations together with their corresponding source segments. A "segment" means a part of a document that you translate separately from the rest of the document. A document is divided into segments according to certain rules ("segmentation") when imported into CT, and you usually work on each of these segments one after another until the end of the document.
The term "segment" is often used in combination with another term, such as "source/target segment."
A translation memory contains one or more translation units, each of which consists of a source segment and its corresponding target segment.
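To make that description concrete, here's a toy sketch of segmentation and translation units. The splitting rule (break after ".", "!" or "?" followed by whitespace) is my own naive assumption; real CAT tools use configurable segmentation rules, and the names `segment`, `TranslationUnit` and `TranslationMemory` are only illustrative.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationUnit:
    source: str  # source segment
    target: str  # its corresponding target segment


def segment(text: str) -> list[str]:
    """Split a document into sentence-like segments.

    Naive rule (assumption): break after '.', '!' or '?' followed by whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]  # drop empty pieces, e.g. from empty input


# In this sketch a translation memory is just a list of translation units.
TranslationMemory = list[TranslationUnit]
```

So `segment("Me gusta ir al cine. Hace frío!")` gives two segments, and each one, once translated, becomes one `TranslationUnit` in the memory.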




There is a slight difference between sub-segments and fragments. When a fragment is already in a translation memory, or you add one, there is an existing reference (a separate unit) between the source and target fragment. For example, you can add the following pair to the memory:

me gusta = I like

Then you have a real unit, which, when found in the segment "me gusta ir al cine", is called a fragment.

In contrast, when CafeTran extracts a piece of text from the segment and makes a 'guess' at its translation, such a virtual, non-existent fragment (there is no separately added unit in the TM) is called a sub-segment.
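A minimal sketch of why fragments are "real": because a fragment is a stored unit, finding it is just a lookup of stored source strings inside the segment. The function name `find_fragments` and its matching rules (case-insensitive, whole words, longer entries listed first) are my own assumptions, not CafeTran's implementation. A sub-segment could never be found this way, since no unit for it exists in the memory.

```python
import re


def find_fragments(segment: str, fragment_memory: dict[str, str]) -> list[tuple[str, str]]:
    """Return (source, target) pairs whose stored source occurs in the segment.

    Hypothetical rules: case-insensitive, whole-word matches only,
    with longer stored sources listed first.
    """
    matches = []
    for source in sorted(fragment_memory, key=len, reverse=True):
        # \b on both sides prevents "cine" from matching inside "cinema".
        pattern = r"\b" + re.escape(source) + r"\b"
        if re.search(pattern, segment, flags=re.IGNORECASE):
            matches.append((source, fragment_memory[source]))
    return matches
```

With the memory `{"me gusta": "I like"}`, the segment "Me gusta ir al cine" yields `[("me gusta", "I like")]`: an exact, subsegment-level match against an existing unit.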


Can I understand it this way, if I dare to use the term "subsegment":

Fragment = Subsegment-level exact match
Subsegment = Subsegment-level fuzzy match


> Fragment = Subsegment-level exact match

Yes. That is correct.

> Subsegment = Subsegment-level fuzzy match

Well, fuzziness usually means that there is no exactness in the translation, only an approximation, whereas subsegments are usually 'guessed' correctly (no fuzziness at all), so I would rather call them:

Subsegment = Subsegment-level virtual match
