
Segment pattern matching: "Less is more" or "More is less"?

Here, I’d like to share some modest know-how on constructing segment patterns that give better results. Please note that this is NOT an explanation of the rules or specifications of the segment pattern matching feature.


For busy readers, here is a summary of my recommendations:


  1. If there is only one variable and it is expected to be replaced by only one or a few words, reconsider whether a segment pattern is worth the time. Fuzzy auto-correction already handles most such cases. (To enable Fuzzy auto-correction, go to Translation > Options.)
  2. Avoid creating too simple a segment pattern.
  3. The ratio of variables to fixed elements, as well as their respective locations in the pattern, matter a great deal.


Points 2 and 3 are recommendations for avoiding or reducing “risk”: the possibility of getting an unexpected, often useless or nonsensical, result. You may welcome such results. Note, however, that when a segment pattern matches, which is by definition a 100% match, auto-assembling does not run. So taking this “risk” could mean getting a useless result in place of what auto-assembling would otherwise have generated, which could have been much better.



END of SUMMARY



==============



Going deeper...



1.   If there is only one variable and it is expected to be replaced by only one or a few words, reconsider whether a segment pattern is worth the time. Fuzzy auto-correction handles most such cases.


This point probably needs little further comment. Fuzzy auto-correction is designed to automatically fix very minor differences between the source segment you are working on and a high-percentage fuzzy match.


This is an automatic process; so, in most such cases, you don’t have to spend part of your precious time on creating a segment pattern.



2.   Avoid creating too simple a segment pattern.


  • Example: {1} must be reported by {2}.


You may want to create this pattern when you are translating “The test results must be reported by tomorrow.” But this pattern would equally match the following sentence:


  • The test results must be reported by March 20 unless otherwise instructed by your boss.


It matches simply because the pattern fits. Here, “March 20 unless otherwise instructed by your boss” is taken as the text that fills {2}, whether that is what you expect or not. In this case, if “March 20 unless otherwise instructed by your boss” is not found in any of your resources, CT nevertheless gives a 100% match, leaving the translation for {2} blank (The test results must be reported by.) and at the same time stopping auto-assembling.


To get a better-quality 100% match, you need to add “March 20 unless otherwise instructed by your boss” as a single TU or glossary entry.


However, if that is not an option for you because such an entry is unnatural and goes against your resource-building policy (if so, I agree with you), then you should expand the pattern as below:


  • {1} must be reported by {2} unless otherwise instructed by {3}.
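
To make the difference between the two patterns concrete, here is a minimal Python sketch. It is not how CT implements segment patterns. It only assumes that each {n} can stand for any non-empty run of text and that the fixed parts must match literally. The helper names pattern_to_regex and match_pattern are made up for this illustration.

```python
import re

def pattern_to_regex(pattern):
    """Convert a pattern such as '{1} must be reported by {2}.' into a regex.

    Assumption (not CT's documented behavior): every {n} matches any
    non-empty text, and everything else must match literally.
    """
    # re.split with a capturing group alternates: literal, number, literal, ...
    parts = re.split(r"\{(\d+)\}", pattern)
    pieces = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pieces.append(re.escape(part))        # fixed element
        else:
            pieces.append("(?P<v%s>.+)" % part)   # variable {n}
    return re.compile("^" + "".join(pieces) + "$")

def match_pattern(pattern, segment):
    """Return what each variable captured, or None if the pattern does not match."""
    m = pattern_to_regex(pattern).match(segment)
    return m.groupdict() if m else None

short = "{1} must be reported by {2}."
longer = "{1} must be reported by {2} unless otherwise instructed by {3}."
segment = ("The test results must be reported by March 20 "
           "unless otherwise instructed by your boss.")

print(match_pattern(short, segment))
print(match_pattern(longer, segment))
```

Under these assumptions, the short pattern puts the whole tail “March 20 unless otherwise instructed by your boss” into {2}, while the expanded pattern splits it into “March 20” for {2} and “your boss” for {3}, which are far more likely to exist in your resources.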


Generally, the fewer variables there are in a pattern, the higher the “risk.”


When creating a segment pattern, you should weigh “effectiveness” against “risk” based on your own analysis of the documents you usually translate.



3.   The ratio of variables to fixed elements, as well as their respective locations in the pattern, matter a great deal.


This is largely a restatement of point 2.


A segment pattern with many fixed elements is structurally more “rigid,” so it is more likely to produce a translation that is closer to your expectation.


In terms of the locations of variables, it is preferable to avoid placing a variable at the beginning or at the end. A variable in this position is “too open.”


  • Example: {1} must be reported by {2}.


This pattern matches a sentence like “Unless otherwise instructed by your boss, the test results must be reported by tomorrow.”


Here, “Unless otherwise instructed by your boss, the test results” is taken as corresponding to {1}.


You can get a better result by making the pattern a little bit more rigid as below:


  • The {1} must be reported by {2}.


This pattern, of course, does not match “Unless otherwise instructed by your boss, the test results must be reported by tomorrow.”


The same rule of thumb would apply when a variable is placed at the end, as discussed above.


To reduce the “risk” and get a more favorable result, variables should preferably be placed between fixed elements, that is, enclosed by them.
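
If you keep your patterns in a text file, a quick check like the following Python sketch can flag the “open” ones for review. It is only an illustration under my own assumptions: a variable is written as {n}, and trailing sentence punctuation (. ! ?) is ignored when deciding whether a pattern ends with a variable. The function name open_ends and the third sample pattern are hypothetical.

```python
import re

VARIABLE = r"\{\d+\}"  # a placeholder such as {1} or {12}

def open_ends(pattern):
    """Return (starts_with_variable, ends_with_variable) for a pattern.

    Assumption: trailing '.', '!' or '?' does not count as a fixed element,
    so '{1} must be reported by {2}.' is treated as open at both ends.
    """
    core = pattern.strip().rstrip(".!?").strip()
    starts_open = re.match(VARIABLE, core) is not None
    ends_open = re.search(VARIABLE + "$", core) is not None
    return starts_open, ends_open

samples = [
    "{1} must be reported by {2}.",
    "The {1} must be reported by {2}.",
    "Please report {1} by {2} at the latest.",  # hypothetical enclosed pattern
]
for p in samples:
    print(open_ends(p), p)
```

A flagged pattern is not necessarily a bad one. The check only tells you where a variable can swallow more text than you intended, so whether to accept that “risk” remains your call.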



Cheers,
Masato



My momma always said, "Life was like a box of chocolates. You never know what you're gonna get." - Forrest Gump (1994 film)

My tips:


  • Don't use pattern matching if you have no idea what you're doing. It can and therefore will ruin your job.
  • Like just about all CAT tool functionalities, this is a search-and-replace thing. Keep in mind that those functions depend on both the source and the target language. Masato's advice against using a "simple" pattern is wrong. Such a pattern can be very useful if you translate into a language with a different word order. In that case, fuzzy matching won't work.
  • It's also very likely that the patterns are job dependent. Consider creating patterns for a particular job only. It's not much trouble, and you don't have to think about all possible variations, which is a pain anyway.
  • Always save patterns in a dedicated "patterns resource" (or even in your ProjectTM) to keep things exchangeable.
H.


Thanks for your feedback.


A "pattern resource" for a particular "type" of project or document is a really good idea.


Peace,

 

For me there's always friction between advancing in my current project and investing time in phrases that will be of use later.


I store everything in glossaries. When I have time, I use an editor to simply move all phrases that contain X to a new glossary. I sort them, replace numbers and nouns with placeholders, and then add the new entries to a special glossary.


This is possible with a TM too, but you'll have to open it in TMX mode.


Hi,

Yes, there is always a conflict between specificity and generality. Time is limited, yes. If I had more time, say, 50 hours a day, I would be able to tailor my resources for both the current project and future projects of a similar type.

Yet, I want to say that this conflict is the very essence of professionalism.

Cheers,
M

I happen to think that doing it right from the start will both yield better AA results and save time. Okay, it takes a minute (not much more) to fine-tune your TMs when you set up a project, and it takes a second or so to save any terms in the right TM, but it pays.


H.

Perhaps I'm too engrossed in the technical aspects of CT and in thinking about how to organize my resources to make the most of its features.


Igor always says "Enjoy translating!"


And I just recalled my favorite saying: "Perfectionism kills."


M

Hi all,


It is very interesting to see how useful this new feature is and how you apply it in your workflow. It really helps me tune it. Thanks!


Igor



On second thought, what I really wanted to say in this topic is that whether a segment pattern has "both ends open" or "both ends closed" really matters.


Thank you, Igor


M

Masato: And I just recalled my favorite saying: "Perfectionism kills."


Specialisation kills, but perfectionism?


H.

"Perfectionism" as I use it refers to striving too hard for perfect features and perfect resources that meet both your immediate and future needs. Being too hasty.


Though there may be a better word for it ...


Peace,

M

Perfectionism and haste don't go together very well.


H.

Right.


M.
