Sentence patterns coming into CafeTran

Hi,


The next update of CafeTran will see a new enhancement to auto-assembling called "Sentence patterns". It will allow translators to create translation memory segments with variables in the following form:


All the leaves are {1} and the sky is {2}. = Todas las hojas son color {1} y el cielo es {2}.


Then CafeTran will be able to use terms in glossaries or fragments in translation memories to replace the variables with the found entry, creating a complete translation.
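To make the idea concrete, here is a minimal sketch of how such a pattern could be filled. It is not CafeTran's actual implementation: the function names `pattern_to_regex` and `fill_pattern` are hypothetical, and the glossary is assumed to be a plain dictionary of source terms to target terms.

```python
import re

def pattern_to_regex(source_pattern):
    """Turn 'All the leaves are {1} and the sky is {2}.' into a regex
    with one named group (v1, v2, ...) per variable."""
    # Splitting on a capturing group alternates literal text and variable numbers.
    parts = re.split(r"\{(\d+)\}", source_pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)          # literal text of the pattern
        else:
            regex += f"(?P<v{part}>.+?)"      # variable slot
    return re.compile(regex)

def fill_pattern(source_pattern, target_pattern, segment, glossary):
    """Return the assembled translation, or None if the segment does not
    match the pattern or a variable is missing from the glossary.
    The glossary is a hypothetical dict, assumed for illustration only."""
    match = pattern_to_regex(source_pattern).fullmatch(segment)
    if match is None:
        return None
    result = target_pattern
    for name, source_text in match.groupdict().items():
        translation = glossary.get(source_text)
        if translation is None:
            return None
        # name is 'v1', 'v2', ...; strip the 'v' to rebuild '{1}', '{2}', ...
        result = result.replace("{" + name[1:] + "}", translation)
    return result

glossary = {"brown": "café", "gray": "gris"}
print(fill_pattern(
    "All the leaves are {1} and the sky is {2}.",
    "Todas las hojas son color {1} y el cielo es {2}.",
    "All the leaves are brown and the sky is gray.",
    glossary,
))
# prints: Todas las hojas son color café y el cielo es gris.
```

Because each variable is replaced by number rather than by position, the target side of the pattern is free to place `{2}` before `{1}`.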


The feature will also let the user set the default exact match for the pattern such as:


All the leaves are {1=brown} and the sky is {2=gray}. = Todas las hojas son color {1=café} y el cielo es {2=gris}.
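As a rough illustration of the `{n=default}` notation, the sketch below reads the defaults out of a pattern and expands it into the sentence that would count as the exact match. The helper names `defaults` and `exact_form` are made up for this example and say nothing about how CafeTran stores or parses patterns internally.

```python
import re

# Matches a variable with a default, e.g. {1=brown}
VAR = re.compile(r"\{(\d+)=([^{}]*)\}")

def defaults(pattern):
    """Map each variable number to its default text."""
    return {int(number): text for number, text in VAR.findall(pattern)}

def exact_form(pattern):
    """Replace every {n=default} with its default text."""
    return VAR.sub(lambda m: m.group(2), pattern)

source = "All the leaves are {1=brown} and the sky is {2=gray}."
target = "Todas las hojas son color {1=café} y el cielo es {2=gris}."

print(defaults(source))    # {1: 'brown', 2: 'gray'}
print(exact_form(source))  # All the leaves are brown and the sky is gray.
print(exact_form(target))  # Todas las hojas son color café y el cielo es gris.
```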


The order of variables is not fixed so the function may be really useful in auto-assembling when the translation of a sentence has the variable lexical elements in a completely different order.


I call this new improvement "Sentence patterns" as suggested by a user but it would be interesting to know an alternative (or perhaps a standard) term for it.


Igor



Instead of the "100%" pattern match indication on the AA panel that looks somewhat colorless, and is sometimes confusing at least for me, how about simply displaying "Pattern Match"?

 

How about 'placeholder framing'?

One more potential problem for "term patterns": What happens if two (let alone more) of those patterns occur in the same segment? Not unlikely.


On September 17, 2015, Igor introduced Cloze Segment Matching, and on September 20, 2015, he added Cloze Term Matching


On September 17, 2015, Igor introduced Cloze Matching, and colourless green ideas slept furiously ever after


September {1}

{1} ideas


H.

Igor: What you describe is more like a "term patterns" feature than "segment patterns". This is an interesting extension to the current functionality. I am going to look into it.


I think that's even more dangerous than "segment patterns," comparable to regex matches.


Take for instance:


On September 17, 2015, Igor introduced Cloze Matching

In September 2015, Igor introduced Cloze Matching

September 2015: Igor introduced Cloze Matching

September 17, 2015: Igor introduced Cloze Matching


Op 17 september 2015 introduceerde Igor Cloze Matching

In september 2015 introduceerde Igor Cloze Matching

September 2015: Igor introduceert Cloze Matching

17 september 2015: Igor introduceert Cloze Matching



If you leave out the preposition, you're dead meat. "September {1}" won't do. Capitalisation won't do either. Not in all "cases" anyway.


The very last Remark/Opmerking in the Wiki also points to a problem. Well spotted! A 100% match can be wrong. The good thing is that CT doesn't "jump over" those matches if "Jump Over... | Exact Memory Matches" is enabled.


That said, Cloze Matching can be a huge time saver, if applied correctly. It's probably subject, or even document/job specific.


His Nastee Olde Fartness,


H



One of my fantasies...


There is a "Refine AA" button in the segment toolbar.


Every time you move to the next segment, the current segment pattern matching is activated, and at the same time (or in the background) CT does the same thing for partial elements (such as On September {1}).


Then, you click the "Refine AA" button.


CT now auto-assembles only exact glossary entries and TM fragments gained from the default segment pattern matching into one complete whole.


Hi Igor,

Actually, I don't know much about programming, so this suggestion from me might be a "mission impossible," which I've never seen fulfilled in any of the translation-related suites I've tried so far.

Thank you for attending to me!

Cheers,
Masato

Hi Masato,


What you describe is more like a "term patterns" feature than "segment patterns". This is an interesting extension to the current functionality. I am going to look into it.


Igor 

Will "multi-step (or reflexive) auto-fill and auto-assembling" be possible?

For example, when you have the following glossary entries, TUs or whatever, separately in your resources:

On September {1}
{1} released {2}
Apple
a device called {1}
iPad Pro

You have this source sentence:

On September 10, Apple released a device called iPad Pro.

Can this new feature generate the following result automatically?

On September {1=10}, {1=Apple} released {2=a device called {1=iPad Pro}}.


Peace,
Masato

Candidate names, considering the fact that English/Japanese translation packages call a similar feature "sentence pattern matching" or "hole filling feature":

1. Auto-fill
2. Pattern matching
3. Trans-modeling (meaning translation based on sentence models)

4. Missing-link finder

5. Hole filler


Peace,

Masato

My suggestion for a name is "Puzzle match".

Good luck.


> It means that results from Google and other online engines will not be taken in, unlike the current feature of improving AA with them?


No, they are not taken in yet. It will be the first release with this auto-fill concept. I may add the external Machine Translation filling later.


Igor

Hi Igor,

It means that results from Google and other online engines will not be taken in, unlike the current feature of improving AA with them?

Maybe I can find out more about this new feature when it is available.

Peace,
Masato

 

Hi Masato,


You can create a pattern as long as you wish, with as many variables as you like, but variable numbers cannot be repeated in a pattern, as you described it. Also, CT treats the whole segment as a match to defined patterns.
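The rule about variable numbers can be checked mechanically. The following is only a sketch of such a check, using a hypothetical helper `repeated_variables`; it is not a CafeTran function.

```python
import re
from collections import Counter

def repeated_variables(pattern):
    """Return the variable numbers that occur more than once in a pattern.
    Both {n} and {n=default} forms are counted."""
    numbers = re.findall(r"\{(\d+)[=}]", pattern)
    counts = Counter(int(n) for n in numbers)
    return sorted(n for n, count in counts.items() if count > 1)

print(repeated_variables("{1} released {2}"))
# prints: []
print(repeated_variables(
    "On September {1=10}, {1=Apple} released {2=a device called {1=iPad Pro}}."
))
# prints: [1]
```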


CT uses the results of its own local MT engine processing (terms, fragments and subsegments) to auto-fill.


Igor

Hi, Igor


I'm really surprised that you made it so quickly!


Now, two questions.


(1) Can CT automatically combine "two or more" sentence patterns?

For example:

If you don't like it (= if {1} don't like {2})

+

fxxk it off (=fxxk {1} off)


(2) Can CT use machine translation to fill a missing word (not found in any local resource) in a sentence pattern?



Thanking you always,

Masato

 

MBr: the new feature is much easier to use than regular expressions


Regexes are for the pros, but I suppose the new feature will present the same problem: Unexchangeability. I'm still cleaning old DV mdbs/tdbs for {n}. So I think you should only use them in a termbase or glossary (if you don't want fuzziness and other goodies) exclusively for Sentence Patterns.


With the new feature, AA might (!) produce many more usable results for me


If AA (either inserted in the target language pane or not) does not produce usable results,

  • Your source language isn't suitable, e.g. a highly agglutinative language (not your case, I'd say)
  • The words/phrases simply are not in your resources
  • You use the wrong settings. (An overview of CT resources)
  • The subject matter isn't suitable (highly creative texts - for which I still use CT and AA), not your case either. Legal texts are perfect for AA


In all other cases, you should benefit from AA. To the point you don't have to look up a single word/phrase.


H.


