Renewed request for "subsegment pattern"

As a very powerful extension of the "segment pattern" feature, I would like to request a "subsegment pattern" matching feature again.


As segment patterns begin with ?, here I want to describe subsegment patterns beginning with ??.


For example, you have the following entries (and their translations) in your resources:

??On September {1}
??{1} released {2}
Apple
??a device called {1}
iPad Pro

You have this source sentence:

On September 10, Apple released a device called iPad Pro.


My request is that CT be able to auto-assemble the following result.

On September {1=10}, {1=Apple} released {2=a device called {1=iPad Pro}}.
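
The requested behaviour can be illustrated with a small Python sketch. It is not how CT works internally, only a toy model under stated assumptions: each {n} slot is taken to be a run of text that starts with a word character and does not cross punctuation, every pattern is assumed to contain some literal text, and the names `compile_pattern` and `annotate` are made up for this example. Plain terms such as Apple and iPad Pro are left out because they do not change the bracket notation.

```python
import re

PLACEHOLDER = re.compile(r"\{(\d+)\}")
# Assumption: a slot starts with a word character and stops at punctuation.
SLOT = r"\w[^,.;:!?]*"

PATTERNS = ["On September {1}", "{1} released {2}", "a device called {1}"]


def compile_pattern(pattern):
    """Turn 'a device called {1}' into a regex with one named group per slot."""
    parts = PLACEHOLDER.split(pattern)  # literal, number, literal, number, ...
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)
        else:
            regex += f"(?P<g{part}>{SLOT})"
    # Do not start or end a match in the middle of a word.
    return re.compile(r"(?<!\w)" + regex + r"(?!\w)")


def annotate(text, patterns):
    """Mark every pattern hit in text, then recurse into each slot value."""
    compiled = [(p, compile_pattern(p)) for p in patterns]
    out, pos = [], 0
    while pos < len(text):
        candidates = []
        for pattern, rx in compiled:
            m = rx.search(text, pos)
            if m:
                candidates.append((m.start(), -m.end(), pattern, m))
        if not candidates:
            break
        # Prefer the leftmost hit; on a tie, the longest one.
        _, _, pattern, m = min(candidates, key=lambda c: c[:2])
        out.append(text[pos:m.start()])
        out.append(PLACEHOLDER.sub(
            lambda ph: "{%s=%s}" % (
                ph.group(1),
                annotate(m.group("g" + ph.group(1)), patterns),
            ),
            pattern,
        ))
        pos = m.end()
    out.append(text[pos:])
    return "".join(out)


sentence = "On September 10, Apple released a device called iPad Pro."
print(annotate(sentence, PATTERNS))
```

The nested step is the recursive call on each slot value: the text captured by {2} is itself checked against all patterns, which is where "a device called {1}" is picked up. A pattern that consists of nothing but a slot would recurse forever in this sketch, so it is assumed not to occur.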





I'm very satisfied with the current version of CT, apart from a few very minor points (that do not fit my personal preferences).

I now feel that this feature is going to be my last major request.

I once read a post by Igor somewhere on this forum saying that his goal is an AI translation engine based on the user's resources. I really respect him.

To the best of my knowledge, a very important step toward this goal is this "subsegment pattern matching" or multi-layered (multi-step reflexive) machine translation. And it seems that the latest MT technology is shifting from word-based to phrase-based.

 

image


As you can see, the nesting isn’t multi-level.

It can be made multi-level too. Because the feature is dynamic, the nested patterns probably need to be matched first so that they can appear in the top-level patterns. To be added in a future update.



Perhaps it's also possible to introduce a means to limit a wildcard (variable) to a few values, like 'der', 'die' and 'das':


image


Of course I could write:

?blinkt {1} rot

or:

?blinkt {1} {2} rot

to catch the variable article (der, die, das)

or even:

?blinkt {1} {2} {3}

But I'm afraid that this much vagueness will be counterproductive in the end.


So perhaps it's possible to come up with something like:


?blinkt {der;die;das} {2} rot

Or perhaps better:


?blinkt {der;die;das} {2} {rot;grün;gelb;blau}
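
To make the proposal concrete, here is a rough Python sketch of how such a pattern could be read. The {a;b;c} syntax is only the suggestion from this post, not an existing CT feature, and the function name `pattern_to_regex`, the one-word reading of a numbered slot, and the test sentences are assumptions made for illustration.

```python
import re

BRACE = re.compile(r"\{([^{}]*)\}")


def pattern_to_regex(pattern):
    """Translate the proposed syntax into a regular expression.

    {2}            -> any single word (assumption: a slot is one word)
    {der;die;das}  -> exactly one of the listed values
    """
    body = pattern.lstrip("?")          # drop the pattern marker
    parts = BRACE.split(body)           # literal, brace content, literal, ...
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)
        elif part.isdigit():
            regex += r"(\S+)"
        else:
            options = "|".join(re.escape(o) for o in part.split(";"))
            regex += f"({options})"
    # Word boundaries keep the pattern from matching inside longer words.
    return re.compile(r"\b" + regex + r"\b")


rx = pattern_to_regex("?blinkt {der;die;das} {2} {rot;grün;gelb;blau}")

hit = rx.search("blinkt die Lampe rot")     # made-up test sentence
miss = rx.search("blinkt eine Lampe rot")   # 'eine' is not in the list
print(hit.groups() if hit else None, miss)
```

Compared with "?blinkt {1} {2} {3}", the restricted slots reject anything that is not one of the listed articles or colours, which is the reduction in vagueness asked for above.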

How about using the wildcard character * to catch German articles:


blinkt d* {1} or blinkt d* {1} {2}



That works indeed as advertised. Thank you!


image



You have to be careful:


If you define:

?d* {1} ist wieder betriebsbereit ?de {1} is weer klaar voor gebruik


the 'der' in 'wieder' will prevent recognition of 'ist wieder betriebsbereit':


image


Is there a way to prevent this?
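
The effect can be reproduced outside CT with a few lines of Python. This is only a guess at the cause, assuming the wildcard is allowed to start matching in the middle of a word; the helper `wildcard_to_regex` and the test sentence are made up, and nothing here describes CT's actual matching code.

```python
import re


def wildcard_to_regex(token, whole_word):
    """Translate a token such as 'd*' into a regex.

    With whole_word=True the match must begin at a word boundary.
    """
    core = re.escape(token).replace(r"\*", r"\w*")
    prefix = r"\b" if whole_word else ""
    return re.compile(prefix + core, re.IGNORECASE)


sentence = "Der Motor ist wieder betriebsbereit"   # made-up example

loose = wildcard_to_regex("d*", whole_word=False)
strict = wildcard_to_regex("d*", whole_word=True)

loose_hits = loose.findall(sentence)    # also picks up 'der' inside 'wieder'
strict_hits = strict.findall(sentence)  # only the real article
print(loose_hits, strict_hits)
```

In this sketch, anchoring the wildcard to the start of a word is enough to stop 'wieder' from being read as an article. Whether CT offers an equivalent setting is not something the sketch can answer.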



IMPORTANT: If you want to define Term Patterns that contain a comma, like:


?Warten Sie, bis {1} wieder still steht

?Stellen Sie {1} standsicher auf einer ebenen, waagerechten Oberfläche


you'll have to remove the comma from the Do Not Match box (Edit > Prefs > Memory tab).


At this moment I cannot predict what the consequences of this removal will be ;).




One consequence is that when your TP ends with a comma, it won't be recognised.


Luckily, this can be solved by adding the comma here:


image


Hey Masato,


I think that with CafeTran Espresso 10.8.1 you've got what you were asking for:


image


> I think that with CafeTran Espresso 10.8.1 you've got what you were asking for:


Yes! Thank you, Igor.


masato
