
Renewed request for "subsegment pattern"

As a very powerful extension of the "segment pattern" feature, I would like to request a "subsegment pattern" matching feature again.

Since segment patterns begin with ?, I propose that subsegment patterns begin with ??.

For example, suppose you have the following entries (and their translations) in your resources:

??On September {1}
??{1} released {2}
??a device called {1}
iPad Pro

You have this source sentence:

On September 10, Apple released a device called iPad Pro.

My request is that CT be able to auto-assemble the following result.

On September {1=10}, {1=Apple} released {2=a device called {1=iPad Pro}}.
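To make the request concrete, here is a minimal Python sketch of how the source side of such nested matching could work. It is not CafeTran's implementation. It assumes that each {n} placeholder can be turned into a regular-expression group, that subsegments are separated by ", " and a final full stop, and that the first pattern matching a whole span wins. Each placeholder value is matched again recursively, so the nesting can go more than one level deep; `analyse`, `analyse_sentence` and the `max_depth` limit are hypothetical names.

```python
import re

# Source side of the example entries, with the "??" prefix removed.
PATTERNS = ["On September {1}", "{1} released {2}", "a device called {1}"]

PLACEHOLDER = re.compile(r"\{(\d+)\}")


def compile_pattern(pattern: str) -> re.Pattern:
    """Turn 'a device called {1}' into a regex with one named group per placeholder."""
    parts, pos = [], 0
    for m in PLACEHOLDER.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))
        parts.append(f"(?P<v{m.group(1)}>.+?)")  # lazy: as short as possible
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


COMPILED = [compile_pattern(p) for p in PATTERNS]


def analyse(text: str, depth: int = 0, max_depth: int = 5) -> str:
    """Annotate text with the first pattern that matches it completely,
    then analyse each placeholder value the same way (nested matching)."""
    if depth < max_depth:
        for rx in COMPILED:
            m = rx.fullmatch(text)
            if m is None:
                continue
            pieces, pos = [], 0
            # Rebuild the span, replacing each value by {n=<analysed value>}.
            for name in sorted(m.groupdict(), key=m.start):
                pieces.append(text[pos:m.start(name)])
                inner = analyse(m.group(name), depth + 1, max_depth)
                pieces.append(f"{{{name[1:]}={inner}}}")
                pos = m.end(name)
            pieces.append(text[pos:])
            return "".join(pieces)
    return text  # no pattern covers this span: keep it literally


def analyse_sentence(sentence: str) -> str:
    """Assumption: subsegments are delimited by ', ' and a final full stop."""
    body = sentence.rstrip(".")
    return ", ".join(analyse(part) for part in body.split(", ")) + "."


if __name__ == "__main__":
    print(analyse_sentence("On September 10, Apple released a device called iPad Pro."))
    # On September {1=10}, {1=Apple} released {2=a device called {1=iPad Pro}}.
```

Run on the example sentence, the sketch reproduces the annotated result above. Looking up the translations of the matched entries and inserting them into the target would be a separate step.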


I'm very satisfied with the current version of CT, apart from a few very minor points that don't fit my personal preferences.

I now feel that this feature will be my last major request.

I once read a post by Igor somewhere on this forum saying that his goal is an AI translation engine based on the user's own resources. I really respect him for that.

To the best of my knowledge, a very important step toward this goal is "subsegment pattern matching", i.e. multi-layered (multi-step, recursive) machine translation. It also seems that the latest MT technology is shifting from word-based to phrase-based approaches.



As you can see, the nesting isn’t multi-level.

It can be made multi-level too. Because the feature is dynamic, the nested patterns probably need to be matched first so that they can appear in the top-level patterns. This will be added in a future update.


Perhaps it's also possible to introduce a way to limit a wildcard (variable) to a few values, such as 'der', 'die' and 'das':


Of course I could write:

?blinkt {1} rot

or

?blinkt {1} {2} rot

to catch the variable article (der, die, das)

or even:

?blinkt {1} {2} {3}

But I'm afraid that this much vagueness will be counterproductive in the end.

So perhaps it's possible to come up with something like:

?blinkt {der;die;das} {2} rot

Or perhaps better:

?blinkt {der;die;das} {2} {rot;grün;gelb;blau}
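A restricted variable such as {der;die;das} maps naturally onto a regular-expression alternation. The sketch below assumes the syntax proposed here ({n} for a free variable, {a;b;...} for a closed list of values) and uses Python's `re` module; `compile_pattern` is a hypothetical name, and this illustrates the idea rather than describing an existing CafeTran feature.

```python
import re

# Hypothetical syntax from the proposal: {n} is a free variable and
# {a;b;c} is a variable restricted to the listed alternatives.
TOKEN = re.compile(r"\{(\d+)\}|\{([^{};]+(?:;[^{};]+)+)\}")


def compile_pattern(pattern: str) -> re.Pattern:
    """Compile e.g. '?blinkt {der;die;das} {2} {rot;grün;gelb;blau}' to a regex."""
    pattern = pattern.lstrip("?")
    parts, pos = [], 0
    for m in TOKEN.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))
        if m.group(1):
            # Free variable: one or more words, matched as little as possible.
            parts.append(r"(.+?)")
        else:
            # Restricted variable: exactly one of the listed values.
            values = m.group(2).split(";")
            parts.append("(" + "|".join(re.escape(v) for v in values) + ")")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    # Word boundaries keep 'rot' from matching the start of 'rote'.
    return re.compile(r"\b" + "".join(parts) + r"\b")


if __name__ == "__main__":
    rx = compile_pattern("?blinkt {der;die;das} {2} {rot;grün;gelb;blau}")
    print(rx.search("Danach blinkt die LED grün.").groups())  # ('die', 'LED', 'grün')
    print(rx.search("Danach blinkt eine LED grün."))          # None
```

The closed lists keep the pattern far less vague than ?blinkt {1} {2} {3}: 'eine' is rejected because it is not one of the listed articles, while the free variable {2} still accepts any noun phrase.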

How about using the wildcard character * to catch German articles:

blinkt d* {1} or blinkt d* {1} {2}


That works indeed as advertised. Thank you!



You have to be careful:

If you define:

Source: ?d* {1} ist wieder betriebsbereit
Target: ?de {1} is weer klaar voor gebruik

the 'der' in 'wieder' will prevent 'ist wieder betriebsbereit' from being recognised.


Is there a way to prevent this?
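How CafeTran resolves d* internally isn't described in this thread. As a minimal sketch, assuming a wildcard is compiled into a regular expression, one way to keep d* from latching onto the 'der' inside 'wieder' is to require the wildcard to start at a word boundary (`\b` in Python's `re`). The names `wildcard_regex` and `compile_pattern`, and the case-insensitive matching, are assumptions made for illustration.

```python
import re


def wildcard_regex(token: str, anchored: bool = True) -> str:
    """Hypothetical translation of a wildcard token such as 'd*' into a regex.
    anchored=True requires the match to begin at a word boundary."""
    prefix = re.escape(token.rstrip("*"))
    return (r"\b" if anchored else "") + prefix + r"\w*\b"


def compile_pattern(pattern: str) -> re.Pattern:
    """Compile '?d* {1} ist wieder betriebsbereit'-style patterns (assumed syntax):
    tokens ending in '*' become anchored wildcards, '{n}' becomes a lazy variable."""
    parts = []
    for token in pattern.lstrip("?").split():
        if re.fullmatch(r"\{\d+\}", token):
            parts.append(r"(.+?)")
        elif token.endswith("*"):
            parts.append(wildcard_regex(token))
        else:
            parts.append(re.escape(token))
    return re.compile(r"\s+".join(parts) + r"\b", re.IGNORECASE)


if __name__ == "__main__":
    text = "ist wieder betriebsbereit"
    print(re.findall(wildcard_regex("d*", anchored=False), text))  # ['der']
    print(re.findall(wildcard_regex("d*"), text))                  # []
    rx = compile_pattern("?d* {1} ist wieder betriebsbereit")
    print(rx.search("Der Motor ist wieder betriebsbereit.").group(1))  # Motor
```

Unanchored, the wildcard finds 'der' inside 'ist wieder betriebsbereit'; anchored, it finds nothing there, so only a whole word beginning with d (such as 'Der' or 'Die') can fill it.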

IMPORTANT: If you want to define Term Patterns that contain a comma, like:

?Warten Sie, bis {1} wieder still steht

?Stellen Sie {1} standsicher auf einer ebenen, waagerechten Oberfläche

you'll have to remove the comma from the Do Not Match box (Edit > Prefs > Memory tab).

At this moment I cannot predict what the consequences of this removal will be ;).

One consequence is that when your TP ends with a comma, it won't be recognised.

Luckily, this can be solved by adding the comma here:


Hey Masato,

I think that with CafeTran Espresso 10.8.1 you've got what you were asking for:


> I think that with CafeTran Espresso 10.8.1 you've got what you were asking for:

Yes! Thank you, Igor.

