Hi Igor,
Two more questions about this feature. I want to give my website visitors accurate information so that they understand what they are really doing when they use this feature.
1. Subsegment minimal length difference (%)
For example,
ABCDE (5-character word)
WXYZ (4-character word)
In this case, the "subsegment length difference" is 20% (?). If so, lowering the minimal length difference to zero, for example, would mean finding words of exactly the same length, too. Am I correct?
2. Statistical approach
Does CT make subsegment guesses like the following?
S: CafeTran is a CAT tool.
T: ABCDDDKATtoorrr.
S: Trados is one of the popular CAT tools.
T: FJEJDDDKATtrrr.
So, when there are enough TUs containing "CAT" in the source and "KAT" in the target to compare against one another, CT strikes out the differences and picks out a common (frequently recurring) string of characters as a probable subsegment.
Am I correct?
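The guessing procedure described above can be illustrated with a toy Python sketch. It is not CafeTran's actual algorithm: it assumes the simplest reading, collecting the target sides of all TUs whose source contains a term and returning the longest run of characters they all share (rather than the most frequent one). The function names and the third TU are hypothetical.

```python
from typing import List, Tuple


def longest_common_substring(strings: List[str]) -> str:
    """Longest run of characters that occurs in every string (brute force)."""
    if not strings:
        return ""
    shortest = min(strings, key=len)
    # Try the longest candidates first, so the first hit is the answer.
    for size in range(len(shortest), 0, -1):
        for start in range(len(shortest) - size + 1):
            candidate = shortest[start:start + size]
            if all(candidate in s for s in strings):
                return candidate
    return ""


def guess_subsegment(term: str, units: List[Tuple[str, str]]) -> str:
    """Toy guess: the string shared by all targets whose source contains `term`."""
    targets = [target for source, target in units if term in source]
    return longest_common_substring(targets)


units = [
    ("CafeTran is a CAT tool.", "ABCDDDKATtoorrr."),
    ("Trados is one of the popular CAT tools.", "FJEJDDDKATtrrr."),
    ("Pick a CAT tool.", "QQKATzz."),  # hypothetical third TU
]

print(guess_subsegment("CAT", units[:2]))  # DDDKATt
print(guess_subsegment("CAT", units))      # KAT
```

With only the two TUs from the question, the shared string is still padded with neighboring characters ("DDDKATt"); one more TU whose target differs around "KAT" strips them away. This is why a sufficient number of TUs matters.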
Cheers
Masato
Hi Masato,
1. An example with the minimal length difference set to 80%:
ABCDEFGHIJ (10-character source word)
CafeTran accepts the following target lengths:
ABCDEFGH = 80% (minimal length difference)
ABCDEFGHI = 90%
ABCDEFGHIJ = 100%
ABCDEFGHIJK = 90%
ABCDEFGHIJKL = 80% (minimal length difference)
2. Yes, your description is correct.
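The five lines in answer 1 fit a simple rule: the percentage drops by the length difference expressed as a share of the source word's length (10 points per character for a 10-character word), and a target is accepted while its score stays at or above the setting. The Python sketch below assumes exactly that rule; it is inferred from the example, not taken from CafeTran's code, and the function names are hypothetical.

```python
from typing import List


def length_score(source_len: int, target_len: int) -> float:
    """Hypothetical length score inferred from the 80% example.

    100% for equal lengths, minus the length difference expressed
    as a percentage of the source word's length.
    """
    difference = abs(target_len - source_len)
    return 100 - difference * 100 / source_len


def accepted_target_lengths(source_len: int, minimal_pct: float) -> List[int]:
    """Target lengths whose score is at or above the minimal setting."""
    # The score reaches 0% at twice the source length, so for a
    # non-negative setting there is no need to look any further.
    return [
        n for n in range(1, 2 * source_len + 1)
        if length_score(source_len, n) >= minimal_pct
    ]


if __name__ == "__main__":
    source = "ABCDEFGHIJ"  # 10-character source word from the example
    for n in range(len(source) - 3, len(source) + 4):
        score = length_score(len(source), n)
        verdict = "accepted" if score >= 80 else "rejected"
        print(f"{n:2d} characters: {score:.0f}% {verdict}")
```

Under this assumed rule, a setting of 100% would accept only targets of exactly the source length, and lower settings widen the window; the ABCDE/WXYZ pair from the question would score 80%.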
Igor
Cheers
Masato
kamonchanok.k15
I'm playing with the CT settings again. After reading the wiki, I still do not understand the following.
How do they relate to each other in terms of correct guessing and AA?
If I set the "Subsegment to Auto" threshold to 0, does this mean that CT will never use subsegments in AA?
How does increasing the "Subsegment to Virtual" threshold improve the chances of CT guessing correctly?
More importantly, how can I delete CT's wrong subsegment translation guesses from its memory? There are a lot of them now.
I'm currently using the default setting.
Thank you in advance!
Kwang