
REQ: QA: Repeated words (not double words)

It might be a good idea to have a QA check that finds any words repeated within one segment (or, optionally, within any span of x words, even across segments). This would help catch typical filler words (okay, isn't there a blacklist feature for that?), but also things like this:

This means you do not have to insert it itself insert.

You can also see this document also in the other folder.

Although I do not agree, I give my consent, although this is not the best solution.

Papyrus, a German tool for writers with an excellent grammar and style QA (well, only for DE), has this same feature.


Pros:

  • better style
  • good solution for marketing and high-profile texts

Cons:

  • not suitable for some texts, e.g. manuals, that do not care too much about style or where repetition is preferred to a risk of misunderstanding, so it should be optional (= unchecked by default, if part of the QA menu)
  • needs exclude lists (really?) with an activation/deactivation feature


CT's QA is handy for a quick comparison between source and target (punctuation, numbers, empty, same, etc.). But this kind of QA (finding repetitions or even spellcheck) is solely a target QA. For this, as you said, there is specific software (for French and English, there is the excellent Antidote from Druide). I don't think "target only QA" is something that CT should worry about.


Julie: I don't think "target only QA" is something that CT should worry about.

I agree. However, I do think it's CAT tool/MT related, the mistakes I mean, and they occur more often than you can imagine. Maybe something for the upcoming modular version of CT.


It’s a nice feature, Torsten, though difficult to implement. Especially useful when using MT, since this duplication is a characteristic of them. Spellcheck is definitely a feature that has been integrated for very good reasons. Even more: it would also be useful for the source, e.g. for IDML.

It should be possible to create a macro that matches every word of a segment with all other words in the same segment. When a certain word is found several times in a segment, a warning can be displayed. Even one as elegant as a sticky note which indicates all occurrences of the word in bold. One could link this macro to Go to next, to have the check executed every time you proceed to a new segment.
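
A minimal sketch of that per-segment check, written as a shell function around awk rather than as an actual CafeTran or Keyboard Maestro macro. The name mark_repeats is hypothetical, asterisks stand in for the bold highlighting, and only a short list of ASCII punctuation is stripped before words are compared, so take it as an illustration under those assumptions:

```shell
# mark_repeats (hypothetical name): read one segment per line from stdin and
# print it back with every word that occurs more than once in that line
# wrapped in *asterisks*. Runs of whitespace are collapsed to single spaces.
mark_repeats() {
  awk '
    # Normalise a token for comparison: lowercase it and strip a short,
    # assumed list of leading/trailing punctuation characters.
    function key(tok) {
      tok = tolower(tok)
      gsub(/^[.,;:!?()"]+|[.,;:!?()"]+$/, "", tok)
      return tok
    }
    {
      split("", count)                       # reset the counts for this segment
      for (i = 1; i <= NF; i++) count[key($i)]++
      out = ""
      for (i = 1; i <= NF; i++) {
        k = key($i)
        tok = (k != "" && count[k] > 1) ? ("*" $i "*") : $i
        out = out (i > 1 ? " " : "") tok
      }
      print out
    }
  '
}
```

Feeding it the second example from the first post (echo "You can also see this document also in the other folder." | mark_repeats) should mark both occurrences of "also" and leave the rest untouched.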

Checking the whole project at once would require a dump of the target column as a plain text file. A macro would then cycle through all lines and check each one for multiple occurrences of a word. This is basically the same as above, except for the file reading and cycling through the lines.
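
The whole-project variant could look roughly like this. It assumes the dump holds exactly one target segment per line (how such a dump would be produced is left open here), and check_dump is a made-up name. Again only a few ASCII punctuation marks are handled:

```shell
# check_dump FILE (hypothetical name): FILE is assumed to contain one target
# segment per line. For every line with repeated words, print the line
# number followed by those words.
check_dump() {
  awk '
    {
      split("", count)                    # reset the counts for this line
      line = tolower($0)
      gsub(/[.,;:!?()"]/, " ", line)      # treat common punctuation as spaces
      n = split(line, words, " ")
      report = ""
      for (i = 1; i <= n; i++) {
        w = words[i]
        # Add a word to the report the moment it is seen a second time.
        if (++count[w] == 2) report = report " " w
      }
      if (report != "") print NR ":" report
    }
  ' "$1"
}
```

The line numbers in the output map back to segment positions only as long as the dump keeps the order of the target column.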


As a modest contribution, here is a terminal line that will do the following things with the clipboard content (target segment):

  1. Convert to lowercase
  2. Remove punctuation
  3. Replace spaces with new lines
  4. Search for duplicate words (lines)
  5. Display them in the terminal.

pbpaste > target && cat target | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' > target2 && sort target2 | perl -pe 's/\s+/\n/g' > target && sort target | uniq -i -c | grep -v '^ *1 ' | sort -f -nr

 From there, we can do what we want (copy the results in Stickies or whatever).
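
For anyone who prefers not to leave the temporary files target and target2 behind, here is a sketch of the same steps as a function that reads from standard input instead of the clipboard. The name count_repeats is made up for this example, and the output format (count, then word) is my own choice rather than that of the one-liner above:

```shell
# count_repeats (hypothetical name): list every word that occurs more than
# once in the text on stdin, as "count word", most frequent first.
count_repeats() {
  tr '[:upper:]' '[:lower:]' |         # 1. convert to lowercase
    tr -d '[:punct:]' |                # 2. remove punctuation
    tr -s '[:space:]' '\n' |           # 3. put each word on its own line
    sort |
    uniq -c |                          # 4. count identical lines (words)
    awk '$1 > 1 { print $1, $2 }' |    #    keep only words seen more than once
    sort -k1,1nr -k2,2                 # 5. highest counts first, then by word
}
```

On a Mac, pbpaste | count_repeats checks the clipboard as before. On other systems, any command that prints the segment can be piped in.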


Tested on my Mac, no problems so far.




Alain: As a modest contribution

I haven't tried it yet, but it looks okay, and it has the enormous advantage that you don't need Keyboard Maestro for it, not even a Mac. Even Windows and Linux users can benefit from it.


Woorden: you don't need Keyboard Maestro

Yes, that was the idea, this time ;-)
This morning I just completed a complex one for fuzzy search from the terminal, but it's awfully complicated and works only with KBM, the terminal, the stickies and Sublime Text. And it's limited to only one result. And I cannot control the fuzziness (no percentage setting: it works or it doesn't, period).

Conclusion: completely useless, but I had a good time playing with agrep. :-)

Alain: ...only with KBM, the terminal, the stickies and Sublime Text

I bet you a bottle of arak it can work without using KBM. And probably without Stickies and Sublime Text as well.


Julie: I don't think "target only QA" is something that CT should worry about.

But it does (see "Double spaces" and "Double words" QA).

Hans: Especially useful when using MT, since this duplication is a characteristic of them

No, it's not MT-specific. It rather happens when you change the syntax of a sentence (or maybe because of a repetitive source text that should not be repetitive).

Tre: But it does (see "Double spaces" and "Double words" QA).

Sure, basic QA is fine. But I am afraid it will bloat CT into a jack of all trades if more complex target QA is added. I'd rather the developers concentrate on improving the actual translation part of CT. There exists much more complete software to do the target QA (in FR and EN anyway).


Sure, basic QA is fine...

Yes, this is how I wish to keep developing CafeTran. The only way not to bloat it, and not to go crazy, is to add only the basic features of the various useful tools that can complement it. CafeTran cannot be a perfect web browser, a PDF viewer, a QA tool, a word processor, a terminology tool, etc. There exist specialized standalone tools for each of the above functions, which are as complex as the CAT itself. Thanks Julie!


Very understandable, Igor. And it's your software. Nevertheless, I hope that you don't mind that we discuss additional features here. Perhaps a new forum should be opened? Daydreaming away about useful features that aren't going to make it into CafeTran, or so? ;)

On a more serious note: I understand your perspective and even agree. However, to enable users who want to add their own features via macros, JavaScript, Swift etc., it would be handy to have some plug-in interface or at least some ways to get info from CafeTran, like the source and target language of the current project (just as one example). I think that the number of required 'entry points' is rather limited. So, perhaps you'd like to consider offering this? This could also reduce the number of requests for features.

If users want to have a 'special' feature, you could say: here's the plug-in interface. Go find a programmer who can fulfil your wishes.

How about offering an API and/or support for running CafeTran from the Terminal/Windows command line?

cafetran-training: How about offering an API and/or support for running CafeTran from the Terminal/Windows command line?

Agree, but still waiting for a yes or no answer here. :-)



Okay, Igor, I understand that.

However, some features (such as Language Tool, see here) might be a wonderful thing to implement.
