REQ: Filter TUs that differ only in punctuation and spaces

For cleaning gigantic TMX files created automatically by tools such as Terminotix, it would be cool to be able to clean up / compact these giants by filtering on differences in punctuation marks, spaces and numbers. TUs that have the same letters but different numbers, spaces and punctuation should be listed. The first occurrence should be kept.
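
A minimal sketch of such a filter in Python might look like the following. It is only an illustration of the idea, not an existing feature: the function names are made up, it assumes a standard TMX layout (`<body>` containing `<tu>` elements with one `<seg>` per `<tuv>`), it compares letters case-sensitively, and it loads the whole file into memory, so a truly gigantic TMX would need a streaming parser instead.

```python
import xml.etree.ElementTree as ET

def letters_only(text):
    # Keep alphabetic characters only; digits, spaces and punctuation are dropped.
    return "".join(ch for ch in text if ch.isalpha())

def tu_key(tu):
    # One letters-only string per <seg>, in document order.
    # itertext() also collects text nested inside inline tags.
    return tuple(letters_only("".join(seg.itertext())) for seg in tu.iter("seg"))

def filter_near_duplicates(tmx_text):
    """Return (root, dropped): the tree with first occurrences kept,
    and the list of TUs that differed only in non-letter characters."""
    root = ET.fromstring(tmx_text)
    body = root.find("body")
    seen = set()
    dropped = []
    for tu in list(body.findall("tu")):  # copy, since we remove while looping
        key = tu_key(tu)
        if key in seen:
            dropped.append(tu)
            body.remove(tu)
        else:
            seen.add(key)
    return root, dropped
```

The `dropped` list is what would be shown to the user for review; the remaining tree could be written back out with `ET.ElementTree(root).write(...)`.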
1 Comment

Open a copy of the TMX file in TextWrangler (or in a UNIX app if the file is larger than 380 MB), and use grep to delete end punctuation. Then use CT's TM editor to get rid of the duplicates.

Alternatively (and easier), convert the TM to plain text. Open it in TextWrangler (or in a UNIX app if the file is still larger than 380 MB) and use grep to delete end punctuation. Then use TextWrangler's built-in duplicate remover and, in CT, import the result into an empty TM.
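
For anyone who prefers a script to an editor, the plain-text route could be sketched roughly like this in Python. It assumes the export is tab-delimited with one TU per line, and the set of characters treated as end punctuation is my own guess; adjust both to your data. Note that, like the grep approach, it removes the end punctuation from the segments that are kept.

```python
import re

# Assumed set of end punctuation marks, optionally followed by whitespace.
END_PUNCT = re.compile(r"[.,;:!?…]+\s*$")

def strip_end_punctuation(field):
    # Remove trailing punctuation, then any space left in front of it
    # (e.g. the space before "!" in French).
    return END_PUNCT.sub("", field).rstrip()

def dedupe_lines(lines):
    """Strip end punctuation from every tab-separated field and
    keep only the first occurrence of each resulting line."""
    seen = set()
    kept = []
    for line in lines:
        fields = [strip_end_punctuation(f) for f in line.rstrip("\n").split("\t")]
        cleaned = "\t".join(fields)
        if cleaned not in seen:
            seen.add(cleaned)
            kept.append(cleaned)
    return kept
```

Reading the file line by line (`with open(path, encoding="utf-8") as f: dedupe_lines(f)`) avoids loading the raw text twice, though the set of seen lines still has to fit in memory.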

Not sure if this is cool enough, though... Let's ask Michael whether it is. He's the cool expert.
