
How to deal with Studio projects with lots of reps and HTML codes

How would you handle Studio projects with lots of identical segments and many segments that contain unprotected HTML codes?

Perhaps I should use this menu item?


From the Ref Files: Filter > Repeated and propagated first segments = Include first segments only which are exactly the same as at least one other segment.
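As I read that definition, the filter keeps only the first occurrence of each segment that has at least one identical twin elsewhere in the project. Below is a rough Python sketch of that logic. It assumes "exactly the same" means a plain string comparison; CafeTran's own matching rules (for tags or whitespace, say) may differ. The sample segments are made up.

```python
from collections import Counter


def repeated_first_segments(segments: list[str]) -> list[int]:
    """Return the indices of the first occurrence of every segment that
    occurs more than once (plain string comparison)."""
    counts = Counter(segments)
    seen: set[str] = set()
    first_indices: list[int] = []
    for i, seg in enumerate(segments):
        # Keep a segment only if it is repeated and not reported yet.
        if counts[seg] > 1 and seg not in seen:
            first_indices.append(i)
        seen.add(seg)
    return first_indices


segments = ["Open file", "Close", "Open file", "Save", "Close", "Open file"]
print(repeated_first_segments(segments))  # [0, 1]
```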

Looks like I cannot get a tooltip for this menu item on my Mac ...

First filter on this, translate all repeated segments, commit them to the TM, deactivate the filter and run insert all EMs (exact matches)? Then translate all other (non-repeated) segments?
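The second half of that plan is: translate the first occurrences, then let exact matches fill in the rest. That boils down to a lookup from source text to translation. Here is a minimal Python sketch of the idea. It assumes a plain string comparison and uses `None` for untranslated segments. The function name and the Dutch sample translations are made up, and this is not how CafeTran implements it.

```python
from typing import Optional


def fill_exact_matches(
    sources: list[str], targets: list[Optional[str]]
) -> list[Optional[str]]:
    """Copy an existing translation into every untranslated segment whose
    source text is identical to an already translated segment."""
    memory: dict[str, str] = {}
    for src, tgt in zip(sources, targets):
        if tgt is not None:
            # The first translation seen for a source text wins.
            memory.setdefault(src, tgt)
    return [
        tgt if tgt is not None else memory.get(src)
        for src, tgt in zip(sources, targets)
    ]


sources = ["Open file", "Close", "Open file", "Save", "Close"]
targets = ["Bestand openen", "Sluiten", None, None, None]
print(fill_exact_matches(sources, targets))
# ['Bestand openen', 'Sluiten', 'Bestand openen', None, 'Sluiten']
```

Whatever is still `None` afterwards ("Save" here) is a non-repeated segment that has to be translated the normal way.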

Since I also want to convert the in-segment HTML codes to CafeTran tags, I'll go for a workflow via an external review document: I'll open it in MS Word, remove all duplicates and make all HTML codes invisible. Let's see how this works.

Duh, the MS Word bilingual document is 558 pages long. Selecting a single column is slow even on my heavy iMac, if it works at all ...

Bilingual document is way too large for MS Word, LibreOffice, TextEdit etc. What to do ...

I exported the Studio project from CafeTran Espresso 10 Croissant to HTML. Good old KompoZer made quick work of removing the first, third and fourth columns of the HTML file. Then I exported it to plain text.
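The same column stripping can also be scripted with Python's standard `html.parser`. This sketch makes some assumptions I haven't checked against CafeTran's actual export markup:

- the export is a plain `<table>` with one `<tr>` per segment;
- the in-segment HTML codes appear as escaped text (`&lt;b&gt;`), so the parser hands them back as literal `<b>` strings;
- a `<br>` inside a cell marks a line break within the segment.

The class name and the sample table are hypothetical.

```python
from html.parser import HTMLParser


class ColumnExtractor(HTMLParser):
    """Collect the text of one table column (0-based index) from every row."""

    def __init__(self, column: int) -> None:
        super().__init__()  # convert_charrefs=True: &lt;b&gt; arrives as "<b>"
        self.column = column
        self.cells: list[str] = []
        self._cell_index = -1
        self._in_cell = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cell_index = -1  # new row: start counting cells again
        elif tag in ("td", "th"):
            self._cell_index += 1
            self._in_cell = self._cell_index == self.column
            self._buffer = []
        elif tag == "br" and self._in_cell:
            self._buffer.append("\n")  # keep line breaks inside a segment

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self.cells.append("".join(self._buffer))
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._buffer.append(data)


html_doc = """<table>
<tr><td>1</td><td>Open &lt;b&gt;file&lt;/b&gt;</td><td></td><td>x</td></tr>
<tr><td>2</td><td>Line one<br>Line two</td><td></td><td>y</td></tr>
</table>"""

parser = ColumnExtractor(column=1)  # second column, the one kept in KompoZer
parser.feed(html_doc)
parser.close()
print(parser.cells)  # ['Open <b>file</b>', 'Line one\nLine two']
```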

I'll open this plain text file in BBEdit to remove duplicates and then in MS Word to hide all HTML tags.
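In Python, removing duplicate lines while keeping the first occurrence and the original order is a one-liner, because `dict` preserves insertion order (Python 3.7+). In the sketch below, the file names in the usage comment are hypothetical. Like any line-based tool, it compares whole lines, so a segment that spans several lines is compared line by line rather than as one unit.

```python
from pathlib import Path


def unique_lines(lines: list[str]) -> list[str]:
    """Drop repeated lines, keeping each first occurrence in its original order."""
    return list(dict.fromkeys(lines))


def dedupe_file(src: Path, dst: Path) -> tuple[int, int]:
    """Write the unique lines of src to dst; return (lines before, lines after)."""
    lines = src.read_text(encoding="utf-8").splitlines()
    unique = unique_lines(lines)
    dst.write_text("\n".join(unique) + "\n", encoding="utf-8")
    return len(lines), len(unique)


print(unique_lines(["Close", "Open file", "Close", "Save", "Open file"]))
# ['Close', 'Open file', 'Save']
# Hypothetical usage: dedupe_file(Path("export.txt"), Path("export-unique.txt"))
```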

I had to return to the original HTML (the one with only the relevant column to translate), since many segments span several lines. I need to replace all returns with a <NewLine> tag in MS Word first. After that, I can start marking all error and warning numbers (F122, W487, ...) with the hidden font attribute.
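Swapping line breaks for a placeholder, so that each segment occupies exactly one line, is easy to script and easy to reverse. Here is a minimal Python sketch. `<NewLine>` is the placeholder named above; the function names are made up. Note that restoring always produces `\n`, so any original `\r\n` line endings come back normalised.

```python
import re

NEWLINE_TAG = "<NewLine>"

# Matches Windows, old Mac and Unix line endings (longest alternative first).
LINE_BREAK_RE = re.compile(r"\r\n|\r|\n")


def flatten_segment(segment: str) -> str:
    """Replace every line break inside a segment with the placeholder tag."""
    return LINE_BREAK_RE.sub(NEWLINE_TAG, segment)


def restore_segment(segment: str) -> str:
    """Turn the placeholder tags back into line breaks."""
    return segment.replace(NEWLINE_TAG, "\n")


flat = flatten_segment("Line one\r\nLine two\nLine three")
print(flat)  # Line one<NewLine>Line two<NewLine>Line three
print(restore_segment(flat))
```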

HTML from SDLXLIFF now looks like this:


Now I have the document Export - unique lines.docx, with 7K instead of 25K lines.

I mask all HTML tags:
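For anyone who would rather script this step, the sketch below splits a segment into runs of plain text and HTML codes. It flags the codes as the runs that would get the hidden font attribute. It assumes an HTML code is anything from `<` to the next `>`, which is a simplification:

- it would also catch the `<NewLine>` placeholder;
- a literal `<` in running text could confuse it.

```python
import re

# Simplifying assumption: an HTML code is anything from "<" to the next ">".
TAG_RE = re.compile(r"<[^<>]+>")


def split_runs(segment: str) -> list[tuple[str, bool]]:
    """Split a segment into (text, hidden) runs; HTML codes get hidden=True."""
    runs: list[tuple[str, bool]] = []
    pos = 0
    for m in TAG_RE.finditer(segment):
        if m.start() > pos:
            runs.append((segment[pos:m.start()], False))  # visible text
        runs.append((m.group(), True))  # HTML code -> hidden
        pos = m.end()
    if pos < len(segment):
        runs.append((segment[pos:], False))  # trailing visible text
    return runs


print(split_runs("Open <b>file</b> now"))
# [('Open ', False), ('<b>', True), ('file', False), ('</b>', True), (' now', False)]
```

If the third-party python-docx package is installed, each run could be written to a Word paragraph with `paragraph.add_run(text)` and hidden via `run.font.hidden = True`. It is untested whether CafeTran then treats the result exactly like a document edited by hand in Word.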


These are only the HTML tags, without the error and warning numbers (which still have to be hidden):


Hiding the error and warning numbers:
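Finding the error and warning numbers is a job for a regular expression. This sketch assumes every such number is an uppercase F or W immediately followed by digits, standing as a word of its own (F122, W487). The pattern will also catch things like a function key called F1, so the hits are worth a glance before hiding them. The returned positions are where the hidden font attribute would go.

```python
import re

# Assumption: an error/warning number is "F" or "W" followed by digits,
# standing as a word of its own.
CODE_RE = re.compile(r"\b[FW]\d+\b")


def code_spans(segment: str) -> list[tuple[int, int, str]]:
    """Return (start, end, code) for every error or warning number in a segment."""
    return [(m.start(), m.end(), m.group()) for m in CODE_RE.finditer(segment)]


print(code_spans("F122: File not found (see also W487)"))
# [(0, 4, 'F122'), (31, 35, 'W487')]
```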


As it turns out, words between CafeTran tags, representing hidden HTML codes, aren't recognised:


So I'll have to add spaces around all HTML codes and import the document again.
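Adding those spaces takes one substitution. This sketch assumes an HTML code is anything from `<` to the next `>`. It treats a run of adjacent codes as a single unit, so no double spaces appear between them. Note that this really changes the source text: for example, a space appears before a full stop that follows a closing tag.

```python
import re

# One or more adjacent HTML codes, plus any spaces already around them.
TAG_RUN_RE = re.compile(r" *((?:<[^<>]+>)+) *")


def pad_tags(segment: str) -> str:
    """Put exactly one space on each side of every run of HTML codes."""
    return TAG_RUN_RE.sub(r" \1 ", segment).strip()


print(pad_tags("Open<b>file</b>now"))  # Open <b> file </b> now
print(pad_tags("Click <b>OK</b>."))    # Click <b> OK </b> .
print(pad_tags("a</b><i>b"))           # a </b><i> b
```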
