
How to deal with Studio projects with lots of repetitions and HTML codes

How would you handle Studio projects with lots of identical segments and many segments that contain unprotected HTML codes?


That required quite a lot of steps (Search & Replace actions in MS Word). The result in CafeTran is:

  • Words are recognised but ...
  • ... I'll have to insert tags and non-breaking spaces
  • You win some, you lose some ...
For the record, here are the replacements required (see attached screenshots). Note that a non-breaking space surrounds every group of HTML codes, rather than every single HTML code (a little smartness built in ...).
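
For anyone who prefers a script to a chain of Search & Replace actions, the grouping step could be sketched in Python roughly like this. It is only an illustration, not what I did in MS Word: I assume the text is available as plain text and that "surrounded" means one non-breaking space on each side of a group. `wrap_tag_groups` and `TAG_GROUP` are made-up names.

```python
import re

NBSP = "\u00a0"  # non-breaking space (^s in MS Word's Find & Replace)

# One or more directly adjacent HTML codes count as a single group.
# Assumption: a code looks like <name ...> or </name>.
TAG_GROUP = re.compile(r"(?:</?[A-Za-z][^<>]*>)+")

def wrap_tag_groups(text: str) -> str:
    """Put one non-breaking space before and after every group of HTML codes."""
    return TAG_GROUP.sub(lambda m: NBSP + m.group(0) + NBSP, text)
```

With this, `<ul><li>` gets one non-breaking space on each side instead of one around each of the two codes.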

BTW: I decided to go for non-breaking spaces, since I can remove them from the exported MS Word document afterwards.

The Result in CafeTran Espresso 10 Croissant:

image


zip

This situation was annoying me. In a brainwave, I had the idea to replace all non-breaking spaces in the MS Word document with manual line breaks (^s with ^l), and hurray: all HTML list items are segmented perfectly, without any CafeTran tags!
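
As a rough Python sketch of the same idea (assuming plain text, with "\n" standing in for Word's manual line break, and `to_line_segments` being a hypothetical helper name):

```python
NBSP = "\u00a0"      # non-breaking space (^s in MS Word)
LINE_BREAK = "\n"    # stand-in for Word's manual line break (^l); an assumption

def to_line_segments(text: str) -> list[str]:
    """Replace every non-breaking space with a line break and return the
    resulting non-empty lines, so codes and words end up on separate lines."""
    broken = text.replace(NBSP, LINE_BREAK)
    return [line for line in broken.split(LINE_BREAK) if line]
```

Each HTML code group and each list item text then sits on its own line, which is what lets the segmentation keep them apart.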


image



I translated some segments with list items and exported:


image


Once I have finished this project, I'll have to:

  • Replace all ^l with nothing in MS Word.
  • Replace all <NewLine> with ^l
Let's see if everything will work as advertised, in a couple of days ;).
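
The order of those two replacements matters: the helper line breaks have to go first, otherwise the real ones restored from <NewLine> would be deleted too. A minimal Python sketch, assuming the exported text is handled as a plain string and "\n" stands in for ^l (`restore_line_breaks` is a made-up name):

```python
LINE_BREAK = "\n"          # stand-in for Word's manual line break (^l); an assumption
NEWLINE_TAG = "<NewLine>"  # the literal marker mentioned in the steps above

def restore_line_breaks(exported: str) -> str:
    """Step 1: delete the helper line breaks. Step 2: turn every
    <NewLine> marker back into a line break."""
    without_helpers = exported.replace(LINE_BREAK, "")
    return without_helpers.replace(NEWLINE_TAG, LINE_BREAK)
```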

Syncing with PDF from the Studio project:


image

Clean, non-distracting representation in CafeTran Espresso 10 Croissant:

image


Would be nice if CafeTran Espresso 10 Croissant had a regular expression tagger.

> Would be nice if CafeTran Espresso 10 Croissant had a regular expression tagger.


I like the way you handled it outside the program but do you realize how complex (and possibly frustrating) the whole issue of custom tagging via the regular expressions might be for most users?    

>I like the way you handled it outside the program but do you realize how complex (and possibly frustrating) the whole issue of custom tagging via the regular expressions might be for most users?    


Yes, I do. But, OTOH, users come in all flavours. This is still one of very few features that I'm missing often. Most likely because I'm a translator of machine/plant related texts.

CafeTran's insertion of the correct numbers in Fuzzy Matches isn't what it should be ...


I should have masked all isolated numbers for this big job.
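
What I mean by "isolated numbers" could be located with a pattern along these lines. This is only a Python sketch of one possible definition (digits, optionally with decimal or thousands separators, not glued to a letter); the pattern and the name `find_isolated_numbers` are my own assumptions, not something CafeTran provides.

```python
import re

# A number that is not attached to a letter, digit, dot or comma on the left
# and not attached to a letter or digit on the right.
ISOLATED_NUMBER = re.compile(r"(?<![\w.,])\d+(?:[.,]\d+)*(?!\w)")

def find_isolated_numbers(segment: str) -> list[str]:
    """Return the stand-alone numbers in a segment, in order of appearance."""
    return ISOLATED_NUMBER.findall(segment)
```

In "Tighten 4 bolts M8 to 12.5 Nm, see step 3." this picks up 4, 12.5 and 3, but leaves the 8 in M8 alone.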

>custom tagging via the regular expressions might be for most users?    


It would be great if CafeTran Espresso 10 Croissant had a button in the Add Term dialogue box to convert the selection in the target language box (sic) to a regular expression.


Of course, not everything can be covered, but I guess about 90 % should be feasible. I'm happy to contribute my non-translatables glossaries and mark-up macros for inspiration.

Beware: MS Word removes the HTML formatting for bold and italics!


The whole workflow worked, except for about 2K words that I have to restore from the TM and in which I have to re-insert the HTML formatting for bold and italics. Of course, via non-translatables:

|\<i\>\<b\>

|\<\/b\>\<\/i\>

|\<[biu]\>

|\<\/[biu]\>

|\<(br|hl|li|ul)\>

|\<\/(br|hl|li|ul)\>
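
To check what these expressions catch outside CafeTran, they can be tried in Python. In this sketch I assume the leading | of each entry is only a marker and not part of the pattern, so it is left out. The combined alternation and the name `find_codes` are mine, and Python's regex flavour may differ in details from the one CafeTran uses.

```python
import re

# The six expressions from above, without the leading "|".
PATTERNS = [
    r"\<i\>\<b\>",
    r"\<\/b\>\<\/i\>",
    r"\<[biu]\>",
    r"\<\/[biu]\>",
    r"\<(br|hl|li|ul)\>",
    r"\<\/(br|hl|li|ul)\>",
]

# Longer combinations come first so that <i><b> is found as one unit.
NON_TRANSLATABLE = re.compile("|".join(PATTERNS))

def find_codes(segment: str) -> list[str]:
    """Return every HTML code (or code pair) matched in the segment."""
    return [m.group(0) for m in NON_TRANSLATABLE.finditer(segment)]
```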

I had to use Excel to align the source and target. Word didn't handle the huge document, despite 32 GB of RAM. All in all, the workflow went smoothly.

If you have access to SDL Studio, you can create a bilingual review document that can be imported: http://producthelp.sdl.com/SDL_Trados_Studio_2015/client_en/edit_view/AboutReviewingFilesExternally.htm You might want to hide the markup in MS Word prior to importing the document into CafeTran.

When you have to translate bilingual review tables that contain tab characters, e.g. 1.2 tab Introduction, you should replace the tab characters in MS Word with a unique character. In CafeTran, replace this unique character on the source side with a tab character. With this workaround you can prevent CafeTran from splitting segments at every tab character.
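
The tab workaround boils down to a reversible swap. A small Python sketch, assuming plain text and a placeholder character of my own choosing (any character that does not occur in the document would do; `mask_tabs` and `unmask_tabs` are hypothetical names):

```python
PLACEHOLDER = "\u00a4"  # the currency sign; a hypothetical choice of "unique character"

def mask_tabs(text: str) -> str:
    """Replace every tab with the placeholder, refusing to run if the
    placeholder already occurs (it would no longer be unique)."""
    if PLACEHOLDER in text:
        raise ValueError("placeholder already occurs in the text; pick another one")
    return text.replace("\t", PLACEHOLDER)

def unmask_tabs(text: str) -> str:
    """Turn every placeholder back into a tab character."""
    return text.replace(PLACEHOLDER, "\t")
```

The uniqueness check is the important part: if the placeholder already exists in the text, the swap back would create tabs that were never there.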