
Translating HTML files in CT - anything I need to know?

I may soon have a large HTML translation job coming up, and before I start making any promises to the client I wanted to check how CT handles HTML files.

I can't find anything on the forums about this, apart from one thread about problems segmenting JavaScript.

Does CT play nice with HTML files? Are there any particular pitfalls to watch out for?


I rarely have to translate such file types, but the best way to test how CT works with them is to save a complete web page from your browser into a folder, and then open it in CT to see how you can deal with it.
My guess is that you will have to handle tags together with the text.

Mike

 

Perhaps you're not happy with this suggestion, but whenever I find that CafeTran's file format filters aren't up to my expectations, I use another CAT tool (like Studio or Transit) to create a file that I then translate in CafeTran. I do this for FrameMaker MIF.

Don't know if it is too late for you, but I do translate HTML files with some PHP code in them. I find Wordfast 4 (which I try to avoid most of the time) handles them better. CT does not segment the alt text, so you have to open the file with BBEdit or the like and translate those snippets manually. They are usually short, but if you have lots of them, it is a pain. I don't like segmenting "elsewhere" and bringing it back to CT, as you can't preview the document in CT (or am I missing something?).
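
For anyone facing the same manual pass over alt texts, here is a minimal sketch that lists every non-empty alt attribute together with its line number, so the snippets can be found quickly in a text editor. It uses only Python's standard `html.parser` module and has nothing to do with CafeTran's own filter; the names `AltTextCollector` and `collect_alt_texts` are made up for this illustration.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collects non-empty alt attribute values with their line numbers."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> also end up here, because the
        # default handle_startendtag() calls handle_starttag().
        for name, value in attrs:
            if name == "alt" and value and value.strip():
                line, _column = self.getpos()
                self.alts.append((line, value.strip()))


def collect_alt_texts(html):
    """Return a list of (line_number, alt_text) pairs found in the HTML."""
    parser = AltTextCollector()
    parser.feed(html)
    parser.close()
    return parser.alts


sample = (
    "<p>Hi</p>\n"
    '<img src="a.png" alt="Storm track map">\n'
    '<img src="b.png" alt="">'
)
for line, text in collect_alt_texts(sample):
    print(line, text)
```

The list still has to be translated by hand, but at least nothing is overlooked.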

Yes, missing bilingual preview is a bummer... Adding/improving file filters would be a major improvement, although maybe not the most rewarding developer activity. Igor would have to brew several cups of his CafeTran coffee!

Thanks Julie, that's good to know. Missing alt tags sounds like a pain. (@Igor: but surely easily fixable?)

 

About to embark on a fairly big HTML project. Any chance the HTML segmentation rules will see an improvement and include sub-flow items?

Oh and the following code (only a snippet) is segmented into a single huge segment with the list of items separated by tags. For long tables of contents (where titles have to be consistently used throughout the doc), it is a pain to have to split the segment into dozens of small segments to be used later.


Would it be terribly hard to tell CT to segment each list item?

<li><a href="#page_3-0-0">Creating the Cone of Uncertainty</a>
  <ul class="nav" id="ul_3-0-0">
    <li><a href="#page_3-1-0">Using the Cone of Uncertainty</a></li>
    <li><a href="#page_3-2-0">One to Five-Day Track Errors</a></li>
    <li><a href="#page_3-3-0">Two-Day Track Scenarios</a></li>
  </ul>
</li>
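
To illustrate what "segment each list item" means for this snippet, here is a rough sketch using Python's standard `html.parser`. It starts a new segment at every block-level tag (the `BLOCK_TAGS` set is an assumption chosen for this example) and simply drops inline tags such as `<a>`, which a real CAT filter would keep as placeholders. This is not how CafeTran's filter works internally; `BlockSegmenter` and `segment_html` are hypothetical names.

```python
from html.parser import HTMLParser

# Assumed set of tags that end a segment; a real filter would be configurable.
BLOCK_TAGS = {"li", "ul", "ol", "p", "div", "td", "th", "tr", "table"}


class BlockSegmenter(HTMLParser):
    """Splits text into segments at block-level tag boundaries."""

    def __init__(self):
        super().__init__()
        self.segments = []
        self._buffer = []

    def _flush(self):
        # Collapse runs of whitespace and skip segments that are empty.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.segments.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        self._buffer.append(data)


def segment_html(html):
    """Return the list of text segments found in the HTML fragment."""
    parser = BlockSegmenter()
    parser.feed(html)
    parser.close()
    parser._flush()  # pick up trailing text after the last block tag
    return parser.segments


snippet = """<li><a href="#page_3-0-0">Creating the Cone of Uncertainty</a>
  <ul class="nav" id="ul_3-0-0">
    <li><a href="#page_3-1-0">Using the Cone of Uncertainty</a></li>
    <li><a href="#page_3-2-0">One to Five-Day Track Errors</a></li>
    <li><a href="#page_3-3-0">Two-Day Track Scenarios</a></li>
  </ul>
</li>"""
for segment in segment_html(snippet):
    print(segment)
```

With this rule, each table-of-contents title becomes its own segment instead of one long segment with the titles separated by tags.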

 

>it is a pain to have to split the segment into dozens of small segments to be used later.


Personally, I'd be very careful with splitting and joining, to avoid export problems later.

Perhaps you'd like to use the Okapi HTML filter:


http://okapiframework.org/wiki/index.php?title=HTML_Filter


Igor, how about adding support for these filters? Wouldn't that be a more effective approach?


I wish I could like Hans' comment many many times - this is definitely the way to go!! (Assuming there are no licensing issues.)

 

>Igor, how about adding support for these filters? Wouldn't that be a more effective approach?


Nope. I've never seen Okapi. As with every CT feature, I wish to improve CafeTran's HTML filter gradually, but I don't really know when it will be accomplished. It could be next week, next month, next year or tomorrow. I generally have no idea when something can be completed.

>Personally, I'd be very careful with splitting and joining. To avoid export problems later.


Hmm, yeah. Maybe I can just translate the whole bunch and then commit each subsegment to memory to be retrieved later. Okapi! I think I'm too old to learn new tricks ;-) but that's worth a try.


>I wish to improve CafeTran's html filter gradually... next week, next month, next year or tomorrow


Thanks, it's still good to know that HTML filter improvement is in the pipeline.



>Would it be terribly hard to tell CT to segment each list item?


Okay, the above has just been fixed in update 3 (build 2017031401). Please download and install the update again. You will need to start a new project to use the changed HTML filter.



>I wish to improve CafeTran's html filter gradually 


Wouldn't a regular expressions tagger be a more flexible solution/approach/hans_happy_maker?
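
To make the idea concrete: a regular expressions tagger could be as simple as the following sketch, which hides everything matching a user-supplied pattern behind numbered placeholders and puts it back after translation. This is only an illustration in Python, not an existing CafeTran feature; the function names and the `{n}` placeholder format are assumptions, and the sketch assumes the text itself contains no literal `{1}`-style strings.

```python
import re


def tag_with_regex(text, pattern):
    """Replace every match of pattern with a numbered placeholder.

    Returns the tagged text and the list of protected strings.
    """
    tags = []

    def _replace(match):
        tags.append(match.group(0))
        return "{%d}" % len(tags)

    tagged = re.sub(pattern, _replace, text)
    return tagged, tags


def restore_tags(tagged, tags):
    """Put the protected strings back in place of their placeholders."""
    return re.sub(r"\{(\d+)\}", lambda m: tags[int(m.group(1)) - 1], tagged)


source = '<a href="#page_3-1-0">Using the Cone of Uncertainty</a>'
tagged, tags = tag_with_regex(source, r"<[^>]+>")
print(tagged)
print(restore_tags(tagged, tags) == source)
```

The pattern decides what counts as a tag, so the same mechanism could protect PHP snippets or other markup without a dedicated filter.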
