
XML file - hide tags from segment grid

I am translating an English text XML file that includes tags.

Is it possible to hide or filter out the tags from the segment grid, so that it only shows the actual segments I will be translating?


Below is a sample of the XML text, followed by a screenshot of what my segment grid currently looks like.

  

<?xml version="1.0" encoding="utf-8"?>

<chapter id="chapGUI" xreflabel="GUI">
  <title>GUI</title>

<!-- Prerequisites -->
  <section>

    <title>Prerequisites</title>
    <para>
      This chapter assumes that you already have the GUI
      installed on your PC and that you have working code
      integrated with your application. If you do not have
      this, please contact us and order a code integration
      or a demonstration kit.
    </para>

  </section>
</chapter>

 


[Screenshot: current segment grid]
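
To check which text in a file like the sample above is actually translatable, a minimal sketch in Python (outside CafeTran, and not a description of how CafeTran imports XML) can list the text content of selected elements. The function name `translatable_segments` and the choice of `title` and `para` as text-bearing tags are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample from the post (the paragraph text is abbreviated).
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<chapter id="chapGUI" xreflabel="GUI">
  <title>GUI</title>
  <!-- Prerequisites -->
  <section>
    <title>Prerequisites</title>
    <para>
      This chapter assumes that you already have the GUI
      installed on your PC.
    </para>
  </section>
</chapter>
"""


def translatable_segments(xml_bytes, text_tags=("title", "para")):
    """Return (tag, text) pairs for elements assumed to carry translatable text."""
    root = ET.fromstring(xml_bytes)  # the default parser drops XML comments
    segments = []
    for elem in root.iter():  # document order
        if elem.tag in text_tags:
            # Join all nested text and collapse line breaks and indentation.
            text = " ".join("".join(elem.itertext()).split())
            if text:
                segments.append((elem.tag, text))
    return segments


segments = translatable_segments(SAMPLE.encode("utf-8"))
for tag, text in segments:
    print(f"{tag}: {text}")
```

This only reads the text; writing translations back into the XML is a separate step that the sketch does not cover.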



cafetran.training: Perhaps you want to test the import result from other CAT tools, like Transit or Studio? If they segment correctly, you could translate their files in CafeTran.


Thanks for the info. CafeTran is the only CAT tool I have real experience with, but I'll consider trying some others like you mentioned and see.

>This is my first translation project with XML files and it is looking to be quite a bit more cumbersome to work with than other common file formats.


Perhaps you want to test the import result from other CAT tools, like Transit or Studio? If they segment correctly, you could translate their files in CafeTran.


I can help you with the import.

Igor Kmitowski: I suppose that choosing Tag segmentation from the Segmentation list in Preferences might partially solve your issue. However, you could then end up with long segments spanning between tags.


In some cases using the tag segmentation may be an improvement, but some text sections contain numerous inline tags, such as the ones below, which then become a segmented mess.


... <quote>text</quote> ...

... text <xref linkend="figure" /> text...

... <emphasis>text</emphasis> ...
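
One common way to keep a sentence with inline tags in one piece is to swap the tags for numbered placeholders before translating and put them back afterwards. The sketch below shows that idea in Python; it is an illustration only, not CafeTran's behaviour. The function names and the `{1}`-style placeholder format are assumptions, and the tag list covers just the three tags mentioned above.

```python
import re

# Inline tags from the examples above; opening, closing and self-closing forms.
INLINE_TAG = re.compile(r"</?(?:quote|emphasis|xref)\b[^>]*>")


def protect_inline_tags(text):
    """Replace each inline tag with {1}, {2}, ... and return the tags in order."""
    tags = []

    def repl(match):
        tags.append(match.group(0))
        return "{%d}" % len(tags)

    return INLINE_TAG.sub(repl, text), tags


def restore_inline_tags(text, tags):
    """Put the original tags back in place of their placeholders.

    Assumes the text contains no literal {n} sequences of its own.
    """
    return re.sub(r"\{(\d+)\}", lambda m: tags[int(m.group(1)) - 1], text)


source = 'See <xref linkend="figure" /> for the <emphasis>main</emphasis> window.'
protected, tags = protect_inline_tags(source)
restored = restore_inline_tags(protected, tags)
print(protected)
```

The translator then works on the protected string and only has to keep the placeholders in a sensible position.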


This is my first translation project with XML files and it is looking to be quite a bit more cumbersome to work with than other common file formats.

I suppose that choosing Tag segmentation from the Segmentation list in Preferences might partially solve your issue. However, you could then end up with long segments spanning between tags.

woorden: For future use: KDIFF3, free software to spot differences between files. You may have to change the extensions to .txt. It even runs under Windows.


Haven't tried that one yet. I use Beyond Compare (not free), WinMerge, and DiffMerge.
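
For a quick check without installing a diff tool, Python's standard `difflib` module can show the same kind of line-by-line comparison. This is a minimal sketch; the two line lists and the file names `original.xml` and `converted.xml` are made up for the example.

```python
import difflib

# Two hypothetical versions of the same fragment, as lists of lines.
original = ["<para>", "  This chapter assumes the GUI is installed.", "</para>"]
converted = ["<p>", "  This chapter assumes the GUI is installed.", "</p>"]

# Unified diff: lines starting with "-" exist only in the first version,
# lines starting with "+" only in the second.
diff = list(
    difflib.unified_diff(
        original,
        converted,
        fromfile="original.xml",
        tofile="converted.xml",
        lineterm="",
    )
)
print("\n".join(diff))

# Rough similarity score between 0.0 (nothing in common) and 1.0 (identical).
ratio = difflib.SequenceMatcher(None, original, converted).ratio()
print(f"similarity: {ratio:.2f}")
```

For real files, read each one with `open(path, encoding="utf-8").read().splitlines()` and pass the resulting lists instead.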


woorden: By the way: chmProcessor: Word/HTML to CHM converter. Never tried it, just found it.


Due to segmentation and export issues (as shown in one of my other threads), I think I'm giving up on using CafeTran for this project. Just gonna do it the old-fashioned way and use a text editor (Notepad++). Thanks for all your input though.

By the way: chmProcessor: Word/HTML to CHM converter. Never tried it, just found it.


H.

jerwin267: (nearly) completely different


Bad luck.

For future use: KDIFF3, free software to spot differences between files. You may have to change the extensions to .txt. It even runs under Windows.

H.

woorden: What you could try (shouldn't take long): Convert the CHM to DOCX using Calibre, unzip it, and check if/how the DOCX XML differs from the CHM XML.


You certainly are well versed in working with documents. I had no idea Word .docx files could be unzipped into XML files. Unfortunately, in this case the converted XML and the original XML are (nearly) completely different.
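
A .docx file is an ordinary ZIP archive, so its XML parts can be listed and read with Python's standard `zipfile` module. The sketch below is hedged: the helper names `list_xml_parts` and `read_part` are made up, and the demo builds a tiny stand-in archive in memory rather than a valid Word document, so that it runs without any real file.

```python
import io
import zipfile


def list_xml_parts(docx_file):
    """Return the names of all .xml members in a DOCX (ZIP) archive."""
    with zipfile.ZipFile(docx_file) as archive:
        return sorted(name for name in archive.namelist() if name.endswith(".xml"))


def read_part(docx_file, part_name="word/document.xml"):
    """Return one member of the archive as text; the main body is usually word/document.xml."""
    with zipfile.ZipFile(docx_file) as archive:
        return archive.read(part_name).decode("utf-8")


# Stand-in archive for the demo only; NOT a valid Word document.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")
    archive.writestr("[Content_Types].xml", "<Types/>")

parts = list_xml_parts(buffer)
body = read_part(buffer)
print(parts)
```

With a real file, pass its path instead of the buffer, for example `list_xml_parts("converted.docx")`, where `converted.docx` is a hypothetical file name.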

jerwin267: Due to the risks I think I'll pass on this method for now


What you could try (shouldn't take long): Convert the CHM to DOCX using Calibre, unzip it, and check if/how the DOCX XML differs from the CHM XML.

H.

woorden: I converted it using calibre. No idea how to convert it back, though.


Interesting. But yeah, then I'd have to find a way to convert whatever I converted the CHM to back to XML so I can rebuild the CHM, all the while making sure I didn't break anything in the process. Due to the risks I think I'll pass on this method for now, but I do appreciate your input.

jerwin267: I do have the CHM file


I converted it using calibre. No idea how to convert it back, though.


H.

> Do you have the CHM file? There should be a way to translate them directly, without going the XML route. I'll be darned if I remember how, though, too long ago.


I do have the CHM file. I'll dig around and see if I can find any info. Please let me know if you remember how you did it.



> You might try changing the segmentation simply by choosing Rules.srx in Edit > Preferences > Segmentation drop box.
> Then you would need to create a new project as the segmentation is done at the start of the project.


Thanks for the response Igor.

Regrettably, the sentences are still segmented.

The segmentation, however, is not optimal. I would like it to be segmented by sentence instead of split into pieces as it is now.


You might try changing the segmentation simply by choosing Rules.srx in Edit > Preferences > Segmentation drop box. Then you would need to create a new project as the segmentation is done at the start of the project.

jerwin267: Would some form of custom segmentation rules in the "Rules.srx" make this sort of thing possible?


I think that would make it worse. But using a regex or regexes isn't too bad. In CT's menu, go to Edit | Find and enable the right options:

[Screenshot: Find dialog options]


Note that I never ever use regexes in CT because they are very powerful and dangerous. Also because there are various regex "flavours" and I don't know if I use the right one for CT (Java flavour). And of course, always use a copy of your original XML file.

> I'll have to educate myself more about regular expressions

I suggest you use http://www.regular-expressions.info and regexr (the latter to test, their website is actually better than the app I use).
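
As a starting point for experimenting, here is a minimal sketch of a pattern that matches any single XML tag, tried on a made-up line. It is written for Python's `re` module, not for the Java flavour mentioned above, although this particular pattern uses only constructs that both flavours share. The pattern is an assumption for illustration, not one taken from this thread, and it is too simple for comments or for attribute values that contain `>`.

```python
import re

# "<", then one or more characters that are not ">", then ">".
TAG = re.compile(r"<[^>]+>")

line = '... text <xref linkend="figure" /> and <emphasis>more</emphasis> text ...'

# Which tags does the pattern find?
found = TAG.findall(line)
print(found)

# What is left when every tag is replaced by a space and whitespace is collapsed?
text_only = " ".join(TAG.sub(" ", line).split())
print(text_only)
```

Trying a pattern on a few sample lines like this, and only ever on a copy of the file, helps catch surprises before running a replace on the real text.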


Do you have the CHM file? There should be a way to translate them directly, without going the XML route. I'll be darned if I remember how, though, too long ago.


H.
