
XML file - hide tags from segment grid

I am translating an English text XML file that includes tags.

Is it possible to hide or filter out the tags from the segment grid, so that it shows only the segments I will actually be translating?


Below is a sample of the XML text, followed by a screenshot of what my segment grid currently looks like.

  

<?xml version="1.0" encoding="utf-8"?>

<chapter id="chapGUI" xreflabel="GUI">
  <title>GUI</title>

<!-- Prerequisites -->
  <section>

    <title>Prerequisites</title>
    <para>
      This chapter assumes that you already have the GUI
      installed on your PC and that you have working code
      integrated with your application. If you do not have
      this, please contact us and order a code integration
      or a demonstration kit.
    </para>

  </section>
</chapter>

 


image



Are you sure that you selected the XML file type? I get this:


image


It all looks fine to me, except for the comment, which in my opinion shouldn't have been imported. Perhaps make this optional?


image


I have no idea why segmentation takes place after 'have':


image


Best create a ticket and send the XML file to the Support Department in Kolobrzeg.

Thank you for your responses cafetran.training.

I initially had the file type set to "XML/HTML with tags" in the Project configuration. Recreating the project with the file type set to "XML" hides the tags, as you said.


The segmentation, however, is not optimal. I would like the text to be segmented by sentence instead of split into pieces as it is now. For a small file this would be easy to merge manually, but the entire XML file is much larger than the sample I've posted here.


The image below is my segment grid now.


image


jerwin267: The segmentation however, is not optimal.


If I try to get rid of the mark-up things, I get something like this:

image

I don't know if the client will be happy if the segmentation for the translation is "natural," but you could try using a regular expression to achieve it. If you happen to be on a Mac, I know another, easier solution.


H.



Thank you for your response Woorden.

I don't follow what you are proposing, however.


By setting the file type to "XML" I can hide the XML tags from the segment grid as shown above.

But the sentences are split up into multiple segments (due to the structure of the text from the XML file I assume).

I'm on Windows and am looking for a way for CafeTran to not split sentences up into multiple segments.

An XML file isn't an "end file"; it will probably be printed as a PDF or as HTML. So if you "join" those segments, those resulting files may end up in a mess.


You can, however, try to join them in CT using a regex. You'll have to be very careful, though, and you should check the whole file before you execute the regex. My try (not having the whole file at my disposal):


image



H.

Addendum: Maybe it's not a bad idea to join the segments anyway. I suppose the XML code will take care of it. But I'm not sure...


H.

Right, the XML files will be compiled into a CHM help file.


So, if I understand you right, the idea is basically to use regular expressions to merge the text together.

I know how to do that in a text editor (Notepad++) by modifying the source XML files so that the lines are not split up, but I currently can't figure out how I would do that from the CafeTran interface.


I modified the file from Notepad++ and it seems to work with a small sample thus far. The segments are split by sentence in CafeTran, and the compiled CHM help file looks normal.

 

I'm using Woorden's regular expression, ([a-zA-Z])\r\s\s\s\s\s, with find/replace in Notepad++ to join the text together, but it cuts off the last character on the line. If I go this route I'll have to educate myself more about regular expressions and see if I can sort it out.
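A likely reason for the cut-off character: the pattern captures the last letter of the line in group 1, so the replacement has to put that group back (in Notepad++ that would presumably be \1 followed by a space; the exact replace syntax is an assumption here). The Python sketch below shows the same idea outside the editor. The name join_wrapped_lines and the pattern itself are only an illustration, not anything CafeTran provides, and the pattern would also touch preformatted or CDATA content, so it should only ever be run on a copy of the XML.

```python
import re

# A line break (Windows or Unix) between two pieces of running text.
# Group 1 is the last character before the break; it must not be whitespace
# or part of a tag. The lookahead refuses to join when the next line starts
# with a tag (for example a closing </para>) or is blank.
WRAP = re.compile(r"([^\s<>])[ \t]*\r?\n[ \t]*(?=[^\s<])")


def join_wrapped_lines(xml_text: str) -> str:
    """Join hard-wrapped text lines, keeping the captured character via \\1."""
    return WRAP.sub(r"\1 ", xml_text)


sample = (
    "    <para>\r\n"
    "      This chapter assumes that you already have the GUI\r\n"
    "      installed on your PC.\r\n"
    "    </para>\r\n"
)
joined = join_wrapped_lines(sample)
print(joined)
```

Leaving \1 out of the replacement is what would swallow the final letter, because the captured character is part of the match that gets replaced.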


So modifying the XML files to merge the text together is one method, but having to modify thousands of lines of XML text and confirm they are being merged together correctly isn't ideal. Is there no solution to automatically merge the segments together from CafeTran? Would some form of custom segmentation rules in the "Rules.srx" make this sort of thing possible?
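On the "confirm they are being merged together correctly" part, one possible shortcut is to compare the file before and after the find/replace with all whitespace collapsed. If nothing but line breaks and indentation changed, the two versions should be identical under that comparison. This is a minimal sketch; only_whitespace_changed is a hypothetical helper name, and it says nothing about how CafeTran will then segment the result.

```python
def only_whitespace_changed(before: str, after: str) -> bool:
    """True when both texts match once every whitespace run becomes one space."""
    return " ".join(before.split()) == " ".join(after.split())


original = "have the GUI\r\n      installed on your PC"

# A correct join: only the line break and indentation were replaced.
print(only_whitespace_changed(original, "have the GUI installed on your PC"))  # True

# A join that swallowed the last letter of the line is caught.
print(only_whitespace_changed(original, "have the GU installed on your PC"))  # False
```

In practice the two arguments would be the contents of the original XML file and of the modified copy, read in as strings.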

jerwin267: Would some form of custom segmentation rules in the "Rules.srx" make this sort of thing possible?


I think that would make it worse. But using a regex or regexes isn't too bad. In CT's menu, go to Edit | Find and enable the right options:

image


Note that I never ever use regexes in CT because they are very powerful and dangerous. Also because there are various regex "flavours" and I don't know if I use the right one for CT (Java flavour). And of course, always use a copy of your original XML file.

jerwin267: I'll have to educate myself more about regular expressions

I suggest you use http://www.regular-expressions.info and regexr (the latter to test, their website is actually better than the app I use).


Do you have the CHM file? There should be a way to translate them directly, without going the XML route. I'll be darned if I remember how, though; it was too long ago.


H.

jerwin267: The segmentation however, is not optimal. I would like for it to be segmented by sentence instead of split into pieces as it is now.


You might try changing the segmentation simply by choosing Rules.srx in the Edit > Preferences > Segmentation drop-down box. You would then need to create a new project, as the segmentation is done at the start of the project.

> Do you have the CHM file? There should be a way to translate them directly, without going the XML route. I'll be darned if I
> remember how, though, too long ago.


I do have the CHM file. I'll dig around and see if I can find any info. Please let me know if you remember how you did it.



> You might try changing the segmentation simply by choosing Rules.srx in Edit > Preferences > Segmentation drop box.
> Then you would need to create a new project as the segmentation is done at the start of the project.


Thanks for the response Igor.

Regrettably, the sentences are still segmented.

jerwin267: I do have the CHM file


I converted it using calibre. No idea how to convert it back, though.


H.