I am translating an English text XML file that includes tags.
Is it possible to hide/filter out the tags from the segment grid, so that it only shows the actual segments I will be translating?
Below is a sample of the XML text, followed by a screenshot of what my segment grid currently looks like.
<?xml version="1.0" encoding="utf-8"?>
<chapter id="chapGUI" xreflabel="GUI">
  <title>GUI</title>
  <!-- Prerequisites -->
  <section>
    <title>Prerequisites</title>
    <para>
      This chapter assumes that you already have the GUI installed on your PC and that you have working code integrated with your application. If you do not have this, please contact us and order a code integration or a demonstration kit.
    </para>
  </section>
</chapter>
Are you sure that you selected the XML file type? I get this:
Everything looks fine to me, except for the comment, which in my opinion shouldn't have been imported. Perhaps make this optional?
I have no idea why segmentation takes place after 'have':
Best create a ticket and send the XML file to the Support Department in Kolobrzeg.
Thank you for your responses, cafetran.training.
I initially had the file type set to "XML/HTML with tags" in the Project configuration; recreating the project with the file type set to "XML" hides the tags, as you said.
The segmentation, however, is not optimal. I would like it to be segmented by sentence instead of split into pieces as it is now. For a small file this would be easy to merge manually, but the entire XML file is much larger than the sample I've posted here.
The image below is my segment grid now.
I don't know if the client will be happy if the segmentation for the translation is "natural," but you could try using a regular expression to achieve it. If you happen to be on a Mac, I know another, easier solution.
Thank you for your response, Woorden.
I don't follow what you are proposing, however.
By setting the file type to "XML", I can hide the XML tags from the segment grid, as shown above.
But the sentences are split up into multiple segments (due to the structure of the text in the XML file, I assume).
I'm on Windows and am looking for a way for CafeTran to not split sentences up into multiple segments.
An XML file isn't an "end file"; it will probably be published as a PDF or as HTML. So if you "join" those segments, the resulting files may end up in a mess.
You can, however, try to join them in CT using a regex. You'll have to be very careful, though, and you should check the whole file before you execute the regex. My try (not having the whole file at my disposal):
Addendum: Maybe it's not a bad idea to join the segments anyway. I suppose the XML code will take care of it. But I'm not sure...
Right, the XML files will be compiled into a CHM help file.
So basically, if I understand you correctly, I should use regular expressions to merge the text together.
I know how to do that in a text editor (Notepad++) by modifying the source XML files so that they are not split up, but I currently can't figure out how I would do that from the CafeTran interface.
I modified the file in Notepad++ and it seems to work with a small sample thus far. The segments are split by sentence in CafeTran, and the compiled CHM help file looks normal.
I'm using Woorden's regular expression: ([a-zA-Z])\r\s\s\s\s\s in Notepad++ and find/replace to join the text together, but it cuts off the last character on the line. If I go this route I'll have to educate myself more about regular expressions and see if I can sort it out.
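A likely reason for the missing character: the pattern captures the last letter of the line in group 1, so the letter is part of the match and disappears unless the replacement puts it back (in Notepad++, a replacement of \1 followed by a space should do that). Below is a minimal Python sketch of the same idea for checking the behaviour outside the editor. The sample text, the function name, and the slightly generalized pattern (optional \r, any amount of indentation) are assumptions for illustration, not taken from the real file.

```python
import re

# Hypothetical sample: one sentence wrapped across two indented lines,
# with Windows (CRLF) line endings as in the source XML.
sample = (
    "<para>\r\n"
    "     This chapter assumes that you already have\r\n"
    "     the GUI installed on your PC.\r\n"
    "</para>"
)

# A letter, then a line break, then indentation.
# Generalized from ([a-zA-Z])\r\s\s\s\s\s (an assumption, not the original pattern).
pattern = re.compile(r"([a-zA-Z])\r?\n[ \t]+")


def join_wrapped_lines(text: str) -> str:
    """Join a line that ends in a letter with the indented line that follows it."""
    # \1 re-inserts the captured letter; the line break and indentation become one space.
    return pattern.sub(r"\1 ", text)


# Correct replacement: the captured letter is kept.
joined = join_wrapped_lines(sample)

# Replacement without the back-reference: the captured letter is lost.
lossy = pattern.sub(" ", sample)

print(joined)
print(lossy)
```

Like the original pattern, this only joins lines that end in a letter, so lines ending in a comma or other punctuation would still need a separate rule, and the result should be checked across the whole file before use.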
Modifying the XML files to merge the text together is one method, but having to modify thousands of lines of XML text and confirm they are being merged together correctly isn't ideal. Is there no solution to automatically merge the segments together from CafeTran? Would some form of custom segmentation rules in the "Rules.srx" make this sort of thing possible?
Do you have the CHM file? There should be a way to translate it directly, without going the XML route. I'll be darned if I remember how, though, too long ago.
> The segmentation, however, is not optimal. I would like it to be segmented by sentence instead of split into pieces as it is now.
You might try changing the segmentation simply by choosing Rules.srx in Edit > Preferences > Segmentation drop box. Then you would need to create a new project as the segmentation is done at the start of the project.
> Do you have the CHM file? There should be a way to translate it directly, without going the XML route. I'll be darned if I
> remember how, though, too long ago.
I do have the CHM file. I'll dig around and see if I can find any info. Please let me know if you remember how you did it.
> You might try changing the segmentation simply by choosing Rules.srx in Edit > Preferences > Segmentation drop box.
> Then you would need to create a new project as the segmentation is done at the start of the project.
Thanks for the response, Igor.