
Tags where they shouldn't be …

Following procedure:

  • Copy a chunk of text from a mail app
  • Paste it as text only into a Word .docx
  • Import this Word .docx into CT

Now there is a tag at the beginning of each line (with boundary tags shown). There is no reason it should be there, is there?


Maybe this only happens with Word for Mac. 



> with boundary tags shown...


If you activate the "Show boundary tags" option, they will be shown. Word DOCX is a tag-based XML format. It doesn't matter if it contains text only. The boundary tags can indicate the start of a paragraph, for example. Please turn this option off if you don't need to display those tags.
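To illustrate the point that even a text-only DOCX is XML with markup around every paragraph, here is a minimal Python sketch. The XML fragment is hand-written and hypothetical (a real word/document.xml carries far more markup); only the WordprocessingML namespace is the real one. It makes no claim about how CafeTran itself reads the file.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used inside word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical, simplified fragment: two paragraphs of plain text.
# Each line still sits inside <w:p> (paragraph), <w:r> (run) and <w:t> (text).
DOCUMENT_XML = f"""
<w:document xmlns:w="{W}">
  <w:body>
    <w:p><w:r><w:t>First line</w:t></w:r></w:p>
    <w:p><w:r><w:t>Second line</w:t></w:r></w:p>
  </w:body>
</w:document>
""".strip()


def paragraph_texts(xml_text):
    """Return the plain text of every <w:p> paragraph element."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for p in root.iter(f"{{{W}}}p"):
        # Join the text of all <w:t> elements inside this paragraph.
        text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        paragraphs.append(text)
    return paragraphs


print(paragraph_texts(DOCUMENT_XML))
```

Every paragraph boundary is an element in the file, so an import filter has to decide whether to show that boundary as a tag or to hide it.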

Usually I have the boundary tags hidden.


However, this is only text, and there is no real logic to this kind of tag. Just tested: Studio does not show any tags, nor does memoQ. OmegaT shows tags, but elsewhere, in the middle of the segments (see below); these tags can happily be ignored, though (they don't hinder proper target file generation). So apparently no other tool displays the tags that CT shows. As boundary tags are shown by default (aren't they?), this is a kind of worst-case scenario for beginners.


image


In the xlf file from CafeTran, the first segment has <x id="1"/> at the beginning. Only here no tag is shown. The second segment has <x id="2" ctype="x-break" equiv-text=" "/> at the beginning. Perhaps this kind of tag could simply be filtered out totally.  

Pls read 'Only here no tag is shown' as 'Only here no tag is shown on the CT display'.
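The idea of filtering out such leading tags can be sketched in a few lines of Python. This is only an illustration under assumptions: the segment texts are made up, the function name is hypothetical, and a real filter would also have to restore the placeholders on export so that the target file can still be generated. It is not a description of what CafeTran does.

```python
import re

# One or more self-closing <x .../> placeholders at the very start of a segment.
LEADING_X = re.compile(r'^(?:\s*<x\b[^>]*/>)+')


def strip_leading_placeholders(segment):
    """Remove <x/> placeholders that open a segment; leave inner ones alone."""
    return LEADING_X.sub("", segment)


# Hypothetical segment contents; only the two leading tags come from the xlf above.
segments = [
    '<x id="1"/>First line of the mail',
    '<x id="2" ctype="x-break" equiv-text=" "/>Second line of the mail',
    'You must follow <x id="3"/>procedure A',
]

for s in segments:
    print(strip_leading_placeholders(s))
```

The pattern is anchored at the start of the segment, so a placeholder in the middle of a sentence (third example) is kept.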

In this case, just replace the document and add it once again.


This time, use the MS Word OCR (.docx/xml) file filter instead of the plain MS Word (.docx/xml) one.


CafeTran offers a special filter to handle MS Word documents after OCR, which clears the source text of unnecessary formatting tags.


Note: You can also use this filter anytime the MS Word document produces many unnecessary tags.


Source: https://github.com/idimitriadis0/TheCafeTranFiles/wiki/4-File-formats#ms-word-ocr-docxxml

Thanks, but I do not think this is the proper solution (and it is not intuitive, especially not for newbies).


As shown above, these are tags that no other common tool shows (and, unlike the OmegaT tags, they make confirming the segment impossible). I do not want to be holier than the pope, but I assume neither does Igor. This phenomenon makes CT more difficult than other tools, and it is as unnecessary as the zero-width space we discussed some weeks ago.

An improvement is welcome.


Just saying this is the solution that I use. It is as simple as choosing the next file filter upon project creation.


Also found in the official knowledge base (as far as newbies are concerned): 


Handling Tag Soup - https://cafetran.freshdesk.com/support/solutions/articles/6000160720-handling-tag-soup

Jean's tip is fine. I might add that if you translate a text-only file, just save it as a .txt file in Word or LibreOffice, and you will get tagless segments.


Finally, there is the Clipboard workflow offered by CafeTran as another nice solution if tags are an issue for you.


If none of the above is satisfactory for you, just pick any tool from your list to use it. That's perfectly okay. :) 

And of course, please keep the "Hide segment boundary tags" option in the Action > Tags menu activated. Your first post suggests that you unchecked it for some reason. Showing boundary tags is sometimes useful while translating projects created by other tools. I can't see any advantage in showing them with CafeTran projects.

Sigh.


"Hide segment boundary tags" can be a troublemaker, no matter in which kind of project (though not in every case, admittedly). Example:


You must follow procedure A

This is in CT:

You must follow 1procedure A2

This is in CT with the issue above (tags that should not be there):

1You must follow 2procedure A3

With "Hide segment boundary tags" this is shown as:

You must follow 1procedure A

If you need to translate this as

Sie müssen 1Verfahren A2 benutzen.

you can get into trouble, especially as you won't necessarily recognize the problem with "Hide segment boundary tags" activated.


For this reason, it would be nice to have at least the superfluous tags removed (and by the way, it would be nice to get more information on tags, maybe in another display).


I wonder how to communicate these tags to a beginner who stumbles on this and who sees that CT is the only CAT presenting them.


However, sometimes I feel as if I were in a garage:

"This car runs fine, but it sometimes has terrible misfires when idling."

"No, we won't fix that, but don't worry, simply put on some earphones, use another car or go by bus."

Please keep "Hide segment boundary tags" option in Action > Tags menu activated.


> You must follow...


You don't have to follow it. Select a phrase to be formatted in the target segment, right-click and then choose one of the available format types in the pop-up toolbar. To skip some (or all) source segment formatting tags, select the "Automatic transfer of remaining tags" option via the Action > Tags menu.


Also, please note the simple tags let you transfer source segment formatting faster than any other method. And in many cases, CafeTran uses them to auto-transfer formatting for you.



> You don't have to follow it. Select a phrase to be formatted in the target segment, right-click and then choose one of the available format types in the pop-up toolbar. To skip some (or all) source segment formatting tags, select the "Automatic transfer of remaining tags" option via the Action > Tags menu.


It would be nice if it worked.


Okay, here we are in practice. Unfortunately, worse than expected.

image

image

This is the sentence, from a Word docx made in the very latest edition of Word for Mac 365 (by the way, when clients send an addendum, I do not send back text files, only Word files, so why should I first save it as txt and then as docx to send it?). "Hide segment boundary tags" is off. "Befolgen" (being the verb) should not be bold. There is no tag at the beginning, as this is the first line (same as above). Following your advice, I would do the following:

image

Not an intuitive way, but let's say a kind of compromise (no idea how to deal seriously with larger, heavily formatted sentences).

This leads in the target docx to:

image

This is not what one would expect, is it?


The smart advice that a non-bold full stop at the end of the sentence would avoid this is true, but not helpful. None of us has a slave sitting beside the desk to fine-tune our source texts (sometimes, fine-tuning does make sense, indeed …).


Some more bonus questions:

  • Is it possible that the editor in Windows does more interpretation than the Mac one (in external projects, not necessarily related to this, but quite obvious)?
  • Why is a bold command in Word interpreted here as a simple tag instead of <b> (the text has been typed and then set to bold, no more magic done)?
  • Why isn't there a final tag (only the words have been set bold, not the paragraph mark)? The next line is not bold, of course.


> It would be nice if it worked.


Here it works just fine. The "befolgen" should NOT be bold in the target segment (after the last custom tag) once you have confirmed the segment and exported the document. You must have forgotten to confirm the segment. And please keep the "Hide segment boundary tags" option in the Action > Tags menu activated to hide the tags that you don't need to see.





Hm. Seems to work now.


Besides the three questions above, which remain unanswered, just a note on a thing that has been invented by the EC. Maybe it helps in this case to get rid of the tags already at import.
