
Can we have a new segmentation type that ignores tab characters?

For compatibility with other CAT tools that can include the tab character in segments, and to preserve segment context, I'd like to request:

  1. Rename the current segmentation type Paragraph to Tab character.
  2. Introduce a new segmentation type Paragraph that ignores tab characters, that is, it keeps them inside segments instead of splitting at them.

Example document:

image


Current segmentation with selection Paragraph:

image


Though this segmentation is very useful, the following type of segmentation would be useful as well:

image



On second thought:

  • Leave the name of the current segmentation type as Paragraph
  • Name the new one: Paragraph including tab characters
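
To make the difference between the two requested types concrete, here is a minimal sketch in Python. It is not CafeTran's actual implementation: the function name `segment_paragraph` and the `split_on_tabs` flag are hypothetical, and it assumes that the current Paragraph type breaks at both line breaks and tab characters, as described above.

```python
import re


def segment_paragraph(text: str, split_on_tabs: bool = True) -> list[str]:
    """Split text into segments at line breaks, and optionally at tabs too.

    split_on_tabs=True  imitates the current Paragraph behaviour (assumed).
    split_on_tabs=False imitates the requested "Paragraph including tab
    characters" type: tabs stay inside the segment.
    """
    pattern = r"[\t\n]+" if split_on_tabs else r"\n+"
    # Drop empty or whitespace-only pieces left over from the split.
    return [part for part in re.split(pattern, text) if part.strip()]


# Hypothetical two-row review table, cells separated by tabs.
sample = "Source term\tTarget term\nSecond row, first cell\tSecond row, second cell"

print(segment_paragraph(sample, split_on_tabs=True))   # four segments, one per cell
print(segment_paragraph(sample, split_on_tabs=False))  # two segments, one per line
```

With the flag off, each line of the table stays in one segment and the tab is part of the segment text, which is the behaviour the request is about.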

For Excel spreadsheets, have you tried Document?


More generally, have you tried with the SRX provided by the OmegaT project?


Link found here: https://github.com/idimitriadis0/TheCafeTranFiles/wiki/1-Preferences#segmentation


Can't test right now, but I think the Tab character does not break sentences.


However, for Excel spreadsheets, Paragraph or Document is usually more useful (try Document as well), because it often makes more sense to have one cell = one segment.




I need it for bilingual review tables. Sometimes, SDLXLIFF projects are very slow when navigating from segment to segment (I think there is a problem with how these projects are created). Since I cannot afford the waste of time (and, more importantly, I get sick to my stomach while working in these projects), I'm currently testing the translation of these (few particular) SDLXLIFF projects via:

  • Save all empty segments to a bilingual review table
  • Quickly translate this table in CafeTran Espresso
  • Use the resulting memory to insert all segments into the SDLXLIFF project

This works remarkably well!

Of course, segmentation has to be identical.
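
Why the segmentation has to be identical can be shown with a small sketch in Python. It assumes the memory is applied by exact match on the source text, which is a simplification (real translation memories also offer fuzzy matching). The function names and the sample rows are hypothetical.

```python
def build_memory(review_rows):
    """Map each source segment to its translation, skipping untranslated rows."""
    return {source: target for source, target in review_rows if target}


def fill_empty_segments(project_segments, memory):
    """Pair every project segment with its exact match, or "" if none exists."""
    return [(source, memory.get(source, "")) for source in project_segments]


# Hypothetical review table, translated with tabs kept inside the segments.
review_rows = [
    ("Open the file.", "Open het bestand."),
    ("Name\tValue", "Naam\tWaarde"),
]
memory = build_memory(review_rows)

# Project segmented the same way: every segment finds its translation.
same_segmentation = ["Open the file.", "Name\tValue"]
print(fill_empty_segments(same_segmentation, memory))

# Project segmented at tabs: "Name" and "Value" have no exact match.
split_at_tabs = ["Open the file.", "Name", "Value"]
print(fill_empty_segments(split_at_tabs, memory))
```

In the second case the two tab-separated cells stay empty, because the memory only contains the combined segment.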


Thanks for the suggestions, I'll investigate them in my free time. 

@Jean:


I have downloaded and tested the oT and LT SRX files but I cannot get segmentation per line. Am I doing something wrong here?


Thanks for any help.


image


image


[Attachments: srx, srx, docx]

By default, the oT SRX seems to break at the tab character (I don't use the LT one).


I'm sorry, I can't help with customizing it to fit your needs.
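
For anyone who wants to check where such a break could come from, the following Python sketch lists the rules in an SRX document whose patterns mention a tab. It assumes the tab break is expressed as an SRX rule at all, which this thread does not confirm. The embedded fragment is a made-up, trimmed example, not the contents of the OmegaT or LanguageTool file, and `rules_mentioning_tab` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up, trimmed SRX fragment for illustration only.
SAMPLE_SRX = r"""<srx xmlns="http://www.lisa.org/srx20" version="2.0">
  <body>
    <languagerules>
      <languagerule languagerulename="Default">
        <rule break="yes">
          <beforebreak>\t</beforebreak>
          <afterbreak></afterbreak>
        </rule>
        <rule break="yes">
          <beforebreak>[.?!]+</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>
      </languagerule>
    </languagerules>
  </body>
</srx>"""


def local_name(tag):
    """Strip an XML namespace such as '{http://...}rule' down to 'rule'."""
    return tag.rsplit("}", 1)[-1]


def rules_mentioning_tab(srx_text):
    """Return (break, beforebreak, afterbreak) for rules whose patterns mention a tab."""
    hits = []
    for element in ET.fromstring(srx_text).iter():
        if local_name(element.tag) != "rule":
            continue
        patterns = {local_name(child.tag): (child.text or "") for child in element}
        # Match the regex escape "\t" as well as a literal tab character.
        if any("\\t" in p or "\t" in p for p in patterns.values()):
            hits.append((
                element.get("break", "yes"),
                patterns.get("beforebreak", ""),
                patterns.get("afterbreak", ""),
            ))
    return hits


for rule in rules_mentioning_tab(SAMPLE_SRX):
    print(rule)
```

To inspect a downloaded file, read it into a string and pass it to the same function. A rule with break="yes" and a tab pattern would be the first candidate to change or remove.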



Not a problem. And thanks for confirming my findings. Have a nice day!
