In 2012 I posted a question about how to re-segment TMX files that contain several sentences in one Translation Unit (TU, the bilingual items of any TMX file):
Recently, Michael brought this intriguing topic back to my attention. And, as these things go, this morning I got the idea to use a workaround via glossaries, which, BTW, are among the best features that CafeTran offers!
I created the attached glossary for a quick test (note that I've removed the optional trailing punctuation marks and spaces at this stage). From the Memory menu, I selected Import from glossary, et voilà, we're already halfway there:
Any chance that we could get a special alignment mode for this:
First sentence=Eerste zin
Second sentence=Tweede zin
Third sentence=Derde zin
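For anyone who wants to replicate the quick test, here is a minimal Python sketch of how such glossary lines could be generated. It assumes the sentences have already been split on both sides and that the glossary uses the `source=target` line format shown above; the function name `to_glossary_lines` is made up for illustration and is not part of CafeTran.

```python
def to_glossary_lines(source_sentences, target_sentences):
    """Pair already-split sentences into 'source=target' glossary lines.

    Trailing punctuation marks and spaces are removed, as in the quick test.
    Hypothetical helper, not a CafeTran function.
    """
    if len(source_sentences) != len(target_sentences):
        raise ValueError("source and target must have the same number of sentences")
    trailing = " .!?,;:"  # characters stripped from the end of each sentence
    return [
        f"{src.rstrip(trailing)}={tgt.rstrip(trailing)}"
        for src, tgt in zip(source_sentences, target_sentences)
    ]


if __name__ == "__main__":
    lines = to_glossary_lines(
        ["First sentence. ", "Second sentence! ", "Third sentence? "],
        ["Eerste zin. ", "Tweede zin! ", "Derde zin? "],
    )
    print("\n".join(lines))
```

The length check matters: if the two sides don't split into the same number of sentences, the pairs would be wrong, so the sketch refuses to guess.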
I think that this could be really useful, like all things that can be done with glossaries (if you have an open mind).
Second test, with a longer glossary and some punctuation characters and trailing spaces (see the attached glossary if you want to replicate the test):
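The idea is to use regular expressions (which I find really useful, BTW) to insert a delimiter (here a semicolon) in front of each following sentence, i.e. to get from the first line below to the second: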
<tu tuid="1"> <tuv xml:lang="en-GB"><seg>Second sentence! Third sentence? Fourth sentence, </seg> </tuv>
<tu tuid="1"> <tuv xml:lang="en-GB"><seg>Second sentence! ;Third sentence? ;Fourth sentence, </seg> </tuv>
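The transformation between those two lines can be sketched in Python as follows. This is only a rough illustration, not what CafeTran does internally: it assumes a sentence ends with `.`, `!` or `?` followed by whitespace and more text, and the names `SENTENCE_END` and `mark_sentence_breaks` are invented for this example. Abbreviations such as "e.g." would need extra handling.

```python
import re

# Sentence-ending punctuation followed by whitespace and further text.
# Assumption: '.', '!' and '?' are the only sentence terminators.
SENTENCE_END = re.compile(r"([.!?])\s+(?=\S)")


def mark_sentence_breaks(segment: str, delimiter: str = ";") -> str:
    """Insert a delimiter in front of every sentence after the first one."""
    return SENTENCE_END.sub(lambda m: f"{m.group(1)} {delimiter}", segment)


if __name__ == "__main__":
    seg = "Second sentence! Third sentence? Fourth sentence, "
    print(mark_sentence_breaks(seg))
```

Because of the lookahead, trailing spaces at the end of the segment are left alone, so only real sentence boundaries inside the segment get a delimiter.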
DISCLAIMER: If you don't have the flexibility to use glossaries, please ignore this posting.
Tell Michael I will implement the automatic TMX segment split if he translates every second document of his in CT. Deal? :)
Pretty simple, at least in the case above.
>Align using CT's aligner (auto should do) or another aligner
No aligning please.