
How to "align" a big batch of bilingual text files

Does anyone know how to "align" a big batch of bilingual text files that are already aligned? That is, I have hundreds of DE and EN txt files paired like this:

de1.txt = en1.txt

de2.txt = en2.txt

de3.txt = en3.txt

de4.txt = en4.txt


and I want to convert them all into one big TMX.

I therefore don't want the aligner to segment ANYTHING. The alignment is already correct. All I need is line 1 in each file to be matched with line 1 in the corresponding file, line 2 in each file to be matched with line 2 in the corresponding file, etc.


I thought that you could just go ahead and align two folders of files. This seems to be impossible:

Choose two folders:

You get an error message:

Work around:

  • Combine the source files to one big source file first.
  • Then combine the target files to one big target file.
  • Align the two of them.
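The combine step of this workaround could be sketched in Python roughly as follows. This is only a sketch under assumptions: the files are UTF-8, they are named with a language prefix and a number (de1.txt, de2.txt, ...), and the helper names `numeric_key` and `combine` are made up for illustration. The two points that matter are that both sides are concatenated in the same numeric order (so de10.txt does not sort before de2.txt) and that every line ends with a newline, so the last line of one file does not fuse with the first line of the next.

```python
import re
from pathlib import Path

def numeric_key(path):
    # Sort by the number in the file name: de2.txt before de10.txt.
    m = re.search(r"(\d+)", path.stem)
    return int(m.group(1)) if m else -1

def combine(folder, prefix, out_path):
    """Concatenate <prefix>N.txt files from folder into out_path, in numeric order.

    Returns the number of lines written. Write out_path somewhere the
    glob pattern will not pick it up again on a second run.
    """
    files = sorted(Path(folder).glob(f"{prefix}*.txt"), key=numeric_key)
    total = 0
    with open(out_path, "w", encoding="utf-8", newline="\n") as out:
        for f in files:
            # splitlines() drops the line endings; a newline is re-added
            # to every line so the last line of a file is terminated too.
            for line in f.read_text(encoding="utf-8").splitlines():
                out.write(line + "\n")
                total += 1
    return total
```

Running `combine(folder, "de", "combined_de.txt")` and `combine(folder, "en", "combined_en.txt")` should return the same line count for both sides; if the counts differ, at least one file pair is not line-aligned and the big files will drift apart from that point on.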

Perhaps you want to ask Igor to allow the alignment of a folder of files?

Look here:



To combine the files, you can import the results into Excel/LibreOffice Calc as columns and set them side by side. Of course, the txt files must not contain any delimiter character (semicolon, tab) that would split a line into two columns.

Terminal. Of course. Should have thought about that. The OP can use the Windows solution.

Nice one, Torsten!

Thanks guys, but I have never in my life gotten the CT aligner to align anything. I simply don't understand it.

I just tried it with one file pair (of the zillions I have), and it opened it up in some strange window in the tabbed pane. Now what? Do I have to manually click through every segment???

PS: my post on Proz re this:

Thanks FarkasAndras,

But I solved it for now. I realised that in AlignFactory you can set the program to spit out separate TMXs, as well as one big one. So when you run a huge batch job, and the program chokes, the last TMX it spits out will be the one with the problem. The name of this TMX will correspond to the txt file (pair) with the problem. Just skipping this txt file usually allows the project to complete if rerun. I then just convert the single txt file (pair) with the problem into a separate TMX using Heartsome's TMX editor (Tools > Convert to TMX), and then merge it with the AlignFactory TMX.
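For the final merge step, a minimal Python sketch is below. It is not how AlignFactory or Heartsome do it, just an illustration with the standard library. Assumptions: both files are well-formed TMX with a `<body>` of `<tu>` elements and the same language pair, and the function name `merge_tmx` is made up. The header of the first file is kept, and ElementTree does not preserve a DOCTYPE line if the input has one.

```python
import xml.etree.ElementTree as ET

def merge_tmx(main_path, extra_path, out_path):
    """Append every <tu> from extra_path to the body of main_path.

    The result is written to out_path; returns the number of TUs added.
    """
    main_tree = ET.parse(main_path)
    main_body = main_tree.getroot().find("body")
    extra_body = ET.parse(extra_path).getroot().find("body")
    if main_body is None or extra_body is None:
        raise ValueError("both files need a <body> element")
    added = 0
    for tu in extra_body.findall("tu"):
        main_body.append(tu)  # no de-duplication is attempted
        added += 1
    main_tree.write(out_path, encoding="utf-8", xml_declaration=True)
    return added
```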

Indeed: it's never a good idea to merge 100 txt files into a single big one for stuff like this. There is way too much chance of something going wrong, not to mention that merely merging 100 text files of this type is in itself quite a chore, and likely to choke most programs, even EmEditor.

No time for generating .bat files, etc., right now but I do look forward to your future GUI batch mode thingee, as I would love to test it against AlignFactory.

There really should be an easier way to do this though, seeing as how all of these files are effectively already aligned. All I need is for a program to: take the first line of text file de1.txt and match it to the first line of text file en1.txt, and turn this into a TU. Then, it needs to take the second line of text file de1.txt and match it with the second line of text file en1.txt, and turn it into a TU. Then repeat that a few times.
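What is described here (line N of deX.txt paired with line N of enX.txt, one TU per pair) can be sketched in a few lines of Python without any aligner. This is a rough sketch, not a tested tool. Assumptions: the files are UTF-8, they sit in one folder and follow the de1.txt / en1.txt naming from the first post, and the function names and the header values (such as the `creationtool` string) are made up for illustration. A pair whose line counts differ raises an error naming the files instead of silently misaligning.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# ElementTree writes this attribute out as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def read_lines(path):
    # Assumption: UTF-8 text, one segment per line.
    return Path(path).read_text(encoding="utf-8").splitlines()

def add_pair(body, src_path, tgt_path, src_lang="de", tgt_lang="en"):
    src, tgt = read_lines(src_path), read_lines(tgt_path)
    if len(src) != len(tgt):
        raise ValueError(
            f"{src_path}: {len(src)} lines, {tgt_path}: {len(tgt)} lines")
    for s, t in zip(src, tgt):
        if not s.strip() and not t.strip():
            continue  # skip lines that are empty on both sides
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, s), (tgt_lang, t)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text

def build_tmx(folder, out_path, src_lang="de", tgt_lang="en"):
    """Pair <src_lang>N.txt with <tgt_lang>N.txt line by line; return TU count."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "line-pair-sketch", "creationtoolversion": "0.1",
        "segtype": "sentence", "o-tmf": "plaintext", "adminlang": "en",
        "srclang": src_lang, "datatype": "plaintext"})
    body = ET.SubElement(tmx, "body")
    for src_path in sorted(Path(folder).glob(f"{src_lang}*.txt")):
        # de1.txt -> en1.txt: swap the language prefix, keep the rest.
        tgt_path = src_path.with_name(tgt_lang + src_path.name[len(src_lang):])
        if tgt_path.exists():
            add_pair(body, src_path, tgt_path, src_lang, tgt_lang)
    ET.ElementTree(tmx).write(out_path, encoding="utf-8", xml_declaration=True)
    return len(body)
```

Calling `build_tmx("my_folder", "all.tmx")` would then produce one TMX for the whole batch; "my_folder" and "all.tmx" are placeholder names. Source files without a matching target file are skipped silently in this sketch, which may or may not be what you want.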

PS: this is what I'm currently working on:
