Resegment XLIFF from paragraph to sentence


If your goal is simply to segment the input XLIFF file, you only need a simple pipeline that: 1) reads the file, 2) segments the source, and 3) writes the file back.

There is no need to create a translation kit, leverage existing translations, or do anything else.

The pipeline you are using creates a translation kit (T-Kit). That is why you get a .xlf.xlf output with different XLIFF data: your XLIFF file was extracted into the T-Kit's own XLIFF file.


In your case, you just need a pipeline with these three steps:


1) RawDocumentToFilterEvents

2) Segmentation

3) FilterEventsToRawDocument
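The three steps above form a chain in which each step consumes the output of the previous one. The Python sketch below only illustrates that shape and is not Okapi's API. `TextUnit`, `read_units`, `segment_units`, `write_units`, and `run_pipeline` are hypothetical names. The "raw document" is simplified to a dict mapping unit IDs to source text, and a naive regex split stands in for real SRX rules.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class TextUnit:
    """Hypothetical stand-in for one XLIFF trans-unit travelling through the pipeline."""
    unit_id: str
    source: str
    segments: list[str] = field(default_factory=list)


def read_units(doc: dict[str, str]) -> Iterator[TextUnit]:
    """Step 1 (reader): turn the raw document into a stream of text units."""
    for unit_id, source in doc.items():
        yield TextUnit(unit_id, source)


def segment_units(units: Iterable[TextUnit]) -> Iterator[TextUnit]:
    """Step 2 (segmenter): split each source. A naive regex stands in for SRX rules."""
    for unit in units:
        unit.segments = re.split(r"(?<=[.?!])\s+", unit.source.strip())
        yield unit


def write_units(units: Iterable[TextUnit]) -> dict[str, list[str]]:
    """Step 3 (writer): collect the results back into an output document."""
    return {unit.unit_id: unit.segments for unit in units}


def run_pipeline(doc: dict[str, str]) -> dict[str, list[str]]:
    # No translation kit, no leveraging, no extra steps: read, segment, write.
    return write_units(segment_units(read_units(doc)))
```

For example, `run_pipeline({"1": "First sentence. Second one!", "2": "Only one."})` gives `{"1": ["First sentence.", "Second one!"], "2": ["Only one."]}`.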


In the segmentation options:

- Choose to segment the source, and select the SRX file to use.

- The other defaults are likely fine.
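An SRX file contains two kinds of rules: break rules and no-break (exception) rules. Each rule has one regex for the text before a candidate position and one for the text after it, and the first rule that matches at a position decides whether to break there. The sketch below mimics that evaluation order in Python. It is not the segmenter Okapi uses, it ignores SRX language maps, and the two rules in `RULES` are hypothetical examples.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """One SRX-style rule: is_break=True means 'break here', False is an exception."""
    is_break: bool
    before: str  # regex that must match the text ending at the candidate position
    after: str   # regex that must match the text starting at the candidate position


# Hypothetical rules. The "Mr." exception must come before the generic break rule,
# because the first matching rule wins.
RULES = [
    Rule(False, r"\bMr\.", r"\s"),
    Rule(True, r"[.?!]+", r"\s+\S"),
]


def segment(text: str, rules: list[Rule]) -> list[str]:
    # Anchor each "before" pattern at the end of the text preceding the position.
    compiled = [(r.is_break, re.compile(f"(?:{r.before})$"), re.compile(r.after))
                for r in rules]
    breaks = []
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            if before.search(text[:pos]) and after.match(text, pos):
                if is_break:
                    breaks.append(pos)
                break  # the first matching rule decides this position
    # Cut at the break positions; whitespace stays with the following segment.
    segments, start = [], 0
    for pos in breaks:
        segments.append(text[start:pos])
        start = pos
    segments.append(text[start:])
    return segments
```

With these rules, `segment("Hello Mr. Smith. How are you? Fine.", RULES)` returns `["Hello Mr. Smith.", " How are you?", " Fine."]`. The exception rule prevents a break after "Mr.", and because the segments are exact slices, joining them gives back the original text.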


Make sure your languages and encoding are set properly, then execute the pipeline.


The *.out.xlf files it creates should have your entries segmented.


Note that, by default, entries that have a single segment are output without segment markers.

This is because XLIFF doesn’t have a way to indicate if an entry with a single segment has been segmented or not.
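In XLIFF 1.2, a segmented source is written as a `<seg-source>` element whose `<mrk mtype="seg">` markers delimit each segment. The sketch below uses Python's `xml.etree.ElementTree` with a hypothetical `build_trans_unit` helper and omits the XLIFF namespace. It shows the single-segment case: the helper writes only the plain `<source>`, so a single segment looks exactly like an unsegmented entry. This is not Okapi's writer. A real writer must also handle inline codes, targets, and other details.

```python
import xml.etree.ElementTree as ET


def build_trans_unit(unit_id: str, segments: list[str]) -> ET.Element:
    """Build an XLIFF 1.2-style trans-unit (namespace omitted for brevity).

    `segments` are assumed to join back into the original source text, with any
    whitespace between segments at the start of the following segment.
    """
    tu = ET.Element("trans-unit", id=unit_id)
    ET.SubElement(tu, "source").text = "".join(segments)

    # A single segment: nothing would distinguish "segmented" from
    # "not segmented", so no <seg-source>/<mrk> markers are written.
    if len(segments) < 2:
        return tu

    seg_source = ET.SubElement(tu, "seg-source")
    previous = None
    for mid, segment in enumerate(segments, start=1):
        content = segment.lstrip()
        gap = segment[: len(segment) - len(content)]  # whitespace kept outside the markers
        if previous is None:
            seg_source.text = gap or None
        else:
            previous.tail = gap or None
        previous = ET.SubElement(seg_source, "mrk", mtype="seg", mid=str(mid))
        previous.text = content
    return tu
```

For two segments, `ET.tostring(build_trans_unit("1", ["Hello Mr. Smith.", " How are you?"]), encoding="unicode")` produces a `<seg-source>` with two `<mrk>` elements. For `["Fine."]` it produces only `<source>Fine.</source>`.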


I’ve attached the output as an example.

