The best way to access the source-language text in image files is to process them with Optical Character Recognition (OCR) software. Some OCR programs produce MS Word documents from image files, but such documents are often split into numerous differently formatted fragments. For example, parts of the text can differ in text color shade, background, or font size, or have unusual spacing. This results in a large number of formatting tags, which are difficult to ignore. CafeTran has a special filter for MS Word documents produced by OCR, which clears the source text of unnecessary formatting tags.
- When creating a new translation project, choose the "Ms Word OCR (*.docx/xml)" option from the "File Type" drop-down list in the "Project configuration" panel.
- Click the Finish button.
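
To see why an OCR-generated document tends to produce so many tags, it can help to count the formatting runs inside the .docx file before importing it. The following is a minimal Python sketch for that purpose only; it is not CafeTran's filter and does not show how the filter works. The function names `run_stats` and `docx_run_stats` and the `SAMPLE` fragment are made up for illustration. It assumes the standard .docx layout, where the main text is stored in `word/document.xml` and each differently formatted stretch of text is a separate `w:r` (run) element.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by .docx files.
NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}


def run_stats(document_xml):
    """Return (paragraphs, runs, runs_with_formatting) for a document.xml body."""
    root = ET.fromstring(document_xml)
    paragraphs = root.findall(".//w:p", NS)
    runs = root.findall(".//w:r", NS)
    # A run carrying its own w:rPr element has run-level formatting.
    formatted = [r for r in runs if r.find("w:rPr", NS) is not None]
    return len(paragraphs), len(runs), len(formatted)


def docx_run_stats(path):
    """Read word/document.xml from a .docx file and return its run statistics."""
    with zipfile.ZipFile(path) as archive:
        return run_stats(archive.read("word/document.xml"))


# Hypothetical fragment imitating OCR output: one short phrase split into
# three runs because of slightly different font sizes and colors.
SAMPLE = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:p>"
    '<w:r><w:rPr><w:sz w:val="21"/></w:rPr><w:t>Opti</w:t></w:r>'
    '<w:r><w:rPr><w:sz w:val="22"/><w:color w:val="1A1A1A"/></w:rPr><w:t>cal </w:t></w:r>'
    "<w:r><w:t>text</w:t></w:r>"
    "</w:p></w:body></w:document>"
)

if __name__ == "__main__":
    print(run_stats(SAMPLE))
```

A high ratio of runs to paragraphs in a file checked with `docx_run_stats("scan.docx")` (the file name is a placeholder) suggests the document is a good candidate for the "Ms Word OCR (*.docx/xml)" file type.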