Please! Help the German Com-Munity! Crazy Binde-Striche

Would it be possible to introduce a way of dealing with the 'crazy Bindestriche' (superfluous hard hyphens) in words like:

Folgende Export-Möglichkeiten stehen zur Verfügung:

When your glossary contains Exportmöglichkeiten?

I'm not asking for any fuzziness here, I'd just appreciate an exact solution very much. This is really slowing me down.

I also see this a lot:

Folgende Export- Möglichkeiten stehen zur Verfügung:

It's even worse. After posting the previous message, I got this segment:

Folgende Export -Möglichkeiten stehen zur Verfügung:

Yes, this is all a matter of Schlampigkeit. But that's not relevant here. The fact is that this sloppiness is slowing me down unnecessarily. The introduction of a switch:

[x] Ignore hyphen and leading/trailing space in glossary entries

would be a great improvement for me.

I'm just afraid that this would require a complete overhaul of the glossary code.

The underlying question is: To what degree should a CAT tool cater for human errors and sloppiness?

How about this one:

die Standard Maschinen - Konfiguration

Oh man ...

Trying to think outside of the box, like Jost just said in his Tool Box Journal, I am wondering if this kind of thing could also be dealt with prior to importing the document into CafeTran. That is, try to fix this in the source file, like this interesting Star tool:

"Other Star products include the FormatChecker. This tool checks about 50 different potential errors in Word or FrameMaker documents, ranging from typographical errors to duplicated spaces, paragraph marks, manual references, and many others. The intention is to create well-formed documents before the translation even starts, thus aiming at a better return on translation memory matches and/or better entry of data into the translation memory." (taken from Jost's latest TB Journal)

I usually do quite a bit of cleaning before document import, a lot of it in TransTools, so I will have a look whether it already contains any functions relating to spurious hyphens. And if not, I'll ask Stanislav if he has any clever ideas.

As long as you create the projects yourself, this is indeed possible. It is also possible to do a find and replace (F/R) on the source in the grid. Both approaches have one big disadvantage, even two: the XLIFF gets distorted, and you'll have to do exactly the same replacements in follow-up projects. So a non-fuzzy but Bindestrich-tolerant glossary would be optimal.

I can imagine that other languages have similar needs. How about the l' and d' apostrophe, for instance? Or z versus s spellings in English?

Hello Hans,

This binde-striche case seems like a good candidate for a regular expression:



Hello Igor,

That would require inserting regular expressions in my big glossary. Would it be possible to work the other way around: if a dash is spotted inside a word, followed by an uppercase letter (preceded by zero or one space), CafeTran handles this like the "Do not match" category and ignores the dash, while the following letter is interpreted as lowercase:

Source segment:


Found in glossary as:


Hello Hans,

I will explore how to solve it although it might involve introducing some limited fuzziness to glossary matching.
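
The rule proposed above (a dash inside a word, followed by an uppercase letter, is ignored and the letter is read as lowercase) can be sketched without any fuzziness as a plain normalization step. This is only an illustration in Python, not CafeTran's code: the function name `normalize_hyphens` is hypothetical, and allowing at most one space on either side of the hyphen is an assumption made to cover the variants quoted earlier in the thread.

```python
import re

# A hyphen that follows a word character and precedes an uppercase letter,
# with at most one space on either side of the hyphen (assumption).
_SPURIOUS_HYPHEN = re.compile(r"(?<=\w) ?- ?([A-ZÄÖÜ])")


def normalize_hyphens(segment: str) -> str:
    """Drop the hyphen (and adjacent space) and lowercase the letter after it."""
    return _SPURIOUS_HYPHEN.sub(lambda m: m.group(1).lower(), segment)


if __name__ == "__main__":
    for seg in (
        "Folgende Export-Möglichkeiten stehen zur Verfügung:",
        "Folgende Export- Möglichkeiten stehen zur Verfügung:",
        "Folgende Export -Möglichkeiten stehen zur Verfügung:",
    ):
        # An exact glossary lookup can then run on the normalized text.
        print("Exportmöglichkeiten" in normalize_hyphens(seg))
```

Note that such a rule also rewrites legitimate hyphens (E-Mail would become Email) and a spaced dash before a capitalized word, which is one reason to keep it switchable per project. A suspended hyphen followed by a lowercase word, as in "Export- und Importmöglichkeiten", is left alone.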


Hello Igor,

That would be great! In manuals where clients want to write Maschinen-Bau-Gesellschaft, I can activate the limited fuzziness on the fly via the context menu :). In all other manuals I just leave it turned off.



What would be a good name for this glossary context menu item? "Resolve hyphens"? "Ignore internal dashes"? "Dump Binda-Stricha"?

Actually, if you add the - character to the "Do not match" list, both BindeStriche and Binde-Striche will produce a match from your glossary. As for the careless "Binde - Striche" (with spaces around the -), you might do a Find and Replace in the Project Source segments scope.


I didn't know that! I thought that this category of characters was meant for the TM only.

Anyway, I guess that this means that I cannot have the glossary set to Match case?

Hi Hans,

Do you mean you didn't know that the "Do not match" list applies to TXT glossaries? I always assumed it applies to TMs (used for segments and/or terms), and TXT glossaries.

I am puzzled, though, how this might affect the ability to make a TXT glossary case sensitive. How are the two related?
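
One way the two settings could interact, sketched in Python under stated assumptions: this is not how CafeTran is known to work internally, and `strip_ignored`, `term_found` and the `ignored` parameter are hypothetical stand-ins for a "Do not match" list. If ignored characters are simply deleted before comparison, Export-Möglichkeiten becomes ExportMöglichkeiten, which still differs from the glossary entry Exportmöglichkeiten by the capital M, so a case-sensitive lookup would miss it.

```python
def strip_ignored(text: str, ignored: str = "-") -> str:
    """Delete every character listed in `ignored` from `text`."""
    return text.translate(str.maketrans("", "", ignored))


def term_found(term: str, segment: str, match_case: bool, ignored: str = "-") -> bool:
    """Substring lookup after deleting ignored characters from both sides."""
    t = strip_ignored(term, ignored)
    s = strip_ignored(segment, ignored)
    if not match_case:
        # casefold() is a more aggressive lower() intended for caseless matching.
        t, s = t.casefold(), s.casefold()
    return t in s


if __name__ == "__main__":
    segment = "Folgende Export-Möglichkeiten stehen zur Verfügung:"
    print(term_found("Exportmöglichkeiten", segment, match_case=True))
    print(term_found("Exportmöglichkeiten", segment, match_case=False))
```

Under these assumptions the first lookup fails and the second succeeds, which would explain why ignoring the hyphen alone is not enough when Match case is on.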


How about this one:



Next Segment:


That would have required either a dash or a better construction (Mindestdurchmesser der Folienrolle, minimale F. etc.)

This proves that the placing of dashes is totally arbitrary and, er, crazy.