Anschlußdose <> Anschlussdose

Glossary contains Anschlußdose (old spelling)

Project contains Anschlussdose (new spelling)

No hit found. I've installed this version of the German Hunspell:

de_DE_comb.dic contains ca. 597 thousand words.

de_DE_comb.dic contains German words for old and new spelling

Further investigation needed ...

It doesn't work in either direction: neither with ß in the glossary and ss in the source segment, nor the other way around.

Igor "cheats." For this new feature, he doesn't use the Hunspell spellchecker to check the spelling, nor the tokeniser to find stems; he uses the above to arrive at "basic words": words as they occur in the dictionary, i.e. lemmata. Anschlußdose is not a lemma in the Hunspell *.dic, so if the text contains Anschlussdose, CT will never come up with Anschlußdose.

The same goes for your previous wrong examples. For "Bäumen", the entry for "Baum" is auto-assembled, whereas the entry for "Bäume" would have been correct. CT cannot find the plural, because it's not a lemma in the *.dic, nor in any *.dic. And to wrap it up: Schüler is a lemma, and you won't find Schule for it, because CT finds lemmata, not stems.

All of this makes sense, because your resource will most likely contain (at least) the "basic words", just like dictionaries do. The idea I launched earlier was to use Hunspell to generate all possible declensions/conjugations of all lemmata and add them to your resource. No need to implement it in CT. CT's version is for the lazy ones, and for the ones who understand how it works...
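
The reduction behaviour described here can be illustrated with a toy model. This is a minimal sketch, not CT's code: it assumes a simplified version of Hunspell's affix mechanism, with a hypothetical mini-dictionary DIC (lemma to flags, like a "Baum/P" line in a *.dic) and a hypothetical affix table AFF of suffix rules (strip, add) that omits the condition patterns real *.aff rules carry. The functions lemmatize, inflect, all_forms and expand_glossary are illustrative names. The first part shows why reducing to lemmata yields Baum for Bäumen, nothing for Anschlussdose, and Schüler (not Schule) for Schülern; the second part sketches the idea of generating all forms of each lemma and adding them to the glossary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SuffixRule:
    """Simplified Hunspell-style SFX rule: remove `strip` from the lemma, append `add`."""
    strip: str
    add: str


# Hypothetical toy data, NOT taken from de_DE_comb.dic / .aff.
# Lemma -> affix flags it accepts, like "Baum/P" in a *.dic file.
DIC = {"Baum": {"P"}, "Anschlußdose": {"N"}, "Schüler": {"N"}, "Schule": {"N"}}
# Flag -> suffix rules (real *.aff rules also carry a condition pattern).
AFF = {
    "P": [SuffixRule("aum", "äume"), SuffixRule("aum", "äumen"), SuffixRule("", "es")],
    "N": [SuffixRule("", "n")],
}


def lemmatize(word: str) -> set:
    """Reduce a text word to the dictionary lemmata it can be derived from."""
    found = {word} if word in DIC else set()
    for flag, rules in AFF.items():
        for rule in rules:
            if rule.add and word.endswith(rule.add):
                candidate = word[: -len(rule.add)] + rule.strip
                # Accept the candidate only if it is a lemma carrying this flag.
                if flag in DIC.get(candidate, set()):
                    found.add(candidate)
    return found


def inflect(lemma: str, rule: SuffixRule) -> Optional[str]:
    """Apply one suffix rule to a lemma (forward generation)."""
    if not lemma.endswith(rule.strip):
        return None
    return lemma[: len(lemma) - len(rule.strip)] + rule.add


def all_forms(lemma: str) -> set:
    """All forms the toy affix table generates for a lemma, including the lemma."""
    forms = {lemma}
    for flag in DIC.get(lemma, set()):
        for rule in AFF[flag]:
            form = inflect(lemma, rule)
            if form is not None:
                forms.add(form)
    return forms


def expand_glossary(glossary: dict) -> dict:
    """Add every generated form of each source term, pointing to the same target."""
    expanded = dict(glossary)
    for source, target in glossary.items():
        for form in all_forms(source):
            expanded.setdefault(form, target)
    return expanded


print(lemmatize("Bäumen"))         # {'Baum'}
print(lemmatize("Anschlussdose"))  # set()
print(lemmatize("Schülern"))       # {'Schüler'}
print(sorted(all_forms("Baum")))   # ['Baum', 'Baumes', 'Bäume', 'Bäumen']
```

Note that expand_glossary copies the target term unchanged, so the glossary gains inflected source forms but no matching target forms.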



Just had my first useful fuzzy match auto-assembled (AA'd):

The translation of 'Schwingungsmessungen', where my glossary only contained the singular.

BTW: I'll replicate the Anschlussdose scenario in Transit, with its morphological reduction.

After work. In my playing time.

The translation of 'Schwingungsmessungen', where my glossary only contained the singular.

I wonder if that's possible. Igor wrote: "The function performs the look-up of word stems defined in Hunspell spell-check dictionaries." I looked it up, and there's no lemma Schwingungsmessung in the Hunspell *.dic. That would mean that CT searches not only the Hunspell *.dic but also the user resources. I doubt that.
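
Looking up whether a word is a lemma in a *.dic can be scripted. A minimal sketch, assuming the basic *.dic layout (first line an approximate entry count, then one entry per line as word or word/FLAGS, optionally followed by whitespace-separated morphological fields) and ignoring escaped slashes. The function name read_lemmata and the sample entries are hypothetical, not copied from de_DE_comb.dic.

```python
def read_lemmata(dic_text: str) -> set:
    """Collect the bare lemmata from the text of a Hunspell *.dic file."""
    lemmata = set()
    for line in dic_text.splitlines()[1:]:  # line 1 is the approximate entry count
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # "word/FLAGS" -> "word"; entries without flags have no slash
        word, _, _flags = fields[0].partition("/")
        lemmata.add(word)
    return lemmata


# Hypothetical sample entries.
sample = "3\nBaum/P\nSchule/N\nSchüler/N\n"
lemmata = read_lemmata(sample)
print("Schwingungsmessung" in lemmata)  # False
print("Baum" in lemmata)                # True
```

For a real file, read it with the encoding named on the SET line of the matching *.aff. Only lemmata are listed this way; inflected forms exist only through the *.aff rules.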

... with its morphological reduction.

Igor, would it be possible to get a special colour for glossary entries that have been auto-assembled because of fuzzy matching, as a warning?

... with its morphological reduction.


As I already explained in 2005:

And as for the test with Anschlussdose, it is indeed recognised when Anschlußdose is in the dictionary:

  • Anschlußdose: If you use the algorithms of a spellchecker, you cannot expect to arrive at a misspelled lemma
  • If you use morphological reduction for Anschlussdose, you cannot expect to arrive at Anschlußdose, unless there's a rule for it
  • If you "reduce" Bäumen, you cannot expect to arrive at the plural, unless the algorithm checks your resources after each and every reduction
  • Now how did you arrive at Schwingungsmessung again?
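
The last three points can be illustrated with a toy lookup that does what, according to this thread, CT does not do: it checks the glossary after each reduction, and it can optionally apply a rule mapping ss to ß and back. This is a minimal sketch under stated assumptions: SUFFIXES is a hypothetical handful of German endings rather than real Hunspell rules, the ss/ß rule is a naive string replacement, and lookup is an illustrative name, not a CT function. It also returns a flag telling whether the hit was fuzzy, the kind of information a separate warning colour could be keyed on.

```python
from typing import Optional, Tuple

# Hypothetical German endings to strip; a real implementation would use *.aff rules.
SUFFIXES = ["en", "n", "e", "s"]


def spelling_variants(word: str, fold_eszett: bool) -> set:
    """The word itself plus, if enabled, naive ss->ß and ß->ss variants."""
    variants = {word}
    if fold_eszett:
        variants.add(word.replace("ss", "ß"))
        variants.add(word.replace("ß", "ss"))
    return variants


def lookup(word: str, glossary: dict, fold_eszett: bool = False) -> Optional[Tuple[str, bool]]:
    """Return (glossary term, is_fuzzy), or None if nothing matches."""
    if word in glossary:
        return word, False  # exact hit
    # The unreduced word (for spelling variants) plus one reduction per suffix.
    candidates = [word] + [
        word[: -len(s)] for s in SUFFIXES if word.endswith(s) and len(word) > len(s)
    ]
    for candidate in candidates:
        # Check the glossary after each and every reduction.
        for variant in sorted(spelling_variants(candidate, fold_eszett)):
            if variant in glossary:
                return variant, True
    return None


glossary = {"Schwingungsmessung": "vibration measurement",
            "Anschlußdose": "junction box", "Bäume": "trees"}
print(lookup("Schwingungsmessungen", glossary))               # ('Schwingungsmessung', True)
print(lookup("Bäumen", glossary))                             # ('Bäume', True)
print(lookup("Anschlussdose", glossary))                      # None
print(lookup("Anschlussdose", glossary, fold_eszett=True))    # ('Anschlußdose', True)
```

Because the glossary is consulted after every reduction, the plural Bäume is found for Bäumen. The price is that crude stripping also produces non-words (Bäum, Schwingungsmessunge) that are looked up in vain and may occasionally cause false hits.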


