Hi Hans,
TM-Town currently harvests multi-word terms by searching for n-grams (bi-grams, tri-grams, 4-grams), extracting those that occur at a high frequency, and then filtering out the redundant ones (i.e. removing a bi-gram that is already part of a larger tri-gram).
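To make that concrete, here is a rough sketch of the general idea in Python. It is only an illustration, not TM-Town's actual code: the function names, the `min_count` threshold, and the whitespace tokenization are all assumptions made for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    span = len(shorter)
    return any(longer[i:i + span] == shorter
               for i in range(len(longer) - span + 1))


def extract_terms(text, min_count=2, sizes=(2, 3, 4)):
    """Hypothetical helper: frequent n-grams minus those inside a longer one."""
    # Assumption: naive lowercase + whitespace tokenization, no punctuation handling.
    tokens = text.lower().split()

    # Count every bi-gram, tri-gram and 4-gram.
    counts = Counter()
    for n in sizes:
        counts.update(ngrams(tokens, n))

    # Keep only the n-grams that occur often enough.
    frequent = {gram for gram, count in counts.items() if count >= min_count}

    # Drop any n-gram that is part of a longer frequent n-gram.
    kept = [
        " ".join(gram)
        for gram in frequent
        if not any(len(other) > len(gram) and contains(other, gram)
                   for other in frequent)
    ]
    return sorted(kept)


sample = ("translation memory tool is a translation memory tool "
          "and machine translation helps machine translation")
print(extract_terms(sample))
# ['machine translation', 'translation memory tool']
```

In this sample "translation memory" and "memory tool" are both frequent, but they are dropped because the frequent tri-gram "translation memory tool" already covers them.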
I'd be interested to learn more about how you would use a regex. One concern I have with a regex solution is that it is difficult to make it language-independent. In other words, it might work fine for European languages, but it will probably fail badly for Asian languages that do not put spaces between words.
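A small Python illustration of that concern, as a sketch only: the `word_tokens` helper and the sample sentences are made up for the example, and the pattern is just one simple way a regex might pick out words.

```python
import re


def word_tokens(text):
    """Hypothetical helper: split text into 'words' using a simple regex."""
    # \w+ matches runs of letters/digits; with str patterns it is Unicode-aware.
    return re.findall(r"\w+", text)


english = "translation memory is useful"
japanese = "翻訳メモリは便利です"  # roughly the same sentence, written without spaces

print(word_tokens(english))        # ['translation', 'memory', 'is', 'useful']
print(len(word_tokens(japanese)))  # 1
```

The English sentence splits into four words, while the Japanese sentence comes back as a single token because nothing in the text marks where one word ends and the next begins. A pattern built on word boundaries therefore has nothing to work with unless the text is segmented first.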
TM-Town can still get much, much better at extracting multi-word terms, so I'm open to any and all ideas.
Kevin
Hello Hans,
Thanks, all great points! I definitely plan to spend more time soon on improving the term extraction in TM-Town. I've already made some adjustments over the past few days based on your email feedback.
If you are willing, it would be great if you could send me a small sample doc in your language of choice (maybe German) and a second document with the terms that you would like/expect to see extracted. That would give me a baseline to test against.
I know you are busy, so no worries if you can't, but much appreciated if possible.
Cheers,
Kevin
Hi Hans,
You mean like the "new" feature in DVX3 (which CafeTran has already had for quite some time)?
See:
(1) http://helpdesk.atril.com/index.php?pid=knowledgebase&cmd=viewentclient&id=242 ("Déjà Vu X3 now supports using Regular Expressions in the Search and Replace dialogs.") +
(2) https://www.youtube.com/watch?v=l2M8b_zyewk&feature=youtu.be ("Extract terminology with Déjà Vu X3 using Regular Expressions")
Hans CafeTran Wiki