Hi Hans,
You mean like the "new" feature in DVX3 (which CafeTran has already had for quite some time)?
See:
(1) http://helpdesk.atril.com/index.php?pid=knowledgebase&cmd=viewentclient&id=242 ("Déjà Vu X3 now supports using Regular Expressions in the Search and Replace dialogs.") +
(2) https://www.youtube.com/watch?v=l2M8b_zyewk&feature=youtu.be ("Extract terminology with Déjà Vu X3 using Regular Expressions")
Hi Hans,
TM-Town currently harvests multi-word terms by searching for n-grams (bigrams, trigrams, 4-grams), keeping those that occur at a high frequency, and filtering out redundant ones (i.e. removing a bigram that is part of a larger trigram).
I'd be interested to learn more about how you would use a regex. One concern I have with a regex solution is that it is difficult to make it language-independent. In other words, it might work fine for European languages, but it will probably fail badly for Asian languages, which do not separate words with spaces.
TM-Town can still get much, much better at extracting multi-word terms, so I'm open to any and all ideas.
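To make the steps above concrete, here is a minimal sketch in Python. It is not TM-Town's actual code: the function name `extract_terms`, the `min_count` threshold, the regex tokenizer, and the exact filtering rule (drop a shorter n-gram whenever it appears inside a longer n-gram that also passed the threshold) are all assumptions made for illustration.

```python
import re
from collections import Counter


def extract_terms(text, min_count=2, sizes=(2, 3, 4)):
    """Return {term: count} for frequent n-grams (hypothetical helper)."""
    # Assumption: a naive tokenizer that lowercases the text and takes runs
    # of word characters. It ignores sentence boundaries, so n-grams may
    # span two sentences.
    tokens = re.findall(r"\w+", text.lower())

    # Count every bigram, trigram and 4-gram.
    counts = Counter()
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1

    # Keep only n-grams that occur at least min_count times.
    frequent = {gram: c for gram, c in counts.items() if c >= min_count}

    def contained_in(short, long):
        # True if `short` occurs as a contiguous run inside `long`.
        k = len(short)
        return any(long[i:i + k] == short for i in range(len(long) - k + 1))

    # Drop an n-gram if it is part of a longer frequent n-gram.
    kept = {}
    for gram, c in frequent.items():
        subsumed = any(
            len(other) > len(gram) and contained_in(gram, other)
            for other in frequent
        )
        if not subsumed:
            kept[" ".join(gram)] = c
    return kept
```

With this rule, a text that repeats "machine translation engine" yields only the trigram, because the bigrams "machine translation" and "translation engine" are both contained in it. A stricter variant could keep a bigram when it occurs noticeably more often than the trigram that contains it.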
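A short Python sketch of that concern, using a Japanese sentence as the example. The helper names `word_tokens` and `char_ngrams` are made up for illustration, and character n-grams are only one possible fallback, not something TM-Town is known to use.

```python
import re


def word_tokens(text):
    # Word-level tokens: runs of Unicode word characters.
    return re.findall(r"\w+", text)


def char_ngrams(text, n=2):
    # Fallback for scripts without spaces: n-grams over characters,
    # skipping punctuation and whitespace.
    chars = [c for c in text if c.isalnum()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


english = word_tokens("translation memory is useful")
japanese = word_tokens("翻訳メモリは便利です。")
```

Here `english` contains four separate words, while `japanese` contains a single token covering the whole sentence, so word-level n-grams have nothing to work with. Character n-grams (or a language-specific segmenter) would be needed before the same frequency counting can be applied.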
Kevin
Hello Hans,
Thanks, all great points! I definitely plan to spend more time soon on improving the term extraction in TM-Town. I've already made some adjustments over the past few days based on your email feedback.
If you are willing, it would be great if you could send me a small sample doc in your language of choice (maybe German) and a second document with the terms that you would like/expect to see extracted. This way I have a base to test against.
I know you are busy, so no worries if you can't, but much appreciated if possible.
Cheers,
Kevin