
Q: Excluding short fragments

Hi,


Is it possible to exclude "short" TM fragments from the search results (e.g., fragments shorter than five characters)? I changed several settings on the Preference > Memory pane, but didn't get the results I expected.


[Screenshots: search results; match results]

Masato: Is it possible to exclude "short" TM fragments from the search results


Interesting question. I don't think you can, and though it seems useful at first sight, maybe it's not.


H.

Thank you!

 

But you can prevent auto-assembly by raising the AA insert threshold (and perhaps the Fuzzy Match insert threshold; both are set to 90% here). That way, the Automatic Workflow will still find all terms in the TM for the Fragments entry, but they won't be inserted.


H.

Yes. But the reason I asked is that CT sometimes takes a long time to show the results because my TM is so large (what you'd call a Big Mama).

So I should do preliminary matching and raise the threshold values, as you advised, to speed up the workflow.

Thanks,

 

M: because of the large size of my TM (which you'd prefer to call Big Mama).


That, you can now solve by importing it into the database and letting CT create a TM with only the relevant TUs...


H.

Hi,

If you have time, please teach me about Total Recall.

It's about an external DB, isn't it?

What type of DB is it? A third-party rental server? Or a local server (e.g., one created with Mac OS X Server)?

 

Masato: If you have time, please teach me about Total Recall


See Solutions


My short version:


CafeTran comes with an Igor-defined database, located inside the CT package at /Applications/CafeTran.app/Contents/Java/resources/databases/SQLiteMemoryBase.db (OS X path).


If you delete it, CT will create a new, empty database. It's an SQLite database, although I think you can still use the H2 database from Igor's older version. Not recommended.


Using the menu, you can add tables to it. A new table is indexed first, which may take a while. But after that, a database search is blisteringly fast, much faster than searching a TMX memory, even though the TMX is loaded into RAM and the database isn't.
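To make the "index first, then search fast" point a little more concrete, here is a minimal Python sketch using the standard `sqlite3` module. It is an illustration of the general principle, not CafeTran's actual schema: the `tu` and `word_index` tables and the `tokenize`, `build_table` and `search` helpers are hypothetical names. The one-off `CREATE INDEX` is the slow step; afterwards a word lookup goes through the index instead of scanning every translation unit.

```python
import re
import sqlite3


def tokenize(text):
    # Simplistic word splitter (an assumption; real tokenization is more involved).
    return set(re.findall(r"\w+", text.lower()))


def build_table(conn, units):
    """Store (source, target) pairs plus a word -> TU index over the source side."""
    conn.execute("CREATE TABLE tu (id INTEGER PRIMARY KEY, source TEXT, target TEXT)")
    conn.execute("CREATE TABLE word_index (word TEXT, tu_id INTEGER)")
    for source, target in units:
        cur = conn.execute("INSERT INTO tu (source, target) VALUES (?, ?)", (source, target))
        conn.executemany(
            "INSERT INTO word_index (word, tu_id) VALUES (?, ?)",
            [(w, cur.lastrowid) for w in tokenize(source)],
        )
    # One-off indexing step: slow on a large table, but afterwards lookups
    # no longer have to scan every row.
    conn.execute("CREATE INDEX idx_word ON word_index (word)")
    conn.commit()


def search(conn, word):
    """Return (id, source, target) for every TU whose source contains `word`."""
    return conn.execute(
        "SELECT tu.id, tu.source, tu.target FROM word_index "
        "JOIN tu ON tu.id = word_index.tu_id "
        "WHERE word_index.word = ? ORDER BY tu.id",
        (word.lower(),),
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_table(conn, [
    ("The cat sat on the mat.", "De kat zat op de mat."),
    ("The dog barked.", "De hond blafte."),
    ("A cat and a dog.", "Een kat en een hond."),
])
print(search(conn, "cat"))
```

The table here lives in memory only for the demo; pointing `sqlite3.connect` at a file path instead would give a persistent database like the one described above.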




That alone would have been useful enough, but Igor also added the Recall Segments function. The idea is simple and brilliant: CT takes all the words (minus stopwords, I suppose) of the document you need to translate, and writes every segment* in the table that contains one of those words to a TMX file, effectively shrinking your resource to what the project needs.


So if your Big Mama has become too large to use, you can upload it to the database as a table, and Recall Segments will spit it out again as a TM for the current project only. That TM will of course be a lot smaller than the Big Mama, so you can use it again in the Automatic Workflow for auto-assembly. At the end of the project, you should import the ProjectTM into the table.


*Default maximum number of segments: 100
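The Recall Segments idea, as described above, can be sketched the same way. This is a guess at the principle, not Igor's implementation: the stopword list, the `tu`/`word_index` tables, the helper names, and reading the 100-segment default as a cap on the total are all assumptions. The sketch collects every TU that shares at least one non-stopword with the document and writes them to a minimal TMX 1.4 file with `xml.etree.ElementTree`.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical stopword list; CafeTran's actual list (if any) is not documented here.
STOPWORDS = {"a", "an", "and", "is", "of", "on", "the", "to"}
# Mirrors the default maximum of 100 mentioned above; treating it as a cap on the
# total number of recalled segments is an assumption.
MAX_SEGMENTS = 100


def words(text):
    return set(re.findall(r"\w+", text.lower()))


def build_table(conn, units):
    """Load (source, target) pairs and index every source word."""
    conn.execute("CREATE TABLE tu (id INTEGER PRIMARY KEY, source TEXT, target TEXT)")
    conn.execute("CREATE TABLE word_index (word TEXT, tu_id INTEGER)")
    for source, target in units:
        cur = conn.execute("INSERT INTO tu (source, target) VALUES (?, ?)", (source, target))
        conn.executemany("INSERT INTO word_index VALUES (?, ?)",
                         [(w, cur.lastrowid) for w in words(source)])
    conn.execute("CREATE INDEX idx_word ON word_index (word)")
    conn.commit()


def recall_segments(conn, document, limit=MAX_SEGMENTS):
    """Return TUs sharing at least one non-stopword with the document."""
    doc_words = sorted(words(document) - STOPWORDS)
    if not doc_words:
        return []
    # A very large document may exceed SQLite's bound-parameter limit;
    # a real implementation would query in batches.
    placeholders = ",".join("?" * len(doc_words))
    return conn.execute(
        "SELECT DISTINCT tu.id, tu.source, tu.target FROM word_index "
        "JOIN tu ON tu.id = word_index.tu_id "
        f"WHERE word_index.word IN ({placeholders}) ORDER BY tu.id LIMIT ?",
        (*doc_words, limit),
    ).fetchall()


def to_tmx(units, srclang="en", tgtlang="nl"):
    """Serialize recalled TUs as a minimal TMX 1.4 document."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "recall-sketch", "creationtoolversion": "0.1",
        "segtype": "sentence", "o-tmf": "sqlite", "adminlang": "en",
        "srclang": srclang, "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for _id, source, target in units:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


conn = sqlite3.connect(":memory:")
build_table(conn, [
    ("The cat sat on the mat.", "De kat zat op de mat."),
    ("The dog barked.", "De hond blafte."),
    ("A cat and a dog.", "Een kat en een hond."),
])
recalled = recall_segments(conn, "The cat is on the sofa.")
project_tmx = to_tmx(recalled)
print(project_tmx)
```

On the sample document, only the two units containing "cat" are recalled and the dog-only unit is left out: the same shrinking effect, at toy scale.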


H.

Masato: It's about an external DB, isn't it?


That's what it used to be called. I never understood the "external" part of it. By default, the database is located in the CT package on your computer, but you can save it anywhere.


Masato: What type of DB is it? A third-party-provided rental server? Or a local server (e.g., created with Mac OS X Server)?


It's an SQLite database. I think SQLite comes with most operating systems, though I downloaded a more recent version. It's a relational database, though in CT's case without relations. I tried to understand the whole thing, but it's not particularly easy, and you don't need to understand how it works to use it in CT. I can, however, create my own databases and point CT to them. Basically a waste of time, and in my case, a huge one.


H.

By the way, I mentioned that searching a table is incredibly fast, but so is creating the TM with the recalled segments. I don't understand how our Benevolent Leader does it. The process is simple enough that I could probably replicate it, but I'd probably have to wait a few weeks for the resulting TMX file...


H.

Did I ever mention I HATE Freshdesk? There are several minor mistakes in my text above, but I can't edit it. I'd have to delete the post, enter it again with the corrections, and upload any screenshots again. That not only urinates me off, it's also pure and utter shit for people who get e-mail notifications for new posts. Like this one.


H.

I had completely misunderstood: I thought this feature was only for connecting to an online server. I'm really happy to learn that I can create a TR database in local storage and that searching it is very fast.

Thank you for your guidance!

M.

 

Masato: I thought this feature is only for connection to an online server.


Probably because of the "external" in the old name, External Database. Confusing. Could it have referred to a resource that isn't loaded into RAM?


H.

> Did I ever mention I HATE Freshdesk?


Yes, you did. Please see the link below. I will knock on that door too, asking them to let users edit their own posts.


https://support.freshdesk.com/support/discussions/topics/308689


Igor
