
REQ: Better handling of double entries in the glossary

There are situations in which double entries (without differences in context, field or notes) can appear in the glossary:

  • a double entry is added after another entry has been made, and the glossary shows only the newest one (didn't we have automatic reload already? This always happens after editing an entry)
  • glossaries are merged or imported
  • the first entry is somehow overlooked

The problem

  • adding the entry usually has no visible effect - you do not even notice there is a double entry
  • double entries are not shown

The workaround

  • if you know there is a double entry, you can select the source word and click Glossaries on the Quick search bar to show, delete and/or merge (unfortunately only by hand) the entries

The wishes (as alternatives, and perhaps as an option, so as not to overload slower machines)

  • show both entries
  • mark double entries at least with an asterisk



> There might be situations where double entries (without differences in context, field or notes) in the glossary might appear


They don't appear by themselves. Please avoid adding double entries, which creates the need for more complexity. Keep your terminology clean and simple, and then the CafeTran interface will be kept clean and simple. Some general task to remove double entries might be added in the future.


> The problem


It's not a problem. It is a feature.

IK: ...Some general task to remove double entries might be added in the future.


Like:

image

and:

image


H.



Try the following:


1. Activate the Glossary interface via View > Show glossary menu.

2. Choose Glossary > Remove duplicate entries.

IK: Try the following


That won't work - I guess - in the case of e.g.


boer     peasant;farmer

boer     peasant
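
To illustrate why an exact-match removal misses this case, here is a minimal Python sketch that merges such rows instead of deleting them. It is only a sketch under assumptions: that rows are (source, target) pairs and that a semicolon separates target alternatives, as in the example above. The function name `merge_targets` is made up for illustration and is not a CafeTran feature.

```python
def merge_targets(rows):
    """Merge rows that share a source term into one row per source.

    Assumption: alternatives in the target are separated by ';'.
    The first-seen order of the alternatives is preserved.
    """
    merged = {}
    for source, target in rows:
        alternatives = merged.setdefault(source, [])
        for alt in target.split(";"):
            alt = alt.strip()
            if alt and alt not in alternatives:
                alternatives.append(alt)
    return {source: ";".join(alts) for source, alts in merged.items()}


rows = [("boer", "peasant;farmer"), ("boer", "peasant")]
print(merge_targets(rows))
```

The two rows are not identical as text, so a line-by-line duplicate check keeps both; comparing the individual alternatives is what makes the second row redundant.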


H.

Glossary > Remove duplicate entries does not give any meaningful feedback, e.g. the number of deleted entries. It would be better to filter them out and show them, and then let the user decide.

And it won't work for
material|   Material
material   Material

 

Torsten: Better would be to filter them out


Better would be to convert them to TMX, run the readily available applicable tasks, and leave them as TMX.


H.

> And it won't work for


Of course, it won't. They are not duplicates.


> reasonable response, eg. the number of deleted entries


What's the reason for checking the number of deleted entries as they are gone anyway? Just a check for the check's sake. :)

> What's the reason for checking the number of deleted entries as they are gone anyway?

Indeed, maybe "reasonable" and "to be expected" mean more than this. The problem is that the deletion of duplicates is not transparent and apparently cannot be controlled or undone. A feature that reveals identical or nearly identical entries (e.g. identical source entry but different target, source entries that differ only by a pipe, source entries that differ by only x letter(s), entries that differ only in context, notes, etc.) would be very welcome.
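
As a rough idea of what such a reveal-only check could look like outside CT, here is a small Python sketch. It assumes the glossary is a tab-delimited text file with the source term in the first column, and it treats entries as suspects when their source terms are equal after removing pipes and ignoring case. The names `normalize_source` and `find_suspect_groups` are hypothetical. Nothing is deleted; the groups are only listed for review.

```python
from collections import defaultdict


def normalize_source(source: str) -> str:
    """Comparison key: drop pipes, trim whitespace, ignore case."""
    return source.replace("|", "").strip().casefold()


def find_suspect_groups(lines):
    """Group glossary lines whose source terms share a comparison key.

    Assumption: each line is 'source<TAB>target[<TAB>more fields]'.
    Returns {key: [original lines]} for keys that occur more than once.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip empty lines
        source = line.split("\t", 1)[0]
        groups[normalize_source(source)].append(line)
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


sample = [
    "boer\tpeasant;farmer",
    "boer\tpeasant",
    "material|\tMaterial",
    "material\tMaterial",
    "huis\thouse",
]
for key, rows in find_suspect_groups(sample).items():
    print(key, "->", rows)
```

Whether a pipe variant counts as a duplicate is a matter of definition, so the sketch only reports the groups and leaves the decision to the user.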

I think CTE creates a txt backup when you clean glossary duplicates, or alter the sort order.


Comparing the two text files with a visual diff tool (such as Meld) should be enough for the curious.


Some additional maintenance/filtering can be done by renaming the txt to csv and opening it in LibreOffice. Simply save back as tab delimited csv when done.
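
For those who prefer a count over a visual diff, the comparison of the backup and the cleaned file can be sketched in a few lines of Python. This is an assumption-laden sketch: it presumes both files are plain text with one entry per line, and it uses a multiset difference rather than a real diff, so a changed sort order does not matter. The function name `removed_entries` is made up for illustration.

```python
from collections import Counter


def removed_entries(before_lines, after_lines):
    """Return a Counter of lines that occur more often in the backup
    than in the cleaned glossary (i.e. what the clean-up removed)."""
    before = Counter(line.rstrip("\n") for line in before_lines)
    after = Counter(line.rstrip("\n") for line in after_lines)
    return before - after  # Counter subtraction keeps only positive counts


backup = ["boer\tpeasant", "boer\tpeasant", "huis\thouse", "material\tMaterial"]
cleaned = ["huis\thouse", "boer\tpeasant", "material\tMaterial"]

removed = removed_entries(backup, cleaned)
print(sum(removed.values()), "entries removed")
for line, count in removed.items():
    print(count, "x", repr(line))
```

With real files, the two lists would come from reading the backup and the current glossary, for example with `open(path, encoding="utf-8")`.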

> Comparing the two text files with a visual diff tool (such as Meld) should be enough for the curious.

 

Sigh. I assumed we were talking about convenience. You can do anything with such tools, indeed. I can push my car 3 miles from A to B, but most drivers would expect to start the motor. Or, more practically: you can use this tutorial to check your text, but you can also open the 300 URLs of 300 segments manually to work with and search in rather raw JSON data (and perhaps, one day …).


And no, it's not about curiosity, it is more about control and security. Any database tool or any other program that offers the possibility to delete duplicates has (or should have) a kind of control. CT does not have this.

Sigh. You have easy deleting of duplicates. You have sorting where duplicates are visually close. You have lots of other cool glossary features in CT, including the ability to open and manage your glossary in Excel or LibreOffice. Now you expect a super-trooper duplicate tool within CT (let's say called "duplicator") with enormous complexity, where the user spends more time managing duplicates than translating. Sure, it is possible to build an Excel-like interface so that the user could play with duplicates. But note that it may take more lines of code than the rest of CT. I am convinced the current approach to handling glossary duplicates is optimal.

Sigh again.

> You have sorting where duplicates are visually close.

Nice for a glossary with a few hundred entries. Not so nice for a professional glossary. A very prominent Dutch user told me that he has more than one million entries.

But indeed we have digressed. The starting point was that CT adds double terms to its glossary without giving any feedback. The initial idea was to show both entries or to show an asterisk to indicate duplicates (= same source entry with – or without – pipe). From a virtual point of view, CT needs to search for all terms of a segment and to show them (in some cases with "Display longest match only", which might already cost some milliseconds compared to the standard setting). So how much processor capacity would it take to find possible duplicates and to mark them? I would prefer the asterisk as the less intrusive option, as a signal "hey, you need to check your glossary". The double display could be more user-friendly.
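
To give a feeling for the cost, here is a Python sketch of one possible approach. It says nothing about how CT works internally; it only assumes that the glossary is available as (source, target) pairs. The duplicate keys are computed once, for instance when the glossary is loaded, and marking a term at display time is then a single set lookup. The names `source_key`, `build_duplicate_index` and `display_label` are hypothetical.

```python
from collections import Counter


def source_key(source: str) -> str:
    """Same source entry with or without pipe, ignoring case."""
    return source.replace("|", "").casefold()


def build_duplicate_index(entries):
    """One pass over all (source, target) pairs.

    Returns the set of source keys that occur more than once.
    """
    counts = Counter(source_key(source) for source, _ in entries)
    return {key for key, n in counts.items() if n > 1}


def display_label(source, target, duplicate_keys):
    """Append an asterisk to the source term if it has a duplicate."""
    marker = "*" if source_key(source) in duplicate_keys else ""
    return f"{source}{marker} = {target}"


entries = [("boer", "peasant;farmer"), ("boer", "peasant"), ("huis", "house")]
duplicates = build_duplicate_index(entries)
for source, target in entries:
    print(display_label(source, target, duplicates))
```

Building the index is one pass over the glossary, and the lookup per matched term takes constant time on average, so the extra work per segment should be small under these assumptions.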

 

> The initial idea was to show both entries or to show an asterisk to indicate duplicates


This is distracting and takes your attention from the real task of translation. Instead, the current approach to treat double entries as one lets you simply ignore them and focus on what's essential.


 


Torsten: A very prominent Dutch user told me that he has more than one million entries.


However, the most prominent, smartest, most experienced, intelligent, nicest, and most modest Dutch translator counts far fewer entries in his TM for Fragments. And it makes sense:


"The statistics of English are astonishing. Of all the world's languages (which now number some 2,700), it is arguably the richest in vocabulary. The compendious Oxford English Dictionary lists about 500,000 words; and a further half-million technical and scientific terms remain uncatalogued. According to traditional estimates, neighboring German has a vocabulary of about 185,000 and French fewer than 100,000, including such Franglais as le snacque-barre and le hit-parade."


H.
