
How to "simplify" a glossary

Hi all,
I wondered if there is a way to reduce the source text entries in my glossary, which I feel are quite redundant at the moment. I suppose there is a way to simplify things!

Let's say I want to add a verb phrase like "make a habit" to my glossary. So far I have done something like this, i.e. added all possible variants of the verb "to make":

image


Is there a way to insert just one term instead? I tried to have a look at the online resources, but I am still a bit puzzled - should I use an asterisk, for instance, inserting just "ma* a habit"? And in this case, will CT recognise automatically all the possible endings (make, makes, made, making)?

Thanks for your help! Any suggestion to simplify my glossary is more than welcome :)

> @Igor, I am not sure I am getting their meaning - can you please provide me with an example?


To keep things simple, just stick to the following:


1. Just add a pipe (|) before those special expressions in your normal glossary.


2. To catch various "make a habit" forms:


Add this entry to your glossary (on the source language side):


|ma.+? a habit


It means that:


'ma' is followed by:


any character - which is what the dot (.) means there

appearing one or more times, as few as possible - which is what +? means there


followed by:


 ' a habit'


In other words, to catch various forms of 'make a habit', your new entry (|ma.+? a habit) means:


'ma' followed by any character appearing one or more times (made, making, makes) followed by ' a habit'


Of course, you need to put the translation on the target language side in the glossary.
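
If you want to see what this entry catches before adding it to the glossary, here is a minimal sketch. It uses Python's re module as a stand-in (an assumption on my side: for these simple constructs it behaves the same way as the Java-style regex tester mentioned in this thread). The sample phrases and the matches() helper are made up for illustration, and how CafeTran applies the expression to a segment may differ.

```python
import re

# The glossary entry without the leading pipe
# (the pipe only marks the entry as a regular expression in the glossary).
PATTERN = re.compile(r"ma.+? a habit")

def matches(text):
    """Return True if the pattern is found anywhere in the text."""
    return PATTERN.search(text) is not None

samples = [
    "make a habit",
    "makes a habit",
    "made a habit",
    "making a habit",
    "ma a habit",                # nothing between 'ma' and ' a habit'
    "many people have a habit",  # unintended hit: the dot also matches spaces
]

for sample in samples:
    print(sample, "->", matches(sample))
```

All four verb forms match, "ma a habit" does not, and "many people have a habit" does match, so the entry is convenient but a bit broad.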


I hope it is clearer now.


 




Not yet, will do.

Thanks!

Since stemming introduces vagueness and also loses important information (at least in languages that have grammatical cases), I decided to add alternative forms via a macro. Since you're on Windows, you can use AutoHotkey to create such a macro for your source language.


It's not as complicated as it sounds.


If what DeepL suggests is really true, you can use only one target language term: fare l'abitudine. And in that case you can indeed write all source terms on one line.


If not, you'll have to write them on separate lines:


to go to school

to visit school

to finish school

to leave school


andare a scuola

visitare la scuola

finire la scuola

lasciare la scuola


The simplest form of the macro would copy the first source term, add a semicolon after it and paste the source term. You can then alter the second source term.


A more sophisticated approach would recognise the verb and add source terms with conjugated verbs: make, makes, making.
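
To show the logic of such a macro, here is a rough sketch in Python rather than AutoHotkey (swapped in purely for illustration). The function name expand_entry is hypothetical, and the verb forms are supplied by hand; the semicolon is the separator between source terms used in this thread.

```python
def expand_entry(verb_forms, rest):
    """Combine each verb form with the rest of the phrase, separated by semicolons."""
    return ";".join(f"{form} {rest}" for form in verb_forms)

# Forms of "to make" typed in by hand; they could also be harvested from a conjugation list.
line = expand_entry(["make", "makes", "made", "making"], "a habit")
print(line)  # make a habit;makes a habit;made a habit;making a habit
```

The resulting line can be pasted on the source side of the glossary entry.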


You can add the conjugated forms from your knowledge of the source language, or you can harvest them:


image


Gives:


he had been making

he had made

he has been making

he has made

he is making

he made

he makes

he was making

he will be making

he will have been making

he will have made

he will make

he would be making

he would have been making

he would have made

he would make

I am making

I had been making

I had made

I have been making

I have made

I make

I was making

I will/shall be making

I will/shall have been making

I will/shall have made

I will/shall make

I would/should be making

I would/should have been making

I would/should have made

I would/should make

I made

they are making

they had been making

they had made

they have been making

they have made

they made

they make

they were making

they will be making

they will have been making

they will have made

they will make

they would be making

they would have been making

they would have made

they would make

we are making

we had been making

we had made

we have been making

we have made

we made

we make

we were making

we will/shall be making

we will/shall have been making

we will/shall have made

we will/shall make

we would/should be making

we would/should have been making

we would/should have made

we would/should make

you are making

you are making

you had been making

you had been making

you had made

you had made

you have been making

you have been making

you have made

you have made

you made

you made

you make

you make

you were making

you were making

you will be making

you will be making

you will have been making

you will have been making

you will have made

you will have made

you will make

you will make

you would be making

you would be making

you would have been making

you would have been making

you would have made

you would have made

you would make

you would make


Gives:


am

are

be

been

had

has

have

he

I

is

made

make

makes

making

shall

should

they

was

we

were

will

would

you


Gives:


made

make

makes

making


The extraction of the conjugated verb forms can be automated too (quite simple to achieve).

Demo:


image
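
The harvesting steps shown above (full conjugation list, then unique words, then only the forms of the verb) could be sketched roughly like this in Python. This is only an illustration, not the macro from the demo: the function names are hypothetical, the input is a shortened sample of the list, and keeping the words that start with "ma" is a simple assumption that happens to work for "to make".

```python
import re

# A shortened sample of the harvested conjugation list.
phrases = [
    "he had been making",
    "he had made",
    "he makes",
    "I will/shall make",
    "they were making",
]

def unique_words(lines):
    """Split the lines on spaces and slashes and return the unique words, sorted."""
    words = set()
    for line in lines:
        words.update(re.split(r"[\s/]+", line.strip()))
    words.discard("")
    return sorted(words, key=str.lower)

def verb_forms(words, stem):
    """Keep only the words that start with the given stem (a crude filter)."""
    return [word for word in words if word.lower().startswith(stem)]

words = unique_words(phrases)
print(verb_forms(words, "ma"))  # ['made', 'make', 'makes', 'making']
```

For verbs whose forms do not share a common beginning (go, went), this filter is not enough and the forms have to be picked by hand.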


Thanks for your detailed replies!


My glossary already features a single target term in Italian, and I don't mind if I have to edit it according to the source text - I use the glossary only to get a suggestion. In other words, I don't need a 1:1 correspondence between single source variations and target terms, as long as I get a match from the glossary.


I just wanted to understand if there is a way to replace the lengthy, multiple entries "make a habit;makes a habit;made a habit;making a habit" with a simplified expression that takes into account the different suffixes/endings:


image


Is there a suitable solution in this case?

Many thanks!

You can use Hunspell to do the stemming. You’ll have to activate that setting via the Preferences.
Look up word stems

This way...?


image

 

image


Is the option "trim new term end" what I am looking for?


Thank you.


Hello Elisa,


> Is there a way to insert just one term instead?


You might just use the following regular expression:


|ma.*? a habit


The pipe character (|) at the start indicates this is a special (regular) expression.



This will be even more accurate:


|ma.+? a habit


. (dot) means any character.

*? = zero or more times, as few as possible.

+? = one or more times, as few as possible.

And if you add the expression to a dedicated regex glossary, you can omit the pipe.
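
A small sketch of the difference between the two expressions, written in Python as a stand-in for a regex tester (an assumption: these constructs behave the same way there). The third pattern is not from this thread; it is a stricter alternative that lists the endings explicitly, and whether the glossary accepts it exactly like this is something you would have to try.

```python
import re

zero_or_more = re.compile(r"ma.*? a habit")  # *? : zero or more characters
one_or_more = re.compile(r"ma.+? a habit")   # +? : at least one character

text = "ma a habit"
print(bool(zero_or_more.search(text)))  # True: nothing is needed after 'ma'
print(bool(one_or_more.search(text)))   # False: at least one character is required

# Stricter alternative (not from the thread): list the endings explicitly.
strict = re.compile(r"\bma(?:kes?|de|king) a habit")

for phrase in ["make a habit", "makes a habit", "made a habit",
               "making a habit", "many people have a habit"]:
    print(phrase, "->", bool(strict.search(phrase)))
```

The stricter pattern matches the four verb forms but not "many people have a habit", at the price of being longer to type.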


 

@Igor, I am not sure I am getting their meaning - can you please provide me with an example?


--


> And if you add the expression to a dedicated regex glossary, you can omit the pipe.


@alwayslockyourbike does this imply, then, that I can use expressions starting with "|" inside my regular glossary (i.e. this glossary will feature both "simple" terms and terms containing regular expressions)?


--


Thank you both, guys!

You can test your expression here: https://www.freeformatter.com/java-regex-tester.html (leave out the leading pipe there).