
REQ Mask sensitive data prior to sending segments to MT engines

After reading this:


I was wondering whether it's time to have an option to mask sensitive data in our source documents, before sending them to MT systems.

This could work either via regular expressions (all URLs, all e-mail addresses, all company names with GmbH etc., product names in CamelCase etc.) or via a list of names (iPhone, Coca Cola, VAG etc.).

Add phone numbers, street names etc., and I'm quite sure that the list will be long.
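To make the two approaches concrete, here is a minimal sketch in Python. The patterns, the NAMES list and the function name find_sensitive are only my assumptions for illustration. They are deliberately simple and would need tuning on real documents; nothing here is an existing CafeTran feature.

```python
import re

# Illustrative patterns only; real-world data will need broader, tested rules.
PATTERNS = {
    "url": re.compile(r"https?://[^\s<>\"]+"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # One or more capitalised words followed by "GmbH"
    "company": re.compile(r"\b[A-ZÄÖÜ][\w&-]*(?: [A-ZÄÖÜ][\w&-]*)* GmbH\b"),
    # Two or more capitalised parts glued together, e.g. "CamelCase"
    "camelcase": re.compile(r"\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b"),
}

# Hypothetical user-maintained list of names that must never reach the MT engine.
NAMES = ["iPhone", "Coca Cola", "VAG"]


def find_sensitive(segment):
    """Return (label, text) pairs for everything that looks sensitive."""
    hits = []
    for label, pattern in PATTERNS.items():
        hits += [(label, m.group()) for m in pattern.finditer(segment)]
    for name in NAMES:
        if re.search(r"\b" + re.escape(name) + r"\b", segment):
            hits.append(("name", name))
    return hits


if __name__ == "__main__":
    text = ("Mail info@example.com or visit https://example.com/shop "
            "about the iPhone from Muster GmbH.")
    for label, value in find_sensitive(text):
        print(label, value)
```

The regular expressions catch whole classes of items, while the list catches names such as iPhone that no general pattern would find reliably, so the two probably need to be combined.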

And ideally, there would be some harvesting mechanism to collect candidates for masking (though this can never be 100 % VDB proof):

  • D-12345 (postal codes)
  • Names of cities (and villages): the list is not endless
  • Street names (Andreas-Schubert-Straße, Andreas-Schubert-Str. etc., words that contain 'Straße', 'Weg' etc.)
  • Yet to come
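A harvesting pass over the project's source segments could look roughly like the sketch below. The two patterns and the function name harvest_candidates are assumptions of mine, not anything that exists in CafeTran. The result is only a list of candidates that a human still has to review, since a word like 'hinweg' would also be caught by the 'Weg' rule.

```python
import re
from collections import Counter

# German-style postal codes, with or without the "D-" prefix.
POSTAL_CODE = re.compile(r"\b(?:D-)?\d{5}\b")

# Words (optionally hyphenated) ending in Straße, Str. or Weg.
STREET = re.compile(r"\b(?:\w+-)*\w*(?:[Ss]traße|[Ss]tr\.|[Ww]eg)(?!\w)")


def harvest_candidates(segments):
    """Count every postal code and street-like word found in the segments."""
    counts = Counter()
    for segment in segments:
        for pattern in (POSTAL_CODE, STREET):
            counts.update(m.group() for m in pattern.finditer(segment))
    return counts


if __name__ == "__main__":
    sample = [
        "Andreas-Schubert-Straße 23, D-12345 Dresden",
        "Lieferung an Andreas-Schubert-Str. 23",
    ]
    for candidate, n in harvest_candidates(sample).most_common():
        print(n, candidate)
```

Counting the hits makes it easier to sort the candidates, so that the most frequent ones can be confirmed or rejected first.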

And because the mechanism will never be VDB proof, perhaps a mode will be necessary that allows us to check the content of a new segment for sensitive data, mask any data on the fly (which will be added to the list) and only then send the segment to the MT engines?

Probably, non-translatables can be used for this.

NTF > token generated on the fly > MT > CafeTran > replace token with NTF // all transparent for the translator, who doesn’t even notice the masking in the tabbed pane.
Wikipedia: "While obfuscation can make reading, writing, and reverse-engineering a program difficult and time-consuming, it will not necessarily make it impossible." // Nevertheless, a setting on the MT tab in the Preferences, “Mask NTFs before sending to MT”, could be useful.
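The NTF > token > MT > replace token round trip could be sketched as follows. The token format __NTF0__, the function names and the fake_mt stand-in are all hypothetical. A real MT engine may alter or drop such tokens, so the format would have to be tested per engine.

```python
def mask(segment, terms):
    """Replace each term with a placeholder token.

    Returns the masked segment and the token-to-term mapping.
    """
    mapping = {}
    # Longest terms first, so a longer name wins over a shorter one it contains.
    for term in sorted(terms, key=len, reverse=True):
        if term in segment:
            token = f"__NTF{len(mapping)}__"  # hypothetical token format
            mapping[token] = term
            segment = segment.replace(term, token)
    return segment, mapping


def unmask(translated, mapping):
    """Put the original terms back into the MT result."""
    for token, term in mapping.items():
        translated = translated.replace(token, term)
    return translated


def fake_mt(text):
    """Stand-in for a real MT call; it only knows this one sentence."""
    return text.replace("Bestellen Sie bei", "Order from").replace(" das ", " the ")


if __name__ == "__main__":
    source = "Bestellen Sie bei Muster GmbH das iPhone."
    masked, mapping = mask(source, ["iPhone", "Muster GmbH"])
    print(masked)                            # what the MT engine would see
    print(unmask(fake_mt(masked), mapping))  # what the translator would see
```

Because the mapping never leaves the local machine, the MT engine only ever sees the tokens, and the translator only ever sees the restored terms.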