After reading this:
I was wondering whether it's time to have an option to mask sensitive data in our source documents before sending them to MT systems.
This could work either via regular expressions (all URLs, all e-mail addresses, all company names ending in GmbH etc., product names in CamelCase etc.) or via a list of names (iPhone, Coca Cola, VAG etc.).
Phone numbers, street names etc. would also need to be covered; I'm quite sure the list will be long.
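To make the idea concrete, here is a rough sketch of both approaches in Python. Everything in it is an assumption for illustration: the patterns are deliberately simple, the `[[M0]]` placeholder format is made up, and `mask`/`unmask` are hypothetical helpers, not part of any existing tool. The company pattern in particular will over-match in German text, where every noun is capitalized.

```python
import re

# Illustrative patterns only; a real list would be longer and tuned per project.
PATTERNS = [
    re.compile(r"https?://\S+"),                              # URLs
    re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),              # e-mail addresses
    re.compile(r"\b[A-Z][\w&-]*(?: [A-Z][\w&-]*)* GmbH\b"),   # "... GmbH" company names
    re.compile(r"\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b"),           # CamelCase product names
]
NAMES = ["iPhone", "Coca Cola", "VAG"]  # the plain name list


def mask(segment, names=NAMES):
    """Replace sensitive items with placeholders; return (masked text, mapping)."""
    mapping = {}

    def stash(match):
        token = f"[[M{len(mapping)}]]"
        mapping[token] = match.group(0)
        return token

    for pattern in PATTERNS:
        segment = pattern.sub(stash, segment)
    if names:
        # Longest names first so a longer name wins over a shorter one inside it.
        ordered = sorted(names, key=len, reverse=True)
        name_re = re.compile(r"\b(?:" + "|".join(map(re.escape, ordered)) + r")\b")
        segment = name_re.sub(stash, segment)
    return segment, mapping


def unmask(text, mapping):
    """Put the original items back into the text returned by the MT system."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text
```

The mapping stays local, so only the placeholders leave the machine. Whether an MT engine leaves such placeholders untouched is a separate question and would need testing per engine.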
Ideally, there would also be some harvesting mechanism to collect candidates for masking (though this can never be 100 % VDB proof).
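A harvesting pass could be as simple as counting "suspicious-looking" words across the source segments and proposing the frequent ones that are not on the list yet. The sketch below is only one possible heuristic, assumed for illustration: the `CANDIDATE` pattern (CamelCase, words with an inner capital, all-caps abbreviations), the `harvest_candidates` name, and the `min_count` threshold are all hypothetical.

```python
import re
from collections import Counter

# Heuristic: CamelCase words, words with an inner capital (iPhone), all-caps abbreviations.
CANDIDATE = re.compile(r"\b(?:[A-Z][a-z]+(?:[A-Z][a-z]+)+|[a-z]+[A-Z]\w*|[A-Z]{2,})\b")


def harvest_candidates(segments, known, min_count=2):
    """Return frequent candidate terms that are not yet on the mask list."""
    counts = Counter()
    for segment in segments:
        counts.update(CANDIDATE.findall(segment))
    return [word for word, n in counts.most_common()
            if n >= min_count and word not in known]
```

The result is a list of proposals for a human to accept or reject, not something to mask blindly.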
And because the mechanism will never be VDB proof, perhaps we also need a mode that checks each new segment for sensitive data, masks whatever it finds on the fly (adding it to the list), and only then sends the segment to the MT engines?
Non-translatables could probably be used for this.
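The check-before-send mode described above could look roughly like this. It is a sketch under assumptions: `SUSPECT` is a small illustrative pattern (URLs, e-mail addresses, phone-like numbers), `confirm` stands in for whatever prompt the user would see, and `send` stands in for the actual MT call; none of these names come from an existing tool.

```python
import re

# Illustrative check for things that look sensitive: URLs, e-mails, phone-like numbers.
SUSPECT = re.compile(
    r"https?://\S+|[\w.+-]+@[\w-]+(?:\.[\w-]+)+|\+?\d[\d /-]{6,}\d"
)


def prepare_for_mt(segment, mask_list, confirm):
    """Mask listed items and newly confirmed hits; mask_list grows in place."""
    for hit in SUSPECT.findall(segment):
        if hit not in mask_list and confirm(hit):
            mask_list.append(hit)  # remembered for all later segments
    # Replace longer items first so shorter ones cannot break them up.
    for index, item in sorted(enumerate(mask_list), key=lambda p: len(p[1]), reverse=True):
        segment = segment.replace(item, f"[[M{index}]]")
    return segment


def translate_safely(segment, mask_list, confirm, send):
    """Only the masked segment is ever handed to the MT engine (send)."""
    return send(prepare_for_mt(segment, mask_list, confirm))
```

For example, with `mask_list = ["VAG"]` and a `confirm` that always answers yes, the segment `Call VAG on +49 30 1234567 today` would be sent as `Call [[M0]] on [[M1]] today`, and the phone number would be on the list from then on.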