Today, we are super excited to announce the release of neural dictionary, a significant translation quality improvement to our platform. In this blog post, we explore the neural dictionary feature.
Neural dictionary is an extension to our dynamic dictionary and phrase dictionary features in Azure AI Translator. Both features allow users to customize the translation output by providing their own translations for specific terms or phrases. Our previous method used a verbatim dictionary, which performs an exact find-and-replace operation. Neural dictionary improves the translation quality of sentences that include one or more term translations by letting the machine translation model adjust both the term and its context to produce a more fluent translation, while preserving high term translation accuracy.
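The following English-German example shows how the translation outputs of the two methods differ when a custom terminology translation is requested:

| | Text |
|---|---|
| Input (English) | Basic Knowledge of … (annotated term, requested translation “regelmäßiges Testen”) |
| Verbatim dictionary | Grundkenntnisse der regelmäßiges Testen |
| Neural dictionary | Grundkenntnisse des regelmäßigen Testens |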
The chart below illustrates the significant improvements the new feature brings on publicly available terminology test sets in the automotive (https://aclanthology.org/2021.eacl-main.271), health (https://aclanthology.org/2021.emnlp-main.477), and COVID-19 (https://aclanthology.org/2021.wmt-1.69) domains, using our general translation models.
We also conducted a series of customer evaluations on the Custom Translator platform with neural dictionary models, measuring the translation quality gains on customers' data between models with and without the neural dictionary extension. Five customers participated, covering German, Spanish, and French in different business domains.
The chart below shows the average COMET improvement in the education domain for English-German, English-Spanish, and English-French, for general models (left) and customized models (right). Blue bars represent translation quality without neural dictionary, and orange bars represent translation quality with neural dictionary. These are average improvements over the entire test sets. For segments that include one or more of the customer's dictionary entries (between 19% and 63% of the segments), the improvement rises to between +6.3 and +12.9 COMET points.
Supported languages
- Currently available (as of December 6, 2023): Chinese Simplified, French, German, Italian, Japanese, Korean, Polish, Russian, Spanish, and Swedish, to and from English.
- We will add more languages in the future. For updates, refer to the Custom Translator release notes.
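If you route requests programmatically, a small guard can check whether a language pair is covered before you rely on neural dictionary behavior. The sketch below is a hypothetical helper (`supports_neural_dictionary` is not part of any Azure SDK). It hard-codes the list above as of December 6, 2023, and assumes Translator-style language codes (for example, `zh-Hans` for Chinese Simplified), so it would need updating as the release notes change.

```python
# Snapshot of the languages listed above (as of December 6, 2023).
# Assumption: Translator-style language codes; update as support grows.
NEURAL_DICTIONARY_LANGUAGES = {
    "zh-Hans",  # Chinese Simplified
    "fr",       # French
    "de",       # German
    "it",       # Italian
    "ja",       # Japanese
    "ko",       # Korean
    "pl",       # Polish
    "ru",       # Russian
    "es",       # Spanish
    "sv",       # Swedish
}


def supports_neural_dictionary(source: str, target: str) -> bool:
    """Hypothetical check: English to or from one of the listed languages."""
    if source == "en":
        return target in NEURAL_DICTIONARY_LANGUAGES
    if target == "en":
        return source in NEURAL_DICTIONARY_LANGUAGES
    return False  # pairs not involving English are not listed


print(supports_neural_dictionary("en", "pl"))  # True
print(supports_neural_dictionary("de", "en"))  # True
print(supports_neural_dictionary("de", "fr"))  # False
```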
How neural dictionary works
Neural dictionary does not use an exact find-and-replace operation when handling custom terminology translation. Instead, it translates terms or phrases from the dictionary in the way that best fits the entire context. This means that the term can be inflected or have different casing, or that the surrounding words can be adjusted, producing a more fluent and coherent translation.
For example, consider the following English input sentence and its Polish translation without any dictionary entries:

| | Text |
|---|---|
| Input (English) | We need a fast solution that will be understandable. |
| Translation (Polish) | Potrzebujemy szybkiego rozwiązania, które będzie zrozumiałe. |
If you want to make sure that “solution” is translated as “alternatywa” (“an alternative” in English), you can add a dynamic dictionary annotation to achieve that:
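| | Text |
|---|---|
| Input (English) | `We need a fast <mstrans:dictionary translation="alternatywa">solution</mstrans:dictionary> that will be understandable.` |
| Verbatim dictionary | Potrzebujemy szybkiego alternatywa, który będzie zrozumiały. |
| Neural dictionary | Potrzebujemy szybkiej alternatywy, która będzie zrozumiała. |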
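The output produced by the previous (verbatim) method is not fluent because grammatical agreement is violated: “alternatywa” is a feminine noun that should appear in the genitive case (“alternatywy”) here, but it is inserted unchanged, and the surrounding words keep forms that do not match its gender. Neural dictionary produces fluent output by a) inflecting the requested replacement and b) adjusting the surrounding words where needed. It can also change the casing, as in the following example: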
| | Text |
|---|---|
| Verbatim dictionary | akcje tej firmy jest tani. |
| Neural dictionary | Akcje tej firmy są tanie. |
Neural dictionary expects the requested translation of a term to be provided in its base grammatical form. Multi-word terms are also supported and should be provided as noun phrases; that is, the words should not be lemmatized independently (for example, “Estonian parliamentary election” is better than “Estonia parliament election”).
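Dynamic dictionary annotations are inline markup in the input text. The following minimal sketch builds the annotation for the Polish example, passing the requested translation in its base form (“alternatywa”), and wraps the result in the JSON body shape used by the Translator v3 `/translate` endpoint (an array of `{"Text": ...}` objects). The helper names `annotate_term` and `build_translate_body` are hypothetical, and the sketch does not send any request; whether verbatim or neural handling applies depends on your setup, as described in the next section.

```python
import html
import json


def annotate_term(sentence: str, term: str, translation: str) -> str:
    """Wrap the first occurrence of `term` in dynamic dictionary markup.

    `translation` should be given in its base grammatical form (here the
    Polish nominative "alternatywa"). Note: str.find matches substrings,
    so "solution" would also match inside "solutions".
    """
    start = sentence.find(term)
    if start == -1:
        raise ValueError(f"term {term!r} not found in sentence")
    end = start + len(term)
    # Escape quotes and markup characters inside the attribute value.
    attr = html.escape(translation, quote=True)
    markup = (
        f'<mstrans:dictionary translation="{attr}">'
        f"{sentence[start:end]}</mstrans:dictionary>"
    )
    return sentence[:start] + markup + sentence[end:]


def build_translate_body(texts: list[str]) -> str:
    """Build a JSON array of {"Text": ...} objects, one per input string."""
    return json.dumps([{"Text": text} for text in texts], ensure_ascii=False)


annotated = annotate_term(
    "We need a fast solution that will be understandable.",
    "solution",
    "alternatywa",
)
print(annotated)
print(build_translate_body([annotated]))
```

To actually translate, you would POST this body to `https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&from=en&to=pl` with your `Ocp-Apim-Subscription-Key` header (plus `Ocp-Apim-Subscription-Region` for regional resources) and `Content-Type: application/json`. A production tool would probably also want word-boundary matching instead of a plain substring search.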
How to enable neural dictionary
For all supported languages listed above, neural dictionary is immediately available to all customers who use phrase dictionaries on the Custom Translator platform. Enabling it requires retraining your custom model, either with full training or with dictionary-only training.
A few recommendations:
- To ensure that a phrase dictionary entry is used more often with neural dictionary, consider adding the entry with the source phrase in various forms. For example, in addition to “solution _ alternatywa”, you may want to add the following entries: “Solution _ alternatywa”, “solutions _ alternatywy”, and “Solutions _ alternatywy”.
- If the goal is to copy a specific word or phrase “as is” from the input text to the output translation when using a phrase dictionary, consider enforcing the verbatim dictionary, as it may be more consistent.
- Avoid adding translations of common or frequent words or phrases to the phrase dictionary.
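The first recommendation can be partly automated. The sketch below assumes dictionary entries are kept as (source, target) pairs and written out as tab-separated lines; that file layout is an assumption, not necessarily the format Custom Translator requires, so check its documentation for the accepted dictionary formats. Capitalized variants are generated mechanically, but inflected forms such as “solutions” / “alternatywy” still have to be supplied by hand. The function names `with_casing_variants` and `to_tsv` are hypothetical.

```python
def with_casing_variants(entries: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return entries plus a sentence-initial (capitalized) source variant.

    Only the first character is uppercased, so acronyms such as "COVID-19"
    keep their casing. Targets are left unchanged, matching the post's
    example ("Solution _ alternatywa"). Duplicates are dropped, order kept.
    """
    seen: set[tuple[str, str]] = set()
    expanded: list[tuple[str, str]] = []
    for source, target in entries:
        capitalized = source[:1].upper() + source[1:]
        for variant in (source, capitalized):
            pair = (variant, target)
            if pair not in seen:
                seen.add(pair)
                expanded.append(pair)
    return expanded


def to_tsv(entries: list[tuple[str, str]]) -> str:
    """Serialize entries as one "source<TAB>target" line each.

    Assumption: a tab-separated layout; verify against the Custom
    Translator documentation before uploading.
    """
    return "".join(f"{source}\t{target}\n" for source, target in entries)


# Inflected forms ("solutions" -> "alternatywy") must be supplied by hand.
entries = [("solution", "alternatywa"), ("solutions", "alternatywy")]
print(to_tsv(with_casing_variants(entries)), end="")
```

With the two input pairs above, `with_casing_variants` yields the four entries from the first recommendation. Uppercasing only the first character, rather than calling `str.capitalize`, avoids turning a phrase like “COVID-19 test” into “Covid-19 test”.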
To learn more about Custom Translator and how it can help your business thrive in the global marketplace, start with the Custom Translator beginner's guide.
What you can do with Microsoft Custom Translator
Use Microsoft Custom Translator with your translation solutions to help globalize your business and improve customer interactions.