Neural Machine Translation (NMT) is known for producing high-quality translations. As the latest in the line of machine translation systems, it has outperformed its predecessors and is poised to revolutionise the way the translation industry functions.
However, like all systems, NMT needs ongoing feedback and scoring to keep producing high-quality results. The most common evaluation method used for NMT today is BLEU (Bilingual Evaluation Understudy). BLEU, however, is an outdated method carried over from older machine translation systems. It made sense back then, but NMT is simply too advanced for it.
Quality translation systems need quality evaluation methods
NMT is in dire need of a better evaluation approach. This matters because NMT is gradually becoming the industry standard and changing how translation processes are carried out. Like any disruptive technology, machine translation systems affect not only technically trained staff but also sales, marketing and project management.
The “outdated” BLEU system became the standard mainly through prevalence: everybody adopted it because it worked with the previous generation of machine translation systems. With NMT, however, the developments (especially in design) have been significant enough to render BLEU ineffective at quantifying the quality of output.
BLEU is what the industry refers to as an “n-gram-based metric”: it scores a translation by counting the word sequences (n-grams) it shares with a reference translation. Such metrics do a poor job of capturing the advantages of NMT over other machine translation systems, such as statistical (SMT) or rule-based (RBMT) systems. Current research shows that NMT, despite its greater capabilities, manages to receive only two BLEU points more than earlier systems.
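To illustrate what "n-gram-based" means in practice, here is a minimal sketch of the clipped word n-gram precision that sits at the core of BLEU. It is not a full BLEU implementation: real BLEU combines the precisions for n = 1 to 4 with a geometric mean and applies a brevity penalty, both of which are left out here. The function names and the example sentences are made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous word n-grams in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(hypothesis, reference, n):
    """Share of hypothesis n-grams that also occur in the reference.

    Each n-gram is credited at most as often as it appears in the
    reference ("clipping"), as in BLEU's modified precision.
    """
    hyp_counts = Counter(ngrams(hypothesis.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram])
                  for gram, count in hyp_counts.items())
    return matched / total


# Hypothetical example: a fluent paraphrase of the reference.
reference = "the cat sat on the mat"
hypothesis = "the cat is sitting on the mat"

unigram = clipped_precision(hypothesis, reference, 1)  # 5 of 7 words match
bigram = clipped_precision(hypothesis, reference, 2)   # 3 of 6 bigrams match
print(f"unigram precision: {unigram:.2f}, bigram precision: {bigram:.2f}")
```

The hypothesis is an acceptable rendering of the reference, yet half of its bigrams count as errors simply because the wording differs. This reliance on surface overlap is what makes word n-gram metrics a blunt instrument for fluent output.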
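Many NMT systems work with units smaller than whole words, such as subwords or individual characters, so the evaluation metric should be able to work at that level too. The chrF evaluation approach, proposed by Maja Popović, is one such method: it computes an F-score over character n-grams rather than word n-grams.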
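To make the character-level idea concrete, here is a simplified sketch of a chrF-style score: character n-gram precision and recall are averaged over several n-gram lengths and combined into an F-score. The defaults used here (n-grams up to length 6, beta = 2, spaces removed before counting) are assumptions chosen for illustration, and the function names are hypothetical. For real evaluations, an established implementation should be preferred over this sketch.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams, ignoring spaces (an assumption of this sketch)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf_sketch(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF-style score in the range 0.0 to 1.0."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one of the strings is too short for this n
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


# A word-level exact match gives "translations" no credit against
# "translation"; the character-level score rewards the shared stem.
score = chrf_sketch("translations", "translation")
print(f"chrF-style score: {score:.3f}")
```

Because partial matches inside a word still earn credit, a score of this kind is more forgiving of inflection and spelling variants than a word n-gram metric.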
Over time, however, NMT quality assurance may become more fragmented rather than more standardised, similar to what we currently see in the demand for NMT. Practitioners will likely begin to develop proprietary evaluation methods that are specific and relevant to their own needs. For instance, we may begin to see metrics based on named entities, machine learning and custom QA systems. Multiple evaluation methods may also be used in combination.
NMT is the new “state-of-the-art,” and eventually, evaluation systems designed to handle the paradigm shift will show up.