Translation Quality

Summary

The following histogram summarizes the BLEU ranges of predictions on the test set. Click a bar to see the translations whose BLEU score falls in the corresponding interval.
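
As a rough illustration of how such a histogram could be built, the sketch below groups translations into fixed-width BLEU intervals. It assumes scores on a 0-100 scale and an interval width that divides 100 evenly; the function name `bin_by_bleu`, the width of 10, and the sample data are hypothetical and not taken from this report.

```python
from collections import defaultdict

def bin_by_bleu(samples, width=10):
    """Group (translation, bleu) pairs into BLEU intervals of the given width.

    Assumes scores lie on a 0-100 scale and that width divides 100.
    A score of exactly 100 is placed in the last interval.
    """
    n_bins = 100 // width
    bins = defaultdict(list)
    for text, bleu in samples:
        # Clamp the index so that a perfect score lands in the top bin.
        index = min(int(bleu // width), n_bins - 1)
        lo = index * width
        bins[(lo, lo + width)].append(text)
    return dict(bins)

# Hypothetical sample data, for illustration only.
samples = [("a", 4.2), ("b", 37.5), ("c", 31.0), ("d", 100.0)]
histogram = bin_by_bleu(samples)
```

Each key is a `(low, high)` interval and each value is the list of translations a click on that bar would display.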

Note: refined-4 is most likely the best, and the graphs only start to improve on the way towards it.

Even at lower BLEU scores, some translations capture the meaning successfully using very different words, so BLEU does not necessarily correlate with quality.
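
The effect can be seen with a simplified stand-in for BLEU. The sketch below computes only clipped n-gram precision against a single reference, which is one ingredient of BLEU rather than the full metric (no brevity penalty, no averaging over n-gram orders). The function `ngram_precision` and both sentences are hypothetical examples, not drawn from the test set.

```python
from collections import Counter

def ngram_precision(hypothesis, reference, n=1):
    """Clipped n-gram precision of a hypothesis against one reference.

    This is a building block of BLEU, not BLEU itself.
    """
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(hyp_ngrams.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs
    # in the reference.
    matched = sum(min(count, ref_ngrams[gram]) for gram, count in hyp_ngrams.items())
    return matched / total

# Hypothetical sentences: the paraphrase keeps the meaning but shares few words.
reference = "the meeting was postponed until next week"
paraphrase = "they delayed the session to the following week"
p1 = ngram_precision(paraphrase, reference, n=1)
p2 = ngram_precision(paraphrase, reference, n=2)
```

Here only "the" (credited once) and "week" match, so the unigram precision is 2 of 8 and no bigram matches at all, even though a reader would likely accept the paraphrase as a faithful translation.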

Average BLEU:

Dataset:
Direction:

Individual Samples