Neural vs Statistical Machine Translation: Revisiting the Bangla-English Language Pair

Arid Hasan, Firoj Alam, Shammur Absar Chowdhury, Naira Khan: Neural vs Statistical Machine Translation: Revisiting the Bangla-English Language Pair. In: 2nd International Conference on Bangla Speech and Language Processing (ICBSLP), 2019.

Abstract

Machine translation systems facilitate our communication and access to information, taking down language barriers. Machine translation is a well-researched area of Natural Language Processing (NLP), especially for resource-rich languages (e.g., language pairs in the Europarl parallel corpus). Besides these languages, there is also work on other language pairs, including the Bangla-English language pair. In the current study, we aim to revisit both Statistical Machine Translation (SMT) and Neural Machine Translation (NMT) approaches using well-known, publicly available corpora for the Bangla-English (Bangla to English) language pair. We report how the performance of the models differs based on the data and modeling techniques; we also compare our results with those of Google's machine translation system. Our findings, across different corpora, indicate that NMT-based approaches outperform SMT systems. Our results also outperform existing baselines by a large margin.

BibTeX

@inproceedings{arid2019MTb,
title = {Neural vs Statistical Machine Translation: Revisiting the Bangla-English Language Pair},
author = {Arid Hasan and Firoj Alam and Shammur Absar Chowdhury and Naira Khan},
url = {https://www.researchgate.net/publication/338223297_Neural_vs_Statistical_Machine_Translation_Revisiting_the_Bangla-English_Language_Pair},
year = {2019},
date = {2019-01-01},
booktitle = {2nd International Conference on Bangla Speech and Language Processing (ICBSLP)},
abstract = {Machine translation systems facilitate our communication and access to information, taking down language barriers. Machine translation is a well-researched area of Natural Language Processing (NLP), especially for resource-rich languages (e.g., language pairs in the Europarl parallel corpus). Besides these languages, there is also work on other language pairs, including the Bangla-English language pair. In the current study, we aim to revisit both Statistical Machine Translation (SMT) and Neural Machine Translation (NMT) approaches using well-known, publicly available corpora for the Bangla-English (Bangla to English) language pair. We report how the performance of the models differs based on the data and modeling techniques; we also compare our results with those of Google's machine translation system. Our findings, across different corpora, indicate that NMT-based approaches outperform SMT systems. Our results also outperform existing baselines by a large margin.},
keywords = {Bangla-to-English, English-to-Bangla, Machine Translation, Neural Machine Translation, Statistical Machine Translation},
pubstate = {published},
tppubtype = {inproceedings}
}