Paper accepted at ICCIT 2017 - Bangla Language Computing Research

25th October 2017

A paper has been accepted at ICCIT 2017: "Bangla Grapheme to Phoneme Conversion Using Conditional Random Fields" by Shammur Absar Chowdhury, Firoj Alam, Naira Khan, and Sheak Rashed Haider Noori.

Integrated with handheld devices, toys, kiosks, and call centers, Text to Speech (TTS) and Speech Recognition (SR) have become widely used applications in everyday life. One of the core components of these applications is Grapheme to Phoneme (G2P) conversion. The task is to map the written form of a word to its spoken form, i.e. to map one sequence to another. In Natural Language Processing (NLP), it is typically referred to as a sequence-to-sequence labeling task. The task, however, is language dependent and has primarily been implemented for English and similar resource-rich languages. In comparison, very little has been done for digitally under-resourced languages such as Bangla (ethnonym: Bangla; exonym: Bengali). The current state of the art in Bangla Grapheme to Phoneme conversion is limited to rule-based and lexicon-based approaches, whose development requires significant input from linguistic experts. In this paper, we propose a data-driven machine learning approach for Bangla G2P conversion. We evaluate the existing rule-based approaches and design a machine learning model using Conditional Random Fields (CRFs). To train the machine learning models, we use only character-level contextual features, because extracting hand-crafted features requires specialized knowledge. We evaluate the systems on two publicly available datasets and obtain promising results, with phoneme error rates of 1.51% and 14.88% on the CRBLP and Google pronunciation lexicons, respectively.
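To make the two technical ideas in the abstract concrete, here is a minimal Python sketch of (a) character-level contextual features for one position in a word and (b) a phoneme error rate computed from edit distance. It is an illustration only, not the paper's implementation: the window size, the feature names, the boundary symbols `<s>` and `</s>`, and the exact error-rate definition (total edit distance divided by total reference phonemes) are all assumptions. The sketch also treats each Unicode code point as one "character" and does not cover the grapheme-to-phoneme alignment that a CRF labeler needs.

```python
from typing import Dict, List, Sequence


def char_context_features(word: str, i: int, window: int = 2) -> Dict[str, str]:
    """Features for position i: the character itself plus its neighbours.

    Window size and feature names are illustrative assumptions.
    """
    feats = {"bias": "1", "c[0]": word[i]}
    for off in range(1, window + 1):
        left, right = i - off, i + off
        # Pad with boundary symbols when the window runs off the word.
        feats[f"c[-{off}]"] = word[left] if left >= 0 else "<s>"
        feats[f"c[+{off}]"] = word[right] if right < len(word) else "</s>"
    return feats


def word_to_features(word: str, window: int = 2) -> List[Dict[str, str]]:
    """One feature dict per character (code point) of the word."""
    return [char_context_features(word, i, window) for i in range(len(word))]


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs: Sequence[Sequence[str]],
                       hyps: Sequence[Sequence[str]]) -> float:
    """Total edit distance over total reference phonemes, as a percentage."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total
```

A list of per-character feature dictionaries of this kind is the sort of input that CRF toolkits commonly accept, paired with one label per character; the choice of toolkit and label scheme is left open here.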


title={Bangla grapheme to phoneme conversion using conditional random fields},
author={Chowdhury, Shammur Absar and Alam, Firoj and Khan, Naira and Noori, Sheak RH},
booktitle={2017 20th International Conference of Computer and Information Technology (ICCIT)},