Paper Accepted at ICCIT 2016 - Bangla Language Computing Research

ICCIT 2016
25th October 2016

Our paper "Bidirectional LSTMs – CRFs Networks for Bangla POS Tagging", by Firoj Alam, Shammur Absar Chowdhury, and Sheak Rashed Haider Noori, has been accepted at ICCIT 2016.

Part-of-speech (POS) information is one of the fundamental components of the natural language processing pipeline, and it helps in extracting higher-level information such as named entities, discourse, and the syntactic structure of a sentence. For some languages, such as English, Dutch, and Chinese, POS tagging is considered a solved problem because tagging systems for these languages reach high accuracy (97%). Significant efforts have been made for such languages in terms of making data publicly accessible and organizing evaluation campaigns. By comparison, there have been far fewer efforts for Bangla (ethnonym: Bangla; exonym: Bengali). In this paper, we present a knowledge-poor approach to POS tagging, which we evaluated using a publicly accessible dataset from LDC. Our motivation is that we did not want the system to rely on existing resources such as a lexicon or a named entity recognizer, because they are not publicly available and are difficult to develop. We did not use any handcrafted features; instead, we employed distributed representations of words and characters. We built the model using Long Short-Term Memory (LSTM) neural networks followed by Conditional Random Fields (CRFs), and included a pre-trained word embedding model. We obtained promising results, with an accuracy of 86.0%.
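To make the "LSTM followed by CRF" idea above more concrete, here is a minimal sketch of the decoding step only. It is not the paper's implementation. It assumes the network has already produced a score for each tag at each token, and that the CRF layer adds a score for each tag-to-tag transition. The function name `viterbi_decode`, the two-tag set, and all numeric scores are hypothetical values chosen for illustration.

```python
# Illustrative sketch only: the tag names and scores are made up and are
# not taken from the paper or from the LDC dataset.
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the highest-scoring tag sequence.

    emissions[t][tag]      : score of `tag` for token t (e.g. from a BiLSTM).
    transitions[prev][tag] : score of moving from `prev` to `tag` (CRF layer).
    """
    if not emissions:
        return []
    tags = list(emissions[0])
    # best[tag] = score of the best path that ends in `tag` at the current token
    best = {tag: emissions[0][tag] for tag in tags}
    backpointers: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for tag in tags:
            # Pick the previous tag that gives the best path into `tag`
            prev = max(tags, key=lambda p: best[p] + transitions[p][tag])
            new_best[tag] = best[prev] + transitions[prev][tag] + scores[tag]
            pointers[tag] = prev
        best = new_best
        backpointers.append(pointers)
    # Start from the best final tag and follow the back-pointers to the start
    path = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores for a three-token sentence and two tags
emissions = [
    {"NOUN": 2.0, "VERB": 0.5},
    {"NOUN": 1.0, "VERB": 1.2},
    {"NOUN": 0.3, "VERB": 2.0},
]
transitions = {
    "NOUN": {"NOUN": 0.5, "VERB": 0.2},
    "VERB": {"NOUN": 0.1, "VERB": -1.0},
}
print(viterbi_decode(emissions, transitions))  # ['NOUN', 'NOUN', 'VERB']
```

In this toy example, choosing the best tag for each token independently would give NOUN, VERB, VERB. The transition scores penalize VERB followed by VERB, so decoding over the whole sequence picks NOUN, NOUN, VERB instead. Scoring the sequence jointly in this way is the general reason for placing a CRF layer after per-token scores.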


title={Bidirectional lstms—crfs networks for bangla pos tagging},
author={Alam, Firoj and Chowdhury, Shammur Absar and Noori, Sheak Rashed Haider},
booktitle={2016 19th International Conference on Computer and Information Technology (ICCIT)},