Text Corpus Development


NLP research and development requires a large text corpus. Such a corpus supports linguistic research and yields linguistic insights, and it is also essential for developing NLP applications.


The idea is to collect a text corpus from a range of sources, including online newspapers, social media, government websites, and blogs. The corpus will support basic linguistic research and the development of NLP applications. In addition to building the corpus, we aim to develop tools that assist domain experts. The tools' functionality includes generating word frequency lists, bigram and trigram analyses, letter frequency counts, and so on.
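
As an illustration of the kind of analysis such tools might provide, the following is a minimal Python sketch, not the project's actual implementation. The function names (`tokenize`, `ngram_counts`, `letter_counts`) are hypothetical, and the tokenizer is a simplifying assumption: it splits on whitespace and drops Unicode punctuation (which covers the Bangla danda), whereas a real Bangla tokenizer would likely need more care.

```python
from collections import Counter
import unicodedata


def tokenize(text):
    """Split on whitespace and drop punctuation characters (a simplifying assumption)."""
    tokens = []
    for raw in text.split():
        # Unicode categories starting with "P" are punctuation, including the danda (U+0964).
        token = "".join(ch for ch in raw if not unicodedata.category(ch).startswith("P"))
        if token:
            tokens.append(token)
    return tokens


def ngram_counts(tokens, n):
    """Count contiguous n-grams (n=2 for bigrams, n=3 for trigrams) as tuples of tokens."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def letter_counts(text):
    """Count letters and combining marks (e.g. Bangla vowel signs), ignoring everything else."""
    return Counter(ch for ch in text if unicodedata.category(ch)[0] in ("L", "M"))


# Hypothetical sample text; a real corpus would be read from the collected files.
sample = "আমি ভাত খাই। আমি বই পড়ি।"
tokens = tokenize(sample)
word_freq = Counter(tokens)          # word frequency list
bigrams = ngram_counts(tokens, 2)    # bigram counts
trigrams = ngram_counts(tokens, 3)   # trigram counts
letters = letter_counts(sample)      # letter frequency
```

Calling `word_freq.most_common(10)` or `bigrams.most_common(10)` would then return the ten most frequent words or bigrams in the sample. Counting combining marks alongside letters is a design choice made here so that Bangla vowel signs are not silently discarded; a tool aimed at linguists might instead count grapheme clusters.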

Shadin Bangla Corpus (Bangla Text Corpus Development)