Bangla Natural Language Processing Tools, Scripts, Papers
Note: We list the following tools and resources to help disseminate them and make them easier to find. We neither claim ownership of them nor take any responsibility for their use. If you use them in your research, please cite the appropriate authors. If you use them in a software application, please contact the authors or use them at your own risk.
Typing Tools and Keyboards
Libraries
- Avro Phonetic Library (JavaScript, Go, C++)
- jQuery.IME - Supports Avro, Probhat, Inscript, National (BD)
- OpenBangla BengaliPhoneticParser.swift
- Rupantor: A very flexible Bengali phonetic parser/converter
- Bijoy2Unicode: A Python package for bidirectional conversion between Bijoy encoding and Unicode Bangla.
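Several of the libraries above (Avro Phonetic, Rupantor, BengaliPhoneticParser) turn Roman-letter input into Bengali script. The Python sketch below illustrates only the general idea: it performs greedy longest-match transliteration, trying two-letter keys before one-letter keys. After a consonant it writes a dependent vowel sign (kar); elsewhere it writes an independent vowel letter. It joins consecutive consonants with a hasanta (U+09CD). The mapping tables and the name `transliterate` are hypothetical and cover a tiny subset. They do not reproduce the rules of Avro or of any other listed library.

```python
# Hypothetical, minimal Roman-to-Bengali tables (NOT Avro's real rule set).
CONSONANTS = {
    "kh": "খ", "k": "ক", "gh": "ঘ", "g": "গ", "ch": "চ", "j": "জ",
    "t": "ত", "d": "দ", "n": "ন", "p": "প", "b": "ব", "m": "ম",
    "r": "র", "l": "ল", "sh": "শ", "s": "স", "h": "হ",
}
# Independent vowel letters: used at word start or after another vowel.
VOWELS = {"a": "আ", "i": "ই", "u": "উ", "e": "এ", "o": "অ"}
# Dependent vowel signs used after a consonant; "o" keeps the inherent vowel.
VOWEL_SIGNS = {"a": "া", "i": "ি", "u": "ু", "e": "ে", "o": ""}
HASANTA = "\u09cd"  # Bengali virama, joins consonants into a cluster


def transliterate(text: str) -> str:
    out = []
    i = 0
    prev_consonant = False  # True if the last emitted letter was a bare consonant
    while i < len(text):
        for size in (2, 1):  # greedy: prefer the longer key ("kh" before "k")
            chunk = text[i:i + size]
            if len(chunk) < size:
                continue
            if chunk in CONSONANTS:
                if prev_consonant:
                    out.append(HASANTA)  # e.g. k + k -> ক্ক
                out.append(CONSONANTS[chunk])
                prev_consonant = True
                i += size
                break
            if chunk in VOWELS:
                # Vowel sign after a consonant, full vowel letter elsewhere.
                out.append(VOWEL_SIGNS[chunk] if prev_consonant else VOWELS[chunk])
                prev_consonant = False
                i += size
                break
        else:
            out.append(text[i])  # unmapped character (space, digit, ...): copy as-is
            prev_consonant = False
            i += 1
    return "".join(out)
```

For example, `transliterate("ami")` returns "আমি" and `transliterate("pakka")` returns "পাক্কা". Real phonetic parsers add many context rules on top of this, such as special handling for nasals, reph, and ya-phala.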
Corpora (Corpus) and Datasets
- Bangla Wikipedia Dump
- Bangla Corpus Builder (Aniruddha Adhikary)
- Indian Language Part-of-Speech Tagset: Bengali (LDC2010T16)
- IARPA Babel Bengali Language Pack IARPA-babel103b-v0.4b (LDC2016S08)
- BanglaLekha Corpus (Handwriting) (ULAB, Dhaka)
- Bangla word list (Bangla Academy Banan Abhidhan) (SNLTR)
- SHRUTI Bangla Speech Corpus (IIT, Kharagpur)
- Bengali Stopwords List
- Bangla TTS Speech Corpus (Google)
- Large Bengali ASR Dataset (Google)
- Ekush: Bangla Handwritten Characters Dataset (DIU, Dhaka)
- IsharaLipi: Bangla Sign Language Digits and Characters (DIU, Dhaka)
NLP Tools
- Bangla POS Tagger (HMM/CRF/ME-Based) (IIT, Kharagpur)
- Bangla POS Tagger
- Bangla POS Tagger
- Bangla POS Tagger (XML-Based)
- Morphological Analyzer
- Rule-Based Chunker
- Statistical Chunker
- Bengali Dependency Parser
- Rule-Based Bengali Stemmer (Debasis Ganguly)
- .Net Rule-Based Bengali Stemmer
- Java Rule-Based Bengali Stemmer
- PHP Rule-Based Bengali Stemmer
- JavaScript Bengali Stemmer
- Java Bengali Stemmer
- Java Bengali Stemmer
- Bengali Word Embeddings
- Bengali WordNet
- Bengali Sentiment Analysis
- Keyword Extraction
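Several entries above are rule-based stemmers. The Python sketch below shows the basic suffix-stripping idea: try the longest matching suffix first, and strip it only if a stem of some minimum length remains. The suffix list and the names `SUFFIXES`, `MIN_STEM_LEN`, and `stem` are illustrative assumptions. The list holds a few common Bengali plural, classifier, objective, and genitive endings. It is not the rule set used by any of the listed stemmers.

```python
# Hypothetical suffix list for illustration only; real rule-based stemmers
# use much larger, carefully ordered rule sets.
SUFFIXES = ["গুলো", "গুলি", "দের", "টা", "টি", "কে", "রা", "ের"]
MIN_STEM_LEN = 2  # minimum number of code points that must remain after stripping


def stem(word: str) -> str:
    # Check longer suffixes first so "গুলো" wins over any shorter overlapping rule.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word  # no rule applied
```

For example, `stem("ছেলেগুলো")` returns "ছেলে" and `stem("বইটি")` returns "বই". Lengths here are counted in Unicode code points, so each dependent vowel sign counts as one character.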
Dictionary
OCR
Text to Speech (TTS)
- Katha: Bangla TTS (CRBLP, BRACU)
- Bengali-HTS (HMM-based Bangla TTS)