indicTranslate v1 - Machine Translation for 11 Indic languages. For latest v2, check: https://github.com/AI4Bharat/IndicTrans2
Updated Jan 2, 2024 - Jupyter Notebook
A pipeline for transliteration, spell correction, POS tagging, and word sense disambiguation that converts code-mixed Hinglish data to Hindi in Devanagari script.
Vyākarana: A Colorless Green Benchmark for Syntactic Evaluation in Indic Languages
Analysis of non-contextual (Word2Vec, FastText) and contextual (BERT, RoBERTa, ELECTRA, CamemBERT, DistilBERT, XLM-RoBERTa) embedding models. The best-performing model was used to build a Flask web app for Hindi NER and data collection from user feedback, deployed on AWS.
Lightweight on-device Hindi TTS for Android & iOS — fine-tuned on AI4Bharat IndicVoices, ONNX export, runs offline on CPU in real-time.
Contextualized Topic Modeling using Zero-Shot Learning on Indic Languages (IndicCTM)
HumanCTO's Indic Voice Pipeline — Download, transcribe, and translate audio/video in 12 Indian languages. 100% local, no API keys. Claude Code skills powered by OpenAI Whisper + vasista22 + AI4Bharat IndicWhisper fine-tuned models.
Bharat Multimodal EO AI - ISRO Satellite Vision + Indic NLP for Disaster Management, Agriculture & Climate Monitoring
KPT: Kannada Pre-trained Transformer
Unified API for Indian language AI — Speech-to-Text, Translation, TTS, Language ID & NLU for 22 languages. Powered by Whisper, IndicTrans2, Parler-TTS.
🇮🇳 A curated list of public APIs for Indian language processing — Translation, ASR, TTS, NLP, OCR and more.
Python toolkit to decode legacy Hindi font-encoded PDFs (KrutiDev, Chanakya, DevLys) into Unicode Devanagari. Built for Hindi PDF & govt document ingestion pipelines.
Classical Telugu prosody (chandassu) identifier — detects and classifies traditional poetic meters using NLP
First practical Python implementation of Devanagari-aware Hindi text readability scoring. Pearson r=0.81 · 49-sentence corpus · 3 original formulas · Zero dependencies
Multi-agent RAG system for high-quality Telugu story generation — Planner → Drafter → Critic loop
BhojRAG is a research-grade hybrid Retrieval-Augmented Generation system for unstandardized low-resource Indic languages, designed to improve retrieval and grounded generation for Bhojpuri using character n-gram BM25, fine-tuned MuRIL embeddings, and Reciprocal Rank Fusion.
Ultra-Fast Indic Text-to-Speech Engine with Zero-Shot Voice Cloning
A production-ready, frugal, sovereign AI system that orchestrates India's open-source language models to achieve state-of-the-art reasoning on consumer hardware through Test-Time Compute (TTC) and Cognitive Serialization.