The Bharat Foundation Model Framework (BFMF) represents a landmark initiative in India's journey towards AI independence. We are a strategic, community-driven open-source project developing sovereign foundation models that are uniquely Indian: trained on our data, built with cutting-edge open technologies, and deployed on domestic infrastructure.
To accelerate this national mission, we invite leading institutions, technology companies, research organizations, and visionary individuals to join us as strategic partners in building India's AI future.
| Category | Strategic Value | Partner Examples |
|---|---|---|
| Computing Infrastructure | High-performance GPU/TPU clusters for model training and optimization | ExoStack, Cloud Providers, HPC Centers |
| Data Partnerships | Curated datasets spanning Indian languages, domains, and use-cases | Research Institutions, Government Agencies |
| Research & Development | Advanced model architectures and domain-specific innovations | AI Institutes, Universities, R&D Labs |
| Strategic Investment | Sustainable funding for compute, talent, and infrastructure | Corporations, Investment Firms, Grants |
| Technical Ecosystem | Engineering expertise, tools, and platform capabilities | Technology Companies, Developer Communities |
- Priority Access: Early preview and integration rights for new Bharat models
- Co-Innovation: Joint R&D opportunities and technology collaboration
- Market Leadership: First-mover advantage in India's emerging AI ecosystem
- Brand Recognition: Featured placement on BFMF's partner ecosystem
- Technical Support: Dedicated integration and optimization assistance
- Research Publications: Co-authored papers and technical documentation
| Resource | Scope | Strategic Impact |
|---|---|---|
| Computing Infrastructure | 16-32 NVIDIA A100/H100 GPUs | Enable Bharat-Base (7B) model training |
| Storage Infrastructure | 50TB+ NVMe/Cloud Storage | Support dataset and model hosting |
| Financial Resources | ₹10-20L Investment | Scale compute and research operations |
| Research Partnerships | 5-10 Technical Partners | Drive model optimization and evaluation |
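As a rough sanity check on the GPU request above, the sketch below estimates the memory needed just for model states when training a 7B-parameter model. It assumes mixed-precision Adam (about 16 bytes per parameter: fp16 weights and gradients plus fp32 master weights, momentum, and variance) and fully sharded states in the style of ZeRO-3 or FSDP. Activations, buffers, and fragmentation are ignored, so real requirements will be higher. The function names are illustrative and not part of BFMF.

```python
# Back-of-the-envelope memory estimate for mixed-precision Adam training.
# Assumption: 2 bytes (fp16 weights) + 2 bytes (fp16 grads)
#           + 12 bytes (fp32 master weights, momentum, variance) per parameter.
BYTES_PER_PARAM = 2 + 2 + 12


def model_state_gb(n_params: float) -> float:
    """Memory for weights, gradients, and optimizer states, in GB (1e9 bytes)."""
    return n_params * BYTES_PER_PARAM / 1e9


def per_gpu_gb(n_params: float, n_gpus: int) -> float:
    """Per-GPU share when model states are fully sharded across n_gpus."""
    return model_state_gb(n_params) / n_gpus


if __name__ == "__main__":
    n_params = 7e9  # Bharat-Base (7B)
    print(f"Model states: {model_state_gb(n_params):.0f} GB")
    for gpus in (16, 32):
        print(f"{gpus} GPUs: {per_gpu_gb(n_params, gpus):.1f} GB of model state per GPU")
```

Under these assumptions the model states alone come to roughly 112 GB, more than a single 80 GB accelerator holds, which is why a sharded multi-GPU cluster is requested.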
Join us in shaping India's AI future. Connect with our partnership team:
- Email: partnerships@bharat-ai.org
- Strategic Discussions: Partner Inquiry Portal
- Partnership Portal: https://bharat-ai.org/partners (launching soon)
Our team will provide detailed information about partnership opportunities, technical requirements, and engagement frameworks.
"Every partnership strengthens India's path to AI sovereignty. Together, we're not just adopting AI - we're defining its future with Indian innovation and values."
Join us in building India's sovereign AI capabilities. The future of AI is being written in Bharat.
To empower India's AI independence by providing an open, modular, and scalable foundation model framework, enabling developers, researchers, and institutions to build Indian-language, domain-specific, and locally governed AI systems.
- Built for Bharat: Native support for Indian languages, datasets, and regional diversity.
- Foundation First: Focused on base model pretraining, fine-tuning, and adaptation.
- Sovereign AI: 100% self-hostable, privacy-first, and data-residency compliant.
- Open Collaboration: Interoperable with Hugging Face, OpenAI-style APIs, and ExoStack.
- Modular Stack: Each layer can work independently or as part of a complete pipeline.
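To make "OpenAI-style APIs" concrete, here is a minimal sketch of a client request using only the Python standard library. It assumes a self-hosted server that exposes a `/v1/completions` route on port 8000 (the port used in the quick-start example in this README). The route, the model name, and the helper functions are assumptions for illustration, not confirmed BFMF behaviour.

```python
import json
import urllib.request

# Assumption: a locally hosted server exposes an OpenAI-style completions
# route at this URL. The path and model name are hypothetical.
BASE_URL = "http://localhost:8000/v1"
MODEL_NAME = "bharat-lite-1.3b"


def build_completion_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build (but do not send) a POST request in the OpenAI completions format."""
    payload = {"model": MODEL_NAME, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str) -> str:
    """Send the request and return the text of the first completion."""
    with urllib.request.urlopen(build_completion_request(prompt), timeout=30) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]


if __name__ == "__main__":
    # Only builds and prints the request, so it runs without a live server.
    req = build_completion_request("भारत की राजधानी क्या है?")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Because the request follows the OpenAI completions shape, existing OpenAI-compatible client libraries could in principle be pointed at the same base URL instead of this hand-rolled helper.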
| Layer | BharatFM Stack | Tools |
|---|---|---|
| 7 | Governance & Registry | MLflow, Audit, ACL |
| 6 | Serving & Deployment | vLLM, ExoStack |
| 5 | Evaluation & Benchmark | HELM, lm-eval |
| 4 | Fine-tuning Interface | Axolotl, LoRA |
| 3 | Training Engine | DeepSpeed, Megatron |
| 2 | Model Architectures | GLM, LLaMA, Mistral |
| 1 | Data Layer | Indic Corpora, RedPajama |
| Module | Description | Tools / Dependencies |
|---|---|---|
| bharat_data | Prepares multilingual datasets, tokenizers, and cleaning pipelines | Hugging Face Datasets, IndicNLP, Dolma |
| bharat_model | Defines base model architectures (decoder-only, encoder-decoder, mixture-of-experts) | PyTorch, Transformers, Megatron-LM |
| bharat_train | Distributed pretraining and fine-tuning pipeline | DeepSpeed, Axolotl, FSDP |
| bharat_eval | Evaluation and benchmarking suite | HELM, lm-eval-harness, OpenCompass |
| bharat_deploy | Serving layer using vLLM or ExoStack | FastAPI, Triton, vLLM |
| bharat_registry | Model registry, versioning, and experiment tracking | MLflow, Hugging Face Hub |
| bharat_cli | CLI toolkit for job scheduling, config management | Typer, ExoCLI |
| Model | Type | Size | Purpose |
|---|---|---|---|
| Bharat-Base | Decoder-only | 1.3B / 7B | General-purpose pre-trained model |
| Bharat-Lite | | 1.3B | On-device / low-resource |
| Bharat-MoE | Mixture of Experts | 12×7B | Scalable modular architecture |
| Bharat-Gov | Fine-tuned | | Governance, policy, public data |
| Bharat-Edu | Fine-tuned | | Education, tutoring, content generation |
| Bharat-Lang | Fine-tuned | | Multilingual & translation tasks |
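The fine-tuned variants are cheaper to produce when adapted with LoRA, which the fine-tuning layer lists. For a weight matrix of shape `d_out × d_in`, LoRA trains two low-rank factors with `r × (d_in + d_out)` parameters in total instead of the full `d_out × d_in`. The sketch below works through that arithmetic for rank 16 (the rank used in the fine-tuning example in this README) and an assumed 4096 × 4096 projection, a size typical of 7B LLaMA-style models. The actual Bharat-Base dimensions are not stated here, so the numbers are illustrative only.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one weight matrix: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the original dense weight matrix."""
    return d_in * d_out


if __name__ == "__main__":
    d = 4096   # assumed hidden size, not a confirmed Bharat-Base value
    rank = 16  # matches lora_rank=16 in the fine-tuning example
    added = lora_params(d, d, rank)
    dense = full_params(d, d)
    print(f"LoRA parameters per matrix:  {added:,}")
    print(f"Dense parameters per matrix: {dense:,}")
    print(f"Trainable fraction:          {added / dense:.4%}")
```

With these assumed dimensions, each adapted matrix trains 131,072 parameters against 16,777,216 in the dense weight, or under 1% of the original.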
| Layer | Integrated With |
|---|---|
| Compute | ExoStack |
| Training | Axolotl, DeepSpeed |
| Inference | vLLM, ExoServe |
| Registry | MLflow, Hugging Face Hub |
| Dataset | IndicNLP, RedPajama, Dolma |
- Python 3.8+
- PyTorch 2.0+
- CUDA 11.8+ (for GPU training)
- 16GB+ RAM (for 1.3B model training)
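Before installing, it can help to check the prerequisites programmatically. The following is a minimal sketch, not a script shipped with BFMF: `check_environment` is a hypothetical helper that checks the Python version and, only if PyTorch is already installed, reports its version and CUDA status.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)   # from the prerequisites list
MIN_TORCH_MAJOR = 2   # PyTorch 2.0+


def check_environment() -> dict:
    """Return a small report on Python, PyTorch, and CUDA availability."""
    report = {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "torch_installed": False,
        "torch_ok": False,
        "cuda_available": False,
        "cuda_version": None,
    }
    # Import torch only when it is present, so the check also runs on a clean machine.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch_installed"] = True
        report["torch_ok"] = int(torch.__version__.split(".")[0]) >= MIN_TORCH_MAJOR
        report["cuda_available"] = torch.cuda.is_available()
        report["cuda_version"] = torch.version.cuda  # None on CPU-only builds
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

The check does not verify the CUDA 11.8+ or RAM requirements. It only surfaces the CUDA version PyTorch was built against so it can be compared by hand.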
# Clone the repository
git clone https://github.com/bharat-ai/bharat-fm.git
cd bharat-fm
# Install in development mode
pip install -e ".[dev]"
# Or install with specific extras
pip install -e ".[train,eval,deploy]"

# Pretrain a base model
bharat train --model glm --dataset indic_mix --steps 50000

# Fine-tune with LoRA
bharat finetune --model bharat-base --lora --dataset govt_data

# Deploy on ExoStack
bharat deploy --model bharat-gov --infra exostack --replicas 3

# Evaluate on benchmarks
bharat eval --model bharat-base --benchmark helm --languages hi,en

bharat-fm/
│
├── README.md
├── LICENSE
├── pyproject.toml
├── requirements.txt
│
├── bharat_data/
│   ├── datasets/
│   ├── tokenizers/
│   └── preprocess.py
│
├── bharat_model/
│   ├── config/
│   ├── modeling_glm.py
│   ├── modeling_llama.py
│   └── modeling_moe.py
│
├── bharat_train/
│   ├── trainer.py
│   ├── finetune.py
│   └── deepspeed_config.json
│
├── bharat_eval/
│   ├── benchmarks/
│   └── evaluator.py
│
├── bharat_deploy/
│   ├── api.py
│   └── inference_server.py
│
├── bharat_registry/
│   └── mlflow_utils.py
│
├── bharat_cli/
│   └── main.py
│
├── docs/
│   ├── architecture.png
│   └── user_guide.md
│
├── examples/
│   ├── hello_bharat.py
│   └── config_examples/
│
└── tests/
    ├── test_data.py
    ├── test_model.py
    └── test_train.py
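The `bharat` commands shown earlier map onto four subcommands with a handful of flags. The sketch below shows one way such a command surface could be parsed. It uses the standard library's `argparse` as a stand-in rather than Typer, which the project lists for `bharat_cli`, so it is not the contents of `bharat_cli/main.py`. The defaults and help strings are assumptions. Only the subcommand and flag names come from the examples in this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the train / finetune / deploy / eval examples."""
    parser = argparse.ArgumentParser(prog="bharat")
    sub = parser.add_subparsers(dest="command", required=True)

    train = sub.add_parser("train", help="pretrain a base model")
    train.add_argument("--model", required=True)
    train.add_argument("--dataset", required=True)
    train.add_argument("--steps", type=int, default=50000)  # default is an assumption

    finetune = sub.add_parser("finetune", help="fine-tune an existing model")
    finetune.add_argument("--model", required=True)
    finetune.add_argument("--dataset", required=True)
    finetune.add_argument("--lora", action="store_true")

    deploy = sub.add_parser("deploy", help="serve a model")
    deploy.add_argument("--model", required=True)
    deploy.add_argument("--infra", default="exostack")  # default is an assumption
    deploy.add_argument("--replicas", type=int, default=1)

    evaluate = sub.add_parser("eval", help="run benchmarks")
    evaluate.add_argument("--model", required=True)
    evaluate.add_argument("--benchmark", required=True)
    # Comma-separated language codes, e.g. "hi,en" -> ["hi", "en"]
    evaluate.add_argument("--languages", type=lambda s: s.split(","), default=None)

    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["eval", "--model", "bharat-base", "--benchmark", "helm", "--languages", "hi,en"]
    )
    print(args)
```

Parsing `--languages` into a list at the CLI boundary keeps downstream code free of string splitting. A Typer implementation would express the same idea with typed function parameters.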
from bharat_model import BharatLite
from bharat_deploy import InferenceServer
# Load pre-trained model
model = BharatLite.from_pretrained("bharat-ai/bharat-lite-1.3b")
# Create inference server
server = InferenceServer(model, host="0.0.0.0", port=8000)
server.start()

from bharat_train import FineTuner
from bharat_data import IndicDataset
# Load dataset
dataset = IndicDataset("hindi_english_pairs")
# Initialize fine-tuner
finetuner = FineTuner(
    base_model="bharat-base",
    lora_rank=16,
    learning_rate=2e-5
)
# Fine-tune
finetuner.train(dataset, epochs=3)

from bharat_eval import Evaluator
from bharat_model import BharatBase
# Load model
model = BharatBase.from_pretrained("bharat-base")
# Initialize evaluator
evaluator = Evaluator(model)
# Run benchmarks
results = evaluator.evaluate(
    benchmarks=["helm", "lm-eval"],
    languages=["hi", "en", "bn"]
)
print(results)

# Clone and install
git clone https://github.com/bharat-ai/bharat-fm.git
cd bharat-fm
pip install -e ".[dev]"
# Install pre-commit hooks
pre-commit install
# Run tests
pytest tests/
# Run linting
ruff check bharat_*/
black bharat_*/

We welcome contributions! Please see CONTRIBUTING.md for guidelines.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
| Phase | Goal | Timeline |
|---|---|---|
| Phase 1 | Base repo setup + data pipeline + model configs | Nov 2025 |
| Phase 2 | Training engine + Axolotl integration | Dec 2025 |
| Phase 3 | Inference + deployment (ExoStack integration) | Jan 2026 |
| Phase 4 | Launch Bharat-Lite (1.3B multilingual) | Mar 2026 |
| Phase 5 | Community datasets + fine-tuned variants | Mid 2026 |
- Technical Lead: AI Research Institute, India
- Community Lead: Open Source India Foundation
- Industry Lead: Bharat AI Consortium
This project is licensed under the Apache 2.0 / Bharat Open AI License (BOAL).
- Commercial usage allowed
- Academic usage allowed
- Modification and distribution allowed
- India-first attribution required
See LICENSE for the full license text.
- GitHub Discussions: Join the conversation
- Discord Community: Join our Discord
- Documentation: Read the docs
- Issues: Report bugs or request features
- Government of India - For supporting sovereign AI initiatives
- Indian AI Research Community - For technical guidance and expertise
- Open Source Community - For building the foundation we build upon
- Hugging Face - For the amazing transformers ecosystem
- ExoStack - For compute infrastructure integration
If you use BFMF in your research, please cite:
@software{bharat_fmf_2025,
title={Bharat Foundation Model Framework: India's Open-Source Ecosystem for Sovereign AI},
author={Bharat AI Team},
year={2025},
url={https://github.com/bharat-ai/bharat-fm},
license={Apache 2.0 / BOAL}
}