Skip to content

jitenkr2030/BFMF

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

10 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

Strategic Partnership Program for India's AI Sovereignty

Empowering India's AI Future Through Collaboration

The Bharat Foundation Model Framework (BFMF) represents a landmark initiative in India's journey towards AI independence. As a strategic, community-driven open-source project, we are developing sovereign foundation models that are uniquely Indian - trained on our data, built with cutting-edge open technologies, and deployed on domestic infrastructure.

To accelerate this national mission, we invite leading institutions, technology companies, research organizations, and visionary individuals to join us as strategic partners in building India's AI future.

Partnership Opportunities

Category Strategic Value Partner Examples
Computing Infrastructure High-performance GPU/TPU clusters for model training and optimization ExoStack, Cloud Providers, HPC Centers
Data Partnerships Curated datasets spanning Indian languages, domains, and use-cases Research Institutions, Government Agencies
Research & Development Advanced model architectures and domain-specific innovations AI Institutes, Universities, R&D Labs
Strategic Investment Sustainable funding for compute, talent, and infrastructure Corporations, Investment Firms, Grants
Technical Ecosystem Engineering expertise, tools, and platform capabilities Technology Companies, Developer Communities

Partnership Benefits

Strategic Advantages

  • Priority Access: Early preview and integration rights for new Bharat models
  • Co-Innovation: Joint R&D opportunities and technology collaboration
  • Market Leadership: First-mover advantage in India's emerging AI ecosystem

Institutional Benefits

  • Brand Recognition: Featured placement on BFMF's partner ecosystem
  • Technical Support: Dedicated integration and optimization assistance
  • Research Publications: Co-authored papers and technical documentation

Current Strategic Requirements (Q4 2025)

Resource Scope Strategic Impact
Computing Infrastructure 16-32 NVIDIA A100/H100 GPUs Enable Bharat-Base (7B) model training
Storage Infrastructure 50TB+ NVMe/Cloud Storage Support dataset and model hosting
Financial Resources โ‚น10-20L Investment Scale compute and research operations
Research Partnerships 5-10 Technical Partners Drive model optimization and evaluation

Become a Strategic Partner

Join us in shaping India's AI future. Connect with our partnership team:

Our team will provide detailed information about partnership opportunities, technical requirements, and engagement frameworks.


"Every partnership strengthens India's path to AI sovereignty. Together, we're not just adopting AI - we're defining its future with Indian innovation and values."

Join us in building India's sovereign AI capabilities. The future of AI is being written in Bharat.

๐Ÿ‡ฎ๐Ÿ‡ณ Bharat Foundation Model Framework (BFMF)

India's Open-Source Ecosystem for Building, Training, and Deploying Foundation Models

License Python PyTorch Docs Community


๐Ÿงญ Vision

To empower India's AI independence by providing an open, modular, and scalable foundation model framework โ€” enabling developers, researchers, and institutions to build Indian-language, domain-specific, and locally governed AI systems.


โš™๏ธ Core Philosophy

  • ๐Ÿ‡ฎ๐Ÿ‡ณ Built for Bharat โ€“ Native support for Indian languages, datasets, and regional diversity.
  • ๐Ÿง  Foundation First โ€“ Focused on base model pretraining, fine-tuning, and adaptation.
  • ๐Ÿ”’ Sovereign AI โ€“ 100% self-hostable, privacy-first, and data-residency compliant.
  • ๐ŸŒ Open Collaboration โ€“ Interoperable with Hugging Face, OpenAI-style APIs, and ExoStack.
  • ๐Ÿงฉ Modular Stack โ€“ Each layer can work independently or as part of a complete pipeline.

๐Ÿ—๏ธ High-Level Architecture

 โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
 โ”‚                BharatFM Stack                 โ”‚
 โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
 โ”‚ 7. Governance & Registry (MLflow, Audit, ACL) โ”‚
 โ”‚ 6. Serving & Deployment (vLLM, ExoStack)      โ”‚
 โ”‚ 5. Evaluation & Benchmark (HELM, lm-eval)     โ”‚
 โ”‚ 4. Fine-tuning Interface (Axolotl, LoRA)      โ”‚
 โ”‚ 3. Training Engine (Deepspeed, Megatron)      โ”‚
 โ”‚ 2. Model Architectures (GLM, LLaMA, Mistral)  โ”‚
 โ”‚ 1. Data Layer (Indic Corpora, RedPajama)      โ”‚
 โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

BFMF Architecture


๐Ÿงฉ Modular Components

Module Description Tools / Dependencies
bharat_data Prepares multilingual datasets, tokenizers, and cleaning pipelines Hugging Face Datasets, IndicNLP, Dolma
bharat_model Defines base model architectures (decoder-only, encoder-decoder, mixture-of-experts) PyTorch, Transformers, Megatron-LM
bharat_train Distributed pretraining and fine-tuning pipeline Deepspeed, Axolotl, FSDP
bharat_eval Evaluation and benchmarking suite HELM, lm-eval-harness, OpenCompass
bharat_deploy Serving layer using vLLM or ExoStack FastAPI, Triton, vLLM
bharat_registry Model registry, versioning, and experiment tracking MLflow, Hugging Face Hub
bharat_cli CLI toolkit for job scheduling, config management Typer, ExoCLI

๐Ÿง  Supported Model Families

Model Type Size Purpose
Bharat-Base Decoder-only 1.3B / 7B General-purpose pre-trained model
Bharat-Lite 1.3B On-device / low-resource
Bharat-MoE Mixture of Experts 12ร—7B Scalable modular architecture
Bharat-Gov Finetuned Governance, policy, public data
Bharat-Edu Finetuned Education, tutoring, content generation
Bharat-Lang Finetuned Multilingual & translation tasks

๐ŸŒ Integration with Other Frameworks

Layer Integrated With
Compute ExoStack
Training Axolotl, Deepspeed
Inference vLLM, ExoServe
Registry MLflow, Hugging Face Hub
Dataset IndicNLP, RedPajama, Dolma

๐Ÿš€ Quick Start

Prerequisites

  • Python 3.8+
  • PyTorch 2.0+
  • CUDA 11.8+ (for GPU training)
  • 16GB+ RAM (for 1.3B model training)

Installation

# Clone the repository
git clone https://github.com/bharat-ai/bharat-fm.git
cd bharat-fm

# Install in development mode
pip install -e ".[dev]"

# Or install with specific extras
pip install -e ".[train,eval,deploy]"

Basic Usage

1. Train Bharat-Lite (1.3B) for Hindi-English Chat

bharat train --model glm --dataset indic_mix --steps 50000

2. Fine-tune Bharat-Gov for Policy AI

bharat finetune --model bharat-base --lora --dataset govt_data

3. Deploy via ExoStack

bharat deploy --model bharat-gov --infra exostack --replicas 3

4. Evaluate Model Performance

bharat eval --model bharat-base --benchmark helm --languages hi,en

๐Ÿ“ Project Structure

bharat-fm/
โ”‚
โ”œโ”€โ”€ README.md
โ”œโ”€โ”€ LICENSE
โ”œโ”€โ”€ pyproject.toml
โ”œโ”€โ”€ requirements.txt
โ”‚
โ”œโ”€โ”€ bharat_data/
โ”‚   โ”œโ”€โ”€ datasets/
โ”‚   โ”œโ”€โ”€ tokenizers/
โ”‚   โ””โ”€โ”€ preprocess.py
โ”‚
โ”œโ”€โ”€ bharat_model/
โ”‚   โ”œโ”€โ”€ config/
โ”‚   โ”œโ”€โ”€ modeling_glm.py
โ”‚   โ”œโ”€โ”€ modeling_llama.py
โ”‚   โ””โ”€โ”€ modeling_moe.py
โ”‚
โ”œโ”€โ”€ bharat_train/
โ”‚   โ”œโ”€โ”€ trainer.py
โ”‚   โ”œโ”€โ”€ finetune.py
โ”‚   โ””โ”€โ”€ deepspeed_config.json
โ”‚
โ”œโ”€โ”€ bharat_eval/
โ”‚   โ”œโ”€โ”€ benchmarks/
โ”‚   โ””โ”€โ”€ evaluator.py
โ”‚
โ”œโ”€โ”€ bharat_deploy/
โ”‚   โ”œโ”€โ”€ api.py
โ”‚   โ””โ”€โ”€ inference_server.py
โ”‚
โ”œโ”€โ”€ bharat_registry/
โ”‚   โ””โ”€โ”€ mlflow_utils.py
โ”‚
โ”œโ”€โ”€ bharat_cli/
โ”‚   โ””โ”€โ”€ main.py
โ”‚
โ”œโ”€โ”€ docs/
โ”‚   โ”œโ”€โ”€ architecture.png
โ”‚   โ””โ”€โ”€ user_guide.md
โ”‚
โ”œโ”€โ”€ examples/
โ”‚   โ”œโ”€โ”€ hello_bharat.py
โ”‚   โ””โ”€โ”€ config_examples/
โ”‚
โ””โ”€โ”€ tests/
    โ”œโ”€โ”€ test_data.py
    โ”œโ”€โ”€ test_model.py
    โ””โ”€โ”€ test_train.py

๐Ÿ“˜ Example Use Cases

1. Multilingual Chatbot

from bharat_model import BharatLite
from bharat_deploy import InferenceServer

# Load pre-trained model
model = BharatLite.from_pretrained("bharat-ai/bharat-lite-1.3b")

# Create inference server
server = InferenceServer(model, host="0.0.0.0", port=8000)
server.start()

2. Custom Fine-tuning

from bharat_train import FineTuner
from bharat_data import IndicDataset

# Load dataset
dataset = IndicDataset("hindi_english_pairs")

# Initialize fine-tuner
finetuner = FineTuner(
    base_model="bharat-base",
    lora_rank=16,
    learning_rate=2e-5
)

# Fine-tune
finetuner.train(dataset, epochs=3)

3. Model Evaluation

from bharat_eval import Evaluator
from bharat_model import BharatBase

# Load model
model = BharatBase.from_pretrained("bharat-base")

# Initialize evaluator
evaluator = Evaluator(model)

# Run benchmarks
results = evaluator.evaluate(
    benchmarks=["helm", "lm-eval"],
    languages=["hi", "en", "bn"]
)

print(results)

๐Ÿ› ๏ธ Development

Setting up Development Environment

# Clone and install
git clone https://github.com/bharat-ai/bharat-fm.git
cd bharat-fm
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

# Run tests
pytest tests/

# Run linting
ruff check bharat_/
black bharat_/

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

๐Ÿš€ Roadmap (Phase-wise)

Phase Goal Timeline
Phase 1 Base repo setup + data pipeline + model configs โœ… Nov 2025
Phase 2 Training engine + Axolotl integration Dec 2025
Phase 3 Inference + deployment (ExoStack integration) Jan 2026
Phase 4 Launch Bharat-Lite (1.3B multilingual) Mar 2026
Phase 5 Community datasets + fine-tuned variants Mid 2026

๐Ÿ›๏ธ Governance

Steering Committee

  • Technical Lead: AI Research Institute, India
  • Community Lead: Open Source India Foundation
  • Industry Lead: Bharat AI Consortium

Decision Making Process


๐Ÿ“„ License

This project is licensed under the Apache 2.0 / Bharat Open AI License (BOAL).

  • โœ… Commercial usage allowed
  • โœ… Academic usage allowed
  • โœ… Modification and distribution allowed
  • ๐Ÿ“ India-first attribution required

See LICENSE for the full license text.


๐Ÿค Community & Support


๐Ÿ™ Acknowledgments

  • Government of India - For supporting sovereign AI initiatives
  • Indian AI Research Community - For technical guidance and expertise
  • Open Source Community - For building the foundation we build upon
  • Hugging Face - For the amazing transformers ecosystem
  • ExoStack - For compute infrastructure integration

๐Ÿ“ˆ Citation

If you use BFMF in your research, please cite:

@software{bharat_fmf_2025,
  title={Bharat Foundation Model Framework: India's Open-Source Ecosystem for Sovereign AI},
  author={Bharat AI Team},
  year={2025},
  url={https://github.com/bharat-ai/bharat-fm},
  license={Apache 2.0 / BOAL}
}

๐Ÿ‡ฎ๐Ÿ‡ณ Made with โค๏ธ for Bharat's AI Independence

Back to top

About

India's Open-Source Ecosystem for Building, Training, and Deploying Foundation Models

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors