Open-source Indian politician data platform, powered by AI enrichment
Browse, search, and explore data on Indian MPs and MLAs. Enrich politician profiles automatically using LLM-based agents.
Getting Started · Contributing with AI · Daily agent schedule · API Reference · Project Structure
| Feature | Description |
|---|---|
| Search & Browse | Look up MPs and MLAs by name, state, constituency, or party |
| AI Enrichment | Automatically fill education, family, criminal records, and more using LLMs |
| Multi-Model Failover | Gemini → OpenAI → Perplexity with per-model cooldown on rate limits |
| JSON-First Data | Source of truth lives in version-controlled JSON files |
| Vector Search | Semantic Q&A, in progress (learning doc) |
| Google OAuth | User accounts via NextAuth with backend sync |
| Stats Dashboard | Party breakdown, state coverage, and enrichment progress |
Tech stack: Backend · Frontend · AI / LLM · Infrastructure
| Tool | Version |
|---|---|
| Python | 3.11+ |
| Node.js | 18+ |
| Docker | Latest (optional, for Postgres) |
```bash
git clone https://github.com/<your-username>/Rajniti.git
cd Rajniti

# Backend (runs via Python venv)
make install            # creates venv + pip install -r requirements.txt
cp .env.example .env    # configure your environment
. venv/bin/activate     # activate the virtual environment

# Frontend
cd frontend && npm install
```

Backend commands (`make run`, `make test`, db and lint targets) use the project virtualenv (`venv/`) automatically. To run Python scripts by hand, either use those Make targets or activate the venv first: `source venv/bin/activate` (or `make venv` for an interactive shell with the venv active).
Copy .env.example and fill in the required values:
```bash
# Backend: .env
FLASK_ENV=development
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/rajniti  # optional
GEMINI_API_KEY=your-key-here  # free tier; the only key you need to get started

# Frontend: frontend/.env
NEXTAUTH_URL=http://localhost:3000
NEXTAUTH_SECRET=your-secret
GOOGLE_CLIENT_ID=...
GOOGLE_CLIENT_SECRET=...
NEXT_PUBLIC_API_URL=http://localhost:8000
```

```bash
# Option A: Docker (backend + Postgres)
make dev

# Option B: Run backend directly (via venv)
make run       # starts Flask API on :8000 (uses project venv)

# Frontend (separate terminal)
make frontend  # starts Next.js on :3000
```

To run other Python commands in the project venv, use `make venv` to open a shell with the venv activated, or run `source venv/bin/activate` in your terminal first.
Open http://localhost:8000/api/v1/health; you should see a healthy response.
This is the easiest way to contribute. Run AI agents locally with your own API keys, and open a PR with enriched politician data.
```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  JSON Data   │────▶│  LLM Agents  │────▶│   Enriched   │────▶│  Open a PR   │
│   (mp/mla)   │     │ (Gemini/GPT) │     │  JSON Data   │     │              │
└──────────────┘     └──────────────┘     └──────────────┘     └──────────────┘
```
The enrichment pipeline reads politicians from JSON, queries LLMs for missing details (education, family, criminal records, etc.), and writes the results back. A local SQLite cache prevents re-processing.
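In outline, that read-enrich-write loop looks like the sketch below. This is an illustrative stand-in, not the repo's actual code: the real agent lives behind `scripts/run_politician_agent.py` and uses a SQLite cache, whereas here `enrich` and the in-memory `cache` set are hypothetical placeholders.

```python
import json
from pathlib import Path

def enrich_file(path: Path, enrich, cache: set) -> int:
    """Illustrative pipeline loop: fill missing fields, skip cached IDs."""
    politicians = json.loads(path.read_text())
    updated = 0
    for p in politicians:
        if p["id"] in cache:
            continue  # already processed; the real agent uses a SQLite cache
        missing = [k for k, v in p.items() if v in (None, "")]
        if missing:
            p.update(enrich(p, missing))  # an LLM call in the real pipeline
            updated += 1
        cache.add(p["id"])
    path.write_text(json.dumps(politicians, indent=2))
    return updated
```

Because completed IDs land in the cache, re-running the loop is cheap: only new or still-incomplete records trigger an LLM call.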
1. Fork & set up
```bash
git clone <your-fork-url>
cd Rajniti
git checkout -b enrich/<scope>   # e.g. enrich/mp-education
make install
cp .env.example .env             # add your API key(s)
```

2. Get an API key (at least one)
| Provider | How to Get a Key | Env Variable | Cost |
|---|---|---|---|
| Gemini | Google AI Studio | GEMINI_API_KEY | Free tier (rate-limited) |
| OpenAI | platform.openai.com | OPENAI_API_KEY | Paid |
| Perplexity | perplexity.ai | PERPLEXITY_API_KEY | Paid |
Fastest setup: Get a free Gemini key from Google AI Studio, paste it as `GEMINI_API_KEY` in your `.env`, and you're ready to run agents; no paid key needed.
Models fail over automatically (Gemini → OpenAI → Perplexity). The order is configured in app/config/agent_config.py.
3. Run the agent
```bash
# Run the agent for all politicians
python3 scripts/run_politician_agent.py

# Test with a small batch first
python3 scripts/run_politician_agent.py --type MP --limit 3 --log-level INFO

# Run for all MPs
python3 scripts/run_politician_agent.py --type MP --log-level INFO

# Run for all MLAs
python3 scripts/run_politician_agent.py --type MLA --log-level INFO

# Target a single politician
python3 scripts/run_politician_agent.py --id "<POLITICIAN_ID>"

# Force re-run (ignore cache)
python3 scripts/run_politician_agent.py --type MP --force
```

4. Add MLAs for a new state

```bash
python3 scripts/fetch_mlas.py --state "Andhra Pradesh" --log-level INFO
```

5. Open a PR
```bash
git add app/data/mp.json app/data/mla.json
git commit -m "Enrich MP education data"
git push -u origin enrich/<scope>
```

Then open a Pull Request. Include the state/scope, the number of records, and how you tested.
If you keep the repo on your laptop and prefer not to start the enrichment agent by hand every day, you can run it on a daily schedule (for example, at 8:00 local time). The same command updates app/data/mp.json and app/data/mla.json when you omit --type: it processes every politician who still needs enrichment (respecting the local cache).
Important: Scheduled jobs must run with the project root as the working directory so python-dotenv can load the repo's .env (API keys). Use absolute paths in scripts and in cron/Task Scheduler.
1. One wrapper script (Unix: macOS and Linux)
Save as something like ~/bin/rajniti-daily-enrich.sh, edit REPO_ROOT, then make it executable (chmod +x ~/bin/rajniti-daily-enrich.sh):
```bash
#!/usr/bin/env bash
set -euo pipefail

REPO_ROOT="/absolute/path/to/Rajniti"   # <-- change this
LOG_DIR="${HOME}/logs"
mkdir -p "${LOG_DIR}"

cd "${REPO_ROOT}"

# Assumes you use the project venv from `make install`
# shellcheck source=/dev/null
. "${REPO_ROOT}/venv/bin/activate"

# One run: all MPs and MLAs that still need work (omit --type)
python scripts/run_politician_agent.py --log-level INFO \
  >> "${LOG_DIR}/rajniti-agent.log" 2>&1
```

Run immediately (any OS): open a terminal, cd to the repo, activate the venv, and run the same `python scripts/run_politician_agent.py --log-level INFO` line (or use the commands in "3. Run the agent" above). No need to wait for the next 8:00 run.
2. macOS: crontab

Edit your user crontab:

```bash
crontab -e
```

Add one line to run every day at 08:00 local time (adjust the script path if needed):

```bash
0 8 * * * /Users/YOUR_USER/bin/rajniti-daily-enrich.sh
```

- List your crontab: `crontab -l`
- Remove all cron entries: `crontab -r` (use with care)
- Change time: edit the five time fields; `0 8 * * *` means minute 0, hour 8, every day. See `man 5 crontab` on your system.
- Sleep / closed lid: cron only runs while the Mac is awake at the scheduled time. If you often miss the window, run the script manually when you're back, or consider launchd (`~/Library/LaunchAgents/`) with `StartCalendarInterval` for a user agent that behaves similarly (it still requires the machine to be awake, unless you use an always-on machine or a remote runner).
3. Linux: crontab

Same idea as macOS:

```bash
crontab -e
```

Example (8:00 daily):

```bash
0 8 * * * /home/YOUR_USER/bin/rajniti-daily-enrich.sh
```

Ensure the wrapper script is executable and that cron has access to anything you rely on outside the script (the wrapper above avoids that by using absolute paths and `cd`).
4. Windows: Task Scheduler
a) Create C:\Users\YOUR_USER\bin\rajniti-daily-enrich.bat (adjust paths):
```bat
@echo off
setlocal
cd /d C:\absolute\path\to\Rajniti
call venv\Scripts\activate.bat
python scripts\run_politician_agent.py --log-level INFO >> "%USERPROFILE%\logs\rajniti-agent.log" 2>&1
```

Create the log folder once: `mkdir %USERPROFILE%\logs`
b) Open Task Scheduler → Create Task… (not "Create Basic Task", if you want full control).
- General: "Run only when user is logged on" (typical for a laptop), or "Run whether user is logged on or not" if you need headless runs (may require a stored password).
- Triggers: New… → Daily → start time 8:00:00 AM (local time).
- Actions: New… → Start a program → Program/script: `C:\Users\YOUR_USER\bin\rajniti-daily-enrich.bat` (or `cmd.exe` with arguments `/c "C:\…\rajniti-daily-enrich.bat"` if you prefer).
Run immediately: In Task Scheduler, right-click the task → Run. Or from cmd.exe: `schtasks /Run /TN "YourTaskName"` (use the exact task name you created).
Change or remove: Task Scheduler Library → select the task → Properties (edit triggers/times) or Delete.
5. Updating the schedule later
| Platform | View | Edit | Remove |
|---|---|---|---|
| macOS / Linux | `crontab -l` | `crontab -e` | Delete the line in the editor, or `crontab -r` to clear everything |
| Windows | Task Scheduler → your task | Properties → Triggers / Actions | Delete the task |
After any change to the wrapper script path or repo location, update the scheduled command to match.
These rules are non-negotiable for all PRs.
| Rule | Details |
|---|---|
| No secrets | Never commit .env or API keys |
| No cache files | app/database/cache.db is local-only |
| Data PRs only touch JSON | Your PR should update app/data/mp.json and/or app/data/mla.json |
| Tests must pass | Run make test before pushing |
| Review your diff | Ensure only intended changes are included |
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/politicians | List politicians (filter by type) |
| GET | /api/v1/politicians/search?q= | Search by name |
| GET | /api/v1/politicians/<id> | Get a single politician |
| GET | /api/v1/politicians/state/<state> | Filter by state |
| GET | /api/v1/politicians/party/<party> | Filter by party |
| GET | /api/v1/stats | Summary statistics |
| GET | /api/v1/states | List all states |
| GET | /api/v1/parties | List all parties |
| POST | /api/v1/questions/ask | Ask a question (501 until the vector store is reimplemented; see docs/VECTOR_DBS.md) |
| GET | /api/v1/health | Health check |
Want to enrich a new field (e.g., criminal records, social media)?
- Add a prompt builder in app/prompts/politician_prompts.py
- Create a process class in app/agents/politician_agent.py
- Register it in `PoliticianAgent.__init__` by appending to `self.processes`
The architecture is designed to be extensible: each enrichment field is an independent process.
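As a rough illustration of that per-field pattern, here is a hypothetical process class and its registration. The class names, method signatures, and the `ask_llm` callable below are assumptions for illustration, not the actual interfaces in app/agents/politician_agent.py:

```python
class EnrichmentProcess:
    """One independent enrichment process per field (illustrative base class)."""

    field = None  # the JSON field this process fills

    def build_prompt(self, politician: dict) -> str:
        raise NotImplementedError

    def run(self, politician: dict, ask_llm) -> dict:
        if politician.get(self.field):
            return {}  # already filled; nothing to do
        return {self.field: ask_llm(self.build_prompt(politician))}

class SocialMediaProcess(EnrichmentProcess):
    field = "social_media"

    def build_prompt(self, politician: dict) -> str:
        return f"List official social media handles for {politician['name']}."

class PoliticianAgent:
    def __init__(self):
        # Register new field processes here, mirroring step 3 above
        self.processes = [SocialMediaProcess()]

    def enrich(self, politician: dict, ask_llm) -> dict:
        for proc in self.processes:
            politician.update(proc.run(politician, ask_llm))
        return politician
```

The point of the pattern: adding a field means adding one prompt builder and one process class, with no changes to the surrounding pipeline.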
The backend uses PostgreSQL via SQLAlchemy + Alembic. Migrations run automatically on server startup (via alembic upgrade head).
Option A: Local Docker Postgres (for development)
```bash
# .env
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/rajniti
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DB=rajniti
```

```bash
make dev   # starts Postgres + API via Docker Compose
```

If running the API outside Docker (venv), use `localhost` in the URL. If running inside Docker, use `postgres` (the compose service name).
Option B: Supabase (for staging/production)
```bash
# .env: use the session-mode pooler URL (port 5432), NOT the direct URL
DATABASE_URL=postgresql://postgres.PROJECT_REF:PASSWORD@aws-0-REGION.pooler.supabase.com:5432/postgres
```

The direct Supabase host (db.*.supabase.co) is IPv6-only and will fail in Docker. Always use the session-mode pooler URL from your Supabase dashboard (Settings > Database > Connection string > Session mode).
- On server startup, `alembic upgrade head` runs automatically, applying any pending migrations.
- `metadata.create_all` runs as a fallback for brand-new databases with no tables at all.
To disable auto-migration, set SKIP_DB_AUTO_MIGRATE=1 in .env.
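The startup decision flow can be modeled as follows. This is an illustrative sketch of the behavior described above, not the app's actual bootstrap code; the function name and the three injected callables are hypothetical:

```python
import os

def run_startup_migrations(upgrade_head, create_all, has_tables, env=os.environ):
    """Sketch of the startup flow (illustrative names).

    upgrade_head: runs `alembic upgrade head`
    create_all:   runs `metadata.create_all`
    has_tables:   returns True if the DB already has any tables
    """
    if env.get("SKIP_DB_AUTO_MIGRATE") == "1":
        return "skipped"
    try:
        upgrade_head()
        return "migrated"
    except Exception:
        # Fallback path: only for a brand-new database with no tables at all
        if not has_tables():
            create_all()
            return "created"
        raise
```

Injecting the callables keeps the decision logic testable without a real database connection.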
After editing a model file (e.g. adding a column to app/database/models/user.py):
```bash
# 1. Generate a migration from the model diff
python scripts/db.py autogenerate -m "add column_name to users"

# 2. Review the generated file in alembic/versions/ and remove anything unwanted
#    (Alembic may detect tables in the DB that aren't in your models)

# 3. Apply it
python scripts/db.py migrate
```

Or equivalently, using Alembic directly:

```bash
alembic revision --autogenerate -m "add column_name to users"
alembic upgrade head
```

The next time the server starts, it will apply the migration automatically in all environments (local, GCP, etc.).
| Command | What it does |
|---|---|
| `python scripts/db.py migrate` | Run `alembic upgrade head` |
| `python scripts/db.py autogenerate -m "msg"` | Generate migration from model diff |
| `python scripts/db.py init` | `metadata.create_all` (no Alembic) |
| `alembic history` | Show migration chain |
| `alembic current` | Show current revision in DB |
```bash
make test       # all tests
make test-unit  # unit tests only
make test-e2e   # end-to-end tests
make coverage   # tests + coverage report
make lint       # backend + frontend linting
make format     # auto-format with Black + isort
```

In GitHub Actions, backend checks run in one job, in order: lint (Black, isort, Flake8, mypy) → unit tests → integration tests → E2E tests. See .github/workflows/ci.yml.
```bash
cd frontend
npm test                  # all Jest tests (unit + integration)
npm run test:unit         # Jest unit tests only
npm run test:integration
npm run test:e2e          # Playwright (needs dev server; see frontend README)
```

Full frontend testing notes, E2E setup, and CI behavior: frontend/README.md.
In GitHub Actions, after ESLint and TypeScript, unit, integration, and E2E jobs run in parallel, then a production build runs if all pass.
```
Rajniti/
├── app/
│   ├── agents/        # LLM-based enrichment agents
│   ├── config/        # Agent & provider configuration
│   ├── controllers/   # API request handlers
│   ├── core/          # Utilities, logging, errors
│   ├── data/          # mp.json, mla.json (source of truth)
│   ├── database/      # Models, migrations, SQLite cache
│   ├── prompts/       # LLM prompt builders
│   ├── routes/        # Flask route definitions
│   ├── schemas/       # Pydantic validation schemas
│   └── services/      # Business logic layer
├── frontend/
│   ├── README.md      # Frontend scripts, testing, CI notes
│   ├── app/           # Next.js App Router pages
│   ├── components/    # React components
│   ├── data/          # Generated static data (contributors.json)
│   ├── __tests__/e2e/ # Playwright E2E tests (browser)
│   ├── hooks/         # Custom React hooks
│   └── lib/           # Shared utilities
├── scripts/           # CLI scripts (agent runner, DB, MLA fetcher)
├── tests/             # Unit, integration, and E2E tests
├── alembic/           # Database migrations
├── docker/            # Docker init scripts
├── .github/
│   ├── workflows/     # CI/CD (lint, test, release)
│   └── PULL_REQUEST_TEMPLATE.md
├── Dockerfile
├── docker-compose.yml
├── Makefile
├── requirements.txt
└── pyproject.toml
```
Agents use a failover LLM client with automatic per-model cooldown:
- Models are tried top-to-bottom from `PROVIDER_CONFIGS` in app/config/agent_config.py
- If a model hits a rate limit (429), it enters cooldown and the next model is used
- Cooldown is per-model: `gemini-1.5-flash` cooling down doesn't block `gemini-2.0-flash`
- Only API keys go in `.env`; model names and order are configured in code
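A minimal sketch of that failover-with-cooldown behavior is below. It is illustrative only (the real implementation lives in the agent code and app/config/agent_config.py); the class, the injected `clock`, and `RateLimitError` are stand-ins for the project's actual types and the providers' 429 responses:

```python
import time

class RateLimitError(Exception):
    """Stand-in for a provider 429 response."""

class FailoverClient:
    """Illustrative failover loop with per-model cooldown."""

    def __init__(self, models, cooldown_seconds=60, clock=time.monotonic):
        self.models = models            # ordered (name, call_fn) pairs
        self.cooldown = cooldown_seconds
        self.clock = clock              # injectable for testing
        self.cooling_until = {}         # model name -> timestamp

    def ask(self, prompt: str):
        for name, call in self.models:
            if self.clock() < self.cooling_until.get(name, 0.0):
                continue  # still cooling down after a 429
            try:
                return call(prompt)
            except RateLimitError:
                # Per-model cooldown: only this model is benched
                self.cooling_until[name] = self.clock() + self.cooldown
        raise RuntimeError("all models rate-limited or cooling down")
```

Because the cooldown map is keyed by model name, a rate-limited Gemini model is skipped on subsequent calls while the rest of the list stays available.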
```bash
make dev    # Local Postgres + API (development)
make prod   # API only, expects external Postgres (e.g. Supabase)
make stop   # Stop all containers
make clean  # Remove containers + volumes
make reset  # Full reset (wipes data, fresh start)
```

Contributors are highlighted on the website at /contributors.
How it works:

- scripts/generate_contributors.py fetches contributor data from the GitHub API and writes frontend/data/contributors.json.
- A GitHub Actions workflow (.github/workflows/update_contributors.yml) runs weekly (Monday midnight UTC) and on manual dispatch to keep the file up to date. It only commits when the data has actually changed.
- The frontend reads the static JSON at build time; there are no runtime GitHub API calls.
Running locally:
```bash
# Generate/refresh contributors data (optional GITHUB_TOKEN for higher rate limits)
python scripts/generate_contributors.py

# With a token
GITHUB_TOKEN=ghp_... python scripts/generate_contributors.py
```

This project is licensed under the MIT License.
Built with care for Indian democracy 🇮🇳