imsks/Rajniti


๐Ÿ›๏ธ Rajniti

Open-source Indian politician data platform — powered by AI enrichment



Browse, search, and explore data on Indian MPs and MLAs. Enrich politician profiles automatically using LLM-based agents.

Getting Started · Contributing with AI · Daily agent schedule · API Reference · Project Structure


✨ Features

| Feature | Description |
| --- | --- |
| 🔍 Search & Browse | Look up MPs and MLAs by name, state, constituency, or party |
| 🤖 AI Enrichment | Automatically fill education, family, criminal records, and more using LLMs |
| 🔄 Multi-Model Failover | Gemini → OpenAI → Perplexity with per-model cooldown on rate limits |
| 🗃️ JSON-First Data | Source of truth lives in version-controlled JSON files |
| 🧠 Vector Search | Semantic Q&A — in progress (learning doc) |
| 🔐 Google OAuth | User accounts via NextAuth with backend sync |
| 📊 Stats Dashboard | Party breakdown, state coverage, and enrichment progress |

๐Ÿ—๏ธ Tech Stack

Backend

Python Flask PostgreSQL SQLite

Frontend

Next.js React TypeScript Tailwind

AI / LLM

Gemini OpenAI LangChain

Infrastructure

Docker GCP Vercel


🚀 Getting Started

Prerequisites

| Tool | Version |
| --- | --- |
| Python | 3.11+ |
| Node.js | 18+ |
| Docker | Latest (optional, for Postgres) |

1. Clone & Install

git clone https://github.com/<your-username>/Rajniti.git
cd Rajniti

# Backend (runs via Python venv)
make install            # creates venv + pip install -r requirements.txt
cp .env.example .env    # configure your environment
. venv/bin/activate     # activate the virtual environment

# Frontend
cd frontend && npm install

Backend commands (make run, make test, db and lint targets) use the project virtualenv (venv/) automatically. To run Python scripts by hand, either use those Make targets or activate the venv first: source venv/bin/activate (or make venv for an interactive shell with venv active).

2. Configure Environment

Copy .env.example and fill in the required values:

# Backend — .env
FLASK_ENV=development
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/rajniti   # optional
GEMINI_API_KEY=your-key-here          # free tier — only key you need to get started

# Frontend — frontend/.env
NEXTAUTH_URL=http://localhost:3000
NEXTAUTH_SECRET=your-secret
GOOGLE_CLIENT_ID=...
GOOGLE_CLIENT_SECRET=...
NEXT_PUBLIC_API_URL=http://localhost:8000

3. Run

# Option A: Docker (backend + Postgres)
make dev

# Option B: Run backend directly (via venv)
make run                # starts Flask API on :8000 (uses project venv)

# Frontend (separate terminal)
make frontend           # starts Next.js on :3000

To run other Python commands in the project venv, use make venv to open a shell with the venv activated, or run source venv/bin/activate in your terminal first.

4. Verify

Open http://localhost:8000/api/v1/health — you should see a healthy response.


🤖 Contributing with AI

This is the easiest way to contribute. Run AI agents locally with your own API keys, and open a PR with enriched politician data.

How It Works

┌──────────────┐     ┌───────────────┐     ┌──────────────┐     ┌──────────────┐
│  JSON Data   │────▶│  LLM Agents   │────▶│  Enriched    │────▶│  Open a PR   │
│  (mp/mla)    │     │  (Gemini/GPT) │     │  JSON Data   │     │              │
└──────────────┘     └───────────────┘     └──────────────┘     └──────────────┘

The enrichment pipeline reads politicians from JSON, queries LLMs for missing details (education, family, criminal records, etc.), and writes the results back. A local SQLite cache prevents re-processing.
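The loop described above can be sketched roughly as follows. This is an illustrative outline, not the project's actual code: `run_enrichment` and `enrich_fn` are hypothetical names, and the real cache schema in `app/database/cache.db` may differ.

```python
import json
import sqlite3
from pathlib import Path

def run_enrichment(data_path: Path, cache_db: str, enrich_fn, force: bool = False):
    """Read politicians from JSON, enrich the un-processed ones, write back.

    enrich_fn(politician) -> dict of newly filled fields (stands in for the LLM call).
    """
    conn = sqlite3.connect(cache_db)
    conn.execute("CREATE TABLE IF NOT EXISTS processed (id TEXT PRIMARY KEY)")
    politicians = json.loads(data_path.read_text())
    for p in politicians:
        seen = conn.execute(
            "SELECT 1 FROM processed WHERE id = ?", (p["id"],)
        ).fetchone()
        if seen and not force:
            continue                      # cache hit: skip re-processing
        p.update(enrich_fn(p))            # fill education, family, etc.
        conn.execute("INSERT OR REPLACE INTO processed (id) VALUES (?)", (p["id"],))
        conn.commit()                     # commit per record so reruns resume cleanly
    data_path.write_text(json.dumps(politicians, indent=2, ensure_ascii=False))
    conn.close()
    return politicians
```

Committing the cache row after each record means an interrupted run picks up where it left off, which is why `--force` exists to deliberately bypass the cache.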

Step-by-Step

1. Fork & set up

git clone <your-fork-url>
cd Rajniti
git checkout -b enrich/<scope>       # e.g. enrich/mp-education
make install
cp .env.example .env                 # add your API key(s)

2. Get an API key (at least one)

| Provider | How to Get a Key | Env Variable | Cost |
| --- | --- | --- | --- |
| Gemini | Google AI Studio | GEMINI_API_KEY | Free tier (rate-limited) |
| OpenAI | platform.openai.com | OPENAI_API_KEY | Paid |
| Perplexity | perplexity.ai | PERPLEXITY_API_KEY | Paid |

Fastest setup: Get a free Gemini key from Google AI Studio, paste it as GEMINI_API_KEY in your .env, and you're ready to run agents — no paid key needed.

Models fail over automatically (Gemini → OpenAI → Perplexity). Order is configured in app/config/agent_config.py.

3. Run the agent

# Run the agent for all politicians
python3 scripts/run_politician_agent.py

# Test with a small batch first
python3 scripts/run_politician_agent.py --type MP --limit 3 --log-level INFO

# Run for all MPs
python3 scripts/run_politician_agent.py --type MP --log-level INFO

# Run for all MLAs
python3 scripts/run_politician_agent.py --type MLA --log-level INFO

# Target a single politician
python3 scripts/run_politician_agent.py --id "<POLITICIAN_ID>"

# Force re-run (ignore cache)
python3 scripts/run_politician_agent.py --type MP --force

4. Add MLAs for a new state

python3 scripts/fetch_mlas.py --state "Andhra Pradesh" --log-level INFO

5. Open a PR

git add app/data/mp.json app/data/mla.json
git commit -m "Enrich MP education data"
git push -u origin enrich/<scope>

Then open a Pull Request. Include: the state/scope, number of records, and how you tested.

Optional: Daily agent schedule (macOS, Linux, Windows)

If you keep the repo on your laptop and prefer not to start the enrichment agent by hand every day, you can run it on a daily schedule (example: 8:00 in your local time). The same command updates app/data/mp.json and app/data/mla.json when you omit --type — it processes every politician who still needs enrichment (respecting the local cache).

Important: Scheduled jobs must run with the project root as the working directory so python-dotenv can load the repo's .env (API keys). Use absolute paths in scripts and in cron/Task Scheduler.
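The working-directory requirement can be illustrated with a minimal stand-in for what python-dotenv does by default: it resolves a relative .env path against the current working directory, so a scheduler that starts the script from elsewhere silently loads nothing. The `parse_env` helper below is hypothetical, for illustration only; the project itself uses python-dotenv.

```python
import os
from pathlib import Path

def parse_env(path: Path = Path(".env")) -> dict:
    """Minimal stand-in for python-dotenv: read KEY=VALUE lines from a file."""
    env = {}
    if not path.is_absolute():
        path = Path.cwd() / path          # relative path: resolved against cwd,
                                          # which is why cron jobs must cd first
    if not path.exists():
        return env                        # silently empty, like a missed .env
    for line in path.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            env[key.strip()] = value.split(" #")[0].strip()  # drop inline comments
    return env
```

This is exactly why the wrapper scripts below `cd` into the repo root before invoking Python.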

1. One wrapper script (Unix — macOS and Linux)

Save as something like ~/bin/rajniti-daily-enrich.sh, edit REPO_ROOT, then make it executable (chmod +x ~/bin/rajniti-daily-enrich.sh):

#!/usr/bin/env bash
set -euo pipefail

REPO_ROOT="/absolute/path/to/Rajniti"   # <-- change this
LOG_DIR="${HOME}/logs"
mkdir -p "${LOG_DIR}"

cd "${REPO_ROOT}"
# Assumes you use the project venv from `make install`
# shellcheck source=/dev/null
. "${REPO_ROOT}/venv/bin/activate"

# One run: all MPs and MLAs that still need work (omit --type)
python scripts/run_politician_agent.py --log-level INFO \
  >> "${LOG_DIR}/rajniti-agent.log" 2>&1

Run immediately (any OS): open a terminal, cd to the repo, activate the venv, and run the same python scripts/run_politician_agent.py --log-level INFO line (or use the commands in 3. Run the agent above). No need to wait for the next 8:00 run.


2. macOS — crontab

Edit your user crontab:

crontab -e

Add one line to run every day at 08:00 local time (adjust the script path if needed):

0 8 * * * /Users/YOUR_USER/bin/rajniti-daily-enrich.sh
  • List your crontab: crontab -l
  • Remove all cron entries: crontab -r (use with care)
  • Change time: edit the five time fields; 0 8 * * * = minute 0, hour 8, every day. See man 5 crontab on your system.
  • Sleep / closed lid: cron only runs while the Mac is awake at the scheduled time. If you often miss the window, run the script manually when you're back, or consider launchd (~/Library/LaunchAgents/) with StartCalendarInterval for a user agent that behaves similarly (it still requires the machine to be awake, unless you use an always-on machine or a remote runner).

3. Linux — crontab

Same idea as macOS:

crontab -e

Example (8:00 daily):

0 8 * * * /home/YOUR_USER/bin/rajniti-daily-enrich.sh

Ensure the wrapper script is executable and that cron has access to anything you rely on outside the script (the wrapper above avoids surprises by using absolute paths and cd).


4. Windows — Task Scheduler

a) Create C:\Users\YOUR_USER\bin\rajniti-daily-enrich.bat (adjust paths):

@echo off
setlocal
cd /d C:\absolute\path\to\Rajniti
call venv\Scripts\activate.bat
python scripts\run_politician_agent.py --log-level INFO >> "%USERPROFILE%\logs\rajniti-agent.log" 2>&1

Create the log folder once: mkdir %USERPROFILE%\logs

b) Open Task Scheduler → Create Task… (not "Create Basic Task" if you want full control).

  • General: Run only when user is logged on (typical for a laptop), or "Run whether user is logged on or not" if you need headless runs (may require a stored password).
  • Triggers: New… → Daily → start time 8:00:00 AM (local time).
  • Actions: New… → Start a program → Program/script: C:\Users\YOUR_USER\bin\rajniti-daily-enrich.bat (or cmd.exe with arguments /c "C:\…\rajniti-daily-enrich.bat" if you prefer).

Run immediately: In Task Scheduler, right-click the task → Run. Or from cmd.exe: schtasks /Run /TN "YourTaskName" (use the exact task name you created).

Change or remove: Task Scheduler Library → select the task → Properties (edit triggers/times) or Delete.


5. Updating the schedule later

| Platform | View | Edit | Remove |
| --- | --- | --- | --- |
| macOS / Linux | crontab -l | crontab -e | Delete the line in the editor, or crontab -r to clear everything |
| Windows | Task Scheduler → your task | Properties → Triggers / Actions | Delete the task |

After any change to the wrapper script path or repo location, update the scheduled command to match.


๐Ÿ›ก๏ธ Contribution Rules

These rules are non-negotiable for all PRs.

| Rule | Details |
| --- | --- |
| No secrets | Never commit .env or API keys |
| No cache files | app/database/cache.db is local-only |
| Data PRs only touch JSON | Your PR should update app/data/mp.json and/or app/data/mla.json |
| Tests must pass | Run make test before pushing |
| Review your diff | Ensure only intended changes are included |

🔌 API Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/v1/politicians | List politicians (filter by type) |
| GET | /api/v1/politicians/search?q= | Search by name |
| GET | /api/v1/politicians/<id> | Get a single politician |
| GET | /api/v1/politicians/state/<state> | Filter by state |
| GET | /api/v1/politicians/party/<party> | Filter by party |
| GET | /api/v1/stats | Summary statistics |
| GET | /api/v1/states | List all states |
| GET | /api/v1/parties | List all parties |
| POST | /api/v1/questions/ask | Ask a question (501 until vector store is reimplemented; see docs/VECTOR_DBS.md) |
| GET | /api/v1/health | Health check |
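A small client sketch for the endpoints above, assuming the local dev server from `make run` on port 8000. The `build_url` and `api_get` helpers are illustrative names, not part of the project.

```python
import json
from urllib.parse import urlencode, quote
from urllib.request import urlopen

BASE = "http://localhost:8000"  # assumes the local dev server from `make run`

def build_url(base: str, path: str, **params) -> str:
    """Join the base URL, an endpoint path, and optional query parameters."""
    url = base.rstrip("/") + path
    if params:
        url += "?" + urlencode(params)
    return url

def api_get(path: str, **params) -> dict:
    """GET a Rajniti endpoint and decode the JSON body."""
    with urlopen(build_url(BASE, path, **params)) as resp:
        return json.load(resp)

# Example calls (response shapes depend on the running backend):
# api_get("/api/v1/health")
# api_get("/api/v1/politicians", type="MP")
# api_get("/api/v1/politicians/search", q="singh")
# api_get("/api/v1/politicians/state/" + quote("Andhra Pradesh"))
```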

🧩 Adding a New Enrichment Process

Want to enrich a new field (e.g., criminal records, social media)?

  1. Add a prompt builder in app/prompts/politician_prompts.py
  2. Create a process class in app/agents/politician_agent.py
  3. Register it in PoliticianAgent.__init__ by appending to self.processes

The architecture is designed to be extensible — each enrichment field is an independent process.
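The three steps above can be sketched as follows. The class names and method signatures here are hypothetical; the real base class, prompt builders, and registration API live in app/agents/politician_agent.py and app/prompts/politician_prompts.py.

```python
class EnrichmentProcess:
    """One independent enrichment field (illustrative base class)."""
    field = "base"

    def build_prompt(self, politician: dict) -> str:
        raise NotImplementedError

    def parse(self, llm_response: str) -> dict:
        raise NotImplementedError

class SocialMediaProcess(EnrichmentProcess):
    """Step 1 + 2: a prompt builder and a process class for a new field."""
    field = "social_media"

    def build_prompt(self, politician: dict) -> str:
        # In the real project the prompt would come from
        # app/prompts/politician_prompts.py
        return f"List verified social media handles for {politician['name']}."

    def parse(self, llm_response: str) -> dict:
        return {self.field: llm_response.strip()}

class PoliticianAgent:
    def __init__(self):
        self.processes = []
        self.processes.append(SocialMediaProcess())   # step 3: register it
```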


๐Ÿ—„๏ธ Database & Migrations

The backend uses PostgreSQL via SQLAlchemy + Alembic. Migrations run automatically on server startup (via alembic upgrade head).

Database Setup

Option A — Local Docker Postgres (for development):

# .env
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/rajniti
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DB=rajniti
make dev          # starts Postgres + API via Docker Compose

If running the API outside Docker (venv), use localhost in the URL. If running inside Docker, use postgres (the compose service name).

Option B — Supabase (for staging/production):

# .env — use the session-mode pooler URL (port 5432), NOT the direct URL
DATABASE_URL=postgresql://postgres.PROJECT_REF:PASSWORD@aws-0-REGION.pooler.supabase.com:5432/postgres

The direct Supabase host (db.*.supabase.co) is IPv6-only and will fail in Docker. Always use the session-mode pooler URL from your Supabase dashboard (Settings > Database > Connection string > Session mode).

How Migrations Work

  1. On server startup: alembic upgrade head runs automatically, applying any pending migrations.
  2. metadata.create_all runs as a fallback for brand-new databases with no tables at all.

To disable auto-migration, set SKIP_DB_AUTO_MIGRATE=1 in .env.

When You Change a Model

After editing a model file (e.g. adding a column to app/database/models/user.py):

# 1. Generate a migration from the model diff
python scripts/db.py autogenerate -m "add column_name to users"

# 2. Review the generated file in alembic/versions/ — remove anything unwanted
#    (Alembic may detect tables in the DB that aren't in your models)

# 3. Apply it
python scripts/db.py migrate

Or equivalently using Alembic directly:

alembic revision --autogenerate -m "add column_name to users"
alembic upgrade head

The next time the server starts, it will apply the migration automatically for all environments (local, GCP, etc.).

Migration Commands Reference

| Command | What it does |
| --- | --- |
| python scripts/db.py migrate | Run alembic upgrade head |
| python scripts/db.py autogenerate -m "msg" | Generate migration from model diff |
| python scripts/db.py init | metadata.create_all (no Alembic) |
| alembic history | Show migration chain |
| alembic current | Show current revision in DB |

🧪 Testing

Backend (Python)

make test              # all tests
make test-unit         # unit tests only
make test-e2e          # end-to-end tests
make coverage          # tests + coverage report
make lint              # backend + frontend linting
make format            # auto-format with Black + isort

In GitHub Actions, backend checks run in one job, in order: lint (Black, isort, Flake8, mypy) → unit tests → integration tests → E2E tests. See .github/workflows/ci.yml.

Frontend (Next.js)

cd frontend
npm test               # all Jest tests (unit + integration)
npm run test:unit      # Jest unit tests only
npm run test:integration
npm run test:e2e       # Playwright (needs dev server; see frontend README)

Full frontend testing notes, E2E setup, and CI behavior: frontend/README.md.

In GitHub Actions, after ESLint and TypeScript, unit, integration, and E2E jobs run in parallel, then a production build runs if all pass.


📂 Project Structure

Rajniti/
├── app/
│   ├── agents/            # LLM-based enrichment agents
│   ├── config/            # Agent & provider configuration
│   ├── controllers/       # API request handlers
│   ├── core/              # Utilities, logging, errors
│   ├── data/              # mp.json, mla.json (source of truth)
│   ├── database/          # Models, migrations, SQLite cache
│   ├── prompts/           # LLM prompt builders
│   ├── routes/            # Flask route definitions
│   ├── schemas/           # Pydantic validation schemas
│   └── services/          # Business logic layer
├── frontend/
│   ├── README.md          # Frontend scripts, testing, CI notes
│   ├── app/               # Next.js App Router pages
│   ├── components/        # React components
│   ├── data/              # Generated static data (contributors.json)
│   ├── __tests__/e2e/     # Playwright E2E tests (browser)
│   ├── hooks/             # Custom React hooks
│   └── lib/               # Shared utilities
├── scripts/               # CLI scripts (agent runner, DB, MLA fetcher)
├── tests/                 # Unit, integration, and E2E tests
├── alembic/               # Database migrations
├── docker/                # Docker init scripts
├── .github/
│   ├── workflows/         # CI/CD (lint, test, release)
│   └── PULL_REQUEST_TEMPLATE.md
├── Dockerfile
├── docker-compose.yml
├── Makefile
├── requirements.txt
└── pyproject.toml

โš™๏ธ LLM Provider Configuration

Agents use a failover LLM client with automatic per-model cooldown:

  • Models are tried top-to-bottom from PROVIDER_CONFIGS in app/config/agent_config.py
  • If a model hits a rate limit (429), it enters cooldown and the next model is used
  • Cooldown is per-model — gemini-1.5-flash cooling doesn't block gemini-2.0-flash
  • Only API keys go in .env; model names and order are configured in code
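The behavior described in these bullet points can be sketched as a small failover client. Names and shapes here are illustrative; the real client and its model order live behind PROVIDER_CONFIGS in app/config/agent_config.py.

```python
import time

class RateLimitError(Exception):
    """Stands in for a provider's 429 response."""

class FailoverLLM:
    def __init__(self, models, cooldown_seconds=60.0, clock=time.monotonic):
        self.models = models              # ordered list of (name, call_fn) pairs
        self.cooldown = cooldown_seconds
        self.clock = clock                # injectable for testing
        self.cooling_until = {}           # per-model cooldown expiry timestamps

    def generate(self, prompt: str) -> str:
        now = self.clock()
        for name, call in self.models:    # tried top-to-bottom
            if self.cooling_until.get(name, 0.0) > now:
                continue                  # this model is cooling down; try next
            try:
                return call(prompt)
            except RateLimitError:
                # 429: put only this model on cooldown, fall through to next
                self.cooling_until[name] = now + self.cooldown
        raise RuntimeError("all models rate-limited or cooling down")
```

Because cooldown state is keyed by model name, one Gemini model cooling down never blocks a sibling Gemini model, matching the per-model behavior described above.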

๐Ÿณ Docker

make dev       # Local Postgres + API (development)
make prod      # API only, expects external Postgres (e.g. Supabase)
make stop      # Stop all containers
make clean     # Remove containers + volumes
make reset     # Full reset (wipes data, fresh start)

👥 Contributors

Contributors are highlighted on the website at /contributors.

How it works:

  • scripts/generate_contributors.py fetches contributor data from the GitHub API and writes frontend/data/contributors.json.
  • A GitHub Actions workflow (.github/workflows/update_contributors.yml) runs weekly (Monday midnight UTC) and on manual dispatch to keep the file up to date. It only commits when the data has actually changed.
  • The frontend reads the static JSON at build time — no runtime GitHub API calls.
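The core of that pipeline can be sketched in two small helpers. The field selection and function names below are illustrative guesses, not the project's actual code; see scripts/generate_contributors.py for what is really written.

```python
import json
from pathlib import Path

def shape_contributors(api_items: list[dict]) -> list[dict]:
    """Keep only the fields the frontend needs, sorted by contribution count.
    (Field choice is an assumption for illustration.)"""
    shaped = [
        {
            "login": c["login"],
            "avatar_url": c["avatar_url"],
            "contributions": c["contributions"],
        }
        for c in api_items
        if c.get("type") != "Bot"         # skip bot accounts
    ]
    return sorted(shaped, key=lambda c: -c["contributions"])

def write_if_changed(path: Path, contributors: list[dict]) -> bool:
    """Mirror the workflow's only-commit-on-change behavior: skip the write
    (and hence the commit) when the serialized JSON is byte-identical."""
    new = json.dumps(contributors, indent=2) + "\n"
    if path.exists() and path.read_text() == new:
        return False                      # unchanged: nothing to commit
    path.write_text(new)
    return True
```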

Running locally:

# Generate/refresh contributors data (optional GITHUB_TOKEN for higher rate limits)
python scripts/generate_contributors.py

# With a token
GITHUB_TOKEN=ghp_... python scripts/generate_contributors.py

📄 License

This project is licensed under the MIT License.


Built with care for Indian democracy 🇮🇳

Report a Bug · Request a Feature · Contribute Data