imsks/Rajniti


๐Ÿ›๏ธ Rajniti

Open-source Indian politician data platform — powered by AI enrichment



Browse, search, and explore data on Indian MPs and MLAs. Enrich politician profiles automatically using LLM-based agents.

Getting Started · Contributing with AI · Daily agent schedule · API Reference · Project Structure


✨ Features

| Feature | Description |
| --- | --- |
| 🔍 Search & Browse | Look up MPs and MLAs by name, state, constituency, or party |
| 🤖 AI Enrichment | Automatically fill education, family, criminal records, and more using LLMs |
| 🔄 Multi-Model Failover | Gemini → OpenAI → Perplexity with per-model cooldown on rate limits |
| 🗃️ JSON-First Data | Source of truth lives in version-controlled JSON files |
| 🧠 Vector Search | Semantic Q&A — in progress (learning doc) |
| 🔐 Google OAuth | User accounts via NextAuth with backend sync |
| 📊 Stats Dashboard | Party breakdown, state coverage, and enrichment progress |

๐Ÿ—๏ธ Tech Stack

Backend

Python Flask PostgreSQL SQLite

Frontend

Next.js React TypeScript Tailwind

AI / LLM

Gemini OpenAI LangChain

Infrastructure

Docker GCP Vercel


🚀 Getting Started

Prerequisites

| Tool | Version |
| --- | --- |
| Python | 3.11+ |
| Node.js | 18+ |
| Docker | Latest (optional, for Postgres) |

1. Clone & Install

git clone https://github.com/<your-username>/Rajniti.git
cd Rajniti

# Backend (runs via Python venv)
make install            # creates venv + pip install -r requirements.txt
cp .env.example .env    # configure your environment
. venv/bin/activate     # activate the virtual environment

# Frontend
cd frontend && npm install

Backend commands (make run, make test, db and lint targets) use the project virtualenv (venv/) automatically. To run Python scripts by hand, either use those Make targets or activate the venv first: source venv/bin/activate (or make venv for an interactive shell with venv active).

2. Configure Environment

Copy .env.example and fill in the required values:

# Backend — .env
FLASK_ENV=development
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/rajniti   # optional
GEMINI_API_KEY=your-key-here          # free tier — only key you need to get started

# Frontend — frontend/.env
NEXTAUTH_URL=http://localhost:3000
NEXTAUTH_SECRET=your-secret
GOOGLE_CLIENT_ID=...
GOOGLE_CLIENT_SECRET=...
NEXT_PUBLIC_API_URL=http://localhost:8000

3. Run

# Option A: Docker (backend + Postgres)
make dev

# Option B: Run backend directly (via venv)
make run                # starts Flask API on :8000 (uses project venv)

# Frontend (separate terminal)
make frontend           # starts Next.js on :3000

To run other Python commands in the project venv, use make venv to open a shell with the venv activated, or run source venv/bin/activate in your terminal first.

4. Verify

Open http://localhost:8000/api/v1/health — you should see a healthy response.


🤖 Contributing with AI

This is the easiest way to contribute. Run AI agents locally with your own API keys, and open a PR with enriched politician data.

How It Works

┌──────────────┐     ┌───────────────┐     ┌──────────────┐     ┌──────────────┐
│  JSON Data   │────▶│  LLM Agents   │────▶│  Enriched    │────▶│  Open a PR   │
│  (mp/mla)    │     │  (Gemini/GPT) │     │  JSON Data   │     │              │
└──────────────┘     └───────────────┘     └──────────────┘     └──────────────┘

The enrichment pipeline reads politicians from JSON, queries LLMs for missing details (education, family, criminal records, etc.), and writes the results back. A local SQLite cache prevents re-processing.
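The loop described above can be sketched roughly as follows. This is an illustrative outline, not the project's actual code: `run_enrichment` and `enrich_fn` are hypothetical names, and the real cache schema in `app/database/cache.db` may differ.

```python
import json
import sqlite3
from pathlib import Path

def run_enrichment(data_path: Path, cache_db: str, enrich_fn, force: bool = False):
    """Read politicians from JSON, enrich the un-processed ones, write back.

    enrich_fn(politician) -> dict of newly filled fields (stands in for the LLM call).
    """
    conn = sqlite3.connect(cache_db)
    conn.execute("CREATE TABLE IF NOT EXISTS processed (id TEXT PRIMARY KEY)")
    politicians = json.loads(data_path.read_text())
    for p in politicians:
        seen = conn.execute(
            "SELECT 1 FROM processed WHERE id = ?", (p["id"],)
        ).fetchone()
        if seen and not force:
            continue                      # cache hit: skip re-processing
        p.update(enrich_fn(p))            # fill education, family, etc.
        conn.execute("INSERT OR REPLACE INTO processed (id) VALUES (?)", (p["id"],))
        conn.commit()                     # commit per record so reruns resume cleanly
    data_path.write_text(json.dumps(politicians, indent=2, ensure_ascii=False))
    conn.close()
    return politicians
```

Committing the cache row after each record means an interrupted run picks up where it left off, which is why `--force` exists to deliberately bypass the cache.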

Step-by-Step

1. Fork & set up

git clone <your-fork-url>
cd Rajniti
git checkout -b enrich/<scope>       # e.g. enrich/mp-education
make install
cp .env.example .env                 # add your API key(s)

2. Get an API key (at least one)

| Provider | How to Get a Key | Env Variable | Cost |
| --- | --- | --- | --- |
| Gemini | Google AI Studio | GEMINI_API_KEY | Free tier (rate-limited) |
| OpenAI | platform.openai.com | OPENAI_API_KEY | Paid |
| Perplexity | perplexity.ai | PERPLEXITY_API_KEY | Paid |

Fastest setup: Get a free Gemini key from Google AI Studio, paste it as GEMINI_API_KEY in your .env, and you're ready to run agents — no paid key needed.

Models fail over automatically (Gemini → OpenAI → Perplexity). Order is configured in app/config/agent_config.py.

3. Run the agent

# Run the agent for all politicians
python3 scripts/run_politician_agent.py

# Test with a small batch first
python3 scripts/run_politician_agent.py --type MP --limit 3 --log-level INFO

# Run for all MPs
python3 scripts/run_politician_agent.py --type MP --log-level INFO

# Run for all MLAs
python3 scripts/run_politician_agent.py --type MLA --log-level INFO

# Target a single politician
python3 scripts/run_politician_agent.py --id "<POLITICIAN_ID>"

# Force re-run (ignore cache)
python3 scripts/run_politician_agent.py --type MP --force

4. Add MLAs for a new state

python3 scripts/fetch_mlas.py --state "Andhra Pradesh" --log-level INFO

5. Open a PR

git add app/data/mp.json app/data/mla.json
git commit -m "Enrich MP education data"
git push -u origin enrich/<scope>

Then open a Pull Request. Include: the state/scope, number of records, and how you tested.

Optional: Daily agent schedule (macOS, Linux, Windows)

If you keep the repo on your laptop and prefer not to start the enrichment agent by hand every day, you can run it on a daily schedule (example: 8:00 in your local time). The same command updates app/data/mp.json and app/data/mla.json when you omit --type — it processes every politician who still needs enrichment (respecting the local cache).

Important: Scheduled jobs must run with the project root as the working directory so python-dotenv can load the repo's .env (API keys). Use absolute paths in scripts and in cron/Task Scheduler.
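The working-directory requirement can be illustrated with a minimal stand-in for what python-dotenv does by default: it resolves a relative .env path against the current working directory, so a scheduler that starts the script from elsewhere silently loads nothing. The `parse_env` helper below is hypothetical, for illustration only; the project itself uses python-dotenv.

```python
import os
from pathlib import Path

def parse_env(path: Path = Path(".env")) -> dict:
    """Minimal stand-in for python-dotenv: read KEY=VALUE lines from a file."""
    env = {}
    if not path.is_absolute():
        path = Path.cwd() / path          # relative path: resolved against cwd,
                                          # which is why cron jobs must cd first
    if not path.exists():
        return env                        # silently empty, like a missed .env
    for line in path.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            env[key.strip()] = value.split(" #")[0].strip()  # drop inline comments
    return env
```

This is exactly why the wrapper scripts below `cd` into the repo root before invoking Python.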

1. One wrapper script (Unix — macOS and Linux)

Save as something like ~/bin/rajniti-daily-enrich.sh, edit REPO_ROOT, then make it executable (chmod +x ~/bin/rajniti-daily-enrich.sh):

#!/usr/bin/env bash
set -euo pipefail

REPO_ROOT="/absolute/path/to/Rajniti"   # <-- change this
LOG_DIR="${HOME}/logs"
mkdir -p "${LOG_DIR}"

cd "${REPO_ROOT}"
# Assumes you use the project venv from `make install`
# shellcheck source=/dev/null
. "${REPO_ROOT}/venv/bin/activate"

# One run: all MPs and MLAs that still need work (omit --type)
python scripts/run_politician_agent.py --log-level INFO \
  >> "${LOG_DIR}/rajniti-agent.log" 2>&1

Run immediately (any OS): open a terminal, cd to the repo, activate the venv, and run the same python scripts/run_politician_agent.py --log-level INFO line (or use the commands in 3. Run the agent above). No need to wait for the next 8:00 run.


2. macOS — crontab

Edit your user crontab:

crontab -e

Add one line to run every day at 08:00 local time (adjust the script path if needed):

0 8 * * * /Users/YOUR_USER/bin/rajniti-daily-enrich.sh
  • List your crontab: crontab -l
  • Remove all cron entries: crontab -r (use with care)
  • Change time: edit the five time fields; 0 8 * * * = minute 0, hour 8, every day. See man 5 crontab on your system.
  • Sleep / closed lid: cron only runs while the Mac is awake at the scheduled time. If you often miss the window, run the script manually when you're back, or consider launchd (~/Library/LaunchAgents/) with StartCalendarInterval for a user agent that behaves similarly (it still requires the machine to be awake, unless you use an always-on machine or a remote runner).

3. Linux — crontab

Same idea as macOS:

crontab -e

Example (8:00 daily):

0 8 * * * /home/YOUR_USER/bin/rajniti-daily-enrich.sh

Ensure the wrapper script is executable and that cron has access to anything you rely on outside the script (the wrapper above avoids surprises by using absolute paths and cd).


4. Windows — Task Scheduler

a) Create C:\Users\YOUR_USER\bin\rajniti-daily-enrich.bat (adjust paths):

@echo off
setlocal
cd /d C:\absolute\path\to\Rajniti
call venv\Scripts\activate.bat
python scripts\run_politician_agent.py --log-level INFO >> "%USERPROFILE%\logs\rajniti-agent.log" 2>&1

Create the log folder once: mkdir %USERPROFILE%\logs

b) Open Task Scheduler → Create Task… (not "Create Basic Task" if you want full control).

  • General: Run only when user is logged on (typical for a laptop), or "Run whether user is logged on or not" if you need headless runs (may require a stored password).
  • Triggers: New… → Daily → start time 8:00:00 AM (local time).
  • Actions: New… → Start a program → Program/script: C:\Users\YOUR_USER\bin\rajniti-daily-enrich.bat (or cmd.exe with arguments /c "C:\…\rajniti-daily-enrich.bat" if you prefer).

Run immediately: In Task Scheduler, right-click the task → Run. Or from cmd.exe: schtasks /Run /TN "YourTaskName" (use the exact task name you created).

Change or remove: Task Scheduler Library → select the task → Properties (edit triggers/times) or Delete.


5. Updating the schedule later

| Platform | View | Edit | Remove |
| --- | --- | --- | --- |
| macOS / Linux | crontab -l | crontab -e | Delete the line in the editor, or crontab -r to clear everything |
| Windows | Task Scheduler → your task | Properties → Triggers / Actions | Delete the task |

After any change to the wrapper script path or repo location, update the scheduled command to match.


๐Ÿ›ก๏ธ Contribution Rules

These rules are non-negotiable for all PRs.

| Rule | Details |
| --- | --- |
| No secrets | Never commit .env or API keys |
| No cache files | app/database/cache.db is local-only |
| Data PRs only touch JSON | Your PR should update app/data/mp.json and/or app/data/mla.json |
| Tests must pass | Run make test before pushing |
| Review your diff | Ensure only intended changes are included |

🔌 API Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/v1/politicians | List politicians (filter by type) |
| GET | /api/v1/politicians/search?q= | Search by name |
| GET | /api/v1/politicians/<id> | Get a single politician |
| GET | /api/v1/politicians/state/<state> | Filter by state |
| GET | /api/v1/politicians/party/<party> | Filter by party |
| GET | /api/v1/stats | Summary statistics |
| GET | /api/v1/states | List all states |
| GET | /api/v1/parties | List all parties |
| POST | /api/v1/questions/ask | Ask a question (501 until vector store is reimplemented; see docs/VECTOR_DBS.md) |
| GET | /api/v1/health | Health check |
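A small client sketch for the endpoints above, assuming the local dev server from `make run` on port 8000. The `build_url` and `api_get` helpers are illustrative names, not part of the project.

```python
import json
from urllib.parse import urlencode, quote
from urllib.request import urlopen

BASE = "http://localhost:8000"  # assumes the local dev server from `make run`

def build_url(base: str, path: str, **params) -> str:
    """Join the base URL, an endpoint path, and optional query parameters."""
    url = base.rstrip("/") + path
    if params:
        url += "?" + urlencode(params)
    return url

def api_get(path: str, **params) -> dict:
    """GET a Rajniti endpoint and decode the JSON body."""
    with urlopen(build_url(BASE, path, **params)) as resp:
        return json.load(resp)

# Example calls (response shapes depend on the running backend):
# api_get("/api/v1/health")
# api_get("/api/v1/politicians", type="MP")
# api_get("/api/v1/politicians/search", q="singh")
# api_get("/api/v1/politicians/state/" + quote("Andhra Pradesh"))
```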

🧩 Adding a New Enrichment Process

Want to enrich a new field (e.g., criminal records, social media)?

  1. Add a prompt builder in app/prompts/politician_prompts.py
  2. Create a process class in app/agents/politician_agent.py
  3. Register it in PoliticianAgent.__init__ by appending to self.processes

The architecture is designed to be extensible — each enrichment field is an independent process.
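The three steps above can be sketched as follows. The class names and method signatures here are hypothetical; the real base class, prompt builders, and registration API live in app/agents/politician_agent.py and app/prompts/politician_prompts.py.

```python
class EnrichmentProcess:
    """One independent enrichment field (illustrative base class)."""
    field = "base"

    def build_prompt(self, politician: dict) -> str:
        raise NotImplementedError

    def parse(self, llm_response: str) -> dict:
        raise NotImplementedError

class SocialMediaProcess(EnrichmentProcess):
    """Step 1 + 2: a prompt builder and a process class for a new field."""
    field = "social_media"

    def build_prompt(self, politician: dict) -> str:
        # In the real project the prompt would come from
        # app/prompts/politician_prompts.py
        return f"List verified social media handles for {politician['name']}."

    def parse(self, llm_response: str) -> dict:
        return {self.field: llm_response.strip()}

class PoliticianAgent:
    def __init__(self):
        self.processes = []
        self.processes.append(SocialMediaProcess())   # step 3: register it
```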


๐Ÿ—„๏ธ Database & Migrations

The backend uses PostgreSQL via SQLAlchemy + Alembic. Migrations run automatically on server startup (via alembic upgrade head).

Database Setup

Option A — Local Docker Postgres (for development):

# .env
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/rajniti
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DB=rajniti
make dev          # starts Postgres + API via Docker Compose

If running the API outside Docker (venv), use localhost in the URL. If running inside Docker, use postgres (the compose service name).

Option B — Supabase (for staging/production):

# .env — use the session-mode pooler URL (port 5432), NOT the direct URL
DATABASE_URL=postgresql://postgres.PROJECT_REF:PASSWORD@aws-0-REGION.pooler.supabase.com:5432/postgres

The direct Supabase host (db.*.supabase.co) is IPv6-only and will fail in Docker. Always use the session-mode pooler URL from your Supabase dashboard (Settings > Database > Connection string > Session mode).

How Migrations Work

  1. On server startup: alembic upgrade head runs automatically, applying any pending migrations.
  2. metadata.create_all runs as a fallback for brand-new databases with no tables at all.

To disable auto-migration, set SKIP_DB_AUTO_MIGRATE=1 in .env.

When You Change a Model

After editing a model file (e.g. adding a column to app/database/models/user.py):

# 1. Generate a migration from the model diff
python scripts/db.py autogenerate -m "add column_name to users"

# 2. Review the generated file in alembic/versions/ — remove anything unwanted
#    (Alembic may detect tables in the DB that aren't in your models)

# 3. Apply it
python scripts/db.py migrate

Or equivalently using Alembic directly:

alembic revision --autogenerate -m "add column_name to users"
alembic upgrade head

The next time the server starts, it will apply the migration automatically for all environments (local, GCP, etc.).

Migration Commands Reference

| Command | What it does |
| --- | --- |
| python scripts/db.py migrate | Run alembic upgrade head |
| python scripts/db.py autogenerate -m "msg" | Generate migration from model diff |
| python scripts/db.py init | metadata.create_all (no Alembic) |
| alembic history | Show migration chain |
| alembic current | Show current revision in DB |

🧪 Testing

Backend (Python)

make test              # all tests
make test-unit         # unit tests only
make test-e2e          # end-to-end tests
make coverage          # tests + coverage report
make lint              # backend + frontend linting
make format            # auto-format with Black + isort

In GitHub Actions, backend checks run in one job, in order: lint (Black, isort, Flake8, mypy) → unit tests → integration tests → E2E tests. See .github/workflows/ci.yml.

Frontend (Next.js)

cd frontend
npm test               # all Jest tests (unit + integration)
npm run test:unit      # Jest unit tests only
npm run test:integration
npm run test:e2e       # Playwright (needs dev server; see frontend README)

Full frontend testing notes, E2E setup, and CI behavior: frontend/README.md.

In GitHub Actions, after ESLint and TypeScript, unit, integration, and E2E jobs run in parallel, then a production build runs if all pass.


📂 Project Structure

Rajniti/
├── app/
│   ├── agents/            # LLM-based enrichment agents
│   ├── config/            # Agent & provider configuration
│   ├── controllers/       # API request handlers
│   ├── core/              # Utilities, logging, errors
│   ├── data/              # mp.json, mla.json (source of truth)
│   ├── database/          # Models, migrations, SQLite cache
│   ├── prompts/           # LLM prompt builders
│   ├── routes/            # Flask route definitions
│   ├── schemas/           # Pydantic validation schemas
│   └── services/          # Business logic layer
├── frontend/
│   ├── README.md          # Frontend scripts, testing, CI notes
│   ├── app/               # Next.js App Router pages
│   ├── components/        # React components
│   ├── data/              # Generated static data (contributors.json)
│   ├── __tests__/e2e/     # Playwright E2E tests (browser)
│   ├── hooks/             # Custom React hooks
│   └── lib/               # Shared utilities
├── scripts/               # CLI scripts (agent runner, DB, MLA fetcher)
├── tests/                 # Unit, integration, and E2E tests
├── alembic/               # Database migrations
├── docker/                # Docker init scripts
├── .github/
│   ├── workflows/         # CI/CD (lint, test, release)
│   └── PULL_REQUEST_TEMPLATE.md
├── Dockerfile
├── docker-compose.yml
├── Makefile
├── requirements.txt
└── pyproject.toml

โš™๏ธ LLM Provider Configuration

Agents use a failover LLM client with automatic per-model cooldown:

  • Models are tried top-to-bottom from PROVIDER_CONFIGS in app/config/agent_config.py
  • If a model hits a rate limit (429), it enters cooldown and the next model is used
  • Cooldown is per-model — gemini-1.5-flash cooling doesn't block gemini-2.0-flash
  • Only API keys go in .env; model names and order are configured in code
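The behavior described in these bullet points can be sketched as a small failover client. Names and shapes here are illustrative; the real client and its model order live behind PROVIDER_CONFIGS in app/config/agent_config.py.

```python
import time

class RateLimitError(Exception):
    """Stands in for a provider's 429 response."""

class FailoverLLM:
    def __init__(self, models, cooldown_seconds=60.0, clock=time.monotonic):
        self.models = models              # ordered list of (name, call_fn) pairs
        self.cooldown = cooldown_seconds
        self.clock = clock                # injectable for testing
        self.cooling_until = {}           # per-model cooldown expiry timestamps

    def generate(self, prompt: str) -> str:
        now = self.clock()
        for name, call in self.models:    # tried top-to-bottom
            if self.cooling_until.get(name, 0.0) > now:
                continue                  # this model is cooling down; try next
            try:
                return call(prompt)
            except RateLimitError:
                # 429: put only this model on cooldown, fall through to next
                self.cooling_until[name] = now + self.cooldown
        raise RuntimeError("all models rate-limited or cooling down")
```

Because cooldown state is keyed by model name, one Gemini model cooling down never blocks a sibling Gemini model, matching the per-model behavior described above.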

๐Ÿณ Docker

make dev       # Local Postgres + API (development)
make prod      # API only, expects external Postgres (e.g. Supabase)
make stop      # Stop all containers
make clean     # Remove containers + volumes
make reset     # Full reset (wipes data, fresh start)

👥 Contributors

Contributors are highlighted on the website at /contributors.

How it works:

  • scripts/generate_contributors.py fetches contributor data from the GitHub API and writes frontend/data/contributors.json.
  • A GitHub Actions workflow (.github/workflows/update_contributors.yml) runs weekly (Monday midnight UTC) and on manual dispatch to keep the file up to date. It only commits when the data has actually changed.
  • The frontend reads the static JSON at build time — no runtime GitHub API calls.
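The core of that pipeline can be sketched in two small helpers. The field selection and function names below are illustrative guesses, not the project's actual code; see scripts/generate_contributors.py for what is really written.

```python
import json
from pathlib import Path

def shape_contributors(api_items: list[dict]) -> list[dict]:
    """Keep only the fields the frontend needs, sorted by contribution count.
    (Field choice is an assumption for illustration.)"""
    shaped = [
        {
            "login": c["login"],
            "avatar_url": c["avatar_url"],
            "contributions": c["contributions"],
        }
        for c in api_items
        if c.get("type") != "Bot"         # skip bot accounts
    ]
    return sorted(shaped, key=lambda c: -c["contributions"])

def write_if_changed(path: Path, contributors: list[dict]) -> bool:
    """Mirror the workflow's only-commit-on-change behavior: skip the write
    (and hence the commit) when the serialized JSON is byte-identical."""
    new = json.dumps(contributors, indent=2) + "\n"
    if path.exists() and path.read_text() == new:
        return False                      # unchanged: nothing to commit
    path.write_text(new)
    return True
```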

Running locally:

# Generate/refresh contributors data (optional GITHUB_TOKEN for higher rate limits)
python scripts/generate_contributors.py

# With a token
GITHUB_TOKEN=ghp_... python scripts/generate_contributors.py

📄 License

This project is licensed under the MIT License.


Built with care for Indian democracy 🇮🇳

Report a Bug · Request a Feature · Contribute Data