docguard 🛡️

Stop Documentation Drift. Start Semantic Linting.

docguard is an AI-powered semantic linter that bridges the gap between implementation and documentation. It doesn't just check if a docstring exists — it understands what your code does and ensures your comments tell the truth.

Features • Installation • Quick Start • Commands • Configuration • CI/CD • MCP • Architecture

💡 Why docguard?

Traditional linters check for formatting and existence. docguard checks for truth.

In fast-moving codebases, docstrings become "stale" in subtle ways:

A parameter is renamed in the code but stays unchanged in the comment.
A return type changes, but the documentation still points to the old model.
A new exception is raised, yet it's nowhere to be found in the Raises section.
An MCP tool description no longer reflects what the tool actually does — making it invisible to LLMs.

docguard acts as an automated Peer Reviewer that specialises in documentation quality.

✨ Features

Feature	Description
🧠 Semantic Analysis	Deep logic verification using OpenAI, Gemini, or Ollama
⚠️ Severity Levels	`CRITICAL / ERROR / WARNING / INFO`: control which findings fail your build
🔗 MCP-Aware Mode	Specialized analysis for `@*.tool` decorated functions
⚡ Streaming Interface	Results stream in as soon as the first function is analysed, without waiting for the full run
🚀 Parallel File I/O	`ThreadPoolExecutor` scans large directories concurrently
💾 Persistent Cache	Content-hashed results; unchanged functions are never re-analysed
🪙 Token Budget	Smart truncation for large functions — keeps signatures and key statements
🛠️ Interactive Fixes	Review and apply surgical docstring patches with `docguard fix`
🔁 Smart Retry	Not satisfied? Provide a hint and ask the LLM again
💡 Generate Docstrings	Auto-generate Google-style docstrings for undocumented functions

📦 Installation

# Clone the repository
git clone https://github.com/skalogerakis/docguard.git
cd docguard

# Set up a virtual environment
python -m venv venv
source venv/bin/activate          # Windows: venv\Scripts\activate

# Install runtime dependencies
pip install -e .

# Install development dependencies (tests + linting)
pip install -e ".[dev]"

LLM Provider Prerequisites

Provider	Requirement
Ollama (default)	Install Ollama and run `ollama pull llama3`
OpenAI	`OPENAI_API_KEY` environment variable
Gemini	`GEMINI_API_KEY` environment variable

⚙️ Configuration

docguard reads configuration from a .env file (or any file passed via --config) and from environment variables.

# ── Provider API keys ──────────────────────────────────────────────
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=AIza...

# ── Model overrides (optional) ─────────────────────────────────────
OPENAI_MODEL=gpt-4o-mini
GEMINI_MODEL=gemini-2.5-flash
OLLAMA_MODEL=llama3

# ── Engine ─────────────────────────────────────────────────────────
MAX_CONCURRENCY=5          # Parallel LLM calls per run

# ── Analysis ───────────────────────────────────────────────────────
# Minimum severity that causes `check` to exit 1.
# Accepts: critical | error | warning | info
FAIL_ON=error

# Token budget per function (4 chars ≈ 1 token; default 8000 ≈ 2000 tokens)
MAX_FUNCTION_CHARS=8000

# ── Display ────────────────────────────────────────────────────────
# Max failures shown inline before the truncation notice
SHOW_MENU_LIMIT=50

# Cache directory (relative to cwd)
CACHE_DIR=.docguard_cache

# Extra directories to exclude (comma-separated)
EXTRA_EXCLUDE_DIRS=scripts,migrations

All settings can also be exported as shell environment variables — .env is a convenience.

Severity Levels

docguard assigns a severity to every discrepancy so you can control exactly what breaks your build:

Severity	When it applies	Fails the build?
`critical`	Function does the opposite of what is documented	✅ at every `--fail-on` level
`error`	Wrong parameter names/types, wrong return type	✅ by default (`--fail-on error`); ❌ with `--fail-on critical`
`warning`	Minor inaccuracy, incomplete description	❌ unless `--fail-on warning` or `--fail-on info`
`info`	Cosmetic improvement only	❌ unless `--fail-on info`
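
The threshold logic in the table above can be pictured as a simple rank comparison. The following is a minimal sketch, not docguard's actual code: the function name `severity_exceeds_threshold` appears in the Architecture section, but its body, the `Severity` enum values, and the `exit_code` helper are assumptions made for illustration.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Assumed ranking: a higher value means a more serious discrepancy."""
    INFO = 0
    WARNING = 1
    ERROR = 2
    CRITICAL = 3


def severity_exceeds_threshold(severity: str, fail_on: str) -> bool:
    """Return True when `severity` is at or above the `fail_on` threshold."""
    return Severity[severity.upper()] >= Severity[fail_on.upper()]


def exit_code(found: list[str], fail_on: str = "error") -> int:
    """Hypothetical helper: map the severities found in a run to an exit code."""
    return 1 if any(severity_exceeds_threshold(s, fail_on) for s in found) else 0
```

Under these assumptions, a run that finds only `warning` and `info` issues exits 0 with the default threshold and exits 1 once `fail_on` is lowered to `warning`.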

🚀 Quick Start

# Verify docstrings in a single file (uses local Ollama — free)
docguard check src/my_module.py

# Scan a directory with Gemini, fail only on CRITICAL issues
docguard check src/ --provider gemini --fail-on critical

# Generate docstrings for undocumented functions
docguard suggest src/ --provider ollama --model llama3

# Interactively review and apply fixes
docguard fix src/ --provider openai

# Analyse only MCP tool functions
docguard check src/mcp_server.py --mcp --provider gemini

📖 Commands

`docguard check` — Verify Docstring Accuracy

Analyses every documented function and flags any discrepancy between the docstring and the implementation.

Usage: docguard check [OPTIONS] PATH

  Verify that every docstring accurately reflects its function's implementation.

Arguments:
  PATH  Path to the Python file or directory to lint  [required]

Options:
  --provider TEXT       LLM provider: gemini, openai, ollama  [default: ollama]
  -m, --model TEXT      Model name, overrides provider default
  -f, --force           Ignore cache and re-analyse everything
  --show-all            Show all failures (no cap)
  -v, --verbose         Show per-function timing and token usage (+ cache path)
  --fail-on TEXT        Minimum severity to fail on: critical|error|warning|info  [default: error]
  --mcp                 MCP mode: only check @*.tool decorated functions
  -e, --exclude TEXT    Extra directory to exclude (repeatable)
  --config TEXT         Path to a custom .env file
  --help                Show this message and exit

Exit codes:

0: All docstrings pass, or no failure reaches the --fail-on threshold.
1: One or more discrepancies at or above the --fail-on threshold were found, or a fatal error occurred.

Examples:

# Quick check with local Ollama (no API key needed)
docguard check src/

# Fail only on critical or error severity issues
docguard check src/ --provider gemini --fail-on error

# Strict mode — fail on anything at all
docguard check src/ --provider openai --fail-on info

# MCP-only mode with verbose output
docguard check src/mcp_server.py --mcp --provider gemini --verbose

# Exclude generated code and migrations
docguard check src/ --exclude migrations --exclude generated --provider ollama

# Load a project-specific config file
docguard check src/ --config .env.production

# Force re-analysis, show all failures
docguard check src/ --provider openai --force --show-all

Sample output:

╭─────────────────────────────╮
│ DocGuard 🛡️  - Initializing │
╰─────────────────────────────╯
🤖 Provider: Ollama | Model: llama3
📂 Scanning: src/
💾 Cache: .docguard_cache
🧠 Analysis with ollama…

❌ src/api/users.py::get_user  [error]  (line 14)
   Reason: Docstring says "fetches by email" but implementation uses user_id
   Suggested Fix: Fetch a user record by their integer ID.

                  Args:
                      user_id: The unique integer identifier.

                  Returns:
                      A dict containing the user's profile data.

┌──────────────────┬───────┐
│ Total Functions  │ 42    │
│ Cached (Skipped) │ 38    │
│ New (Processed)  │ 4     │
│ Failed           │ 1     │
└──────────────────┴───────┘

❌ Found 1 docstring discrepancy.

`docguard check --mcp` — MCP Tool Analysis

When --mcp is passed, docguard switches to MCP mode:

Scans only functions decorated with @mcp.tool, @server.tool, @app.tool, @tool, or any @*.tool pattern.
Uses an MCP-specialized LLM prompt that evaluates:
1. Discoverability — is the summary specific enough that an LLM knows when to call this tool?
2. Parameter completeness — are all args described with type and purpose?
3. Return value accuracy — does the doc describe the actual output format?
4. Exception transparency — are failure modes documented?
Results include a discoverability_issues list with specific problems.
Uses a separate cache namespace (mcp) so results don't interfere with standard checks.

# Analyse MCP tool descriptions
docguard check src/mcp_server.py --mcp --provider gemini

# Strict MCP check — fail on any warning
docguard check src/mcp_server.py --mcp --fail-on warning --provider openai

Sample MCP output:

🔗 MCP mode: scanning only @*.tool functions
🧠 MCP tool analysis with gemini…

🔗 MCP src/mcp_server.py::search_documents  [warning]  (line 12)
   Reason: Summary is too vague — an LLM cannot distinguish this from other search tools
   Suggested Fix: Search the document store by keyword and return ranked results.
                  ...
   MCP Issues:
     • Summary does not clarify the scope of search (title-only vs full-text)
     • Missing Returns section — LLM cannot interpret the output format

`docguard suggest` — Generate Missing Docstrings

Finds all undocumented functions and generates accurate, Google-style docstrings.

Usage: docguard suggest [OPTIONS] PATH

  Generate Google-style docstring suggestions for undocumented functions.

Arguments:
  PATH  Path to the Python file or directory to scan  [required]

Options:
  --provider TEXT       LLM provider: gemini, openai, ollama  [default: ollama]
  -m, --model TEXT      Model name, overrides provider default
  -f, --force           Ignore cached suggestions and regenerate
  -v, --verbose         Show per-function timing and token usage
  -e, --exclude TEXT    Extra directory to exclude (repeatable)
  --config TEXT         Path to a custom .env file
  --help                Show this message and exit

Note: suggest only prints suggestions — it does not modify your files. Use docguard fix to apply them.

Examples:

# Generate docstrings for everything missing one
docguard suggest src/ --provider gemini

# Use a specific Ollama model
docguard suggest src/ --provider ollama --model gemma3:4b

# Regenerate even if suggestions are cached
docguard suggest src/ --provider openai --force

`docguard fix` — Interactively Apply Fixes

Runs the full analysis, then presents each failed function one by one for review and patching.

Usage: docguard fix [OPTIONS] PATH

  Interactively review and apply docstring fixes.

Arguments:
  PATH  Path to the Python file or directory to fix  [required]

Options:
  --provider TEXT       LLM provider: gemini, openai, ollama  [default: ollama]
  -m, --model TEXT      Model name, overrides provider default
  -f, --force           Ignore cache and re-analyse everything
  --auto-apply          Apply all fixes without interactive prompting
  -v, --verbose         Show per-function timing and token usage
  -e, --exclude TEXT    Extra directory to exclude (repeatable)
  --config TEXT         Path to a custom .env file
  --help                Show this message and exit

Interactive prompt options:

Input	Action
`y` or `Enter`	Apply the suggested fix to the source file
`n`	Skip this function — leave it unchanged
`r`	Retry — ask the LLM again, optionally with a hint

Examples:

# Interactive review
docguard fix src/ --provider gemini

# Fully automated — apply every fix (CI-friendly)
docguard fix src/ --provider openai --auto-apply

# Re-analyse everything and auto-apply
docguard fix src/ --force --auto-apply --provider gemini

The retry flow:

❌ src/api/users.py::get_user  [error]  (line 14)
   Issue:   Docstring says "fetches by email" but uses user_id
   Fix:     Fetch a user by their unique integer ID.

   [y]es apply / [n]o skip / [r]etry with LLM: r
   Hint for the LLM (press Enter to skip): mention the dict structure of the return value
   🔄 Retrying…
   New fix: Fetch a user record by their integer ID.

             Args:
                 user_id: The unique integer identifier.

             Returns:
                 A dict with keys 'id', 'name', and 'email'.

   [y]es apply / [n]o skip / [r]etry with LLM: y
   ✓ Applied.

`docguard cache-clear` — Clear the Result Cache

Wipes all cached analysis and generation results (all namespaces: check, suggest, mcp).

docguard cache-clear

Tip: Cache files are stored in .docguard_cache/ by default (configurable via CACHE_DIR). Add it to .gitignore.
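
To illustrate why unchanged functions are never re-analysed, here is a rough sketch of a content-hashed cache key. It follows the properties stated in this README (the key depends only on code and docstring, carries a schema-version prefix, and is namespaced), but the function name `cache_key`, the key layout, and the value of `_SCHEMA_VERSION` are assumptions rather than docguard's real implementation.

```python
import hashlib

_SCHEMA_VERSION = "v1"  # assumed value; the real constant may differ


def cache_key(code: str, docstring: str, namespace: str = "check") -> str:
    """Build a key from the function's content only, never from its file path."""
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from hashing alike.
    payload = f"{code}\x00{docstring}".encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()
    # Changing the schema version or namespace yields a different key.
    return f"{_SCHEMA_VERSION}:{namespace}:{digest}"
```

With a key like this, editing either the body or the docstring produces a new key, while the `check`, `suggest`, and `mcp` namespaces keep their entries apart.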

`docguard version` — Print Version

docguard version
# DocGuard v0.1.0

🔗 MCP Mode — Built for AI Tools

When using MCP (Model Context Protocol) servers, the docstring is the interface — it's what the LLM reads to decide whether to invoke a tool and how.

A stale or inaccurate MCP tool description means:

The LLM cannot find the tool when it should be used.
The LLM invokes the tool with the wrong arguments.
The tool's output is misinterpreted.

Decorator Detection

docguard automatically detects all common MCP tool patterns:

@mcp.tool           # FastMCP
@mcp.tool()         # FastMCP with kwargs
@server.tool        # Custom server instances
@app.tool           # App-style servers
@tool               # Bare decorator

MCP Check Workflow

# Verify your MCP server's tool descriptions
docguard check src/mcp_server.py --mcp --provider gemini

# Auto-fix any drifted descriptions
docguard fix src/mcp_server.py --provider gemini --auto-apply

# Strict CI check — fail on any warning
docguard check src/mcp_server.py --mcp --fail-on warning --provider openai

What a Good MCP Docstring Looks Like

@mcp.tool
def search_documents(query: str, limit: int = 10) -> list[dict]:
    """
    Search the document store for entries matching the query string.

    Args:
        query: The search term to match against document titles and content.
        limit: Maximum number of results to return.

    Returns:
        A list of matching document dicts, each with 'id', 'title', and 'score'.

    Raises:
        ValueError: If query is empty or limit is less than 1.
    """
    ...

🔬 Running Tests

# Run all tests (unit + e2e, mocked — no API calls)
pytest tests/

# Verbose output
pytest tests/ -v

# Unit tests only
pytest tests/unit/

# E2E tests only
pytest tests/e2e/

# Run live tests against a real LLM (requires Ollama running)
pytest tests/ --run-live-llm --live-llm-provider ollama

# Live tests with a specific provider
pytest tests/ --run-live-llm --live-llm-provider gemini

The live test flag (--run-live-llm) is intentionally off by default to avoid costs and network dependencies.

🔧 CI/CD Integration

GitHub Actions

# .github/workflows/docguard.yml
name: DocGuard — Documentation Lint

on: [push, pull_request]

jobs:
  docguard:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install DocGuard
        run: pip install -e .

      - name: Install and start Ollama
        run: |
          curl -fsSL https://ollama.com/install.sh | sh
          ollama serve &
          sleep 5
          ollama pull llama3

      - name: Run DocGuard check
        run: docguard check src/ --provider ollama --fail-on error

Using OpenAI or Gemini in CI

      - name: Run DocGuard check
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: docguard check src/ --provider openai --model gpt-4o-mini --fail-on error

MCP Server CI

      - name: Verify MCP tool descriptions
        env:
          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
        # Fail if any MCP tool description is rated WARNING or worse
        run: docguard check src/mcp_server.py --mcp --provider gemini --fail-on warning

Pre-commit Hook

Add to .pre-commit-config.yaml:

repos:
  - repo: local
    hooks:
      - id: docguard
        name: DocGuard — Docstring Lint
        entry: docguard check
        args: [src/, --provider, ollama, --fail-on, error]
        language: system
        types: [python]
        pass_filenames: false

      - id: docguard-mcp
        name: DocGuard — MCP Tool Lint
        entry: docguard check
        args: [src/mcp_server.py, --mcp, --provider, ollama, --fail-on, warning]
        language: system
        types: [python]
        pass_filenames: false

🏗️ Architecture

src/docguard/
├── main.py               # Entrypoint — imports from cli package
├── constants.py          # Shared constants
├── cli/                  # CLI commands (Typer + Rich)
│   ├── __init__.py       # Typer app + command registration
│   ├── shared.py         # Console, _build_engine, print helpers
│   ├── check.py          # `check` command + _run_check async core
│   ├── suggest.py        # `suggest` command + _run_suggest async core
│   ├── fix.py            # `fix` command + interactive loop helpers
│   └── misc.py           # `cache-clear`, `version`
├── core/
│   ├── config.py         # Pydantic settings (env + .env)
│   ├── engine.py         # Async streaming orchestration engine
│   └── exceptions.py     # Custom exception hierarchy
├── analysis/
│   ├── cache.py          # Content-hashed DiskCache (schema-versioned)
│   └── parser.py         # Tree-sitter single-pass AST parser (parallel I/O)
├── llm/
│   ├── base.py           # BaseLLMProvider ABC
│   ├── factory.py        # Provider registry + instantiation
│   ├── protocol.py       # LLMProviderProtocol (structural typing)
│   ├── gemini.py         # Google Gemini provider
│   ├── openai.py         # OpenAI provider
│   ├── ollama.py         # Ollama (local) provider
│   └── prompts/          # Prompt builders (split by concern)
│       ├── __init__.py   # Re-exports for backward compat
│       ├── check.py      # Accuracy analysis prompts + severity guide
│       ├── suggest.py    # Docstring generation prompts
│       └── mcp.py        # MCP-specialised discoverability prompts
├── models/
│   ├── entity.py         # CodeEntity dataclass
│   └── schema.py         # Pydantic schemas: DocstringAnalysis (+ Severity),
│                         #   DocstringGeneration, MCPToolAnalysis
├── output/               # Output format adapters (stub — planned)
│   └── __init__.py       #   planned: terminal.py, sarif.py, html_report.py
└── utils/
    ├── patcher.py        # Surgical docstring insertion/replacement
    └── timer.py          # perf_counter context manager

Key Design Properties

Property	How it's achieved
Single-pass parsing	`_parse_all` returns `(documented, undocumented, mcp_tools)` in one tree traversal
Parallel file I/O	`ThreadPoolExecutor` in `_scan_parallel` — tree-sitter C extension releases the GIL
Token budget	`_smart_truncate` keeps signature + return/raise lines; body middle is dropped
Cache safety	Keys hash only `code + docstring`; renames don't bust the cache; `_SCHEMA_VERSION` prefix auto-busts on schema changes
Severity filtering	`severity_exceeds_threshold(severity, fail_on)` — O(1) rank comparison
MCP namespace	`mcp_check_stream` uses `namespace="mcp"` so MCP and standard results never collide
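
The token-budget row can be made concrete with a small example. The sketch below is a simplified take on the idea behind `_smart_truncate`, not its actual code: it assumes a single-line signature, keeps every `return` and `raise` line, and collapses everything else into a marker comment. The names `smart_truncate` and `MARKER` are hypothetical, and unlike a production version it does not guarantee the result fits the budget.

```python
import re

MARKER = "    # ... body truncated ..."  # hypothetical placeholder text
_KEEP = re.compile(r"\s*(return|raise)\b")


def smart_truncate(source: str, max_chars: int = 8000) -> str:
    """Shrink an oversized function while keeping its most informative lines."""
    if len(source) <= max_chars:
        return source  # within budget: leave the source untouched

    lines = source.splitlines()
    kept = [lines[0]]  # the `def ...` line (single-line signature assumed)
    for line in lines[1:]:
        if _KEEP.match(line):
            kept.append(line)    # return/raise statements survive
        elif kept[-1] != MARKER:
            kept.append(MARKER)  # one marker per run of dropped lines
    return "\n".join(kept)
```

Keeping the signature and the `return`/`raise` lines preserves what a docstring's Args, Returns, and Raises sections are checked against, which is presumably why those lines are prioritised over the body.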

Adding a New LLM Provider

1. Create src/docguard/llm/my_provider.py implementing BaseLLMProvider._call_raw.
2. Register it in the _PROVIDERS dict in factory.py.
3. Add my_provider_model: str = "default-model" to DocGuardConfig.

The provider automatically inherits analyze, generate, retry, and mcp_analyze from BaseLLMProvider.
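
The steps above might look roughly like the following. This is a self-contained sketch under stated assumptions: `BaseLLMProvider` here is a stand-in defined inline, since the real base class, the signature of `_call_raw` (which may well be async), and the shape of `_PROVIDERS` are not shown in this README. `MyProvider` and `create_provider` are hypothetical names.

```python
from abc import ABC, abstractmethod


class BaseLLMProvider(ABC):
    """Stand-in for docguard's base class; the real signatures may differ."""

    def __init__(self, model: str) -> None:
        self.model = model

    @abstractmethod
    def _call_raw(self, prompt: str) -> str:
        """Send one prompt to the backend and return the raw text reply."""

    def analyze(self, code: str, docstring: str) -> str:
        # Shared behaviour lives in the base class and delegates to _call_raw.
        return self._call_raw(f"Check this docstring:\n{docstring}\n\nCode:\n{code}")


class MyProvider(BaseLLMProvider):
    """Hypothetical provider that only needs to implement _call_raw."""

    def _call_raw(self, prompt: str) -> str:
        # A real provider would call its SDK or HTTP API here.
        return f"[{self.model}] received {len(prompt)} characters"


# Mirrors the registration step: a name-to-class mapping used by a factory.
_PROVIDERS: dict[str, type[BaseLLMProvider]] = {"my_provider": MyProvider}


def create_provider(name: str, model: str) -> BaseLLMProvider:
    """Look up a provider class by name and instantiate it."""
    try:
        return _PROVIDERS[name](model)
    except KeyError:
        raise ValueError(f"Unknown provider: {name}") from None
```

The point of the pattern is that a new backend supplies only the transport in `_call_raw`, while prompt construction and result handling stay in one place.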

🗺️ Roadmap

docguard check --since HEAD~1 — Git-aware incremental analysis (only changed functions)
SARIF output — --format sarif for native GitHub Actions inline PR annotations
HTML report — --report-html for human-readable summary with trend comparison
docguard init — Generate [tool.docguard] config block in pyproject.toml
Cross-tool disambiguation — Detect MCP tools with overlapping descriptions (cosine similarity)
PyPI release — pip install docguard from the public registry

🤝 Contributing

1. Fork the repository.
2. Create a branch (git checkout -b feature/my-improvement).
3. Implement your change with tests.
4. Verify: pytest tests/ && ruff check src/
5. Open a PR with a clear description of the problem and the solution.

📄 License

Distributed under the Apache License 2.0. See LICENSE for the full text.

Created by Stefanos Kalogerakis | GitHub
