docguard is an AI-powered semantic linter that bridges the gap between implementation and documentation. It doesn't just check if a docstring exists — it understands what your code does and ensures your comments tell the truth.
Traditional linters check for formatting and existence. docguard checks for truth.
In fast-moving codebases, docstrings become "stale" in subtle ways:
- A parameter is renamed in the code but stays unchanged in the comment.
- A return type changes, but the documentation still points to the old model.
- A new exception is raised, yet it's nowhere to be found in the `Raises` section.
- An MCP tool description no longer reflects what the tool actually does, making it invisible to LLMs.
docguard acts as an automated Peer Reviewer that specialises in documentation quality.
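To make the drift cases above concrete, here is a small hypothetical function (not taken from the docguard codebase) whose docstring has gone stale in three ways. The helper `undocumented_params` is also an illustration only: it shows how far a purely syntactic check gets, which is the gap a semantic check is meant to close.

```python
import inspect


def get_user(user_id: int) -> dict:
    """Fetch a user by email.

    Args:
        email: The address to look up.

    Returns:
        A User model instance.
    """
    # Drift 1: the parameter was renamed from `email` to `user_id`.
    # Drift 2: the function now returns a plain dict, not a model.
    # Drift 3: a ValueError is raised but there is no Raises section.
    if user_id < 1:
        raise ValueError("user_id must be positive")
    return {"id": user_id, "name": "example"}


def undocumented_params(func) -> list[str]:
    """List real parameter names that never appear as `name:` in the docstring."""
    doc = inspect.getdoc(func) or ""
    return [
        name
        for name in inspect.signature(func).parameters
        if f"{name}:" not in doc
    ]


print(undocumented_params(get_user))  # ['user_id']
```

A name comparison like this catches the renamed parameter, but it cannot tell that the summary line, the return description, and the missing `Raises` entry are also wrong.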
| Feature | Description |
|---|---|
| 🧠 Semantic Analysis | Deep logic verification using OpenAI, Gemini, or Ollama |
| Severity Levels | `CRITICAL` / `ERROR` / `WARNING` / `INFO`: graduate what fails your build |
| 🔗 MCP-Aware Mode | Specialized analysis for @*.tool decorated functions |
| ⚡ Streaming Interface | Results appear as the first function is analysed — no waiting |
| 🚀 Parallel File I/O | ThreadPoolExecutor scans large directories concurrently |
| 💾 Persistent Cache | Content-hashed results; unchanged functions are never re-analysed |
| 🪙 Token Budget | Smart truncation for large functions — keeps signatures and key statements |
| 🛠️ Interactive Fixes | Review and apply surgical docstring patches with docguard fix |
| 🔁 Smart Retry | Not satisfied? Provide a hint and ask the LLM again |
| 💡 Generate Docstrings | Auto-generate Google-style docstrings for undocumented functions |
# Clone the repository
git clone https://github.com/skalogerakis/docguard.git
cd docguard
# Set up a virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install runtime dependencies
pip install -e .
# Install development dependencies (tests + linting)
pip install -e ".[dev]"

| Provider | Requirement |
|---|---|
| Ollama (default) | Install Ollama and run ollama pull llama3 |
| OpenAI | OPENAI_API_KEY environment variable |
| Gemini | GEMINI_API_KEY environment variable |
docguard reads configuration from a .env file (or any file passed via --config) and from environment variables.
# ── Provider API keys ──────────────────────────────────────────────
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=AIza...
# ── Model overrides (optional) ─────────────────────────────────────
OPENAI_MODEL=gpt-4o-mini
GEMINI_MODEL=gemini-2.5-flash
OLLAMA_MODEL=llama3
# ── Engine ─────────────────────────────────────────────────────────
MAX_CONCURRENCY=5 # Parallel LLM calls per run
# ── Analysis ───────────────────────────────────────────────────────
# Minimum severity that causes `check` to exit 1.
# Accepts: critical | error | warning | info
FAIL_ON=error
# Token budget per function (4 chars ≈ 1 token; default 8000 ≈ 2000 tokens)
MAX_FUNCTION_CHARS=8000
# ── Display ────────────────────────────────────────────────────────
# Max failures shown inline before the truncation notice
SHOW_MENU_LIMIT=50
# Cache directory (relative to cwd)
CACHE_DIR=.docguard_cache
# Extra directories to exclude (comma-separated)
EXTRA_EXCLUDE_DIRS=scripts,migrations

All settings can also be exported as shell environment variables; `.env` is a convenience.
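As a rough illustration of how a `.env` file and the process environment can be combined, the sketch below parses `KEY=VALUE` lines and lets exported variables override them. The precedence (environment wins over `.env`) is an assumption made for this example, and `parse_env_text` and `resolve_settings` are hypothetical helpers, not docguard functions.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comment lines."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "5 # Parallel LLM calls".
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def resolve_settings(env_text: str, environ: dict[str, str]) -> dict[str, str]:
    """Overlay environment variables on .env values (assumed precedence)."""
    settings = parse_env_text(env_text)
    settings.update({k: v for k, v in environ.items() if k in settings})
    return settings


sample = "FAIL_ON=error\nMAX_CONCURRENCY=5 # Parallel LLM calls per run\n"
print(resolve_settings(sample, {"FAIL_ON": "warning"}))
# {'FAIL_ON': 'warning', 'MAX_CONCURRENCY': '5'}
```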
docguard assigns a severity to every discrepancy so you can control exactly what breaks your build:
| Severity | When it applies | Default --fail-on |
|---|---|---|
| `critical` | Function does the opposite of what is documented | ✅ always fails |
| `error` | Wrong parameter names/types, wrong return type (default) | ✅ fails by default; passes only with `--fail-on critical` |
| `warning` | Minor inaccuracy, incomplete description | ❌ unless `--fail-on warning` (or `info`) |
| `info` | Cosmetic improvement only | ❌ unless `--fail-on info` |
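The threshold logic in the table can be sketched as a rank comparison. The name `severity_exceeds_threshold(severity, fail_on)` appears in the architecture notes; the body and the `SEVERITY_RANK` mapping below are assumptions for illustration, not the project's actual code.

```python
# Assumed ordering, lowest to highest, taken from the severity table.
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2, "critical": 3}


def severity_exceeds_threshold(severity: str, fail_on: str) -> bool:
    """Return True when `severity` is at least as severe as `fail_on`."""
    return SEVERITY_RANK[severity.lower()] >= SEVERITY_RANK[fail_on.lower()]


# With the default threshold, errors fail the build and warnings do not.
print(severity_exceeds_threshold("error", "error"))    # True
print(severity_exceeds_threshold("warning", "error"))  # False
```

The comparison is inclusive: a finding whose severity equals the threshold counts as a failure, which matches `--fail-on error` failing on `error` findings.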
# Verify docstrings in a single file (uses local Ollama — free)
docguard check src/my_module.py
# Scan a directory with Gemini, fail only on CRITICAL issues
docguard check src/ --provider gemini --fail-on critical
# Generate docstrings for undocumented functions
docguard suggest src/ --provider ollama --model llama3
# Interactively review and apply fixes
docguard fix src/ --provider openai
# Analyse only MCP tool functions
docguard check src/mcp_server.py --mcp --provider gemini

Analyses every documented function and flags any discrepancy between the docstring and the implementation.
Usage: docguard check [OPTIONS] PATH
Verify that every docstring accurately reflects its function's implementation.
Arguments:
PATH Path to the Python file or directory to lint [required]
Options:
--provider TEXT LLM provider: gemini, openai, ollama [default: ollama]
-m, --model TEXT Model name, overrides provider default
-f, --force Ignore cache and re-analyse everything
--show-all Show all failures (no cap)
-v, --verbose Show per-function timing and token usage (+ cache path)
--fail-on TEXT Minimum severity to fail on: critical|error|warning|info [default: error]
--mcp MCP mode: only check @*.tool decorated functions
-e, --exclude TEXT Extra directory to exclude (repeatable)
--config TEXT Path to a custom .env file
--help Show this message and exit
Exit codes:
- `0`: All docstrings pass (or no failures at or above the `--fail-on` threshold).
- `1`: One or more discrepancies found at or above the threshold, or a fatal error occurred.
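If you drive `docguard check` from a script, the exit codes above can be handled along these lines. This is a minimal sketch: `describe_exit` and `run_check` are hypothetical helpers, and only the documented `check` command and `--fail-on` option are used.

```python
import subprocess


def describe_exit(code: int) -> str:
    """Translate a `docguard check` exit code into a short message."""
    if code == 0:
        return "pass: nothing at or above the --fail-on threshold"
    if code == 1:
        return "fail: discrepancies at or above the threshold, or a fatal error"
    return f"unexpected exit code {code}"


def run_check(path: str, fail_on: str = "error") -> int:
    """Run `docguard check` and return its exit code without raising."""
    result = subprocess.run(
        ["docguard", "check", path, "--fail-on", fail_on],
        check=False,  # a non-zero exit is an expected outcome here
    )
    return result.returncode
```

Calling `describe_exit(run_check("src/"))` requires docguard on the `PATH`. Note that exit code `1` covers both lint failures and fatal errors, so a script that needs to tell them apart has to inspect the output as well.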
Examples:
# Quick check with local Ollama (no API key needed)
docguard check src/
# Fail only on critical or error severity issues
docguard check src/ --provider gemini --fail-on error
# Strict mode — fail on anything at all
docguard check src/ --provider openai --fail-on info
# MCP-only mode with verbose output
docguard check src/mcp_server.py --mcp --provider gemini --verbose
# Exclude generated code and migrations
docguard check src/ --exclude migrations --exclude generated --provider ollama
# Load a project-specific config file
docguard check src/ --config .env.production
# Force re-analysis, show all failures
docguard check src/ --provider openai --force --show-all

Sample output:
╭─────────────────────────────╮
│ DocGuard 🛡️ - Initializing │
╰─────────────────────────────╯
🤖 Provider: Ollama | Model: llama3
📂 Scanning: src/
💾 Cache: .docguard_cache
🧠 Analysis with ollama…
❌ src/api/users.py::get_user [error] (line 14)
Reason: Docstring says "fetches by email" but implementation uses user_id
Suggested Fix: Fetch a user record by their integer ID.
Args:
user_id: The unique integer identifier.
Returns:
A dict containing the user's profile data.
┌──────────────────┬───────┐
│ Total Functions │ 42 │
│ Cached (Skipped) │ 38 │
│ New (Processed) │ 4 │
│ Failed │ 1 │
└──────────────────┴───────┘
❌ Found 1 docstring discrepancy.
When --mcp is passed, docguard switches to MCP mode:
- Scans only functions decorated with `@mcp.tool`, `@server.tool`, `@app.tool`, `@tool`, or any `@*.tool` pattern.
- Uses an MCP-specialized LLM prompt that evaluates:
  - Discoverability: is the summary specific enough that an LLM knows when to call this tool?
  - Parameter completeness: are all args described with type and purpose?
  - Return value accuracy: does the doc describe the actual output format?
  - Exception transparency: are failure modes documented?
- Results include a `discoverability_issues` list with specific problems.
- Uses a separate cache namespace (`mcp`) so results don't interfere with standard checks.
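The decorator matching described above can be approximated with Python's standard `ast` module. docguard itself is described as using a tree-sitter parser, so this is a stand-in technique to show the idea, and `is_tool_decorator` and `find_tool_functions` are hypothetical names.

```python
import ast


def is_tool_decorator(node: ast.expr) -> bool:
    """Match @tool, @x.tool, and the called forms @tool(...) / @x.tool(...)."""
    if isinstance(node, ast.Call):  # unwrap @mcp.tool(...) to mcp.tool
        node = node.func
    if isinstance(node, ast.Name):
        return node.id == "tool"
    if isinstance(node, ast.Attribute):
        return node.attr == "tool"
    return False


def find_tool_functions(source: str) -> list[str]:
    """Return the names of sync and async functions decorated as tools."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and any(is_tool_decorator(d) for d in node.decorator_list)
    ]


SAMPLE = '''
@mcp.tool
def search(query): ...

@server.tool()
async def fetch(url): ...

def helper(): ...
'''

print(find_tool_functions(SAMPLE))  # ['search', 'fetch']
```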
# Analyse MCP tool descriptions
docguard check src/mcp_server.py --mcp --provider gemini
# Strict MCP check — fail on any warning
docguard check src/mcp_server.py --mcp --fail-on warning --provider openai

Sample MCP output:
🔗 MCP mode: scanning only @*.tool functions
🧠 MCP tool analysis with gemini…
🔗 MCP src/mcp_server.py::search_documents [warning] (line 12)
Reason: Summary is too vague — an LLM cannot distinguish this from other search tools
Suggested Fix: Search the document store by keyword and return ranked results.
...
MCP Issues:
• Summary does not clarify the scope of search (title-only vs full-text)
• Missing Returns section — LLM cannot interpret the output format
Finds all undocumented functions and generates accurate, Google-style docstrings.
Usage: docguard suggest [OPTIONS] PATH
Generate Google-style docstring suggestions for undocumented functions.
Arguments:
PATH Path to the Python file or directory to scan [required]
Options:
--provider TEXT LLM provider: gemini, openai, ollama [default: ollama]
-m, --model TEXT Model name, overrides provider default
-f, --force Ignore cached suggestions and regenerate
-v, --verbose Show per-function timing and token usage
-e, --exclude TEXT Extra directory to exclude (repeatable)
--config TEXT Path to a custom .env file
--help Show this message and exit
Note: `suggest` only prints suggestions; it does not modify your files. Use `docguard fix` to apply them.
Examples:
# Generate docstrings for everything missing one
docguard suggest src/ --provider gemini
# Use a specific Ollama model
docguard suggest src/ --provider ollama --model gemma3:4b
# Regenerate even if suggestions are cached
docguard suggest src/ --provider openai --force

Runs the full analysis, then presents each failed function one by one for review and patching.
Usage: docguard fix [OPTIONS] PATH
Interactively review and apply docstring fixes.
Arguments:
PATH Path to the Python file or directory to fix [required]
Options:
--provider TEXT LLM provider: gemini, openai, ollama [default: ollama]
-m, --model TEXT Model name, overrides provider default
-f, --force Ignore cache and re-analyse everything
--auto-apply Apply all fixes without interactive prompting
-v, --verbose Show per-function timing and token usage
-e, --exclude TEXT Extra directory to exclude (repeatable)
--config TEXT Path to a custom .env file
--help Show this message and exit
Interactive prompt options:
| Input | Action |
|---|---|
| `y` or Enter | Apply the suggested fix to the source file |
| `n` | Skip this function and leave it unchanged |
| `r` | Retry: ask the LLM again, optionally with a hint |
Examples:
# Interactive review
docguard fix src/ --provider gemini
# Fully automated — apply every fix (CI-friendly)
docguard fix src/ --provider openai --auto-apply
# Re-analyse everything and auto-apply
docguard fix src/ --force --auto-apply --provider gemini

The retry flow:
❌ src/api/users.py::get_user [error] (line 14)
Issue: Docstring says "fetches by email" but uses user_id
Fix: Fetch a user by their unique integer ID.
[y]es apply / [n]o skip / [r]etry with LLM: r
Hint for the LLM (press Enter to skip): mention the dict structure of the return value
🔄 Retrying…
New fix: Fetch a user record by their integer ID.
Args:
user_id: The unique integer identifier.
Returns:
A dict with keys 'id', 'name', and 'email'.
[y]es apply / [n]o skip / [r]etry with LLM: y
✓ Applied.
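The apply / skip / retry loop shown in the transcript can be sketched as a small function. This is an illustration under assumptions: `review_fix` is a hypothetical name, and the `ask` and `retry` callables stand in for the terminal prompt and the LLM call so the logic can be exercised without either.

```python
from typing import Callable, Optional


def review_fix(
    fix: str,
    ask: Callable[[str], str],
    retry: Callable[[str], str],
) -> Optional[str]:
    """Return the fix to apply, or None if the user skips it."""
    while True:
        choice = ask("[y]es apply / [n]o skip / [r]etry with LLM: ").strip().lower()
        if choice in ("", "y"):  # Enter counts as yes
            return fix
        if choice == "n":
            return None
        if choice == "r":
            hint = ask("Hint for the LLM (press Enter to skip): ")
            fix = retry(hint)  # replace the candidate and prompt again
        # Any other input falls through and the prompt is shown again.


# Scripted session: retry once with a hint, then accept the new fix.
answers = iter(["r", "mention the dict", "y"])
result = review_fix("old fix", lambda _prompt: next(answers), lambda hint: f"new fix ({hint})")
print(result)  # new fix (mention the dict)
```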
Wipes all cached analysis and generation results (all namespaces: check, suggest, mcp).
docguard cache-clear

Tip: Cache files are stored in `.docguard_cache/` by default (configurable via `CACHE_DIR`). Add it to `.gitignore`.
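A content-hashed cache key with per-command namespaces might be built roughly as follows. The hashing scheme, the separator byte, and the `_SCHEMA_VERSION` value are assumptions for illustration; only the idea of hashing code plus docstring under a namespace and a schema-version prefix comes from this README.

```python
import hashlib

_SCHEMA_VERSION = "v1"  # hypothetical value; bumping it invalidates old entries


def cache_key(code: str, docstring: str, namespace: str = "check") -> str:
    """Build a key from content only, so moving a file does not change it."""
    digest = hashlib.sha256()
    for part in (code, docstring):
        digest.update(part.encode("utf-8"))
        digest.update(b"\x00")  # separator so ("ab", "c") differs from ("a", "bc")
    return f"{_SCHEMA_VERSION}:{namespace}:{digest.hexdigest()}"


key_check = cache_key("def f(): pass", "Do nothing.")
key_mcp = cache_key("def f(): pass", "Do nothing.", namespace="mcp")
print(key_check == key_mcp)  # False: namespaces keep results apart
```

Because the file path is not part of the key, an unchanged function keeps its cached result after a file move, while any edit to the code or the docstring produces a new key.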
docguard version
# DocGuard v0.1.0

When using MCP (Model Context Protocol) servers, the docstring is the interface: it's what the LLM reads to decide whether to invoke a tool and how.
A stale or inaccurate MCP tool description means:
- The LLM cannot find the tool when it should be used.
- The LLM invokes the tool with the wrong arguments.
- The tool's output is misinterpreted.
docguard automatically detects all common MCP tool patterns:
@mcp.tool # FastMCP
@mcp.tool() # FastMCP with kwargs
@server.tool # Custom server instances
@app.tool # App-style servers
@tool # Bare decorator

# Verify your MCP server's tool descriptions
docguard check src/mcp_server.py --mcp --provider gemini
# Auto-fix any drifted descriptions
docguard fix src/mcp_server.py --provider gemini --auto-apply
# Strict CI check — fail on any warning
docguard check src/mcp_server.py --mcp --fail-on warning --provider openai

@mcp.tool
def search_documents(query: str, limit: int = 10) -> list[dict]:
    """
    Search the document store for entries matching the query string.

    Args:
        query: The search term to match against document titles and content.
        limit: Maximum number of results to return.

    Returns:
        A list of matching document dicts, each with 'id', 'title', and 'score'.

    Raises:
        ValueError: If query is empty or limit is less than 1.
    """
    ...

# Run all tests (unit + e2e, mocked; no API calls)
pytest tests/
# Verbose output
pytest tests/ -v
# Unit tests only
pytest tests/unit/
# E2E tests only
pytest tests/e2e/
# Run live tests against a real LLM (requires Ollama running)
pytest tests/ --run-live-llm --live-llm-provider ollama
# Live tests with a specific provider
pytest tests/ --run-live-llm --live-llm-provider gemini

The live test flag (`--run-live-llm`) is intentionally off by default to avoid costs and network dependencies.
# .github/workflows/docguard.yml
name: DocGuard — Documentation Lint
on: [push, pull_request]
jobs:
  docguard:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install DocGuard
        run: pip install -e .
      - name: Install and start Ollama
        run: |
          curl -fsSL https://ollama.com/install.sh | sh
          ollama serve &
          sleep 5
          ollama pull llama3
      - name: Run DocGuard check
        run: docguard check src/ --provider ollama --fail-on error

- name: Run DocGuard check
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
  run: docguard check src/ --provider openai --model gpt-4o-mini --fail-on error

- name: Verify MCP tool descriptions
  env:
    GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
  # Fail if any MCP tool has a WARNING or worse description
  run: docguard check src/mcp_server.py --mcp --provider gemini --fail-on warning

Add to .pre-commit-config.yaml:
repos:
  - repo: local
    hooks:
      - id: docguard
        name: DocGuard — Docstring Lint
        entry: docguard check
        args: [src/, --provider, ollama, --fail-on, error]
        language: system
        types: [python]
        pass_filenames: false
      - id: docguard-mcp
        name: DocGuard — MCP Tool Lint
        entry: docguard check
        args: [src/mcp_server.py, --mcp, --provider, ollama, --fail-on, warning]
        language: system
        types: [python]
        pass_filenames: false

src/docguard/
├── main.py # Entrypoint — imports from cli package
├── constants.py # Shared constants
├── cli/ # CLI commands (Typer + Rich)
│ ├── __init__.py # Typer app + command registration
│ ├── shared.py # Console, _build_engine, print helpers
│ ├── check.py # `check` command + _run_check async core
│ ├── suggest.py # `suggest` command + _run_suggest async core
│ ├── fix.py # `fix` command + interactive loop helpers
│ └── misc.py # `cache-clear`, `version`
├── core/
│ ├── config.py # Pydantic settings (env + .env)
│ ├── engine.py # Async streaming orchestration engine
│ └── exceptions.py # Custom exception hierarchy
├── analysis/
│ ├── cache.py # Content-hashed DiskCache (schema-versioned)
│ └── parser.py # Tree-sitter single-pass AST parser (parallel I/O)
├── llm/
│ ├── base.py # BaseLLMProvider ABC
│ ├── factory.py # Provider registry + instantiation
│ ├── protocol.py # LLMProviderProtocol (structural typing)
│ ├── gemini.py # Google Gemini provider
│ ├── openai.py # OpenAI provider
│ ├── ollama.py # Ollama (local) provider
│ └── prompts/ # Prompt builders (split by concern)
│ ├── __init__.py # Re-exports for backward compat
│ ├── check.py # Accuracy analysis prompts + severity guide
│ ├── suggest.py # Docstring generation prompts
│ └── mcp.py # MCP-specialised discoverability prompts
├── models/
│ ├── entity.py # CodeEntity dataclass
│ └── schema.py # Pydantic schemas: DocstringAnalysis (+ Severity),
│ # DocstringGeneration, MCPToolAnalysis
├── output/ # Output format adapters (stub — planned)
│ └── __init__.py # planned: terminal.py, sarif.py, html_report.py
└── utils/
  ├── patcher.py # Surgical docstring insertion/replacement
  └── timer.py # perf_counter context manager
| Property | How it's achieved |
|---|---|
| Single-pass parsing | _parse_all returns (documented, undocumented, mcp_tools) in one tree traversal |
| Parallel file I/O | ThreadPoolExecutor in _scan_parallel — tree-sitter C extension releases the GIL |
| Token budget | _smart_truncate keeps signature + return/raise lines; body middle is dropped |
| Cache safety | Keys hash only code + docstring; renames don't bust the cache; _SCHEMA_VERSION prefix auto-busts on schema changes |
| Severity filtering | severity_exceeds_threshold(severity, fail_on) — O(1) rank comparison |
| MCP namespace | mcp_check_stream uses namespace="mcp" so MCP and standard results never collide |
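The token-budget row can be illustrated with a simplified truncation routine. This is a sketch of the idea only: `smart_truncate` is a hypothetical stand-in for `_smart_truncate`, it assumes a single-line signature, and it does not guarantee that the result fits under `max_chars`.

```python
def smart_truncate(source: str, max_chars: int = 8000) -> str:
    """Keep the signature and return/raise lines; collapse everything else."""
    if len(source) <= max_chars:
        return source  # within budget, nothing to do

    lines = source.splitlines()
    kept = [lines[0]]  # assumption: the signature is the first line
    omitted = False
    for line in lines[1:]:
        stripped = line.strip()
        first_word = stripped.split(maxsplit=1)[0] if stripped else ""
        if first_word in ("return", "raise"):
            kept.append(line)
            omitted = False
        elif not omitted:
            # Emit one marker per run of dropped lines.
            kept.append("    # ... truncated ...")
            omitted = True
    return "\n".join(kept)


long_source = (
    "def f(x):\n"
    + "    x += 1\n" * 50
    + "    if x < 0:\n"
    + "        raise ValueError('neg')\n"
    + "    return x\n"
)
print(smart_truncate(long_source, max_chars=100))
```

The parts most likely to contradict a docstring (parameters, return values, raised exceptions) survive, while the repetitive body is reduced to a marker.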
- Create `src/docguard/llm/my_provider.py` implementing `BaseLLMProvider._call_raw`.
- Register it in the `_PROVIDERS` dict in `factory.py`.
- Add `my_provider_model: str = "default-model"` to `DocGuardConfig`.
The provider automatically inherits analyze, generate, retry, and mcp_analyze from BaseLLMProvider.
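The steps above might look roughly like this in code. Everything here is a self-contained stand-in: the real `BaseLLMProvider` interface, the `_call_raw` signature, and the shape of `_PROVIDERS` are not shown in this README, so the versions below are assumptions chosen only to demonstrate the pattern of overriding one method and registering the class.

```python
from abc import ABC, abstractmethod


class BaseLLMProvider(ABC):
    """Stand-in for docguard's base class; the real interface may differ."""

    def analyze(self, prompt: str) -> str:
        # Shared behaviour lives on the base class and delegates to _call_raw.
        return self._call_raw(prompt).strip()

    @abstractmethod
    def _call_raw(self, prompt: str) -> str:
        """Send the prompt to the model and return its raw text reply."""


class MyProvider(BaseLLMProvider):
    def __init__(self, model: str = "default-model") -> None:
        self.model = model

    def _call_raw(self, prompt: str) -> str:
        # A real provider would call its HTTP API or SDK here.
        return f"[{self.model}] received {len(prompt)} chars\n"


# Hypothetical registry mirroring the `_PROVIDERS` dict mentioned above.
_PROVIDERS: dict[str, type[BaseLLMProvider]] = {"my_provider": MyProvider}


def create_provider(name: str, **kwargs) -> BaseLLMProvider:
    """Look up a provider class by name and instantiate it."""
    return _PROVIDERS[name](**kwargs)


print(create_provider("my_provider").analyze("Check this docstring"))
# [default-model] received 20 chars
```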
- `docguard check --since HEAD~1`: Git-aware incremental analysis (only changed functions)
- SARIF output: `--format sarif` for native GitHub Actions inline PR annotations
- HTML report: `--report-html` for human-readable summary with trend comparison
- `docguard init`: Generate `[tool.docguard]` config block in `pyproject.toml`
- Cross-tool disambiguation: Detect MCP tools with overlapping descriptions (cosine similarity)
- PyPI release: `pip install docguard` from the public registry
- Fork the repository.
- Branch (`git checkout -b feature/my-improvement`).
- Implement your change with tests.
- Verify: `pytest tests/ && ruff check src/`
- PR: include a clear description of the problem and solution.
Distributed under the Apache License 2.0. See LICENSE for the full text.