MLGuard

Pre-deployment release-gate checks for ML models. Three checks, one command, one verdict: pass, warn, or fail.

Why

Monitoring is useful after a model is live, but release decisions need checks before a new model reaches production. This project is a small release gate for that moment: it compares a candidate model and current data against a known baseline, then returns a clear pass, warn, or fail verdict.

It runs three checks:

Data drift — are the input features still distributed the same way? (PSI)
Performance regression — did accuracy/F1 drop compared to baseline?
Latency regression — is inference slower than before?

If any check fails, the CLI exits non-zero so it can block a CI/CD pipeline.

Quick start

git clone https://github.com/GoparapukethaN/mlguard.git
cd mlguard
pip install -e .

# create a baseline from your current model + data
mlguard baseline --model model.pkl --data reference.csv --target label

# check a new model/data against the baseline
mlguard check --model model.pkl --ref reference.csv --current new_data.csv --target label

mlguard check fails fast when the baseline file is missing. If you intentionally want a drift-only run while bootstrapping a baseline, pass --allow-missing-baseline.

Model files are loaded as local joblib/pickle-style artifacts. Only run MLGuard against trusted model files that you created or reviewed.

Output:

mlguard — pre-deployment release-gate checks

  Reference: 300 rows, Current: 300 rows
  Model: model.pkl
  Baseline: ./mlguard_baseline.json

  [1/3] Checking data drift...
    feature_0: PSI=0.4521 FAIL
    feature_1: PSI=0.8234 FAIL
    feature_2: PSI=0.0089 PASS
    feature_3: PSI=0.0124 PASS
    feature_4: PSI=0.2891 FAIL

  [2/3] Checking performance regression...
    accuracy: 0.9467 → 0.7900 (-16.6%) FAIL
    f1: 0.9467 → 0.7884 (-16.7%) FAIL

  [3/3] Checking inference latency...
    p95=1.65ms (baseline=1.97ms, -16.2%) PASS

  FAIL — 3 feature(s) with significant drift; performance regression detected

  Report saved to ./mlguard_report.md
  JSON report saved to ./mlguard_report.json

Exit code 1 on FAIL, 0 on PASS/WARN. The Markdown report is useful for review, and the JSON report is useful for CI annotations, dashboards, or release records.
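Because the verdict is carried by the exit code, a wrapper script only needs to look at the return code. The sketch below assumes mlguard is installed and on PATH; run_gate and MLGUARD_CHECK are hypothetical names used for illustration, and the flags are the ones shown in the quick start.

```python
import subprocess

# The same command as in the quick start; the file paths are placeholders.
MLGUARD_CHECK = [
    "mlguard", "check",
    "--model", "model.pkl",
    "--ref", "reference.csv",
    "--current", "new_data.csv",
    "--target", "label",
]


def run_gate(command):
    """Run a release-gate command; return True when it exits with code 0.

    With MLGuard, exit code 0 covers both PASS and WARN, and a non-zero
    exit code means FAIL.
    """
    result = subprocess.run(command, check=False)
    return result.returncode == 0
```

Calling run_gate(MLGUARD_CHECK) before a deploy step lets a script stop when the gate fails. Telling WARN apart from PASS needs the report files, since both exit with 0.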

The three checks

Data drift (PSI)

Population Stability Index (PSI) compares the distribution of each feature between your reference data and the current data. A PSI above 0.2 indicates that a feature's distribution has shifted significantly, so the model may be scoring inputs unlike the ones it was trained and validated on.

PSI < 0.1: no drift
PSI 0.1-0.2: moderate (WARN)
PSI > 0.2: significant (FAIL)
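To make the thresholds concrete, here is a minimal PSI sketch in plain Python. It is not MLGuard's implementation: the binning (10 equal-width bins over the reference range), the eps floor for empty bins, and the function names psi and drift_verdict are all assumptions made for illustration.

```python
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two 1-D numeric samples.

    Assumption: equal-width bins spanning the reference range.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so out-of-range current values land in the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        total = len(values)
        # eps keeps empty bins from producing log(0) or a division by zero.
        return [max(c / total, eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def drift_verdict(value):
    """Map a PSI value to the thresholds listed above."""
    if value > 0.2:
        return "FAIL"
    if value >= 0.1:
        return "WARN"
    return "PASS"
```

Identical samples give a PSI of 0, and the value grows as mass moves between bins. The result depends on the binning choice, so values from this sketch are not guaranteed to match the numbers MLGuard prints.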

Performance regression

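Loads the model, runs predictions on the current data, and compares accuracy and F1 against the saved baseline. Drops are measured relative to the baseline value: in the sample output above, accuracy going from 0.9467 to 0.7900 is a 16.6% drop. A drop of more than 10% fails the check.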

Drop < 5%: PASS
Drop 5-10%: WARN
Drop > 10%: FAIL
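The percentages in the sample output are consistent with a relative change against the baseline. A minimal sketch of that arithmetic and the thresholds above follows; the function names relative_change and regression_verdict are hypothetical, not MLGuard's API.

```python
def relative_change(baseline, current):
    """Signed change of a metric relative to its baseline value."""
    if baseline <= 0:
        raise ValueError("baseline metric must be positive")
    return (current - baseline) / baseline


def regression_verdict(baseline, current, warn=0.05, fail=0.10):
    """Classify a metric drop using the thresholds listed above."""
    drop = -relative_change(baseline, current)  # positive when the metric fell
    if drop > fail:
        return "FAIL"
    if drop >= warn:
        return "WARN"
    return "PASS"


# Accuracy from the sample output: 0.9467 -> 0.7900
change = relative_change(0.9467, 0.7900)
print(f"{change * 100:.1f}%", regression_verdict(0.9467, 0.7900))
```

An improvement yields a negative drop and therefore passes.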

Latency regression

Times 100 single-sample predictions and compares p95 latency against the baseline p95. The increase is measured relative to the baseline. A jump in latency usually means that preprocessing changed or the model got larger.

Increase < 15%: PASS
Increase 15-30%: WARN
Increase > 30%: FAIL
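The measurement can be sketched with the standard library alone. This is an illustration under assumptions, not MLGuard's code: the percentile method (statistics.quantiles with its default settings), the absence of warm-up runs, and the names p95_latency_ms and latency_verdict are all choices made here.

```python
import statistics
import time


def p95_latency_ms(predict, sample, runs=100):
    """Time `runs` single-sample calls to predict; return the p95 in ms."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(timings, n=20)[-1]


def latency_verdict(baseline_ms, current_ms, warn=0.15, fail=0.30):
    """Classify a p95 latency increase using the thresholds listed above."""
    if baseline_ms <= 0:
        raise ValueError("baseline latency must be positive")
    increase = (current_ms - baseline_ms) / baseline_ms
    if increase > fail:
        return "FAIL"
    if increase >= warn:
        return "WARN"
    return "PASS"
```

For an sklearn-style estimator, predict would be model.predict and sample a single-row input. With the sample output above, latency_verdict(1.97, 1.65) passes because the p95 went down.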

GitHub Actions

You can run the CLI directly in a deployment workflow:

- name: ML release gate
  run: |
    pip install -e .
    mlguard check \
      --model ./model.pkl \
      --ref ./data/reference.csv \
      --current ./data/latest.csv \
      --target label

The exit code blocks the pipeline on FAIL.

The included composite action accepts custom Markdown/JSON output paths and uploads both reports as a single mlguard-reports artifact.

- uses: GoparapukethaN/mlguard/action@main
  with:
    model-path: ./models/model.pkl
    reference-data: ./data/reference.csv
    current-data: ./data/current.csv
    baseline-path: ./mlguard_baseline.json
    target-column: target

Example

# run the included example (trains a model, simulates drift, runs checks)
pip install -e .
python examples/sklearn_example.py

Verified Locally

make verify

Current local verification: ruff reports no issues, 26 tests pass, and the sklearn example produces Markdown and JSON release-gate reports with summary counts when drift and performance regression are simulated.

Sample report: docs/example-report.md
Verification checklist: docs/verification.md
Case study: docs/case-study.md

Running tests

pip install -e ".[dev]"
pytest tests/ -v

Limitations

Tested with sklearn-style estimators; other model wrappers can work if they expose .predict()
PyTorch .pt/.pth files are not supported yet; adding that should include a tested adapter from tensor inputs to model outputs
Model deserialization is for trusted local artifacts only
PSI needs at least 10 samples per feature to be meaningful
Latency check measures single-sample prediction time, not batched
No GPU-specific latency profiling (CPU only for now)
Baselines are JSON files — no database, no dashboard

License

MIT
