Release: AKD v0.1.1 by NISH1001 · Pull Request #420 · NASA-IMPACT/akd-core

NISH1001 · 2026-04-14T18:18:02Z

Summary

Patch release v0.1.1 — adds per-category logprob scoring to the multi-risk Granite Guardian tool and consolidates version management.

- Per-category logprob scores: MultiRiskGraniteGuardianTool Step 2 now requests token logprobs from Ollama and derives a score = exp(first_token_logprob) per detected harm category, giving callers a real numeric confidence signal (e.g., Violence=0.97 vs Harmful=0.35).
- Score threshold filtering: New score_threshold config field (0.0–1.0, default 0.0) to optionally drop low-confidence category detections and reduce false positives.
- Single version source: Replaced hardcoded __version__ in akd/__init__.py with importlib.metadata.version("akd"); the version is now sourced solely from pyproject.toml.
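
As a rough illustration of the scoring rule in the summary, the sketch below converts a first-token logprob into a score with `exp` and applies a threshold. The function names `score_from_logprob` and `keep_confident` and the sample logprob values are hypothetical and chosen only to reproduce the Violence=0.97 / Harmful=0.35 example; this is not the code from `granite_guardian.py`.

```python
from __future__ import annotations

import math


def score_from_logprob(logprob: float) -> float:
    """Map a token logprob (<= 0) to a probability-like score in [0, 1]."""
    return min(1.0, math.exp(logprob))


def keep_confident(scores: dict[str, float], score_threshold: float = 0.0) -> dict[str, float]:
    """Drop categories whose score is below the threshold (0.0 keeps everything)."""
    if not 0.0 <= score_threshold <= 1.0:
        raise ValueError("score_threshold must be within [0.0, 1.0]")
    return {name: s for name, s in scores.items() if s >= score_threshold}


# Hypothetical logprobs picked so the scores match the example in the summary.
scores = {
    "Violence": score_from_logprob(-0.03),  # about 0.97
    "Harmful": score_from_logprob(-1.05),   # about 0.35
}
confident = keep_confident(scores, score_threshold=0.5)
```

With the default threshold of 0.0 nothing is filtered, which matches the stated goal of preserving the previous behavior unless a caller opts in.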

Changes since v0.1.0

abd39dd Consolidate _parse_categories into single dict-returning function
0ea08ee Add per-category score to MultiRiskGraniteGuardianTool via Step 2 logprobs
7fc0863 Bump version to 0.1.1, use importlib.metadata for single version source

Test plan

- Verify CI passes on this PR
- Confirm all existing tests pass (`uv run pytest`)
- `uv run python -c "import akd; print(akd.__version__)"` prints `0.1.1`
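
The version check above depends on installed distribution metadata, which can be absent when the package is run straight from a source checkout. A minimal sketch of one way to handle that, assuming a `PackageNotFoundError` fallback; the helper name `resolve_version` and the fallback string are hypothetical, and the fallback actually added in this PR may differ:

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(dist_name: str, fallback: str = "0.0.0+unknown") -> str:
    """Return the installed distribution's version, or a fallback if not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # No installed metadata, e.g. running from a plain source tree.
        return fallback


# In a package's __init__.py this could be used as:
# __version__ = resolve_version("akd")
missing = resolve_version("surely-not-installed-example-dist-xyz")
```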

…probs

### What changes I have done

- Added `score_threshold` config field to `MultiRiskGraniteGuardianToolConfig` to optionally drop low-confidence detections and reduce false positives. Defaults to `0.0` (no filter) to preserve current behavior.
- Each entry in `risk_results` now includes a per-category `score` in [0, 1], derived from the logprob of the category's first emitted token in Step 2.
- Categories whose per-category score is below `score_threshold` are dropped from `detected_risks` and `risk_results`. Categories without a score (e.g., when Ollama does not return logprobs) pass through unfiltered as a graceful fallback.
- Step 1 decision remains label-based (`Yes`/`No`) as per the model card; no extra Ollama calls are added.

### Why

The multi-harm model's self-reported text confidence (e.g., `"High"`, `"Not Harmful"`) is effectively binary in practice and useless for thresholding. False positives on some categories (e.g., `Harmful` flagged on sarcasm) could not be filtered without manual category exclusion. Per-category logprob-derived scores give callers a real numeric signal to threshold on (e.g., `Violence=0.97` vs `Harmful=0.35` for the same input).

### How I made the changes

- `akd/guardrails/providers/granite_guardian.py`:
  - `MultiRiskGraniteGuardianToolConfig`: added `score_threshold: float = 0.0` with `ge=0.0, le=1.0` validation.
  - `_call_category_detection`: added top-level `logprobs: True` and `top_logprobs: 5` to the Ollama `/api/generate` request body (Ollama accepts these as top-level params, not inside `options`).
  - `_parse_categories_with_scores`: new helper that parses comma-separated categories and computes per-category scores as `exp(first_token_logprob)`.
  - `_first_token_logprob_per_category`: new static helper that walks the token stream, skipping whitespace/commas, and returns the logprob of the first token of each emitted category.
  - `_parse_categories`: kept as a thin wrapper over `_parse_categories_with_scores` for backward compatibility.
  - `_arun`: applies `score_threshold` as a filter in the Step 2 detected-categories list comprehension; builds `risk_results` with `{"is_risky": True, "score": <float|None>}` per category.

### How to test

- `uv run pytest tests/guardrails/`: existing tests still pass (8 failures in `test_granite_think.py` are pre-existing and require a live `granite3.3-guardian:8b` model).
- `uv run python scripts/test_multi_harm.py` with live Ollama + `granite-guardian-3.2-5b-multi-harm-GGUF`: verifies per-category scores appear in `risk_results` (e.g., Violence=0.97, Unethical Behavior=0.76 for the same violent input; Harmful=0.35 on sarcasm, correctly flagged as low-confidence).
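
The request change and the token walk described above can be sketched as follows. This is an illustration under assumptions, not the PR's code: the function names are stand-ins, `"stream": False` is assumed, and the token entries are assumed to be dicts with `token` and `logprob` keys. The sketch also treats any token containing a comma as a pure separator.

```python
from __future__ import annotations


def build_generate_payload(model: str, prompt: str) -> dict:
    """Body for Ollama's /api/generate with logprobs requested at the top level."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,    # assumption: a single non-streaming response
        "logprobs": True,   # top-level, not nested under "options"
        "top_logprobs": 5,
    }


def first_token_logprob_per_category(tokens: list[dict]) -> list[float]:
    """Return the logprob of the first non-separator token of each category."""
    first_logprobs: list[float] = []
    at_category_start = True
    for entry in tokens:
        text = entry["token"]
        if "," in text:
            # A comma ends the current category; the next real token starts a new one.
            at_category_start = True
            continue
        if not text.strip():
            continue  # skip whitespace-only tokens
        if at_category_start:
            first_logprobs.append(entry["logprob"])
            at_category_start = False
    return first_logprobs


# Hypothetical token stream for the output "Violence, Unethical Behavior".
tokens = [
    {"token": "Violence", "logprob": -0.03},
    {"token": ",", "logprob": -0.10},
    {"token": " Un", "logprob": -0.27},
    {"token": "ethical", "logprob": -0.01},
    {"token": " Behavior", "logprob": -0.02},
]
first = first_token_logprob_per_category(tokens)  # [-0.03, -0.27]
```

Only the first token of a multi-token category such as "Unethical Behavior" contributes to its score, which is the behavior the description attributes to the helper.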

### What

- Merged `_parse_categories` and `_parse_categories_with_scores` into a single `_parse_categories` method that returns `dict[GraniteHarmCategory, float | None]` (category -> per-category score).
- Removed the redundant `"scores"` key from the Step 2 return dict; `"categories"` now holds both the categories and their scores as a dict mapping.
- `unfiltered_categories` in `extra` now preserves per-category scores alongside the category list (previously was a bare list).

### Why

The previous split had `_parse_categories` as a thin list-returning wrapper over `_parse_categories_with_scores` purely for backward compatibility, but `_parse_categories` was only called internally and had no external consumers, so it was dead weight. A single dict return is also a more natural fit: `risk_results` is already a dict of category -> metadata, and downstream consumers need both iteration and score lookup. Dicts preserve insertion order in Python 3.7+, so the model's emission order is kept.

### How

- `akd/guardrails/providers/granite_guardian.py`:
  - `_parse_categories`: now takes optional `token_logprobs`, returns `dict[GraniteHarmCategory, float | None]`. When `token_logprobs` is None/empty, scores are `None` (same behavior as the pre-logprobs version, just wrapped in dict keys instead of a list).
  - `_call_category_detection`: returns `{"categories": <dict>, "raw_response": ...}` (removed the separate `"scores"` key).
  - `_arun`: renamed local from `per_category_scores` to `category_scores`; iterates `category_scores.items()` directly in the filter comprehension; passes the whole `category_scores` dict as `extra["unfiltered_categories"]` to preserve scores in observability output.

### How to test

- `uv run pytest tests/guardrails/ --ignore=tests/guardrails/test_granite_think.py`: all 21 tests pass.
- `uv run python scripts/test_multi_harm.py` with live Ollama: confirms `risk_results` still has `score` per category and `extra["unfiltered_categories"]` now includes scores.
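
A simplified sketch of the consolidated shape described above, assuming plain strings in place of `GraniteHarmCategory` and a precomputed list of first-token logprobs in place of the raw `token_logprobs`. The names `parse_categories` and `build_risk_results` are hypothetical stand-ins for the internal methods.

```python
from __future__ import annotations

import math


def parse_categories(
    raw_response: str,
    first_logprobs: list[float] | None = None,
) -> dict[str, float | None]:
    """Map each comma-separated category to a score, or None if no logprobs."""
    names = [part.strip() for part in raw_response.split(",") if part.strip()]
    if not first_logprobs or len(first_logprobs) != len(names):
        # No usable logprobs: keep the categories but leave scores unset.
        return {name: None for name in names}
    return {name: math.exp(lp) for name, lp in zip(names, first_logprobs)}


def build_risk_results(
    category_scores: dict[str, float | None],
    score_threshold: float = 0.0,
) -> dict[str, dict]:
    """Keep categories at or above the threshold; unscored ones pass through."""
    return {
        category: {"is_risky": True, "score": score}
        for category, score in category_scores.items()
        if score is None or score >= score_threshold
    }


scored = parse_categories("Violence, Harmful", [math.log(0.97), math.log(0.35)])
unscored = parse_categories("Violence, Harmful")
filtered = build_risk_results(scored, score_threshold=0.5)
passthrough = build_risk_results(unscored, score_threshold=0.9)
```

Because the dict keeps insertion order, iterating `scored` yields the categories in the order the model emitted them, and the same dict can be passed along unfiltered for observability.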

enhance: add per-category score to MultiRiskGraniteGuardianTool via logprobs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Bump version to 0.1.1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Sync uv.lock with v0.1.1

github-actions · 2026-04-14T18:25:51Z

❌ Tests failed (exit code: 1)

📊 Test Results

Passed: 579
Failed: 2
Skipped: 39
Warnings: 184
Coverage: 76%

Branch: develop
PR: #420
Commit: 5fc883c

📋 Full coverage report and logs are available in the workflow run.

github-actions · 2026-04-14T18:30:44Z

❌ Tests failed (exit code: 1)

📊 Test Results

Passed: 579
Failed: 2
Skipped: 39
Warnings: 182
Coverage: 76%

Branch: develop
PR: #420
Commit: 0c3c93f

📋 Full coverage report and logs are available in the workflow run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add fallback for __version__ when running from source

github-actions · 2026-04-14T18:47:45Z

❌ Tests failed (exit code: 1)

📊 Test Results

Passed: 579
Failed: 2
Skipped: 39
Warnings: 184
Coverage: 76%

Branch: develop
PR: #420
Commit: d70508e

📋 Full coverage report and logs are available in the workflow run.

NISH1001 and others added 5 commits April 13, 2026 16:45

Merge pull request #418 from NASA-IMPACT/feature/logprobs-granite

01bf23a

enhance: add per-category score to MultiRiskGraniteGuardianTool via logprobs

Bump version to 0.1.1, use importlib.metadata for single version source

7fc0863

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge pull request #419 from NASA-IMPACT/release/v0.1.1

984b428

Bump version to 0.1.1

NISH1001 temporarily deployed to integration April 14, 2026 18:18 — with GitHub Actions Inactive

NISH1001 requested review from rohit-sahoo and sanzog03 April 14, 2026 18:18

NISH1001 and others added 2 commits April 14, 2026 13:21

Sync uv.lock with v0.1.1

af09300

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge pull request #421 from NASA-IMPACT/chore/sync-uv-lock

f32bce8

Sync uv.lock with v0.1.1

NISH1001 temporarily deployed to integration April 14, 2026 18:21 — with GitHub Actions Inactive

sanzog03 approved these changes Apr 14, 2026

sanzog03 reviewed Apr 14, 2026

Comment thread akd/__init__.py Outdated

sanzog03 reviewed Apr 14, 2026

Comment thread akd/guardrails/providers/granite_guardian.py

NISH1001 and others added 2 commits April 14, 2026 13:38

Add fallback for __version__ when running from source

213255a

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge pull request #422 from NASA-IMPACT/fix/version-fallback

0986578

Add fallback for __version__ when running from source

NISH1001 temporarily deployed to integration April 14, 2026 18:39 — with GitHub Actions Inactive

NISH1001 merged commit 079c3a1 into main Apr 14, 2026
1 check passed

NISH1001 merged 9 commits into main from develop

2 participants