Merged
16 commits
71a59c7
feat(core): implement context-aware VSM resolution + ReDoS canary (ZR…
PythonWoods-Dev Apr 4, 2026
05ae6ac
feat(security): integrate Shield with SentinelReporter and Exit Code …
PythonWoods-Dev Apr 4, 2026
975fdc8
tests(security): ShieldReportingIntegrity suite + mutmut infrastructu…
PythonWoods-Dev Apr 4, 2026
120ce60
docs(security): bilingual parity, CHANGELOG a4 final, mutation gate d…
PythonWoods-Dev Apr 4, 2026
76b935f
release: bump version 0.5.0a3 → 0.5.0a4
PythonWoods-Dev Apr 4, 2026
7f3d672
docs(internal): add registry for architectural gaps and tech debt
PythonWoods-Dev Apr 4, 2026
f4f2736
feat(cli,docs): per-command ok_message and Sentinel homepage panels
PythonWoods-Dev Apr 7, 2026
f132c44
docs: tighten sentinel demo density and align summary geometry
PythonWoods-Dev Apr 7, 2026
38be6f1
fix(scanner,cli): resolve ZRT-005 Bootstrap Paradox
PythonWoods-Dev Apr 8, 2026
f4134fe
docs(adr,i18n): add ADR 003, PSR section in CONTRIBUTING.it.md, nav t…
PythonWoods-Dev Apr 8, 2026
1c82d2d
CONTRIBUTING.md / CONTRIBUTING.it.md:
PythonWoods-Dev Apr 8, 2026
3c0b1c1
docs: checks.md Shield+References, arch_gaps ZRT-005, architecture.md…
PythonWoods-Dev Apr 8, 2026
9f437b5
feat(engine): Blood Sentinel exit 3, graph integrity Θ(V+E), --show-i…
PythonWoods-Dev Apr 8, 2026
023c4cf
test: TestShowInfoFilter, Shield hex-payload, CLI --show-info coverage
PythonWoods-Dev Apr 8, 2026
25e0dde
examples: Shield comment block in all configs, safety_demonstration.md
PythonWoods-Dev Apr 8, 2026
e28dcab
chore(release): v0.5.0a4 CHANGELOG and pre-release audit package
PythonWoods-Dev Apr 8, 2026
3 changes: 3 additions & 0 deletions .gitignore
@@ -24,6 +24,7 @@
.claude/
.agent/
.deepcode/
.redteam/

# ────────────────────────────────────────────────────────────────────────────
# Python
@@ -81,6 +82,8 @@ coverage-*.json
.tox/
.nox/
mutmut*
!mutmut_pytest.ini
# ↑ Keep mutmut_pytest.ini tracked: isolated pytest config for the mutation session.
.mutmut-cache/
mutants/

133 changes: 133 additions & 0 deletions CHANGELOG.it.md
@@ -0,0 +1,133 @@
<!-- SPDX-FileCopyrightText: 2026 PythonWoods <dev@pythonwoods.dev> -->
<!-- SPDX-License-Identifier: Apache-2.0 -->
<!-- markdownlint-disable MD024 -->
# Registro delle modifiche

Tutte le modifiche rilevanti a Zenzic sono documentate qui.
Il formato segue [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
Le versioni seguono il [Semantic Versioning](https://semver.org/).

---

## [Non rilasciato]

## [0.5.0a4] — 2026-04-08 — Il Sentinel Indurito: Sicurezza & Integrità

> **Rilascio Alpha 4.** Quattro vulnerabilità confermate chiuse (ZRT-001–004), tre
> nuovi pilastri di hardening aggiunti (Sentinella di Sangue, Integrità del Grafo,
> Scudo Esadecimale), e piena parità documentale bilingue raggiunta. In attesa di
> revisione manuale prima della promozione a Release Candidate.
>
> Branch: `fix/sentinel-hardening-v0.5.0a4`

### Aggiunto

- **Integrità del grafo — rilevamento link circolari.** Zenzic ora pre-calcola
un registro dei cicli (Fase 1.5) tramite ricerca depth-first iterativa (Θ(V+E))
sul grafo dei link interni risolti. Ogni link il cui target appartiene a un ciclo
emette un finding `CIRCULAR_LINK` con severità `info`. I link di navigazione
reciproca (A ↔ B) sono una struttura valida della documentazione; il finding è
puramente informativo — non influisce mai sugli exit code in modalità normale o
`--strict`. O(1) per query in Phase 2. Le Ghost Route (URL canonici generati da
plugin senza file sorgente fisico) sono correttamente escluse dal grafo dei cicli.

- **`INTERNAL_GLOSSARY.toml`** — registro bilingue EN↔IT dei termini tecnici
(15 voci) per un vocabolario coerente tra documentazione inglese e italiana. Copre
i concetti principali: Porto Sicuro, Rotta Fantasma, Mappa del Sito Virtuale,
Motore a Due Passaggi, Scudo, Sentinella di Sangue e altri. Mantenuto da S-0.
Tutti i termini con `stable = true` richiedono un ADR prima della rinomina.

- **Parità documentale bilingue.** `docs/checks.md` e `docs/it/checks.md` aggiornati
con le sezioni Sentinella di Sangue, Link Circolari e Scudo Esadecimale.
`CHANGELOG.it.md` creato. Piena parità EN↔IT applicata per il Protocollo di
Parità Bilingue.

### ⚠️ Sicurezza

- **Sentinella di Sangue — classificazione degli attraversamenti di percorso (Exit Code 3).**
`check links` e `check all` ora classificano i finding di path-traversal per
intenzione. Un href che esce da `docs/` e si risolve in una directory di sistema
del SO (`/etc/`, `/root/`, `/var/`, `/proc/`, `/sys/`, `/usr/`) viene classificato
come `PATH_TRAVERSAL_SUSPICIOUS` con severità `security_incident` e attiva
l'**Exit Code 3** — un nuovo exit code dedicato riservato alle sonde del sistema
host. L'Exit 3 ha priorità sull'Exit 2 (violazione credenziali) e non viene mai
soppresso da `--exit-zero`. Gli attraversamenti fuori confine ordinari (es.
`../../repo-adiacente/`) restano `PATH_TRAVERSAL` con severità `error` (Exit Code 1).

- **Scudo Esadecimale — rilevamento di payload hex-encoded.**
Un nuovo pattern built-in dello Shield, `hex-encoded-payload`, rileva sequenze di
tre o più escape hex `\xNN` consecutive (`(?:\\x[0-9a-fA-F]{2}){3,}`). La soglia
`{3,}` evita falsi positivi sulle singole escape hex comuni nella documentazione
delle regex. I finding escono con codice 2 (Shield, non sopprimibile) e si
applicano a tutti i flussi di contenuto inclusi i blocchi di codice delimitati.

- **[ZRT-001] Shield Blind Spot — Bypass YAML Frontmatter (CRITICO).**
`_skip_frontmatter()` veniva usato come sorgente di righe dello Shield,
scartando silenziosamente ogni riga nel blocco YAML `---` del file prima che
il motore regex girasse. Qualsiasi coppia chiave-valore (`aws_key: AKIA…`,
`github_token: ghp_…`) era invisibile allo Shield.
**Fix:** Il flusso Shield ora usa `enumerate(fh, start=1)` grezzo — ogni byte
del file viene scansionato. Il flusso contenuto usa ancora `_iter_content_lines()`
con salto del frontmatter per evitare falsi positivi da valori di metadati.
Architettura **Dual-Stream**.

- **[ZRT-002] ReDoS + Deadlock ProcessPoolExecutor (ALTO).**
Un pattern `[[custom_rules]]` come `^(a+)+$` superava il controllo
`_assert_pickleable()` e veniva distribuito ai worker process senza timeout.
**Due difese aggiunte:**
— *Canary (prevenzione):* `_assert_regex_canary()` stress-testa ogni pattern
`CustomRule` sotto un watchdog `signal.SIGALRM` di 100 ms. I pattern ReDoS
sollevano `PluginContractError` prima della prima scansione.
— *Timeout (contenimento):* `ProcessPoolExecutor.map()` sostituito con
`submit()` + `future.result(timeout=30)`.

- **[ZRT-003] Bypass Shield Split-Token — Offuscamento Tabelle Markdown (MEDIO).**
Il separatore `|` delle tabelle Markdown spezzava i token segreti su più celle.
**Fix:** Le righe di tabella vengono de-pipe prima della scansione Shield.

- **[ZRT-004] Injection Path Traversal nei Link Reference (BASSO).**
Link reference con href malevoli potevano sfuggire alla sandbox `docs/`.
**Fix:** La validazione PATH_TRAVERSAL applicata ai link reference come ai link
inline.

## [0.5.0a3] — 2026-03-28 — Il Sentinel: Plugin, Regole Adattive, Hooks Pre-commit

> Branch: `feat/sentinel-v0.5.0a3`

### Aggiunto

- **Sistema Plugin** — `[[custom_rules]]` in `zenzic.toml` per regole regex
personalizzate. `PluginContractError` per la validazione contratto a boot.
- **Regex Canary** — watchdog SIGALRM 100 ms per backtracking catastrofico.
- **Hooks Pre-commit** — configurazione ufficiale per pipeline CI.
- **UI Sentinel** — palette colori, reporter a griglia, output Sentinel rinnovato.

## [0.5.0a1] — 2026-03-15 — Il Sentinel: Motore Adattivo delle Regole

> Branch: `feat/sentinel-v0.5.0a1`

### Aggiunto

- **AdaptiveRuleEngine** — motore di analisi estensibile con Phase 3.
- **Hybrid Adaptive Engine** — integrazione MkDocs + motore adattivo.
- **Pannelli Sentinel** — output strutturato per tutti i controlli.

## [0.4.0] — 2026-03-01 — Il Grande Disaccoppiamento

> Branch: `feat/engine-decoupling`

### Aggiunto

- **Factory entry-point dinamica** — `--engine` CLI flag; protocollo
`has_engine_config`.
- **InMemoryPathResolver** — resolver agnostico rispetto al motore.
- **Tower of Babel Guard** — fallback i18n per ancora mancante nella locale.

## [0.3.0] — 2026-02-15 — Two-Pass Pipeline

### Aggiunto

- **Two-Pass Engine** — Phase 1 (I/O parallelo) + Phase 2 (validazione O(1)).
- **Virtual Site Map (VSM)** — proiezione logica del sito renderizzato.
- **Shield** — rilevamento segreti, Stream Dual, exit code 2.
- **Validazione anchor cross-lingua** — Tower of Babel Guard.
193 changes: 191 additions & 2 deletions CHANGELOG.md
@@ -11,7 +11,196 @@ Versions follow [Semantic Versioning](https://semver.org/).

## [Unreleased]

## [0.5.0a3] — 2026-04-03 — The Sentinel: Aesthetic Sprint, Parallel Anchors & Agnostic Target
## [0.5.0a4] — 2026-04-08 — The Hardened Sentinel: Security & Integrity

> **Alpha 4 Release.** Four confirmed vulnerabilities closed (ZRT-001–004), three
> new hardening pillars added (Blood Sentinel, Graph Integrity, Hex Shield), and
> full bilingual documentation parity achieved. Pending manual review before
> Release Candidate promotion.
>
> Branch: `fix/sentinel-hardening-v0.5.0a4`

### Added

- **Graph Integrity — circular link detection.** Zenzic now pre-computes a cycle
registry (Phase 1.5) via iterative depth-first search (Θ(V+E)) over the resolved
internal link graph. Any link whose target belongs to a cycle emits a `CIRCULAR_LINK`
finding at severity `info`. Mutual navigation links (A ↔ B) are valid documentation
structure and are expected; the finding is advisory only — it never affects exit
codes in normal or `--strict` mode. O(1) per-query in Phase 2. Ghost Routes
(plugin-generated canonical URLs without physical source files) are correctly
excluded from the cycle graph and cannot produce false positives.

- **`INTERNAL_GLOSSARY.toml`** — bilingual EN↔IT term registry (15 entries) for
consistent technical vocabulary across English and Italian documentation. Covers
core concepts: Safe Harbor, Ghost Route, Virtual Site Map, Two-Pass Engine, Shield,
Blood Sentinel, and more. Maintained by S-0. All terms marked `stable = true`
require an ADR before renaming.

- **Bilingual documentation parity.** `docs/checks.md` and `docs/it/checks.md`
updated with Blood Sentinel, Circular Links, and Hex Shield sections.
`CHANGELOG.it.md` created. Full English–Italian parity enforced per the
Bilingual Parity Protocol.
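
The cycle registry described in the Graph Integrity entry above could be built along the following lines. This is a minimal sketch, not Zenzic's actual code: it assumes the resolved link graph is a plain `dict` mapping each page to the list of pages it links to, and it uses an iterative form of Tarjan's strongly-connected-components algorithm as one possible Θ(V+E) depth-first approach (the changelog only says "iterative depth-first search"). The function name `build_cycle_registry` and the sample file names are hypothetical.

```python
from collections.abc import Mapping, Sequence


def build_cycle_registry(graph: Mapping[str, Sequence[str]]) -> frozenset[str]:
    """Return every node that lies on at least one cycle (hypothetical helper)."""
    index: dict[str, int] = {}      # DFS discovery order
    lowlink: dict[str, int] = {}    # lowest index reachable from the node
    on_stack: set[str] = set()
    scc_stack: list[str] = []
    in_cycle: set[str] = set()
    counter = 0

    for root in graph:
        if root in index:
            continue
        index[root] = lowlink[root] = counter
        counter += 1
        scc_stack.append(root)
        on_stack.add(root)
        # Explicit work stack replaces recursion: (node, iterator over successors).
        work = [(root, iter(graph.get(root, ())))]
        while work:
            node, successors = work[-1]
            descended = False
            for nxt in successors:
                if nxt not in index:
                    index[nxt] = lowlink[nxt] = counter
                    counter += 1
                    scc_stack.append(nxt)
                    on_stack.add(nxt)
                    work.append((nxt, iter(graph.get(nxt, ()))))
                    descended = True
                    break
                if nxt in on_stack:
                    lowlink[node] = min(lowlink[node], index[nxt])
            if descended:
                continue
            work.pop()
            if work:
                parent = work[-1][0]
                lowlink[parent] = min(lowlink[parent], lowlink[node])
            if lowlink[node] == index[node]:
                # node is the root of a strongly connected component.
                component = []
                while True:
                    member = scc_stack.pop()
                    on_stack.discard(member)
                    component.append(member)
                    if member == node:
                        break
                # A component is cyclic if it has 2+ nodes or a self-link.
                if len(component) > 1 or node in graph.get(node, ()):
                    in_cycle.update(component)
    return frozenset(in_cycle)


# Hypothetical link graph: a.md <-> b.md is a mutual link, d.md links to itself.
links = {
    "a.md": ["b.md"],
    "b.md": ["a.md", "c.md"],
    "c.md": [],
    "d.md": ["d.md"],
    "e.md": ["a.md"],
}
registry = build_cycle_registry(links)
```

Because the result is a `frozenset`, a later validation pass can test `target in registry` in constant time per link, which matches the O(1) per-query behaviour the entry describes.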

### ⚠️ Security

- **Blood Sentinel — system-path traversal classification (Exit Code 3).**
`check links` and `check all` now classify path-traversal findings by intent.
An href that escapes `docs/` and resolves to an OS system directory (`/etc/`,
`/root/`, `/var/`, `/proc/`, `/sys/`, `/usr/`) is classified as
`PATH_TRAVERSAL_SUSPICIOUS` with severity `security_incident` and triggers
**Exit Code 3** — a new, dedicated exit code reserved for host-system probes.
Exit 3 takes priority over Exit 2 (credential breach) and is never suppressed
by `--exit-zero`. Plain out-of-bounds traversals (e.g. `../../sibling-repo/`)
remain `PATH_TRAVERSAL` at severity `error` (Exit Code 1).

- **Hex Shield — hex-encoded payload detection.**
A new built-in Shield pattern `hex-encoded-payload` detects runs of three or
more consecutive `\xNN` hex escape sequences (`(?:\\x[0-9a-fA-F]{2}){3,}`).
The `{3,}` threshold avoids false positives on single hex escapes common in
regex documentation. Findings exit with code 2 (Shield, non-suppressible)
and apply to all content streams including fenced code blocks.

- **[ZRT-001] Shield Blind Spot — YAML Frontmatter Bypass (CRITICAL).**
`_skip_frontmatter()` was used as the Shield's line source, silently
discarding every line in a file's YAML `---` block before the regex
engine ran. Any key-value pair (`aws_key: AKIA…`, `github_token: ghp_…`)
was invisible to the Shield and would have exited `zenzic check all` with
code `0`.
**Fix:** The Shield stream now uses a raw `enumerate(fh, start=1)` —
every byte of the file is scanned. The content stream (ref-def harvesting)
still uses `_iter_content_lines()` with frontmatter skipping to avoid
false-positive link findings from metadata values. This is the
**Dual-Stream** architecture described in the remediation directives.
*Exploit PoC confirmed via live script: 0 findings before fix, correct
detection of AWS / OpenAI / Stripe / GitHub tokens after fix.*

- **[ZRT-002] ReDoS + ProcessPoolExecutor Deadlock (HIGH).**
A `[[custom_rules]]` pattern like `^(a+)+$` passed the eager
`_assert_pickleable()` check (pickle is blind to regex complexity) and
was distributed to worker processes. The `ProcessPoolExecutor` had no
timeout: any worker hitting a ReDoS-vulnerable pattern on a long input
line hung permanently, blocking the entire CI pipeline.
**Two defences added:**
— *Canary (prevention):* `_assert_regex_canary()` stress-tests every
`CustomRule` pattern against three canary strings (`"a"*30+"b"`, etc.)
under a `signal.SIGALRM` watchdog of 100 ms at `AdaptiveRuleEngine`
construction time. ReDoS patterns raise `PluginContractError` before the
first file is scanned. (Linux/macOS only; silently skipped on Windows.)
— *Timeout (containment):* `ProcessPoolExecutor.map()` replaced with
`submit()` + `future.result(timeout=30)`. A timed-out worker produces a
`Z009: ANALYSIS_TIMEOUT` `RuleFinding` instead of hanging the scan.
The new `_make_timeout_report()` and `_make_error_report()` helpers
ensure clean error surfacing in the standard findings UI.
*Exploit PoC confirmed: `^(a+)+$` on `"a"*30+"b"` timed out in 5 s;
both defences independently prevent scan lock-up.*

- **[ZRT-003] Split-Token Shield Bypass — Markdown Table Obfuscation (MEDIUM).**
The Shield's `scan_line_for_secrets()` ran each raw line through the
regex patterns once. A secret fragmented across backtick spans and a
string concatenation operator (`` `AKIA` + `1234567890ABCDEF` ``) inside
a Markdown table cell was never reconstructed, so the 20-character
contiguous `AKIA[0-9A-Z]{16}` pattern never matched.
**Fix:** New `_normalize_line_for_shield()` pre-processor in `shield.py`
unwraps backtick spans, removes concatenation operators, and collapses
table pipes before scanning. Both the raw line and the normalised form are
scanned; a `seen` set prevents duplicate findings when both forms match.
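
The normalise-then-scan behaviour from the ZRT-003 entry, together with the Hex Shield pattern, can be illustrated with a short sketch. It is an illustration under stated assumptions rather than the code in `shield.py`: the two regular expressions are the ones quoted in this changelog, but the function names `normalize_for_shield` and `scan_line`, the finding tuple shape, and the choice to replace table pipes with a single space are all hypothetical.

```python
import re

# Patterns quoted in the changelog; the dict layout is an assumption.
PATTERNS = {
    "aws-access-key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "hex-encoded-payload": re.compile(r"(?:\\x[0-9a-fA-F]{2}){3,}"),
}


def normalize_for_shield(line: str) -> str:
    """Undo simple obfuscation before scanning (hypothetical helper)."""
    text = line.replace("`", "")            # unwrap backtick spans
    text = re.sub(r"\s*\+\s*", "", text)    # remove concatenation operators
    text = re.sub(r"\s*\|\s*", " ", text)   # collapse table pipes (assumed: to a space)
    return text.strip()


def scan_line(line: str, line_no: int) -> list[tuple[int, str, str]]:
    """Scan the raw and normalised forms, emitting each match only once."""
    seen: set[tuple[str, str]] = set()
    findings: list[tuple[int, str, str]] = []
    for form in (line, normalize_for_shield(line)):
        for name, pattern in PATTERNS.items():
            for match in pattern.finditer(form):
                key = (name, match.group(0))
                if key not in seen:
                    seen.add(key)
                    findings.append((line_no, name, match.group(0)))
    return findings


# The split-token example from the entry above, placed in a table row.
split_row = "| key | `AKIA` + `1234567890ABCDEF` |"
split_findings = scan_line(split_row, 1)

# A contiguous key matches in both forms but is reported once.
plain_findings = scan_line("AKIA1234567890ABCDEF", 2)

# Three consecutive hex escapes trip the Hex Shield; a single escape does not.
hex_findings = scan_line(r"payload = \x41\x42\x43", 3)
single_hex_findings = scan_line(r"matches \x41 only", 4)
```

Scanning both forms keeps detection of secrets that are already contiguous in the raw line, while the `seen` set stops the same match from being reported twice.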

### Changed

- **[ZRT-004] Context-Aware VSM Resolution — `VSMBrokenLinkRule` (MEDIUM).**
`_to_canonical_url()` was a `@staticmethod` without access to the source
file's directory. Relative hrefs containing `..` segments (e.g.
`../../c/target.md` from `docs/a/b/page.md`) were resolved as if they
originated from the docs root, producing false negatives: broken relative
links in nested files were silently passed.
**Fix:** New `ResolutionContext` dataclass (`docs_root: Path`,
`source_file: Path`) added to `rules.py`. `BaseRule.check_vsm()` and
`AdaptiveRuleEngine.run_vsm()` accept `context: ResolutionContext | None`
(default `None` — fully backwards-compatible). `_to_canonical_url()` is
now an instance method that resolves `..` segments via `os.path.normpath`
relative to `context.source_file.parent` when context is provided, then
re-maps to a docs-relative posix path before the clean-URL transformation.
Paths that escape `docs_root` return `None` (Shield boundary respected).

- **[GA-1] Telemetry / Executor Worker Count Synchronisation.**
`ProcessPoolExecutor(max_workers=workers)` used the raw `workers` sentinel
(may be `None`) while the telemetry reported `actual_workers` (always an
integer). Both now use `actual_workers`, eliminating the divergence.

- **Stream Multiplexing** (`scanner.py`). `ReferenceScanner.harvest()`
now explicitly documents its two-stream design: **Shield stream** (all
lines, raw `enumerate`) and **Content stream** (`_iter_content_lines`,
frontmatter/fence filtered). Comments updated to make the architectural
intent visible to future contributors.

- **[Z-SEC-002] Secure Breach Reporting Pipeline (Commit 2).**
Four structural changes harden the path from secret detection to CI output:

— *Breach Panel (`reporter.py`):* findings with `severity="security_breach"`
render as a dedicated high-contrast panel (red on white) positioned before
all other findings. Surgical caret underlines (`^^^^`) are positioned using
the `col_start` and `match_text` fields added to `SecurityFinding`.

— *Surgical Secret Masking — `_obfuscate_secret()`:* raw secret material is
never passed to Rich or CI log streams. The function partially redacts
credentials (first 4 + last 4 chars; full redaction for strings ≤ 8 chars)
and is the **sole authorised path** for rendering secret values in output.

— *Bridge Function — `_map_shield_to_finding()` (`scanner.py`):* a single
pure function is the only authorised conversion point between the Shield
detection layer and `SentinelReporter`. Extracted as a standalone function
so that mutation testing can target it directly and unambiguously.

— *Post-Render Exit 2 (`cli.py`):* the security hard-stop is now applied
**after** `reporter.render()`, guaranteeing the full breach panel is
visible in CI logs before the process exits with code 2.
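
The redaction rule stated for `_obfuscate_secret()` in the Z-SEC-002 entry (keep the first 4 and last 4 characters, fully redact anything of 8 characters or fewer) can be sketched as below. The function name `obfuscate_secret_sketch` and the `*` mask character are assumptions; the changelog does not say how the hidden portion is rendered.

```python
def obfuscate_secret_sketch(secret: str, mask: str = "*") -> str:
    """Partially redact a credential for display (hypothetical re-implementation)."""
    if len(secret) <= 8:
        # Short strings would leak too much if any characters were kept.
        return mask * len(secret)
    hidden = len(secret) - 8
    return secret[:4] + mask * hidden + secret[-4:]


masked_long = obfuscate_secret_sketch("AKIA1234567890ABCDEF")
masked_short = obfuscate_secret_sketch("12345678")
```

Routing every rendered secret through one function like this makes the guarantee easy to test: the raw value should never appear in the returned string.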

### Testing

- **`tests/test_redteam_remediation.py`** — 25 new tests organised in four
classes, one per ZRT finding:
- `TestShieldFrontmatterCoverage` (4 tests) — verifies Shield catches
AWS, GitHub, and multi-pattern secrets inside YAML frontmatter; confirms
correct line-number reporting; guards against false positives on clean
metadata.
- `TestReDoSCanary` (6 tests) — verifies canary rejects classic `(a+)+`
and alternation-based `(a|aa)+` ReDoS patterns at engine construction;
confirms safe patterns pass; verifies non-`CustomRule` subclasses are
skipped.
- `TestShieldNormalizer` (8 tests) — verifies `_normalize_line_for_shield`
unwraps backtick spans, removes concat operators, collapses table pipes;
verifies `scan_line_for_secrets` catches split-token AWS key; confirms
deduplication prevents double-emit when raw and normalised both match.
- `TestVSMContextAwareResolution` (7 tests) — verifies multi-level `..`
resolution from nested dirs, single `..` from subdirs, absent-from-VSM
still emits Z001, path-traversal escape returns no false Z001, backwards
compatibility without context, `index.md` directory mapping, and
`run_vsm` context forwarding.
- **`tests/test_rules.py`** — `_BrokenVsmRule.check_vsm()` updated to
accept the new `context=None` parameter (API compatibility fix).
- **731 tests pass.** Zero regressions. `pytest --tb=short` — all green.

- **`TestShieldReportingIntegrity` — Mutation Gate (Commit 3, Z-TEST-003).**
Three mandatory tests serving as permanent Mutation Gate guards for the
security reporting pipeline:
- *The Invisible:* `_map_shield_to_finding()` must always emit
`severity="security_breach"` — a downgrade to `"warning"` is caught
immediately (`assert 'warning' == 'security_breach'`).
- *The Amnesiac:* `_obfuscate_secret()` must never return the raw secret
— removing the redaction logic is caught immediately
(`assert raw_key not in output`).
- *The Silencer:* `_map_shield_to_finding()` must never return `None` —
a bridge function that discards findings is caught immediately
(`assert result is not None`).

**Manual verification (The Sentinel's Trial):** all three mutants were
applied by hand and confirmed killed. `mutmut` v3 automatic reporting was
blocked by an editable-install interaction (see `mutmut_pytest.ini`); manual
verification accepted per Architecture Lead authorisation (Z-TEST-003).
**28 tests in `test_redteam_remediation.py`, all green.**
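
The canary behaviour that `TestReDoSCanary` verifies can be approximated with the sketch below. It is an assumption-laden illustration, not the `_assert_regex_canary()` implementation: `PluginContractError` is redefined here as a stand-in, only the one canary string quoted in this changelog is used, and the function name `assert_regex_canary_sketch` is hypothetical. It relies on `signal.setitimer`, so it must run in the main thread on a platform that provides `SIGALRM`.

```python
import re
import signal


class PluginContractError(Exception):
    """Stand-in for the error type named in the changelog."""


# Only this canary string is given in the source; the real list has three.
CANARY_INPUTS = ("a" * 30 + "b",)


def assert_regex_canary_sketch(pattern: str, budget_seconds: float = 0.1) -> None:
    """Reject patterns that exceed the time budget on a canary input."""
    if not hasattr(signal, "SIGALRM"):
        return  # No SIGALRM (e.g. Windows): skip the check, as the changelog notes.
    compiled = re.compile(pattern)

    def _on_alarm(signum, frame):
        raise TimeoutError

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    try:
        for sample in CANARY_INPUTS:
            signal.setitimer(signal.ITIMER_REAL, budget_seconds)
            try:
                compiled.search(sample)
            except TimeoutError:
                raise PluginContractError(
                    f"pattern {pattern!r} exceeded {budget_seconds}s on a canary input"
                ) from None
            finally:
                signal.setitimer(signal.ITIMER_REAL, 0)  # always disarm the timer
    finally:
        signal.signal(signal.SIGALRM, previous)  # restore the earlier handler


def is_rejected(pattern: str) -> bool:
    """Small helper for the usage below: True if the canary rejects the pattern."""
    try:
        assert_regex_canary_sketch(pattern)
    except PluginContractError:
        return True
    return False
```

On a platform with `SIGALRM`, `is_rejected(r"^(a+)+$")` is expected to return `True`, while a linear pattern such as `r"^a+$"` passes. Restoring the previous handler in the outer `finally` keeps the check from interfering with other users of `SIGALRM` in the same process.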

## [0.5.0a4] — 2026-04-03 — The Sentinel: Aesthetic Sprint, Parallel Anchors & Agnostic Target

> **Sprint 13 + 14 + 15.** Three tracks delivered in one tag.
> Track A — Performance & SDK: deterministic two-phase anchor validation, `zenzic.rules` public
@@ -1397,7 +1586,7 @@ It has been superseded by the 0.5.x stabilization cycle.
<!-- ─── Reference link definitions ──────────────────────────────────────────── -->

[Unreleased]: https://github.com/PythonWoods/zenzic/compare/v0.5.0a3...HEAD
[0.5.0a3]: https://github.com/PythonWoods/zenzic/compare/v0.5.0a2...v0.5.0a3
[0.5.0a4]: https://github.com/PythonWoods/zenzic/compare/v0.5.0a2...v0.5.0a3
[0.5.0a2]: https://github.com/PythonWoods/zenzic/compare/v0.5.0a1...v0.5.0a2
[0.5.0a1]: https://github.com/PythonWoods/zenzic/compare/v0.4.0-rc5...v0.5.0a1
[0.4.0-rc5]: https://github.com/PythonWoods/zenzic/compare/v0.4.0-rc4...v0.4.0-rc5
2 changes: 1 addition & 1 deletion CITATION.cff
@@ -15,7 +15,7 @@ abstract: >
scanner (the Shield). Built on pure functional principles in Python 3.11+, it operates
source-first — no build framework required — and integrates with any Markdown-based
documentation system via a plugin adapter protocol.
version: 0.5.0a3
version: 0.5.0a4
date-released: 2026-04-03
url: "https://zenzic.pythonwoods.dev/"
repository-code: "https://github.com/PythonWoods/zenzic"