feat(fread): image and PDF reading via fread-media engine by lliWcWill · Pull Request #38 · lliWcWill/fsuite

lliWcWill · 2026-04-29T08:52:34Z

fread media reading: from binary skip to document-aware agent I/O

This PR turns fread from a text-only reader that skipped binary files into a first-class document and media reader for agents. The original seam was small and blunt: fread detected binary content and stepped aside. This branch replaces that limitation with a layered media path that can inspect images, extract PDF text, optionally render PDF pages, return MCP-native image blocks, and record best-effort document memories without making the read path fragile.

It also became a full hardening episode for fsuite itself. The first version added the capability. The review rounds turned it into something sharper: byte-accurate budgets, UTF-8-safe truncation, deterministic failure modes, source/install parity, MCP schema correctness, CI-proof executable bits, and a much better docs/site surface for how agents should use the suite.

Current state

PR: feat(fread): image and PDF reading via fread-media engine #38
Branch: codex/fread-media-pdf-image
Base: master
Head: c4da4735e61512c84e577c84edbdd6342931541b
GitHub PR metadata: 34 changed files, +6,562 / -443
Local diff stat: 34 files changed, 6,890 insertions, 443 deletions
Commit train: 42 commits from initial engine build through review hardening, CodeRabbit autofix fallout, and final CI recovery
Review threads inspected/triaged: 62 total, 0 unresolved
Latest CI: run 25142119629 success on c4da4735e61512c84e577c84edbdd6342931541b
Local full runner: bash tests/run_all_tests.sh passed 22/22 suites

Executive summary

fread can now read:

Images: PNG, JPEG, GIF, WEBP
PDFs as text by default
PDFs as metadata with --meta-only
PDFs as rendered page images with --render
Mixed batches containing both text and media
Media through CLI and MCP without dumping raw base64 into normal pretty output

The feature is intentionally conservative:

PDF default is text extraction, because rendering pages is far more expensive in model context.
PDF rendering is opt-in and page-capped.
Images are resized/compressed to fit token budget unless the caller explicitly requests no truncation.
Media reads have structured statuses, errors, and diagnostics so downstream callers can tell the difference between read, skipped, budget-skipped, and error states.
Memory ingest is best-effort and never allowed to break a read.

Why this matters

Before this PR, agents had to leave fsuite to inspect screenshots, diagrams, PDFs, invoices, papers, or scanned-ish documents. That broke the workflow contract in three ways:

The tool suite stopped being the primary reconnaissance surface.
Telemetry lost the actual read operation because the agent had to switch tools.
MCP consumers either got no content or had to handle base64-shaped noise outside the intended fread experience.

After this PR, media is part of the same fread flow as source code:

fread docs/spec.pdf
fread screenshot.png
fread paper.pdf --pages 2:4
fread diagram.pdf --render --pages 1:2
fsearch -o paths '*.pdf' docs | fread --from-stdin --stdin-format=paths -o json

This keeps the doctrine intact: fsuite remains the agent's fast, structured, auditable read layer.

What shipped

1. `fread-media.py`: media engine

New Python engine for image/PDF handling:

Magic-byte detection before extension trust.
Image metadata extraction: format, dimensions, byte size, MIME type.
Image budget enforcement with resize/compression loop.
PDF backend abstraction with PyMuPDF primary and Poppler fallback.
PDF text extraction by default.
PDF render mode for page images.
PDF metadata mode.
Page-range parsing and validation.
Encrypted PDF detection.
Backend-specific error envelopes.
Poppler subprocess timeouts.
JSON-first output for fread and MCP consumers.

The engine deliberately separates document reading from shell formatting. Python owns binary/media behavior; bash owns CLI orchestration.

2. `fread`: CLI dispatch and contract preservation

fread now dispatches image/PDF files before the old binary skip path:

Media files go to fread-media.py.
Non-media binaries still follow the existing binary skip behavior.
Pretty output shows useful summaries and extracted PDF text, not raw base64.
JSON output preserves structured media payloads.
Multi-file reads support mixed text/media batches.
Media statuses are reflected in files[] so -o json and -o paths consumers can make honest decisions.

New/expanded flags include:

--render
--pages
--meta-only
--no-resize
--max-pages
--max-tokens
--token-budget
--no-ingest
--no-truncate behavior for media paths

The final review rounds hardened the budget semantics heavily: byte caps count bytes, not characters; UTF-8 truncation preserves character boundaries; PDF defaults preserve the engine's 25k token guard; zero budget correctly means unlimited; and pretty-mode media output respects remaining global line/byte budgets across batches.

3. MCP integration

mcp/index.js now understands fread media output and emits MCP-native content blocks:

Image reads return type: "image" blocks plus text metadata.
PDF text reads return text content.
PDF render reads return page image blocks plus metadata.
Mixed text/media batches merge both instead of dropping one side.
Full mode disables preview caps consistently.
Failed fread requests no longer get rerun just to format errors.
Media flags are exposed in the MCP schema.

Important contract decision: the MCP path uses the MCP content shape (mimeType, top-level data), not Anthropic Messages API's nested source / media_type shape. That distinction matters because copying the wrong shape would silently make the feature unusable for MCP clients.

4. Memory ingest: ShieldCortex bridge, best-effort by design

New mcp/memory-ingest.js helper gives media reads a path into ShieldCortex without making reads depend on memory health.

Behavior:

Reads an ingest payload from stdin.
Locates ShieldCortex configuration from environment or known agent config files.
Uses hash-based recall before write to avoid flooding memory with duplicate document reads.
Times out instead of hanging the caller.
Logs failures to ~/.cache/fsuite/memory-ingest.log.
Never blocks or fails the fread operation.
Supports opt-out through FSUITE_MEMORY_INGEST=0 or --no-ingest.

This is the first concrete instance of the larger ShieldCortex/fsuite integration direction: fsuite reads become searchable memory events, while still behaving as reliable CLI operations when memory is unavailable.

5. Tests and fixtures

This PR adds a large media-focused test surface:

Image pretty output.
Image JSON payload shape.
Image token budget forwarding.
Image no-truncate / zero-budget behavior.
Image meta-only behavior.
PDF text extraction.
PDF JSON payload shape.
PDF metadata mode.
PDF render mode.
PDF render page caps.
Invalid page ranges.
Encrypted PDF errors.
Poppler fallback behavior.
Poppler pdfinfo failures.
Poppler extraction failures.
Media error statuses in files[].
Budget-skipped media behavior.
Pretty output global budget enforcement.
UTF-8 byte and character-boundary behavior.
Engine probe behavior.
Install hints/self-check behavior.
Memory-ingest helper invalid input, missing config, config parsing, and timeout handling.
MCP structured parity for media flags, media blocks, mixed batches, full mode, and error envelopes.

Fixtures added:

tests/fixtures/media/sample.png
tests/fixtures/media/sample.jpg
tests/fixtures/media/sample.pdf
tests/fixtures/media/big.pdf

6. Packaging and install parity

This PR wires the new media pieces into install and package flows:

install.sh installs fread-media.py.
install.sh installs mcp/memory-ingest.js and its node_modules when available.
Debian packaging includes the new engine/helper paths.
CI verifies installed tools remain executable.
Optional dependency messaging calls out PyMuPDF for faster PDF rendering while keeping Poppler fallback usable.

The final CodeRabbit autofix accidentally stripped executable bits from six source scripts. That was caught by CI, fixed in c4da473, and then verified by the full local suite and GitHub Actions. The lesson is simple and durable: source checkout executability is part of the CLI contract, not decorative metadata.

7. Documentation and site maturity

This PR also expands the public/docs surface around the feature and around the suite's identity:

README.md documents image/PDF reads.
AGENTS.md documents fbash MCP unlock and CLI pipe patterns.
site/src/content/docs/commands/fread.md documents media flags and behavior.
site/src/content/docs/commands/fbash.md references the new chain behavior.
site/src/content/docs/architecture/chains.md clarifies command-chain thinking.
site/src/content/docs/getting-started/mental-model.md improves the suite mental model.
site/src/content/docs/reference/changelog.md records the work.
The docs site gets CAVE-theme work, page title/action affordances, Markdown routes, and Copy/Open-in-LLM actions.
GitHub Pages base-path handling was hardened for Markdown action URLs.
ARIA menu semantics were corrected so the UI does not advertise a menu contract it does not implement.

This matters because the feature is not only a backend capability. Agents and humans both need to understand the operating model: when to read text, when to render, when budgets apply, and why media reads stay structured.

Review hardening timeline

The first pass made the feature real. The review cycles made it trustworthy.

Foundation review fixes

Early review tightened the media engine before it became the public contract:

Replaced extension trust with magic-byte detection.
Added strict page range validation.
Added encrypted PDF detection.
Added subprocess timeouts for Poppler.
Hardened backend selection and error reporting.
Kept render mode opt-in rather than default.
Preserved text-first PDF behavior for context efficiency.

Codex review rounds

Codex feedback repeatedly found edge cases that matter for agent consumers:

Budget-skipped media files must not be reported as successfully read.
MCP should not rerun failed fread calls just to format an error.
MCP schema needed all media flags surfaced.
Media JSON could not be treated as single-file-only; batch reads need arrays.
Mixed text/media reads must merge both outputs.
Media-only MCP outputs must preserve diagnostics.
UTF-8 truncation must not slice multibyte characters.
JSON byte budgets must count UTF-8 bytes, not shell string characters.
Default PDF token caps must remain in force unless explicitly disabled.
--token-budget 0 and --no-truncate must mean unlimited for images too.
Pretty-mode media output must respect remaining global line/byte budgets in multi-file reads.
Missing setsid must not leak temp payloads.
Memory ingest helper exit status must be logged correctly.
Poppler pdfinfo and extraction failures must surface as errors, not empty successful documents.
Debian/install dependencies must include the new helper/engine reality.
Paths mode and status semantics needed to remain honest.

CodeRabbit review and autofix round

CodeRabbit caught additional items in tests, docs, accessibility, and parsing:

Test helper crashes should not be swallowed.
INT/TERM should not be converted into success.
Type annotation mismatches should not drift from runtime behavior.
Stale proof counts in docs needed updating.
Tracebacks should not leak in JSON by default.
MCP diagnostics should not disappear for media-only responses.
TOML args parsing needed quoted-argument awareness.
ARIA roles needed to match actual keyboard behavior.
Test runner counters needed scope/indentation cleanup.

Then CodeRabbit's own autofix introduced a source-mode regression by stripping executable bits from CLI/test/MCP entrypoints. That regression was isolated, fixed, and verified. It is now part of the story of this PR because it made the source/install parity contract explicit.

Commit ledger

Commit	Area	What it did
`4672113`	Phase 1	Added the first `fread-media.py` image/PDF engine.
`9c1525c`	Phase 1 hardening	Applied foundation engine review fixes.
`0c06c81`	Phase 2	Wired `fread` dispatch for image/PDF files.
`59ed869`	Phase 2 follow-up	Tightened media dispatch behavior.
`57d9033`	Phase 3	Added MCP image/text block output for fread media.
`f374bfc`	Phase 4	Added memory-ingest helper for media reads.
`3918f92`	Phase 5	Added media and memory-ingest test coverage.
`17909ec`	Phase 6	Documented media reading and packaging.
`65e9185`	Review hardening	Addressed Codex P1/P2 review findings.
`c4cb6e4`	Review hardening	Addressed CodeRabbit and Codex feedback on PR #38.
`9c9ec34`	Review hardening	Fixed Codex P1 truncation and encrypted-PDF CI skip.
`b897c3c`	Review hardening	Fixed token-budget zero and multi-file media payload behavior.
`96ac97b`	Review hardening	Addressed Codex P2 follow-ups round 3.
`eca3030`	Tests	Isolated PATH stub for memory-ingest no-config test.
`f80a0ab`	Install	Shipped memory-ingest helper with MCP dependencies.
`6f44495`	Install	Fixed duplicate brace in share-file setup.
`965f783`	Review hardening	Fixed paths mode, byte caps, and Debian dependencies.
`4dc67da`	Budgeting	Avoided double-enforcing media chunk token budgets.
`dd02d20`	Review hardening	Fixed budget-skip status, MCP rerun behavior, schema flags, and help/docs probes.
`29297ce`	Style	Normalized fread MCP schema indentation.
`4b18006`	Install	Skipped flat memory-ingest install when node_modules are absent.
`0e16bc3`	Error status	Recorded `files[].status = media_error` on engine failure.
`da50f34`	Docs	Added README media-reading docs.
`b700299`	Docs/site	Refreshed guide command, internal cheatsheet, and design brief.
`2346902`	Budgets/batches	Enforced media output caps and iterated media payload arrays.
`b2b72b7`	Agent docs	Documented fbash MCP unlock for chain pipes.
`cc92e22`	Memory ingest	Preserved helper failure status in logs.
`e2edb7a`	Site	Added CAVE theme and per-page Copy/Open-in-LLM dropdown.
`1d62cbc`	Site/docs	Added chain diagram and native-vs-fsuite documentation block.
`d57da1d`	Budgets	Honored token budgets for images.
`7b43ffa`	Site/budgets	Preserved Markdown base paths and image no-truncate behavior.
`b8270b4`	Budgets/config	Addressed PR media budgets and config paths.
`509cc3c`	MCP batches	Preserved text chunks in mixed media fread.
`dceb006`	MCP caps	Aligned media MCP caps with full mode.
`52d25b9`	PDF errors	Surfaced Poppler `pdfinfo` failures.
`7db0361`	Truncation/errors	Hardened media truncation and Poppler extraction errors.
`a1bde5d`	Budgets	Counted media JSON budgets by UTF-8 bytes.
`301c3cc`	PDF budget	Preserved default PDF token budget.
`1920b2b`	Budgets	Honored zero media budgets.
`0bbc615`	Pretty budgets	Hardened media pretty-output budgets.
`5547fa2`	CodeRabbit autofix	Applied CodeRabbit autofixes, then exposed source executable-bit regression.
`c4da473`	CI recovery	Restored executable bits and returned CI to green.

File-level map

Core runtime

fread: media dispatch, flags, output formatting, budgets, statuses, memory ingest launch.
fread-media.py: image/PDF engine, backend abstraction, parsing, rendering, metadata, error envelopes.
mcp/index.js: schema flags, media block conversion, mixed output merging, caps, error handling.
mcp/memory-ingest.js: best-effort ShieldCortex write path.

Install/package

install.sh: installs media engine/helper and reports optional media dependencies.
debian/rules: includes the new media runtime pieces.
.github/workflows/ci.yml: supports the expanded suite/dependency needs.

Tests

tests/test_fread.sh: main media/budget/error regression suite.
tests/test_memory_ingest.sh: memory helper behavior.
mcp/structured-parity.test.mjs: MCP schema/content parity.
tests/run_all_tests.sh: final aggregate runner.
tests/fixtures/media/*: deterministic media fixtures.

Important design decisions

PDF default is text, not render

Rendering PDF pages is expensive. Text extraction gives agents the information they usually need at a fraction of the token cost. Render mode exists for layouts, diagrams, signatures, scanned-ish pages, and visual inspection, but it is explicit:

fread file.pdf --render --pages 1:2

Media budgets are part of the contract

Budget behavior is not just a UI nicety. MCP callers, fbash, and downstream agents depend on it. Review hardening made this explicit:

Use byte counts where the contract says bytes.
Preserve UTF-8 character boundaries.
Respect remaining global budgets across multi-file reads.
Preserve default PDF token caps.
Treat zero budget/no-truncate as unlimited only when explicitly requested.

Memory ingest is best effort

A read tool must stay reliable even if memory is offline, misconfigured, slow, or missing dependencies. Memory ingest therefore runs detached, logs failures, supports opt-out, and does not change the success/failure of the read itself.

MCP output must be native MCP, not copied from another API

This branch avoided the easy wrong move: copying Anthropic Messages API image blocks into MCP. The MCP adapter emits MCP content blocks with the correct field names and shape.

Source checkout behavior matters as much as installed behavior

CI caught this hard: if scripts are executable in installed form but not executable in source form, the development and MCP paths break. The final commit exists solely to restore that source-level contract.

Verification

Local verification

git diff --check passed.
bash tests/test_fread.sh passed 96/96 during final repair validation.
bash tests/test_fread_symbols.sh passed 12/12 during final repair validation.
bash tests/test_coderabbit_fixes.sh passed 14/14 during final repair validation.
bash tests/test_memory_ingest.sh passed 5/5 after local MCP deps install.
cd mcp && npm test passed 49/49 after local MCP deps install.
bash tests/run_all_tests.sh passed 22/22 suites.

GitHub Actions

Latest run:

CI run 25142119629
Commit: c4da4735e61512c84e577c84edbdd6342931541b
Status: success
Job: test-suite

Review triage

62 total PR review threads inspected/triaged.
0 unresolved threads remaining after final replies/resolution state settled.
CodeRabbit status is green.
Codex/CodeRabbit feedback produced concrete hardening rather than churn: each wave narrowed real contracts around budgets, errors, MCP content, install parity, and test truthfulness.

What we learned

1. Media reading is not just base64 transport

The real feature is not "can return an image." The real feature is controlled, inspectable, budget-aware document reading that behaves consistently across CLI, JSON, MCP, and tests.

2. PDF rendering must be explicit

Default-rendering PDFs would burn context and make fread feel flashy but irresponsible. Text-first preserves fsuite's token-budget-first identity.

3. Failure shape is part of product quality

Silent empty PDFs, swallowed Poppler errors, tracebacks in JSON, and success statuses without emitted content all produce bad agent behavior. This PR hardens those states into explicit errors/statuses.

4. Mixed batches are a first-class workflow

Agents do not always read one file at a time. A useful fread must handle batches with code, docs, images, and PDFs together without silently dropping part of the response.

5. Memory should enrich reads, not hold them hostage

ShieldCortex ingest is valuable, but the read operation is the primary contract. If memory fails, the read still succeeds.

6. Review bots are useful, but verification owns the result

The review bots found real issues. They also introduced a real regression. The mature workflow is not blind acceptance or defensive rejection; it is verify, fix, prove, and keep the commit trail honest.

Out of scope

This PR intentionally does not add:

OCR.
PDF table extraction.
PyMuPDF4LLM extraction.
Pandas/Excel/CSV/TSV analytics.
Notebook .ipynb reading.
A new ShieldCortex document category.
Full semantic memory search over read documents.

Those are good next chapters, but this PR establishes the media/document foundation first.

How to try it

# Image metadata/preview path
fread tests/fixtures/media/sample.png

# Image JSON/MCP-friendly payload
fread tests/fixtures/media/sample.png -o json

# PDF text default
fread tests/fixtures/media/sample.pdf

# PDF metadata
fread tests/fixtures/media/sample.pdf --meta-only

# PDF render, bounded by page range
fread tests/fixtures/media/sample.pdf --render --pages 1:2 -o json

# Mixed batch through stdin
printf '%s\n' README.md tests/fixtures/media/sample.png tests/fixtures/media/sample.pdf | \
  fread --from-stdin --stdin-format=paths -o json

# Disable memory ingest for sensitive media
FSUITE_MEMORY_INGEST=0 fread tests/fixtures/media/sample.pdf

Closing note

This started as "let fread read pictures and PDFs." It ended as a broader maturity pass on the suite's agent contract: media is structured, budgets are real, errors are explicit, MCP blocks are native, installs are complete, source scripts remain executable, docs explain the workflow, and CI proves it.

That is the shape of a tool growing up: not just more capability, but fewer lies at the boundaries.

Implements fread-media.py with probe/image/pdf subcommands. Supports PyMuPDF (primary) and Poppler (fallback) PDF backends, Pillow image backend with stdlib fallback, magic-byte detection, iterative resize to token budget, ingest_payload assembly, and structured JSON output matching the downstream spec for Phase 2 (bash) and Phase 3 (MCP). Adds test fixtures: sample.png, sample.jpg, sample.pdf (5 pages with extractable text), big.pdf (12 pages for render-refusal testing).

Critical: explicit returns after emit(); subprocess timeouts. Important: PDF encryption detection; truthful Poppler metadata; strict page-range parser; dimension-based token estimates. Spec gaps: invalid backend rejection; no-subcommand JSON error; file perms 755. Minor: dedup resize loop; Poppler stderr capture; symlink-aware paths. Reviewers: claude-opus (spec), feature-dev:code-reviewer (quality).

Adds media_dispatch() before is_binary_file skip at fread:727. Pretty mode shows metadata only; JSON mode passes through engine JSON. New flags: --render, --pages, --meta-only, --no-resize, --max-pages, --token-budget, --max-tokens, --no-ingest. Post-call memory ingest spawned detached via setsid; helper at mcp/memory-ingest.js (created in Phase 4) is best-effort.

- Pretty mode token estimates now read from engine JSON (was using byte-proxy formula, off by ~8x for images). - Error message field name was 'message' but engine emits 'error'. - New top-level "media_payload" key in JSON output: verbatim engine JSON for Phase 3 MCP integration to read without parsing chunk content strings.

Adds buildMediaContent() helper that maps fread's media_payload field to MCP 2025-11-25 content blocks. The fread MCP handler now short-circuits to image content for image/pdf-pages and to text for pdf-text/meta — bypassing cli()'s text-only renderer. Snake_case mime_type translated to camelCase mimeType per spec. Text-file path unchanged.

Adds mcp/memory-ingest.js — a Node MCP stdio client that ingests engine ingest_payload JSON via ShieldCortex remember(). Best-effort with 3s timeout, recall-by-hash dedupe, and graceful failure (exits non-zero with stderr diagnostic; never blocks the parent fread call). ShieldCortex command resolution: FSUITE_SHIELDCORTEX_CMD env → which shieldcortex-mcp → ~/.claude.json mcpServers → ~/.codex/config.toml. Phase 2 already wired the fread bash spawn; this fills the helper seam.

…se 5) Adds 16 new test functions to tests/test_fread.sh covering image pretty/JSON/meta/no-resize, PDF text/meta/render/page-cap/invalid- pages/encrypted/backend-fallback/budget-truncation paths. Adds tests/test_memory_ingest.sh for the Phase 4 helper covering empty stdin, malformed JSON, no-config, and timeout scenarios. Skips Poppler-dependent tests gracefully when pdftotext is absent.

- site/src/content/docs/commands/fread.md: media flags, output modes, token-cost table, backend selection, memory MCP integration, examples - site/src/content/docs/reference/changelog.md: Unreleased entry - debian/rules: install fread-media.py and mcp/memory-ingest.js to /usr/share/fsuite/ - install.sh: same paths + PyMuPDF detect-and-suggest hint - fread: add fallback resolver for memory-ingest.js installed paths

coderabbitai · 2026-04-29T08:54:12Z

Walkthrough

Adds first-class image/PDF handling to fread via a new Python engine (fread-media.py), a Node.js async memory ingester (mcp/memory-ingest.js), packaging/install updates, MCP adapter changes to render media blocks, new CLI flags for media control, and extensive tests and docs. Media reads optionally trigger best-effort ShieldCortex ingestion.

Changes

Cohort / File(s)	Summary
Media Engine `fread-media.py`	New CLI engine: format probing, image/pdf backends (Pillow/stdlib, PyMuPDF/Poppler), `probe`/`image`/`pdf` commands, token-budgeting, resizing/rendering, explicit error codes, emits JSON with `media_payload` and optional `ingest_payload`.
Fread CLI Integration `fread`	Dispatches binaries to `fread-media.py` (probe → run), constructs engine args from new media flags, integrates budgeted pretty/JSON outputs (`media_payload`/`media_payloads`), records media errors/budget skips, and optionally launches detached memory ingest.
Memory Ingest Helper `mcp/memory-ingest.js`	New Node.js ingester: validates payload, resolves ShieldCortex MCP command from env/which/config files, optional dedupe via `recall` (SHA256/hash tag), `remember` on miss, enforces 3s timeout, logs to stderr.
MCP Adapter `mcp/index.js`	Parses `media_payload`/`media_payloads`, builds MCP `content` blocks (images, pdf pages/meta, budgeted text), short-circuits text-only CLI rendering for media results, adds `formatExecError` helper and exposes `__test__` helpers.
Packaging & Install `debian/rules`, `install.sh`	Installs `fread-media.py` and copies MCP helper (`mcp/memory-ingest.js`) into shared paths; marks executables; install summary mentions optional PyMuPDF hint. Note: `memory-ingest.js` not shipped in `.deb` per comments.
Tests & CI `tests/test_fread.sh`, `tests/test_memory_ingest.sh`, `tests/run_all_tests.sh`, `.github/workflows/ci.yml`	Adds comprehensive media test suite (images/PDFs, budgets, truncation, backends, ingest/no-ingest), new memory-ingest tests, CI apt/pip installs for poppler/Pillow/pymupdf (non-fatal).
Docs & Site `site/src/content/docs/commands/fread.md`, `site/src/content/docs/reference/changelog.md`, `README.md`	Documents media flags/behaviors, JSON `media_payload` shape, ingest semantics and opt-outs, backend selection and error codes, and changelog/cheatsheet updates.
Site Components & Styling `site/src/components/PageActions.astro`, `site/src/components/PageTitle.astro`, `site/src/styles/custom.css`, ...	Adds page actions component, title override, new theme CSS and various site content updates and a markdown raw endpoint (`site/src/pages/[...slug].md.ts`).

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant fread as "fread CLI"
    participant engine as "fread-media.py"
    participant ingest as "memory-ingest.js"
    participant mcp as "ShieldCortex MCP"

    User->>fread: run fread <file> + media flags
    fread->>engine: probe file
    engine-->>fread: probe result (type, backend, ingest_payload?)
    alt media detected
        fread->>engine: run engine subcommand (image/pdf) with flags
        engine-->>fread: JSON result (media_payload, ingest_payload, errors)
        fread->>User: pretty or JSON output (includes media_payload)
        alt ingest enabled
            fread->>ingest: spawn detached with ingest_payload
            ingest->>ingest: validate payload, resolve MCP command
            ingest->>mcp: recall (dedupe)
            mcp-->>ingest: recall result
            alt not duplicate
                ingest->>mcp: remember (write memory)
                mcp-->>ingest: ack
            end
        end
    else engine missing/error
        fread->>User: warn/skip and fall back to prior binary-skip behavior
    end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Episode 3: The Mirror — fbash, fedit validation, fread upgrades #25: Modifies fread and mcp/index.js integration—directly overlaps with media payload handling and MCP rendering changes in this PR.
Episode 2: The Missing Read — fread completes the fsuite reconnaissance loop #8: Earlier fread addition and Debian packaging changes—related to installing fread artifacts (this PR adds the media engine).
[codex] Episode 3: The Archive #16: Packaging/MCP-related changes—touches Debian install rules and MCP helper placement similar to this PR.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 44.79% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	Title directly summarizes the main change: adding image and PDF reading to fread via a new media engine. Concise and specific.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The PR description directly addresses the changeset: media reading for images/PDFs via fread-media.py engine, MCP integration, memory ingest helper, test coverage, and documentation. It explains design decisions, tradeoffs (text-first PDFs, best-effort memory), and verification results.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch codex/fread-media-pdf-image

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 17909ec36f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

P1 (fread:904): inline Python heredoc had `print(json.dumps(ip))` at column 0 instead of indented under `if ip is not None:`. SyntaxError on every media read — payload file was never populated, so detached memory ingest silently short-circuited. Now correctly indented. Verified end-to-end: ShieldCortex log line lands after fread call. P2 (fread-media.py:685): stdlib fallback condition `if estimate > args.max_tokens and not args.no_resize` was inverted relative to the documented `--no-resize` behavior ("refuse if over token budget"). Stdlib has no resize capability, so over-budget images should be rejected regardless of the flag. Drop the `and not args.no_resize` guard. Both findings from chatgpt-codex-connector on PR #38.

coderabbitai

Actionable comments posted: 12

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@fread`:
- Line 774: The command invocation may pass an unintended empty-string argument
when engine_args is empty; update the call that sets engine_json (the invocation
of python3 "$ENGINE_MEDIA" "$subcmd" "$path" with "${engine_args[@]}" and
capturing engine_exit) to use the safe array-expansion pattern so no spurious
empty argument is passed (use the ${engine_args[@]+"${engine_args[@]}"} pattern
in place of "${engine_args[@]}"), preserving the existing error redirection and
the engine_exit capture.
- Around line 899-905: The inline Python passed to python3 -c relies on
flush-left indentation and is fragile if surrounding Bash code is reindented;
add a clear comment next to the printf | python3 -c invocation warning that the
Python must remain left-aligned, and better yet replace the inline -c snippet
with a heredoc or a quoted heredoc (so the Python body can be safely indented)
while keeping the same behavior of reading engine_json and writing
ingest_payload to payload_file; reference the existing usage of engine_json,
payload_file and the python3 invocation (the small Python snippet that loads
stdin and prints d.get('ingest_payload')) so reviewers know exactly where to
make the change.
- Around line 776-779: The branch that detects empty engine output currently
calls add_warning with "$path" and returns 0; change that return to 1 so that an
empty engine_json triggers the binary-skip fallback and emits the explicit
"binary file skipped" behavior; do the same for the later error branch (the
other return 0 case) so both the engine_json empty check and the error path
return 1 instead of 0, referencing the engine_json variable and the add_warning
call to locate the spots to update.

In `@fread-media.py`:
- Around line 169-204: All methods that call fitz.open (page_count,
extract_text, render_page, get_page_dims) must ensure the Document is always
closed even if an exception occurs: wrap fitz.open(path) usage in a try/finally
or use a context manager so doc.close() is guaranteed in the finally block;
preserve existing encryption checks from extract_text/render_page (check
doc.is_encrypted and doc.needs_pass and raise) and add the same check to
get_page_dims; also validate page index bounds in render_page and get_page_dims
similar to extract_text to avoid index errors before accessing doc[page].

In `@mcp/memory-ingest.js`:
- Around line 154-160: The timeout branch of the Promise.race can leave doIngest
and its StdioClientTransport child running and orphaned; modify doIngest to
accept an AbortSignal (via AbortController) and ensure the caller aborts the
controller when the timeout fires, or explicitly call/await a transport shutdown
method (e.g., StdioClientTransport.kill()/close()/destroy()) before calling
process.exit; update the Promise.race usage to create an AbortController, pass
controller.signal into doIngest, call controller.abort() in the timeout handler
(and/or call client.transport.kill()) so the child process is cleanly terminated
before exiting.
- Around line 110-118: The arguments object is including source: payload.source
even when payload.source is undefined; change the construction in the call that
builds the arguments object (the block around arguments: {...}) to only include
the source key when payload.source is defined (e.g., conditionally spread in {
source: payload.source } only if payload.source !== undefined or truthy) so the
MCP tool never receives an explicit undefined source value.
- Around line 163-165: The ternary in the log call is redundant (msg ===
"timeout" ? "error" : "error"); update the log level selection so it conveys the
intended severity: replace the ternary with the correct conditional behavior
(for example, use "warn" when msg === "timeout" and "error" otherwise) or simply
use a single literal "error" if no special-case is required; modify the call to
log(...) accordingly where msg is defined and used in memory-ingest.js
(referencing the variables msg and the log(...) invocation).

In `@site/src/content/docs/commands/fread.md`:
- Around line 166-167: Add a single blank line between the preceding fenced code
block and the "## See also" heading in fread.md so the heading is separated from
the code block; locate the "## See also" heading in the file (near the end of
the fread documentation) and insert one empty line immediately above it.

In `@site/src/content/docs/reference/changelog.md`:
- Around line 18-19: Update the changelog entry to mention both flags and
clarify their difference: add `--token-budget` to the listed new flags and a
short parenthetical or sentence explaining that `--token-budget` is fread's
global budget (also forwarded to the engine) while `--max-tokens` is a
media-specific alias that applies per-media; reference the existing flag names
`--max-tokens` and `--token-budget` so readers can see both and understand which
is global vs media-scoped.

In `@tests/run_all_tests.sh`:
- Around line 268-277: The block that invokes run_test_suite for the
memory-ingest helper (inside main()) is misindented compared to other suite
blocks; align its lines starting from the echo through the TOTAL_TESTS update to
use the same 2-space indentation as other suites so the echo, if/then/else,
TOTAL_PASSED/TOTAL_FAILED increments and failed_suites+=("memory-ingest") lines
match the file's established style and keep the
run_test_suite/TOTAL_PASSED/TOTAL_FAILED/failed_suites/TOTAL_TESTS logic
unchanged.

In `@tests/test_fread.sh`:
- Around line 1502-1509: The test uses a default list for pages_returned which
masks missing-field failures; change the extraction so pr =
f.get('pages_returned') (no default) and update the ok condition to explicitly
verify pr is a list and len(pr) < 5 and that f.get('truncated') is True, failing
when pages_returned is None or not a list; reference the variables mp, f, pr and
the keys 'pages_returned' and 'truncated' to locate where to update the
assertion logic.

In `@tests/test_memory_ingest.sh`:
- Around line 31-35: The run_test function currently accepts a test name as $1
then immediately shifts it away and never logs it; update run_test (and
reference TESTS_RUN and the run_test function) to capture the test name before
shift and print or log it (e.g., echo "Running test: $TEST_NAME") so each
invocation records the human-readable test name for debugging while preserving
the increment of TESTS_RUN and executing the remaining command arguments as
before.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3c762780-231b-47ea-8f5d-8d9ba4104ec5

📥 Commits

Reviewing files that changed from the base of the PR and between c745101 and 17909ec.

⛔ Files ignored due to path filters (4)

tests/fixtures/media/big.pdf is excluded by !**/*.pdf
tests/fixtures/media/sample.jpg is excluded by !**/*.jpg
tests/fixtures/media/sample.pdf is excluded by !**/*.pdf
tests/fixtures/media/sample.png is excluded by !**/*.png

📒 Files selected for processing (11)

debian/rules
fread
fread-media.py
install.sh
mcp/index.js
mcp/memory-ingest.js
site/src/content/docs/commands/fread.md
site/src/content/docs/reference/changelog.md
tests/run_all_tests.sh
tests/test_fread.sh
tests/test_memory_ingest.sh

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: test-suite

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2026-04-10T17:12:34.291Z

Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Use `fsuite` for the suite-level mental model, then reach for operational tools (fs, ftree, fls, fsearch, fcontent, fmap, fread, fedit, fwrite, fbash, fprobe, fcase, freplay, fmetrics) for filesystem reconnaissance, bounded reading, and surgical editing

Applied to files:

debian/rules
site/src/content/docs/reference/changelog.md
install.sh

🪛 markdownlint-cli2 (0.22.1)

site/src/content/docs/commands/fread.md

[warning] 166-166: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

🪛 Shellcheck (0.11.0)

tests/test_memory_ingest.sh

[info] 18-21: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 23-29: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 40-52: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 55-67: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 70-85: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 89-102: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

tests/test_fread.sh

[info] 1096-1116: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1119-1152: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1155-1180: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1183-1195: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1198-1216: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1219-1246: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1249-1270: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1273-1301: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1304-1330: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1333-1356: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1359-1419: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1422-1453: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1456-1472: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1475-1486: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1494-1516: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1519-1577: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

🔇 Additional comments (8)

debian/rules (1)

28-29: LGTM.

Packaging correctly mirrors install.sh: both fread-media.py and memory-ingest.js are installed to /usr/share/fsuite/ with executable permissions.

mcp/index.js (2)

1402-1434: LGTM — media content builder handles all payload types.

The buildMediaContent function correctly maps engine output to MCP content blocks. The snake_case to camelCase translation (mime_type → mimeType) is handled properly, and each payload type produces appropriate content block structures.

1608-1625: Media short-circuit correctly falls through on failure.

The try-catch ensures any engine error or missing media_payload gracefully falls back to the standard cli() text path. This is the right pattern for optional feature enablement.

install.sh (2)

274-277: LGTM.

Memory-ingest.js installation follows the same pattern as other shared files, with proper error handling via die if the file is missing.

744-748: LGTM — conditional hint is user-friendly.

The PyMuPDF hint only appears when the module isn't installed, avoiding noise for users who already have it.

tests/test_memory_ingest.sh (1)

89-102: Good timeout coverage.

Using FSUITE_SHIELDCORTEX_CMD="sleep 10" to trigger the internal 3s timeout is an effective way to test the timeout path without external dependencies.

fread (1)

913-916: Detached spawn pattern is solid.

Using setsid bash -c '...' with positional arguments correctly isolates the background process. The temp file cleanup (rm -f "$1") inside the subshell ensures no orphaned files. The </dev/null >/dev/null 2>&1 & properly detaches from the parent's file descriptors.

tests/test_fread.sh (1)

1090-1116: Tests look good.

The media test suite comprehensively covers the new engine dispatch behavior, including error paths for encryption, invalid backends, and token budget enforcement.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 65e9185e12

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

P1: mcp/index.js no longer double-runs fread for non-media files; fread-media.py wraps every fitz.open() in try/finally with encryption checks across all PyMuPDFBackend methods; fread media_dispatch returns 1 on empty engine output so binary skip path can fire as fallback. P2: mcp/memory-ingest.js TOML parser preserves the args array. Nits: test_fread.sh strict pages_returned check; memory-ingest.js spread-omit undefined source, kill orphan subprocess on timeout, warn instead of error log level for timeouts; markdown blank line in fread.md; clearer flag listing in changelog.md; consistent indentation in run_all_tests.sh; safe empty-array expansion in fread. CI: poppler-utils, python3-pil, pymupdf installed for media tests.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c4cb6e47fd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

P1 (fread-media.py:891): PDF text truncation appended the full page when remaining<=0, blowing past the configured token budget. Now drops the page entirely when no budget is left, so total tokens stay under the cap. CI encrypted-PDF test: SKIP cleanly when fitz is not importable instead of FAIL. Phase 5 spec called for skip-not-fail; was missed. Also harden ci.yml's pymupdf install via sudo --break-system-packages so system Python can find it (previous --user install put it on a path that the test's `python3 -c 'import fitz'` couldn't see).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9c9ec34b6a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…yload) P2 fread:767: --no-truncate sets TOKEN_BUDGET=0, but the engine forward guard `(( TOKEN_BUDGET > 0 ))` dropped that case to the engine default (25000), still truncating. Engine now treats token_budget=0 as unlimited (via sys.maxsize); fread forwards the value unconditionally. P2 fread:803: MEDIA_PAYLOAD_JSON was a single global; multi-file batches overwrote earlier files. Added MEDIA_PAYLOADS_JSON array, emitted as top-level "media_payloads" plural in render_json. Singular "media_payload" preserved for single-file Phase 3 MCP backward-compat.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b897c3cd37

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

P2 memory-ingest.js:74: TOML section keys with quoted names like [mcp_servers."shield.cortex"] retained the quotes when matched against nameRe, so quoted server names were silently skipped. Strip surrounding quotes before name match. P2 test_memory_ingest.sh:77: hardcoded PATH dropped Node on toolcache/nvm/asdf setups, causing exit 127 instead of exercising the no-config branch. Prepend the directory of the active node binary to PATH so the helper invocation always finds it.

Previous attempt prepended dirname(node) to PATH, but on machines where node and shieldcortex-mcp share a directory (common for ~/.local/bin), the stub still resolved shieldcortex-mcp and the test exercised the wrong branch. Switch to an mktemp dir with a node symlink only — works on toolcache/nvm/asdf without leaking other binaries.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eca30301e3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

memory-ingest.js imports @modelcontextprotocol/sdk at module load time. Previous install copied just the script flat to \${share_dir}/memory-ingest.js, leaving the SDK unresolvable — ERR_MODULE_NOT_FOUND on every memory ingest in production installs. install.sh now copies the entire mcp/ tree (script + package.json + node_modules) to \${share_dir}/mcp/ when node_modules exists. fread's helper resolver prepends <share>/mcp/memory-ingest.js to the candidate list so the colocated SDK loads correctly. Falls back to flat install with a warning when node_modules is missing, surfacing as 'helper not found' / 'helper exit code 1' in the ingest log rather than silently breaking. debian packaging unchanged for now — shipping node_modules in a .deb is too heavy. Debian users who need memory ingest should run 'npm ci --prefix mcp' under the shared dir or use install.sh.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6f44495184

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

P1 fread:811: media JSON chunk now goes through check_budget_preadd so --max-bytes/--max-lines apply uniformly. Previously media files emitted full base64 regardless of byte cap. P2 fread:800: -o paths mode now skips the media formatter entirely; render_paths emits the path alone. Previously the pretty summary line contaminated machine-readable paths output. P2 debian/rules:29: dropped the standalone memory-ingest.js install line. Shipping node_modules in a .deb is too heavy; install.sh handles the full mcp/ tree for users who want detached ingest. Debian users get a clean 'helper not found' log instead of ERR_MODULE_NOT_FOUND.

check_budget_preadd checks both MAX_BYTES/MAX_LINES AND TOKEN_BUDGET. For PDF text reads with --token-budget, the engine already truncated internally, so bash gating on token budget rejected the truncated result entirely. Switched to manual MAX_BYTES/MAX_LINES check; the engine owns token budget for PDFs.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4dc67dad6f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…s, help) - fread: budget-blocked media now records status=budget_skipped and returns before mark_file_read/_media_maybe_ingest, so over-budget reads no longer count as successful reads or trigger memory ingest. - mcp/index.js: factor cli() catch body into formatExecError() helper so the fread MCP handler can reuse it on engine error instead of re-spawning fread, cutting double-latency and duplicate telemetry on missing/invalid inputs. - mcp/index.js: expose --meta-only, --render, --pages, --no-resize, --max-pages, --max-tokens, --no-ingest in the fread MCP inputSchema and forward them to the CLI so MCP agents can drive PDF render / image meta / ingest opt-out paths. - fread: document media flags in usage(), --install-hints (Pillow / pymupdf / poppler-utils), and --self-check (optional PIL / fitz / pdftotext probe that does not flip dependency-ok). - tests: add test_fread.sh coverage for budget_skipped status + media-aware --self-check / --install-hints, and structured-parity coverage for the new fread schema fields and the formatExecError envelope shape (isError with non-empty content[text]). PR #38, Codex P2 round 2.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1d62cbc984

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d57da1d729

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7b43ffadea

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b8270b45c4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 509cc3c262

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dceb0069b5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 52d25b9489

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7db03612d9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a1bde5dcb7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 301c3cc36e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1920b2bfa9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

lliWcWill · 2026-04-29T23:55:43Z

@coderabbitai Full review

coderabbitai · 2026-04-29T23:55:49Z

✅ Actions performed

Full review triggered.

coderabbitai

Actionable comments posted: 11

♻️ Duplicate comments (2)

fread-media.py (2)

203-211: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing page index bounds check in get_page_dims.

Same issue as render_page — doc[page] is accessed without bounds validation. An invalid index will raise an IndexError.

Add bounds validation

     def get_page_dims(self, path: str, page: int) -> dict:
         doc = fitz.open(path)
         try:
             if doc.is_encrypted and doc.needs_pass:
                 raise RuntimeError("PDF is encrypted and requires a password")
+            if page < 0 or page >= doc.page_count:
+                raise RuntimeError(f"Page {page} out of range (0-{doc.page_count - 1})")
             r = doc[page].rect

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@fread-media.py` around lines 203 - 211, get_page_dims currently accesses
doc[page] without validating the page index (same issue as render_page); add a
bounds check against doc.page_count (or len(doc)) before accessing doc[page] and
raise a clear exception (e.g., IndexError or ValueError) when page is negative
or >= doc.page_count, ensuring the existing finally doc.close() still executes;
reference get_page_dims, doc.page_count (or len(doc)), and doc[page].rect when
making the change.

191-201: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing page index bounds check in render_page.

PyMuPDFBackend.render_page accesses doc[page] at line 196 without validating that page is within [0, doc.page_count). If an out-of-bounds index is passed, fitz will raise an IndexError that surfaces as a generic traceback rather than a clean error message.

Add bounds validation

     def render_page(self, path: str, page: int, dpi: int = 100):
         doc = fitz.open(path)
         try:
             if doc.is_encrypted and doc.needs_pass:
                 raise RuntimeError("PDF is encrypted and requires a password")
+            if page < 0 or page >= doc.page_count:
+                raise RuntimeError(f"Page {page} out of range (0-{doc.page_count - 1})")
             pix = doc[page].get_pixmap(dpi=dpi)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@fread-media.py` around lines 191 - 201, The render_page method currently
indexes doc[page] without validating bounds; add a check using doc.page_count to
ensure 0 <= page < doc.page_count before accessing doc[page], and if invalid
raise a clear exception (e.g., ValueError or IndexError) with a descriptive
message including the requested page and total pages; keep the existing finally:
doc.close() behavior so the document is always closed.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/internal/specs/2026-04-29-claude-design-site-brief.md`:
- Around line 125-136: The proof-point line "**~516 tests across 16 suites**" is
stale—replace it with the authoritative test count by reading the
source-of-truth (e.g., the test-summary/CI artifact or the repo's test-report)
and update the bullet to the current value (e.g., the new "~470 tests across 15
suites" if that's what the source shows); locate and edit the exact bullet
string "**~516 tests across 16 suites**" in the spec list and ensure the updated
count is used programmatically or documented so future briefs pull from that
source automatically.

In `@fread`:
- Around line 213-222: The current truncate_utf8_bytes implementation slices raw
bytes and decodes with "ignore", which can drop partial multibyte characters;
update truncate_utf8_bytes so Python reads sys.stdin.buffer.read(), and if
len(data) > limit find the largest prefix <= limit that successfully decodes as
UTF-8 (e.g., try decoding data[:limit] with strict error handling and on
UnicodeDecodeError decrement the end index until decode succeeds), then write
that decoded string to stdout; reference truncate_utf8_bytes, the max variable,
and the sys.stdin.buffer.read() usage to locate where to implement the
safe-prefix decoding loop.

In `@fread-media.py`:
- Around line 1034-1046: The global exception handler (the "except Exception as
exc:" block) currently always adds the full traceback to the JSON error output;
change it to include the "traceback" field only when a debug flag or env var is
set (e.g., use an argparse boolean --debug stored as args.debug or an
environment variable like FREAD_DEBUG via os.environ.get), otherwise omit or
redact the traceback value; update the handler to read that flag (e.g.,
debug_mode = args.debug or bool(os.environ.get("FREAD_DEBUG"))) and
conditionally add "traceback": _tb.format_exc() to the dict before json.dump,
leaving other fields ("type","error","code") intact and still writing the JSON
and exiting with sys.exit(1).
- Around line 332-369: The function render_page currently declares return type
-> bytes but actually returns a tuple (data, width, height); update the type
annotation to reflect the actual return type (e.g., -> Tuple[bytes, int, int])
and add the necessary typing import (from typing import Tuple) at module scope;
locate the render_page definition and adjust its signature and imports
accordingly, and run static checks to ensure callers expect the tuple form.

In `@mcp/index.js`:
- Around line 1535-1596: The MCP output currently drops fread diagnostics when
only media blocks remain because textOnlyFreadPayload returns null and
buildFreadMcpContent only returns media content; fix by preserving and copying
diagnostic/meta fields from the original parsed fread result into the final
returned object even when textPayload is null. In buildFreadMcpContent (and/or
adjust textOnlyFreadPayload) ensure the final return value includes
parsed.warnings, parsed.errors, parsed.next_hint and the parsed.files array (or
files with updated .status) or include slim/structuredContent created via
renderFreadTextResult/normalizeStructuredContent so that
warnings/errors/files[].status are propagated alongside the media content
instead of being dropped.

In `@mcp/memory-ingest.js`:
- Around line 91-98: The TOML args extraction in memory-ingest.js that uses
argsRegex and then am[1].split(',') (see cmdMatch / argsRegex / am / args)
breaks on commas inside quoted strings; replace this ad-hoc split with either a
proper TOML parser (e.g., parse the whole block with a toml library to read the
args array) or implement a quoted-string-aware tokenizer for am[1] that respects
escaped quotes and commas inside quotes, then build args from those tokens and
preserve existing trimming/unquoting logic.

In `@site/src/components/PageActions.astro`:
- Around line 78-125: The markup advertises an ARIA menu but lacks full
keyboard/menuitem behavior; remove the misleading roles instead: delete
role="menu" from the div.page-actions__menu and remove role="menuitem" from all
child buttons (e.g., the buttons with data-action="copy-md", "copy-plain",
"view-md", "open-claude", "open-chatgpt") and from the anchor
(href={ghSourceUrl}) so these remain plain buttons/links, or alternatively
implement the complete menu keyboard model if you prefer to keep ARIA; prefer
the former for a minimal fix.

In `@tests/run_all_tests.sh`:
- Line 277: Move the TOTAL_TESTS assignment into the main() function so it
executes after TOTAL_PASSED and TOTAL_FAILED are finalized; specifically, locate
the global line setting TOTAL_TESTS=$((TOTAL_PASSED + TOTAL_FAILED)) and
indent/relocate it inside the main() block (after all suites run and after
TOTAL_PASSED/TOTAL_FAILED are updated) so the calculation happens at runtime
rather than at parse time.

In `@tests/test_fread.sh`:
- Around line 1913-1915: The test harness copies only "${FREAD}" into the fake
sandbox, but the runtime fread script unconditionally sources _fsuite_common.sh
and needs that helper present; update the three copy sites that use cp
"${FREAD}" "${fake_dir}/fread" (the occurrences around the mkdir/cp blocks that
reference "${fake_dir}", "${fake_bin}", and "${payload_dir}") to also copy the
helper from the test scripts directory by adding cp
"${SCRIPT_DIR}/../_fsuite_common.sh" "${fake_dir}/_fsuite_common.sh" so the fake
sandbox contains both fread and _fsuite_common.sh (repeat the same change for
the other two similar cp occurrences).

In `@tests/test_memory_ingest.sh`:
- Around line 159-160: The current trap in main() uses "trap 'true' EXIT INT
TERM", which swallows INT/TERM; change it to preserve cancellation semantics by
only trapping EXIT (e.g., "trap 'true' EXIT") and ensure INT/TERM remain default
(either remove INT/TERM from the trap or explicitly reset them with "trap - INT
TERM"); update the trap invocation that references main() and the line "trap
'true' EXIT INT TERM" accordingly.
- Around line 31-37: The run_test helper currently swallows test failures via
`"${@}" || true`, so unexpected crashes never increment TESTS_FAILED; change
run_test (function run_test) to execute the test command directly, capture its
exit code ($?) after calling the test body, and if non-zero increment
TESTS_FAILED and report the failure (e.g., echo the exit code and test name) so
that unexpected exits are treated as failures instead of being ignored; ensure
TESTS_RUN is still incremented and existing pass()/fail() logic remains
compatible.

---

Duplicate comments:
In `@fread-media.py`:
- Around line 203-211: get_page_dims currently accesses doc[page] without
validating the page index (same issue as render_page); add a bounds check
against doc.page_count (or len(doc)) before accessing doc[page] and raise a
clear exception (e.g., IndexError or ValueError) when page is negative or >=
doc.page_count, ensuring the existing finally doc.close() still executes;
reference get_page_dims, doc.page_count (or len(doc)), and doc[page].rect when
making the change.
- Around line 191-201: The render_page method currently indexes doc[page]
without validating bounds; add a check using doc.page_count to ensure 0 <= page
< doc.page_count before accessing doc[page], and if invalid raise a clear
exception (e.g., ValueError or IndexError) with a descriptive message including
the requested page and total pages; keep the existing finally: doc.close()
behavior so the document is always closed.

🪄 Autofix (Beta)

✅ Autofix completed

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6d400cf4-74f0-466c-a95d-fea2040ded84

📥 Commits

Reviewing files that changed from the base of the PR and between c745101 and 0bbc615.

⛔ Files ignored due to path filters (5)

AGENTS.md is excluded by !AGENTS.md
tests/fixtures/media/big.pdf is excluded by !**/*.pdf
tests/fixtures/media/sample.jpg is excluded by !**/*.jpg
tests/fixtures/media/sample.pdf is excluded by !**/*.pdf
tests/fixtures/media/sample.png is excluded by !**/*.png

📒 Files selected for processing (26)

.github/workflows/ci.yml
README.md
debian/rules
docs/internal/readme-drafts/04-cheatsheet-changelog.md
docs/internal/specs/2026-04-29-claude-design-site-brief.md
fread
fread-media.py
fsuite
install.sh
mcp/index.js
mcp/memory-ingest.js
mcp/structured-parity.test.mjs
site/astro.config.mjs
site/src/components/PageActions.astro
site/src/components/PageTitle.astro
site/src/content/docs/architecture/chains.md
site/src/content/docs/commands/fread.md
site/src/content/docs/getting-started/mental-model.md
site/src/content/docs/index.mdx
site/src/content/docs/reference/changelog.md
site/src/pages/[...slug].md.ts
site/src/styles/custom.css
site/src/styles/custom_backup.css
tests/run_all_tests.sh
tests/test_fread.sh
tests/test_memory_ingest.sh

📜 Review details

🧰 Additional context used

📓 Path-based instructions (2)

docs/**

⚙️ CodeRabbit configuration file

docs/**: Check for: accurate CLI examples, no hardcoded personal paths or credentials,
and clear explanations of flags and defaults.

Files:

docs/internal/specs/2026-04-29-claude-design-site-brief.md
docs/internal/readme-drafts/04-cheatsheet-changelog.md

README.md

⚙️ CodeRabbit configuration file

README.md: Check for: accurate examples, correct filenames/paths, and no stale references.

Files:

README.md

🧠 Learnings (16)

📓 Common learnings

Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fsuite` as the primary suite-level mental model for filesystem operations

Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Prefer `-o json` for programmatic decisions in fsuite tool calls

Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Prefer `-o paths` when piping fsuite tool output into another tool

Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `-q` flag for existence checks and silent control flow in fsuite tools

Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fbash` to enable Unix pipe chains when combining two or more producer/consumer steps in fsuite

Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fmap` and `fread --symbol` before using broad `fcontent` searches

Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fcontent` for exact-text confirmation only after narrowing search scope, not as the first conceptual search

Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Open `fcase init` at the start of any non-trivial investigation and close with `fcase resolve`

Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Check `fcase find --status all --deep` before starting new work to discover past investigations

Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fread --symbol NAME` or `--lines START:END` instead of reading entire files

Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fedit --lines` when line numbers are available from fread for fastest editing with zero ambiguity

Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fedit --function_name` for symbol-scoped edits without requiring large unique context strings

Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Never edit code blind; always inspect context with `fread` before calling `fedit`

Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Run `fsuite` once at the beginning of work to establish the mental model

Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Run `ftree --snapshot` once to establish territory and avoid rediscovering the repository unless the target changes

Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fs` as the default search entry point and let it route to the appropriate fsuite tool

Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fcase` investigation continuity ledger to track findings, evidence, and handoff state across context compaction

Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Pipe fsuite tool output producers (`fsearch -o paths`, `fcontent -o paths`) into tool consumers for efficient chaining

📚 Learning: 2026-04-10T17:12:34.291Z

Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Use `fsuite` for the suite-level mental model, then reach for operational tools (fs, ftree, fls, fsearch, fcontent, fmap, fread, fedit, fwrite, fbash, fprobe, fcase, freplay, fmetrics) for filesystem reconnaissance, bounded reading, and surgical editing

Applied to files:

debian/rules
tests/run_all_tests.sh
site/src/components/PageActions.astro
fread-media.py
site/src/content/docs/getting-started/mental-model.md
mcp/memory-ingest.js
install.sh
site/src/content/docs/index.mdx
site/src/content/docs/commands/fread.md
site/src/content/docs/architecture/chains.md
tests/test_memory_ingest.sh
docs/internal/readme-drafts/04-cheatsheet-changelog.md
site/src/content/docs/reference/changelog.md
fsuite
fread
tests/test_fread.sh
mcp/index.js
README.md

📚 Learning: 2026-04-10T17:12:34.291Z

Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Run `fsuite` once at the start to establish the suite-level mental model; do not re-read it

Applied to files:

debian/rules
site/src/content/docs/getting-started/mental-model.md
site/src/content/docs/index.mdx
fsuite

📚 Learning: 2026-04-29T22:12:50.298Z

Learnt from: lliWcWill
Repo: lliWcWill/fsuite PR: 39
File: site/src/pages/[...slug].md.ts:18-23
Timestamp: 2026-04-29T22:12:50.298Z
Learning: In `site/src/pages/[...slug].md.ts` (fsuite repo), the `index.mdx` entry intentionally maps to slug `index` → URL `/fsuite/index.md`. PageActions is fixed (commit a5a25ca) to target that exact URL. Do NOT suggest normalizing the index slug to an empty string, as that would produce the non-standard `/fsuite/.md` URL.

Applied to files:

site/astro.config.mjs
site/src/pages/[...slug].md.ts
site/src/components/PageActions.astro
site/src/content/docs/index.mdx

📚 Learning: 2026-04-10T17:12:34.291Z

Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Use `fs` as the default search entry point and let it auto-route to the appropriate tool

Applied to files:

site/src/content/docs/getting-started/mental-model.md
site/src/content/docs/index.mdx
fsuite
mcp/index.js
README.md

📚 Learning: 2026-04-10T17:12:34.291Z

Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Prefer `fmap` + `fread --symbol` before broad `fcontent` searches

Applied to files:

site/src/content/docs/getting-started/mental-model.md
site/src/content/docs/index.mdx
fsuite
fread
mcp/index.js
README.md

📚 Learning: 2026-04-10T17:12:34.291Z

Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Run `ftree --snapshot` once to establish territory and map the codebase structure. Do not rediscover the repo unless the target changes

Applied to files:

site/src/content/docs/getting-started/mental-model.md
fsuite
README.md

📚 Learning: 2026-04-10T17:12:34.291Z

Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Never edit blindly. Always inspect context with `fread` before calling `fedit`

Applied to files:

site/src/content/docs/getting-started/mental-model.md
README.md

📚 Learning: 2026-04-10T17:12:34.291Z

Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Use `fedit --lines` when you have specific line numbers from `fread`. This provides the fastest edit mode with zero ambiguity

Applied to files:

site/src/content/docs/getting-started/mental-model.md
fread

📚 Learning: 2026-04-10T17:12:34.291Z

Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Use `fedit --function_name` for symbol-scoped edits without needing large unique context strings

Applied to files:

site/src/content/docs/getting-started/mental-model.md

📚 Learning: 2026-04-10T17:12:34.291Z

Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Open `fcase init` at the start of any non-trivial investigation and close with `fcase resolve`

Applied to files:

site/src/content/docs/getting-started/mental-model.md
site/src/content/docs/index.mdx
site/src/content/docs/architecture/chains.md
README.md

📚 Learning: 2026-04-10T17:12:34.291Z

Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Use `fread --symbol NAME` or `--lines START:END` for targeted reading. Do not read whole files without bounds

Applied to files:

site/src/content/docs/getting-started/mental-model.md
site/src/content/docs/index.mdx
fread
README.md

📚 Learning: 2026-04-10T17:12:34.291Z

Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Always check `fcase find --status all --deep` before starting new work to see if past investigations have already answered the question

Applied to files:

site/src/content/docs/getting-started/mental-model.md
README.md

📚 Learning: 2026-04-10T17:12:34.291Z

Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Prefer `-o paths` when piping output into another tool

Applied to files:

fread

📚 Learning: 2026-04-10T17:12:34.291Z

Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Prefer `-o json` for programmatic decisions and parsing

Applied to files:

fread

📚 Learning: 2026-04-10T17:12:34.291Z

Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Prefer `pretty` format only for human terminal output

Applied to files:

fread

🪛 LanguageTool

docs/internal/specs/2026-04-29-claude-design-site-brief.md

[style] ~144-~144: Consider using an alternative to strengthen your wording.
Context: ...ware." 3. AI agent tool authors who want to learn from fsuite's design decisions...

(WANT_KEEN)

[grammar] ~164-~164: Ensure spelling is correct
Context: ...hero diagram* of the sensor metaphor (fsuite tools as drones, target repo as terrain...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~164-~164: Ensure spelling is correct
Context: ...flexes"** mapping native tool habits to fsuite equivalents. This is the page that, if ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~183-~183: Ensure spelling is correct
Context: ... the new media reading. - Diagram for fcase lifecycle — shows the init → note → r...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[style] ~191-~191: This phrasing can be overused. Try elevating your writing with a more formal alternative.
Context: ...) is correct. Reorder within sections if you want, but the top-level shape stays. - Don't...

(IF_YOU_WANT)

[grammar] ~199-~199: Ensure spelling is correct
Context: ...r GitHub Pages serving. --- ## 8. The fsuite voice (so your copy sounds right) - **Direct...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~222-~222: Ensure spelling is correct
Context: ...ns"; Episode 3 is the latest milestone. These are the real story content. - `docs/interna...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[style] ~275-~275: Consider using a different verb to strengthen your wording.
Context: ...master or merge the PR yourself If you find a problem outside your scope (e.g., a t...

(FIND_ENCOUNTER)

site/src/content/docs/index.mdx

[style] ~42-~42: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...output. They do not rank their results. They do not know what a token costs. **fsui...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

site/src/content/docs/architecture/chains.md

[grammar] ~94-~94: Ensure spelling is correct
Context: ...-branch">fprobe patch ## fcase lifecycle The investigation ledger isn...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~108-~108: Ensure spelling is correct
Context: ...close + archive ## Replay chain Rerun a traced investigation ste...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.22.1)

docs/internal/specs/2026-04-29-claude-design-site-brief.md

[warning] 32-32: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 88-88: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

[warning] 96-96: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

[warning] 100-100: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

[warning] 105-105: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

[warning] 108-108: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

[warning] 111-111: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

[warning] 114-114: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

🪛 Shellcheck (0.11.0)

tests/test_memory_ingest.sh

[info] 18-21: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 23-29: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 42-54: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 57-69: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 72-98: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 100-138: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 142-155: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

tests/test_fread.sh

[info] 1096-1116: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1119-1152: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1154-1239: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1241-1325: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1327-1352: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1354-1391: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1393-1430: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1433-1458: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1461-1473: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1476-1494: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1497-1524: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1527-1548: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1551-1579: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1582-1608: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1611-1634: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1637-1657: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1660-1726: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1729-1760: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1763-1793: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1796-1838: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1841-1857: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1860-1871: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1873-1904: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1906-1984: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 1992-2014: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 2017-2036: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 2038-2092: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 2094-2148: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 2150-2208: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 2211-2238: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 2241-2249: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

[info] 2252-2260: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

🪛 Stylelint (17.9.0)

site/src/styles/custom.css

[error] 12-12: Expected "url('https://fonts.googleapis.com/css2?family=Geist:wght@300;400;500;600;700;800;900&family=JetBrains+Mono:wght@300;400;500;600;700&display=swap')" to be "'https://fonts.googleapis.com/css2?family=Geist:wght@300;400;500;600;700;800;900&family=JetBrains+Mono:wght@300;400;500;600;700&display=swap'" (import-notation)

(import-notation)

[error] 50-50: Expected "SFMono-Regular" to be "sfmono-regular" (value-keyword-case)

(value-keyword-case)

[error] 50-50: Expected "Menlo" to be "menlo" (value-keyword-case)

(value-keyword-case)

[error] 50-50: Expected "Consolas" to be "consolas" (value-keyword-case)

(value-keyword-case)

[error] 104-104: Expected "optimizeLegibility" to be "optimizelegibility" (value-keyword-case)

(value-keyword-case)

🔇 Additional comments (15)

fsuite (1)

23-24: LGTM!

Tool roster updated to 14, and the fread description correctly reflects the new image/PDF media engine dispatch. Help text is consistent and accurate.

Also applies to: 39-59

docs/internal/readme-drafts/04-cheatsheet-changelog.md (1)

381-427: LGTM!

Changelog accurately documents the v2.4.0 media reading features, new flags, status contract fixes, and test expansions. No hardcoded paths or credentials. Flag explanations are clear.

.github/workflows/ci.yml (1)

32-37: LGTM!

Media test dependencies installed with appropriate fallback handling. The || true on pymupdf ensures CI continues with Poppler backend if pip install fails.

site/src/styles/custom_backup.css (1)

1-38: LGTM!

Monokai-inspired theme variables and presentational tweaks. No issues.

site/astro.config.mjs (1)

17-21: LGTM!

Standard Starlight component override for custom PageTitle. Path is correct relative to astro.config.mjs.

debian/rules (1)

28-35: LGTM!

fread-media.py installed to /usr/share/fsuite/ with correct permissions. The comment clearly explains why memory-ingest.js is excluded from the Debian package (runtime Node.js dependency constraints).

site/src/content/docs/architecture/chains.md (1)

1-127: LGTM.

Documentation with visual pipeline diagrams. Static analysis spelling warnings for "fcase" and "Replay" are false positives — these are tool names.

site/src/content/docs/index.mdx (1)

1-191: LGTM.

Marketing content and visual diagrams. The "They do not..." repetition flagged by static analysis is intentional rhetorical emphasis, not a grammar error.

install.sh (1)

274-293: LGTM.

MCP helper installation logic correctly handles the node_modules dependency. Creates target directory first, then copies the JS, package.json, and node_modules together. Warning path when node_modules is missing prevents broken installs.

site/src/content/docs/getting-started/mental-model.md (1)

1-115: LGTM.

Documentation with updated reflex translation table and visual diagrams. Consistent with updated fedit --symbol terminology.

fread (2)

772-948: LGTM on media_dispatch.

Previous review issues addressed: safe array expansion (line 831), empty output returns 1 (lines 833-836), heredoc comment (lines 985-987). Error path returns 0 with structured error (lines 842-851) is intentional for correlation.

950-1014: LGTM on _media_maybe_ingest.

Detached spawn with proper cleanup. Helper resolution falls back gracefully. Temp file cleanup in both setsid and non-setsid paths.

site/src/styles/custom.css (1)

1-1175: LGTM.

Comprehensive theme overhaul. Stylelint warnings are stylistic preferences — font family casing should be preserved for cross-platform compatibility, @import url() is valid CSS syntax.
site/src/pages/[...slug].md.ts (1)
40-43: No action required. entry.body is properly available in @astrojs/starlight's docsLoader — retainBody defaults to true as of Astro 5.17.0. The fallback (?? '') is already present and correct.
			> Likely an incorrect or invalid review comment.
tests/test_fread.sh (1)
1956-1965: The command() shim reliably forces the no-setsid branch—no issue here.

The probe confirms the exported function is properly used in child bash processes: command_v_status=42 shows the function's custom exit code, not the builtin's behavior. The stderr noise (id: command not found) confirms the override is active. The test mechanism works as intended.
			> Likely an incorrect or invalid review comment.

coderabbitai · 2026-04-30T00:42:58Z

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

Fixes Applied Successfully

Fixed 13 file(s) based on 11 unresolved review comments.

Files modified:

docs/internal/specs/2026-04-29-claude-design-site-brief.md
fread
fread-media.py
mcp/index.js
mcp/memory-ingest.js
mcp/package-lock.json
site/package-lock.json
site/src/components/PageActions.astro
site/src/content/docs/commands/fbash.md
site/src/content/docs/commands/fread.md
tests/run_all_tests.sh
tests/test_fread.sh
tests/test_memory_ingest.sh

Commit: 5547fa23e8489500559535ef210cc1d67135eba8

The changes have been pushed to the codex/fread-media-pdf-image branch.

Time taken: 6m 29s

Fixed 13 file(s) based on 11 unresolved review comments. Co-authored-by: CodeRabbit <noreply@coderabbit.ai>

lliWcWill added 8 commits April 29, 2026 02:17

chatgpt-codex-connector Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread fread Outdated

Comment thread fread-media.py Outdated

coderabbitai Bot reviewed Apr 29, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread mcp/index.js Outdated

Comment thread mcp/memory-ingest.js Outdated

lliWcWill commented Apr 29, 2026

View reviewed changes

Comment thread tests/test_memory_ingest.sh

chatgpt-codex-connector Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread fread-media.py Outdated

chatgpt-codex-connector Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread fread Outdated

Comment thread fread Outdated

chatgpt-codex-connector Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread mcp/memory-ingest.js Outdated

Comment thread tests/test_memory_ingest.sh Outdated

lliWcWill added 2 commits April 29, 2026 04:41

chatgpt-codex-connector Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread install.sh Outdated

lliWcWill added 2 commits April 29, 2026 04:50

fix(install): remove duplicate brace from setup_share_files

6f44495

chatgpt-codex-connector Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread fread

Comment thread fread Outdated

Comment thread debian/rules Outdated

lliWcWill added 2 commits April 29, 2026 05:08

chatgpt-codex-connector Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread fread

Comment thread mcp/index.js Outdated

chatgpt-codex-connector Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread site/src/components/PageActions.astro Outdated

fix(fread): honor token budget for images

d57da1d

chatgpt-codex-connector Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread fread Outdated

fix: preserve markdown base paths and image no-truncate

7b43ffa

chatgpt-codex-connector Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread mcp/index.js Outdated

Comment thread site/src/components/PageActions.astro Outdated

Comment thread mcp/memory-ingest.js Outdated

fix: address PR media budgets and config paths

b8270b4

chatgpt-codex-connector Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread mcp/index.js Outdated

fix: preserve text chunks in mixed media fread

509cc3c

lliWcWill mentioned this pull request Apr 29, 2026

Docs: site revamp (rounds 1-6) — themed Starlight site with story, architecture, cheat sheet, command preambles #39

Merged

7 tasks

chatgpt-codex-connector Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread mcp/index.js

Comment thread mcp/index.js Outdated

fix: align media MCP caps with full mode

dceb006

chatgpt-codex-connector Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread fread-media.py Outdated

fix: surface Poppler pdfinfo failures

52d25b9

chatgpt-codex-connector Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread mcp/index.js Outdated

Comment thread fread-media.py Outdated

fix: harden media truncation and Poppler extraction errors

7db0361

chatgpt-codex-connector Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread fread Outdated

fix: count media JSON budgets by UTF-8 bytes

a1bde5d

chatgpt-codex-connector Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread fread Outdated

fix: preserve default PDF token budget

301c3cc

chatgpt-codex-connector Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread fread Outdated

Comment thread fread Outdated

fix: honor zero media budgets

1920b2b

chatgpt-codex-connector Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread fread Outdated

Comment thread fread Outdated

fix: harden media pretty budgets

0bbc615

coderabbitai Bot reviewed Apr 30, 2026

View reviewed changes

coderabbitai Bot and others added 2 commits April 30, 2026 00:49

fix: apply CodeRabbit auto-fixes

5547fa2

Fixed 13 file(s) based on 11 unresolved review comments. Co-authored-by: CodeRabbit <noreply@coderabbit.ai>

fix: restore executable bits after review autofix

c4da473

lliWcWill merged commit 80e0e97 into master Apr 30, 2026
2 checks passed

Conversation

lliWcWill commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

fread media reading: from binary skip to document-aware agent I/O

Current state

Executive summary

Why this matters

What shipped

1. fread-media.py: media engine

2. fread: CLI dispatch and contract preservation

3. MCP integration

4. Memory ingest: ShieldCortex bridge, best-effort by design

5. Tests and fixtures

6. Packaging and install parity

7. Documentation and site maturity

Review hardening timeline

Foundation review fixes

Codex review rounds

CodeRabbit review and autofix round

Commit ledger

File-level map

Core runtime

Install/package

Tests

Docs/site

Important design decisions

PDF default is text, not render

Media budgets are part of the contract

Memory ingest is best effort

MCP output must be native MCP, not copied from another API

Source checkout behavior matters as much as installed behavior

Verification

Local verification

GitHub Actions

Review triage

What we learned

1. Media reading is not just base64 transport

2. PDF rendering must be explicit

3. Failure shape is part of product quality

4. Mixed batches are a first-class workflow

5. Memory should enrich reads, not hold them hostage

6. Review bots are useful, but verification owns the result

Out of scope

How to try it

Closing note

Uh oh!

coderabbitai Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

lliWcWill commented Apr 29, 2026 •

edited

Loading

1. `fread-media.py`: media engine

2. `fread`: CLI dispatch and contract preservation

coderabbitai Bot commented Apr 29, 2026 •

edited

Loading