Skip to content

feat(fread): image and PDF reading via fread-media engine#38

Merged
lliWcWill merged 42 commits into
masterfrom
codex/fread-media-pdf-image
Apr 30, 2026
Merged

feat(fread): image and PDF reading via fread-media engine#38
lliWcWill merged 42 commits into
masterfrom
codex/fread-media-pdf-image

Conversation

@lliWcWill
Copy link
Copy Markdown
Owner

@lliWcWill lliWcWill commented Apr 29, 2026

fread media reading: from binary skip to document-aware agent I/O

This PR turns fread from a text-only reader that skipped binary files into a first-class document and media reader for agents. The original seam was small and blunt: fread detected binary content and stepped aside. This branch replaces that limitation with a layered media path that can inspect images, extract PDF text, optionally render PDF pages, return MCP-native image blocks, and record best-effort document memories without making the read path fragile.

It also became a full hardening episode for fsuite itself. The first version added the capability. The review rounds turned it into something sharper: byte-accurate budgets, UTF-8-safe truncation, deterministic failure modes, source/install parity, MCP schema correctness, CI-proof executable bits, and a much better docs/site surface for how agents should use the suite.

Current state

  • PR: feat(fread): image and PDF reading via fread-media engine #38
  • Branch: codex/fread-media-pdf-image
  • Base: master
  • Head: c4da4735e61512c84e577c84edbdd6342931541b
  • GitHub PR metadata: 34 changed files, +6,562 / -443
  • Local diff stat: 34 files changed, 6,890 insertions, 443 deletions
  • Commit train: 42 commits from initial engine build through review hardening, CodeRabbit autofix fallout, and final CI recovery
  • Review threads inspected/triaged: 62 total, 0 unresolved
  • Latest CI: run 25142119629 success on c4da4735e61512c84e577c84edbdd6342931541b
  • Local full runner: bash tests/run_all_tests.sh passed 22/22 suites

Executive summary

fread can now read:

  • Images: PNG, JPEG, GIF, WEBP
  • PDFs as text by default
  • PDFs as metadata with --meta-only
  • PDFs as rendered page images with --render
  • Mixed batches containing both text and media
  • Media through CLI and MCP without dumping raw base64 into normal pretty output

The feature is intentionally conservative:

  • PDF default is text extraction, because rendering pages is far more expensive in model context.
  • PDF rendering is opt-in and page-capped.
  • Images are resized/compressed to fit token budget unless the caller explicitly requests no truncation.
  • Media reads have structured statuses, errors, and diagnostics so downstream callers can tell the difference between read, skipped, budget-skipped, and error states.
  • Memory ingest is best-effort and never allowed to break a read.

Why this matters

Before this PR, agents had to leave fsuite to inspect screenshots, diagrams, PDFs, invoices, papers, or scanned-ish documents. That broke the workflow contract in three ways:

  1. The tool suite stopped being the primary reconnaissance surface.
  2. Telemetry lost the actual read operation because the agent had to switch tools.
  3. MCP consumers either got no content or had to handle base64-shaped noise outside the intended fread experience.

After this PR, media is part of the same fread flow as source code:

fread docs/spec.pdf
fread screenshot.png
fread paper.pdf --pages 2:4
fread diagram.pdf --render --pages 1:2
fsearch -o paths '*.pdf' docs | fread --from-stdin --stdin-format=paths -o json

This keeps the doctrine intact: fsuite remains the agent's fast, structured, auditable read layer.

What shipped

1. fread-media.py: media engine

New Python engine for image/PDF handling:

  • Magic-byte detection before extension trust.
  • Image metadata extraction: format, dimensions, byte size, MIME type.
  • Image budget enforcement with resize/compression loop.
  • PDF backend abstraction with PyMuPDF primary and Poppler fallback.
  • PDF text extraction by default.
  • PDF render mode for page images.
  • PDF metadata mode.
  • Page-range parsing and validation.
  • Encrypted PDF detection.
  • Backend-specific error envelopes.
  • Poppler subprocess timeouts.
  • JSON-first output for fread and MCP consumers.

The engine deliberately separates document reading from shell formatting. Python owns binary/media behavior; bash owns CLI orchestration.

2. fread: CLI dispatch and contract preservation

fread now dispatches image/PDF files before the old binary skip path:

  • Media files go to fread-media.py.
  • Non-media binaries still follow the existing binary skip behavior.
  • Pretty output shows useful summaries and extracted PDF text, not raw base64.
  • JSON output preserves structured media payloads.
  • Multi-file reads support mixed text/media batches.
  • Media statuses are reflected in files[] so -o json and -o paths consumers can make honest decisions.

New/expanded flags include:

  • --render
  • --pages
  • --meta-only
  • --no-resize
  • --max-pages
  • --max-tokens
  • --token-budget
  • --no-ingest
  • --no-truncate behavior for media paths

The final review rounds hardened the budget semantics heavily: byte caps count bytes, not characters; UTF-8 truncation preserves character boundaries; PDF defaults preserve the engine's 25k token guard; zero budget correctly means unlimited; and pretty-mode media output respects remaining global line/byte budgets across batches.

3. MCP integration

mcp/index.js now understands fread media output and emits MCP-native content blocks:

  • Image reads return type: "image" blocks plus text metadata.
  • PDF text reads return text content.
  • PDF render reads return page image blocks plus metadata.
  • Mixed text/media batches merge both instead of dropping one side.
  • Full mode disables preview caps consistently.
  • Failed fread requests no longer get rerun just to format errors.
  • Media flags are exposed in the MCP schema.

Important contract decision: the MCP path uses the MCP content shape (mimeType, top-level data), not Anthropic Messages API's nested source / media_type shape. That distinction matters because copying the wrong shape would silently make the feature unusable for MCP clients.

4. Memory ingest: ShieldCortex bridge, best-effort by design

New mcp/memory-ingest.js helper gives media reads a path into ShieldCortex without making reads depend on memory health.

Behavior:

  • Reads an ingest payload from stdin.
  • Locates ShieldCortex configuration from environment or known agent config files.
  • Uses hash-based recall before write to avoid flooding memory with duplicate document reads.
  • Times out instead of hanging the caller.
  • Logs failures to ~/.cache/fsuite/memory-ingest.log.
  • Never blocks or fails the fread operation.
  • Supports opt-out through FSUITE_MEMORY_INGEST=0 or --no-ingest.

This is the first concrete instance of the larger ShieldCortex/fsuite integration direction: fsuite reads become searchable memory events, while still behaving as reliable CLI operations when memory is unavailable.

5. Tests and fixtures

This PR adds a large media-focused test surface:

  • Image pretty output.
  • Image JSON payload shape.
  • Image token budget forwarding.
  • Image no-truncate / zero-budget behavior.
  • Image meta-only behavior.
  • PDF text extraction.
  • PDF JSON payload shape.
  • PDF metadata mode.
  • PDF render mode.
  • PDF render page caps.
  • Invalid page ranges.
  • Encrypted PDF errors.
  • Poppler fallback behavior.
  • Poppler pdfinfo failures.
  • Poppler extraction failures.
  • Media error statuses in files[].
  • Budget-skipped media behavior.
  • Pretty output global budget enforcement.
  • UTF-8 byte and character-boundary behavior.
  • Engine probe behavior.
  • Install hints/self-check behavior.
  • Memory-ingest helper invalid input, missing config, config parsing, and timeout handling.
  • MCP structured parity for media flags, media blocks, mixed batches, full mode, and error envelopes.

Fixtures added:

  • tests/fixtures/media/sample.png
  • tests/fixtures/media/sample.jpg
  • tests/fixtures/media/sample.pdf
  • tests/fixtures/media/big.pdf

6. Packaging and install parity

This PR wires the new media pieces into install and package flows:

  • install.sh installs fread-media.py.
  • install.sh installs mcp/memory-ingest.js and its node_modules when available.
  • Debian packaging includes the new engine/helper paths.
  • CI verifies installed tools remain executable.
  • Optional dependency messaging calls out PyMuPDF for faster PDF rendering while keeping Poppler fallback usable.

The final CodeRabbit autofix accidentally stripped executable bits from six source scripts. That was caught by CI, fixed in c4da473, and then verified by the full local suite and GitHub Actions. The lesson is simple and durable: source checkout executability is part of the CLI contract, not decorative metadata.

7. Documentation and site maturity

This PR also expands the public/docs surface around the feature and around the suite's identity:

  • README.md documents image/PDF reads.
  • AGENTS.md documents fbash MCP unlock and CLI pipe patterns.
  • site/src/content/docs/commands/fread.md documents media flags and behavior.
  • site/src/content/docs/commands/fbash.md references the new chain behavior.
  • site/src/content/docs/architecture/chains.md clarifies command-chain thinking.
  • site/src/content/docs/getting-started/mental-model.md improves the suite mental model.
  • site/src/content/docs/reference/changelog.md records the work.
  • The docs site gets CAVE-theme work, page title/action affordances, Markdown routes, and Copy/Open-in-LLM actions.
  • GitHub Pages base-path handling was hardened for Markdown action URLs.
  • ARIA menu semantics were corrected so the UI does not advertise a menu contract it does not implement.

This matters because the feature is not only a backend capability. Agents and humans both need to understand the operating model: when to read text, when to render, when budgets apply, and why media reads stay structured.

Review hardening timeline

The first pass made the feature real. The review cycles made it trustworthy.

Foundation review fixes

Early review tightened the media engine before it became the public contract:

  • Replaced extension trust with magic-byte detection.
  • Added strict page range validation.
  • Added encrypted PDF detection.
  • Added subprocess timeouts for Poppler.
  • Hardened backend selection and error reporting.
  • Kept render mode opt-in rather than default.
  • Preserved text-first PDF behavior for context efficiency.

Codex review rounds

Codex feedback repeatedly found edge cases that matter for agent consumers:

  • Budget-skipped media files must not be reported as successfully read.
  • MCP should not rerun failed fread calls just to format an error.
  • MCP schema needed all media flags surfaced.
  • Media JSON could not be treated as single-file-only; batch reads need arrays.
  • Mixed text/media reads must merge both outputs.
  • Media-only MCP outputs must preserve diagnostics.
  • UTF-8 truncation must not slice multibyte characters.
  • JSON byte budgets must count UTF-8 bytes, not shell string characters.
  • Default PDF token caps must remain in force unless explicitly disabled.
  • --token-budget 0 and --no-truncate must mean unlimited for images too.
  • Pretty-mode media output must respect remaining global line/byte budgets in multi-file reads.
  • Missing setsid must not leak temp payloads.
  • Memory ingest helper exit status must be logged correctly.
  • Poppler pdfinfo and extraction failures must surface as errors, not empty successful documents.
  • Debian/install dependencies must include the new helper/engine reality.
  • Paths mode and status semantics needed to remain honest.

CodeRabbit review and autofix round

CodeRabbit caught additional items in tests, docs, accessibility, and parsing:

  • Test helper crashes should not be swallowed.
  • INT/TERM should not be converted into success.
  • Type annotation mismatches should not drift from runtime behavior.
  • Stale proof counts in docs needed updating.
  • Tracebacks should not leak in JSON by default.
  • MCP diagnostics should not disappear for media-only responses.
  • TOML args parsing needed quoted-argument awareness.
  • ARIA roles needed to match actual keyboard behavior.
  • Test runner counters needed scope/indentation cleanup.

Then CodeRabbit's own autofix introduced a source-mode regression by stripping executable bits from CLI/test/MCP entrypoints. That regression was isolated, fixed, and verified. It is now part of the story of this PR because it made the source/install parity contract explicit.

Commit ledger

Commit Area What it did
4672113 Phase 1 Added the first fread-media.py image/PDF engine.
9c1525c Phase 1 hardening Applied foundation engine review fixes.
0c06c81 Phase 2 Wired fread dispatch for image/PDF files.
59ed869 Phase 2 follow-up Tightened media dispatch behavior.
57d9033 Phase 3 Added MCP image/text block output for fread media.
f374bfc Phase 4 Added memory-ingest helper for media reads.
3918f92 Phase 5 Added media and memory-ingest test coverage.
17909ec Phase 6 Documented media reading and packaging.
65e9185 Review hardening Addressed Codex P1/P2 review findings.
c4cb6e4 Review hardening Addressed CodeRabbit and Codex feedback on PR #38.
9c9ec34 Review hardening Fixed Codex P1 truncation and encrypted-PDF CI skip.
b897c3c Review hardening Fixed token-budget zero and multi-file media payload behavior.
96ac97b Review hardening Addressed Codex P2 follow-ups round 3.
eca3030 Tests Isolated PATH stub for memory-ingest no-config test.
f80a0ab Install Shipped memory-ingest helper with MCP dependencies.
6f44495 Install Fixed duplicate brace in share-file setup.
965f783 Review hardening Fixed paths mode, byte caps, and Debian dependencies.
4dc67da Budgeting Avoided double-enforcing media chunk token budgets.
dd02d20 Review hardening Fixed budget-skip status, MCP rerun behavior, schema flags, and help/docs probes.
29297ce Style Normalized fread MCP schema indentation.
4b18006 Install Skipped flat memory-ingest install when node_modules are absent.
0e16bc3 Error status Recorded files[].status = media_error on engine failure.
da50f34 Docs Added README media-reading docs.
b700299 Docs/site Refreshed guide command, internal cheatsheet, and design brief.
2346902 Budgets/batches Enforced media output caps and iterated media payload arrays.
b2b72b7 Agent docs Documented fbash MCP unlock for chain pipes.
cc92e22 Memory ingest Preserved helper failure status in logs.
e2edb7a Site Added CAVE theme and per-page Copy/Open-in-LLM dropdown.
1d62cbc Site/docs Added chain diagram and native-vs-fsuite documentation block.
d57da1d Budgets Honored token budgets for images.
7b43ffa Site/budgets Preserved Markdown base paths and image no-truncate behavior.
b8270b4 Budgets/config Addressed PR media budgets and config paths.
509cc3c MCP batches Preserved text chunks in mixed media fread.
dceb006 MCP caps Aligned media MCP caps with full mode.
52d25b9 PDF errors Surfaced Poppler pdfinfo failures.
7db0361 Truncation/errors Hardened media truncation and Poppler extraction errors.
a1bde5d Budgets Counted media JSON budgets by UTF-8 bytes.
301c3cc PDF budget Preserved default PDF token budget.
1920b2b Budgets Honored zero media budgets.
0bbc615 Pretty budgets Hardened media pretty-output budgets.
5547fa2 CodeRabbit autofix Applied CodeRabbit autofixes, then exposed source executable-bit regression.
c4da473 CI recovery Restored executable bits and returned CI to green.

File-level map

Core runtime

  • fread: media dispatch, flags, output formatting, budgets, statuses, memory ingest launch.
  • fread-media.py: image/PDF engine, backend abstraction, parsing, rendering, metadata, error envelopes.
  • mcp/index.js: schema flags, media block conversion, mixed output merging, caps, error handling.
  • mcp/memory-ingest.js: best-effort ShieldCortex write path.

Install/package

  • install.sh: installs media engine/helper and reports optional media dependencies.
  • debian/rules: includes the new media runtime pieces.
  • .github/workflows/ci.yml: supports the expanded suite/dependency needs.

Tests

  • tests/test_fread.sh: main media/budget/error regression suite.
  • tests/test_memory_ingest.sh: memory helper behavior.
  • mcp/structured-parity.test.mjs: MCP schema/content parity.
  • tests/run_all_tests.sh: final aggregate runner.
  • tests/fixtures/media/*: deterministic media fixtures.

Docs/site

  • README.md
  • AGENTS.md
  • docs/internal/readme-drafts/04-cheatsheet-changelog.md
  • docs/internal/specs/2026-04-29-claude-design-site-brief.md
  • site/src/content/docs/commands/fread.md
  • site/src/content/docs/commands/fbash.md
  • site/src/content/docs/architecture/chains.md
  • site/src/content/docs/getting-started/mental-model.md
  • site/src/content/docs/reference/changelog.md
  • site/src/components/PageActions.astro
  • site/src/pages/[...slug].md.ts
  • site/src/styles/custom.css

Important design decisions

PDF default is text, not render

Rendering PDF pages is expensive. Text extraction gives agents the information they usually need at a fraction of the token cost. Render mode exists for layouts, diagrams, signatures, scanned-ish pages, and visual inspection, but it is explicit:

fread file.pdf --render --pages 1:2

Media budgets are part of the contract

Budget behavior is not just a UI nicety. MCP callers, fbash, and downstream agents depend on it. Review hardening made this explicit:

  • Use byte counts where the contract says bytes.
  • Preserve UTF-8 character boundaries.
  • Respect remaining global budgets across multi-file reads.
  • Preserve default PDF token caps.
  • Treat zero budget/no-truncate as unlimited only when explicitly requested.

Memory ingest is best effort

A read tool must stay reliable even if memory is offline, misconfigured, slow, or missing dependencies. Memory ingest therefore runs detached, logs failures, supports opt-out, and does not change the success/failure of the read itself.

MCP output must be native MCP, not copied from another API

This branch avoided the easy wrong move: copying Anthropic Messages API image blocks into MCP. The MCP adapter emits MCP content blocks with the correct field names and shape.

Source checkout behavior matters as much as installed behavior

CI caught this hard: if scripts are executable in installed form but not executable in source form, the development and MCP paths break. The final commit exists solely to restore that source-level contract.

Verification

Local verification

  • git diff --check passed.
  • bash tests/test_fread.sh passed 96/96 during final repair validation.
  • bash tests/test_fread_symbols.sh passed 12/12 during final repair validation.
  • bash tests/test_coderabbit_fixes.sh passed 14/14 during final repair validation.
  • bash tests/test_memory_ingest.sh passed 5/5 after local MCP deps install.
  • cd mcp && npm test passed 49/49 after local MCP deps install.
  • bash tests/run_all_tests.sh passed 22/22 suites.

GitHub Actions

Latest run:

Review triage

  • 62 total PR review threads inspected/triaged.
  • 0 unresolved threads remaining after final replies/resolution state settled.
  • CodeRabbit status is green.
  • Codex/CodeRabbit feedback produced concrete hardening rather than churn: each wave narrowed real contracts around budgets, errors, MCP content, install parity, and test truthfulness.

What we learned

1. Media reading is not just base64 transport

The real feature is not "can return an image." The real feature is controlled, inspectable, budget-aware document reading that behaves consistently across CLI, JSON, MCP, and tests.

2. PDF rendering must be explicit

Default-rendering PDFs would burn context and make fread feel flashy but irresponsible. Text-first preserves fsuite's token-budget-first identity.

3. Failure shape is part of product quality

Silent empty PDFs, swallowed Poppler errors, tracebacks in JSON, and success statuses without emitted content all produce bad agent behavior. This PR hardens those states into explicit errors/statuses.

4. Mixed batches are a first-class workflow

Agents do not always read one file at a time. A useful fread must handle batches with code, docs, images, and PDFs together without silently dropping part of the response.

5. Memory should enrich reads, not hold them hostage

ShieldCortex ingest is valuable, but the read operation is the primary contract. If memory fails, the read still succeeds.

6. Review bots are useful, but verification owns the result

The review bots found real issues. They also introduced a real regression. The mature workflow is not blind acceptance or defensive rejection; it is verify, fix, prove, and keep the commit trail honest.

Out of scope

This PR intentionally does not add:

  • OCR.
  • PDF table extraction.
  • PyMuPDF4LLM extraction.
  • Pandas/Excel/CSV/TSV analytics.
  • Notebook .ipynb reading.
  • A new ShieldCortex document category.
  • Full semantic memory search over read documents.

Those are good next chapters, but this PR establishes the media/document foundation first.

How to try it

# Image metadata/preview path
fread tests/fixtures/media/sample.png

# Image JSON/MCP-friendly payload
fread tests/fixtures/media/sample.png -o json

# PDF text default
fread tests/fixtures/media/sample.pdf

# PDF metadata
fread tests/fixtures/media/sample.pdf --meta-only

# PDF render, bounded by page range
fread tests/fixtures/media/sample.pdf --render --pages 1:2 -o json

# Mixed batch through stdin
printf '%s\n' README.md tests/fixtures/media/sample.png tests/fixtures/media/sample.pdf | \
  fread --from-stdin --stdin-format=paths -o json

# Disable memory ingest for sensitive media
FSUITE_MEMORY_INGEST=0 fread tests/fixtures/media/sample.pdf

Closing note

This started as "let fread read pictures and PDFs." It ended as a broader maturity pass on the suite's agent contract: media is structured, budgets are real, errors are explicit, MCP blocks are native, installs are complete, source scripts remain executable, docs explain the workflow, and CI proves it.

That is the shape of a tool growing up: not just more capability, but fewer lies at the boundaries.

Implements fread-media.py with probe/image/pdf subcommands. Supports
PyMuPDF (primary) and Poppler (fallback) PDF backends, Pillow image
backend with stdlib fallback, magic-byte detection, iterative resize
to token budget, ingest_payload assembly, and structured JSON output
matching the downstream spec for Phase 2 (bash) and Phase 3 (MCP).

Adds test fixtures: sample.png, sample.jpg, sample.pdf (5 pages with
extractable text), big.pdf (12 pages for render-refusal testing).
Critical: explicit returns after emit(); subprocess timeouts.
Important: PDF encryption detection; truthful Poppler metadata;
strict page-range parser; dimension-based token estimates.
Spec gaps: invalid backend rejection; no-subcommand JSON error;
file perms 755. Minor: dedup resize loop; Poppler stderr capture;
symlink-aware paths.

Reviewers: claude-opus (spec), feature-dev:code-reviewer (quality).
Adds media_dispatch() before is_binary_file skip at fread:727.
Pretty mode shows metadata only; JSON mode passes through engine JSON.
New flags: --render, --pages, --meta-only, --no-resize, --max-pages,
--token-budget, --max-tokens, --no-ingest.
Post-call memory ingest spawned detached via setsid; helper at
mcp/memory-ingest.js (created in Phase 4) is best-effort.
- Pretty mode token estimates now read from engine JSON (was using
  byte-proxy formula, off by ~8x for images).
- Error message field name was 'message' but engine emits 'error'.
- New top-level "media_payload" key in JSON output: verbatim engine
  JSON for Phase 3 MCP integration to read without parsing chunk
  content strings.
Adds buildMediaContent() helper that maps fread's media_payload
field to MCP 2025-11-25 content blocks. The fread MCP handler now
short-circuits to image content for image/pdf-pages and to text
for pdf-text/meta — bypassing cli()'s text-only renderer.

Snake_case mime_type translated to camelCase mimeType per spec.
Text-file path unchanged.
Adds mcp/memory-ingest.js — a Node MCP stdio client that ingests
engine ingest_payload JSON via ShieldCortex remember(). Best-effort
with 3s timeout, recall-by-hash dedupe, and graceful failure (exits
non-zero with stderr diagnostic; never blocks the parent fread call).

ShieldCortex command resolution: FSUITE_SHIELDCORTEX_CMD env →
which shieldcortex-mcp → ~/.claude.json mcpServers → ~/.codex/config.toml.

Phase 2 already wired the fread bash spawn; this fills the helper seam.
…se 5)

Adds 16 new test functions to tests/test_fread.sh covering image
pretty/JSON/meta/no-resize, PDF text/meta/render/page-cap/invalid-
pages/encrypted/backend-fallback/budget-truncation paths.

Adds tests/test_memory_ingest.sh for the Phase 4 helper covering
empty stdin, malformed JSON, no-config, and timeout scenarios.

Skips Poppler-dependent tests gracefully when pdftotext is absent.
- site/src/content/docs/commands/fread.md: media flags, output modes,
  token-cost table, backend selection, memory MCP integration, examples
- site/src/content/docs/reference/changelog.md: Unreleased entry
- debian/rules: install fread-media.py and mcp/memory-ingest.js
  to /usr/share/fsuite/
- install.sh: same paths + PyMuPDF detect-and-suggest hint
- fread: add fallback resolver for memory-ingest.js installed paths
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 29, 2026

Walkthrough

Adds first-class image/PDF handling to fread via a new Python engine (fread-media.py), a Node.js async memory ingester (mcp/memory-ingest.js), packaging/install updates, MCP adapter changes to render media blocks, new CLI flags for media control, and extensive tests and docs. Media reads optionally trigger best-effort ShieldCortex ingestion.

Changes

Cohort / File(s) Summary
Media Engine
fread-media.py
New CLI engine: format probing, image/pdf backends (Pillow/stdlib, PyMuPDF/Poppler), probe/image/pdf commands, token-budgeting, resizing/rendering, explicit error codes, emits JSON with media_payload and optional ingest_payload.
Fread CLI Integration
fread
Dispatches binaries to fread-media.py (probe → run), constructs engine args from new media flags, integrates budgeted pretty/JSON outputs (media_payload/media_payloads), records media errors/budget skips, and optionally launches detached memory ingest.
Memory Ingest Helper
mcp/memory-ingest.js
New Node.js ingester: validates payload, resolves ShieldCortex MCP command from env/which/config files, optional dedupe via recall (SHA256/hash tag), remember on miss, enforces 3s timeout, logs to stderr.
MCP Adapter
mcp/index.js
Parses media_payload/media_payloads, builds MCP content blocks (images, pdf pages/meta, budgeted text), short-circuits text-only CLI rendering for media results, adds formatExecError helper and exposes __test__ helpers.
Packaging & Install
debian/rules, install.sh
Installs fread-media.py and copies MCP helper (mcp/memory-ingest.js) into shared paths; marks executables; install summary mentions optional PyMuPDF hint. Note: memory-ingest.js not shipped in .deb per comments.
Tests & CI
tests/test_fread.sh, tests/test_memory_ingest.sh, tests/run_all_tests.sh, .github/workflows/ci.yml
Adds comprehensive media test suite (images/PDFs, budgets, truncation, backends, ingest/no-ingest), new memory-ingest tests, CI apt/pip installs for poppler/Pillow/pymupdf (non-fatal).
Docs & Site
site/src/content/docs/commands/fread.md, site/src/content/docs/reference/changelog.md, README.md
Documents media flags/behaviors, JSON media_payload shape, ingest semantics and opt-outs, backend selection and error codes, and changelog/cheatsheet updates.
Site Components & Styling
site/src/components/PageActions.astro, site/src/components/PageTitle.astro, site/src/styles/custom.css, ...
Adds page actions component, title override, new theme CSS and various site content updates and a markdown raw endpoint (site/src/pages/[...slug].md.ts).

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant fread as "fread CLI"
    participant engine as "fread-media.py"
    participant ingest as "memory-ingest.js"
    participant mcp as "ShieldCortex MCP"

    User->>fread: run fread <file> + media flags
    fread->>engine: probe file
    engine-->>fread: probe result (type, backend, ingest_payload?)
    alt media detected
        fread->>engine: run engine subcommand (image/pdf) with flags
        engine-->>fread: JSON result (media_payload, ingest_payload, errors)
        fread->>User: pretty or JSON output (includes media_payload)
        alt ingest enabled
            fread->>ingest: spawn detached with ingest_payload
            ingest->>ingest: validate payload, resolve MCP command
            ingest->>mcp: recall (dedupe)
            mcp-->>ingest: recall result
            alt not duplicate
                ingest->>mcp: remember (write memory)
                mcp-->>ingest: ack
            end
        end
    else engine missing/error
        fread->>User: warn/skip and fall back to prior binary-skip behavior
    end
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 44.79% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed Title directly summarizes the main change: adding image and PDF reading to fread via a new media engine. Concise and specific.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description directly addresses the changeset: media reading for images/PDFs via fread-media.py engine, MCP integration, memory ingest helper, test coverage, and documentation. It explains design decisions, tradeoffs (text-first PDFs, best-effort memory), and verification results.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/fread-media-pdf-image

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 17909ec36f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread fread Outdated
Comment thread fread-media.py Outdated
P1 (fread:904): inline Python heredoc had `print(json.dumps(ip))` at
column 0 instead of indented under `if ip is not None:`. SyntaxError
on every media read — payload file was never populated, so detached
memory ingest silently short-circuited. Now correctly indented.
Verified end-to-end: ShieldCortex log line lands after fread call.

P2 (fread-media.py:685): stdlib fallback condition
`if estimate > args.max_tokens and not args.no_resize` was inverted
relative to the documented `--no-resize` behavior ("refuse if over
token budget"). Stdlib has no resize capability, so over-budget
images should be rejected regardless of the flag. Drop the
`and not args.no_resize` guard.

Both findings from chatgpt-codex-connector on PR #38.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@fread`:
- Line 774: The command invocation may pass an unintended empty-string argument
when engine_args is empty; update the call that sets engine_json (the invocation
of python3 "$ENGINE_MEDIA" "$subcmd" "$path" with "${engine_args[@]}" and
capturing engine_exit) to use the safe array-expansion pattern so no spurious
empty argument is passed (use the ${engine_args[@]+"${engine_args[@]}"} pattern
in place of "${engine_args[@]}"), preserving the existing error redirection and
the engine_exit capture.
- Around line 899-905: The inline Python passed to python3 -c relies on
flush-left indentation and is fragile if surrounding Bash code is reindented;
add a clear comment next to the printf | python3 -c invocation warning that the
Python must remain left-aligned, and better yet replace the inline -c snippet
with a heredoc or a quoted heredoc (so the Python body can be safely indented)
while keeping the same behavior of reading engine_json and writing
ingest_payload to payload_file; reference the existing usage of engine_json,
payload_file and the python3 invocation (the small Python snippet that loads
stdin and prints d.get('ingest_payload')) so reviewers know exactly where to
make the change.
- Around line 776-779: The branch that detects empty engine output currently
calls add_warning with "$path" and returns 0; change that return to 1 so that an
empty engine_json triggers the binary-skip fallback and emits the explicit
"binary file skipped" behavior; do the same for the later error branch (the
other return 0 case) so both the engine_json empty check and the error path
return 1 instead of 0, referencing the engine_json variable and the add_warning
call to locate the spots to update.

In `@fread-media.py`:
- Around line 169-204: All methods that call fitz.open (page_count,
extract_text, render_page, get_page_dims) must ensure the Document is always
closed even if an exception occurs: wrap fitz.open(path) usage in a try/finally
or use a context manager so doc.close() is guaranteed in the finally block;
preserve existing encryption checks from extract_text/render_page (check
doc.is_encrypted and doc.needs_pass and raise) and add the same check to
get_page_dims; also validate page index bounds in render_page and get_page_dims
similar to extract_text to avoid index errors before accessing doc[page].

In `@mcp/memory-ingest.js`:
- Around line 154-160: The timeout branch of the Promise.race can leave doIngest
and its StdioClientTransport child running and orphaned; modify doIngest to
accept an AbortSignal (via AbortController) and ensure the caller aborts the
controller when the timeout fires, or explicitly call/await a transport shutdown
method (e.g., StdioClientTransport.kill()/close()/destroy()) before calling
process.exit; update the Promise.race usage to create an AbortController, pass
controller.signal into doIngest, call controller.abort() in the timeout handler
(and/or call client.transport.kill()) so the child process is cleanly terminated
before exiting.
- Around line 110-118: The arguments object is including source: payload.source
even when payload.source is undefined; change the construction in the call that
builds the arguments object (the block around arguments: {...}) to only include
the source key when payload.source is defined (e.g., conditionally spread in {
source: payload.source } only if payload.source !== undefined or truthy) so the
MCP tool never receives an explicit undefined source value.
- Around line 163-165: The ternary in the log call is redundant (msg ===
"timeout" ? "error" : "error"); update the log level selection so it conveys the
intended severity: replace the ternary with the correct conditional behavior
(for example, use "warn" when msg === "timeout" and "error" otherwise) or simply
use a single literal "error" if no special-case is required; modify the call to
log(...) accordingly where msg is defined and used in memory-ingest.js
(referencing the variables msg and the log(...) invocation).

In `@site/src/content/docs/commands/fread.md`:
- Around line 166-167: Add a single blank line between the preceding fenced code
block and the "## See also" heading in fread.md so the heading is separated from
the code block; locate the "## See also" heading in the file (near the end of
the fread documentation) and insert one empty line immediately above it.

In `@site/src/content/docs/reference/changelog.md`:
- Around line 18-19: Update the changelog entry to mention both flags and
clarify their difference: add `--token-budget` to the listed new flags and a
short parenthetical or sentence explaining that `--token-budget` is fread's
global budget (also forwarded to the engine) while `--max-tokens` is a
media-specific alias that applies per-media; reference the existing flag names
`--max-tokens` and `--token-budget` so readers can see both and understand which
is global vs media-scoped.

In `@tests/run_all_tests.sh`:
- Around line 268-277: The block that invokes run_test_suite for the
memory-ingest helper (inside main()) is misindented compared to other suite
blocks; align its lines starting from the echo through the TOTAL_TESTS update to
use the same 2-space indentation as other suites so the echo, if/then/else,
TOTAL_PASSED/TOTAL_FAILED increments and failed_suites+=("memory-ingest") lines
match the file's established style and keep the
run_test_suite/TOTAL_PASSED/TOTAL_FAILED/failed_suites/TOTAL_TESTS logic
unchanged.

In `@tests/test_fread.sh`:
- Around line 1502-1509: The test uses a default list for pages_returned which
masks missing-field failures; change the extraction so pr =
f.get('pages_returned') (no default) and update the ok condition to explicitly
verify pr is a list and len(pr) < 5 and that f.get('truncated') is True, failing
when pages_returned is None or not a list; reference the variables mp, f, pr and
the keys 'pages_returned' and 'truncated' to locate where to update the
assertion logic.

In `@tests/test_memory_ingest.sh`:
- Around line 31-35: The run_test function currently accepts a test name as $1
then immediately shifts it away and never logs it; update run_test (and
reference TESTS_RUN and the run_test function) to capture the test name before
shift and print or log it (e.g., echo "Running test: $TEST_NAME") so each
invocation records the human-readable test name for debugging while preserving
the increment of TESTS_RUN and executing the remaining command arguments as
before.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3c762780-231b-47ea-8f5d-8d9ba4104ec5

📥 Commits

Reviewing files that changed from the base of the PR and between c745101 and 17909ec.

⛔ Files ignored due to path filters (4)
  • tests/fixtures/media/big.pdf is excluded by !**/*.pdf
  • tests/fixtures/media/sample.jpg is excluded by !**/*.jpg
  • tests/fixtures/media/sample.pdf is excluded by !**/*.pdf
  • tests/fixtures/media/sample.png is excluded by !**/*.png
📒 Files selected for processing (11)
  • debian/rules
  • fread
  • fread-media.py
  • install.sh
  • mcp/index.js
  • mcp/memory-ingest.js
  • site/src/content/docs/commands/fread.md
  • site/src/content/docs/reference/changelog.md
  • tests/run_all_tests.sh
  • tests/test_fread.sh
  • tests/test_memory_ingest.sh
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test-suite
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Use `fsuite` for the suite-level mental model, then reach for operational tools (fs, ftree, fls, fsearch, fcontent, fmap, fread, fedit, fwrite, fbash, fprobe, fcase, freplay, fmetrics) for filesystem reconnaissance, bounded reading, and surgical editing

Applied to files:

  • debian/rules
  • site/src/content/docs/reference/changelog.md
  • install.sh
🪛 markdownlint-cli2 (0.22.1)
site/src/content/docs/commands/fread.md

[warning] 166-166: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

🪛 Shellcheck (0.11.0)
tests/test_memory_ingest.sh

[info] 18-21: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 23-29: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 40-52: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 55-67: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 70-85: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 89-102: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

tests/test_fread.sh

[info] 1096-1116: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1119-1152: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1155-1180: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1183-1195: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1198-1216: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1219-1246: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1249-1270: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1273-1301: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1304-1330: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1333-1356: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1359-1419: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1422-1453: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1456-1472: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1475-1486: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1494-1516: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1519-1577: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

🔇 Additional comments (8)
debian/rules (1)

28-29: LGTM.

Packaging correctly mirrors install.sh: both fread-media.py and memory-ingest.js are installed to /usr/share/fsuite/ with executable permissions.

mcp/index.js (2)

1402-1434: LGTM — media content builder handles all payload types.

The buildMediaContent function correctly maps engine output to MCP content blocks. The snake_case to camelCase translation (mime_typemimeType) is handled properly, and each payload type produces appropriate content block structures.


1608-1625: Media short-circuit correctly falls through on failure.

The try-catch ensures any engine error or missing media_payload gracefully falls back to the standard cli() text path. This is the right pattern for optional feature enablement.

install.sh (2)

274-277: LGTM.

Memory-ingest.js installation follows the same pattern as other shared files, with proper error handling via die if the file is missing.


744-748: LGTM — conditional hint is user-friendly.

The PyMuPDF hint only appears when the module isn't installed, avoiding noise for users who already have it.

tests/test_memory_ingest.sh (1)

89-102: Good timeout coverage.

Using FSUITE_SHIELDCORTEX_CMD="sleep 10" to trigger the internal 3s timeout is an effective way to test the timeout path without external dependencies.

fread (1)

913-916: Detached spawn pattern is solid.

Using setsid bash -c '...' with positional arguments correctly isolates the background process. The temp file cleanup (rm -f "$1") inside the subshell ensures no orphaned files. The </dev/null >/dev/null 2>&1 & properly detaches from the parent's file descriptors.

tests/test_fread.sh (1)

1090-1116: Tests look good.

The media test suite comprehensively covers the new engine dispatch behavior, including error paths for encryption, invalid backends, and token budget enforcement.

Comment thread fread Outdated
Comment thread fread Outdated
Comment thread fread
Comment thread fread-media.py Outdated
Comment thread mcp/memory-ingest.js
Comment thread site/src/content/docs/commands/fread.md Outdated
Comment thread site/src/content/docs/reference/changelog.md Outdated
Comment thread tests/run_all_tests.sh Outdated
Comment thread tests/test_fread.sh
Comment thread tests/test_memory_ingest.sh
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 65e9185e12

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread mcp/index.js Outdated
Comment thread mcp/memory-ingest.js Outdated
P1: mcp/index.js no longer double-runs fread for non-media files;
    fread-media.py wraps every fitz.open() in try/finally with
    encryption checks across all PyMuPDFBackend methods;
    fread media_dispatch returns 1 on empty engine output so binary
    skip path can fire as fallback.

P2: mcp/memory-ingest.js TOML parser preserves the args array.

Nits: test_fread.sh strict pages_returned check; memory-ingest.js
spread-omit undefined source, kill orphan subprocess on timeout,
warn instead of error log level for timeouts; markdown blank line
in fread.md; clearer flag listing in changelog.md; consistent
indentation in run_all_tests.sh; safe empty-array expansion in
fread.

CI: poppler-utils, python3-pil, pymupdf installed for media tests.
Comment thread tests/test_memory_ingest.sh
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c4cb6e47fd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread fread-media.py Outdated
P1 (fread-media.py:891): PDF text truncation appended the full page
when remaining<=0, blowing past the configured token budget. Now
drops the page entirely when no budget is left, so total tokens
stay under the cap.

CI encrypted-PDF test: SKIP cleanly when fitz is not importable
instead of FAIL. Phase 5 spec called for skip-not-fail; was missed.
Also harden ci.yml's pymupdf install via sudo --break-system-packages
so system Python can find it (previous --user install put it on a
path that the test's `python3 -c 'import fitz'` couldn't see).
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9c9ec34b6a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread fread Outdated
Comment thread fread Outdated
…yload)

P2 fread:767: --no-truncate sets TOKEN_BUDGET=0, but the engine forward
guard `(( TOKEN_BUDGET > 0 ))` dropped that case to the engine default
(25000), still truncating. Engine now treats token_budget=0 as unlimited
(via sys.maxsize); fread forwards the value unconditionally.

P2 fread:803: MEDIA_PAYLOAD_JSON was a single global; multi-file batches
overwrote earlier files. Added MEDIA_PAYLOADS_JSON array, emitted as
top-level "media_payloads" plural in render_json. Singular "media_payload"
preserved for single-file Phase 3 MCP backward-compat.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b897c3cd37

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread mcp/memory-ingest.js Outdated
Comment thread tests/test_memory_ingest.sh Outdated
P2 memory-ingest.js:74: TOML section keys with quoted names like
[mcp_servers."shield.cortex"] retained the quotes when matched
against nameRe, so quoted server names were silently skipped. Strip
surrounding quotes before name match.

P2 test_memory_ingest.sh:77: hardcoded PATH dropped Node on
toolcache/nvm/asdf setups, causing exit 127 instead of exercising
the no-config branch. Prepend the directory of the active node
binary to PATH so the helper invocation always finds it.
Previous attempt prepended dirname(node) to PATH, but on machines where
node and shieldcortex-mcp share a directory (common for ~/.local/bin),
the stub still resolved shieldcortex-mcp and the test exercised the
wrong branch. Switch to an mktemp dir with a node symlink only — works
on toolcache/nvm/asdf without leaking other binaries.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eca30301e3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread install.sh Outdated
memory-ingest.js imports @modelcontextprotocol/sdk at module load
time. Previous install copied just the script flat to
\${share_dir}/memory-ingest.js, leaving the SDK unresolvable —
ERR_MODULE_NOT_FOUND on every memory ingest in production installs.

install.sh now copies the entire mcp/ tree (script + package.json +
node_modules) to \${share_dir}/mcp/ when node_modules exists. fread's
helper resolver prepends <share>/mcp/memory-ingest.js to the
candidate list so the colocated SDK loads correctly.

Falls back to flat install with a warning when node_modules is
missing, surfacing as 'helper not found' / 'helper exit code 1' in
the ingest log rather than silently breaking.

debian packaging unchanged for now — shipping node_modules in a
.deb is too heavy. Debian users who need memory ingest should run
'npm ci --prefix mcp' under the shared dir or use install.sh.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6f44495184

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread fread
Comment thread fread Outdated
Comment thread debian/rules Outdated
P1 fread:811: media JSON chunk now goes through check_budget_preadd
so --max-bytes/--max-lines apply uniformly. Previously media files
emitted full base64 regardless of byte cap.

P2 fread:800: -o paths mode now skips the media formatter entirely;
render_paths emits the path alone. Previously the pretty summary line
contaminated machine-readable paths output.

P2 debian/rules:29: dropped the standalone memory-ingest.js install
line. Shipping node_modules in a .deb is too heavy; install.sh
handles the full mcp/ tree for users who want detached ingest.
Debian users get a clean 'helper not found' log instead of
ERR_MODULE_NOT_FOUND.
check_budget_preadd checks both MAX_BYTES/MAX_LINES AND TOKEN_BUDGET.
For PDF text reads with --token-budget, the engine already truncated
internally, so bash gating on token budget rejected the truncated
result entirely. Switched to manual MAX_BYTES/MAX_LINES check; the
engine owns token budget for PDFs.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4dc67dad6f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread fread
Comment thread mcp/index.js Outdated
…s, help)

- fread: budget-blocked media now records status=budget_skipped and returns
  before mark_file_read/_media_maybe_ingest, so over-budget reads no longer
  count as successful reads or trigger memory ingest.
- mcp/index.js: factor cli() catch body into formatExecError() helper so the
  fread MCP handler can reuse it on engine error instead of re-spawning fread,
  cutting double-latency and duplicate telemetry on missing/invalid inputs.
- mcp/index.js: expose --meta-only, --render, --pages, --no-resize,
  --max-pages, --max-tokens, --no-ingest in the fread MCP inputSchema and
  forward them to the CLI so MCP agents can drive PDF render / image meta /
  ingest opt-out paths.
- fread: document media flags in usage(), --install-hints (Pillow / pymupdf /
  poppler-utils), and --self-check (optional PIL / fitz / pdftotext probe
  that does not flip dependency-ok).
- tests: add test_fread.sh coverage for budget_skipped status + media-aware
  --self-check / --install-hints, and structured-parity coverage for the
  new fread schema fields and the formatExecError envelope shape (isError
  with non-empty content[text]).

PR #38, Codex P2 round 2.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1d62cbc984

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread site/src/components/PageActions.astro Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d57da1d729

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread fread Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7b43ffadea

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread mcp/index.js Outdated
Comment thread site/src/components/PageActions.astro Outdated
Comment thread mcp/memory-ingest.js Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b8270b45c4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread mcp/index.js Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 509cc3c262

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread mcp/index.js
Comment thread mcp/index.js Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dceb0069b5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread fread-media.py Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 52d25b9489

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread mcp/index.js Outdated
Comment thread fread-media.py Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7db03612d9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread fread Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a1bde5dcb7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread fread Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 301c3cc36e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread fread Outdated
Comment thread fread Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1920b2bfa9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread fread Outdated
Comment thread fread Outdated
@lliWcWill
Copy link
Copy Markdown
Owner Author

@coderabbitai Full review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 29, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

♻️ Duplicate comments (2)
fread-media.py (2)

203-211: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing page index bounds check in get_page_dims.

Same issue as render_pagedoc[page] is accessed without bounds validation. An invalid index will raise an IndexError.

Add bounds validation
     def get_page_dims(self, path: str, page: int) -> dict:
         doc = fitz.open(path)
         try:
             if doc.is_encrypted and doc.needs_pass:
                 raise RuntimeError("PDF is encrypted and requires a password")
+            if page < 0 or page >= doc.page_count:
+                raise RuntimeError(f"Page {page} out of range (0-{doc.page_count - 1})")
             r = doc[page].rect
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@fread-media.py` around lines 203 - 211, get_page_dims currently accesses
doc[page] without validating the page index (same issue as render_page); add a
bounds check against doc.page_count (or len(doc)) before accessing doc[page] and
raise a clear exception (e.g., IndexError or ValueError) when page is negative
or >= doc.page_count, ensuring the existing finally doc.close() still executes;
reference get_page_dims, doc.page_count (or len(doc)), and doc[page].rect when
making the change.

191-201: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing page index bounds check in render_page.

PyMuPDFBackend.render_page accesses doc[page] at line 196 without validating that page is within [0, doc.page_count). If an out-of-bounds index is passed, fitz will raise an IndexError that surfaces as a generic traceback rather than a clean error message.

Add bounds validation
     def render_page(self, path: str, page: int, dpi: int = 100):
         doc = fitz.open(path)
         try:
             if doc.is_encrypted and doc.needs_pass:
                 raise RuntimeError("PDF is encrypted and requires a password")
+            if page < 0 or page >= doc.page_count:
+                raise RuntimeError(f"Page {page} out of range (0-{doc.page_count - 1})")
             pix = doc[page].get_pixmap(dpi=dpi)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@fread-media.py` around lines 191 - 201, The render_page method currently
indexes doc[page] without validating bounds; add a check using doc.page_count to
ensure 0 <= page < doc.page_count before accessing doc[page], and if invalid
raise a clear exception (e.g., ValueError or IndexError) with a descriptive
message including the requested page and total pages; keep the existing finally:
doc.close() behavior so the document is always closed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/internal/specs/2026-04-29-claude-design-site-brief.md`:
- Around line 125-136: The proof-point line "**~516 tests across 16 suites**" is
stale—replace it with the authoritative test count by reading the
source-of-truth (e.g., the test-summary/CI artifact or the repo's test-report)
and update the bullet to the current value (e.g., the new "~470 tests across 15
suites" if that's what the source shows); locate and edit the exact bullet
string "**~516 tests across 16 suites**" in the spec list and ensure the updated
count is used programmatically or documented so future briefs pull from that
source automatically.

In `@fread`:
- Around line 213-222: The current truncate_utf8_bytes implementation slices raw
bytes and decodes with "ignore", which can drop partial multibyte characters;
update truncate_utf8_bytes so Python reads sys.stdin.buffer.read(), and if
len(data) > limit find the largest prefix <= limit that successfully decodes as
UTF-8 (e.g., try decoding data[:limit] with strict error handling and on
UnicodeDecodeError decrement the end index until decode succeeds), then write
that decoded string to stdout; reference truncate_utf8_bytes, the max variable,
and the sys.stdin.buffer.read() usage to locate where to implement the
safe-prefix decoding loop.

In `@fread-media.py`:
- Around line 1034-1046: The global exception handler (the "except Exception as
exc:" block) currently always adds the full traceback to the JSON error output;
change it to include the "traceback" field only when a debug flag or env var is
set (e.g., use an argparse boolean --debug stored as args.debug or an
environment variable like FREAD_DEBUG via os.environ.get), otherwise omit or
redact the traceback value; update the handler to read that flag (e.g.,
debug_mode = args.debug or bool(os.environ.get("FREAD_DEBUG"))) and
conditionally add "traceback": _tb.format_exc() to the dict before json.dump,
leaving other fields ("type","error","code") intact and still writing the JSON
and exiting with sys.exit(1).
- Around line 332-369: The function render_page currently declares return type
-> bytes but actually returns a tuple (data, width, height); update the type
annotation to reflect the actual return type (e.g., -> Tuple[bytes, int, int])
and add the necessary typing import (from typing import Tuple) at module scope;
locate the render_page definition and adjust its signature and imports
accordingly, and run static checks to ensure callers expect the tuple form.

In `@mcp/index.js`:
- Around line 1535-1596: The MCP output currently drops fread diagnostics when
only media blocks remain because textOnlyFreadPayload returns null and
buildFreadMcpContent only returns media content; fix by preserving and copying
diagnostic/meta fields from the original parsed fread result into the final
returned object even when textPayload is null. In buildFreadMcpContent (and/or
adjust textOnlyFreadPayload) ensure the final return value includes
parsed.warnings, parsed.errors, parsed.next_hint and the parsed.files array (or
files with updated .status) or include slim/structuredContent created via
renderFreadTextResult/normalizeStructuredContent so that
warnings/errors/files[].status are propagated alongside the media content
instead of being dropped.

In `@mcp/memory-ingest.js`:
- Around line 91-98: The TOML args extraction in memory-ingest.js that uses
argsRegex and then am[1].split(',') (see cmdMatch / argsRegex / am / args)
breaks on commas inside quoted strings; replace this ad-hoc split with either a
proper TOML parser (e.g., parse the whole block with a toml library to read the
args array) or implement a quoted-string-aware tokenizer for am[1] that respects
escaped quotes and commas inside quotes, then build args from those tokens and
preserve existing trimming/unquoting logic.

In `@site/src/components/PageActions.astro`:
- Around line 78-125: The markup advertises an ARIA menu but lacks full
keyboard/menuitem behavior; remove the misleading roles instead: delete
role="menu" from the div.page-actions__menu and remove role="menuitem" from all
child buttons (e.g., the buttons with data-action="copy-md", "copy-plain",
"view-md", "open-claude", "open-chatgpt") and from the anchor
(href={ghSourceUrl}) so these remain plain buttons/links, or alternatively
implement the complete menu keyboard model if you prefer to keep ARIA; prefer
the former for a minimal fix.

In `@tests/run_all_tests.sh`:
- Line 277: Move the TOTAL_TESTS assignment into the main() function so it
executes after TOTAL_PASSED and TOTAL_FAILED are finalized; specifically, locate
the global line setting TOTAL_TESTS=$((TOTAL_PASSED + TOTAL_FAILED)) and
indent/relocate it inside the main() block (after all suites run and after
TOTAL_PASSED/TOTAL_FAILED are updated) so the calculation happens at runtime
rather than at parse time.

In `@tests/test_fread.sh`:
- Around line 1913-1915: The test harness copies only "${FREAD}" into the fake
sandbox, but the runtime fread script unconditionally sources _fsuite_common.sh
and needs that helper present; update the three copy sites that use cp
"${FREAD}" "${fake_dir}/fread" (the occurrences around the mkdir/cp blocks that
reference "${fake_dir}", "${fake_bin}", and "${payload_dir}") to also copy the
helper from the test scripts directory by adding cp
"${SCRIPT_DIR}/../_fsuite_common.sh" "${fake_dir}/_fsuite_common.sh" so the fake
sandbox contains both fread and _fsuite_common.sh (repeat the same change for
the other two similar cp occurrences).

In `@tests/test_memory_ingest.sh`:
- Around line 159-160: The current trap in main() uses "trap 'true' EXIT INT
TERM", which swallows INT/TERM; change it to preserve cancellation semantics by
only trapping EXIT (e.g., "trap 'true' EXIT") and ensure INT/TERM remain default
(either remove INT/TERM from the trap or explicitly reset them with "trap - INT
TERM"); update the trap invocation that references main() and the line "trap
'true' EXIT INT TERM" accordingly.
- Around line 31-37: The run_test helper currently swallows test failures via
`"${@}" || true`, so unexpected crashes never increment TESTS_FAILED; change
run_test (function run_test) to execute the test command directly, capture its
exit code ($?) after calling the test body, and if non-zero increment
TESTS_FAILED and report the failure (e.g., echo the exit code and test name) so
that unexpected exits are treated as failures instead of being ignored; ensure
TESTS_RUN is still incremented and existing pass()/fail() logic remains
compatible.

---

Duplicate comments:
In `@fread-media.py`:
- Around line 203-211: get_page_dims currently accesses doc[page] without
validating the page index (same issue as render_page); add a bounds check
against doc.page_count (or len(doc)) before accessing doc[page] and raise a
clear exception (e.g., IndexError or ValueError) when page is negative or >=
doc.page_count, ensuring the existing finally doc.close() still executes;
reference get_page_dims, doc.page_count (or len(doc)), and doc[page].rect when
making the change.
- Around line 191-201: The render_page method currently indexes doc[page]
without validating bounds; add a check using doc.page_count to ensure 0 <= page
< doc.page_count before accessing doc[page], and if invalid raise a clear
exception (e.g., ValueError or IndexError) with a descriptive message including
the requested page and total pages; keep the existing finally: doc.close()
behavior so the document is always closed.
🪄 Autofix (Beta)

✅ Autofix completed


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6d400cf4-74f0-466c-a95d-fea2040ded84

📥 Commits

Reviewing files that changed from the base of the PR and between c745101 and 0bbc615.

⛔ Files ignored due to path filters (5)
  • AGENTS.md is excluded by !AGENTS.md
  • tests/fixtures/media/big.pdf is excluded by !**/*.pdf
  • tests/fixtures/media/sample.jpg is excluded by !**/*.jpg
  • tests/fixtures/media/sample.pdf is excluded by !**/*.pdf
  • tests/fixtures/media/sample.png is excluded by !**/*.png
📒 Files selected for processing (26)
  • .github/workflows/ci.yml
  • README.md
  • debian/rules
  • docs/internal/readme-drafts/04-cheatsheet-changelog.md
  • docs/internal/specs/2026-04-29-claude-design-site-brief.md
  • fread
  • fread-media.py
  • fsuite
  • install.sh
  • mcp/index.js
  • mcp/memory-ingest.js
  • mcp/structured-parity.test.mjs
  • site/astro.config.mjs
  • site/src/components/PageActions.astro
  • site/src/components/PageTitle.astro
  • site/src/content/docs/architecture/chains.md
  • site/src/content/docs/commands/fread.md
  • site/src/content/docs/getting-started/mental-model.md
  • site/src/content/docs/index.mdx
  • site/src/content/docs/reference/changelog.md
  • site/src/pages/[...slug].md.ts
  • site/src/styles/custom.css
  • site/src/styles/custom_backup.css
  • tests/run_all_tests.sh
  • tests/test_fread.sh
  • tests/test_memory_ingest.sh
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
docs/**

⚙️ CodeRabbit configuration file

docs/**: Check for: accurate CLI examples, no hardcoded personal paths or credentials,
and clear explanations of flags and defaults.

Files:

  • docs/internal/specs/2026-04-29-claude-design-site-brief.md
  • docs/internal/readme-drafts/04-cheatsheet-changelog.md
README.md

⚙️ CodeRabbit configuration file

README.md: Check for: accurate examples, correct filenames/paths, and no stale references.

Files:

  • README.md
🧠 Learnings (16)
📓 Common learnings
Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fsuite` as the primary suite-level mental model for filesystem operations
Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Prefer `-o json` for programmatic decisions in fsuite tool calls
Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Prefer `-o paths` when piping fsuite tool output into another tool
Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `-q` flag for existence checks and silent control flow in fsuite tools
Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fbash` to enable Unix pipe chains when combining two or more producer/consumer steps in fsuite
Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fmap` and `fread --symbol` before using broad `fcontent` searches
Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fcontent` for exact-text confirmation only after narrowing search scope, not as the first conceptual search
Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Open `fcase init` at the start of any non-trivial investigation and close with `fcase resolve`
Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Check `fcase find --status all --deep` before starting new work to discover past investigations
Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fread --symbol NAME` or `--lines START:END` instead of reading entire files
Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fedit --lines` when line numbers are available from fread for fastest editing with zero ambiguity
Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fedit --function_name` for symbol-scoped edits without requiring large unique context strings
Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Never edit code blind; always inspect context with `fread` before calling `fedit`
Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Run `fsuite` once at the beginning of work to establish the mental model
Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Run `ftree --snapshot` once to establish territory and avoid rediscovering the repository unless the target changes
Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fs` as the default search entry point and let it route to the appropriate fsuite tool
Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fcase` investigation continuity ledger to track findings, evidence, and handoff state across context compaction
Learnt from: CR
Repo: lliWcWill/fsuite

Timestamp: 2026-04-29T23:56:08.317Z
Learning: Pipe fsuite tool output producers (`fsearch -o paths`, `fcontent -o paths`) into tool consumers for efficient chaining
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Use `fsuite` for the suite-level mental model, then reach for operational tools (fs, ftree, fls, fsearch, fcontent, fmap, fread, fedit, fwrite, fbash, fprobe, fcase, freplay, fmetrics) for filesystem reconnaissance, bounded reading, and surgical editing

Applied to files:

  • debian/rules
  • tests/run_all_tests.sh
  • site/src/components/PageActions.astro
  • fread-media.py
  • site/src/content/docs/getting-started/mental-model.md
  • mcp/memory-ingest.js
  • install.sh
  • site/src/content/docs/index.mdx
  • site/src/content/docs/commands/fread.md
  • site/src/content/docs/architecture/chains.md
  • tests/test_memory_ingest.sh
  • docs/internal/readme-drafts/04-cheatsheet-changelog.md
  • site/src/content/docs/reference/changelog.md
  • fsuite
  • fread
  • tests/test_fread.sh
  • mcp/index.js
  • README.md
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Run `fsuite` once at the start to establish the suite-level mental model; do not re-read it

Applied to files:

  • debian/rules
  • site/src/content/docs/getting-started/mental-model.md
  • site/src/content/docs/index.mdx
  • fsuite
📚 Learning: 2026-04-29T22:12:50.298Z
Learnt from: lliWcWill
Repo: lliWcWill/fsuite PR: 39
File: site/src/pages/[...slug].md.ts:18-23
Timestamp: 2026-04-29T22:12:50.298Z
Learning: In `site/src/pages/[...slug].md.ts` (fsuite repo), the `index.mdx` entry intentionally maps to slug `index` → URL `/fsuite/index.md`. PageActions is fixed (commit a5a25ca) to target that exact URL. Do NOT suggest normalizing the index slug to an empty string, as that would produce the non-standard `/fsuite/.md` URL.

Applied to files:

  • site/astro.config.mjs
  • site/src/pages/[...slug].md.ts
  • site/src/components/PageActions.astro
  • site/src/content/docs/index.mdx
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Use `fs` as the default search entry point and let it auto-route to the appropriate tool

Applied to files:

  • site/src/content/docs/getting-started/mental-model.md
  • site/src/content/docs/index.mdx
  • fsuite
  • mcp/index.js
  • README.md
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Prefer `fmap` + `fread --symbol` before broad `fcontent` searches

Applied to files:

  • site/src/content/docs/getting-started/mental-model.md
  • site/src/content/docs/index.mdx
  • fsuite
  • fread
  • mcp/index.js
  • README.md
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Run `ftree --snapshot` once to establish territory and map the codebase structure. Do not rediscover the repo unless the target changes

Applied to files:

  • site/src/content/docs/getting-started/mental-model.md
  • fsuite
  • README.md
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Never edit blindly. Always inspect context with `fread` before calling `fedit`

Applied to files:

  • site/src/content/docs/getting-started/mental-model.md
  • README.md
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Use `fedit --lines` when you have specific line numbers from `fread`. This provides the fastest edit mode with zero ambiguity

Applied to files:

  • site/src/content/docs/getting-started/mental-model.md
  • fread
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Use `fedit --function_name` for symbol-scoped edits without needing large unique context strings

Applied to files:

  • site/src/content/docs/getting-started/mental-model.md
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Open `fcase init` at the start of any non-trivial investigation and close with `fcase resolve`

Applied to files:

  • site/src/content/docs/getting-started/mental-model.md
  • site/src/content/docs/index.mdx
  • site/src/content/docs/architecture/chains.md
  • README.md
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Use `fread --symbol NAME` or `--lines START:END` for targeted reading. Do not read whole files without bounds

Applied to files:

  • site/src/content/docs/getting-started/mental-model.md
  • site/src/content/docs/index.mdx
  • fread
  • README.md
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Always check `fcase find --status all --deep` before starting new work to see if past investigations have already answered the question

Applied to files:

  • site/src/content/docs/getting-started/mental-model.md
  • README.md
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Prefer `-o paths` when piping output into another tool

Applied to files:

  • fread
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Prefer `-o json` for programmatic decisions and parsing

Applied to files:

  • fread
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Prefer `pretty` format only for human terminal output

Applied to files:

  • fread
🪛 LanguageTool
docs/internal/specs/2026-04-29-claude-design-site-brief.md

[style] ~144-~144: Consider using an alternative to strengthen your wording.
Context: ...ware." 3. AI agent tool authors who want to learn from fsuite's design decisions...

(WANT_KEEN)


[grammar] ~164-~164: Ensure spelling is correct
Context: ...hero diagram* of the sensor metaphor (fsuite tools as drones, target repo as terrain...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~164-~164: Ensure spelling is correct
Context: ...flexes"** mapping native tool habits to fsuite equivalents. This is the page that, if ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~183-~183: Ensure spelling is correct
Context: ... the new media reading. - Diagram for fcase lifecycle — shows the init → note → r...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~191-~191: This phrasing can be overused. Try elevating your writing with a more formal alternative.
Context: ...) is correct. Reorder within sections if you want, but the top-level shape stays. - Don't...

(IF_YOU_WANT)


[grammar] ~199-~199: Ensure spelling is correct
Context: ...r GitHub Pages serving. --- ## 8. The fsuite voice (so your copy sounds right) - **Direct...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~222-~222: Ensure spelling is correct
Context: ...ns"; Episode 3 is the latest milestone. These are the real story content. - `docs/interna...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~275-~275: Consider using a different verb to strengthen your wording.
Context: ...master or merge the PR yourself If you find a problem outside your scope (e.g., a t...

(FIND_ENCOUNTER)

site/src/content/docs/index.mdx

[style] ~42-~42: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...output. They do not rank their results. They do not know what a token costs. **fsui...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

site/src/content/docs/architecture/chains.md

[grammar] ~94-~94: Ensure spelling is correct
Context: ...-branch">fprobe patch ## fcase lifecycle The investigation ledger isn...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~108-~108: Ensure spelling is correct
Context: ...close + archive ## Replay chain Rerun a traced investigation ste...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.22.1)
docs/internal/specs/2026-04-29-claude-design-site-brief.md

[warning] 32-32: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 88-88: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 96-96: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 100-100: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 105-105: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 108-108: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 111-111: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 114-114: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

🪛 Shellcheck (0.11.0)
tests/test_memory_ingest.sh

[info] 18-21: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 23-29: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 42-54: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 57-69: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 72-98: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 100-138: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 142-155: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

tests/test_fread.sh

[info] 1096-1116: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1119-1152: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1154-1239: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1241-1325: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1327-1352: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1354-1391: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1393-1430: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1433-1458: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1461-1473: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1476-1494: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1497-1524: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1527-1548: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1551-1579: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1582-1608: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1611-1634: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1637-1657: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1660-1726: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1729-1760: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1763-1793: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1796-1838: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1841-1857: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1860-1871: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1873-1904: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1906-1984: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 1992-2014: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 2017-2036: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 2038-2092: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 2094-2148: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 2150-2208: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 2211-2238: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 2241-2249: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)


[info] 2252-2260: This function is never invoked. Check usage (or ignored if invoked indirectly).

(SC2329)

🪛 Stylelint (17.9.0)
site/src/styles/custom.css

[error] 12-12: Expected "url('https://fonts.googleapis.com/css2?family=Geist:wght@300;400;500;600;700;800;900&family=JetBrains+Mono:wght@300;400;500;600;700&display=swap')" to be "'https://fonts.googleapis.com/css2?family=Geist:wght@300;400;500;600;700;800;900&family=JetBrains+Mono:wght@300;400;500;600;700&display=swap'" (import-notation)

(import-notation)


[error] 50-50: Expected "SFMono-Regular" to be "sfmono-regular" (value-keyword-case)

(value-keyword-case)


[error] 50-50: Expected "Menlo" to be "menlo" (value-keyword-case)

(value-keyword-case)


[error] 50-50: Expected "Consolas" to be "consolas" (value-keyword-case)

(value-keyword-case)


[error] 104-104: Expected "optimizeLegibility" to be "optimizelegibility" (value-keyword-case)

(value-keyword-case)

🔇 Additional comments (15)
fsuite (1)

23-24: LGTM!

Tool roster updated to 14, and the fread description correctly reflects the new image/PDF media engine dispatch. Help text is consistent and accurate.

Also applies to: 39-59

docs/internal/readme-drafts/04-cheatsheet-changelog.md (1)

381-427: LGTM!

Changelog accurately documents the v2.4.0 media reading features, new flags, status contract fixes, and test expansions. No hardcoded paths or credentials. Flag explanations are clear.

.github/workflows/ci.yml (1)

32-37: LGTM!

Media test dependencies installed with appropriate fallback handling. The || true on pymupdf ensures CI continues with Poppler backend if pip install fails.

site/src/styles/custom_backup.css (1)

1-38: LGTM!

Monokai-inspired theme variables and presentational tweaks. No issues.

site/astro.config.mjs (1)

17-21: LGTM!

Standard Starlight component override for custom PageTitle. Path is correct relative to astro.config.mjs.

debian/rules (1)

28-35: LGTM!

fread-media.py installed to /usr/share/fsuite/ with correct permissions. The comment clearly explains why memory-ingest.js is excluded from the Debian package (runtime Node.js dependency constraints).

site/src/content/docs/architecture/chains.md (1)

1-127: LGTM.

Documentation with visual pipeline diagrams. Static analysis spelling warnings for "fcase" and "Replay" are false positives — these are tool names.

site/src/content/docs/index.mdx (1)

1-191: LGTM.

Marketing content and visual diagrams. The "They do not..." repetition flagged by static analysis is intentional rhetorical emphasis, not a grammar error.

install.sh (1)

274-293: LGTM.

MCP helper installation logic correctly handles the node_modules dependency. Creates target directory first, then copies the JS, package.json, and node_modules together. Warning path when node_modules is missing prevents broken installs.

site/src/content/docs/getting-started/mental-model.md (1)

1-115: LGTM.

Documentation with updated reflex translation table and visual diagrams. Consistent with updated fedit --symbol terminology.

fread (2)

772-948: LGTM on media_dispatch.

Previous review issues addressed: safe array expansion (line 831), empty output returns 1 (lines 833-836), heredoc comment (lines 985-987). Error path returns 0 with structured error (lines 842-851) is intentional for correlation.


950-1014: LGTM on _media_maybe_ingest.

Detached spawn with proper cleanup. Helper resolution falls back gracefully. Temp file cleanup in both setsid and non-setsid paths.

site/src/styles/custom.css (1)

1-1175: LGTM.

Comprehensive theme overhaul. Stylelint warnings are stylistic preferences — font family casing should be preserved for cross-platform compatibility, @import url() is valid CSS syntax.

site/src/pages/[...slug].md.ts (1)

40-43: No action required. entry.body is properly available in @astrojs/starlight's docsLoaderretainBody defaults to true as of Astro 5.17.0. The fallback (?? '') is already present and correct.

			> Likely an incorrect or invalid review comment.
tests/test_fread.sh (1)

1956-1965: The command() shim reliably forces the no-setsid branch—no issue here.

The probe confirms the exported function is properly used in child bash processes: command_v_status=42 shows the function's custom exit code, not the builtin's behavior. The stderr noise (id: command not found) confirms the override is active. The test mechanism works as intended.

			> Likely an incorrect or invalid review comment.

Comment thread docs/internal/specs/2026-04-29-claude-design-site-brief.md
Comment thread fread
Comment thread fread-media.py Outdated
Comment thread fread-media.py
Comment thread mcp/index.js Outdated
Comment thread site/src/components/PageActions.astro Outdated
Comment thread tests/run_all_tests.sh Outdated
Comment thread tests/test_fread.sh
Comment thread tests/test_memory_ingest.sh
Comment thread tests/test_memory_ingest.sh Outdated
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 30, 2026

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

Fixes Applied Successfully

Fixed 13 file(s) based on 11 unresolved review comments.

Files modified:

  • docs/internal/specs/2026-04-29-claude-design-site-brief.md
  • fread
  • fread-media.py
  • mcp/index.js
  • mcp/memory-ingest.js
  • mcp/package-lock.json
  • site/package-lock.json
  • site/src/components/PageActions.astro
  • site/src/content/docs/commands/fbash.md
  • site/src/content/docs/commands/fread.md
  • tests/run_all_tests.sh
  • tests/test_fread.sh
  • tests/test_memory_ingest.sh

Commit: 5547fa23e8489500559535ef210cc1d67135eba8

The changes have been pushed to the codex/fread-media-pdf-image branch.

Time taken: 6m 29s

coderabbitai Bot and others added 2 commits April 30, 2026 00:49
Fixed 13 file(s) based on 11 unresolved review comments.

Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
@lliWcWill lliWcWill merged commit 80e0e97 into master Apr 30, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant