feat(fread): image and PDF reading via fread-media engine#38
Conversation
Implements fread-media.py with probe/image/pdf subcommands. Supports PyMuPDF (primary) and Poppler (fallback) PDF backends, Pillow image backend with stdlib fallback, magic-byte detection, iterative resize to token budget, ingest_payload assembly, and structured JSON output matching the downstream spec for Phase 2 (bash) and Phase 3 (MCP). Adds test fixtures: sample.png, sample.jpg, sample.pdf (5 pages with extractable text), big.pdf (12 pages for render-refusal testing).
Critical: explicit returns after emit(); subprocess timeouts. Important: PDF encryption detection; truthful Poppler metadata; strict page-range parser; dimension-based token estimates. Spec gaps: invalid backend rejection; no-subcommand JSON error; file perms 755. Minor: dedup resize loop; Poppler stderr capture; symlink-aware paths. Reviewers: claude-opus (spec), feature-dev:code-reviewer (quality).
Adds media_dispatch() before is_binary_file skip at fread:727. Pretty mode shows metadata only; JSON mode passes through engine JSON. New flags: --render, --pages, --meta-only, --no-resize, --max-pages, --token-budget, --max-tokens, --no-ingest. Post-call memory ingest spawned detached via setsid; helper at mcp/memory-ingest.js (created in Phase 4) is best-effort.
- Pretty mode token estimates now read from engine JSON (was using byte-proxy formula, off by ~8x for images). - Error message field name was 'message' but engine emits 'error'. - New top-level "media_payload" key in JSON output: verbatim engine JSON for Phase 3 MCP integration to read without parsing chunk content strings.
Adds buildMediaContent() helper that maps fread's media_payload field to MCP 2025-11-25 content blocks. The fread MCP handler now short-circuits to image content for image/pdf-pages and to text for pdf-text/meta — bypassing cli()'s text-only renderer. Snake_case mime_type translated to camelCase mimeType per spec. Text-file path unchanged.
Adds mcp/memory-ingest.js — a Node MCP stdio client that ingests engine ingest_payload JSON via ShieldCortex remember(). Best-effort with 3s timeout, recall-by-hash dedupe, and graceful failure (exits non-zero with stderr diagnostic; never blocks the parent fread call). ShieldCortex command resolution: FSUITE_SHIELDCORTEX_CMD env → which shieldcortex-mcp → ~/.claude.json mcpServers → ~/.codex/config.toml. Phase 2 already wired the fread bash spawn; this fills the helper seam.
…se 5) Adds 16 new test functions to tests/test_fread.sh covering image pretty/JSON/meta/no-resize, PDF text/meta/render/page-cap/invalid- pages/encrypted/backend-fallback/budget-truncation paths. Adds tests/test_memory_ingest.sh for the Phase 4 helper covering empty stdin, malformed JSON, no-config, and timeout scenarios. Skips Poppler-dependent tests gracefully when pdftotext is absent.
- site/src/content/docs/commands/fread.md: media flags, output modes, token-cost table, backend selection, memory MCP integration, examples - site/src/content/docs/reference/changelog.md: Unreleased entry - debian/rules: install fread-media.py and mcp/memory-ingest.js to /usr/share/fsuite/ - install.sh: same paths + PyMuPDF detect-and-suggest hint - fread: add fallback resolver for memory-ingest.js installed paths
WalkthroughAdds first-class image/PDF handling to fread via a new Python engine ( Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant fread as "fread CLI"
participant engine as "fread-media.py"
participant ingest as "memory-ingest.js"
participant mcp as "ShieldCortex MCP"
User->>fread: run fread <file> + media flags
fread->>engine: probe file
engine-->>fread: probe result (type, backend, ingest_payload?)
alt media detected
fread->>engine: run engine subcommand (image/pdf) with flags
engine-->>fread: JSON result (media_payload, ingest_payload, errors)
fread->>User: pretty or JSON output (includes media_payload)
alt ingest enabled
fread->>ingest: spawn detached with ingest_payload
ingest->>ingest: validate payload, resolve MCP command
ingest->>mcp: recall (dedupe)
mcp-->>ingest: recall result
alt not duplicate
ingest->>mcp: remember (write memory)
mcp-->>ingest: ack
end
end
else engine missing/error
fread->>User: warn/skip and fall back to prior binary-skip behavior
end
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 17909ec36f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
P1 (fread:904): inline Python heredoc had `print(json.dumps(ip))` at
column 0 instead of indented under `if ip is not None:`. SyntaxError
on every media read — payload file was never populated, so detached
memory ingest silently short-circuited. Now correctly indented.
Verified end-to-end: ShieldCortex log line lands after fread call.
P2 (fread-media.py:685): stdlib fallback condition
`if estimate > args.max_tokens and not args.no_resize` was inverted
relative to the documented `--no-resize` behavior ("refuse if over
token budget"). Stdlib has no resize capability, so over-budget
images should be rejected regardless of the flag. Drop the
`and not args.no_resize` guard.
Both findings from chatgpt-codex-connector on PR #38.
There was a problem hiding this comment.
Actionable comments posted: 12
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@fread`:
- Line 774: The command invocation may pass an unintended empty-string argument
when engine_args is empty; update the call that sets engine_json (the invocation
of python3 "$ENGINE_MEDIA" "$subcmd" "$path" with "${engine_args[@]}" and
capturing engine_exit) to use the safe array-expansion pattern so no spurious
empty argument is passed (use the ${engine_args[@]+"${engine_args[@]}"} pattern
in place of "${engine_args[@]}"), preserving the existing error redirection and
the engine_exit capture.
- Around line 899-905: The inline Python passed to python3 -c relies on
flush-left indentation and is fragile if surrounding Bash code is reindented;
add a clear comment next to the printf | python3 -c invocation warning that the
Python must remain left-aligned, and better yet replace the inline -c snippet
with a heredoc or a quoted heredoc (so the Python body can be safely indented)
while keeping the same behavior of reading engine_json and writing
ingest_payload to payload_file; reference the existing usage of engine_json,
payload_file and the python3 invocation (the small Python snippet that loads
stdin and prints d.get('ingest_payload')) so reviewers know exactly where to
make the change.
- Around line 776-779: The branch that detects empty engine output currently
calls add_warning with "$path" and returns 0; change that return to 1 so that an
empty engine_json triggers the binary-skip fallback and emits the explicit
"binary file skipped" behavior; do the same for the later error branch (the
other return 0 case) so both the engine_json empty check and the error path
return 1 instead of 0, referencing the engine_json variable and the add_warning
call to locate the spots to update.
In `@fread-media.py`:
- Around line 169-204: All methods that call fitz.open (page_count,
extract_text, render_page, get_page_dims) must ensure the Document is always
closed even if an exception occurs: wrap fitz.open(path) usage in a try/finally
or use a context manager so doc.close() is guaranteed in the finally block;
preserve existing encryption checks from extract_text/render_page (check
doc.is_encrypted and doc.needs_pass and raise) and add the same check to
get_page_dims; also validate page index bounds in render_page and get_page_dims
similar to extract_text to avoid index errors before accessing doc[page].
In `@mcp/memory-ingest.js`:
- Around line 154-160: The timeout branch of the Promise.race can leave doIngest
and its StdioClientTransport child running and orphaned; modify doIngest to
accept an AbortSignal (via AbortController) and ensure the caller aborts the
controller when the timeout fires, or explicitly call/await a transport shutdown
method (e.g., StdioClientTransport.kill()/close()/destroy()) before calling
process.exit; update the Promise.race usage to create an AbortController, pass
controller.signal into doIngest, call controller.abort() in the timeout handler
(and/or call client.transport.kill()) so the child process is cleanly terminated
before exiting.
- Around line 110-118: The arguments object is including source: payload.source
even when payload.source is undefined; change the construction in the call that
builds the arguments object (the block around arguments: {...}) to only include
the source key when payload.source is defined (e.g., conditionally spread in {
source: payload.source } only if payload.source !== undefined or truthy) so the
MCP tool never receives an explicit undefined source value.
- Around line 163-165: The ternary in the log call is redundant (msg ===
"timeout" ? "error" : "error"); update the log level selection so it conveys the
intended severity: replace the ternary with the correct conditional behavior
(for example, use "warn" when msg === "timeout" and "error" otherwise) or simply
use a single literal "error" if no special-case is required; modify the call to
log(...) accordingly where msg is defined and used in memory-ingest.js
(referencing the variables msg and the log(...) invocation).
In `@site/src/content/docs/commands/fread.md`:
- Around line 166-167: Add a single blank line between the preceding fenced code
block and the "## See also" heading in fread.md so the heading is separated from
the code block; locate the "## See also" heading in the file (near the end of
the fread documentation) and insert one empty line immediately above it.
In `@site/src/content/docs/reference/changelog.md`:
- Around line 18-19: Update the changelog entry to mention both flags and
clarify their difference: add `--token-budget` to the listed new flags and a
short parenthetical or sentence explaining that `--token-budget` is fread's
global budget (also forwarded to the engine) while `--max-tokens` is a
media-specific alias that applies per-media; reference the existing flag names
`--max-tokens` and `--token-budget` so readers can see both and understand which
is global vs media-scoped.
In `@tests/run_all_tests.sh`:
- Around line 268-277: The block that invokes run_test_suite for the
memory-ingest helper (inside main()) is misindented compared to other suite
blocks; align its lines starting from the echo through the TOTAL_TESTS update to
use the same 2-space indentation as other suites so the echo, if/then/else,
TOTAL_PASSED/TOTAL_FAILED increments and failed_suites+=("memory-ingest") lines
match the file's established style and keep the
run_test_suite/TOTAL_PASSED/TOTAL_FAILED/failed_suites/TOTAL_TESTS logic
unchanged.
In `@tests/test_fread.sh`:
- Around line 1502-1509: The test uses a default list for pages_returned which
masks missing-field failures; change the extraction so pr =
f.get('pages_returned') (no default) and update the ok condition to explicitly
verify pr is a list and len(pr) < 5 and that f.get('truncated') is True, failing
when pages_returned is None or not a list; reference the variables mp, f, pr and
the keys 'pages_returned' and 'truncated' to locate where to update the
assertion logic.
In `@tests/test_memory_ingest.sh`:
- Around line 31-35: The run_test function currently accepts a test name as $1
then immediately shifts it away and never logs it; update run_test (and
reference TESTS_RUN and the run_test function) to capture the test name before
shift and print or log it (e.g., echo "Running test: $TEST_NAME") so each
invocation records the human-readable test name for debugging while preserving
the increment of TESTS_RUN and executing the remaining command arguments as
before.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3c762780-231b-47ea-8f5d-8d9ba4104ec5
⛔ Files ignored due to path filters (4)
tests/fixtures/media/big.pdfis excluded by!**/*.pdftests/fixtures/media/sample.jpgis excluded by!**/*.jpgtests/fixtures/media/sample.pdfis excluded by!**/*.pdftests/fixtures/media/sample.pngis excluded by!**/*.png
📒 Files selected for processing (11)
debian/rulesfreadfread-media.pyinstall.shmcp/index.jsmcp/memory-ingest.jssite/src/content/docs/commands/fread.mdsite/src/content/docs/reference/changelog.mdtests/run_all_tests.shtests/test_fread.shtests/test_memory_ingest.sh
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test-suite
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Use `fsuite` for the suite-level mental model, then reach for operational tools (fs, ftree, fls, fsearch, fcontent, fmap, fread, fedit, fwrite, fbash, fprobe, fcase, freplay, fmetrics) for filesystem reconnaissance, bounded reading, and surgical editing
Applied to files:
debian/rulessite/src/content/docs/reference/changelog.mdinstall.sh
🪛 markdownlint-cli2 (0.22.1)
site/src/content/docs/commands/fread.md
[warning] 166-166: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
🪛 Shellcheck (0.11.0)
tests/test_memory_ingest.sh
[info] 18-21: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 23-29: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 40-52: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 55-67: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 70-85: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 89-102: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
tests/test_fread.sh
[info] 1096-1116: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1119-1152: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1155-1180: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1183-1195: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1198-1216: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1219-1246: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1249-1270: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1273-1301: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1304-1330: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1333-1356: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1359-1419: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1422-1453: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1456-1472: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1475-1486: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1494-1516: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1519-1577: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
🔇 Additional comments (8)
debian/rules (1)
28-29: LGTM.Packaging correctly mirrors
install.sh: bothfread-media.pyandmemory-ingest.jsare installed to/usr/share/fsuite/with executable permissions.mcp/index.js (2)
1402-1434: LGTM — media content builder handles all payload types.The
buildMediaContentfunction correctly maps engine output to MCP content blocks. The snake_case to camelCase translation (mime_type→mimeType) is handled properly, and each payload type produces appropriate content block structures.
1608-1625: Media short-circuit correctly falls through on failure.The try-catch ensures any engine error or missing
media_payloadgracefully falls back to the standardcli()text path. This is the right pattern for optional feature enablement.install.sh (2)
274-277: LGTM.Memory-ingest.js installation follows the same pattern as other shared files, with proper error handling via
dieif the file is missing.
744-748: LGTM — conditional hint is user-friendly.The PyMuPDF hint only appears when the module isn't installed, avoiding noise for users who already have it.
tests/test_memory_ingest.sh (1)
89-102: Good timeout coverage.Using
FSUITE_SHIELDCORTEX_CMD="sleep 10"to trigger the internal 3s timeout is an effective way to test the timeout path without external dependencies.fread (1)
913-916: Detached spawn pattern is solid.Using
setsid bash -c '...'with positional arguments correctly isolates the background process. The temp file cleanup (rm -f "$1") inside the subshell ensures no orphaned files. The</dev/null >/dev/null 2>&1 &properly detaches from the parent's file descriptors.tests/test_fread.sh (1)
1090-1116: Tests look good.The media test suite comprehensively covers the new engine dispatch behavior, including error paths for encryption, invalid backends, and token budget enforcement.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 65e9185e12
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
P1: mcp/index.js no longer double-runs fread for non-media files;
fread-media.py wraps every fitz.open() in try/finally with
encryption checks across all PyMuPDFBackend methods;
fread media_dispatch returns 1 on empty engine output so binary
skip path can fire as fallback.
P2: mcp/memory-ingest.js TOML parser preserves the args array.
Nits: test_fread.sh strict pages_returned check; memory-ingest.js
spread-omit undefined source, kill orphan subprocess on timeout,
warn instead of error log level for timeouts; markdown blank line
in fread.md; clearer flag listing in changelog.md; consistent
indentation in run_all_tests.sh; safe empty-array expansion in
fread.
CI: poppler-utils, python3-pil, pymupdf installed for media tests.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c4cb6e47fd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
P1 (fread-media.py:891): PDF text truncation appended the full page when remaining<=0, blowing past the configured token budget. Now drops the page entirely when no budget is left, so total tokens stay under the cap. CI encrypted-PDF test: SKIP cleanly when fitz is not importable instead of FAIL. Phase 5 spec called for skip-not-fail; was missed. Also harden ci.yml's pymupdf install via sudo --break-system-packages so system Python can find it (previous --user install put it on a path that the test's `python3 -c 'import fitz'` couldn't see).
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9c9ec34b6a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…yload) P2 fread:767: --no-truncate sets TOKEN_BUDGET=0, but the engine forward guard `(( TOKEN_BUDGET > 0 ))` dropped that case to the engine default (25000), still truncating. Engine now treats token_budget=0 as unlimited (via sys.maxsize); fread forwards the value unconditionally. P2 fread:803: MEDIA_PAYLOAD_JSON was a single global; multi-file batches overwrote earlier files. Added MEDIA_PAYLOADS_JSON array, emitted as top-level "media_payloads" plural in render_json. Singular "media_payload" preserved for single-file Phase 3 MCP backward-compat.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b897c3cd37
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
P2 memory-ingest.js:74: TOML section keys with quoted names like [mcp_servers."shield.cortex"] retained the quotes when matched against nameRe, so quoted server names were silently skipped. Strip surrounding quotes before name match. P2 test_memory_ingest.sh:77: hardcoded PATH dropped Node on toolcache/nvm/asdf setups, causing exit 127 instead of exercising the no-config branch. Prepend the directory of the active node binary to PATH so the helper invocation always finds it.
Previous attempt prepended dirname(node) to PATH, but on machines where node and shieldcortex-mcp share a directory (common for ~/.local/bin), the stub still resolved shieldcortex-mcp and the test exercised the wrong branch. Switch to an mktemp dir with a node symlink only — works on toolcache/nvm/asdf without leaking other binaries.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: eca30301e3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
memory-ingest.js imports @modelcontextprotocol/sdk at module load
time. Previous install copied just the script flat to
\${share_dir}/memory-ingest.js, leaving the SDK unresolvable —
ERR_MODULE_NOT_FOUND on every memory ingest in production installs.
install.sh now copies the entire mcp/ tree (script + package.json +
node_modules) to \${share_dir}/mcp/ when node_modules exists. fread's
helper resolver prepends <share>/mcp/memory-ingest.js to the
candidate list so the colocated SDK loads correctly.
Falls back to flat install with a warning when node_modules is
missing, surfacing as 'helper not found' / 'helper exit code 1' in
the ingest log rather than silently breaking.
debian packaging unchanged for now — shipping node_modules in a
.deb is too heavy. Debian users who need memory ingest should run
'npm ci --prefix mcp' under the shared dir or use install.sh.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6f44495184
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
P1 fread:811: media JSON chunk now goes through check_budget_preadd so --max-bytes/--max-lines apply uniformly. Previously media files emitted full base64 regardless of byte cap. P2 fread:800: -o paths mode now skips the media formatter entirely; render_paths emits the path alone. Previously the pretty summary line contaminated machine-readable paths output. P2 debian/rules:29: dropped the standalone memory-ingest.js install line. Shipping node_modules in a .deb is too heavy; install.sh handles the full mcp/ tree for users who want detached ingest. Debian users get a clean 'helper not found' log instead of ERR_MODULE_NOT_FOUND.
check_budget_preadd checks both MAX_BYTES/MAX_LINES AND TOKEN_BUDGET. For PDF text reads with --token-budget, the engine already truncated internally, so bash gating on token budget rejected the truncated result entirely. Switched to manual MAX_BYTES/MAX_LINES check; the engine owns token budget for PDFs.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4dc67dad6f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…s, help) - fread: budget-blocked media now records status=budget_skipped and returns before mark_file_read/_media_maybe_ingest, so over-budget reads no longer count as successful reads or trigger memory ingest. - mcp/index.js: factor cli() catch body into formatExecError() helper so the fread MCP handler can reuse it on engine error instead of re-spawning fread, cutting double-latency and duplicate telemetry on missing/invalid inputs. - mcp/index.js: expose --meta-only, --render, --pages, --no-resize, --max-pages, --max-tokens, --no-ingest in the fread MCP inputSchema and forward them to the CLI so MCP agents can drive PDF render / image meta / ingest opt-out paths. - fread: document media flags in usage(), --install-hints (Pillow / pymupdf / poppler-utils), and --self-check (optional PIL / fitz / pdftotext probe that does not flip dependency-ok). - tests: add test_fread.sh coverage for budget_skipped status + media-aware --self-check / --install-hints, and structured-parity coverage for the new fread schema fields and the formatExecError envelope shape (isError with non-empty content[text]). PR #38, Codex P2 round 2.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1d62cbc984
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d57da1d729
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7b43ffadea
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b8270b45c4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 509cc3c262
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dceb0069b5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 52d25b9489
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7db03612d9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a1bde5dcb7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 301c3cc36e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1920b2bfa9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@coderabbitai Full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 11
♻️ Duplicate comments (2)
fread-media.py (2)
203-211:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMissing page index bounds check in
get_page_dims.Same issue as
render_page—doc[page]is accessed without bounds validation. An invalid index will raise an IndexError.Add bounds validation
def get_page_dims(self, path: str, page: int) -> dict: doc = fitz.open(path) try: if doc.is_encrypted and doc.needs_pass: raise RuntimeError("PDF is encrypted and requires a password") + if page < 0 or page >= doc.page_count: + raise RuntimeError(f"Page {page} out of range (0-{doc.page_count - 1})") r = doc[page].rect🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@fread-media.py` around lines 203 - 211, get_page_dims currently accesses doc[page] without validating the page index (same issue as render_page); add a bounds check against doc.page_count (or len(doc)) before accessing doc[page] and raise a clear exception (e.g., IndexError or ValueError) when page is negative or >= doc.page_count, ensuring the existing finally doc.close() still executes; reference get_page_dims, doc.page_count (or len(doc)), and doc[page].rect when making the change.
191-201:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMissing page index bounds check in
render_page.
PyMuPDFBackend.render_pageaccessesdoc[page]at line 196 without validating thatpageis within[0, doc.page_count). If an out-of-bounds index is passed, fitz will raise an IndexError that surfaces as a generic traceback rather than a clean error message.Add bounds validation
def render_page(self, path: str, page: int, dpi: int = 100): doc = fitz.open(path) try: if doc.is_encrypted and doc.needs_pass: raise RuntimeError("PDF is encrypted and requires a password") + if page < 0 or page >= doc.page_count: + raise RuntimeError(f"Page {page} out of range (0-{doc.page_count - 1})") pix = doc[page].get_pixmap(dpi=dpi)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@fread-media.py` around lines 191 - 201, The render_page method currently indexes doc[page] without validating bounds; add a check using doc.page_count to ensure 0 <= page < doc.page_count before accessing doc[page], and if invalid raise a clear exception (e.g., ValueError or IndexError) with a descriptive message including the requested page and total pages; keep the existing finally: doc.close() behavior so the document is always closed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/internal/specs/2026-04-29-claude-design-site-brief.md`:
- Around line 125-136: The proof-point line "**~516 tests across 16 suites**" is
stale—replace it with the authoritative test count by reading the
source-of-truth (e.g., the test-summary/CI artifact or the repo's test-report)
and update the bullet to the current value (e.g., the new "~470 tests across 15
suites" if that's what the source shows); locate and edit the exact bullet
string "**~516 tests across 16 suites**" in the spec list and ensure the updated
count is used programmatically or documented so future briefs pull from that
source automatically.
In `@fread`:
- Around line 213-222: The current truncate_utf8_bytes implementation slices raw
bytes and decodes with "ignore", which can drop partial multibyte characters;
update truncate_utf8_bytes so Python reads sys.stdin.buffer.read(), and if
len(data) > limit find the largest prefix <= limit that successfully decodes as
UTF-8 (e.g., try decoding data[:limit] with strict error handling and on
UnicodeDecodeError decrement the end index until decode succeeds), then write
that decoded string to stdout; reference truncate_utf8_bytes, the max variable,
and the sys.stdin.buffer.read() usage to locate where to implement the
safe-prefix decoding loop.
In `@fread-media.py`:
- Around line 1034-1046: The global exception handler (the "except Exception as
exc:" block) currently always adds the full traceback to the JSON error output;
change it to include the "traceback" field only when a debug flag or env var is
set (e.g., use an argparse boolean --debug stored as args.debug or an
environment variable like FREAD_DEBUG via os.environ.get), otherwise omit or
redact the traceback value; update the handler to read that flag (e.g.,
debug_mode = args.debug or bool(os.environ.get("FREAD_DEBUG"))) and
conditionally add "traceback": _tb.format_exc() to the dict before json.dump,
leaving other fields ("type","error","code") intact and still writing the JSON
and exiting with sys.exit(1).
- Around line 332-369: The function render_page currently declares return type
-> bytes but actually returns a tuple (data, width, height); update the type
annotation to reflect the actual return type (e.g., -> Tuple[bytes, int, int])
and add the necessary typing import (from typing import Tuple) at module scope;
locate the render_page definition and adjust its signature and imports
accordingly, and run static checks to ensure callers expect the tuple form.
In `@mcp/index.js`:
- Around line 1535-1596: The MCP output currently drops fread diagnostics when
only media blocks remain because textOnlyFreadPayload returns null and
buildFreadMcpContent only returns media content; fix by preserving and copying
diagnostic/meta fields from the original parsed fread result into the final
returned object even when textPayload is null. In buildFreadMcpContent (and/or
adjust textOnlyFreadPayload) ensure the final return value includes
parsed.warnings, parsed.errors, parsed.next_hint and the parsed.files array (or
files with updated .status) or include slim/structuredContent created via
renderFreadTextResult/normalizeStructuredContent so that
warnings/errors/files[].status are propagated alongside the media content
instead of being dropped.
In `@mcp/memory-ingest.js`:
- Around line 91-98: The TOML args extraction in memory-ingest.js that uses
argsRegex and then am[1].split(',') (see cmdMatch / argsRegex / am / args)
breaks on commas inside quoted strings; replace this ad-hoc split with either a
proper TOML parser (e.g., parse the whole block with a toml library to read the
args array) or implement a quoted-string-aware tokenizer for am[1] that respects
escaped quotes and commas inside quotes, then build args from those tokens and
preserve existing trimming/unquoting logic.
In `@site/src/components/PageActions.astro`:
- Around line 78-125: The markup advertises an ARIA menu but lacks full
keyboard/menuitem behavior; remove the misleading roles instead: delete
role="menu" from the div.page-actions__menu and remove role="menuitem" from all
child buttons (e.g., the buttons with data-action="copy-md", "copy-plain",
"view-md", "open-claude", "open-chatgpt") and from the anchor
(href={ghSourceUrl}) so these remain plain buttons/links, or alternatively
implement the complete menu keyboard model if you prefer to keep ARIA; prefer
the former for a minimal fix.
In `@tests/run_all_tests.sh`:
- Line 277: Move the TOTAL_TESTS assignment into the main() function so it
executes after TOTAL_PASSED and TOTAL_FAILED are finalized; specifically, locate
the global line setting TOTAL_TESTS=$((TOTAL_PASSED + TOTAL_FAILED)) and
indent/relocate it inside the main() block (after all suites run and after
TOTAL_PASSED/TOTAL_FAILED are updated) so the calculation happens at runtime
rather than at parse time.
In `@tests/test_fread.sh`:
- Around line 1913-1915: The test harness copies only "${FREAD}" into the fake
sandbox, but the runtime fread script unconditionally sources _fsuite_common.sh
and needs that helper present; update the three copy sites that use cp
"${FREAD}" "${fake_dir}/fread" (the occurrences around the mkdir/cp blocks that
reference "${fake_dir}", "${fake_bin}", and "${payload_dir}") to also copy the
helper from the test scripts directory by adding cp
"${SCRIPT_DIR}/../_fsuite_common.sh" "${fake_dir}/_fsuite_common.sh" so the fake
sandbox contains both fread and _fsuite_common.sh (repeat the same change for
the other two similar cp occurrences).
In `@tests/test_memory_ingest.sh`:
- Around line 159-160: The current trap in main() uses "trap 'true' EXIT INT
TERM", which swallows INT/TERM; change it to preserve cancellation semantics by
only trapping EXIT (e.g., "trap 'true' EXIT") and ensure INT/TERM remain default
(either remove INT/TERM from the trap or explicitly reset them with "trap - INT
TERM"); update the trap invocation that references main() and the line "trap
'true' EXIT INT TERM" accordingly.
- Around line 31-37: The run_test helper currently swallows test failures via
`"${@}" || true`, so unexpected crashes never increment TESTS_FAILED; change
run_test (function run_test) to execute the test command directly, capture its
exit code ($?) after calling the test body, and if non-zero increment
TESTS_FAILED and report the failure (e.g., echo the exit code and test name) so
that unexpected exits are treated as failures instead of being ignored; ensure
TESTS_RUN is still incremented and existing pass()/fail() logic remains
compatible.
---
Duplicate comments:
In `@fread-media.py`:
- Around line 203-211: get_page_dims currently accesses doc[page] without
validating the page index (same issue as render_page); add a bounds check
against doc.page_count (or len(doc)) before accessing doc[page] and raise a
clear exception (e.g., IndexError or ValueError) when page is negative or >=
doc.page_count, ensuring the existing finally doc.close() still executes;
reference get_page_dims, doc.page_count (or len(doc)), and doc[page].rect when
making the change.
- Around line 191-201: The render_page method currently indexes doc[page]
without validating bounds; add a check using doc.page_count to ensure 0 <= page
< doc.page_count before accessing doc[page], and if invalid raise a clear
exception (e.g., ValueError or IndexError) with a descriptive message including
the requested page and total pages; keep the existing finally: doc.close()
behavior so the document is always closed.
🪄 Autofix (Beta)
✅ Autofix completed
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6d400cf4-74f0-466c-a95d-fea2040ded84
⛔ Files ignored due to path filters (5)
AGENTS.mdis excluded by!AGENTS.mdtests/fixtures/media/big.pdfis excluded by!**/*.pdftests/fixtures/media/sample.jpgis excluded by!**/*.jpgtests/fixtures/media/sample.pdfis excluded by!**/*.pdftests/fixtures/media/sample.pngis excluded by!**/*.png
📒 Files selected for processing (26)
.github/workflows/ci.ymlREADME.mddebian/rulesdocs/internal/readme-drafts/04-cheatsheet-changelog.mddocs/internal/specs/2026-04-29-claude-design-site-brief.mdfreadfread-media.pyfsuiteinstall.shmcp/index.jsmcp/memory-ingest.jsmcp/structured-parity.test.mjssite/astro.config.mjssite/src/components/PageActions.astrosite/src/components/PageTitle.astrosite/src/content/docs/architecture/chains.mdsite/src/content/docs/commands/fread.mdsite/src/content/docs/getting-started/mental-model.mdsite/src/content/docs/index.mdxsite/src/content/docs/reference/changelog.mdsite/src/pages/[...slug].md.tssite/src/styles/custom.csssite/src/styles/custom_backup.csstests/run_all_tests.shtests/test_fread.shtests/test_memory_ingest.sh
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
docs/**
⚙️ CodeRabbit configuration file
docs/**: Check for: accurate CLI examples, no hardcoded personal paths or credentials,
and clear explanations of flags and defaults.
Files:
docs/internal/specs/2026-04-29-claude-design-site-brief.mddocs/internal/readme-drafts/04-cheatsheet-changelog.md
README.md
⚙️ CodeRabbit configuration file
README.md: Check for: accurate examples, correct filenames/paths, and no stale references.
Files:
README.md
🧠 Learnings (16)
📓 Common learnings
Learnt from: CR
Repo: lliWcWill/fsuite
Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fsuite` as the primary suite-level mental model for filesystem operations
Learnt from: CR
Repo: lliWcWill/fsuite
Timestamp: 2026-04-29T23:56:08.317Z
Learning: Prefer `-o json` for programmatic decisions in fsuite tool calls
Learnt from: CR
Repo: lliWcWill/fsuite
Timestamp: 2026-04-29T23:56:08.317Z
Learning: Prefer `-o paths` when piping fsuite tool output into another tool
Learnt from: CR
Repo: lliWcWill/fsuite
Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `-q` flag for existence checks and silent control flow in fsuite tools
Learnt from: CR
Repo: lliWcWill/fsuite
Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fbash` to enable Unix pipe chains when combining two or more producer/consumer steps in fsuite
Learnt from: CR
Repo: lliWcWill/fsuite
Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fmap` and `fread --symbol` before using broad `fcontent` searches
Learnt from: CR
Repo: lliWcWill/fsuite
Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fcontent` for exact-text confirmation only after narrowing search scope, not as the first conceptual search
Learnt from: CR
Repo: lliWcWill/fsuite
Timestamp: 2026-04-29T23:56:08.317Z
Learning: Open `fcase init` at the start of any non-trivial investigation and close with `fcase resolve`
Learnt from: CR
Repo: lliWcWill/fsuite
Timestamp: 2026-04-29T23:56:08.317Z
Learning: Check `fcase find --status all --deep` before starting new work to discover past investigations
Learnt from: CR
Repo: lliWcWill/fsuite
Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fread --symbol NAME` or `--lines START:END` instead of reading entire files
Learnt from: CR
Repo: lliWcWill/fsuite
Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fedit --lines` when line numbers are available from fread for fastest editing with zero ambiguity
Learnt from: CR
Repo: lliWcWill/fsuite
Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fedit --function_name` for symbol-scoped edits without requiring large unique context strings
Learnt from: CR
Repo: lliWcWill/fsuite
Timestamp: 2026-04-29T23:56:08.317Z
Learning: Never edit code blind; always inspect context with `fread` before calling `fedit`
Learnt from: CR
Repo: lliWcWill/fsuite
Timestamp: 2026-04-29T23:56:08.317Z
Learning: Run `fsuite` once at the beginning of work to establish the mental model
Learnt from: CR
Repo: lliWcWill/fsuite
Timestamp: 2026-04-29T23:56:08.317Z
Learning: Run `ftree --snapshot` once to establish territory and avoid rediscovering the repository unless the target changes
Learnt from: CR
Repo: lliWcWill/fsuite
Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fs` as the default search entry point and let it route to the appropriate fsuite tool
Learnt from: CR
Repo: lliWcWill/fsuite
Timestamp: 2026-04-29T23:56:08.317Z
Learning: Use `fcase` investigation continuity ledger to track findings, evidence, and handoff state across context compaction
Learnt from: CR
Repo: lliWcWill/fsuite
Timestamp: 2026-04-29T23:56:08.317Z
Learning: Pipe fsuite tool output producers (`fsearch -o paths`, `fcontent -o paths`) into tool consumers for efficient chaining
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Use `fsuite` for the suite-level mental model, then reach for operational tools (fs, ftree, fls, fsearch, fcontent, fmap, fread, fedit, fwrite, fbash, fprobe, fcase, freplay, fmetrics) for filesystem reconnaissance, bounded reading, and surgical editing
Applied to files:
debian/rulestests/run_all_tests.shsite/src/components/PageActions.astrofread-media.pysite/src/content/docs/getting-started/mental-model.mdmcp/memory-ingest.jsinstall.shsite/src/content/docs/index.mdxsite/src/content/docs/commands/fread.mdsite/src/content/docs/architecture/chains.mdtests/test_memory_ingest.shdocs/internal/readme-drafts/04-cheatsheet-changelog.mdsite/src/content/docs/reference/changelog.mdfsuitefreadtests/test_fread.shmcp/index.jsREADME.md
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Run `fsuite` once at the start to establish the suite-level mental model; do not re-read it
Applied to files:
debian/rulessite/src/content/docs/getting-started/mental-model.mdsite/src/content/docs/index.mdxfsuite
📚 Learning: 2026-04-29T22:12:50.298Z
Learnt from: lliWcWill
Repo: lliWcWill/fsuite PR: 39
File: site/src/pages/[...slug].md.ts:18-23
Timestamp: 2026-04-29T22:12:50.298Z
Learning: In `site/src/pages/[...slug].md.ts` (fsuite repo), the `index.mdx` entry intentionally maps to slug `index` → URL `/fsuite/index.md`. PageActions is fixed (commit a5a25ca) to target that exact URL. Do NOT suggest normalizing the index slug to an empty string, as that would produce the non-standard `/fsuite/.md` URL.
Applied to files:
site/astro.config.mjssite/src/pages/[...slug].md.tssite/src/components/PageActions.astrosite/src/content/docs/index.mdx
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Use `fs` as the default search entry point and let it auto-route to the appropriate tool
Applied to files:
site/src/content/docs/getting-started/mental-model.mdsite/src/content/docs/index.mdxfsuitemcp/index.jsREADME.md
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Prefer `fmap` + `fread --symbol` before broad `fcontent` searches
Applied to files:
site/src/content/docs/getting-started/mental-model.mdsite/src/content/docs/index.mdxfsuitefreadmcp/index.jsREADME.md
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Run `ftree --snapshot` once to establish territory and map the codebase structure. Do not rediscover the repo unless the target changes
Applied to files:
site/src/content/docs/getting-started/mental-model.mdfsuiteREADME.md
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Never edit blindly. Always inspect context with `fread` before calling `fedit`
Applied to files:
site/src/content/docs/getting-started/mental-model.mdREADME.md
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Use `fedit --lines` when you have specific line numbers from `fread`. This provides the fastest edit mode with zero ambiguity
Applied to files:
site/src/content/docs/getting-started/mental-model.mdfread
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Use `fedit --function_name` for symbol-scoped edits without needing large unique context strings
Applied to files:
site/src/content/docs/getting-started/mental-model.md
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Open `fcase init` at the start of any non-trivial investigation and close with `fcase resolve`
Applied to files:
site/src/content/docs/getting-started/mental-model.mdsite/src/content/docs/index.mdxsite/src/content/docs/architecture/chains.mdREADME.md
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Use `fread --symbol NAME` or `--lines START:END` for targeted reading. Do not read whole files without bounds
Applied to files:
site/src/content/docs/getting-started/mental-model.mdsite/src/content/docs/index.mdxfreadREADME.md
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Always check `fcase find --status all --deep` before starting new work to see if past investigations have already answered the question
Applied to files:
site/src/content/docs/getting-started/mental-model.mdREADME.md
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Prefer `-o paths` when piping output into another tool
Applied to files:
fread
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Prefer `-o json` for programmatic decisions and parsing
Applied to files:
fread
📚 Learning: 2026-04-10T17:12:34.291Z
Learnt from: CR
Repo: lliWcWill/fsuite PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-10T17:12:34.291Z
Learning: Prefer `pretty` format only for human terminal output
Applied to files:
fread
🪛 LanguageTool
docs/internal/specs/2026-04-29-claude-design-site-brief.md
[style] ~144-~144: Consider using an alternative to strengthen your wording.
Context: ...ware." 3. AI agent tool authors who want to learn from fsuite's design decisions...
(WANT_KEEN)
[grammar] ~164-~164: Ensure spelling is correct
Context: ...hero diagram* of the sensor metaphor (fsuite tools as drones, target repo as terrain...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~164-~164: Ensure spelling is correct
Context: ...flexes"** mapping native tool habits to fsuite equivalents. This is the page that, if ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~183-~183: Ensure spelling is correct
Context: ... the new media reading. - Diagram for fcase lifecycle — shows the init → note → r...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[style] ~191-~191: This phrasing can be overused. Try elevating your writing with a more formal alternative.
Context: ...) is correct. Reorder within sections if you want, but the top-level shape stays. - Don't...
(IF_YOU_WANT)
[grammar] ~199-~199: Ensure spelling is correct
Context: ...r GitHub Pages serving. --- ## 8. The fsuite voice (so your copy sounds right) - **Direct...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~222-~222: Ensure spelling is correct
Context: ...ns"; Episode 3 is the latest milestone. These are the real story content. - `docs/interna...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[style] ~275-~275: Consider using a different verb to strengthen your wording.
Context: ...master or merge the PR yourself If you find a problem outside your scope (e.g., a t...
(FIND_ENCOUNTER)
site/src/content/docs/index.mdx
[style] ~42-~42: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...output. They do not rank their results. They do not know what a token costs. **fsui...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
site/src/content/docs/architecture/chains.md
[grammar] ~94-~94: Ensure spelling is correct
Context: ...-branch">fprobe patch ## fcase lifecycle The investigation ledger isn...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~108-~108: Ensure spelling is correct
Context: ...close + archive ## Replay chain Rerun a traced investigation ste...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.22.1)
docs/internal/specs/2026-04-29-claude-design-site-brief.md
[warning] 32-32: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 88-88: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 96-96: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 100-100: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 105-105: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 108-108: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 111-111: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 114-114: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
🪛 Shellcheck (0.11.0)
tests/test_memory_ingest.sh
[info] 18-21: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 23-29: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 42-54: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 57-69: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 72-98: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 100-138: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 142-155: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
tests/test_fread.sh
[info] 1096-1116: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1119-1152: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1154-1239: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1241-1325: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1327-1352: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1354-1391: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1393-1430: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1433-1458: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1461-1473: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1476-1494: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1497-1524: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1527-1548: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1551-1579: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1582-1608: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1611-1634: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1637-1657: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1660-1726: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1729-1760: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1763-1793: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1796-1838: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1841-1857: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1860-1871: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1873-1904: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1906-1984: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 1992-2014: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 2017-2036: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 2038-2092: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 2094-2148: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 2150-2208: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 2211-2238: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 2241-2249: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
[info] 2252-2260: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
🪛 Stylelint (17.9.0)
site/src/styles/custom.css
[error] 12-12: Expected "url('https://fonts.googleapis.com/css2?family=Geist:wght@300;400;500;600;700;800;900&family=JetBrains+Mono:wght@300;400;500;600;700&display=swap')" to be "'https://fonts.googleapis.com/css2?family=Geist:wght@300;400;500;600;700;800;900&family=JetBrains+Mono:wght@300;400;500;600;700&display=swap'" (import-notation)
(import-notation)
[error] 50-50: Expected "SFMono-Regular" to be "sfmono-regular" (value-keyword-case)
(value-keyword-case)
[error] 50-50: Expected "Menlo" to be "menlo" (value-keyword-case)
(value-keyword-case)
[error] 50-50: Expected "Consolas" to be "consolas" (value-keyword-case)
(value-keyword-case)
[error] 104-104: Expected "optimizeLegibility" to be "optimizelegibility" (value-keyword-case)
(value-keyword-case)
🔇 Additional comments (15)
fsuite (1)
23-24: LGTM!Tool roster updated to 14, and the
freaddescription correctly reflects the new image/PDF media engine dispatch. Help text is consistent and accurate.Also applies to: 39-59
docs/internal/readme-drafts/04-cheatsheet-changelog.md (1)
381-427: LGTM!Changelog accurately documents the v2.4.0 media reading features, new flags, status contract fixes, and test expansions. No hardcoded paths or credentials. Flag explanations are clear.
.github/workflows/ci.yml (1)
32-37: LGTM!Media test dependencies installed with appropriate fallback handling. The
|| trueon pymupdf ensures CI continues with Poppler backend if pip install fails.site/src/styles/custom_backup.css (1)
1-38: LGTM!Monokai-inspired theme variables and presentational tweaks. No issues.
site/astro.config.mjs (1)
17-21: LGTM!Standard Starlight component override for custom PageTitle. Path is correct relative to astro.config.mjs.
debian/rules (1)
28-35: LGTM!
fread-media.pyinstalled to/usr/share/fsuite/with correct permissions. The comment clearly explains whymemory-ingest.jsis excluded from the Debian package (runtime Node.js dependency constraints).site/src/content/docs/architecture/chains.md (1)
1-127: LGTM.Documentation with visual pipeline diagrams. Static analysis spelling warnings for "fcase" and "Replay" are false positives — these are tool names.
site/src/content/docs/index.mdx (1)
1-191: LGTM.Marketing content and visual diagrams. The "They do not..." repetition flagged by static analysis is intentional rhetorical emphasis, not a grammar error.
install.sh (1)
274-293: LGTM.MCP helper installation logic correctly handles the node_modules dependency. Creates target directory first, then copies the JS, package.json, and node_modules together. Warning path when node_modules is missing prevents broken installs.
site/src/content/docs/getting-started/mental-model.md (1)
1-115: LGTM.Documentation with updated reflex translation table and visual diagrams. Consistent with updated
fedit --symbolterminology.fread (2)
772-948: LGTM on media_dispatch.Previous review issues addressed: safe array expansion (line 831), empty output returns 1 (lines 833-836), heredoc comment (lines 985-987). Error path returns 0 with structured error (lines 842-851) is intentional for correlation.
950-1014: LGTM on _media_maybe_ingest.Detached spawn with proper cleanup. Helper resolution falls back gracefully. Temp file cleanup in both setsid and non-setsid paths.
site/src/styles/custom.css (1)
1-1175: LGTM.Comprehensive theme overhaul. Stylelint warnings are stylistic preferences — font family casing should be preserved for cross-platform compatibility,
@import url()is valid CSS syntax.site/src/pages/[...slug].md.ts (1)
40-43: No action required.entry.bodyis properly available in@astrojs/starlight'sdocsLoader—retainBodydefaults totrueas of Astro 5.17.0. The fallback (?? '') is already present and correct.> Likely an incorrect or invalid review comment.tests/test_fread.sh (1)
1956-1965: Thecommand()shim reliably forces the no-setsidbranch—no issue here.The probe confirms the exported function is properly used in child bash processes:
command_v_status=42shows the function's custom exit code, not the builtin's behavior. The stderr noise (id: command not found) confirms the override is active. The test mechanism works as intended.> Likely an incorrect or invalid review comment.
|
Note Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it. Fixes Applied SuccessfullyFixed 13 file(s) based on 11 unresolved review comments. Files modified:
Commit: The changes have been pushed to the Time taken: |
Fixed 13 file(s) based on 11 unresolved review comments. Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
fread media reading: from binary skip to document-aware agent I/O
This PR turns
freadfrom a text-only reader that skipped binary files into a first-class document and media reader for agents. The original seam was small and blunt:freaddetected binary content and stepped aside. This branch replaces that limitation with a layered media path that can inspect images, extract PDF text, optionally render PDF pages, return MCP-native image blocks, and record best-effort document memories without making the read path fragile.It also became a full hardening episode for fsuite itself. The first version added the capability. The review rounds turned it into something sharper: byte-accurate budgets, UTF-8-safe truncation, deterministic failure modes, source/install parity, MCP schema correctness, CI-proof executable bits, and a much better docs/site surface for how agents should use the suite.
Current state
codex/fread-media-pdf-imagemasterc4da4735e61512c84e577c84edbdd6342931541bc4da4735e61512c84e577c84edbdd6342931541bbash tests/run_all_tests.shpassed 22/22 suitesExecutive summary
freadcan now read:--meta-only--renderThe feature is intentionally conservative:
Why this matters
Before this PR, agents had to leave fsuite to inspect screenshots, diagrams, PDFs, invoices, papers, or scanned-ish documents. That broke the workflow contract in three ways:
freadexperience.After this PR, media is part of the same
freadflow as source code:This keeps the doctrine intact: fsuite remains the agent's fast, structured, auditable read layer.
What shipped
1.
fread-media.py: media engineNew Python engine for image/PDF handling:
freadand MCP consumers.The engine deliberately separates document reading from shell formatting. Python owns binary/media behavior; bash owns CLI orchestration.
2.
fread: CLI dispatch and contract preservationfreadnow dispatches image/PDF files before the old binary skip path:fread-media.py.files[]so-o jsonand-o pathsconsumers can make honest decisions.New/expanded flags include:
--render--pages--meta-only--no-resize--max-pages--max-tokens--token-budget--no-ingest--no-truncatebehavior for media pathsThe final review rounds hardened the budget semantics heavily: byte caps count bytes, not characters; UTF-8 truncation preserves character boundaries; PDF defaults preserve the engine's 25k token guard; zero budget correctly means unlimited; and pretty-mode media output respects remaining global line/byte budgets across batches.
3. MCP integration
mcp/index.jsnow understandsfreadmedia output and emits MCP-native content blocks:type: "image"blocks plus text metadata.freadrequests no longer get rerun just to format errors.Important contract decision: the MCP path uses the MCP content shape (
mimeType, top-leveldata), not Anthropic Messages API's nestedsource/media_typeshape. That distinction matters because copying the wrong shape would silently make the feature unusable for MCP clients.4. Memory ingest: ShieldCortex bridge, best-effort by design
New
mcp/memory-ingest.jshelper gives media reads a path into ShieldCortex without making reads depend on memory health.Behavior:
~/.cache/fsuite/memory-ingest.log.freadoperation.FSUITE_MEMORY_INGEST=0or--no-ingest.This is the first concrete instance of the larger ShieldCortex/fsuite integration direction: fsuite reads become searchable memory events, while still behaving as reliable CLI operations when memory is unavailable.
5. Tests and fixtures
This PR adds a large media-focused test surface:
pdfinfofailures.files[].Fixtures added:
tests/fixtures/media/sample.pngtests/fixtures/media/sample.jpgtests/fixtures/media/sample.pdftests/fixtures/media/big.pdf6. Packaging and install parity
This PR wires the new media pieces into install and package flows:
install.shinstallsfread-media.py.install.shinstallsmcp/memory-ingest.jsand itsnode_moduleswhen available.The final CodeRabbit autofix accidentally stripped executable bits from six source scripts. That was caught by CI, fixed in
c4da473, and then verified by the full local suite and GitHub Actions. The lesson is simple and durable: source checkout executability is part of the CLI contract, not decorative metadata.7. Documentation and site maturity
This PR also expands the public/docs surface around the feature and around the suite's identity:
README.mddocuments image/PDF reads.AGENTS.mddocuments fbash MCP unlock and CLI pipe patterns.site/src/content/docs/commands/fread.mddocuments media flags and behavior.site/src/content/docs/commands/fbash.mdreferences the new chain behavior.site/src/content/docs/architecture/chains.mdclarifies command-chain thinking.site/src/content/docs/getting-started/mental-model.mdimproves the suite mental model.site/src/content/docs/reference/changelog.mdrecords the work.This matters because the feature is not only a backend capability. Agents and humans both need to understand the operating model: when to read text, when to render, when budgets apply, and why media reads stay structured.
Review hardening timeline
The first pass made the feature real. The review cycles made it trustworthy.
Foundation review fixes
Early review tightened the media engine before it became the public contract:
Codex review rounds
Codex feedback repeatedly found edge cases that matter for agent consumers:
freadcalls just to format an error.--token-budget 0and--no-truncatemust mean unlimited for images too.setsidmust not leak temp payloads.pdfinfoand extraction failures must surface as errors, not empty successful documents.CodeRabbit review and autofix round
CodeRabbit caught additional items in tests, docs, accessibility, and parsing:
Then CodeRabbit's own autofix introduced a source-mode regression by stripping executable bits from CLI/test/MCP entrypoints. That regression was isolated, fixed, and verified. It is now part of the story of this PR because it made the source/install parity contract explicit.
Commit ledger
4672113fread-media.pyimage/PDF engine.9c1525c0c06c81freaddispatch for image/PDF files.59ed86957d9033f374bfc3918f9217909ec65e9185c4cb6e49c9ec34b897c3c96ac97beca3030f80a0ab6f44495965f7834dc67dadd02d2029297ce4b180060e16bc3files[].status = media_erroron engine failure.da50f34b7002992346902b2b72b7cc92e22e2edb7a1d62cbcd57da1d7b43ffab8270b4509cc3cdceb00652d25b9pdfinfofailures.7db0361a1bde5d301c3cc1920b2b0bbc6155547fa2c4da473File-level map
Core runtime
fread: media dispatch, flags, output formatting, budgets, statuses, memory ingest launch.fread-media.py: image/PDF engine, backend abstraction, parsing, rendering, metadata, error envelopes.mcp/index.js: schema flags, media block conversion, mixed output merging, caps, error handling.mcp/memory-ingest.js: best-effort ShieldCortex write path.Install/package
install.sh: installs media engine/helper and reports optional media dependencies.debian/rules: includes the new media runtime pieces..github/workflows/ci.yml: supports the expanded suite/dependency needs.Tests
tests/test_fread.sh: main media/budget/error regression suite.tests/test_memory_ingest.sh: memory helper behavior.mcp/structured-parity.test.mjs: MCP schema/content parity.tests/run_all_tests.sh: final aggregate runner.tests/fixtures/media/*: deterministic media fixtures.Docs/site
README.mdAGENTS.mddocs/internal/readme-drafts/04-cheatsheet-changelog.mddocs/internal/specs/2026-04-29-claude-design-site-brief.mdsite/src/content/docs/commands/fread.mdsite/src/content/docs/commands/fbash.mdsite/src/content/docs/architecture/chains.mdsite/src/content/docs/getting-started/mental-model.mdsite/src/content/docs/reference/changelog.mdsite/src/components/PageActions.astrosite/src/pages/[...slug].md.tssite/src/styles/custom.cssImportant design decisions
PDF default is text, not render
Rendering PDF pages is expensive. Text extraction gives agents the information they usually need at a fraction of the token cost. Render mode exists for layouts, diagrams, signatures, scanned-ish pages, and visual inspection, but it is explicit:
Media budgets are part of the contract
Budget behavior is not just a UI nicety. MCP callers,
fbash, and downstream agents depend on it. Review hardening made this explicit:Memory ingest is best effort
A read tool must stay reliable even if memory is offline, misconfigured, slow, or missing dependencies. Memory ingest therefore runs detached, logs failures, supports opt-out, and does not change the success/failure of the read itself.
MCP output must be native MCP, not copied from another API
This branch avoided the easy wrong move: copying Anthropic Messages API image blocks into MCP. The MCP adapter emits MCP content blocks with the correct field names and shape.
Source checkout behavior matters as much as installed behavior
CI caught this hard: if scripts are executable in installed form but not executable in source form, the development and MCP paths break. The final commit exists solely to restore that source-level contract.
Verification
Local verification
git diff --checkpassed.bash tests/test_fread.shpassed 96/96 during final repair validation.bash tests/test_fread_symbols.shpassed 12/12 during final repair validation.bash tests/test_coderabbit_fixes.shpassed 14/14 during final repair validation.bash tests/test_memory_ingest.shpassed 5/5 after local MCP deps install.cd mcp && npm testpassed 49/49 after local MCP deps install.bash tests/run_all_tests.shpassed 22/22 suites.GitHub Actions
Latest run:
c4da4735e61512c84e577c84edbdd6342931541btest-suiteReview triage
What we learned
1. Media reading is not just base64 transport
The real feature is not "can return an image." The real feature is controlled, inspectable, budget-aware document reading that behaves consistently across CLI, JSON, MCP, and tests.
2. PDF rendering must be explicit
Default-rendering PDFs would burn context and make
freadfeel flashy but irresponsible. Text-first preserves fsuite's token-budget-first identity.3. Failure shape is part of product quality
Silent empty PDFs, swallowed Poppler errors, tracebacks in JSON, and success statuses without emitted content all produce bad agent behavior. This PR hardens those states into explicit errors/statuses.
4. Mixed batches are a first-class workflow
Agents do not always read one file at a time. A useful
freadmust handle batches with code, docs, images, and PDFs together without silently dropping part of the response.5. Memory should enrich reads, not hold them hostage
ShieldCortex ingest is valuable, but the read operation is the primary contract. If memory fails, the read still succeeds.
6. Review bots are useful, but verification owns the result
The review bots found real issues. They also introduced a real regression. The mature workflow is not blind acceptance or defensive rejection; it is verify, fix, prove, and keep the commit trail honest.
Out of scope
This PR intentionally does not add:
.ipynbreading.documentcategory.Those are good next chapters, but this PR establishes the media/document foundation first.
How to try it
Closing note
This started as "let
freadread pictures and PDFs." It ended as a broader maturity pass on the suite's agent contract: media is structured, budgets are real, errors are explicit, MCP blocks are native, installs are complete, source scripts remain executable, docs explain the workflow, and CI proves it.That is the shape of a tool growing up: not just more capability, but fewer lies at the boundaries.