[ANE-2795] Fix UTF-8 encoding for ficus output on Windows#1646
[ANE-2795] Fix UTF-8 encoding for ficus output on Windows#1646Conor-FOSSA wants to merge 3 commits intomasterfrom
Conversation
Use Conduit with decodeUtf8Lenient for stderr handling, matching the approach already used for stdout. This prevents crashes on Windows where the default system encoding (CP1252) cannot decode UTF-8 characters like box-drawing characters (U+2501) used in ficus progress output. The decodeUtf8Lenient function safely handles any byte sequence by replacing invalid UTF-8 with the Unicode replacement character (U+FFFD) instead of throwing an exception. Fixes: ANE-2795 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Set explicit UTF-8 encoding on debug log file handles. On Windows, file handles default to the system code page (CP1252), which cannot encode Unicode characters like box-drawing characters (U+2501) used in ficus's Rust SPANTRACE output. Without this fix, running `fossa analyze --x-vendetta --debug` on Windows crashes with: commitBuffer: invalid argument (cannot encode character '\9473') This complements the previous fix for reading ficus output by ensuring the writing side also handles UTF-8 correctly. Fixes: ANE-2795 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
WalkthroughThe pull request refactors stderr handling to use a Conduit-based streaming pipeline with lenient UTF-8 decoding, line splitting, optional tee to a debug file, timestamping, and line prefixing with "STDERR". It accumulates and retains only the most recent 50 lines. The change sets UTF-8 encoding on debug file handles and aligns stderr processing with the existing Conduit-based stdout pipeline. No public or exported interfaces were changed. 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/App/Fossa/Ficus/Analyze.hs (1)
358-389: CoredecodeUtf8Lenientapproach is correct; count tracking andStringintermediates can be cleaned up.The switch to
CC.sourceHandle .| CC.decodeUtf8Lenient .| CC.linesUnboundeddirectly mirrors howstreamFicusOutputhandles stdout and is the right fix for Windows CP encoding issues.Two optional cleanups worth considering:
1. Redundant count tracking. The
_counttuple element is discarded at the call-site ((acc, _count) <-), so it is never used outside the fold. The claim that trackingcountgives O(1) truncation doesn't hold:take 50on a list of ≤50+1 elements is already O(50) = O(1), matching the constant bound. The fold state can be simplified to just[Text]:♻️ Simplify accumulator to drop redundant count
- (acc, _count) <- - Conduit.runConduit $ - CC.sourceHandle handle - .| CC.decodeUtf8Lenient - .| CC.linesUnbounded - .| CC.foldM - ( \(acc, count) line -> do - -- Tee raw line to file if debug mode - traverse_ (\fileH -> hPutStrLn fileH (toString line)) maybeFile - now <- getCurrentTime - let ts = formatTime defaultTimeLocale "%H:%M:%S.%3q" now - let msg = toText $ "[" ++ ts ++ "] STDERR " <> toString line - -- Keep at most the last 50 lines of stderr - -- I came up with 50 lines by looking at a few different error traces and making - -- sure that we captured all of the relevant error output, and then going a bit higher - -- to make sure that we didn't miss anything. I'd rather capture a bit too much than not enough. - -- Use cons (:) for O(1) prepending, track count explicitly for O(1) truncation - let newAcc = - if count >= 50 - then take 50 (msg : acc) - else msg : acc - let newCount = min (count + 1) 50 - pure (newAcc, newCount) - ) - ([], 0 :: Int) - pure (reverse acc) -- Reverse at the end to get correct order + acc <- + Conduit.runConduit $ + CC.sourceHandle handle + .| CC.decodeUtf8Lenient + .| CC.linesUnbounded + .| CC.foldM + ( \acc line -> do + -- Tee raw line to file if debug mode + traverse_ (\fileH -> hPutStrLn fileH (toString line)) maybeFile + now <- getCurrentTime + -- Keep at most the last 50 lines of stderr (newest first during accumulation) + let ts = Text.pack (formatTime defaultTimeLocale "%H:%M:%S.%3q" now) + let msg = "[" <> ts <> "] STDERR " <> line + pure (take 50 (msg : acc)) + ) + [] + pure (reverse acc) -- Reverse at the end to get correct order2.
Stringintermediates inmsgconstruction. Line 376 buildsmsgviatoText $ "[" ++ ts ++ "] STDERR " <> toString line, mixing++/<>onStringbefore converting back toText. This violates the coding guideline to preferTextoverString. The fix above also addresses this by constructingmsgentirely inTextusingText.packon theformatTimeresult and<>onTextvalues.As per coding guidelines: "In Haskell, use
Textinstead ofString."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/App/Fossa/Ficus/Analyze.hs` around lines 358 - 389, The fold in consumeStderr can be simplified: drop the explicit count from the fold state and use a single [Text] accumulator (remove the (acc,count) tuple and _count binding), then prepend each new Text line and trim to 50 via take 50 (fold state: [Text]); also avoid String intermediates when building msg by packing the timestamp (formatTime result) to Text (e.g. Text.pack) and constructing msg entirely with Text concatenation/<> instead of using ++ or toString conversions—update the CC.foldM lambda (currently using (acc, count) and toText/toString) to work with Text-only values and the simplified accumulator.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/App/Fossa/Ficus/Analyze.hs`:
- Around line 358-389: The fold in consumeStderr can be simplified: drop the
explicit count from the fold state and use a single [Text] accumulator (remove
the (acc,count) tuple and _count binding), then prepend each new Text line and
trim to 50 via take 50 (fold state: [Text]); also avoid String intermediates
when building msg by packing the timestamp (formatTime result) to Text (e.g.
Text.pack) and constructing msg entirely with Text concatenation/<> instead of
using ++ or toString conversions—update the CC.foldM lambda (currently using
(acc, count) and toText/toString) to work with Text-only values and the
simplified accumulator.
- Remove redundant count tracking from stderr fold (take 50 on a bounded list is already O(1)) - Use Text throughout msg construction instead of String intermediates - Remove trailing blank line to satisfy formatter 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/App/Fossa/Ficus/Analyze.hs (1)
370-377: Optional: avoidText → Stringround-trip when teeing to file.Both here and in
streamFicusOutput(line 317),hPutStrLn fileH (toString line)convertsTexttoStringonly forSystem.IO.hPutStrLnto re-encode it. UsingData.Text.IO.hPutStrLnwould writeTextdirectly through the handle's UTF-8 encoder, avoiding the intermediateStringallocation.♻️ Suggested change (applies to both line 317 and 372)
Add the import:
import Data.Text.IO qualified as Text.IOThen in both tee sites:
- traverse_ (\fileH -> hPutStrLn fileH (toString line)) maybeFile + traverse_ (\fileH -> Text.IO.hPutStrLn fileH line) maybeFileAs per coding guidelines, "In Haskell, use
Textinstead ofString".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/App/Fossa/Ficus/Analyze.hs` around lines 370 - 377, The tee currently converts Text to String via toString when writing to handles (see the hPutStrLn usage inside the lambda that updates stderr accumulators and the similar site in streamFicusOutput), causing an unnecessary Text→String→encoding round-trip; import Data.Text.IO qualified as Text.IO and replace the System.IO.hPutStrLn calls that pass (toString line) with Text.IO.hPutStrLn using the Text value (e.g. write Text.IO.hPutStrLn fileH line), keeping the rest of the logic (maybeFile, acc update, timestamping) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/App/Fossa/Ficus/Analyze.hs`:
- Around line 370-377: The tee currently converts Text to String via toString
when writing to handles (see the hPutStrLn usage inside the lambda that updates
stderr accumulators and the similar site in streamFicusOutput), causing an
unnecessary Text→String→encoding round-trip; import Data.Text.IO qualified as
Text.IO and replace the System.IO.hPutStrLn calls that pass (toString line) with
Text.IO.hPutStrLn using the Text value (e.g. write Text.IO.hPutStrLn fileH
line), keeping the rest of the logic (maybeFile, acc update, timestamping)
unchanged.
Overview
This PR fixes UTF-8 encoding issues when running
fossa analyze --x-vendetta --debugon native Windows. The CLI was crashing when ficus output contained Unicode characters (like box-drawing characters from Rust's SPANTRACE formatting).Acceptance criteria
When users run
fossa analyze --x-vendetta --debugon native Windows behind a corporate TLS proxy, the CLI should:commitBuffer: invalid argument (cannot encode character '\9473')Testing plan
fossa analyze --x-vendetta --debugagainst a project━), the CLI should handle it gracefullyfossa.ficus-stdout.logandfossa.ficus-stderr.logare created without encoding errorsNote: Full reproduction requires a Windows machine behind a corporate TLS-inspecting proxy to trigger the ficus TLS error that produces the problematic SPANTRACE output.
Risks
Low risk - this is a targeted fix that only affects:
decodeUtf8Lenient)Metrics
N/A
References
Checklist
docs/.docs/README.msand gave consideration to how discoverable or not my documentation is.Changelog.md. If this PR did not mark a release, I added my changes into an## Unreleasedsection at the top..fossa.ymlorfossa-deps.{json.yml}, I updateddocs/references/files/*.schema.jsonAND I have updated example files used byfossa initcommand. You may also need to update these if you have added/removed new dependency type (e.g.pip) or analysis target type (e.g.poetry).docs/references/subcommands/<subcommand>.md.🤖 Generated with Claude Code