[ANE-2795] Fix UTF-8 encoding for ficus output on Windows by Conor-FOSSA · Pull Request #1646 · fossas/fossa-cli

Conor-FOSSA · 2026-02-19T17:43:58Z

Overview

This PR fixes UTF-8 encoding issues when running fossa analyze --x-vendetta --debug on native Windows. The CLI was crashing when ficus output contained Unicode characters (like box-drawing characters from Rust's SPANTRACE formatting).

Acceptance criteria

When users run fossa analyze --x-vendetta --debug on native Windows behind a corporate TLS proxy, the CLI should:

Successfully read ficus stdout/stderr containing Unicode characters
Successfully write debug log files containing Unicode characters
No longer crash with commitBuffer: invalid argument (cannot encode character '\9473')

Testing plan

On a native Windows machine (not WSL), run fossa analyze --x-vendetta --debug against a project
If ficus encounters an error with SPANTRACE output containing box-drawing characters (U+2501 ━), the CLI should handle it gracefully
Check that fossa.ficus-stdout.log and fossa.ficus-stderr.log are created without encoding errors

Note: Full reproduction requires a Windows machine behind a corporate TLS-inspecting proxy to trigger the ficus TLS error that produces the problematic SPANTRACE output.

Risks

Low risk - this is a targeted fix that only affects:

How stderr is read from the ficus subprocess (now uses Conduit with decodeUtf8Lenient)
How debug log files are opened (now explicitly set to UTF-8 encoding)

Metrics

N/A

References

ANE-2795: Vendetta analysis fails on native Windows with encoding error

Checklist

I added tests for this PR's change (or explained in the PR description why tests don't make sense).
- This is difficult to test automatically as it requires Windows with a specific code page setting. The fix is straightforward and based on well-documented Haskell behavior.
If this PR introduced a user-visible change, I added documentation into docs/.
- No user-facing documentation needed - this is a bug fix.
If this PR added docs, I added links as appropriate to the user manual's ToC in docs/README.ms and gave consideration to how discoverable or not my documentation is.
- N/A
If this change is externally visible, I updated Changelog.md. If this PR did not mark a release, I added my changes into an ## Unreleased section at the top.
- Will add changelog entry.
If I made changes to .fossa.yml or fossa-deps.{json.yml}, I updated docs/references/files/*.schema.json AND I have updated example files used by fossa init command. You may also need to update these if you have added/removed new dependency type (e.g. pip) or analysis target type (e.g. poetry).
- N/A
If I made changes to a subcommand's options, I updated docs/references/subcommands/<subcommand>.md.
- N/A

🤖 Generated with Claude Code

Use Conduit with decodeUtf8Lenient for stderr handling, matching the approach already used for stdout. This prevents crashes on Windows where the default system encoding (CP1252) cannot decode UTF-8 characters like box-drawing characters (U+2501) used in ficus progress output. The decodeUtf8Lenient function safely handles any byte sequence by replacing invalid UTF-8 with the Unicode replacement character (U+FFFD) instead of throwing an exception. Fixes: ANE-2795 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Set explicit UTF-8 encoding on debug log file handles. On Windows, file handles default to the system code page (CP1252), which cannot encode Unicode characters like box-drawing characters (U+2501) used in ficus's Rust SPANTRACE output. Without this fix, running `fossa analyze --x-vendetta --debug` on Windows crashes with: commitBuffer: invalid argument (cannot encode character '\9473') This complements the previous fix for reading ficus output by ensuring the writing side also handles UTF-8 correctly. Fixes: ANE-2795 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai · 2026-02-19T17:50:10Z

Walkthrough

The pull request refactors stderr handling to use a Conduit-based streaming pipeline with lenient UTF-8 decoding, line splitting, optional tee to a debug file, timestamping, and line prefixing with "STDERR". It accumulates and retains only the most recent 50 lines. The change sets UTF-8 encoding on debug file handles and aligns stderr processing with the existing Conduit-based stdout pipeline. No public or exported interfaces were changed.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: fixing UTF-8 encoding for ficus output on Windows, which is the primary objective of this PR.
Description check	✅ Passed	The description follows the template with all major sections completed: Overview, Acceptance criteria, Testing plan, Risks, References, and a thorough Checklist with explanations for each item.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

src/App/Fossa/Ficus/Analyze.hs (1)

358-389: Core decodeUtf8Lenient approach is correct; count tracking and String intermediates can be cleaned up.

The switch to CC.sourceHandle .| CC.decodeUtf8Lenient .| CC.linesUnbounded directly mirrors how streamFicusOutput handles stdout and is the right fix for Windows CP encoding issues.

Two optional cleanups worth considering:

1. Redundant count tracking. The _count tuple element is discarded at the call-site ((acc, _count) <-), so it is never used outside the fold. The claim that tracking count gives O(1) truncation doesn't hold: take 50 on a list of ≤50+1 elements is already O(50) = O(1), matching the constant bound. The fold state can be simplified to just [Text]:

♻️ Simplify accumulator to drop redundant count

-      (acc, _count) <-
-        Conduit.runConduit $
-          CC.sourceHandle handle
-            .| CC.decodeUtf8Lenient
-            .| CC.linesUnbounded
-            .| CC.foldM
-              ( \(acc, count) line -> do
-                  -- Tee raw line to file if debug mode
-                  traverse_ (\fileH -> hPutStrLn fileH (toString line)) maybeFile
-                  now <- getCurrentTime
-                  let ts = formatTime defaultTimeLocale "%H:%M:%S.%3q" now
-                  let msg = toText $ "[" ++ ts ++ "] STDERR " <> toString line
-                  -- Keep at most the last 50 lines of stderr
-                  -- I came up with 50 lines by looking at a few different error traces and making
-                  -- sure that we captured all of the relevant error output, and then going a bit higher
-                  -- to make sure that we didn't miss anything. I'd rather capture a bit too much than not enough.
-                  -- Use cons (:) for O(1) prepending, track count explicitly for O(1) truncation
-                  let newAcc =
-                        if count >= 50
-                          then take 50 (msg : acc)
-                          else msg : acc
-                  let newCount = min (count + 1) 50
-                  pure (newAcc, newCount)
-              )
-              ([], 0 :: Int)
-      pure (reverse acc) -- Reverse at the end to get correct order
+      acc <-
+        Conduit.runConduit $
+          CC.sourceHandle handle
+            .| CC.decodeUtf8Lenient
+            .| CC.linesUnbounded
+            .| CC.foldM
+              ( \acc line -> do
+                  -- Tee raw line to file if debug mode
+                  traverse_ (\fileH -> hPutStrLn fileH (toString line)) maybeFile
+                  now <- getCurrentTime
+                  -- Keep at most the last 50 lines of stderr (newest first during accumulation)
+                  let ts = Text.pack (formatTime defaultTimeLocale "%H:%M:%S.%3q" now)
+                  let msg = "[" <> ts <> "] STDERR " <> line
+                  pure (take 50 (msg : acc))
+              )
+              []
+      pure (reverse acc) -- Reverse at the end to get correct order

2. String intermediates in msg construction. Line 376 builds msg via toText $ "[" ++ ts ++ "] STDERR " <> toString line, mixing ++/<> on String before converting back to Text. This violates the coding guideline to prefer Text over String. The fix above also addresses this by constructing msg entirely in Text using Text.pack on the formatTime result and <> on Text values.

As per coding guidelines: "In Haskell, use Text instead of String."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/App/Fossa/Ficus/Analyze.hs` around lines 358 - 389, The fold in
consumeStderr can be simplified: drop the explicit count from the fold state and
use a single [Text] accumulator (remove the (acc,count) tuple and _count
binding), then prepend each new Text line and trim to 50 via take 50 (fold
state: [Text]); also avoid String intermediates when building msg by packing the
timestamp (formatTime result) to Text (e.g. Text.pack) and constructing msg
entirely with Text concatenation/<> instead of using ++ or toString
conversions—update the CC.foldM lambda (currently using (acc, count) and
toText/toString) to work with Text-only values and the simplified accumulator.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/App/Fossa/Ficus/Analyze.hs`:
- Around line 358-389: The fold in consumeStderr can be simplified: drop the
explicit count from the fold state and use a single [Text] accumulator (remove
the (acc,count) tuple and _count binding), then prepend each new Text line and
trim to 50 via take 50 (fold state: [Text]); also avoid String intermediates
when building msg by packing the timestamp (formatTime result) to Text (e.g.
Text.pack) and constructing msg entirely with Text concatenation/<> instead of
using ++ or toString conversions—update the CC.foldM lambda (currently using
(acc, count) and toText/toString) to work with Text-only values and the
simplified accumulator.

- Remove redundant count tracking from stderr fold (take 50 on a bounded list is already O(1)) - Use Text throughout msg construction instead of String intermediates - Remove trailing blank line to satisfy formatter 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (1)

src/App/Fossa/Ficus/Analyze.hs (1)
370-377: Optional: avoid Text → String round-trip when teeing to file.

Both here and in streamFicusOutput (line 317), hPutStrLn fileH (toString line) converts Text to String only for System.IO.hPutStrLn to re-encode it. Using Data.Text.IO.hPutStrLn would write Text directly through the handle's UTF-8 encoder, avoiding the intermediate String allocation.
♻️ Suggested change (applies to both line 317 and 372)

Add the import:
import Data.Text.IO qualified as Text.IO
Then in both tee sites:
-  traverse_ (\fileH -> hPutStrLn fileH (toString line)) maybeFile
+  traverse_ (\fileH -> Text.IO.hPutStrLn fileH line) maybeFile
As per coding guidelines, "In Haskell, use Text instead of String".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/App/Fossa/Ficus/Analyze.hs` around lines 370 - 377, The tee currently
converts Text to String via toString when writing to handles (see the hPutStrLn
usage inside the lambda that updates stderr accumulators and the similar site in
streamFicusOutput), causing an unnecessary Text→String→encoding round-trip;
import Data.Text.IO qualified as Text.IO and replace the System.IO.hPutStrLn
calls that pass (toString line) with Text.IO.hPutStrLn using the Text value
(e.g. write Text.IO.hPutStrLn fileH line), keeping the rest of the logic
(maybeFile, acc update, timestamping) unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/App/Fossa/Ficus/Analyze.hs`:
- Around line 370-377: The tee currently converts Text to String via toString
when writing to handles (see the hPutStrLn usage inside the lambda that updates
stderr accumulators and the similar site in streamFicusOutput), causing an
unnecessary Text→String→encoding round-trip; import Data.Text.IO qualified as
Text.IO and replace the System.IO.hPutStrLn calls that pass (toString line) with
Text.IO.hPutStrLn using the Text value (e.g. write Text.IO.hPutStrLn fileH
line), keeping the rest of the logic (maybeFile, acc update, timestamping)
unchanged.

Conor-FOSSA and others added 2 commits February 13, 2026 11:04

Conor-FOSSA requested a review from a team as a code owner February 19, 2026 17:43

Conor-FOSSA requested a review from GauravB159 February 19, 2026 17:43

coderabbitai bot reviewed Feb 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

[ANE-2795] Fix UTF-8 encoding for ficus output on Windows#1646

[ANE-2795] Fix UTF-8 encoding for ficus output on Windows#1646
Conor-FOSSA wants to merge 3 commits intomasterfrom
conor/ANE-2795-ficus-stderr-encoding

Conor-FOSSA commented Feb 19, 2026 •

edited by atlassian bot

Loading

Uh oh!

coderabbitai bot commented Feb 19, 2026 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

Conversation

Conor-FOSSA commented Feb 19, 2026 • edited by atlassian bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Acceptance criteria

Testing plan

Risks

Metrics

References

Checklist

Uh oh!

coderabbitai bot commented Feb 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conor-FOSSA commented Feb 19, 2026 •

edited by atlassian bot

Loading

coderabbitai bot commented Feb 19, 2026 •

edited

Loading