
chore: detect encoding warnings #40

Merged
eschmidt42 merged 6 commits into main from chore/detect-encoding-warnings on Jun 8, 2026


@eschmidt42 (Owner)

This pull request introduces a new utility function, check_applies, to centralize and optimize the logic for determining if a file should be parsed by a specific parser. This refactor eliminates repeated code, avoids unnecessary file reads (especially for non-matching filenames), and improves test coverage for edge cases. The changes are applied across multiple DKB and GLS parsers, and comprehensive tests are added to ensure correct and efficient behavior.

Key changes include:

Parser applicability refactor and optimization

  • Added a new helper function check_applies in fintl.etl.io.files.applies, which combines filename pattern matching with content checks and avoids unnecessary file I/O for non-matching files. This reduces redundant code and prevents spurious encoding warnings.
  • Refactored all DKB and GLS parser modules (festgeld0.py, giro202307.py, giro202312.py, tagesgeld202307.py, tagesgeld202312.py, gls/helper.py) to use the new check_applies utility in their check_if_parser_applies functions, replacing repeated manual filename/content checks.
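A minimal sketch of the idea behind check_applies, assuming it takes a path, a compiled filename regex and a content-check callable. The real signature in fintl.etl.io.files.applies is not shown on this page and may differ; the parser-side function, pattern and header string below are hypothetical placeholders, not DKB's or GLS's actual formats:

```python
from __future__ import annotations

import re
from pathlib import Path
from typing import Callable


def check_applies(
    path: Path,
    filename_pattern: re.Pattern[str],
    content_check: Callable[[Path], bool],
) -> bool:
    """Return True only if the filename matches and the content check passes.

    Hypothetical signature; the real helper in fintl.etl.io.files.applies may differ.
    """
    # Cheap guard first: a non-matching name (e.g. a PNG) triggers no file I/O,
    # so it can never cause an encoding warning.
    if not filename_pattern.search(path.name):
        return False
    # Only matching names pay for reading the file.
    return content_check(path)


# Hypothetical parser-side usage in the style of check_if_parser_applies.
CSV_PATTERN = re.compile(r"\.csv$", re.IGNORECASE)  # illustrative only
EXPECTED_HEADER = "Buchungsdatum"  # placeholder, not a real DKB/GLS header


def _has_expected_header(path: Path) -> bool:
    # Read just the first line; errors="replace" avoids raising on unexpected bytes.
    with path.open("r", encoding="utf-8", errors="replace") as f:
        return f.readline().startswith(EXPECTED_HEADER)


def check_if_parser_applies(path: Path) -> bool:
    return check_applies(path, CSV_PATTERN, _has_expected_header)
```

Because the regex guard runs first, a file such as statement.png never reaches the content check, which is what keeps encoding detection (and its warnings) away from files no parser could handle.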

Improved test coverage

  • Added a dedicated test module test_applies.py to verify check_applies logic, ensuring it short-circuits on non-matching filenames and correctly combines filename/content checks.
  • Added new tests to each affected parser's test suite to confirm that non-CSV files (e.g. PNGs) are rejected without reading their content, preventing unnecessary file operations and errors.
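One way to test the "rejected without reading their content" property is to patch Path.open so that any file access fails the test. This sketch uses a hypothetical stand-in for a parser's check_if_parser_applies (the regex and header bytes are illustrative); the actual tests in this PR may be structured differently:

```python
import re
from pathlib import Path
from unittest import mock

# Hypothetical stand-in for a parser's check_if_parser_applies: a cheap
# filename guard, then a content read. Pattern and header are illustrative.
_CSV_NAME = re.compile(r"\.csv$", re.IGNORECASE)
_EXPECTED_HEADER = b"Buchungsdatum"  # placeholder, not a real DKB/GLS header


def check_if_parser_applies(path: Path) -> bool:
    if not _CSV_NAME.search(path.name):
        return False  # short-circuit: the file is never opened
    with path.open("rb") as f:
        return f.readline().startswith(_EXPECTED_HEADER)


def test_non_csv_is_rejected_without_reading() -> None:
    # Any call to Path.open raises, so reaching the assertions proves no read happened.
    with mock.patch.object(
        Path, "open", side_effect=AssertionError("file was opened")
    ) as opened:
        assert check_if_parser_applies(Path("statement.png")) is False
    opened.assert_not_called()
```

Under pytest the test function is collected automatically. The PNG path does not even need to exist on disk, since the filename guard rejects it before any I/O.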

Minor improvements

  • Updated detect_encoding to use path.open("rb") for consistency and clarity.
  • Updated documentation and example paths in README.md for clarity and platform consistency.
  • Cleaned up unused imports in parser modules.
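For context on the path.open("rb") change in detect_encoding, here is a stdlib-only sketch of an encoding detector that reads raw bytes in binary mode and tries candidate encodings in order. The candidate list, sample size and trial-decoding approach are assumptions for illustration; this page does not show how fintl's detect_encoding actually decides on an encoding:

```python
import codecs
from pathlib import Path

# Hypothetical candidates, tried in order; latin-1 is kept as the final
# fallback because it can decode any byte sequence.
_CANDIDATES = ("utf-8", "cp1252")


def detect_encoding(path: Path, sample_size: int = 64 * 1024) -> str:
    # Binary mode: inspect raw bytes instead of text decoded with a default codec.
    with path.open("rb") as f:
        sample = f.read(sample_size)
    if sample.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    for encoding in _CANDIDATES:
        decoder = codecs.getincrementaldecoder(encoding)()
        try:
            # final=False tolerates a multi-byte character cut off at the sample end.
            decoder.decode(sample, final=False)
        except UnicodeDecodeError:
            continue
        return encoding
    return "latin-1"
```

The incremental decoder matters because reading a fixed-size sample can split a UTF-8 character in two; a plain bytes.decode would then wrongly reject UTF-8 and fall through to cp1252.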

These changes make the parser applicability checks easier to maintain, avoid reading files whose names cannot match, and stop the spurious encoding warnings those reads produced.

@codecov (bot) commented on Jun 8, 2026
Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.97%. Comparing base (f3f6b6f) to head (4306514).

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #40   +/-   ##
=======================================
  Coverage   99.97%   99.97%           
=======================================
  Files          99      101    +2     
  Lines        6942     6984   +42     
  Branches      257      258    +1     
=======================================
+ Hits         6940     6982   +42     
  Misses          1        1           
  Partials        1        1           
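Assuming Codecov's usual definition of coverage as hits divided by all coverable lines (hits + misses + partials), the reported figure can be reproduced from the head column:

```latex
\text{coverage}
  = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}}
  = \frac{6982}{6982 + 1 + 1}
  = \frac{6982}{6984}
  \approx 99.97\,\%
```

The base column gives 6940/6942, which also rounds to 99.97%, so the 42 new lines, all hit, leave the percentage unchanged at two decimals.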


Copilot AI (Contributor) left a comment

Pull request overview

This PR centralizes “parser applicability” checks into a new check_applies helper to avoid unnecessary file reads (and related encoding warnings) when filenames clearly don’t match, and updates multiple DKB/GLS parsers and tests to use/cover this behavior.

Changes:

  • Added fintl.etl.io.files.applies.check_applies to combine filename regex guarding with content-based validation.
  • Refactored several DKB and GLS parsers’ check_if_parser_applies implementations to use check_applies.
  • Added/extended tests to cover short-circuit behavior for non-matching (e.g. PNG) inputs, and made a small consistency tweak in detect_encoding.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 8 comments.

Summary per file

| File | Description |
| --- | --- |
| src/fintl/etl/io/files/applies.py | New shared helper to short-circuit applicability checks before file I/O. |
| src/fintl/etl/io/files/detect.py | Uses Path.open("rb") for encoding detection. |
| src/fintl/etl/providers/dkb/festgeld0.py | Switches applicability logic to check_applies. |
| src/fintl/etl/providers/dkb/giro202307.py | Switches applicability logic to check_applies. |
| src/fintl/etl/providers/dkb/giro202312.py | Switches applicability logic to check_applies. |
| src/fintl/etl/providers/dkb/tagesgeld202307.py | Switches applicability logic to check_applies. |
| src/fintl/etl/providers/dkb/tagesgeld202312.py | Switches applicability logic to check_applies. |
| src/fintl/etl/providers/gls/helper.py | Switches applicability logic to check_applies. |
| tests/etl/io/files/test_applies.py | New unit tests for check_applies short-circuit + content checks. |
| tests/etl/providers/dkb/test_dkb_festgeld0.py | Adds non-CSV applicability test for short-circuiting. |
| tests/etl/providers/dkb/test_dkb_giro202307.py | Adds non-CSV applicability test for short-circuiting. |
| tests/etl/providers/dkb/test_dkb_giro202312.py | Adds non-CSV applicability test for short-circuiting. |
| tests/etl/providers/dkb/test_dkb_tagesgeld202307.py | Adds non-CSV applicability test for short-circuiting. |
| tests/etl/providers/dkb/test_dkb_tagesgeld202312.py | Adds non-CSV applicability test for short-circuiting. |
| tests/etl/providers/gls/test_helper.py | Adds non-CSV applicability test for short-circuiting. |
| README.md | Updates example temp paths for CLI simulation instructions. |


Review comment threads (all marked outdated):
  • tests/etl/providers/dkb/test_dkb_festgeld0.py
  • tests/etl/providers/dkb/test_dkb_tagesgeld202312.py
  • tests/etl/providers/gls/test_helper.py
  • README.md (2 threads)
  • tests/etl/providers/dkb/test_dkb_giro202307.py
  • tests/etl/providers/dkb/test_dkb_giro202312.py
  • tests/etl/providers/dkb/test_dkb_tagesgeld202307.py
eschmidt42 merged commit 890e5c6 into main on Jun 8, 2026
3 checks passed
eschmidt42 deleted the chore/detect-encoding-warnings branch on Jun 8, 2026, 10:05
