chore: detect encoding warnings#40
Merged
Merged
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #40 +/- ##
=======================================
Coverage 99.97% 99.97%
=======================================
Files 99 101 +2
Lines 6942 6984 +42
Branches 257 258 +1
=======================================
+ Hits 6940 6982 +42
Misses 1 1
Partials 1 1 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Contributor
There was a problem hiding this comment.
Pull request overview
This PR centralizes “parser applicability” checks into a new check_applies helper to avoid unnecessary file reads (and related encoding warnings) when filenames clearly don’t match, and updates multiple DKB/GLS parsers and tests to use/cover this behavior.
Changes:
- Added
fintl.etl.io.files.applies.check_appliesto combine filename regex guarding with content-based validation. - Refactored several DKB and GLS parsers’
check_if_parser_appliesimplementations to usecheck_applies. - Added/extended tests to cover short-circuit behavior for non-matching (e.g. PNG) inputs, and made a small consistency tweak in
detect_encoding.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| src/fintl/etl/io/files/applies.py | New shared helper to short-circuit applicability checks before file I/O. |
| src/fintl/etl/io/files/detect.py | Uses Path.open("rb") for encoding detection. |
| src/fintl/etl/providers/dkb/festgeld0.py | Switches applicability logic to check_applies. |
| src/fintl/etl/providers/dkb/giro202307.py | Switches applicability logic to check_applies. |
| src/fintl/etl/providers/dkb/giro202312.py | Switches applicability logic to check_applies. |
| src/fintl/etl/providers/dkb/tagesgeld202307.py | Switches applicability logic to check_applies. |
| src/fintl/etl/providers/dkb/tagesgeld202312.py | Switches applicability logic to check_applies. |
| src/fintl/etl/providers/gls/helper.py | Switches applicability logic to check_applies. |
| tests/etl/io/files/test_applies.py | New unit tests for check_applies short-circuit + content checks. |
| tests/etl/providers/dkb/test_dkb_festgeld0.py | Adds non-CSV applicability test for short-circuiting. |
| tests/etl/providers/dkb/test_dkb_giro202307.py | Adds non-CSV applicability test for short-circuiting. |
| tests/etl/providers/dkb/test_dkb_giro202312.py | Adds non-CSV applicability test for short-circuiting. |
| tests/etl/providers/dkb/test_dkb_tagesgeld202307.py | Adds non-CSV applicability test for short-circuiting. |
| tests/etl/providers/dkb/test_dkb_tagesgeld202312.py | Adds non-CSV applicability test for short-circuiting. |
| tests/etl/providers/gls/test_helper.py | Adds non-CSV applicability test for short-circuiting. |
| README.md | Updates example temp paths for CLI simulation instructions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This pull request introduces a new utility function,
check_applies, to centralize and optimize the logic for determining if a file should be parsed by a specific parser. This refactor eliminates repeated code, avoids unnecessary file reads (especially for non-matching filenames), and improves test coverage for edge cases. The changes are applied across multiple DKB and GLS parsers, and comprehensive tests are added to ensure correct and efficient behavior.Key changes include:
Parser applicability refactor and optimization
check_appliesinfintl.etl.io.files.applies, which combines filename pattern matching with content checks and avoids unnecessary file I/O for non-matching files. This reduces redundant code and prevents spurious encoding warnings.festgeld0.py,giro202307.py,giro202312.py,tagesgeld202307.py,tagesgeld202312.py,gls/helper.py) to use the newcheck_appliesutility in theircheck_if_parser_appliesfunctions, replacing repeated manual filename/content checks. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12]Improved test coverage
test_applies.pyto verifycheck_applieslogic, ensuring it short-circuits on non-matching filenames and correctly combines filename/content checks.Minor improvements
detect_encodingto usepath.open("rb")for consistency and clarity.README.mdfor clarity and platform consistency.These changes improve maintainability, efficiency, and reliability of file parsing logic across the codebase.