Bug Description
The main block opens the .eml file with a hardcoded encoding="utf-8". Many real-world emails contain raw 8-bit text in Latin-1 (ISO-8859-1), Windows-1252, or other non-UTF-8 encodings. Reading such a file raises an unhandled UnicodeDecodeError, so the tool crashes on a significant portion of real emails instead of analyzing them.
Expected Behavior or Results
The tool should gracefully handle emails with non-UTF-8 encodings, either by auto-detecting the encoding or by falling back to a permissive mode (errors="replace").
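A rough sketch of the fallback approach, for discussion. read_eml_text and FALLBACK_ENCODINGS are hypothetical names, not part of email-analyzer.py, and the encoding order is an assumption: strict UTF-8 first, then Windows-1252 (which decodes the same accented letters as Latin-1 in 0xA0-0xFF and also maps 0x80-0x9F to characters such as curly quotes and €), and finally the errors="replace" mode proposed above.

```python
from pathlib import Path

# Hypothetical candidate list, tried in order (an assumption, not taken
# from email-analyzer.py).
FALLBACK_ENCODINGS = ("utf-8", "cp1252")


def read_eml_text(filename: str) -> str:
    """Decode an .eml file without ever raising UnicodeDecodeError."""
    raw = Path(filename).read_bytes()
    for encoding in FALLBACK_ENCODINGS:
        try:
            # Strict decode: succeeds only if every byte is valid here.
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: undecodable bytes become U+FFFD instead of raising.
    # cp1252 only fails on a handful of unassigned bytes (e.g. 0x81).
    return raw.decode("utf-8", errors="replace")
```

This is a heuristic guess rather than true detection: any file that happens to be valid UTF-8 is treated as UTF-8, and almost everything else is read as Windows-1252. A third-party detector such as charset-normalizer could replace the fixed list if better guesses are needed.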
Reproduce Steps
- Obtain an .eml file containing Latin-1 or Windows-1252 encoded characters (e.g. accented characters in headers)
- Run python3 email-analyzer.py -f sample.eml
- Observe UnicodeDecodeError: 'utf-8' codec can't decode byte 0xXX
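If no such email is at hand, a minimal sample.eml for the first step can be generated with a short script like the one below. The addresses, subject, and body are placeholders. The last part repeats the open() call quoted under Additional context to show the same crash without running the analyzer.

```python
from pathlib import Path

# Placeholder message whose Subject and body contain "é", stored as the
# single Latin-1 byte 0xE9 (not valid UTF-8 in this position).
sample = (
    "From: sender@example.com\r\n"
    "To: recipient@example.com\r\n"
    "Subject: Caf\u00e9 menu\r\n"
    "Content-Type: text/plain; charset=iso-8859-1\r\n"
    "\r\n"
    "R\u00e9sum\u00e9 attached.\r\n"
)
Path("sample.eml").write_bytes(sample.encode("latin-1"))

# Reading it the way email-analyzer.py does raises UnicodeDecodeError.
try:
    with open("sample.eml", "r", encoding="utf-8") as file:
        file.read()
except UnicodeDecodeError as exc:
    print(exc)
```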
Desktop
- OS with Version: Any
- Python Version: Python 3.10+
- EmailAnalyzer Project Version: v2.0
Additional context
Affected code (email-analyzer.py, line 396):
```python
with open(filename, "r", encoding="utf-8") as file:
    ...  # remainder of the main block (not shown in this issue)
```
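A possible alternative fix, sketched under the assumption that the main block goes on to parse the file with Python's email package (the rest of the block is not shown here): open the file in binary mode so open() decodes nothing, and let email.parser.BytesParser decode RFC 2047 headers and each body part using the charset the message declares. load_message is a hypothetical name.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def load_message(filename: str) -> EmailMessage:
    """Parse an .eml file without assuming a text encoding up front."""
    # Binary mode: no UnicodeDecodeError can be raised at read time.
    with open(filename, "rb") as file:
        # policy.default yields an EmailMessage whose get_content()
        # decodes text parts with their declared charset, replacing
        # undecodable bytes instead of raising.
        return BytesParser(policy=policy.default).parse(file)
```

For example, load_message("sample.eml").get_body(preferencelist=("plain",)).get_content() should return the decoded plain-text body. email.message_from_binary_file(file, policy=policy.default) is an equivalent one-liner. Raw 8-bit bytes in headers that are not RFC 2047 encoded should come back with U+FFFD replacement characters rather than raising, which is worth checking against the analyzer's header logic. Switching from compat32 to policy.default also changes the type of header values, so existing code that inspects headers may need a quick review.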