Hardcoded UTF-8 Crashes on Non-UTF-8 Emails #30

@keraattin

Bug Description
The main block opens the .eml file with a hardcoded encoding="utf-8". Many real-world emails use Latin-1 (ISO-8859-1), Windows-1252, or other encodings. Opening such files raises an unhandled UnicodeDecodeError, making the tool unusable on a significant portion of real emails.

Expected Behavior or Results
The tool should gracefully handle emails with non-UTF-8 encodings, either by auto-detecting the encoding or by falling back to a permissive mode (errors="replace").
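One way the fallback behaviour described above could look, as a minimal sketch rather than the project's actual fix. The helper name `read_eml_text` and the candidate list `("utf-8", "cp1252")` are assumptions made for illustration. The idea is to try strict decoding first and only drop to `errors="replace"` when every candidate fails.

```python
def read_eml_text(path, encodings=("utf-8", "cp1252")):
    """Return the file's text, trying each encoding strictly in order.

    Hypothetical helper: the name and the default encoding list are
    illustrative and not taken from email-analyzer.py.
    """
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as file:
                return file.read()
        except UnicodeDecodeError:
            # This candidate cannot decode the file; try the next one.
            continue
    # Last resort: never raise, substitute U+FFFD for undecodable bytes.
    with open(path, "r", encoding="utf-8", errors="replace") as file:
        return file.read()
```

Trying `cp1252` before the permissive mode keeps accented characters readable in the common Western European case, while the final `errors="replace"` pass guarantees the tool still produces output for anything else. This is not true auto-detection; it only walks a fixed list of guesses.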

Reproduce Steps

  1. Obtain an .eml file containing Latin-1 or Windows-1252 encoded characters (e.g. accented characters in headers)
  2. Run python3 email-analyzer.py -f sample.eml
  3. Observe UnicodeDecodeError: 'utf-8' codec can't decode byte 0xXX
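If no suitable sample is at hand, the failure can be reproduced in isolation with a short script. This is a sketch under assumptions: the function name `reproduce_decode_error` and the sample message are made up for illustration, and the script only mirrors the hardcoded `open(..., encoding="utf-8")` call rather than running the tool itself.

```python
import os
import tempfile


def reproduce_decode_error():
    """Write a Latin-1 encoded .eml and read it back as UTF-8.

    Returns the error message if decoding fails, otherwise None.
    """
    # "é" becomes the single byte 0xE9 in Latin-1, which is not valid
    # UTF-8 when followed by an ASCII byte.
    raw = "Subject: caf\u00e9\n\nbody\n".encode("latin-1")
    with tempfile.NamedTemporaryFile(suffix=".eml", delete=False) as tmp:
        tmp.write(raw)
        path = tmp.name
    try:
        # Same call shape as the affected line in the tool.
        with open(path, "r", encoding="utf-8") as file:
            file.read()
    except UnicodeDecodeError as exc:
        return str(exc)
    finally:
        os.remove(path)
    return None


if __name__ == "__main__":
    print(reproduce_decode_error())
```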

Desktop (please complete the following information):

  • OS with Version: Any
  • Python Version: Python 3.10+
  • EmailAnalyzer Project Version: v2.0

Additional context
Affected code — email-analyzer.py line 396:

with open(filename, "r", encoding="utf-8") as file:
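An alternative to guessing a text encoding is to open the file in binary mode and let the standard library `email` parser decode headers and bodies from the charsets declared in the message itself. The sketch below assumes the file content is eventually handed to the `email` package (the surrounding code is not shown in this report, so that is an assumption), and `load_message` is a hypothetical name.

```python
from email import policy
from email.parser import BytesParser


def load_message(filename):
    """Parse an .eml file from raw bytes.

    Hypothetical helper: no text encoding is chosen up front, so opening
    the file cannot raise UnicodeDecodeError.
    """
    with open(filename, "rb") as file:
        return BytesParser(policy=policy.default).parse(file)
```

With `policy.default`, header values such as `msg["Subject"]` come back as decoded strings when the message uses RFC 2047 encoded words. Whether this fits depends on how the rest of the script consumes the file content.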
