This repository contains two Python utilities for email security analysis and processing:
- Phishing Email Detection System
- Email Processing Utility
## Phishing Email Detection System

A machine learning-based system that analyzes emails to detect potential phishing attempts using a GGUF model.
### Features

- Analyzes email content, subject, sender, and return path
- Classifies emails into three categories:
- Malicious (score > 0.49)
- Suspicious (score between 0.3 and 0.49)
- Benign (score < 0.3)
- Provides detailed analysis including:
- Classification result
- Confidence percentage
- Brief explanation
- Key reasons for classification
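The classification thresholds above can be sketched as a simple mapping. This is a minimal illustration of the scoring logic; `classify_score` is a hypothetical helper, not part of the actual module:

```python
def classify_score(score: float) -> str:
    """Map a model confidence score to one of the three categories.

    Thresholds follow the README: > 0.49 is Malicious,
    0.3 to 0.49 is Suspicious, and below 0.3 is Benign.
    """
    if score > 0.49:
        return "Malicious"
    if score >= 0.3:
        return "Suspicious"
    return "Benign"


print(classify_score(0.75))  # Malicious
print(classify_score(0.35))  # Suspicious
print(classify_score(0.10))  # Benign
```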
### Requirements

- Python 3.x
- llama-cpp-python
- BeautifulSoup4
- email (standard library)
### Usage

```python
from phishingtest_gguf_model import process_email, process_llm

# Process an email file
email_data = process_email("path/to/email.eml")

# Analyze the email
result = process_llm(email_data)
```

## Email Processing Utility

A utility for processing and cleaning email files, particularly useful for preparing emails for analysis.
### Features

- Removes HTML tags from email content
- Handles multiple email encodings (UTF-8, Latin-1, CP1252, ISO-8859-1)
- Properly unfolds email headers according to RFC 5322
- Removes X-headers
- Extracts email components:
- Subject
- Body
- Sender
- Return-Path
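A minimal sketch of the component-extraction step using only the `email` standard library (the real `get_email_body_from_string()` also strips HTML and X-headers; the sample message here is illustrative):

```python
import email
from email import policy

# A small illustrative raw email (not from the repository's test data)
RAW = (
    "Subject: Account notice\r\n"
    "From: Security <alerts@example.com>\r\n"
    "Return-Path: <bounce@example.com>\r\n"
    "\r\n"
    "Please verify your account."
)

# policy.default unfolds RFC 5322 folded headers automatically
msg = email.message_from_string(RAW, policy=policy.default)

subject = str(msg["Subject"])
sender = str(msg["From"])
return_path = str(msg["Return-Path"])
body = msg.get_content()

print(subject)      # Account notice
print(return_path)  # <bounce@example.com>
```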
### Key Functions

- `remove_html_tags()`: Cleans HTML content from the email body
- `unfold_headers()`: Properly unfolds email headers according to RFC 5322
- `remove_x_headers()`: Removes X-header fields
- `get_email_body_from_string()`: Extracts email components
- `truncate_text()`: Truncates text while preserving word boundaries
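Word-boundary truncation like `truncate_text()` could look like the following. This is an illustrative reimplementation, not the module's actual code:

```python
def truncate_at_word(text: str, max_len: int) -> str:
    """Truncate text to at most max_len characters without splitting a word."""
    if len(text) <= max_len:
        return text
    cut = text[:max_len]
    # Drop the trailing partial word, if any
    if " " in cut:
        cut = cut.rsplit(" ", 1)[0]
    return cut


print(truncate_at_word("phishing emails often spoof senders", 20))  # phishing emails
```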
### Usage

```python
from phishingtest_gguf_model import get_email_body_from_string

# Process a raw email string
subject, body, sender, return_path = get_email_body_from_string(raw_email_string)
```

## Installation

- Clone the repository
- Install required packages:
  ```shell
  pip install llama-cpp-python beautifulsoup4
  ```

## Model Setup

The system uses a GGUF model file named `phishingmodel.gguf`. Make sure to:
- Place the model file in the project directory
- Ensure the model file is compatible with llama-cpp-python
- Verify the model has been trained for phishing detection tasks
## Output Format

The system outputs results in JSON format:

```json
{
  "classification": "Malicious|Suspicious|Benign",
  "percentage": "0.0-1.0",
  "explanation": "Brief explanation",
  "reasons": ["reason1", "reason2", "reason3"]
}
```

## License

[Add your license information here]
## Contributing

[Add contribution guidelines here]