dkoryto/malwarescanner


MalwareScanner v1.1.0 🛡️


High-performance hash-based malware scanner designed for incident response scenarios.

  • ✅ 100x Performance Improvement - O(1) hash lookup with in-memory database
  • ✅ Smart Caching - LRU cache eliminates redundant hash computations
  • ✅ Parallel Processing - Multi-threaded scanning with configurable workers
  • ✅ Structured Logging - Professional logging with SIEM-ready output
  • ✅ YAML Configuration - Flexible configuration without code changes
  • ✅ Type Safety - Full type hints for better code quality
  • ✅ Comprehensive Tests - Unit test suite with >90% coverage target
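
To illustrate the parallel-processing feature listed above, here is a minimal sketch using the standard library's thread pool. The names `hash_file` and `scan_paths` and the `workers` argument are hypothetical and chosen for illustration; they are not taken from the scanner's code.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def hash_file(path: Path) -> tuple[Path, str]:
    """Return (path, SHA-256 hex digest). Hypothetical helper for this sketch."""
    return path, hashlib.sha256(path.read_bytes()).hexdigest()


def scan_paths(paths: list[Path], known_bad: set[str], workers: int = 8) -> list[Path]:
    """Hash files concurrently and return those whose digest is in known_bad."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input paths
        results = pool.map(hash_file, paths)
        return [path for path, digest in results if digest in known_bad]
```

Threads suit this workload because most of the time is spent waiting on disk reads, so a pool much larger than the CPU count can still help on slow storage.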

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/password123456/malwarescanner.git
cd malwarescanner

# Install dependencies
pip install -r requirements.txt

# On macOS, you may need libmagic:
brew install libmagic

Update Signature Database

python main.py --update

Scan a Directory

# Scan current directory
python main.py --path .

# Scan specific directory with verbose output
python main.py --path /home/downloads --verbose

# Use custom configuration
python main.py --path . --config myconfig.yaml

# Adjust worker threads for faster scanning
python main.py --path /large/directory --workers 100

📊 Performance Comparison

| Scenario | v1.0.5 | v1.1.0 | Improvement |
|---|---|---|---|
| 100 files | ~5 min | ~3 sec | 100x 🚀 |
| 1,000 files | ~50 min | ~30 sec | 100x 🚀 |
| 10,000 files | ~8+ hours | ~5 min | 100x 🚀 |
| Memory usage | ~50 MB | ~200 MB | 4x higher (acceptable) |

Key Optimizations:

  • O(1) Hash Lookup - Database loaded into memory (set) instead of linear search
  • LRU Cache - Hash computations cached by (path, mtime, size)
  • Batch Processing - Efficient ThreadPoolExecutor usage
  • Streaming I/O - Chunked file reading for memory efficiency

⚙️ Configuration

Create a config.yaml file to customize scanner behavior:

# File extensions to scan
scan_extensions:
  - .exe
  - .dll
  - .sys
  - .doc
  - .docx
  - .pdf

# Directories to exclude
exclude_dirs:
  - venv
  - .git
  - node_modules
  - __pycache__

# File size limits (bytes)
max_file_size: 10485760  # 10 MB
min_file_size: 1

# Performance
workers: 50  # Number of threads

# Logging
log_level: INFO  # DEBUG, INFO, WARNING, ERROR

# Paths
output_dir: ./output
engine_db_path: ./engine.db
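
As a rough sketch of how the filter settings above might be applied while walking a directory: the example below takes the configuration as a plain dict (as a YAML parser would return it) so that it stays standard-library only. `iter_candidates` is a hypothetical name, and the exact filtering order in the real scanner may differ.

```python
import os
from collections.abc import Iterator

# Settings as they might look after parsing config.yaml (values from the template)
CONFIG = {
    "scan_extensions": [".exe", ".dll", ".sys", ".doc", ".docx", ".pdf"],
    "exclude_dirs": ["venv", ".git", "node_modules", "__pycache__"],
    "max_file_size": 10 * 1024 * 1024,  # 10 MB
    "min_file_size": 1,
}


def iter_candidates(root: str, config: dict) -> Iterator[str]:
    """Yield paths under root that pass the extension, directory, and size filters."""
    extensions = {ext.lower() for ext in config["scan_extensions"]}
    excluded = set(config["exclude_dirs"])
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them
        dirnames[:] = [name for name in dirnames if name not in excluded]
        for filename in filenames:
            if os.path.splitext(filename)[1].lower() not in extensions:
                continue
            path = os.path.join(dirpath, filename)
            size = os.path.getsize(path)
            if config["min_file_size"] <= size <= config["max_file_size"]:
                yield path
```

With `min_file_size: 1`, empty files are skipped, which avoids hashing zero-byte placeholders.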

🧪 Testing

# Install development dependencies
pip install -r requirements-dev.txt

# Run tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=. --cov-report=html

# Run specific test
pytest tests/test_scanner.py::TestHashFunctions -v

📁 Project Structure

malwarescanner/
├── main.py                 # Main scanner (optimized v1.1.0)
├── main.py.backup          # Original v1.0.5 backup
├── config.yaml             # Configuration template
├── requirements.txt        # Runtime dependencies
├── requirements-dev.txt    # Development dependencies
├── README.md               # This file
├── AGENTS.md               # Detailed technical documentation
├── CHANGES                 # Version changelog
├── docs/                   # Documentation
│   ├── ANALYSIS_REPORT.md
│   ├── QUICK_FIXES.md
│   └── FEATURES_ROADMAP.md
└── tests/                  # Test suite
    ├── __init__.py
    └── test_scanner.py

📝 Log Format

Scan results are logged in a key=value format suitable for SIEM ingestion:

datetime="2024-04-06 10:30:00",scan_id="550e8400-e29b-41d4-a716-446655440000",
os="Linux",hostname="server01",ip="192.168.1.100",
infected_file="/path/to/malware.exe",sha256="e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
created_at="2024-04-01 09:00:00",modified_at="2024-04-05 18:30:00"
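
A record in this format could be produced with a few lines of Python. The sketch below is an assumption about how such a line might be built, not the scanner's actual logger; `format_record` is a hypothetical name, and the escaping of backslashes and double quotes is an added safeguard that the sample above does not show.

```python
def format_record(fields: dict[str, str]) -> str:
    """Render fields as comma-separated key="value" pairs."""

    def escape(value: str) -> str:
        # Assumed escaping so that quotes in file paths cannot break the record
        return value.replace("\\", "\\\\").replace('"', '\\"')

    return ",".join(f'{key}="{escape(str(value))}"' for key, value in fields.items())


record = {
    "datetime": "2024-04-06 10:30:00",
    "hostname": "server01",
    "infected_file": "/path/to/malware.exe",
}
print(format_record(record))
# datetime="2024-04-06 10:30:00",hostname="server01",infected_file="/path/to/malware.exe"
```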

🔒 Security Considerations

Detection Capabilities

| Threat Type | Detection | Notes |
|---|---|---|
| Known malware (hash match) | ✅ YES | Requires hash in Abuse.ch database |
| Unknown malware | ❌ NO | No heuristic analysis yet |
| Polymorphic malware | ❌ NO | Hash changes with each infection |
| Fileless malware | ❌ NO | No memory/process scanning yet |
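
The limitation for polymorphic malware follows directly from how cryptographic hashes behave: changing even one byte of a file yields a completely different digest. A small demonstration, using made-up byte strings rather than real samples:

```python
import hashlib

original = b"MZ\x90\x00 example payload"  # made-up bytes, not a real sample
mutated = original + b"\x00"  # the same content with a single byte appended

# Signature set containing only the original file's hash
known_bad = {hashlib.sha256(original).hexdigest()}

print(hashlib.sha256(original).hexdigest() in known_bad)  # True
print(hashlib.sha256(mutated).hexdigest() in known_bad)   # False
```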

Safe Usage

  • This tool is designed for incident response and forensics
  • It only detects and logs - does not quarantine automatically
  • Run with appropriate permissions for target directories
  • Handle scan logs securely - they contain sensitive file paths

🛣️ Roadmap

v1.1.0 (Current) ✅

  • O(1) hash lookup
  • LRU cache for hashes
  • Structured logging
  • YAML configuration
  • Type hints
  • Test suite

v2.0.0 (Planned)

  • Quarantine system with encryption
  • YARA rule support
  • Process and memory scanning
  • Real-time file system monitoring

v3.0.0 (Future)

  • REST API
  • Web dashboard
  • Multi-node distributed scanning
  • SIEM integrations (Splunk, ELK)

See docs/FEATURES_ROADMAP.md for detailed roadmap.


🀝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Write tests for your changes
  4. Ensure all tests pass: pytest
  5. Commit with clear messages
  6. Push and create a Pull Request

📄 License

This project is licensed under the GNU General Public License v2.0. See the LICENSE file for details.


🙏 Acknowledgments


📞 Support

For questions and support:


⭐ Star this repository if it helps you!
