A professional-grade Python automation toolkit featuring smart file organization, async data pipelines, real-time system monitoring, and automated HTML reporting.
DataFlow Automator
Automatically categorize and organize files using pluggable strategy rules:
- By Extension — maps 80+ file types to categories (code, images, documents, etc.)
- By MIME Type — fallback detection using the system MIME database
- By Date — organizes into `YYYY/MM/` folder structures
- By File Size — buckets into small/medium/large categories
- Watch Mode — monitors a directory in real-time and auto-organizes new files
- Duplicate Detection — SHA-256 hashing prevents duplicate files
Concurrent data collection from multiple free sources (no API keys required):
- GitHub Trending — scrapes today's trending repositories
- Hacker News — fetches top stories via the Firebase API
- Weather — current conditions and 3-day forecast from wttr.in
- Quotes — inspirational quotes from ZenQuotes
All sources are fetched concurrently with asyncio.gather(), transformed through a configurable pipeline, and exported to JSON/CSV.
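A stripped-down sketch of that fetch, transform, and export flow, using only the standard library. The two `fetch_*` coroutines are hypothetical stand-ins that sleep instead of calling the real services through aiohttp, and passing `return_exceptions=True` so one failing source does not abort the whole batch is an assumption about how the real pipeline behaves.

```python
import asyncio
import csv
import io
import json
from typing import Any, Callable

Record = dict[str, Any]
Transform = Callable[[list[Record]], list[Record]]


async def fetch_hn() -> list[Record]:
    # Hypothetical stand-in for an HTTP call; sleeps to simulate latency.
    await asyncio.sleep(0.05)
    return [{"source": "hn", "title": "  Show HN: something  ", "score": 120}]


async def fetch_quotes() -> list[Record]:
    await asyncio.sleep(0.05)
    return [{"source": "quotes", "title": "Stay hungry", "score": None}]


def strip_text(records: list[Record]) -> list[Record]:
    return [{**r, "title": r["title"].strip()} for r in records]


def fill_missing_score(records: list[Record]) -> list[Record]:
    return [{**r, "score": 0 if r["score"] is None else r["score"]} for r in records]


async def run_pipeline(sources, transforms: list[Transform]) -> list[Record]:
    # All fetches run concurrently; a source that raises is skipped, not fatal.
    results = await asyncio.gather(*(fetch() for fetch in sources), return_exceptions=True)
    records = [rec for res in results if not isinstance(res, BaseException) for rec in res]
    for transform in transforms:  # composable chain, applied in order
        records = transform(records)
    return records


def to_json(records: list[Record]) -> str:
    return json.dumps(records, indent=2)


def to_csv(records: list[Record]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]) if records else [])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    rows = asyncio.run(run_pipeline([fetch_hn, fetch_quotes], [strip_text, fill_missing_score]))
    print(to_json(rows))
    print(to_csv(rows))
```

Keeping each transform a plain `list -> list` function is what makes the chain composable: adding a cleaning step means appending one function to the list.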
A full-screen Rich terminal dashboard displaying:
- CPU usage (overall + per-core) with sparkline history
- Memory & swap utilization with visual bars
- Disk partitions with usage and I/O stats
- Network throughput and connection count
- Threshold-based alerting system
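The sparkline history and threshold alerts can be sketched with the standard library alone. `History`, `Threshold`, `check`, and the example warning/critical limits are hypothetical; in the real dashboard the samples would come from psutil calls such as `psutil.cpu_percent()` and `psutil.virtual_memory().percent`.

```python
from collections import deque
from dataclasses import dataclass

# Eight block characters, from lowest to highest.
SPARK_CHARS = "▁▂▃▄▅▆▇█"


def sparkline(values) -> str:
    """Map each percentage (0-100) to one block character."""
    top = len(SPARK_CHARS) - 1
    return "".join(
        SPARK_CHARS[max(0, min(int(v / 100 * len(SPARK_CHARS)), top))] for v in values
    )


class History:
    """Rolling window of recent samples; the oldest drop off automatically."""

    def __init__(self, size: int = 30) -> None:
        self.samples: deque[float] = deque(maxlen=size)

    def add(self, value: float) -> None:
        self.samples.append(value)

    def render(self) -> str:
        return sparkline(self.samples)


@dataclass(frozen=True)
class Threshold:
    metric: str
    warning: float   # percent
    critical: float  # percent


def check(metrics: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Return one alert string per metric at or above its warning/critical level."""
    alerts = []
    for t in thresholds:
        value = metrics.get(t.metric)
        if value is None:
            continue  # metric not collected this cycle
        if value >= t.critical:
            alerts.append(f"CRITICAL {t.metric}={value:.1f}%")
        elif value >= t.warning:
            alerts.append(f"WARNING {t.metric}={value:.1f}%")
    return alerts
```

A bounded `deque` keeps memory constant no matter how long the dashboard runs, which suits a display that only ever shows the last N samples.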
Generates beautiful dark-themed HTML reports with:
- Embedded matplotlib charts (bar, pie, line)
- Per-source data tables
- Pipeline execution metrics
- Catppuccin-inspired color scheme
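Embedding charts in a single HTML file comes down to encoding each PNG as base64 and inlining it as a data URI next to the data tables. The sketch below uses `string.Template` and `html.escape` in place of Jinja2 so it runs without extra dependencies; the function names and the two colors (Catppuccin Mocha's base and text shades) are illustrative choices, not the project's template.

```python
import base64
import html
from string import Template

PAGE = Template("""<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>$title</title>
<style>body{background:#1e1e2e;color:#cdd6f4;font-family:sans-serif}</style>
</head><body>
<h1>$title</h1>
$chart
$table
</body></html>""")


def png_data_uri(png_bytes: bytes) -> str:
    """Inline PNG bytes so the report needs no separate image files."""
    return "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")


def html_table(rows: list[dict]) -> str:
    if not rows:
        return "<p>No data</p>"
    headers = list(rows[0])
    head = "".join(f"<th>{html.escape(h)}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(r.get(h, '')))}</td>" for h in headers) + "</tr>"
        for r in rows
    )
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"


def render_report(title: str, chart_png: bytes, rows: list[dict]) -> str:
    # Scraped titles can contain markup, so every value is escaped.
    chart = f'<img alt="chart" src="{png_data_uri(chart_png)}">'
    return PAGE.substitute(title=html.escape(title), chart=chart, table=html_table(rows))
```

With matplotlib, the PNG bytes would typically come from `fig.savefig(buf, format="png")` into an `io.BytesIO` buffer, followed by `buf.getvalue()`.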
Built-in asyncio scheduler for recurring automation:
- Register any async task with a custom interval
- Live status dashboard showing task health
- Graceful shutdown handling
| Technology | Purpose |
|---|---|
| asyncio + aiohttp | Concurrent HTTP requests |
| typer | Modern CLI framework with rich help |
| rich | Terminal UI — tables, progress bars, live dashboards |
| psutil | Cross-platform system metrics |
| watchdog | Filesystem event monitoring |
| matplotlib | Chart generation for reports |
| jinja2 | HTML report templating |
| beautifulsoup4 | HTML parsing for web scraping |
| pytest | Testing framework |
- Strategy Pattern — pluggable file organization rules via abstract base class
- Template Method — data source interface with a `fetch()`/`transform()` contract
- Observer Pattern — filesystem watcher triggers organization on file creation
- Pipeline Pattern — composable transform chain for data processing
- Factory Pattern — CLI dynamically instantiates sources and rules from flags
# Clone and install
git clone https://github.com/sfdev5904/pythonproject.git
cd pythonproject
pip install -e ".[dev]"
# Run the full demo (organizer + pipeline + report)
dataflow demo
# Or use individual commands:
dataflow --help
# Organize ./Downloads by file extension
dataflow organize ./Downloads --rules extension
# Use multiple rules with a custom target
dataflow organize ./messy-folder --target ./clean-folder --rules extension,date
# Watch mode — auto-organize new files as they appear
dataflow organize ~/Downloads --watch
# Dry run — preview without moving anything
dataflow organize ./folder --dry-run
# Fetch from all sources and export to JSON/CSV
dataflow pipeline --output ./data
# Select specific sources
dataflow pipeline --sources github,hn --output ./data
# Generate an HTML report and open in browser
dataflow pipeline --sources github,hn,weather,quotes --report --open
# Custom weather city
dataflow pipeline --sources weather --city "London" --output ./data
# Launch the live dashboard
dataflow monitor
# Custom refresh interval
dataflow monitor --interval 0.5
# Disable alerts
dataflow monitor --no-alerts
# Generate report from existing pipeline data
dataflow report ./data --output report.html --open
# Run pipeline every 5 minutes
dataflow schedule --task pipeline --every 300
# Run every 10 minutes with custom city
dataflow schedule --task pipeline --every 600 --city "Tokyo"
pythonproject/
├── pyproject.toml              # Modern Python packaging
├── src/dataflow/
│   ├── cli.py                  # Typer CLI with all subcommands
│   ├── config.py               # Centralized settings (dataclass)
│   ├── organizer/              # Smart File Organizer
│   │   ├── base.py             # Strategy ABC
│   │   ├── rules.py            # ByExtension, ByMimeType, ByDate, BySize
│   │   ├── engine.py           # Core organization + dedup engine
│   │   ├── watcher.py          # Real-time directory watcher
│   │   └── report.py           # Organization summary reports
│   ├── pipeline/               # Async Data Pipeline
│   │   ├── base.py             # DataSource ABC
│   │   ├── sources/            # GitHub, HN, Weather, Quotes
│   │   ├── transformer.py      # Data cleaning pipeline
│   │   ├── exporters.py        # JSON/CSV export
│   │   └── pipeline.py         # Async orchestrator
│   ├── monitor/                # System Monitor
│   │   ├── collectors.py       # psutil metric dataclasses
│   │   ├── dashboard.py        # Rich Live terminal dashboard
│   │   └── alerts.py           # Threshold alerting
│   ├── reports/                # Report Generator
│   │   ├── charts.py           # matplotlib → base64 PNG
│   │   ├── generator.py        # Jinja2 HTML builder
│   │   └── templates/          # HTML templates
│   └── scheduler/              # Task Scheduler
│       └── scheduler.py        # Asyncio interval scheduler
└── tests/                      # pytest test suite
    ├── test_organizer/
    ├── test_pipeline/
    ├── test_monitor/
    ├── test_reports/
    └── test_scheduler/
# Run all tests
pytest
# With coverage
pytest --cov=dataflow --cov-report=term-missing
# Run specific module tests
pytest tests/test_organizer/ -v
Requirements:
- Python 3.11+
- Works on Windows, macOS, and Linux
License: MIT