DataFlow Automator

A Python automation toolkit featuring smart file organization, async data pipelines, real-time system monitoring, and automated HTML reporting.

 ____        _        _____ _
|  _ \  __ _| |_ __ _|  ___| | _____      __
| | | |/ _` | __/ _` | |_  | |/ _ \ \ /\ / /
| |_| | (_| | || (_| |  _| | | (_) \ V  V /
|____/ \__,_|\__\__,_|_|   |_|\___/ \_/\_/
          A U T O M A T O R

Features

Smart File Organizer

Automatically categorize and organize files using pluggable strategy rules:

  • By Extension — maps 80+ file types to categories (code, images, documents, etc.)
  • By MIME Type — fallback detection using system MIME database
  • By Date — organizes into YYYY/MM/ folder structures
  • By File Size — buckets into small/medium/large categories
  • Watch Mode — monitors a directory in real-time and auto-organizes new files
  • Duplicate Detection — SHA-256 hashing prevents duplicate files
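
The rule strategies and the SHA-256 duplicate check listed above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `OrganizeRule`, `target_folder`, `resolve_target`, and `sha256_of` are hypothetical, the extension mapping is a three-entry stand-in for the real one, and both the use of modification time for dates and the nesting of folders when several rules are combined are assumptions.

```python
import hashlib
from abc import ABC, abstractmethod
from datetime import datetime
from pathlib import Path
from typing import Optional


class OrganizeRule(ABC):
    """Hypothetical strategy interface: map a file to a relative target folder."""

    @abstractmethod
    def target_folder(self, path: Path) -> Optional[Path]:
        """Return a folder for `path`, or None if this rule does not apply."""


class ByExtension(OrganizeRule):
    # Tiny illustrative mapping; the README describes 80+ file types.
    CATEGORIES = {".py": "code", ".png": "images", ".pdf": "documents"}

    def target_folder(self, path: Path) -> Optional[Path]:
        category = self.CATEGORIES.get(path.suffix.lower())
        return Path(category) if category else None


class ByDate(OrganizeRule):
    def target_folder(self, path: Path) -> Optional[Path]:
        # Assumption: the date comes from the file's modification time.
        modified = datetime.fromtimestamp(path.stat().st_mtime)
        return Path(f"{modified:%Y}") / f"{modified:%m}"


def resolve_target(path: Path, rules: list[OrganizeRule]) -> Path:
    """Nest the folders produced by each applicable rule, in order (assumed behavior)."""
    target = Path()
    for rule in rules:
        folder = rule.target_folder(path)
        if folder is not None:
            target /= folder
    return target


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with the same `sha256_of` digest have identical contents, so an engine built this way could keep a set of digests it has already seen and skip any file whose digest is in that set.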

Async Data Pipeline

Concurrent data collection from multiple free sources (no API keys required):

  • GitHub Trending — scrapes today's trending repositories
  • Hacker News — fetches top stories via the Firebase API
  • Weather — current conditions and 3-day forecast from wttr.in
  • Quotes — inspirational quotes from ZenQuotes

All sources are fetched concurrently with asyncio.gather(), transformed through a configurable pipeline, and exported to JSON/CSV.
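
As a rough sketch of the concurrent fetch step, the snippet below runs two stand-in sources through `asyncio.gather()`. The coroutines `fetch_quotes` and `fetch_weather` and the helper `collect` are hypothetical and make no network calls. Passing `return_exceptions=True` so that one failing source does not cancel the others is an assumption about how such a pipeline might behave, not a statement about this project's code.

```python
import asyncio
from typing import Any, Awaitable, Callable


# Stand-ins for real sources: each would normally issue an HTTP request.
async def fetch_quotes() -> list[dict[str, Any]]:
    await asyncio.sleep(0.01)  # simulate network latency
    return [{"quote": "placeholder text", "author": "unknown"}]


async def fetch_weather() -> list[dict[str, Any]]:
    await asyncio.sleep(0.01)
    raise ConnectionError("simulated outage")


async def collect(
    sources: dict[str, Callable[[], Awaitable[list]]],
) -> dict[str, list]:
    """Run every source concurrently and map each name to its records."""
    names = list(sources)
    results = await asyncio.gather(
        *(sources[name]() for name in names),
        return_exceptions=True,  # a failed source yields an exception object
    )
    collected: dict[str, list] = {}
    for name, result in zip(names, results):
        # Replace failures with an empty list so downstream steps still run.
        collected[name] = [] if isinstance(result, BaseException) else result
    return collected


if __name__ == "__main__":
    data = asyncio.run(collect({"quotes": fetch_quotes, "weather": fetch_weather}))
    print(data)
```

Because `gather()` returns results in the order the awaitables were passed, zipping them back against the source names is safe.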

Real-Time System Monitor

A full-screen Rich terminal dashboard displaying:

  • CPU usage (overall + per-core) with sparkline history
  • Memory & swap utilization with visual bars
  • Disk partitions with usage and I/O stats
  • Network throughput and connection count
  • Threshold-based alerting system
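
To illustrate the sparkline history and threshold alerting, here is a small sketch that works on plain percentage readings. In the real monitor these values would presumably come from psutil; the `Threshold` dataclass, the `evaluate` and `sparkline` helpers, and the warn/critical levels are all hypothetical.

```python
from dataclasses import dataclass

BARS = "▁▂▃▄▅▆▇█"


@dataclass(frozen=True)
class Threshold:
    metric: str
    warn: float      # percentage at which a warning is raised
    critical: float  # percentage at which a critical alert is raised


def evaluate(readings: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Return one alert message per metric that crosses its threshold."""
    alerts = []
    for threshold in thresholds:
        value = readings.get(threshold.metric)
        if value is None:
            continue  # no reading for this metric
        if value >= threshold.critical:
            alerts.append(f"CRITICAL {threshold.metric}={value:.1f}%")
        elif value >= threshold.warn:
            alerts.append(f"WARNING {threshold.metric}={value:.1f}%")
    return alerts


def sparkline(history: list[float], maximum: float = 100.0) -> str:
    """Render a history of readings as one block character per sample."""
    chars = []
    for value in history:
        clamped = min(max(value, 0.0), maximum)
        index = int(clamped / maximum * (len(BARS) - 1))
        chars.append(BARS[index])
    return "".join(chars)


if __name__ == "__main__":
    limits = [Threshold("cpu", 80, 90), Threshold("memory", 80, 90)]
    print(evaluate({"cpu": 95.0, "memory": 82.0}, limits))
    print(sparkline([0, 50, 100]))
```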

Automated Report Generator

Generates dark-themed HTML reports with:

  • Embedded matplotlib charts (bar, pie, line)
  • Per-source data tables
  • Pipeline execution metrics
  • Catppuccin-inspired color scheme

Task Scheduler

Built-in asyncio scheduler for recurring automation:

  • Register any async task with a custom interval
  • Live status dashboard showing task health
  • Graceful shutdown handling
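
An interval scheduler along these lines can be built from `asyncio` primitives alone. The sketch below is one possible shape under stated assumptions: the class `IntervalScheduler`, its `register`/`stop`/`run` methods, and the `runs`/`failures` counters are hypothetical names chosen for illustration. Shutdown is modeled with an `asyncio.Event` so that sleeping loops wake up as soon as `stop()` is called.

```python
import asyncio
from typing import Awaitable, Callable

Job = Callable[[], Awaitable[None]]


class IntervalScheduler:
    """Hypothetical minimal scheduler: runs each registered coroutine on a fixed interval."""

    def __init__(self) -> None:
        self._jobs: list[tuple[str, Job, float]] = []
        self._stop = asyncio.Event()
        self.runs: dict[str, int] = {}      # successful runs per task
        self.failures: dict[str, int] = {}  # failed runs per task

    def register(self, name: str, job: Job, every: float) -> None:
        self._jobs.append((name, job, every))
        self.runs[name] = 0
        self.failures[name] = 0

    def stop(self) -> None:
        """Request shutdown; each loop exits after its current run."""
        self._stop.set()

    async def _loop(self, name: str, job: Job, every: float) -> None:
        while not self._stop.is_set():
            try:
                await job()
                self.runs[name] += 1
            except Exception:
                self.failures[name] += 1  # keep the loop alive after a failed run
            try:
                # Sleep for the interval, but wake immediately if stop() is called.
                await asyncio.wait_for(self._stop.wait(), timeout=every)
            except asyncio.TimeoutError:
                pass

    async def run(self) -> None:
        await asyncio.gather(*(self._loop(*job) for job in self._jobs))


async def demo() -> IntervalScheduler:
    scheduler = IntervalScheduler()

    async def tick() -> None:
        pass

    scheduler.register("tick", tick, every=0.01)
    runner = asyncio.create_task(scheduler.run())
    await asyncio.sleep(0.1)
    scheduler.stop()
    await runner
    return scheduler


if __name__ == "__main__":
    print(asyncio.run(demo()).runs)
```

The `runs` and `failures` counters are the kind of data a status dashboard could read to show task health.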

Tech Stack

Technology           Purpose
asyncio + aiohttp    Concurrent HTTP requests
typer                Modern CLI framework with rich help
rich                 Terminal UI: tables, progress bars, live dashboards
psutil               Cross-platform system metrics
watchdog             Filesystem event monitoring
matplotlib           Chart generation for reports
jinja2               HTML report templating
beautifulsoup4       HTML parsing for web scraping
pytest               Testing framework

Design Patterns

  • Strategy Pattern — pluggable file organization rules via abstract base class
  • Template Method — data source interface with fetch() / transform() contract
  • Observer Pattern — filesystem watcher triggers organization on file creation
  • Pipeline Pattern — composable transform chain for data processing
  • Factory Pattern — CLI dynamically instantiates sources and rules from flags
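
As one illustration of the Pipeline Pattern above, a composable transform chain can be expressed as functions that each take and return a list of records. The helpers `compose`, `strip_whitespace`, `drop_empty_titles`, and `dedupe_by`, and the `title` field, are hypothetical and are not taken from the project's `transformer.py`.

```python
from functools import reduce
from typing import Any, Callable

Record = dict[str, Any]
Transform = Callable[[list[Record]], list[Record]]


def compose(*steps: Transform) -> Transform:
    """Chain transforms left to right into a single transform."""
    def run(records: list[Record]) -> list[Record]:
        return reduce(lambda acc, step: step(acc), steps, records)
    return run


def strip_whitespace(records: list[Record]) -> list[Record]:
    """Trim surrounding whitespace from every string value."""
    return [
        {key: value.strip() if isinstance(value, str) else value
         for key, value in record.items()}
        for record in records
    ]


def drop_empty_titles(records: list[Record]) -> list[Record]:
    """Discard records whose title is missing or blank."""
    return [r for r in records if str(r.get("title", "")).strip()]


def dedupe_by(key: str) -> Transform:
    """Build a transform that keeps the first record for each value of `key`."""
    def step(records: list[Record]) -> list[Record]:
        seen = set()
        unique = []
        for record in records:
            marker = record.get(key)
            if marker not in seen:
                seen.add(marker)
                unique.append(record)
        return unique
    return step


clean = compose(strip_whitespace, drop_empty_titles, dedupe_by("title"))

if __name__ == "__main__":
    raw = [{"title": " A "}, {"title": "A"}, {"title": "   "}, {"title": "B", "score": 3}]
    print(clean(raw))
```

Because each step has the same signature, steps can be reordered, removed, or added per source without touching the others.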

Quick Start

# Clone and install
git clone https://github.com/sfdev5904/pythonproject.git
cd pythonproject
pip install -e ".[dev]"

# Run the full demo (organizer + pipeline + report)
dataflow demo

# Or use individual commands:
dataflow --help

Usage

Organize Files

# Organize ./Downloads by file extension
dataflow organize ./Downloads --rules extension

# Use multiple rules with a custom target
dataflow organize ./messy-folder --target ./clean-folder --rules extension,date

# Watch mode — auto-organize new files as they appear
dataflow organize ~/Downloads --watch

# Dry run — preview without moving anything
dataflow organize ./folder --dry-run

Run Data Pipeline

# Fetch from all sources and export to JSON/CSV
dataflow pipeline --output ./data

# Select specific sources
dataflow pipeline --sources github,hn --output ./data

# Generate an HTML report and open it in the browser
dataflow pipeline --sources github,hn,weather,quotes --report --open

# Custom weather city
dataflow pipeline --sources weather --city "London" --output ./data

System Monitor

# Launch the live dashboard
dataflow monitor

# Custom refresh interval
dataflow monitor --interval 0.5

# Disable alerts
dataflow monitor --no-alerts

Generate Reports

# Generate report from existing pipeline data
dataflow report ./data --output report.html --open

Schedule Recurring Tasks

# Run pipeline every 5 minutes
dataflow schedule --task pipeline --every 300

# Run every 10 minutes with custom city
dataflow schedule --task pipeline --every 600 --city "Tokyo"

Project Structure

pythonproject/
├── pyproject.toml              # Modern Python packaging
├── src/dataflow/
│   ├── cli.py                  # Typer CLI with all subcommands
│   ├── config.py               # Centralized settings (dataclass)
│   ├── organizer/              # Smart File Organizer
│   │   ├── base.py             # Strategy ABC
│   │   ├── rules.py            # ByExtension, ByMimeType, ByDate, BySize
│   │   ├── engine.py           # Core organization + dedup engine
│   │   ├── watcher.py          # Real-time directory watcher
│   │   └── report.py           # Organization summary reports
│   ├── pipeline/               # Async Data Pipeline
│   │   ├── base.py             # DataSource ABC
│   │   ├── sources/            # GitHub, HN, Weather, Quotes
│   │   ├── transformer.py      # Data cleaning pipeline
│   │   ├── exporters.py        # JSON/CSV export
│   │   └── pipeline.py         # Async orchestrator
│   ├── monitor/                # System Monitor
│   │   ├── collectors.py       # psutil metric dataclasses
│   │   ├── dashboard.py        # Rich Live terminal dashboard
│   │   └── alerts.py           # Threshold alerting
│   ├── reports/                # Report Generator
│   │   ├── charts.py           # matplotlib → base64 PNG
│   │   ├── generator.py        # Jinja2 HTML builder
│   │   └── templates/          # HTML templates
│   └── scheduler/              # Task Scheduler
│       └── scheduler.py        # Asyncio interval scheduler
└── tests/                      # pytest test suite
    ├── test_organizer/
    ├── test_pipeline/
    ├── test_monitor/
    ├── test_reports/
    └── test_scheduler/

Running Tests

# Run all tests
pytest

# With coverage
pytest --cov=dataflow --cov-report=term-missing

# Run specific module tests
pytest tests/test_organizer/ -v

Requirements

  • Python 3.11+
  • Works on Windows, macOS, and Linux

License

MIT