File Analyzer

A modular Python package for analyzing folder structures and PDF files, with both a graphical (GUI) and a command-line (CLI) interface.

Features

Dual Interface: Use either a GUI (tkinter) or a CLI (argparse)
Recursive Folder Scanning: Analyze entire directory trees
File Analysis: Track file paths, names, and sizes (in MB)
PDF Analysis: Extract page counts and word counts from PDF files using PyMuPDF
Comprehensive Logging:
- Summary reports with grouped counts and sizes
- Separate error log for exceptions
- Detailed CSV export of all files
Flexible Filtering: Optional file extension filter
Statistics:
- Total file counts and sizes
- Grouped by folder
- Grouped by extension
- PDF-specific totals (pages, words)

Installation

Clone or download this repository
Install dependencies:

pip install -r requirements.txt

Usage

GUI Mode

Run without arguments to launch the graphical interface:

python -m file_analyzer

Or explicitly:

python -m file_analyzer.gui

GUI Features:

Browse buttons for easy folder selection
Optional file extension filter
Real-time log output
Progress indicator
Clear and intuitive interface

CLI Mode

Run with arguments for command-line operation:

python -m file_analyzer -i /path/to/input -o /path/to/output

CLI Arguments:

-i, --input: Input folder to analyze (required)
-o, --output: Output folder for logs and reports (required)
-e, --extension: File extension filter, e.g., .pdf, .txt (optional)
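The three options above map naturally onto argparse. The sketch below is an assumed reconstruction, not the contents of cli.py; the helper normalize_extension (which lowercases the filter and adds a missing leading dot) is hypothetical behavior that the real tool may not have.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented -i/-o/-e options (a sketch, not cli.py itself)."""
    parser = argparse.ArgumentParser(
        prog="file_analyzer",
        description="Analyze folder structures and PDF files.",
    )
    parser.add_argument("-i", "--input", required=True,
                        help="Input folder to analyze")
    parser.add_argument("-o", "--output", required=True,
                        help="Output folder for logs and reports")
    parser.add_argument("-e", "--extension", default=None,
                        help="Optional file extension filter, e.g. .pdf or .txt")
    return parser


def normalize_extension(ext):
    """Hypothetical helper: turn 'pdf' or '.PDF' into '.pdf'; None means no filter."""
    if not ext:
        return None
    ext = ext.strip().lower()
    return ext if ext.startswith(".") else "." + ext
```

Calling build_parser().parse_args() with no list reads sys.argv, which is what a real entry point would do; passing an explicit list is handy for testing.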

Examples:

# Analyze all files
python -m file_analyzer.cli -i C:\Documents -o C:\Reports

# Analyze only PDF files
python -m file_analyzer.cli -i C:\Documents -o C:\Reports -e .pdf

# Analyze only text files
python -m file_analyzer.cli -i C:\Documents -o C:\Reports -e .txt

Output Files

All output files are saved to <output_folder>/logs/:

summary_YYYYMMDD_HHMMSS.log - Comprehensive summary report with:
- Overall statistics
- Statistics by folder
- Statistics by extension
- PDF statistics (if applicable)
errors_YYYYMMDD_HHMMSS.log - Error log with exception details
file_details_YYYYMMDD_HHMMSS.csv - Detailed CSV with all file information
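All three files share one timestamp per run. A minimal sketch of how such names could be built, assuming a single datetime.now() call per run; output_paths is a hypothetical helper, and the caller is assumed to create the logs directory before writing.

```python
from datetime import datetime
from pathlib import Path


def output_paths(output_folder, now=None):
    """Return the three report paths under <output_folder>/logs/, sharing one timestamp."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")  # YYYYMMDD_HHMMSS
    logs_dir = Path(output_folder) / "logs"
    # Note: this only builds paths; logs_dir.mkdir(parents=True, exist_ok=True) must run before writing.
    return {
        "summary": logs_dir / f"summary_{stamp}.log",
        "errors": logs_dir / f"errors_{stamp}.log",
        "csv": logs_dir / f"file_details_{stamp}.csv",
    }
```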

Package Structure

file_analyzer/
├── __init__.py          # Package initialization
├── __main__.py          # Main entry point
├── scanner.py           # File system scanning module
├── pdf_analyzer.py      # PDF analysis module
├── logger.py            # Logging and reporting module
├── cli.py               # Command-line interface
└── gui.py               # Graphical user interface
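According to the Usage section, running the package with no arguments opens the GUI, while running it with arguments uses the CLI. A sketch of how __main__.py might dispatch between the two; run_gui and run_cli are hypothetical placeholders for whatever gui.py and cli.py actually expose.

```python
import sys


def choose_mode(argv):
    """Return 'gui' when no command-line arguments are given, else 'cli'."""
    return "gui" if len(argv) == 0 else "cli"


def main(argv=None, run_gui=None, run_cli=None):
    """Dispatch to the GUI or CLI entry point.

    run_gui and run_cli are hypothetical callables standing in for the
    real entry points in gui.py and cli.py; both must be supplied.
    """
    argv = sys.argv[1:] if argv is None else argv
    if choose_mode(argv) == "gui":
        return run_gui()
    return run_cli(argv)
```

Passing the two entry points in as arguments keeps the dispatch rule testable without opening a window.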

Module Overview

scanner.py

FileScanner: Recursively scans directories and collects file information
Supports optional file extension filtering
Calculates statistics by folder and extension
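The scanning and grouping steps can be sketched with os.walk. This is an illustration under assumptions, not FileScanner itself: the function names scan_folder and group_stats, the record fields, case-insensitive extension matching, and the definition of 1 MB as 1,048,576 bytes are all assumptions.

```python
import os
from collections import defaultdict

BYTES_PER_MB = 1024 * 1024  # assumption: "MB" means 1,048,576 bytes


def scan_folder(root, extension=None):
    """Recursively collect one record per file, optionally filtered by extension.

    extension: optional filter such as ".pdf", compared case-insensitively.
    Returns (records, errors) so unreadable files do not abort the scan.
    """
    records, errors = [], []
    ext_filter = extension.lower() if extension else None
    # os.walk passes directory-listing failures (OSError) to onerror
    for dirpath, _dirnames, filenames in os.walk(root, onerror=errors.append):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext_filter and ext != ext_filter:
                continue
            path = os.path.join(dirpath, name)
            try:
                size_mb = os.path.getsize(path) / BYTES_PER_MB
            except OSError as exc:  # file vanished, permission denied, etc.
                errors.append(exc)
                continue
            records.append({"path": path, "name": name, "folder": dirpath,
                            "extension": ext, "size_mb": size_mb})
    return records, errors


def group_stats(records, key):
    """Aggregate file count and total size (MB) by 'folder' or 'extension'."""
    stats = defaultdict(lambda: {"count": 0, "size_mb": 0.0})
    for rec in records:
        bucket = stats[rec[key]]
        bucket["count"] += 1
        bucket["size_mb"] += rec["size_mb"]
    return dict(stats)
```

Returning errors alongside the records lets the caller forward them to the error log while the scan carries on.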

pdf_analyzer.py

PDFAnalyzer: Analyzes PDF files using PyMuPDF (fitz)
Extracts page counts and word counts
Handles corrupted PDFs gracefully, capturing errors instead of aborting the analysis
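A minimal per-file analysis with PyMuPDF might look like the following. It assumes that a "word" is any whitespace-separated token, which may differ from how PDFAnalyzer counts; analyze_pdf and count_words are hypothetical names. fitz.open, Document.page_count, and Page.get_text are real PyMuPDF APIs.

```python
def count_words(text):
    """Count whitespace-separated tokens (an assumed definition of a 'word')."""
    return len(text.split())


def analyze_pdf(path):
    """Return {'pages', 'words', 'error'} for one PDF using PyMuPDF.

    Any failure (missing file, corrupted PDF, PyMuPDF not installed) is
    recorded in 'error' instead of being raised, so a batch run can continue.
    """
    try:
        import fitz  # PyMuPDF; newer releases also expose it as `pymupdf`
        with fitz.open(path) as doc:
            words = sum(count_words(page.get_text()) for page in doc)
            return {"pages": doc.page_count, "words": words, "error": None}
    except Exception as exc:  # broad on purpose: one bad PDF must not stop the run
        return {"pages": 0, "words": 0, "error": f"{type(exc).__name__}: {exc}"}
```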

logger.py

AnalysisLogger: Comprehensive logging system
Creates separate logs for summaries and errors
Generates CSV reports with detailed file information
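Keeping summaries and errors apart is straightforward with two independent loggers from the standard logging module, plus csv.DictWriter for the detail report. This sketches the idea rather than AnalysisLogger's actual interface; make_logger, write_csv, and the CSV column names are assumptions.

```python
import csv
import logging

# Assumed column set; the real CSV may use different headers.
CSV_FIELDS = ["path", "name", "folder", "extension", "size_mb", "pages", "words"]


def make_logger(name, path):
    """Create an isolated file logger so summary and error output stay in separate files."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep messages out of the root logger
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def write_csv(path, records):
    """Write one row per file; missing fields (e.g. pages for non-PDFs) are left blank."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=CSV_FIELDS, restval="", extrasaction="ignore")
        writer.writeheader()
        writer.writerows(records)
```

Because logging.getLogger returns the same object for the same name, calling make_logger twice with one name would attach a second handler; a fuller implementation would guard against that.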

cli.py

Command-line interface using argparse
Validates inputs and coordinates analysis workflow
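Input validation can run before any scanning starts, so a mistyped path fails fast. The sketch assumes the output folder is created on demand rather than required to exist; validate_paths and ValidationError are hypothetical names.

```python
import os


class ValidationError(ValueError):
    """Hypothetical exception type for bad CLI inputs."""


def validate_paths(input_folder, output_folder):
    """Check inputs before scanning; create <output_folder>/logs if needed."""
    if not os.path.isdir(input_folder):
        raise ValidationError(f"Input folder does not exist: {input_folder}")
    logs_dir = os.path.join(output_folder, "logs")
    os.makedirs(logs_dir, exist_ok=True)  # also creates output_folder itself
    return os.path.abspath(input_folder), logs_dir
```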

gui.py

Graphical interface using tkinter
Threaded analysis to prevent UI freezing
Real-time log output and progress indication
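A common pattern for keeping a tkinter window responsive is to run the analysis on a worker thread and pass log lines back through a queue.Queue that the main loop polls with root.after. The sketch below assumes that design; start_worker and poll_queue are hypothetical names, and gui.py may structure this differently.

```python
import queue
import threading


def start_worker(task, messages):
    """Run task(log) on a daemon thread; log() puts strings on a thread-safe queue.

    tkinter widgets should only be touched from the main thread, so the
    worker never updates the UI directly.
    """
    def log(msg):
        messages.put(msg)

    def run():
        try:
            task(log)
        except Exception as exc:  # surface worker failures in the log view
            messages.put(f"ERROR: {exc}")
        finally:
            messages.put(None)  # sentinel: analysis finished

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def poll_queue(root, messages, text_widget, interval_ms=100):
    """Drain the queue from the Tk main loop; reschedule until the sentinel arrives."""
    try:
        while True:
            msg = messages.get_nowait()
            if msg is None:
                return  # worker done; stop polling
            text_widget.insert("end", msg + "\n")
            text_widget.see("end")
    except queue.Empty:
        pass
    root.after(interval_ms, poll_queue, root, messages, text_widget, interval_ms)
```

The None sentinel marks where a progress indicator (for example a ttk.Progressbar in indeterminate mode, started before the worker) could be stopped.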

Exception Handling

The package includes comprehensive exception handling:

Invalid paths are validated before processing
File access errors are logged and don't stop the scan
PDF processing errors are captured and reported separately
All exceptions are logged to the error log with full details

Requirements

Python 3.7+
PyMuPDF >= 1.23.0
tkinter (bundled with most Python installers; some Linux distributions ship it separately, e.g. as the python3-tk package)

License

This project is provided as-is for educational and practical use.

Contributing

Feel free to submit issues, fork the repository, and create pull requests for any improvements.
