A comprehensive payroll automation system that processes timesheet data from both Excel files and image files using OCR technology.
- Excel Processing: Automatically extracts employee names and hours from Excel timesheet files
- OCR Processing: Uses Tesseract OCR to extract data from timesheet images
- Automatic Installation: Self-installing package with automatic dependency management
- Cross-Platform: Works on macOS, Linux, and Windows
- Payroll Integration: Updates payroll totals files automatically
- Unified Processing: Handles mixed file types in a single workflow
- Download the latest release: `PayrollAutomation_20251020_141202.zip`
- Extract the ZIP file
- Run the installer: `python install.py`
- Place your timesheet files in the `input_files` folder
- Run the automation: `python run_payroll_automation.py`
- Clone this repository
- Install Python dependencies: `pip install -r requirements.txt`
- Install Tesseract OCR (see installation guide below)
- Run the automation: `python unified_payroll_processor.py`
The package includes an automatic installer that handles everything. Running `python install.py` will:
- Install all Python dependencies
- Install Tesseract OCR automatically
- Set up the required folder structure
- Create example files
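The folder setup step can be sketched as follows. This is a minimal illustration, not the code in `install.py`: the function name `create_folders` is hypothetical, and the folder names are taken from the project layout shown in this README.

```python
from pathlib import Path

# Folder names as listed in the project layout. The function itself is a
# hypothetical illustration, not the actual install.py logic.
REQUIRED_FOLDERS = ("input_files", "output", "examples")


def create_folders(base_dir):
    """Create each required folder under base_dir and return their paths."""
    base = Path(base_dir)
    created = []
    for name in REQUIRED_FOLDERS:
        folder = base / name
        # exist_ok=True makes the call safe to repeat on an existing install.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `create_folders(".")` from the extracted package directory would create any of the three folders that are missing and leave existing ones untouched.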
- Python dependencies: `pip install pandas openpyxl xlrd pytesseract opencv-python Pillow`
- macOS: `brew install tesseract`
- Linux (Ubuntu/Debian): `sudo apt-get install tesseract-ocr`
- Linux (CentOS/RHEL): `sudo yum install tesseract`
- Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki
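The per-platform Tesseract commands above can be captured in a small lookup keyed on the operating system. This is only a sketch: `tesseract_install_hint` and `INSTALL_HINTS` are hypothetical names, and the project's own `install_tesseract.py` may choose the command differently.

```python
import platform

# Commands copied from the manual installation notes. Windows has no single
# command, so it maps to the download page instead.
INSTALL_HINTS = {
    "Darwin": "brew install tesseract",
    "Linux": "sudo apt-get install tesseract-ocr  (or: sudo yum install tesseract)",
    "Windows": "Download from https://github.com/UB-Mannheim/tesseract/wiki",
}


def tesseract_install_hint(system=None):
    """Return the install hint for the given OS name (default: this machine)."""
    # platform.system() returns "Darwin" on macOS, "Linux", or "Windows".
    system = system or platform.system()
    return INSTALL_HINTS.get(system, "Unsupported platform: " + system)


print(tesseract_install_hint())
```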
PayrollTimecardAgent/
├── PayrollAutomation_20251020_141202.zip    # Complete package
├── PayrollAutomation_20251020_141202/       # Extracted package
│   ├── unified_payroll_processor.py         # Main processor
│   ├── enhanced_timesheet_ocr.py            # OCR functionality
│   ├── extract_timesheet_data.py            # Excel processor
│   ├── install_tesseract.py                 # Tesseract installer
│   ├── run_payroll_automation.py            # Simple launcher
│   ├── install.py                           # Package installer
│   ├── requirements.txt                     # Python dependencies
│   ├── README.md                            # Documentation
│   ├── USER_GUIDE.md                        # User instructions
│   ├── input_files/                         # Place timesheets here
│   ├── output/                              # Results saved here
│   └── examples/                            # Example files
├── 3 - Payroll Totals.xlsx                  # Updated payroll file
├── Timesheet_Summary.xlsx                   # Summary report
└── extracted_timesheet_data.csv             # Raw extracted data
- Prepare Files: Place your timesheet files (Excel or images) in the `input_files` folder
- Run Processing: Execute `python run_payroll_automation.py`
- Check Results: View results in the `output` folder
- Excel Files: `.xlsx`, `.xls` timesheet files
- Image Files: `.png`, `.jpg`, `.jpeg` timesheet images
- Mixed Processing: Handles both types in a single run
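Routing mixed input by file type can be sketched as a simple extension check. The extensions are the ones listed above; the function name `classify_files` and the returned dictionary shape are assumptions made for illustration, not the interface of `UnifiedPayrollProcessor`.

```python
from pathlib import Path

# Supported extensions as listed in this README.
EXCEL_EXTENSIONS = {".xlsx", ".xls"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def classify_files(filenames):
    """Split file names into Excel, image, and unsupported groups."""
    groups = {"excel": [], "image": [], "unsupported": []}
    for name in filenames:
        # Compare in lower case so "SCAN.JPG" is treated like "scan.jpg".
        ext = Path(name).suffix.lower()
        if ext in EXCEL_EXTENSIONS:
            groups["excel"].append(name)
        elif ext in IMAGE_EXTENSIONS:
            groups["image"].append(name)
        else:
            groups["unsupported"].append(name)
    return groups
```

Keeping an explicit "unsupported" group makes it easy to report skipped files rather than dropping them silently.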
- `consolidated_payroll_data.csv`: All extracted data
- `processing_report_[timestamp].txt`: Detailed processing log
- `3 - Payroll Totals.xlsx`: Updated payroll file with new sheet
- `Timesheet_Summary.xlsx`: Clean summary of employee hours
The system can extract the following data from timesheet images:
- Employee Name: Automatically detected from the timesheet
- Week Period: Date range for the timesheet
- Daily Hours: Hours worked each day
- Total Hours: Sum of all hours
- Task Breakdown: Hours by task/project (if available)
==================================================
TIMESHEET DATA EXTRACTION RESULTS
==================================================
Employee Name: Chad Baker
Week Period: 5-14 October 2025
Total Hours: 40
Extraction Date: 2025-10-20 14:11:10
Daily Hours:
Mon 6: 8 hours
Tue 7: 8 hours
Wed 8: 8 hours
Thu 9: 8 hours
Fri 10: 8 hours
Task Breakdown:
KTLO: 19 hours
Enhancement: 7 hours
Support: 14 hours
==================================================
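Because OCR can misread digits, it may be worth checking that the daily hours in a result block add up to the reported total. The sketch below parses text in the format shown above; `parse_daily_hours` is a hypothetical helper, not part of `EnhancedTimesheetOCR`, and the regular expressions assume the exact line layout of the sample.

```python
import re

SAMPLE = """\
Total Hours: 40
Daily Hours:
Mon 6: 8 hours
Tue 7: 8 hours
Wed 8: 8 hours
Thu 9: 8 hours
Fri 10: 8 hours
Task Breakdown:
KTLO: 19 hours
"""


def parse_daily_hours(report_text):
    """Return (total_hours, {day_label: hours}) parsed from the report text."""
    total_match = re.search(
        r"^Total Hours:\s*(\d+(?:\.\d+)?)", report_text, re.MULTILINE
    )
    total = float(total_match.group(1)) if total_match else None
    # Day lines look like "Mon 6: 8 hours": weekday, day of month, hours.
    day_pattern = re.compile(
        r"^\s*((?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) \d{1,2}):\s*(\d+(?:\.\d+)?) hours",
        re.MULTILINE,
    )
    daily = {label: float(hours) for label, hours in day_pattern.findall(report_text)}
    return total, daily


total, daily = parse_daily_hours(SAMPLE)
# True when the daily entries are consistent with the reported total.
print(sum(daily.values()) == total)
```

A mismatch does not say which value is wrong, only that the timesheet should be reviewed by hand.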
- UnifiedPayrollProcessor: Main orchestrator for all processing
- EnhancedTimesheetOCR: Advanced OCR with automatic Tesseract installation
- TimesheetExtractor: Excel file processing engine
- TesseractManager: Cross-platform Tesseract installation and management
- pandas: Data manipulation and analysis
- openpyxl: Excel file processing
- xlrd: Legacy Excel file support
- pytesseract: Python wrapper for Tesseract OCR
- opencv-python: Image preprocessing
- Pillow: Image handling
- Tesseract not found: Run `python install.py` to install automatically
- Permission errors: Ensure you have write permissions in the output folder
- Excel file errors: Make sure Excel files are not open in another application
- OCR accuracy: Ensure timesheet images are clear and well-lit
- Check the `USER_GUIDE.md` for detailed instructions
- Review the processing report in the `output` folder
- Ensure all dependencies are properly installed
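One way to confirm that dependencies are installed is a short diagnostic script. This is a sketch using only the standard library; `missing_packages` and `tesseract_on_path` are hypothetical helpers, and the check assumes the Tesseract executable is named `tesseract` and is expected on `PATH`.

```python
import importlib.util
import shutil

# Package name on PyPI -> module name used in `import`. The two differ for
# opencv-python (cv2) and Pillow (PIL).
REQUIRED_MODULES = {
    "pandas": "pandas",
    "openpyxl": "openpyxl",
    "xlrd": "xlrd",
    "pytesseract": "pytesseract",
    "opencv-python": "cv2",
    "Pillow": "PIL",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the PyPI names of packages whose module cannot be found."""
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


def tesseract_on_path():
    """Return True if a `tesseract` executable is found on PATH."""
    return shutil.which("tesseract") is not None


if __name__ == "__main__":
    print("Missing Python packages:", missing_packages() or "none")
    print("Tesseract on PATH:", tesseract_on_path())
```

If the script reports missing packages, `pip install -r requirements.txt` should install them. A missing Tesseract binary can be addressed with `python install.py`.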
This project is licensed under the MIT License - see the LICENSE file for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
For issues and questions:
- Create an issue in this repository
- Check the troubleshooting section above
- Review the user guide for detailed instructions
Last Updated: October 20, 2025
Version: 2.0 (Enhanced OCR Package)
Author: Rainy City Coder