Transform any document into searchable, editable text with enterprise-grade OCR technology
Designed and Built by Beau Lewis
Enterprise-Grade OCR • Multi-Language • AI-Powered • Cross-Platform • Professional GUI
A powerful, enterprise-ready OCR (Optical Character Recognition) document converter with advanced image processing, multi-language support, and intelligent text extraction. Features Tesseract and EasyOCR engines, batch processing, and professional deployment options.
Quick Start • Features • Formats • Installation • Configuration • Usage • Project Structure • Contributing
OCR Document Converter is a professional-grade, enterprise-ready OCR application that extracts text from images and documents using advanced AI-powered engines. Built with dual OCR backends (Tesseract & EasyOCR), intelligent preprocessing, and multi-language support for maximum accuracy.
- Dual OCR Engines: Tesseract 5.0+ and EasyOCR for maximum accuracy
- Multi-Language: Support for 80+ languages with automatic detection
- Lightning Fast: Multi-threaded processing with intelligent caching
- Universal Format Support: JPG, PNG, TIFF, BMP, GIF, WebP, PDF
- Cross-Platform: Native integration on Windows, macOS, and Linux
- Modern GUI: Professional interface with drag-and-drop support
- Batch Processing: Handle multiple files simultaneously
- Smart Preprocessing: Automatic image enhancement and optimization
- Intelligent Caching: 24-hour file caching system for efficiency
- Zero External APIs: Works completely offline
- Clone this repository:
  git clone https://github.com/Beaulewis1977/quick_ocr_doc_converter.git
  cd quick_ocr_doc_converter
- Run the automated setup:
  python setup_ocr_environment.py
- Launch the application:
  python universal_document_converter_ocr.py
  Or use one of the launchers:
  - Windows: Double-click run_ocr_converter.bat or ⚡ Quick Launch OCR.bat
  - Cross-platform: python launch_ocr.py
  - CLI: python cli.py input.pdf -o output.txt -t txt --ocr

Alternatively, install the dependencies manually:
- Install Python dependencies:
  pip install -r requirements.txt
- Install Tesseract OCR:
  - Windows: Download from GitHub Releases
  - macOS: brew install tesseract
  - Linux: sudo apt-get install tesseract-ocr
- Install additional language packs (optional):
  # Example for German and French
  sudo apt-get install tesseract-ocr-deu tesseract-ocr-fra

- Tesseract 5.0+: Industry-standard OCR with 100+ language support
- EasyOCR: AI-powered neural network OCR for enhanced accuracy
- Automatic Engine Selection: Chooses best engine based on image characteristics
- Fallback System: Switches engines automatically if one fails
- 80+ Languages: Including English, Spanish, French, German, Chinese, Japanese, Arabic, Russian
- Automatic Language Detection: Smart detection of document language
- Mixed Language Documents: Handles documents with multiple languages
- Custom Language Models: Support for specialized OCR models
- Smart Preprocessing: Automatic noise reduction, contrast enhancement
- Format Detection: Intelligent handling of different image formats
- Resolution Optimization: Automatic DPI adjustment for best OCR results
- Rotation Correction: Automatic text orientation detection and correction
- Skew Correction: Fixes tilted or skewed documents
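
To make the contrast-enhancement step above concrete, here is a minimal pure-Python sketch of min-max contrast stretching on a grayscale pixel grid. It is an illustration only: the function name `stretch_contrast` is hypothetical, and the project's actual preprocessing code may use a different technique or library.

```python
def stretch_contrast(pixels, low=0, high=255):
    """Linearly rescale grayscale values so the darkest pixel maps to
    `low` and the brightest to `high` (min-max contrast stretching)."""
    flat = [value for row in pixels for value in row]
    darkest, brightest = min(flat), max(flat)
    if darkest == brightest:
        # A flat image has no contrast to stretch; return an unchanged copy.
        return [list(row) for row in pixels]
    scale = (high - low) / (brightest - darkest)
    return [[round(low + (value - darkest) * scale) for value in row]
            for row in pixels]


# A faded scan whose values only span 100..160 of the 0..255 range.
faded_scan = [[100, 120], [140, 160]]
enhanced = stretch_contrast(faded_scan)
print(enhanced)
```

After stretching, the values cover the full range, which generally gives an OCR engine a cleaner separation between ink and background.
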
- Multi-Threading: Parallel processing for batch operations
- Intelligent Caching: 24-hour file caching system
- Memory Optimization: Efficient handling of large files
- Progress Tracking: Real-time progress indicators
- Background Processing: Non-blocking operations
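
As a rough sketch of how a 24-hour result cache like the one listed above could work, the example below keys entries by a SHA-256 hash of the file content and expires them after a time-to-live. The class `ResultCache` and its methods are hypothetical names chosen for illustration; the project's real cache may be stored on disk and organized differently.

```python
import hashlib
import time


class ResultCache:
    """In-memory cache of OCR text keyed by file content, with expiry."""

    def __init__(self, ttl_seconds=24 * 60 * 60, clock=time.time):
        self.ttl_seconds = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # digest -> (stored_at, text)

    @staticmethod
    def key_for(data: bytes) -> str:
        # Hashing the content means a renamed but unchanged file still hits.
        return hashlib.sha256(data).hexdigest()

    def get(self, data: bytes):
        key = self.key_for(data)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, text = entry
        if self._clock() - stored_at > self.ttl_seconds:
            del self._entries[key]   # entry is older than the TTL
            return None
        return text

    def put(self, data: bytes, text: str) -> None:
        self._entries[self.key_for(data)] = (self._clock(), text)


cache = ResultCache()
page = b"raw image bytes"
if cache.get(page) is None:
    cache.put(page, "recognized text")   # stand-in for a real OCR call
print(cache.get(page))
```

Keying on content rather than on the path is a design choice: it avoids stale hits when a file is overwritten in place.
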
- Professional GUI: Modern, intuitive interface with tabbed design
- Drag & Drop: Easy file handling
- Batch Processing: Multiple file selection and processing
- Input Format Selection: NEW - Choose input format explicitly for better processing
- OCR Engine Selection: Real-time switching between Tesseract, EasyOCR, and Google Vision API
- Settings Panel: Comprehensive configuration options with 4 dedicated tabs
- Preview Mode: View processed results before saving
- Export Options: Multiple output formats and destinations
- Legacy Integration Tab: Complete VB6/VFP9 integration with:
  - Code generation for Visual Basic 6 and Visual FoxPro 9
  - One-click DLL/executable builder with real-time logs
  - Integration testing and validation tools
  - Examples folder access and comprehensive setup validation
| Format | Extension | Description | OCR Quality |
|---|---|---|---|
| JPEG | .jpg, .jpeg | Standard photo format | ⭐⭐⭐⭐ |
| PNG | .png | Lossless image format | ⭐⭐⭐⭐⭐ |
| TIFF | .tiff, .tif | High-quality document format | ⭐⭐⭐⭐⭐ |
| BMP | .bmp | Windows bitmap format | ⭐⭐⭐⭐ |
| GIF | .gif | Animated/static images | ⭐⭐⭐ |
| WebP | .webp | Modern web format | ⭐⭐⭐⭐ |
| PDF | .pdf | Document format (image-based) | ⭐⭐⭐⭐⭐ |
- Plain Text (.txt) - Clean, formatted text
- Rich Text (.rtf) - Formatted text with styling
- Microsoft Word (.docx) - Professional documents
- PDF (.pdf) - Searchable PDF with OCR layer
- Markdown (.md) - GitHub-flavored markdown format
- HTML (.html) - Web-ready formatted documents
- JSON (.json) - Structured data with metadata
- CSV (.csv) - Tabular data extraction
- EPUB (.epub) - E-book format
# tesseract_config.json
{
"engine": "tesseract",
"language": "eng+fra+deu", # Multiple languages
"oem": 3, # OCR Engine Mode (0-3)
"psm": 6, # Page Segmentation Mode (0-13)
"dpi": 300, # Target DPI for processing
"preprocessing": {
"denoise": true,
"contrast_enhance": true,
"rotation_correction": true
}
}

# easyocr_config.json
{
"engine": "easyocr",
"languages": ["en", "fr", "de"],
"gpu": false, # Use GPU acceleration
"batch_size": 1,
"workers": 0, # Number of worker threads
"confidence_threshold": 0.5
}

# google_vision_config.json
{
"engine": "google_vision",
"enabled": true,
"service_account_key": "path/to/service-account.json",
"confidence_threshold": 0.8,
"features": ["TEXT_DETECTION", "DOCUMENT_TEXT_DETECTION"],
"language_hints": ["en", "fr", "de"],
"fallback_enabled": true, # NEW: Auto-fallback to free OCR
"fallback_engines": ["tesseract", "easyocr"], # Fallback order
"encryption": {
"enabled": true,
"encrypt_api_keys": true
}
}

Intelligent Fallback System:
- Automatically falls back to Tesseract/EasyOCR if Google Vision API fails
- Real-time status updates in GUI showing current OCR engine
- No service interruption - seamless transition between engines
- Preserves OCR quality with cost optimization
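
The fallback behavior described above can be pictured as an ordered chain of engines that is tried until one succeeds. The sketch below is a hedged illustration: `recognize_with_fallback`, `AllEnginesFailed`, and the two stand-in engine functions are hypothetical names, not the application's actual API.

```python
class AllEnginesFailed(Exception):
    """Raised when every engine in the fallback chain has failed."""


def recognize_with_fallback(image, engines):
    """Try each (name, recognize) pair in order; return the first success.

    `engines` is an ordered list mirroring the "fallback_engines" setting,
    for example [("google_vision", fn), ("tesseract", fn), ("easyocr", fn)].
    """
    errors = {}
    for name, recognize in engines:
        try:
            return name, recognize(image)
        except Exception as exc:  # a real integration would catch narrower errors
            errors[name] = exc
    raise AllEnginesFailed(errors)


def unavailable_cloud_engine(image):
    # Stand-in for a cloud call that fails, e.g. because the network is down.
    raise ConnectionError("API unreachable")


def local_engine(image):
    # Stand-in for a local OCR engine.
    return f"text from {image}"


engine_used, text = recognize_with_fallback(
    "scan.png",
    [("google_vision", unavailable_cloud_engine), ("tesseract", local_engine)],
)
print(engine_used, "->", text)
```

Returning the engine name alongside the text is what would let a GUI show which engine actually produced the result.
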
Google Vision API Setup:
- Create Google Cloud Project:
  - Go to Google Cloud Console
  - Create new project or select existing one
- Enable Vision API:
  - Navigate to APIs & Services > Library
  - Search for "Cloud Vision API" and enable it
- Create Service Account:
  - Go to IAM & Admin > Service Accounts
  - Click "Create Service Account"
  - Give it a name (e.g., "ocr-converter")
  - Grant "Vision API User" role
- Download API Key:
  - Click on your service account
  - Go to "Keys" tab → "Add Key" → "Create New Key"
  - Choose JSON format and download
- Configure in Application:
  - Open application → OCR Settings → Google Vision API tab
  - Upload your JSON key file or paste the content
  - Test connection to verify setup
Cost Information:
- First 1,000 requests per month: FREE
- Additional requests: $1.50 per 1,000 requests
- See Google Vision Pricing for details
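
As a worked example of the pricing figures above, the following sketch estimates a monthly bill. It assumes simple pro-rata billing beyond the free tier and uses only the two numbers quoted in this section; the function name `estimate_monthly_cost` is hypothetical, and actual invoices may round or tier differently.

```python
FREE_REQUESTS_PER_MONTH = 1_000
PRICE_PER_THOUSAND_USD = 1.50


def estimate_monthly_cost(requests: int) -> float:
    """Estimate the monthly bill in USD for a number of OCR requests."""
    if requests < 0:
        raise ValueError("requests must be non-negative")
    billable = max(0, requests - FREE_REQUESTS_PER_MONTH)
    return round(billable / 1_000 * PRICE_PER_THOUSAND_USD, 2)


for volume in (800, 5_000, 25_000):
    print(f"{volume:>6} requests -> ${estimate_monthly_cost(volume):.2f}")
```

For instance, 5,000 requests leave 4,000 billable after the free tier, which comes to $6.00 under these assumptions.
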
# gui_settings.json
{
"theme": "modern", # UI theme
"auto_preview": true, # Show preview automatically
"batch_size": 10, # Max files per batch
"output_directory": "./output",
"cache_duration": 24, # Hours to keep cache
"language_detection": true,
"progress_notifications": true
}

# processing_config.json
{
"max_threads": 4, # Parallel processing threads
"memory_limit": "2GB", # Maximum memory usage
"timeout": 300, # Processing timeout (seconds)
"retry_attempts": 3, # Retry failed operations
"temp_directory": "./temp",
"log_level": "INFO" # DEBUG, INFO, WARNING, ERROR
}

# Install additional Tesseract language packs
sudo apt-get install tesseract-ocr-[LANG]
# Common language codes:
# eng (English), fra (French), deu (German), spa (Spanish)
# chi_sim (Chinese Simplified), jpn (Japanese), ara (Arabic)
# rus (Russian), kor (Korean), hin (Hindi), por (Portuguese)

# language_config.json
{
"auto_detect": true,
"fallback_language": "eng",
"confidence_threshold": 0.8,
"supported_languages": [
"eng", "fra", "deu", "spa", "ita", "por",
"rus", "chi_sim", "jpn", "kor", "ara", "hin"
]
}

- Launch the application:
  python universal_document_converter_ocr.py
- Basic OCR Process:
  - Drag and drop files into the application window
  - Select OCR engine (Tesseract/EasyOCR/Auto)
  - Choose output format and destination
  - Click "Start OCR" to begin processing
- Batch Processing:
  - Select multiple files using Ctrl+Click
  - Configure batch settings in the Settings panel
  - Monitor progress in real-time
  - Review results in the output directory
- Legacy Integration Tab (New in v3.1.0):
  - VB6/VFP9 Code Generation: Select project type and generate integration code
  - One-Click DLL Builder: Build executable/DLL with real-time build logs
  - Integration Testing: Test conversion functionality and validate setup
  - Examples Access: Open examples folder with VB6/VFP9 template files
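
The batch-processing step in the workflow above, combined with a thread limit such as the max_threads setting, could be sketched with Python's standard thread pool as follows. The helper `process_batch` shown here and the `fake_convert` stand-in are hypothetical and only illustrate the pattern; they are not the application's own implementation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_batch(paths, convert, max_threads=4):
    """Run `convert(path)` for every path on a thread pool.

    Returns (results, failures): results maps path -> output and failures
    maps path -> exception, so one bad file does not stop the batch.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        futures = {pool.submit(convert, path): path for path in paths}
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:
                failures[path] = exc
            print(f"[{done}/{len(futures)}] {path}")  # simple progress line
    return results, failures


def fake_convert(path):
    # Stand-in for a real OCR call.
    if path.endswith(".bad"):
        raise ValueError("unreadable file")
    return path.upper()


results, failures = process_batch(["a.jpg", "b.png", "c.bad"], fake_convert)
```

Collecting failures separately lets the caller report or retry individual files after the batch finishes.
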
The OCR Document Converter includes a powerful CLI for automation and integration.
# Single file OCR
python cli.py document.jpg -o result.txt -t txt --ocr
# Convert without OCR
python cli.py document.pdf -o document.md -t md
# Batch processing
python cli.py *.jpg -o converted/ -t txt --ocr
# Specify OCR language
python cli.py scan.png -o text.txt --ocr --language fra

# For VFP9/VB6 users - simple command line execution
python cli.py input.md -o output.rtf -t rtf --quiet

# Full command with all options
python ocr_engine/ocr_engine.py \
--input document.pdf \
--output result.docx \
--engine easyocr \
--language en,fr,de \
--confidence 0.7 \
--preprocessing \
--format docx \
--dpi 300

| Argument | Description | Example |
|---|---|---|
| --input | Input file/pattern | document.jpg, "*.png" |
| --output | Output file | result.txt |
| --output-dir | Output directory | ./results/ |
| --engine | OCR engine | tesseract, easyocr, auto |
| --language | Language codes | eng, eng+fra, en,fr,de |
| --confidence | Confidence threshold | 0.5 to 1.0 |
| --format | Output format | txt, docx, pdf, json |
| --dpi | Target DPI | 150, 300, 600 |
| --preprocessing | Enable preprocessing | Flag (no value) |
| --batch-size | Batch processing size | 5, 10, 20 |
| --threads | Number of threads | 1, 4, 8 |
from ocr_engine import OCREngine

# Initialize OCR engine
ocr = OCREngine(engine='tesseract', language='eng')

# Process single file
result = ocr.extract_text('document.jpg')
print(result.text)

# Save to file
ocr.save_result(result, 'output.txt', format='txt')

from ocr_engine import OCREngine, OCRConfig

# Custom configuration
config = OCRConfig(
    engine='easyocr',
    languages=['en', 'fr'],
    confidence_threshold=0.8,
    preprocessing=True,
    dpi=300
)

# Initialize with config
ocr = OCREngine(config=config)

# Batch processing
files = ['doc1.jpg', 'doc2.png', 'doc3.pdf']
results = ocr.process_batch(files)
for file, result in results.items():
    print(f"{file}: {result.confidence:.2f}")
    ocr.save_result(result, f"{file}.txt")

from ocr_engine import OCREngine, OCRError

try:
    ocr = OCREngine()
    result = ocr.extract_text('document.jpg')
    if result.confidence < 0.5:
        print("Warning: Low confidence OCR result")
except OCRError as e:
    print(f"OCR Error: {e}")
except FileNotFoundError:
    print("Input file not found")
except Exception as e:
    print(f"Unexpected error: {e}")

ocr_document_converter/
├── ocr_engine/                      # Core OCR engine modules
│   ├── __init__.py                  # Package initialization
│   ├── ocr_engine.py                # Main OCR engine class
│   ├── ocr_engine_minimal.py        # Lightweight OCR implementation
│   ├── image_processor.py           # Image preprocessing utilities
│   ├── format_detector.py           # File format detection
│   └── ocr_integration.py           # Integration layer
│
├── gui/                             # GUI components
│   ├── universal_document_converter_ocr.py       # Main GUI application
│   ├── universal_document_converter_enhanced.py  # Enhanced GUI features
│   └── ocr_gui_integration.py                    # GUI-OCR integration
│
├── tests/                           # Test suite
│   ├── test_ocr_integration.py      # Integration tests
│   ├── validate_ocr_integration.py  # Validation scripts
│   └── test_data/                   # Sample test files
│       ├── sample_document.jpg
│       ├── multi_language.png
│       └── low_quality.pdf
│
├── config/                          # Configuration files
│   ├── tesseract_config.json        # Tesseract settings
│   ├── easyocr_config.json          # EasyOCR settings
│   ├── gui_settings.json            # GUI preferences
│   └── language_config.json         # Language settings
│
├── output/                          # Default output directory
├── temp/                            # Temporary processing files
├── cache/                           # OCR result cache
├── logs/                            # Application logs
├── vb6_vfp9_integration/            # Legacy VB6/VFP9 integration package
│   ├── UniversalConverter32.py      # Main integration module
│   ├── VB6_Example.vb               # VB6 integration template
│   ├── VFP9_Example.prg             # VFP9 integration template
│   ├── build_dll.bat                # DLL/executable builder script
│   └── README.md                    # Legacy integration documentation
│
├── requirements.txt                 # Python dependencies
├── setup_ocr_environment.py         # Automated setup script
├── README.md                        # This comprehensive guide
├── OCR_README.md                    # Technical OCR documentation
├── OCR_INTEGRATION_COMPLETE.md      # Integration completion notes
├── .gitignore                       # Git ignore rules
└── LICENSE                          # MIT License

| File | Purpose | Key Features |
|---|---|---|
| ocr_engine/ocr_engine.py | Main OCR processing | Dual engine support, batch processing |
| universal_document_converter_ocr.py | GUI application | Drag-drop, settings panel, progress tracking |
| setup_ocr_environment.py | Automated installer | Dependencies, Tesseract, language packs |
| test_ocr_integration.py | Comprehensive tests | Unit tests, integration tests, benchmarks |
| validate_ocr_integration.py | Validation suite | System validation, performance tests |
| requirements.txt | Dependencies | All Python packages with versions |
# Run all tests
python test_ocr_integration.py
# Run validation suite
python validate_ocr_integration.py
# Run specific test categories
python test_ocr_integration.py --category unit
python test_ocr_integration.py --category integration
python test_ocr_integration.py --category performance

- Unit Tests: 45+ individual component tests
- Integration Tests: End-to-end OCR workflows
- Performance Tests: Speed and memory benchmarks
- Language Tests: Multi-language OCR accuracy
- Format Tests: All supported input/output formats
- Error Handling: Exception and edge case testing
| Test Category | Files Tested | Success Rate | Avg. Processing Time |
|---|---|---|---|
| English Text | 100+ | 98.5% | 2.3s per page |
| Multi-Language | 50+ | 95.2% | 3.1s per page |
| Low Quality | 30+ | 87.8% | 4.2s per page |
| Batch Processing | 500+ | 97.1% | 1.8s per page |
File: Universal-Document-Converter-v3.1.0-Windows-Complete.zip (59 KB)
Contains EVERYTHING including:
- Full GUI application with OCR
- CLI interface (cli.py)
- OCR engines (Tesseract & EasyOCR support)
- VFP9/VB6 integration (DLL package included)
- All documentation
- Automated installer

# Download from GitHub Releases
https://github.com/Beaulewis1977/quick_ocr_doc_converter/releases/latest/download/Universal-Document-Converter-v3.1.0-Windows-Complete.zip

File: UniversalConverter32.dll.zip (12 KB)
For users who ONLY need VFP9/VB6 integration:
- Lightweight download
- DLL wrapper files
- VFP9/VB6 example code
- Integration documentation
- Batch DLL simulator

# Download DLL package only
https://github.com/Beaulewis1977/quick_ocr_doc_converter/releases/latest/download/UniversalConverter32.dll.zip

- Download the complete package
- Extract to any folder
- Run install.bat as Administrator
- Launch using desktop shortcut or run_ocr_converter.bat

# Clone the repository and run the automated setup
git clone https://github.com/Beaulewis1977/quick_ocr_doc_converter.git
cd quick_ocr_doc_converter
python setup_ocr_environment.py

# Create virtual environment (recommended)
python -m venv ocr_env
source ocr_env/bin/activate # Linux/Mac
# or
ocr_env\Scripts\activate # Windows
# Install Python dependencies
pip install -r requirements.txt

Windows:
# Download and install from:
# https://github.com/UB-Mannheim/tesseract/wiki
# Add to PATH: C:\Program Files\Tesseract-OCR

macOS:
# Using Homebrew
brew install tesseract
# Install language packs
brew install tesseract-lang

Linux (Ubuntu/Debian):
# Install Tesseract
sudo apt-get update
sudo apt-get install tesseract-ocr
# Install language packs
sudo apt-get install tesseract-ocr-eng tesseract-ocr-fra tesseract-ocr-deu

Linux (CentOS/RHEL):
# Install Tesseract
sudo yum install epel-release
sudo yum install tesseract tesseract-langpack-eng

# Install PyTorch (CPU version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# For GPU support (optional)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Dockerfile
FROM python:3.9-slim
# Install system dependencies
RUN apt-get update && apt-get install -y \
tesseract-ocr \
tesseract-ocr-eng \
tesseract-ocr-fra \
tesseract-ocr-deu \
libgl1-mesa-glx \
libglib2.0-0
# Copy application
COPY . /app
WORKDIR /app
# Install Python dependencies
RUN pip install -r requirements.txt
# Run application
CMD ["python", "universal_document_converter_ocr.py"]

# Build and run Docker container
docker build -t ocr-converter .
docker run -p 8080:8080 -v $(pwd)/output:/app/output ocr-converter

# Error: TesseractNotFoundError
# Solution: Add the directory that contains the tesseract binary to PATH
export PATH="$PATH:/usr/local/bin" # Linux/Mac
# or add C:\Program Files\Tesseract-OCR to Windows PATH

# Try different preprocessing options
config = {
"preprocessing": {
"denoise": True,
"contrast_enhance": True,
"rotation_correction": True,
"dpi_optimization": True
}
}

# Reduce batch size and enable memory optimization
config = {
"batch_size": 1,
"memory_limit": "1GB",
"enable_gc": True
}

# Specify languages explicitly
config = {
"language": "eng+fra+deu", # Multiple languages
"auto_detect": False
}

# Enable debug logging
export OCR_DEBUG=1
python universal_document_converter_ocr.py --debug
# Check log files
tail -f logs/ocr_debug.log

- Check the logs: logs/ocr_application.log
- Run validation: python validate_ocr_integration.py
- Test with sample files: Use files in tests/test_data/
- Create an issue: GitHub Issues

- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes and add tests
- Run the test suite: python test_ocr_integration.py
- Commit your changes: git commit -m 'Add amazing feature'
- Push to the branch: git push origin feature/amazing-feature
- Open a Pull Request

- New OCR Engines: Add support for additional OCR backends
- Language Support: Add new language models and detection
- Image Processing: Improve preprocessing algorithms
- GUI Enhancements: Add new features to the user interface
- Performance: Optimize processing speed and memory usage
- Documentation: Improve guides and API documentation
- Testing: Add more test cases and benchmarks
# Clone your fork
git clone https://github.com/YOUR_USERNAME/quick_ocr_doc_converter.git
cd quick_ocr_doc_converter
# Create development environment
python -m venv dev_env
source dev_env/bin/activate
# Install development dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt
# Run tests
python -m pytest tests/
# Run linting
flake8 ocr_engine/
black ocr_engine/

- Follow PEP 8 Python style guidelines
- Use Black for code formatting
- Add docstrings to all functions and classes
- Write comprehensive tests for new features
- Update documentation for any changes
This project is licensed under the MIT License - see the LICENSE file for details.
- Tesseract OCR - Open-source OCR engine, originally developed at HP and later sponsored by Google
- EasyOCR - JaidedAI's neural network OCR
- OpenCV - Computer vision library for image processing
- PyTorch - Machine learning framework for EasyOCR
- Tkinter - Python's standard GUI toolkit
Made with ❤️ for the OCR community
⭐ Star this repository if it helped you! ⭐
Building and maintaining OCR Document Converter takes time and resources. While the tool is completely free, your voluntary support helps ensure continued development and improvements.
If this tool has saved you time or added value to your work, consider showing your appreciation:
Venmo: @BeauinTulsa
Ko-fi: https://ko-fi.com/beaulewis
Together, we're making document conversion accessible to everyone. Thank you!
- Documentation: OCR_README.md
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: Create an issue for support
- Double-click create_executable.py
- Wait for compilation (creates a single .exe file)
- Share the .exe - works on any Windows computer without Python!

python universal_document_converter.py

- Universal Format Support: Convert between 6 input and 5 output formats (30 combinations)
- Lightning Fast: Multi-threaded processing with intelligent caching
- Drag & Drop: Intuitive interface with enhanced file/folder drag-and-drop
- Batch Processing: Convert entire folders recursively with progress tracking
- Smart Detection: Automatic file format detection with fallback support
- Zero APIs: Works completely offline without external dependencies
- Advanced Settings: Comprehensive configuration system with GUI settings panel
- Settings Persistence: Automatic saving of user preferences and window positions
- Profile Management: Multiple configuration profiles for different use cases
- Import/Export: Share configurations between installations
- CLI Configuration: Full command-line configuration support with profiles
- Multi-Threading: 2-4x performance improvement with configurable worker threads
- Intelligent Caching: Prevents redundant conversions of unchanged files
- Memory Optimization: 50-80% memory reduction for large files through streaming
- Real-time Progress: Visual progress tracking with detailed conversion results
- Professional Logging: Enterprise-grade logging system with file rotation
- Native Windows Integration: Start Menu shortcuts, taskbar pinning, registry file associations
- Linux Desktop Integration: .desktop files, MIME types, applications menu, file manager integration
- macOS App Bundle: Native .app bundles, Dock integration, Finder associations, Spotlight search
- Universal Packaging: .deb, .rpm, AppImage, .dmg, .pkg, and .msi installers
- Platform Detection: Automatic platform-specific paths and configurations
- Modern GUI: Clean, responsive interface with tabbed settings
- Desktop Integration: Native shortcuts and file associations on all platforms
- File Opening: Built-in file opening with default applications
- Drag & Drop: Enhanced file and folder drag-and-drop support
- Privacy First: All processing happens locally on your machine

| Input Formats (6) | Output Formats (5) |
|---|---|
| DOCX - Microsoft Word Documents | Markdown - GitHub-flavored markdown |
| PDF - Portable Document Format | TXT - Plain text with formatting |
| TXT - Plain text files | HTML - Clean, semantic HTML |
| HTML - Web pages and documents | RTF - Rich Text Format |
| RTF - Rich Text Format | EPUB - Electronic Publication (eBooks) |
| EPUB - Electronic Publication (eBooks) | |

Total Conversion Combinations: 30 (6 × 5)
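
The total of 30 can be checked by enumerating every input/output pair from the table above. In this short sketch the lowercase format identifiers are assumptions chosen for illustration. Note that the count includes same-format pairs such as TXT to TXT.

```python
from itertools import product

# Identifiers mirror the table above; the spellings are illustrative.
INPUT_FORMATS = ["docx", "pdf", "txt", "html", "rtf", "epub"]
OUTPUT_FORMATS = ["md", "txt", "html", "rtf", "epub"]

combinations = list(product(INPUT_FORMATS, OUTPUT_FORMATS))
same_format = [(src, dst) for src, dst in combinations if src == dst]

print(len(combinations))   # every input paired with every output
print(len(same_format))    # pairs where input and output format match
```
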
- Full EPUB Reading: Extracts text