OCRmyTamilPDF is a GUI tool that makes scanned Tamil PDFs searchable and copy-pasteable by adding a text layer.
- Simple UI and strong performance
- Cross-platform
OCRmyTamilPDF requires Python (developed with Python 3.13.7) along with three external dependencies: Tk, Ghostscript and Tesseract OCR. It should run on any desktop platform that Python supports, including Linux, Windows, macOS and FreeBSD.
Note: If you only run the prebuilt executable, you do not need to install the tk package.
Linux:
- Install the required external packages with the command for your Linux distribution:
Debian-based:
sudo apt update && sudo apt install tesseract-ocr tesseract-ocr-tam tesseract-ocr-eng tk ghostscript
Arch-based:
sudo pacman -Syu tesseract tesseract-data-tam tesseract-data-eng tk ghostscript
- From the project’s root directory, install the required Python packages with the following command:
pip install -r requirements.txt
Windows:
- Download and install Ghostscript and Tesseract OCR, and add their installation folders to the system PATH. When running the Tesseract installer, make sure to select the Tamil language pack.
- From the project’s root directory, install the required Python packages with the following command:
pip install -r requirements.txt
Note: You do not need to install Tk separately on Windows, because the official Python installer already bundles it.
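After installing on either platform, it can help to confirm that everything is reachable before launching the app. The script below is a hypothetical helper, not part of OCRmyTamilPDF. It assumes Tesseract is on the PATH as `tesseract`, Ghostscript as `gs` on Linux/macOS or `gswin64c`/`gswin32c` on Windows, and that the Tamil data is reported by `tesseract --list-langs` under the code `tam`.

```python
import importlib.util
import platform
import shutil
import subprocess


def ghostscript_names() -> list[str]:
    """Candidate Ghostscript console executable names for this OS (assumed)."""
    if platform.system() == "Windows":
        # The Windows Ghostscript installers ship console binaries with these names.
        return ["gswin64c", "gswin32c"]
    return ["gs"]


def parse_tesseract_langs(output: str) -> set[str]:
    """Parse `tesseract --list-langs` output into a set of language codes."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Language codes contain no spaces; the header line and any warnings do.
        if not line or " " in line:
            continue
        langs.add(line)
    return langs


def check_dependencies() -> dict[str, bool]:
    """Report whether each external requirement appears to be available."""
    tesseract = shutil.which("tesseract")
    langs: set[str] = set()
    if tesseract:
        result = subprocess.run(
            [tesseract, "--list-langs"], capture_output=True, text=True
        )
        # Some Tesseract versions print the list to stderr, others to stdout.
        langs = parse_tesseract_langs(result.stdout + result.stderr)
    return {
        "tesseract": tesseract is not None,
        "tesseract_tam": "tam" in langs,
        "ghostscript": any(shutil.which(n) for n in ghostscript_names()),
        # Only checks that the tkinter module can be found, not that Tk starts.
        "tkinter": importlib.util.find_spec("tkinter") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name:15} {'OK' if ok else 'MISSING'}")
```

Saved as, for example, `check_deps.py` (a hypothetical name) and run with `python check_deps.py`, any line marked MISSING points at the package or PATH entry to revisit.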
- If using the script: run python main.py
- If using the executable (download the .exe for Windows or the .bin for Linux from the Releases page): run it directly, either by double-clicking it or from the terminal. On Linux, you may first need to make the .bin file executable with chmod +x.
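The GUI is the intended way to use OCRmyTamilPDF. For scripting or batch jobs, the same kind of text layer can be produced with OCRmyPDF's Python API, since OCRmyTamilPDF builds on OCRmyPDF. This is a minimal sketch, not necessarily how the app itself calls OCRmyPDF; the function name `ocr_tamil_pdf` is hypothetical, and it assumes the `ocrmypdf` package is installed (for example via requirements.txt) along with Tesseract's Tamil and English data.

```python
from pathlib import Path


def ocr_tamil_pdf(src: Path, dst: Path, languages=("tam", "eng")) -> int:
    """Add a searchable text layer to a scanned PDF (hypothetical helper)."""
    # Imported lazily so this module can be loaded even without OCRmyPDF installed.
    import ocrmypdf

    # `language` takes Tesseract language codes: "tam" is Tamil, "eng" is English.
    exit_code = ocrmypdf.ocr(src, dst, language=list(languages))
    # OCRmyPDF returns an exit code (0 on success); many failures raise exceptions instead.
    return int(exit_code)
```

A call such as `ocr_tamil_pdf(Path("scan.pdf"), Path("scan_searchable.pdf"))` writes the searchable copy. Because OCRmyPDF may use worker processes, it is safest to make that call from under an `if __name__ == "__main__":` guard in a script.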
If you face any problems or have suggestions, feel free to open an issue.
This project is licensed under the GPLv3 License - see the LICENSE file for details.
This project uses OCRmyPDF, which is licensed under the Mozilla Public License 2.0.
No modifications were made to OCRmyPDF itself.
See the OCRmyPDF repository for license details.
