DeepSeek OCR PDF Service

Complete OCR service for PDF documents with layout detection, powered by DeepSeek-OCR and vLLM.

Quick Start

Prerequisites

  • NVIDIA GPU with CUDA 11.8 support
  • Python 3.12 virtual environment at .venv/
  • At least 8GB GPU memory

Installation

  1. Run the installation script:
./install/install.sh

This will install:

  • PyTorch 2.6.0 with CUDA 11.8
  • vLLM 0.8.5
  • flash-attn 2.7.3
  • All required dependencies

Start the Service

./run.sh

The service will be available at http://localhost:8000

Check Status

./status.sh

Stop the Service

./stop.sh

Installation

System Requirements

Hardware:

  • NVIDIA GPU (tested on A40 with 44GB memory)
  • CUDA 11.8 compatible GPU
  • Minimum 8GB GPU memory

Software:

  • Ubuntu 24.04 (or compatible)
  • Python 3.12
  • CUDA 11.8
  • nvidia-smi

Installation Steps

  1. Create the virtual environment (if it does not already exist):
python3.12 -m venv .venv
  2. Run the installation script:
chmod +x install/install.sh
./install/install.sh

The script will:

  • Activate the virtual environment
  • Install PyTorch with CUDA 11.8 support
  • Install the vLLM wheel
  • Install all dependencies
  • Install flash-attn
  • Display authentication setup instructions
  3. Verify the installation:
.venv/bin/python -c "import torch, vllm; print(f'PyTorch: {torch.__version__}, vLLM: {vllm.__version__}')"

Redis Setup (Optional)

Redis can be used as a message broker to ensure tasks are processed sequentially (one at a time), preventing concurrent GPU access issues. This is optional but recommended for production use.

Quick Redis Installation

Option 1: Docker (Recommended)

chmod +x install/install_redis_docker.sh
./install/install_redis_docker.sh

Option 2: Standalone System Installation

chmod +x install/install_redis_standalone.sh
sudo ./install/install_redis_standalone.sh

Configuration

After installing Redis, update your .env file:

# Copy example if you haven't already
cp .env.example .env

# Edit .env and uncomment Redis settings:
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0
QUEUE_NAME=deepseek_ocr_tasks
MAX_WORKERS=1  # Process one task at a time
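With MAX_WORKERS=1, jobs are pulled off the queue and executed strictly one at a time, so only one task ever touches the GPU. The effect can be illustrated with a stdlib-only sketch (no Redis required; `process_pdf_job` is a stand-in for the real OCR task, not the service's actual code):

```python
import queue
import threading

def process_pdf_job(job_id):
    # Stand-in for the real OCR task; only one runs at a time.
    return f"{job_id}: done"

tasks = queue.Queue()
results = []

def worker():
    # A single worker drains the queue sequentially, mirroring
    # MAX_WORKERS=1 in the Redis/RQ setup above.
    while True:
        job_id = tasks.get()
        if job_id is None:  # sentinel: shut down
            break
        results.append(process_pdf_job(job_id))
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()
for jid in ["job-1", "job-2", "job-3"]:
    tasks.put(jid)
tasks.put(None)
t.join()
print(results)  # jobs complete strictly in submission order
```

In the real deployment, RQ plays the role of this in-process queue, with Redis as the broker between the API process and the worker.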

Install Python Dependencies

.venv/bin/pip install redis rq

Verify Redis

# Test connection
redis-cli ping
# Should return: PONG

# Or with Docker
docker exec -it deepseek-redis redis-cli ping

For detailed Redis setup instructions, see REDIS_SETUP.md.


Service Management

Three scripts provide complete service lifecycle management:

run.sh - Start Service

./run.sh

Features:

  • Checks if service is already running (prevents duplicates)
  • Validates environment and port availability
  • Creates PID file for reliable tracking
  • Starts service in background with logging
  • Waits for service to be ready (up to 60 seconds)
  • Displays service URLs and status
  • Checks authentication configuration

Output:

=== DeepSeek OCR PDF Service ===

✓ Service started successfully
  PID: 12345
  Port: 8000

Service URLs:
  • Health check: http://localhost:8000/health
  • API docs: http://localhost:8000/docs
  • Base URL: http://localhost:8000/

Useful commands:
  • View logs: tail -f /tmp/deepseek_ocr.log
  • Stop service: ./stop.sh

status.sh - Check Status

./status.sh

Displays:

  • Process status (PID, uptime, CPU/memory usage)
  • GPU metrics (memory, utilization, temperature)
  • Authentication status
  • Recent log entries (last 5 lines)
  • Service URLs (if running)

stop.sh - Stop Service

./stop.sh

Features:

  • Graceful shutdown (SIGTERM, waits 10 seconds)
  • Force-kill if necessary (SIGKILL)
  • Cleans up PID file
  • Stops orphaned processes
  • Displays GPU memory status
  • Shows log file location

Common Workflows

Start and monitor:

./run.sh
tail -f /tmp/deepseek_ocr.log

Restart service:

./stop.sh && ./run.sh

Check if running:

./status.sh | grep -q "Service is RUNNING" && echo "Running" || echo "Not running"

File Locations

File       Location                 Purpose
PID file   /tmp/deepseek_ocr.pid    Process ID tracking
Log file   /tmp/deepseek_ocr.log    Service logs
Config     .env                     Authentication token
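status.sh presumably determines liveness by reading the PID file and probing the process; the equivalent check in Python is a sketch like this (signal 0 tests for existence without sending anything):

```python
import os

def is_running(pid_file="/tmp/deepseek_ocr.pid"):
    """Return True if the PID recorded in pid_file refers to a live process."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
        return True
    except (FileNotFoundError, ValueError, ProcessLookupError):
        return False  # no PID file, garbage content, or dead process
    except PermissionError:
        return True   # process exists but belongs to another user

print(is_running())
```

A stale PID file (see Troubleshooting) is exactly the case where the file exists but `os.kill` raises ProcessLookupError.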

Authentication

The service supports token-based authentication using Bearer tokens.

Setup Authentication

  1. Create a .env file:
cp .env.example .env
  2. Generate a secure token:
# Using Python
python -c "import secrets; print(secrets.token_hex(32))"

# Using OpenSSL
openssl rand -hex 32
  3. Edit .env and set your token:
AUTH_TOKEN=your-generated-token-here
  4. Restart the service (if running):
./stop.sh && ./run.sh
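Step 2 in Python, plus the comparison a server typically performs on each request (a sketch; `verify_token` is illustrative, not the service's actual code):

```python
import hmac
import secrets

# Generate a 64-hex-character token, as in step 2.
token = secrets.token_hex(32)

def verify_token(presented: str, expected: str) -> bool:
    # Constant-time comparison avoids leaking how many leading
    # characters matched through response timing.
    return hmac.compare_digest(presented, expected)

print(verify_token(token, token))          # True
print(verify_token("wrong-token", token))  # False
```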

Disable Authentication

To disable authentication (development only):

  • Remove or comment out AUTH_TOKEN in .env, or
  • Don't create a .env file

Protected Endpoints

When AUTH_TOKEN is set, these endpoints require authentication:

  • POST /process_pdf - Upload and process PDF
  • GET /result/{job_id}/markdown - Get markdown output
  • GET /result/{job_id}/markdown_det - Get markdown with detections
  • GET /result/{job_id}/layout_pdf - Download layout PDF
  • GET /result/{job_id}/images - List extracted images
  • GET /result/{job_id}/images/{image_name} - Get specific image
  • DELETE /result/{job_id} - Delete job files

Public Endpoints

These endpoints are always accessible without authentication:

  • GET / - API information
  • GET /health - Health check

API Usage

Using curl

Upload and process PDF:

curl -X POST "http://localhost:8000/process_pdf" \
  -H "Authorization: Bearer your-token-here" \
  -F "file=@document.pdf"

Response:

{
  "job_id": "abc123-def456-789...",
  "status": "completed",
  "message": "PDF processed successfully"
}

Get markdown result:

curl -X GET "http://localhost:8000/result/{job_id}/markdown" \
  -H "Authorization: Bearer your-token-here"

Download layout PDF:

curl -X GET "http://localhost:8000/result/{job_id}/layout_pdf" \
  -H "Authorization: Bearer your-token-here" \
  -o layout.pdf

Health check (no auth required):

curl http://localhost:8000/health

Using Python

import requests

# Configure
API_URL = "http://localhost:8000"
AUTH_TOKEN = "your-token-here"
headers = {"Authorization": f"Bearer {AUTH_TOKEN}"}

# Upload PDF (processing is synchronous; large PDFs need a generous timeout)
with open("document.pdf", "rb") as f:
    response = requests.post(
        f"{API_URL}/process_pdf",
        headers=headers,
        files={"file": f},
        timeout=300,
    )
response.raise_for_status()
job_id = response.json()["job_id"]
print(f"Job ID: {job_id}")

# Get markdown result
response = requests.get(
    f"{API_URL}/result/{job_id}/markdown",
    headers=headers,
    timeout=30,
)
response.raise_for_status()
print(response.json()["content"])

# Download layout PDF
response = requests.get(
    f"{API_URL}/result/{job_id}/layout_pdf",
    headers=headers,
    timeout=60,
)
response.raise_for_status()
with open("layout.pdf", "wb") as f:
    f.write(response.content)

# Clean up
requests.delete(f"{API_URL}/result/{job_id}", headers=headers, timeout=30)

Using JavaScript

const API_URL = "http://localhost:8000";
const AUTH_TOKEN = "your-token-here";

// Upload PDF
const formData = new FormData();
formData.append("file", pdfFile);

const uploadResponse = await fetch(`${API_URL}/process_pdf`, {
    method: "POST",
    headers: {
        "Authorization": `Bearer ${AUTH_TOKEN}`
    },
    body: formData
});

const { job_id } = await uploadResponse.json();

// Get markdown result
const resultResponse = await fetch(
    `${API_URL}/result/${job_id}/markdown`,
    {
        headers: {
            "Authorization": `Bearer ${AUTH_TOKEN}`
        }
    }
);

const { content } = await resultResponse.json();
console.log(content);

API Endpoints Reference

Method   Endpoint                                Auth   Description
GET      /                                       No     API information
GET      /health                                 No     Health check
POST     /process_pdf                            Yes*   Upload and process PDF
GET      /result/{job_id}/markdown               Yes*   Get markdown output
GET      /result/{job_id}/markdown_det           Yes*   Get markdown with detections
GET      /result/{job_id}/layout_pdf             Yes*   Download layout PDF
GET      /result/{job_id}/images                 Yes*   List extracted images
GET      /result/{job_id}/images/{image_name}    Yes*   Get specific image
DELETE   /result/{job_id}                        Yes*   Delete job files

*Auth required only if AUTH_TOKEN is configured in .env

Interactive API Documentation:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc
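The curl calls above translate directly to the standard library as well; a minimal sketch of building an authenticated request with urllib (no third-party client required; the token value is a placeholder):

```python
import urllib.request

API_URL = "http://localhost:8000"
AUTH_TOKEN = "your-token-here"  # placeholder; substitute your real token

def authed_request(path: str) -> urllib.request.Request:
    # Attach the Bearer token exactly as the protected endpoints expect.
    return urllib.request.Request(
        f"{API_URL}{path}",
        headers={"Authorization": f"Bearer {AUTH_TOKEN}"},
    )

req = authed_request("/result/abc123/markdown")
# urllib.request.urlopen(req) would perform the call once the service is up.
print(req.get_header("Authorization"))
```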

Configuration

Environment Variables

Create a .env file in the project root:

# Authentication Token
# Set this to enable token-based authentication
# If not set, API will be accessible without authentication
AUTH_TOKEN=your-secret-token-here
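The service presumably loads this file with python-dotenv or similar; what that loading amounts to can be sketched in a few lines (simplified: quoting and `export` prefixes are ignored):

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: KEY=VALUE lines, '#' comments and blanks skipped."""
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                # setdefault: real environment variables win over the file
                os.environ.setdefault(key.strip(), value.strip())
    except FileNotFoundError:
        pass  # no .env means authentication stays disabled

load_env()
print("auth enabled:", "AUTH_TOKEN" in os.environ)
```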

Service Configuration

Edit config.py to modify:

  • MODEL_PATH - Model location
  • PROMPT - OCR prompt template
  • SKIP_REPEAT - Skip repeated content
  • MAX_CONCURRENCY - Max concurrent requests
  • NUM_WORKERS - Number of worker threads
  • CROP_MODE - Image cropping mode
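Put together, the defaults in config.py presumably resemble the following (values are illustrative, drawn from the settings quoted elsewhere in this README; the file itself is authoritative, and the PROMPT value is deliberately elided):

```python
# config.py -- illustrative sketch; see the real file for actual values
MODEL_PATH = "deepseek-ai/DeepSeek-OCR"  # model location (per Model Configuration)
PROMPT = "..."           # OCR prompt template; see the real config.py
SKIP_REPEAT = True       # skip repeated content
MAX_CONCURRENCY = 8      # max concurrent requests (per Performance Tuning)
NUM_WORKERS = 4          # worker threads (per Performance Tuning)
CROP_MODE = True         # image cropping mode
```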

Model Configuration

The service uses DeepSeek-OCR model with these settings:

  • Model: deepseek-ai/DeepSeek-OCR
  • Max sequence length: 8192 tokens
  • GPU memory utilization: 90%
  • Tensor parallel size: 1
  • Block size: 256

Troubleshooting

Service Won't Start

Check if already running:

./status.sh

Check for port conflicts:

netstat -tuln | grep 8000
lsof -i :8000

Check logs:

tail -100 /tmp/deepseek_ocr.log

Check virtual environment:

ls -la .venv/bin/python
.venv/bin/python --version

Service Won't Stop

Force stop:

./stop.sh
# If that doesn't work:
pkill -9 -f "serve_pdf.py"
rm -f /tmp/deepseek_ocr.pid

Authentication Issues

401 Unauthorized Error:

  • Verify token matches AUTH_TOKEN in .env
  • Check "Bearer " prefix in Authorization header
  • Ensure .env file is loaded (restart service)

Disable authentication:

# Comment out or remove AUTH_TOKEN from .env
sed -i 's/^AUTH_TOKEN=/#AUTH_TOKEN=/' .env
./stop.sh && ./run.sh

GPU Memory Issues

Check GPU status:

nvidia-smi

Free GPU memory:

./stop.sh
# If memory not released:
pkill -9 -f "python.*serve_pdf"
nvidia-smi

Reduce memory usage: Edit serve_pdf.py:

llm = LLM(
    ...
    gpu_memory_utilization=0.7,  # Reduce from 0.9
    max_num_seqs=4,  # Reduce from MAX_CONCURRENCY
)

Import Errors

Missing modules:

# Reinstall dependencies
.venv/bin/pip install -r requirements.txt
.venv/bin/pip install PyMuPDF img2pdf easydict addict

Check Python version:

.venv/bin/python --version  # Should be 3.12.x

Log Management

Log file too large:

./stop.sh
mv /tmp/deepseek_ocr.log /tmp/deepseek_ocr.log.old
./run.sh

Monitor logs:

# Real-time
tail -f /tmp/deepseek_ocr.log

# Last 50 lines
tail -50 /tmp/deepseek_ocr.log

# Search for errors
grep -i error /tmp/deepseek_ocr.log

Stale PID File

rm -f /tmp/deepseek_ocr.pid
./status.sh  # Verify clean state
./run.sh     # Start fresh

Advanced Topics

Systemd Integration

Create /etc/systemd/system/deepseek-ocr.service:

[Unit]
Description=DeepSeek OCR PDF Service
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/root/dpsk
ExecStart=/root/dpsk/.venv/bin/python /root/dpsk/serve_pdf.py
ExecStop=/root/dpsk/stop.sh
Restart=on-failure
RestartSec=10s
StandardOutput=append:/tmp/deepseek_ocr.log
StandardError=append:/tmp/deepseek_ocr.log

[Install]
WantedBy=multi-user.target

Enable and start:

systemctl daemon-reload
systemctl enable deepseek-ocr
systemctl start deepseek-ocr
systemctl status deepseek-ocr

Automated Health Checks

Cron job (check every 5 minutes):

# Add to crontab
*/5 * * * * /root/dpsk/status.sh | grep -q "NOT running" && /root/dpsk/run.sh

Monitoring script:

#!/bin/bash
# check_service.sh

if ! /root/dpsk/status.sh | grep -q "Service is RUNNING"; then
    echo "ALERT: Service is DOWN" | mail -s "Service Alert" admin@example.com
    /root/dpsk/run.sh
fi

Performance Tuning

Adjust concurrency: Edit config.py:

MAX_CONCURRENCY = 8  # Adjust based on GPU memory
NUM_WORKERS = 4      # Adjust based on CPU cores

Optimize GPU usage: Edit serve_pdf.py:

llm = LLM(
    ...
    gpu_memory_utilization=0.85,  # Adjust (0.7-0.95)
    max_num_seqs=MAX_CONCURRENCY,
    tensor_parallel_size=1,        # Increase for multi-GPU
)

Enable CUDA graphs for better performance:

llm = LLM(
    ...
    enforce_eager=False,  # Use CUDA graphs
)

Security Best Practices

  1. Use strong tokens (at least 32 characters)
  2. Rotate tokens regularly
  3. Use HTTPS in production (reverse proxy with nginx/caddy)
  4. Limit token sharing to authorized users only
  5. Never commit .env to version control
  6. Set up firewall rules to restrict access
  7. Monitor access logs for suspicious activity

Reverse Proxy Setup (nginx)

server {
    listen 443 ssl http2;
    server_name ocr.example.com;

    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    location / {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Increase timeout for long processing
        proxy_read_timeout 300s;
        proxy_connect_timeout 300s;

        # Increase max body size for large PDFs
        client_max_body_size 100M;
    }
}

Project Structure

/root/dpsk/
├── serve_pdf.py              # Main service application
├── pdf_utils.py              # PDF conversion utilities
├── processing_utils.py       # Image processing utilities
├── deepseek_ocr.py           # DeepSeek OCR model
├── config.py                 # Configuration
├── requirements.txt          # Python dependencies
├── install/                  # Installation scripts
│   └── install.sh            # Installation script
├── run.sh                    # Start service script
├── stop.sh                   # Stop service script
├── status.sh                 # Status check script
├── .env.example              # Environment template
├── .env                      # Your configuration (not in git)
├── .venv/                    # Virtual environment
├── process/                  # Processing modules
│   ├── ngram_norepeat.py
│   └── image_process.py
├── deepencoder/              # Encoder modules
│   ├── clip_sdpa.py
│   └── sam_vary_sdpa.py
└── README.md                 # This file

Technical Details

Environment:

  • Virtual environment: .venv/
  • Python: 3.12
  • PyTorch: 2.6.0 with CUDA 11.8
  • vLLM: 0.8.5
  • Model: DeepSeek-OCR

GPU Support:

  • Tested on NVIDIA A40 (44GB)
  • Requires CUDA 11.8
  • Uses 90% GPU memory by default

Features:

  • PDF to image conversion (high quality, 144 DPI)
  • OCR with layout detection
  • Bounding box extraction and visualization
  • Image region extraction
  • Markdown output with/without layout annotations
  • Token-based authentication
  • RESTful API with OpenAPI docs
  • Concurrent request processing

Support and Contribution

Check Status:

./status.sh

View Logs:

tail -f /tmp/deepseek_ocr.log

Report Issues: include the following in your report:

  • Service status output
  • Last 50 lines of log
  • GPU status (nvidia-smi)
  • Error messages

License

This service uses DeepSeek-OCR model. Please refer to the model's license for usage terms.


Quick Reference

Common Commands

# Installation
./install/install.sh

# Service Management
./run.sh          # Start service
./stop.sh         # Stop service
./status.sh       # Check status

# Logs
tail -f /tmp/deepseek_ocr.log       # Follow logs
grep error /tmp/deepseek_ocr.log    # Find errors

# API Testing
curl http://localhost:8000/health   # Health check
curl http://localhost:8000/docs     # API documentation

# GPU Monitoring
nvidia-smi                           # Check GPU status
watch -n1 nvidia-smi                 # Monitor GPU continuously

Environment Setup

# Create .env
cp .env.example .env

# Generate token
python -c "import secrets; print(secrets.token_hex(32))"

# Edit .env
nano .env

Troubleshooting Commands

# Check if running
ps aux | grep serve_pdf.py

# Check port
netstat -tuln | grep 8000
lsof -i :8000

# Force stop all
pkill -9 -f serve_pdf.py
rm -f /tmp/deepseek_ocr.pid

# Clean restart
./stop.sh && rm -f /tmp/deepseek_ocr.log && ./run.sh

Version: 1.0.0
Last Updated: 2025-10-31
