DeepSeek-OCR-2 API Service

This repository provides an API service and Docker support for DeepSeek-OCR-2, using the V2 model with "Visual Causal Flow". It ports the API structure from dpsk-ocr and adapts it to the newer model architecture.

Features

DeepSeek-OCR-2 Model: Uses the latest V2 architecture for improved OCR performance and layout analysis.
FastAPI Backend: Efficient, async-capable API for PDF and image processing.
Dockerized: Ready-to-deploy Docker image built on CUDA 11.8, PyTorch 2.6, and vLLM 0.8.5.
Task Queue: Background processing for handling large PDF files without blocking the API.
Layout Detection: Returns structured markdown, detection coordinates, and layout visualizations.

Prerequisites

NVIDIA GPU (Ampere or newer recommended, e.g., A100, A10, RTX 3090/4090)
Roughly 16 GB or more of VRAM for the model.
Linux with CUDA drivers installed.
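
The VRAM guideline above can be checked before installing anything. The following is a minimal sketch, not part of this repository: it assumes `nvidia-smi` is on the PATH and uses its `--query-gpu=memory.total` CSV output. The helper names (`parse_vram_mib`, `query_vram_mib`, `gpus_meeting_guideline`) and the 16 GiB threshold constant are illustrative assumptions.

```python
import subprocess

# Assumption: the README's "roughly 16 GB" guideline, expressed in MiB.
MIN_VRAM_MIB = 16 * 1024


def parse_vram_mib(smi_output: str) -> list[int]:
    """Parse one memory.total value (MiB) per GPU from nvidia-smi CSV output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def query_vram_mib() -> list[int]:
    """Ask nvidia-smi for the total memory of every visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_vram_mib(result.stdout)


def gpus_meeting_guideline(vram_mib: list[int], minimum: int = MIN_VRAM_MIB) -> list[int]:
    """Return the indices of GPUs whose total memory meets the guideline."""
    return [index for index, mib in enumerate(vram_mib) if mib >= minimum]
```

Calling `gpus_meeting_guideline(query_vram_mib())` on the target machine lists the GPU indices that pass. Cards marketed as 16 GB can report slightly less than 16384 MiB, so treat the threshold as a rough filter and lower `minimum` if needed.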

Installation

Option 1: Docker (Recommended)

Build the image:
```
docker build -t deepseek-ocr2 .
```

Run the container:

```
docker run --gpus all -p 8000:8000 -v $(pwd)/data:/app/data deepseek-ocr2
```

Option 2: Local Conda Environment

Create Environment:

```
conda create -n deepseek-ocr2 python=3.12 -y
conda activate deepseek-ocr2
```

Install GPU Dependencies:

```
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu118

# Install vLLM (compatible wheel for Python 3.12 / CUDA 11.8)
wget https://github.com/vllm-project/vllm/releases/download/v0.8.5/vllm-0.8.5+cu118-cp38-abi3-manylinux1_x86_64.whl
pip install vllm-0.8.5+cu118-cp38-abi3-manylinux1_x86_64.whl

pip install flash-attn==2.7.3 --no-build-isolation
```

Install Python Requirements:
```
pip install -r requirements.txt
```

Run the API:
```
python serve_pdf.py
```

Usage

API Endpoints

The API runs at http://localhost:8000. Full interactive documentation is available at http://localhost:8000/docs.

1. Process a PDF

POST /process_pdf

Upload a PDF file to start processing.
Returns a job_id.
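
As an illustration of the upload step, here is a hedged sketch of a standard-library client. The multipart field name `file` and the assumption that the reply is a JSON object with a `job_id` key are guesses, not confirmed by this README; check http://localhost:8000/docs for the actual request schema. `build_multipart` and `submit_pdf` are hypothetical helper names.

```python
import json
import os
import urllib.request
import uuid

BASE_URL = "http://localhost:8000"


def build_multipart(field: str, filename: str, payload: bytes) -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def submit_pdf(path: str, field: str = "file", base_url: str = BASE_URL) -> str:
    """POST a PDF to /process_pdf and return the job_id from the JSON reply.

    Assumptions: the upload field is named `field` ("file" by default) and the
    response body is JSON containing a "job_id" key.
    """
    with open(path, "rb") as handle:
        body, content_type = build_multipart(field, os.path.basename(path), handle.read())
    request = urllib.request.Request(
        f"{base_url}/process_pdf",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["job_id"]
```

With the service running, `submit_pdf("report.pdf")` would return the identifier to use with the status and result endpoints. The sketch uses only the standard library so that it adds no dependency beyond `requirements.txt`.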

2. Check Status

GET /result/{job_id}/status

Check if the job is pending, processing, completed, or failed.
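
Because processing runs in the background, a client typically polls this endpoint until the job reaches a final state. The sketch below shows one way to do that. It assumes the status endpoint returns JSON with a `status` field holding one of the four states listed above; that response shape is an assumption. `fetch_status` and `wait_for_job` are hypothetical names.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8000"
TERMINAL_STATES = {"completed", "failed"}


def fetch_status(job_id: str, base_url: str = BASE_URL) -> str:
    """Read the job state; assumes a JSON reply with a "status" field."""
    with urllib.request.urlopen(f"{base_url}/result/{job_id}/status") as response:
        return json.load(response)["status"]


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], str] = fetch_status,
    poll_seconds: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job is completed or failed, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(job_id)
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout} s")
        sleep(poll_seconds)
```

The `fetch` and `sleep` parameters are injectable so that the loop can be exercised without a running server. For large PDFs, raise `timeout` and `poll_seconds` to avoid polling the API needlessly.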

3. Get Results

Markdown: GET /result/{job_id}/markdown
Markdown with Coordinates: GET /result/{job_id}/markdown_det
Layout Visualization (PDF): GET /result/{job_id}/layout_pdf
Extracted Images: GET /result/{job_id}/images
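
The four result endpoints share one URL pattern, so a small helper can build and download any of them. This is a sketch under assumptions: the response formats are not documented here (for example, whether `images` returns an archive), so the code saves the raw bytes without interpreting them. `result_url` and `download_result` are hypothetical names.

```python
import urllib.request

BASE_URL = "http://localhost:8000"
# Result kinds taken from the endpoint list above.
RESULT_KINDS = ("markdown", "markdown_det", "layout_pdf", "images")


def result_url(job_id: str, kind: str, base_url: str = BASE_URL) -> str:
    """Build the URL for one result kind of a finished job."""
    if kind not in RESULT_KINDS:
        raise ValueError(f"unknown result kind {kind!r}; expected one of {RESULT_KINDS}")
    return f"{base_url}/result/{job_id}/{kind}"


def download_result(job_id: str, kind: str, dest_path: str, base_url: str = BASE_URL) -> int:
    """Save the raw response body to dest_path and return the byte count."""
    with urllib.request.urlopen(result_url(job_id, kind, base_url)) as response:
        data = response.read()
    with open(dest_path, "wb") as out:
        out.write(data)
    return len(data)
```

For a completed job, `download_result(job_id, "markdown", "output.md")` would write the markdown output to disk. Validating `kind` up front turns a typo into an immediate `ValueError` rather than an HTTP error from the server.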

CLI Scripts

You can also run the original scripts for direct processing:

Single Image: python run_dpsk_ocr2_image.py
PDF: python run_dpsk_ocr2_pdf.py
Batch Eval: python run_dpsk_ocr2_eval_batch.py

Note: Configure config.py to set input/output paths for CLI scripts.

Project Structure

serve_pdf.py: Main API application.
DeepSeek-OCR2-vllm/: Core model code.
config.py: Configuration settings.
database.py & task_queue.py: Job management.
pdf_utils.py & processing_utils.py: Helper functions for image/PDF handling.

License

This project follows the license of the original DeepSeek-OCR-2 repository.
