PDFToru is a simple containerized service for parsing PDFs over a REST API. The content of the JSON response depends on the endpoint you call.
The container consists of:
- base image: python:3.11-slim
- PDF parser: pdfplumber Python module to parse PDFs
- ASGI server: uvicorn to run the API
The Docker image is available from the GitHub Container Registry: ghcr.io/vahem2lu/pdftoru:latest
Available tags:
- v1.x - version tags
- latest - latest build
- All parsing endpoints require a POST request
- Responses are always in JSON format.
Configure the service via environment variables when running the container.
MAX_UPLOAD_SIZE_MB sets the maximum upload size in megabytes. The default is 10.
APP_VERSION is set by git tag on every build.
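To illustrate how the upload limit is interpreted, here is a minimal sketch of how a Python service could read MAX_UPLOAD_SIZE_MB and convert it to bytes. This is not PDFToru's actual code: the helper name `max_upload_bytes` is hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```python
import os

DEFAULT_MAX_UPLOAD_SIZE_MB = 10  # default documented in this README


def max_upload_bytes(environ=None):
    """Return the upload limit in bytes, read from MAX_UPLOAD_SIZE_MB.

    Hypothetical helper; the real service may parse the variable differently.
    Assumes 1 MB = 1024 * 1024 bytes.
    """
    env = os.environ if environ is None else environ
    raw = env.get("MAX_UPLOAD_SIZE_MB", str(DEFAULT_MAX_UPLOAD_SIZE_MB))
    size_mb = int(raw)  # raises ValueError on non-numeric input
    if size_mb <= 0:
        raise ValueError("MAX_UPLOAD_SIZE_MB must be a positive integer")
    return size_mb * 1024 * 1024
```

Passing a plain dict instead of the real environment, for example `max_upload_bytes({"MAX_UPLOAD_SIZE_MB": "50"})`, makes the helper easy to try without changing any environment variables.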
The API exposes a single port and several endpoints.
The API listens on port 8000 inside the container. Forward it to a suitable port on the Docker host (e.g., 3002).
- GET /health - check if the service is running
Example health response:
{
  "status": "ok",
  "version": "v1.0.0",
  "uptime_seconds": 100,
  "max_upload_MB": 5
}
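The health endpoint can also be polled from a script, for example to wait until the container is ready. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the base URL assumes the container port is forwarded to 3002 on the host.

```python
import json
import urllib.request


def parse_health(body):
    """Parse a /health response body and return (is_ok, version)."""
    data = json.loads(body)
    return data.get("status") == "ok", data.get("version")


def check_health(base_url="http://localhost:3002", timeout=5.0):
    """GET /health and return (is_ok, version).

    base_url is an assumption: port 8000 forwarded to 3002 on the host.
    """
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return parse_health(resp.read().decode("utf-8"))
```

With the container running, `check_health()` returns a tuple of a boolean and the reported version string.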
- POST /extract/text - extract full text from PDF
- POST /extract/spatial-text - extract full text from PDF with spatial layout preserved
- POST /extract/words - extract individual words with coordinates
- POST /extract/tables - extract tables and their text
- POST /extract/layout - extract all layout objects with coordinates
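The parsing endpoints can be called from Python without extra dependencies. The sketch below builds a multipart/form-data request by hand with the standard library and posts a PDF to one of the endpoints listed above. The function names are hypothetical, the form field name `file` is taken from the curl example in this README, and the base URL assumes the container port is forwarded to 3002. The structure of the returned JSON is not assumed here, so the result is returned as-is.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(filename, content, field="file"):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def extract(pdf_path, endpoint="/extract/text", base_url="http://localhost:3002"):
    """POST a PDF to a parsing endpoint and return the decoded JSON response."""
    with open(pdf_path, "rb") as fh:
        body, content_type = build_multipart(os.path.basename(pdf_path), fh.read())
    request = urllib.request.Request(
        base_url + endpoint,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `extract("yourResume.pdf", endpoint="/extract/tables")` sends the file to the tables endpoint. A third-party HTTP client would shorten this, but the standard-library version keeps the sketch dependency-free.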
Run container via CLI:
docker run -p 3002:8000 -e MAX_UPLOAD_SIZE_MB=50 ghcr.io/vahem2lu/pdftoru:latest
Test with curl:
curl -X POST "http://localhost:3002/extract/text" \
-H "Content-Type: multipart/form-data" \
-F "file=@yourResume.pdf"
After some development, you may want to try out your changes. This can be done in either of two ways.
After making the necessary changes, build the container locally. You need a working Docker environment (e.g., Docker Desktop).
docker build --build-arg APP_VERSION="dev-local" -t pdftoru:dev-local .
Then run it:
docker run -p 3002:8000 -e MAX_UPLOAD_SIZE_MB=5 pdftoru:dev-local
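Alternatively, make the necessary changes, commit them, add a tag with a v prefix, and push both the code and the tag to your repository. A plain git push does not push tags, so the tag is pushed explicitly (assuming the remote is named origin).
git add -A
git commit -m "Changes"
git tag v1.1
git push
git push origin v1.1
See the LICENSE file.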