A serverless AWS Lambda function that converts PDF files to images. The function accepts PDF uploads, converts each page to an image (JPEG by default), and stores them in S3. It includes a web interface for easy testing and features path-based routing, caching of results, and source IP tagging.
To see it in action, register at Chat With My Slides and upload your PDFs.
- Convert PDF files to individual images (JPEG by default)
- Path-based API routing for better organization
- Caching of conversion results for improved performance
- Source IP tagging of uploaded images for tracking and analytics
- Web-based testing interface
- CORS-enabled for browser access
- Secure pre-signed URLs for direct S3 uploads
- Proper error handling and logging
- AWS Lambda: Handles PDF processing and image conversion
- Amazon S3: Stores uploaded PDFs and converted images
- Lambda Function URL: Provides HTTP endpoint for the function
.
├── README.md
├── template.yaml # SAM template for AWS resources
├── deps
│ ├── build_layer.sh # Script to build Lambda layer with Poppler
│ └── requirements.txt # Python dependencies
├── src
│ └── app.py # Lambda function code
└── test_lambda.html # Web testing interface
The Lambda function provides these endpoints through its Function URL:
- Get Upload URL (GET /upload_url)
  - Returns a pre-signed URL for uploading a PDF to S3. The client should upload the PDF to this URL before starting the conversion.
  - Response: { "uploadUrl": "...", "fileId": "..." }
- Process PDF (GET /process/<file-id>)
  - Converts the uploaded PDF to images
  - Response: { "fileId": "...", "imageUrls": ["...", "..."], "pageCount": N }
  - The source IP of the request is automatically tagged to the uploaded images
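As a rough illustration of path-based routing, the sketch below dispatches on the method and path that a Lambda Function URL passes in its event (`requestContext.http.method` and `rawPath`). The helper names `get_upload_url`, `process_pdf`, and `respond` are hypothetical placeholders and are not taken from `src/app.py`; the real handler may be organized differently.

```python
import json


def get_upload_url(event):
    # Placeholder (assumption): a real handler would create a pre-signed S3 URL.
    return {"uploadUrl": "...", "fileId": "..."}


def process_pdf(event, file_id):
    # Placeholder (assumption): a real handler would convert the PDF here.
    return {"fileId": file_id, "imageUrls": [], "pageCount": 0}


def respond(status, body):
    # Build a response in the shape Lambda Function URLs expect.
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def lambda_handler(event, context):
    http = event.get("requestContext", {}).get("http", {})
    method = http.get("method", "")
    path = event.get("rawPath", "/")
    # Split "/process/abc" into ["process", "abc"], ignoring empty segments.
    segments = [s for s in path.split("/") if s]

    if method == "GET" and segments == ["upload_url"]:
        return respond(200, get_upload_url(event))
    if method == "GET" and len(segments) == 2 and segments[0] == "process":
        return respond(200, process_pdf(event, segments[1]))
    return respond(404, {"error": f"No route for {method} {path}"})
```

Matching on path segments rather than raw strings keeps a trailing slash or a doubled slash from breaking a route.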
The function processes PDF files and converts them to images:
- Each PDF page is converted to a JPEG image by default
- Two versions of each image are created: main (full size) and preview (thumbnail)
- Results are cached for faster retrieval on subsequent requests
- Source IP of the requester is tagged to each uploaded image
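One way source IP tagging could work is sketched below. It reads the caller's address from the Function URL event (`requestContext.http.sourceIp`) and passes it to S3 through the `Tagging` parameter of boto3's `put_object`, which takes a URL-encoded string. The tag key `source-ip`, the function names, and the fallback value `unknown` are assumptions made for illustration; the actual function may use different names.

```python
from urllib.parse import urlencode


def source_ip_from_event(event):
    # Function URL events carry the caller's address under requestContext.http.
    return event.get("requestContext", {}).get("http", {}).get("sourceIp", "unknown")


def build_tagging(source_ip):
    # put_object expects the tag set encoded like URL query parameters.
    # The tag key "source-ip" is an assumption, not taken from the project.
    return urlencode({"source-ip": source_ip})


def put_image(s3_client, bucket, key, jpeg_bytes, source_ip):
    # s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=jpeg_bytes,
        ContentType="image/jpeg",
        Tagging=build_tagging(source_ip),
    )
```

Passing the client in as an argument keeps the function easy to exercise with a stub, without touching a real bucket.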
- Install prerequisites:
  - AWS SAM CLI
  - Docker (for building the Lambda layer)
  - Git (for cloning the repository)
- Clone this repository:
    git clone https://github.com/ai-1st/pdf-to-image-aws-lambda.git
    cd pdf-to-image-aws-lambda
- Build the Lambda layer (see Building the Lambda Layer section for details):
    cd deps
    chmod +x build_layer.sh
    ./build_layer.sh
    cd ..
- Deploy using SAM:
    sam build
    sam deploy --guided
Before deploying, you need to build the Lambda layer that contains Poppler and other dependencies. The layer is built using Docker to ensure compatibility with the Lambda environment.
- Docker installed and running
- AWS SAM CLI
- Bash shell
- Navigate to the deps directory:
    cd deps
- Make the build script executable:
    chmod +x build_layer.sh
- Run the build script:
    ./build_layer.sh
The script will:
- Create a Docker container based on the AWS Lambda Python 3.13 ARM64 image
- Install system packages including Poppler and its dependencies
- Install Python packages from requirements.txt
- Copy necessary binaries and shared libraries
- Create a layer directory with the correct structure
- Clean up temporary files
The resulting layer will be created in the deps/layer directory with the following structure:
layer/
├── bin/ # Poppler binaries (pdfinfo, pdftoppm, pdftocairo)
├── lib/ # Shared libraries
└── python/ # Python packages
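Lambda extracts layers under /opt, and the runtime's default PATH includes /opt/bin, so the Poppler binaries listed above should be discoverable by name once the layer is attached. The sketch below is one possible sanity check; the function names are hypothetical and the check is not part of the project.

```python
import shutil

# Binaries the layer is expected to provide, per the structure above.
POPPLER_TOOLS = ("pdfinfo", "pdftoppm", "pdftocairo")


def find_poppler_tools(search_path=None):
    # Map each tool name to its resolved location, or None if it is not found.
    # search_path=None means "use the PATH environment variable".
    return {tool: shutil.which(tool, path=search_path) for tool in POPPLER_TOOLS}


def missing_poppler_tools(search_path=None):
    # List the tools that could not be located on the given search path.
    return [tool for tool, location in find_poppler_tools(search_path).items()
            if location is None]
```

Calling `missing_poppler_tools()` at cold start and logging a non-empty result can surface a broken layer before the first conversion fails.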
- Open test_lambda.html in a web browser
- Enter your Lambda Function URL
- Upload a PDF file
- View the converted images in the grid layout
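The same flow can also be scripted. The sketch below follows the documented endpoints: fetch an upload URL, send the PDF to it, then request processing. It assumes the pre-signed URL accepts an HTTP PUT with a `Content-Type` of `application/pdf`; if the URL was signed for a different method or content type, the upload step would need adjusting. The function names are hypothetical.

```python
import json
import urllib.request


def http_request(url, method="GET", data=None, headers=None):
    # Thin wrapper over urllib so the transport can be swapped out in tests.
    req = urllib.request.Request(url, data=data, method=method, headers=headers or {})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def convert_pdf(function_url, pdf_bytes, request=http_request):
    base = function_url.rstrip("/")

    # Step 1: ask the function for a pre-signed upload URL and a file ID.
    upload = json.loads(request(f"{base}/upload_url"))

    # Step 2: upload the PDF straight to S3 (assumed to be a PUT upload).
    request(
        upload["uploadUrl"],
        method="PUT",
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
    )

    # Step 3: trigger the conversion and collect the image URLs.
    result = json.loads(request(f"{base}/process/{upload['fileId']}"))
    return result["imageUrls"]
```

A call such as `convert_pdf("https://<your-function-url>", open("slides.pdf", "rb").read())` would return the list of image URLs, where the URL and file name are placeholders.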
- Python 3.8+
- pdf2image
- boto3
- Poppler (installed in Lambda layer)
BUCKET_NAME: S3 bucket name for file storage
- CORS enabled for browser access
- Pre-signed URLs for secure S3 uploads
- Public Lambda Function URL with CORS controls
The function includes comprehensive error handling for:
- Invalid file types
- Failed conversions
- S3 upload/download issues
- CORS and pre-signed URL issues
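To make the error categories above concrete, here is a minimal sketch of how failures might be mapped to HTTP responses. The exception classes, status codes, and helper names are all assumptions for illustration and are not taken from `src/app.py`. The file type check relies on PDF files normally starting with the bytes `%PDF-`.

```python
import json
import logging

logger = logging.getLogger(__name__)


class InvalidFileType(Exception):
    """Raised when the uploaded object does not look like a PDF."""


class ConversionFailed(Exception):
    """Raised when a PDF cannot be rendered to images."""


# Hypothetical mapping from error class to HTTP status code.
STATUS_BY_ERROR = {InvalidFileType: 400, ConversionFailed: 422}


def require_pdf(data):
    # PDF files normally begin with the "%PDF-" header bytes.
    if not data.startswith(b"%PDF-"):
        raise InvalidFileType("Uploaded file is not a PDF")


def json_response(status, body):
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def run_safely(action, *args):
    # Run a handler and translate any exception into a JSON error response.
    try:
        return json_response(200, action(*args))
    except tuple(STATUS_BY_ERROR) as exc:
        # Known, client-facing errors: log briefly and return the message.
        status = next(code for cls, code in STATUS_BY_ERROR.items()
                      if isinstance(exc, cls))
        logger.warning("Request rejected: %s", exc)
        return json_response(status, {"error": str(exc)})
    except Exception:
        # Unknown errors: log the traceback but hide details from the client.
        logger.exception("Unexpected failure")
        return json_response(500, {"error": "Internal error"})
```

Keeping unexpected exception details out of the response body avoids leaking bucket names or stack traces through a public Function URL.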
Feel free to open issues or submit pull requests for improvements.
MIT License