Event-driven serverless application that automatically analyzes uploaded documents and images using AWS AI services.
Built with Lambda, API Gateway, S3, DynamoDB, Rekognition, Textract, SNS — fully provisioned with Terraform.
Users upload a document or image via a REST API. The system:
- Stores the file securely in S3 (encrypted, versioned)
- Triggers an AI analysis pipeline automatically (event-driven, no polling)
- Extracts text from PDFs/images using Amazon Textract
- Detects objects, faces, and labels, and screens for unsafe content, using Amazon Rekognition
- Stores all results in DynamoDB with full metadata
- Notifies via SNS email when analysis completes
- Exposes a REST API so any frontend can list, retrieve, and delete documents
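The analysis steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `analyze_handler`: the function name `analyze_record`, the `document_id` partition key, the `ANALYZED` status value (taken from the sample response), and the `TABLE_NAME` / `TOPIC_ARN` environment variables are all assumptions. Rekognition's image operations accept only JPEG and PNG, so the sketch skips them for other file types; the real handler may treat PDFs differently.

```python
import json
from datetime import datetime, timezone
from decimal import Decimal
from urllib.parse import unquote_plus

IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png")  # formats Rekognition image calls accept


def analyze_record(record, textract, rekognition, table, sns, topic_arn):
    """Analyze one S3 ObjectCreated record, store the results, and notify."""
    bucket = record["s3"]["bucket"]["name"]
    # Keys in S3 event notifications are URL-encoded (a space arrives as '+').
    key = unquote_plus(record["s3"]["object"]["key"])
    # Assumed key layout: uploads/<yyyy>/<mm>/<document_id>/<filename>
    document_id = key.split("/")[-2]
    s3_object = {"S3Object": {"Bucket": bucket, "Name": key}}

    blocks = textract.detect_document_text(Document=s3_object)["Blocks"]
    lines = [b["Text"] for b in blocks if b["BlockType"] == "LINE"]
    results = {
        "textract": {
            "extracted_text": "\n".join(lines),
            "word_count": sum(b["BlockType"] == "WORD" for b in blocks),
        }
    }

    if key.lower().endswith(IMAGE_SUFFIXES):
        labels = rekognition.detect_labels(Image=s3_object, MaxLabels=10)["Labels"]
        flagged = rekognition.detect_moderation_labels(Image=s3_object)["ModerationLabels"]
        results["rekognition"] = {
            # The boto3 DynamoDB resource rejects float, so confidences become Decimal.
            "labels": [
                {"name": l["Name"], "confidence": Decimal(str(round(l["Confidence"], 1)))}
                for l in labels
            ],
            "content_moderation": "FLAGGED" if flagged else "SAFE",
        }

    table.update_item(
        Key={"document_id": document_id},
        # "status" is a DynamoDB reserved word, hence the #s placeholder.
        UpdateExpression="SET #s = :s, analyzed_at = :t, ai_results = :r",
        ExpressionAttributeNames={"#s": "status"},
        ExpressionAttributeValues={
            ":s": "ANALYZED",
            ":t": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            ":r": results,
        },
    )
    sns.publish(
        TopicArn=topic_arn,
        Subject="Document analysis complete",
        Message=json.dumps({"document_id": document_id, "status": "ANALYZED"}),
    )
    return document_id


def handler(event, context):
    """Lambda entry point; TABLE_NAME and TOPIC_ARN are hypothetical env vars."""
    import os
    import boto3  # imported lazily so analyze_record can be tested without AWS

    table = boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])
    textract, rekognition = boto3.client("textract"), boto3.client("rekognition")
    sns = boto3.client("sns")
    for record in event["Records"]:
        analyze_record(record, textract, rekognition, table, sns, os.environ["TOPIC_ARN"])
```

Passing the clients in as arguments keeps the core logic testable with simple stand-ins.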
              ┌──────────────────────────────────────────────────────────────┐
              │                    AWS Cloud (eu-west-1)                     │
              │                                                              │
Client / App  │   ┌──────────────────────────────────────────────────────┐   │
│             │   │                  API Gateway (REST)                  │   │
│─────────────┼──▶│  POST   /documents       → Lambda: upload_handler    │   │
│             │   │  GET    /documents       → Lambda: list_handler      │   │
│             │   │  GET    /documents/{id}  → Lambda: list_handler      │   │
│             │   │  DELETE /documents/{id}  → Lambda: delete_handler    │   │
│             │   └───────────────────────────┬──────────────────────────┘   │
│             │                               │                              │
│             │   ┌───────────────────────────▼──────────────────────────┐   │
│             │   │         Lambda: upload_handler (Python 3.12)         │   │
│             │   │  • Validates file type and size                      │   │
│             │   │  • Generates presigned S3 URL                        │   │
│             │   │  • Stores metadata in DynamoDB (status: PENDING)     │   │
│             │   └───────────────────────────┬──────────────────────────┘   │
│             │                               │                              │
│             │                               ▼                              │
│             │   ┌──────────────────────────────────────────────────────┐   │
│             │   │          S3 Bucket (encrypted + versioned)           │   │
│             │   │  uploads/ → triggers S3 event                        │   │
│             │   └───────────────────────────┬──────────────────────────┘   │
│             │                               │ s3:ObjectCreated             │
│             │                               ▼                              │
│             │   ┌──────────────────────────────────────────────────────┐   │
│             │   │        Lambda: analyze_handler (Python 3.12)         │   │
│             │   │                                                      │   │
│             │   │  ┌─────────────────┐  ┌──────────────────────────┐   │   │
│             │   │  │ Amazon Textract │  │ Amazon Rekognition       │   │   │
│             │   │  │ • Extract text  │  │ • Detect labels          │   │   │
│             │   │  │ • Detect forms  │  │ • Content moderation     │   │   │
│             │   │  │ • Read tables   │  │ • Detect text in images  │   │   │
│             │   │  └────────┬────────┘  └────────────┬─────────────┘   │   │
│             │   │           └────────────┬───────────┘                 │   │
│             │   │                        │                             │   │
│             │   │          Updates DynamoDB (status: DONE)             │   │
│             │   │          Sends SNS notification                      │   │
│             │   └──────────────────────────────────────────────────────┘   │
│             │                                                              │
│             │   ┌────────────────────┐  ┌──────────────────────────────┐   │
│             │   │  DynamoDB Table    │  │  SNS Topic                   │   │
│             │   │  (on-demand, TTL)  │  │  → Email notification        │   │
│             │   └────────────────────┘  └──────────────────────────────┘   │
│             │                                                              │
│             │   ┌──────────────────────────────────────────────────────┐   │
│             │   │      CloudWatch: Logs + Alarms + X-Ray Tracing       │   │
│             │   └──────────────────────────────────────────────────────┘   │
│             └──────────────────────────────────────────────────────────────┘
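As a rough sketch of the `upload_handler` box in the diagram (validate, presign, record `PENDING`), something like the following could work. The allowed content types, the 10 MB cap, the optional `size_bytes` request field, and the `document_id` key name are assumptions for illustration; only `filename`, `content_type`, the 300-second expiry, and the `uploads/YYYY/MM/<id>/<filename>` key layout come from the examples in this README.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical limits: the README does not list accepted types or the size cap.
ALLOWED_TYPES = {"application/pdf", "image/png", "image/jpeg"}
MAX_BYTES = 10 * 1024 * 1024
URL_TTL_SECONDS = 300  # matches "expires_in" in the sample response


def create_upload(body, s3, table, bucket, now=None):
    """Validate an upload request, presign a PUT URL, and store PENDING metadata."""
    filename = body.get("filename", "")
    content_type = body.get("content_type", "")
    if not filename or "/" in filename:
        return 400, {"error": "filename is required and must not contain '/'"}
    if content_type not in ALLOWED_TYPES:
        return 400, {"error": f"unsupported content_type: {content_type!r}"}
    # size_bytes is an assumed optional field; a presigned PUT URL does not
    # enforce a size limit by itself.
    if body.get("size_bytes", 0) > MAX_BYTES:
        return 400, {"error": "file too large"}

    now = now or datetime.now(timezone.utc)
    document_id = f"doc_{uuid.uuid4().hex[:8]}"
    key = f"uploads/{now:%Y/%m}/{document_id}/{filename}"

    # The client must send the same Content-Type header when it PUTs the file.
    upload_url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key, "ContentType": content_type},
        ExpiresIn=URL_TTL_SECONDS,
    )
    table.put_item(Item={
        "document_id": document_id,
        "filename": filename,
        "s3_key": key,
        "status": "PENDING",
        "uploaded_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    })
    return 200, {
        "document_id": document_id,
        "upload_url": upload_url,
        "expires_in": URL_TTL_SECONDS,
    }
```

In a deployed function, `s3` would be `boto3.client("s3")` and `table` a `boto3.resource("dynamodb").Table(...)`; they are parameters here so the logic can be exercised without AWS credentials.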
| Method | Path | Description |
|---|---|---|
| POST | /documents | Get a presigned URL to upload a file |
| GET | /documents | List all documents with AI analysis results |
| GET | /documents/{id} | Get one document with full AI results |
| DELETE | /documents/{id} | Delete a document (S3 + DynamoDB) |
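Both GET routes in the table map to the same `list_handler` function, so the handler has to tell them apart. One possible way to do that is sketched below; it assumes a Lambda proxy integration (where `pathParameters` is `None` when the route has no `{id}`) and a table keyed on `document_id`, neither of which is confirmed by this README. The function name `route_list_request` is hypothetical.

```python
def route_list_request(event, table):
    """Return (status_code, payload) for GET /documents and GET /documents/{id}."""
    # pathParameters is None for /documents and {"id": "..."} for /documents/{id}.
    document_id = (event.get("pathParameters") or {}).get("id")

    if document_id is None:
        # Scan is acceptable at demo scale. Each call returns at most 1 MB,
        # so keep following LastEvaluatedKey until the table is exhausted.
        items, kwargs = [], {}
        while True:
            page = table.scan(**kwargs)
            items.extend(page.get("Items", []))
            if "LastEvaluatedKey" not in page:
                break
            kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
        return 200, {"documents": items, "count": len(items)}

    item = table.get_item(Key={"document_id": document_id}).get("Item")
    if item is None:
        return 404, {"error": f"document {document_id} not found"}
    return 200, item
```

Returning a plain `(status, payload)` pair leaves JSON serialization to a shared helper, which keeps the routing logic easy to test.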
# Step 1: Get presigned URL
curl -X POST https://YOUR_API_ID.execute-api.eu-west-1.amazonaws.com/prod/documents \
-H "Content-Type: application/json" \
-H "x-api-key: YOUR_API_KEY" \
-d '{"filename": "invoice.pdf", "content_type": "application/pdf"}'
# Response:
# {
# "document_id": "doc_a1b2c3d4",
# "upload_url": "https://s3.amazonaws.com/...",
# "expires_in": 300
# }
# Step 2: Upload directly to S3
curl -X PUT "PRESIGNED_URL" \
-H "Content-Type: application/pdf" \
--data-binary @invoice.pdf
# Step 3: Check analysis results (after ~5 seconds)
curl https://YOUR_API_ID.execute-api.eu-west-1.amazonaws.com/prod/documents/doc_a1b2c3d4 \
-H "x-api-key: YOUR_API_KEY"

{
"document_id": "doc_a1b2c3d4",
"filename": "invoice.pdf",
"status": "ANALYZED",
"uploaded_at": "2026-03-15T10:23:45Z",
"analyzed_at": "2026-03-15T10:23:52Z",
"file_size_bytes": 204800,
"s3_key": "uploads/2026/03/doc_a1b2c3d4/invoice.pdf",
"ai_results": {
"textract": {
"extracted_text": "INVOICE #2024-001\nDate: March 15, 2026\nTotal: $1,250.00",
"word_count": 487,
"page_count": 2,
"forms_detected": ["Invoice Number", "Date", "Total Amount"],
"confidence": 99.2
},
"rekognition": {
"labels": [
{"name": "Document", "confidence": 99.8},
{"name": "Text", "confidence": 98.4},
{"name": "Page", "confidence": 97.1}
],
"content_moderation": "SAFE",
"text_detected": true
}
},
"ttl": 1781346225
}

serverless-ai-document-mgmt/
├── lambdas/
│ ├── upload_handler/ # Validates upload requests, returns presigned URL
│ │ └── handler.py
│ ├── analyze_handler/ # Triggered by S3 event, calls Textract + Rekognition
│ │ └── handler.py
│ ├── list_handler/ # Lists all docs or fetches one by ID
│ │ └── handler.py
│ └── delete_handler/ # Deletes from S3 + DynamoDB
│ └── handler.py
├── layers/
│ └── common/python/ # Shared utilities (response helpers, DynamoDB client)
│ └── utils.py
├── terraform/
│ ├── modules/
│ │ ├── api_gateway/ # REST API + usage plan + API key
│ │ ├── lambda/ # All 4 Lambda functions + IAM roles
│ │ ├── storage/ # S3 + DynamoDB + S3 event trigger
│ │ ├── ai_pipeline/ # Rekognition + Textract permissions + SNS
│ │ └── monitoring/ # CloudWatch alarms + X-Ray + dashboard
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
├── frontend/
│ └── index.html # Simple demo UI to test the API
├── tests/
│ ├── test_upload.py
│ ├── test_analyze.py
│ └── test_api.py
├── .github/workflows/
│ └── deploy.yml
└── README.md
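The tree lists "response helpers" in `layers/common/python/utils.py` without showing them. A minimal sketch of what such a helper might look like follows; the names `api_response` and `_json_default` are illustrative, not taken from the repository. The main thing it has to handle is that boto3 returns DynamoDB numbers as `Decimal`, which `json.dumps` cannot encode on its own.

```python
import json
from decimal import Decimal


def _json_default(value):
    """Fallback for json.dumps: turn DynamoDB Decimal values into int or float."""
    if isinstance(value, Decimal):
        return int(value) if value % 1 == 0 else float(value)
    raise TypeError(f"{type(value).__name__} is not JSON serializable")


def api_response(status_code, body):
    """Build the dict that API Gateway expects from a Lambda proxy integration."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body, default=_json_default),
    }
```

For example, `api_response(200, {"file_size_bytes": Decimal("204800")})` yields a body containing the plain integer `204800`, matching the shape of the sample response.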
| Decision | Reason |
|---|---|
| Presigned S3 URLs | Client uploads directly to S3 — Lambda never handles binary data, saving memory and cost |
| Event-driven pipeline | S3 ObjectCreated event triggers analysis — no polling, no queues needed for this scale |
| DynamoDB on-demand | No capacity planning needed, pay per request, no throughput charges when idle |
| TTL on DynamoDB items | Documents auto-expire after 90 days to manage storage costs |
| X-Ray tracing | Full distributed tracing across API Gateway → Lambda → S3 → Rekognition |
| API Key auth | Simple but effective for a portfolio project; production would use Cognito |
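For the TTL decision, DynamoDB expects the expiry as a Number attribute holding Unix epoch seconds. A small sketch of how the 90-day value could be derived from `uploaded_at` is shown here; the function name `ttl_epoch` and the choice of `uploaded_at` (rather than `analyzed_at`) as the starting point are assumptions.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # retention period from the table above


def ttl_epoch(uploaded_at: str) -> int:
    """Return the expiry time as Unix epoch seconds for a DynamoDB TTL attribute."""
    uploaded = datetime.strptime(uploaded_at, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    return int((uploaded + RETENTION).timestamp())


print(ttl_epoch("2026-03-15T10:23:45Z"))  # 1781346225 (2026-06-13T10:23:45Z)
```

DynamoDB removes expired items in the background rather than at the exact expiry second, so reads may still return an item shortly after its TTL has passed.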
This serverless architecture costs close to $0.00/month when idle: compute and API calls are billed per request, so the only standing charges are for data already stored in S3 and DynamoDB.
| Service | Cost |
|---|---|
| Lambda (first 1M requests/month) | Free tier |
| API Gateway (first 1M calls/month) | Free tier |
| DynamoDB (on-demand, low traffic) | ~$0.01 |
| S3 (1GB storage) | ~$0.02 |
| Textract / Rekognition | ~$0.001 per document |
| Total for demo/portfolio | < $0.05/month |
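To see how the total relates to document volume, the table's own figures can be plugged into a short calculation. This is only arithmetic on the estimates above, not AWS list pricing, and it assumes Lambda and API Gateway stay within the free tier; the names are illustrative.

```python
from decimal import Decimal

# Figures copied from the cost table; rough estimates, not AWS list prices.
FIXED_MONTHLY = {"dynamodb": Decimal("0.01"), "s3_1gb": Decimal("0.02")}
AI_COST_PER_DOCUMENT = Decimal("0.001")


def estimate_monthly_cost(documents: int) -> Decimal:
    """Estimated monthly bill in USD for a given number of analyzed documents."""
    fixed = sum(FIXED_MONTHLY.values(), Decimal("0"))
    return fixed + AI_COST_PER_DOCUMENT * documents


for n in (0, 10, 19):
    print(n, estimate_monthly_cost(n))
```

Under these figures the bill stays below $0.05 for up to 19 analyzed documents a month and reaches $0.05 at 20.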
- ✅ S3 bucket: private, versioned, encrypted (AES-256), no public access
- ✅ API protected with API key (+ rate limiting: 100 req/day for demo)
- ✅ Lambda has minimum IAM permissions (least privilege per function)
- ✅ DynamoDB encrypted at rest
- ✅ X-Ray tracing for end-to-end request visibility
- ✅ CloudWatch alarms on Lambda errors and API 5XX
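One consequence of the versioned bucket in the checklist above: a plain `DeleteObject` call on a versioned bucket only adds a delete marker, and earlier versions remain stored. If `delete_handler` is meant to remove the file for good, it would need to delete every version. A sketch of that approach is shown here; whether the project does this is not stated, and the `document_id` / `s3_key` attribute names are assumptions based on the sample response.

```python
def delete_document(document_id, s3, table, bucket):
    """Remove all S3 versions of a document's file, then its DynamoDB item."""
    item = table.get_item(Key={"document_id": document_id}).get("Item")
    if item is None:
        return False
    key = item["s3_key"]

    paginator = s3.get_paginator("list_object_versions")
    for page in paginator.paginate(Bucket=bucket, Prefix=key):
        targets = [
            {"Key": entry["Key"], "VersionId": entry["VersionId"]}
            for entry in page.get("Versions", []) + page.get("DeleteMarkers", [])
            if entry["Key"] == key  # Prefix also matches longer keys
        ]
        if targets:
            # Deleting by VersionId removes that version permanently.
            s3.delete_objects(Bucket=bucket, Delete={"Objects": targets, "Quiet": True})

    table.delete_item(Key={"document_id": document_id})
    return True
```

The function's IAM role would need `s3:ListBucketVersions` and `s3:DeleteObjectVersion` in addition to the DynamoDB permissions, which is worth keeping in mind alongside the least-privilege item above.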
[Your Name]
AWS Solutions Architect Associate | AWS AI Practitioner | Terraform Associate | PMP
📧 your.email@example.com
🔗 LinkedIn | GitHub