SeifeddineSAAD/serverless-ai-document-mgmt

🤖 Serverless AI Document Management System

Badges: AWS · Terraform · Python · License: MIT · CI

Event-driven serverless application that automatically analyzes uploaded documents and images using AWS AI services.
Built with Lambda, API Gateway, S3, DynamoDB, Rekognition, Textract, SNS — fully provisioned with Terraform.


🧠 What It Does

Users upload a document or image via a REST API. The system:

  1. Stores the file securely in S3 (encrypted, versioned)
  2. Triggers an AI analysis pipeline automatically (event-driven, no polling)
  3. Extracts text from PDFs/images using Amazon Textract
  4. Detects objects, faces, and labels, and screens for unsafe content, using Amazon Rekognition
  5. Stores all results in DynamoDB with full metadata
  6. Notifies via SNS email when analysis completes
  7. Exposes a REST API so any frontend can list, retrieve, and delete documents

📐 Architecture

                    ┌──────────────────────────────────────────────────────────────┐
                    │                    AWS Cloud (eu-west-1)                      │
                    │                                                               │
  Client / App      │   ┌──────────────────────────────────────────────────────┐   │
      │             │   │              API Gateway (REST)                       │   │
      │─────────────┼──▶│  POST /documents     → Lambda: upload_handler         │   │
      │             │   │  GET  /documents     → Lambda: list_handler           │   │
      │             │   │  GET  /documents/{id}→ Lambda: list_handler           │   │
      │             │   │  DELETE /documents/{id}→ Lambda: delete_handler       │   │
      │             │   └────────────────────────────┬─────────────────────────┘   │
      │             │                                │                             │
      │             │   ┌────────────────────────────▼─────────────────────────┐   │
      │             │   │              Lambda: upload_handler (Python 3.12)     │   │
      │             │   │  • Validates file type and size                       │   │
      │             │   │  • Generates presigned S3 URL                         │   │
      │             │   │  • Stores metadata in DynamoDB (status: PENDING)      │   │
      │             │   └────────────────────────────┬─────────────────────────┘   │
      │             │                                │                             │
      │             │                                ▼                             │
      │             │   ┌──────────────────────────────────────────────────────┐   │
      │             │   │              S3 Bucket (encrypted + versioned)        │   │
      │             │   │              uploads/  →  triggers S3 event           │   │
      │             │   └────────────────────────────┬─────────────────────────┘   │
      │             │                                │  s3:ObjectCreated           │
      │             │                                ▼                             │
      │             │   ┌──────────────────────────────────────────────────────┐   │
      │             │   │              Lambda: analyze_handler (Python 3.12)    │   │
      │             │   │                                                       │   │
      │             │   │  ┌─────────────────┐    ┌──────────────────────────┐ │   │
      │             │   │  │ Amazon Textract  │    │  Amazon Rekognition       │ │   │
      │             │   │  │ • Extract text   │    │  • Detect labels          │ │   │
      │             │   │  │ • Detect forms   │    │  • Content moderation     │ │   │
      │             │   │  │ • Read tables    │    │  • Detect text in images  │ │   │
      │             │   │  └────────┬────────┘    └───────────┬──────────────┘ │   │
      │             │   │           └──────────────────────────┘               │   │
      │             │   │                         │                             │   │
      │             │   │              Updates DynamoDB (status: DONE)         │   │
      │             │   │              Sends SNS notification                   │   │
      │             │   └──────────────────────────────────────────────────────┘   │
      │             │                                                               │
      │             │   ┌────────────────────┐    ┌──────────────────────────────┐ │
      │             │   │  DynamoDB Table     │    │  SNS Topic                   │ │
      │             │   │  (on-demand, TTL)   │    │  → Email notification        │ │
      │             │   └────────────────────┘    └──────────────────────────────┘ │
      │             │                                                               │
      │             │   ┌──────────────────────────────────────────────────────┐   │
      │             │   │  CloudWatch: Logs + Alarms + X-Ray Tracing           │   │
      │             │   └──────────────────────────────────────────────────────┘   │
      │             └───────────────────────────────────────────────────────────────┘
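
The S3-to-Lambda hop in the diagram can be sketched as follows. This is a minimal sketch, not the repository's actual analyze_handler code: it assumes the standard S3 event notification shape and a key layout of uploads/<year>/<month>/<document_id>/<filename>, inferred from the sample item further down. `parse_s3_records` is a hypothetical helper name.

```python
from urllib.parse import unquote_plus


def parse_s3_records(event: dict) -> list[dict]:
    """Extract bucket, key, and document ID from an S3 event notification."""
    parsed = []
    for record in event.get("Records", []):
        # Only react to object-creation events (e.g. "ObjectCreated:Put").
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in notifications (spaces arrive as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        parts = key.split("/")
        # Assumed layout: uploads/<year>/<month>/<document_id>/<filename>
        if len(parts) != 5 or parts[0] != "uploads":
            continue
        parsed.append({
            "bucket": bucket,
            "key": key,
            "document_id": parts[3],
            "filename": parts[4],
        })
    return parsed
```

Decoding the key matters in practice: a file uploaded as "my invoice.pdf" arrives in the event as "my+invoice.pdf", and passing the raw value to Textract or Rekognition would point at an object that does not exist.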

🚀 API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | `/documents` | Get a presigned URL to upload a file |
| GET | `/documents` | List all documents with AI analysis results |
| GET | `/documents/{id}` | Get one document with full AI results |
| DELETE | `/documents/{id}` | Delete a document (S3 + DynamoDB) |
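
Both GET routes map to list_handler, so that function has to tell them apart. A minimal sketch of one way to do it, assuming API Gateway's Lambda proxy integration (which supplies `pathParameters`). The function names, the `{"documents": [...]}` envelope, and the 404 body are hypothetical, and the data access is injected as plain callables so the routing can be read on its own:

```python
import json


def route_list_request(event: dict, list_all, get_one) -> dict:
    """Dispatch GET /documents and GET /documents/{id} from one handler."""
    # pathParameters is None (not {}) when the route has no path variables.
    doc_id = (event.get("pathParameters") or {}).get("id")
    if doc_id is None:
        return _response(200, {"documents": list_all()})
    item = get_one(doc_id)
    if item is None:
        return _response(404, {"error": f"document {doc_id} not found"})
    return _response(200, item)


def _response(status: int, body: dict) -> dict:
    # Response shape expected by the Lambda proxy integration.
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }
```

In a real handler, `list_all` and `get_one` would wrap DynamoDB reads; here they can be any callables, for example `dict.get` on an in-memory store.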

Example — Upload a document

# Step 1: Get presigned URL
curl -X POST https://YOUR_API_ID.execute-api.eu-west-1.amazonaws.com/prod/documents \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{"filename": "invoice.pdf", "content_type": "application/pdf"}'

# Response:
# {
#   "document_id": "doc_a1b2c3d4",
#   "upload_url": "https://s3.amazonaws.com/...",
#   "expires_in": 300
# }

# Step 2: Upload directly to S3
curl -X PUT "PRESIGNED_URL" \
  -H "Content-Type: application/pdf" \
  --data-binary @invoice.pdf

# Step 3: Check analysis results (after ~5 seconds)
curl https://YOUR_API_ID.execute-api.eu-west-1.amazonaws.com/prod/documents/doc_a1b2c3d4 \
  -H "x-api-key: YOUR_API_KEY"

Example — Analysis Result (DynamoDB item)

{
  "document_id": "doc_a1b2c3d4",
  "filename": "invoice.pdf",
  "status": "ANALYZED",
  "uploaded_at": "2026-03-15T10:23:45Z",
  "analyzed_at": "2026-03-15T10:23:52Z",
  "file_size_bytes": 204800,
  "s3_key": "uploads/2026/03/doc_a1b2c3d4/invoice.pdf",
  "ai_results": {
    "textract": {
      "extracted_text": "INVOICE #2024-001\nDate: March 15, 2026\nTotal: $1,250.00",
      "word_count": 487,
      "page_count": 2,
      "forms_detected": ["Invoice Number", "Date", "Total Amount"],
      "confidence": 99.2
    },
    "rekognition": {
      "labels": [
        {"name": "Document", "confidence": 99.8},
        {"name": "Text", "confidence": 98.4},
        {"name": "Page", "confidence": 97.1}
      ],
      "content_moderation": "SAFE",
      "text_detected": true
    }
  },
  "ttl": 1781346225
}
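
To show how raw service responses might be condensed into the `ai_results` shape above, here is a minimal sketch. It assumes the Textract `DetectDocumentText` response (a `Blocks` list plus `DocumentMetadata`) and the Rekognition `DetectLabels` / `DetectModerationLabels` responses. The function names and the "FLAGGED" value are hypothetical (the sample only shows "SAFE"), and `forms_detected` and `text_detected` are omitted because they need additional API calls.

```python
def summarize_textract(response: dict) -> dict:
    """Condense a Textract DetectDocumentText response."""
    blocks = response.get("Blocks", [])
    lines = [b for b in blocks if b["BlockType"] == "LINE"]
    words = [b for b in blocks if b["BlockType"] == "WORD"]
    confidences = [b["Confidence"] for b in words]
    # Mean word confidence, rounded to one decimal like the sample item.
    mean_conf = round(sum(confidences) / len(confidences), 1) if confidences else 0.0
    return {
        "extracted_text": "\n".join(b["Text"] for b in lines),
        "word_count": len(words),
        "page_count": response.get("DocumentMetadata", {}).get("Pages", 0),
        "confidence": mean_conf,
    }


def summarize_rekognition(labels_resp: dict, moderation_resp: dict) -> dict:
    """Condense DetectLabels and DetectModerationLabels responses."""
    labels = [
        {"name": label["Name"], "confidence": round(label["Confidence"], 1)}
        for label in labels_resp.get("Labels", [])
    ]
    # Assumption: any moderation label at all marks the document as flagged.
    flagged = moderation_resp.get("ModerationLabels", [])
    return {
        "labels": labels,
        "content_moderation": "FLAGGED" if flagged else "SAFE",
    }
```

One practical note: the boto3 DynamoDB resource API rejects Python floats, so the confidence values would need converting to `Decimal` before the item is written.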

📁 Project Structure

serverless-ai-document-mgmt/
├── lambdas/
│   ├── upload_handler/        # Validates upload requests, returns presigned URL
│   │   └── handler.py
│   ├── analyze_handler/       # Triggered by S3 event, calls Textract + Rekognition
│   │   └── handler.py
│   ├── list_handler/          # Lists all docs or fetches one by ID
│   │   └── handler.py
│   └── delete_handler/        # Deletes from S3 + DynamoDB
│       └── handler.py
├── layers/
│   └── common/python/         # Shared utilities (response helpers, DynamoDB client)
│       └── utils.py
├── terraform/
│   ├── modules/
│   │   ├── api_gateway/       # REST API + usage plan + API key
│   │   ├── lambda/            # All 4 Lambda functions + IAM roles
│   │   ├── storage/           # S3 + DynamoDB + S3 event trigger
│   │   ├── ai_pipeline/       # Rekognition + Textract permissions + SNS
│   │   └── monitoring/        # CloudWatch alarms + X-Ray + dashboard
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
├── frontend/
│   └── index.html             # Simple demo UI to test the API
├── tests/
│   ├── test_upload.py
│   ├── test_analyze.py
│   └── test_api.py
├── .github/workflows/
│   └── deploy.yml
└── README.md

⚡ Key Design Decisions

| Decision | Reason |
|----------|--------|
| Presigned S3 URLs | Client uploads directly to S3; Lambda never handles binary data, saving memory and cost |
| Event-driven pipeline | S3 ObjectCreated event triggers analysis; no polling, and no queues needed at this scale |
| DynamoDB on-demand | No capacity planning needed, pay per request, scales to 0 at rest |
| TTL on DynamoDB items | Documents auto-expire after 90 days to manage storage costs |
| X-Ray tracing | Full distributed tracing across API Gateway → Lambda → S3 → Rekognition |
| API Key auth | Simple but effective for a portfolio project; production would use Cognito |
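
DynamoDB TTL expects a Number attribute holding the expiry time in epoch seconds, so the 90-day retention has to be computed when the item is written. A minimal sketch, assuming the expiry is counted from the `uploaded_at` timestamp (the README does not say which timestamp is used); `ttl_epoch` is a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 90  # from the design-decision table


def ttl_epoch(uploaded_at: str, retention_days: int = RETENTION_DAYS) -> int:
    """Return the DynamoDB TTL value (epoch seconds) for an upload timestamp."""
    # Timestamps in this README end in 'Z'; spell out the UTC offset so
    # datetime.fromisoformat also accepts them on Python versions before 3.11.
    uploaded = datetime.fromisoformat(uploaded_at.replace("Z", "+00:00"))
    if uploaded.tzinfo is None:
        # Treat naive timestamps as UTC rather than local time.
        uploaded = uploaded.replace(tzinfo=timezone.utc)
    return int((uploaded + timedelta(days=retention_days)).timestamp())
```

For the sample upload time `2026-03-15T10:23:45Z`, this gives 1781346225, which is 2026-06-13T10:23:45Z. DynamoDB deletes expired items in the background, so an item can remain readable for a while after its TTL has passed.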

💰 Cost at Rest

This serverless architecture costs close to $0/month when idle: compute and API charges accrue only when requests come in, and the only standing cost is the few cents of storage listed below.

| Service | Cost |
|---------|------|
| Lambda (first 1M requests/month) | Free tier |
| API Gateway (first 1M calls/month) | Free tier |
| DynamoDB (on-demand, low traffic) | ~$0.01 |
| S3 (1GB storage) | ~$0.02 |
| Textract / Rekognition | ~$0.001 per document |
| Total for demo/portfolio | < $0.05/month |
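
As a worked check of the total, the figures above can be combined into a simple estimate. This sketch takes the table's approximate numbers at face value and assumes Lambda and API Gateway stay within the free tier; `monthly_cost` is a hypothetical helper, and `Decimal` is used to avoid floating-point noise in the cents.

```python
from decimal import Decimal

# Approximate figures copied from the cost table.
FIXED_MONTHLY = Decimal("0.01") + Decimal("0.02")  # DynamoDB + S3 (1 GB)
PER_DOCUMENT = Decimal("0.001")                    # Textract / Rekognition


def monthly_cost(documents: int) -> Decimal:
    """Estimated monthly cost in USD for a given number of analyzed documents."""
    return FIXED_MONTHLY + PER_DOCUMENT * documents
```

Under these figures an idle month costs about $0.03, and the bill stays under $0.05 as long as fewer than 20 documents are analyzed per month.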

🔐 Security

  • ✅ S3 bucket: private, versioned, encrypted (AES-256), no public access
  • ✅ API protected with an API key (plus a usage-plan quota of 100 requests/day for the demo)
  • ✅ Lambda has minimum IAM permissions (least privilege per function)
  • ✅ DynamoDB encrypted at rest
  • ✅ X-Ray tracing for end-to-end visibility into each request
  • ✅ CloudWatch alarms on Lambda errors and API 5XX

👤 Author

[Your Name]
AWS Solutions Architect Associate | AWS AI Practitioner | Terraform Associate | PMP
📧 your.email@example.com
🔗 LinkedIn | GitHub
