ReAlive πŸŽ΅πŸ“Έ

MIT License Python FastAPI Google Cloud HackHarvard 2022 Best Use of Google Cloud

Winner of "Best Use of Google Cloud" at HackHarvard 2022 πŸ† https://devpost.com/software/realive

ReAlive is an AI-powered web application that brings old photos to life by adding realistic, contextually aware audio. It analyzes each image to generate a depth map, extract semantic information, and synthesize an immersive soundscape that recreates what the scene might have sounded like.

🌟 Features

  • 🎨 Image Analysis: Advanced computer vision to extract visual elements and depth information
  • πŸ”Š Smart Audio Synthesis: AI-powered audio generation based on image content
  • πŸ“ Depth Mapping: Monocular depth estimation using CNN and OpenCV
  • 🌐 Web Interface: Clean, responsive web UI built with FastAPI and Bootstrap
  • ☁️ Cloud-Native: Fully deployed on Google Cloud Platform with containerized microservices
  • 🎡 Audio Mixing: Intelligent audio layering and intensity mapping
  • πŸ“± Real-time Processing: Fast image-to-audio conversion pipeline

πŸš€ Quick Start

Prerequisites

  • Python 3.8 or higher
  • Google Cloud Platform account
  • Docker (for containerized deployment)

Installation

  1. Clone the repository

    git clone https://github.com/sacredvoid/ReAlive.git
    cd ReAlive
  2. Install dependencies

    # Install main requirements
    pip install -r requirements.txt
    
    # Install PyTorch-specific requirements
    pip install -r pytorch_requirements.txt
    
    # Install spaCy requirements
    pip install -r spacy_requirements.txt
  3. Set up Google Cloud credentials

    # Download your GCP service account key
    # Place it in the project root as 'plasma-myth-365608-02f3f88329f7.json'
    export GOOGLE_APPLICATION_CREDENTIALS="plasma-myth-365608-02f3f88329f7.json"
  4. Download required models

    # Download the depth estimation model
    # Place 'depthmodel.h5' in the project root
  5. Run the application

    python main.py
  6. Access the web interface: open your browser and navigate to http://localhost:8000

πŸ—οΈ Architecture

System Overview

ReAlive follows a microservices architecture with the following components:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Web Frontend  β”‚    β”‚   Image2Text    β”‚    β”‚   Depth Map     β”‚
β”‚   (FastAPI)     │◄──►│   Service       β”‚    β”‚   Generator     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚                       β”‚                       β”‚
         β–Ό                       β–Ό                       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Sound Mapper  β”‚    β”‚   Google Cloud  β”‚    β”‚   Audio Mixer   β”‚
β”‚   & Synthesizer β”‚    β”‚   Storage       β”‚    β”‚   (Pydub)       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Core Components

1. Main API Server (main.py)

  • FastAPI-based web server
  • Handles image uploads and processing requests
  • Orchestrates the entire pipeline
  • Returns processed videos and metadata

2. Image-to-Text Service (img2text/)

  • Flask-based microservice for image captioning
  • Uses pre-trained models for scene description
  • Deployed as containerized service

3. Depth Map Generator (depth_image_generator.py)

  • Monocular depth estimation using CNN
  • Generates depth maps for audio intensity mapping
  • Uses custom Keras model with BilinearUpSampling2D

4. Sound Mapper (sound_mapper_helpers.py)

  • Maps text descriptions to audio samples
  • Handles audio synthesis and mixing
  • Implements intensity-based volume control

5. Google Cloud Integration (gcp_helpers.py)

  • Manages file uploads and downloads
  • Handles cloud storage operations
  • Provides secure file access
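
The way main.py orchestrates these components can be sketched as below. The stage functions are illustrative stand-ins with hypothetical names and canned return values, not the repository's actual implementations; they only show how the pipeline chains caption, depth, and audio stages into the response shape described in the API section:

```python
# Illustrative sketch of the ReAlive pipeline flow. All function names and
# return values here are hypothetical stand-ins for the real services.

def caption_image(image_bytes: bytes) -> str:
    """Stand-in for the img2text microservice call."""
    return "a beach with waves and birds"

def generate_depth_map(image_bytes: bytes) -> list:
    """Stand-in for the depth estimation step (depth_image_generator)."""
    return [[0.2, 0.8], [0.5, 0.9]]

def synthesize_audio_video(caption: str, depth: list) -> str:
    """Stand-in for sound mapping + mixing; returns an output location."""
    return "final_video.mp4"

def run_pipeline(image_bytes: bytes) -> dict:
    """Chain the stages and return the fields the API exposes."""
    caption = caption_image(image_bytes)
    depth = generate_depth_map(image_bytes)
    video = synthesize_audio_video(caption, depth)
    return {
        "img2text": caption,
        "linkToFinalVideo": video,
        "linkToHeatMap": "depth.jpg",
    }
```

The real application performs these stages across containerized services and uploads results to Google Cloud Storage; the sketch only captures the data flow.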

πŸ“ Project Structure

realive/
β”œβ”€β”€ πŸ“„ main.py                          # Main FastAPI application
β”œβ”€β”€ πŸ“„ depth_image_generator.py         # Depth estimation module
β”œβ”€β”€ πŸ“„ sound_mapper_helpers.py          # Audio synthesis and mapping
β”œβ”€β”€ πŸ“„ gcp_helpers.py                   # Google Cloud Platform utilities
β”œβ”€β”€ πŸ“„ tokenizer.py                     # Text processing utilities
β”œβ”€β”€ πŸ“„ layers.py                        # Custom Keras layers
β”œβ”€β”€ πŸ“„ post_req_helper.py               # HTTP request utilities
β”œβ”€β”€ πŸ“ img2text/                        # Image-to-text microservice
β”‚   β”œβ”€β”€ πŸ“„ app.py                       # Flask application
β”‚   β”œβ”€β”€ πŸ“„ image_to_text.py             # Image captioning logic
β”‚   β”œβ”€β”€ πŸ“„ Dockerfile                   # Container configuration
β”‚   └── πŸ“„ start_app.sh                 # Startup script
β”œβ”€β”€ πŸ“ heatmap/                         # Depth visualization service
β”‚   β”œβ”€β”€ πŸ“„ app.py                       # Heatmap generation API
β”‚   β”œβ”€β”€ πŸ“„ depth_image_generator.py     # Depth processing
β”‚   └── πŸ“„ Dockerfile                   # Container configuration
β”œβ”€β”€ πŸ“ templates/                       # Web templates
β”‚   └── πŸ“„ index.html                   # Main web interface
β”œβ”€β”€ πŸ“ data/                            # Configuration and data
β”‚   β”œβ”€β”€ πŸ“„ __init__.py                  # Data module configuration
β”‚   └── πŸ“ databaseFiles/               # Audio datasets and mappings
β”‚       β”œβ”€β”€ πŸ“„ esc50.csv                # Audio dataset metadata
β”‚       β”œβ”€β”€ πŸ“„ intensityMap.json        # Audio intensity mappings
β”‚       └── πŸ“„ textMusicMapping.json    # Text-to-audio mappings
β”œβ”€β”€ πŸ“ test_images/                     # Sample images for testing
β”œβ”€β”€ πŸ“„ requirements.txt                 # Python dependencies
β”œβ”€β”€ πŸ“„ pytorch_requirements.txt         # PyTorch-specific dependencies
β”œβ”€β”€ πŸ“„ spacy_requirements.txt           # spaCy NLP dependencies
└── πŸ“„ LICENSE                          # MIT License

πŸ”§ API Documentation

Endpoints

GET /

  • Description: Main web interface
  • Response: HTML page with upload form

POST /upload/

  • Description: Process uploaded image and generate audio-video
  • Request: Multipart form with image file
  • Response: JSON with processing results
    {
      "img2text": "A beautiful landscape with mountains and trees",
      "linkToFinalVideo": "https://storage.googleapis.com/bucket/video.mp4",
      "linkToHeatMap": "https://storage.googleapis.com/bucket/depth.jpg"
    }

Request/Response Models

PipelineFinish

from pydantic import BaseModel

class PipelineFinish(BaseModel):
    img2text: str                    # Generated text description
    linkToFinalVideo: str            # URL to final video with audio
    linkToHeatMap: str               # URL to depth map visualization

🎯 Usage Examples

Basic Usage

  1. Upload an Image

    curl -X POST "http://localhost:8000/upload/" \
         -H "Content-Type: multipart/form-data" \
         -F "file=@your_image.jpg"
  2. Process via Python

    import requests
    
    with open('image.jpg', 'rb') as f:
        files = {'file': f}
        response = requests.post('http://localhost:8000/upload/', files=files)
        result = response.json()
        print(f"Description: {result['img2text']}")
        print(f"Video: {result['linkToFinalVideo']}")

Advanced Configuration

Custom Audio Intensity Mapping

Edit data/databaseFiles/intensityMap.json to adjust the audio level applied to each sound class:

{
  "bird": -2,
  "water": -1,
  "wind": -3,
  "traffic": 0
}
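
One plausible way these per-sound offsets combine with a depth-derived intensity can be sketched as follows. The formula, the `gain_db` helper, and the `max_attenuation_db` parameter are illustrative assumptions, not the repository's exact mixing rule; the sketch only shows the idea of attenuating distant sounds on top of each sound's base offset:

```python
# Hypothetical sketch: combine a per-sound base offset (from
# intensityMap.json) with a depth-derived intensity in [0, 1],
# where 0.0 means far away and 1.0 means close to the camera.

intensity_map = {"bird": -2, "water": -1, "wind": -3, "traffic": 0}

def gain_db(sound: str, depth_intensity: float,
            max_attenuation_db: float = 12.0) -> float:
    """Return a total dB gain: base offset minus distance attenuation."""
    base = intensity_map.get(sound, 0)
    # A far-away sound (depth_intensity == 0.0) gets the full attenuation.
    return base - (1.0 - depth_intensity) * max_attenuation_db

gain_db("bird", 1.0)  # a nearby bird keeps only its base offset: -2.0
```

Gains expressed in dB map naturally onto Pydub's `AudioSegment.apply_gain`, which the project uses for mixing.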

Depth Model Customization

# Adjust depth estimation parameters in depth_image_generator.py
def predict(model, images, minDepth=10, maxDepth=1000, batch_size=2):
    # Customize the depth range and batch size here
    ...

πŸš€ Deployment

Docker Deployment

  1. Build containers

    # Build image-to-text service
    cd img2text
    docker build -t realive-img2text .
    
    # Build heatmap service
    cd ../heatmap
    docker build -t realive-heatmap .
  2. Deploy to Google Cloud Run

    # Deploy the main application from source
    gcloud run deploy realive-main --source . --platform managed --region us-central1
    
    # Push the microservice images to a registry Cloud Run can pull from,
    # then deploy them (replace PROJECT_ID with your GCP project ID)
    docker tag realive-img2text gcr.io/PROJECT_ID/realive-img2text
    docker push gcr.io/PROJECT_ID/realive-img2text
    gcloud run deploy realive-img2text --image gcr.io/PROJECT_ID/realive-img2text --platform managed
    
    docker tag realive-heatmap gcr.io/PROJECT_ID/realive-heatmap
    docker push gcr.io/PROJECT_ID/realive-heatmap
    gcloud run deploy realive-heatmap --image gcr.io/PROJECT_ID/realive-heatmap --platform managed

Environment Variables

export GCP_BUCKET_NAME="your-bucket-name"
export IMG2TEXT_API="https://your-img2text-service.run.app/img2txt/"
export GOOGLE_APPLICATION_CREDENTIALS="path/to/credentials.json"
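
A fail-fast check for these variables at startup can be sketched as below; the `load_config` helper is a hypothetical name, not a function shipped in the repository:

```python
# Hypothetical startup check for the environment variables listed above.
import os

REQUIRED_VARS = (
    "GCP_BUCKET_NAME",
    "IMG2TEXT_API",
    "GOOGLE_APPLICATION_CREDENTIALS",
)

def load_config(env=os.environ) -> dict:
    """Return the required settings, failing loudly if any are missing."""
    missing = [v for v in REQUIRED_VARS if v not in env]
    if missing:
        raise RuntimeError(
            f"Missing environment variables: {', '.join(missing)}"
        )
    return {v: env[v] for v in REQUIRED_VARS}
```

Failing at startup is preferable to discovering a missing credential mid-pipeline, after an image has already been uploaded.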

πŸ§ͺ Testing

Test Images

The test_images/ directory contains sample images for testing:

  • Landscape photos
  • Urban scenes
  • Nature images
  • Various lighting conditions

Running Tests

# Exercise the image processing pipeline end to end
# (start the server first with `python main.py`)
python -c "
import requests
test_image = 'test_images/photo-1502781252888-9143ba7f074e.jpeg'
with open(test_image, 'rb') as f:
    r = requests.post('http://localhost:8000/upload/', files={'file': f})
r.raise_for_status()
print('Test completed successfully!')
"

πŸ”¬ Technical Details

Machine Learning Models

Depth Estimation

  • Model: Custom CNN with BilinearUpSampling2D
  • Input: RGB images (640x480)
  • Output: Depth maps for audio intensity mapping
  • Framework: Keras/TensorFlow
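
The clipping and normalization implied by the `predict(model, images, minDepth=10, maxDepth=1000, ...)` signature can be sketched as below. This is one plausible post-processing step, written with plain lists so it runs without TensorFlow; the repository's exact normalization may differ:

```python
# Hypothetical sketch: clip raw network outputs to [min_depth, max_depth]
# and rescale to [0, 1] so they can drive audio intensity mapping.

def normalize_depth(raw, min_depth=10.0, max_depth=1000.0):
    """Clip raw depth values and normalize them to the unit interval."""
    clipped = [min(max(v, min_depth), max_depth) for v in raw]
    return [(v - min_depth) / (max_depth - min_depth) for v in clipped]
```

In the real pipeline this would operate on NumPy arrays batch-by-batch rather than Python lists.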

Image Captioning

  • Model: Pre-trained vision-language model
  • Input: RGB images
  • Output: Natural language descriptions
  • Framework: Transformers/Hugging Face

Audio Synthesis

  • Dataset: ESC-50 environmental sound classification
  • Mapping: Text-to-audio semantic matching
  • Processing: Pydub for audio manipulation
  • Output: Mixed stereo audio tracks
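
The text-to-audio matching step can be sketched as below. The project's spaCy requirements suggest embedding-based similarity; here the standard library's `difflib.get_close_matches` stands in so the example runs without spaCy models, and the `ESC50_LABELS` subset and `match_sounds` helper are illustrative:

```python
# Hypothetical sketch of matching caption tokens to ESC-50 sound labels.
# difflib stands in for the semantic similarity the real pipeline uses.
import difflib

ESC50_LABELS = ["chirping_birds", "sea_waves", "wind", "rain", "car_horn"]

def match_sounds(caption: str, cutoff: float = 0.6) -> list:
    """Return the ESC-50 labels that best match each caption token."""
    matches = []
    for token in caption.lower().split():
        hit = difflib.get_close_matches(token, ESC50_LABELS, n=1, cutoff=cutoff)
        matches.extend(hit)
    return matches

match_sounds("wind and rain")  # -> ['wind', 'rain']
```

Swapping `difflib` for spaCy word-vector similarity would let near-synonyms like "breeze" also resolve to "wind".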

Performance Optimizations

  • Batch Processing: Efficient depth estimation with configurable batch sizes
  • Caching: Google Cloud Storage for processed files
  • Async Processing: Non-blocking file operations
  • Memory Management: Optimized image resizing and processing
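
The non-blocking file handling mentioned above can be sketched with the standard library. The `save_upload` and `_blocking_save` names are hypothetical, not functions from the repository; the point is only the pattern of off-loading blocking disk I/O so an async server stays responsive:

```python
# Hypothetical sketch: keep blocking disk I/O off the event loop by
# running it in a worker thread (Python 3.9+ for asyncio.to_thread).
import asyncio

def _blocking_save(path: str, data: bytes) -> int:
    """Plain blocking write; returns the number of bytes written."""
    with open(path, "wb") as f:
        return f.write(data)

async def save_upload(path: str, data: bytes) -> int:
    """Run the blocking write in a thread so the event loop keeps serving."""
    return await asyncio.to_thread(_blocking_save, path, data)
```

In FastAPI the same effect is achieved by awaiting `UploadFile.read()` and delegating heavy work to a thread pool.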

🀝 Contributing

We welcome contributions! Please see our contributing guidelines:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Setup

# Install development dependencies
pip install -r requirements.txt
pip install -r pytorch_requirements.txt
pip install -r spacy_requirements.txt

# Install pre-commit hooks
pre-commit install

# Run tests
python -m pytest tests/

πŸ“Š Performance Metrics

  • Processing Time: ~30-60 seconds per image
  • Supported Formats: JPEG, PNG, WebP
  • Max Image Size: 10MB
  • Audio Duration: 2-4 seconds
  • Video Output: MP4 (H.264)

πŸ› Troubleshooting

Common Issues

1. Model Loading Errors

# Ensure depth model is in project root
ls -la depthmodel.h5
# If missing, download from model repository

2. Google Cloud Authentication

# Verify credentials
gcloud auth application-default login
export GOOGLE_APPLICATION_CREDENTIALS="path/to/credentials.json"

3. Memory Issues

# Reduce the batch size in depth_image_generator.py
def predict(model, images, minDepth=10, maxDepth=1000, batch_size=1):  # reduced from 2
    ...

4. Audio Processing Errors

# Install ffmpeg, which pydub relies on for decoding and encoding
sudo apt-get install ffmpeg
pip install --upgrade pydub

πŸ“ˆ Future Roadmap

  • Image Animation: Add subtle motion to static images
  • Augmented Reality: AR integration for immersive experiences
  • Real-time Processing: Optimize for faster response times
  • Mobile App: Native iOS and Android applications
  • Advanced Audio: 3D spatial audio and surround sound
  • Batch Processing: Multiple image processing capabilities
  • Custom Models: User-trainable audio synthesis models

πŸ“š Resources

Documentation

Related Projects

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ‘₯ Team

HackHarvard 2022 Team

  • Samanvya Tripathi - Lead Developer & Project Manager
  • Team Members - Full-stack development, ML engineering, Cloud architecture

πŸ™ Acknowledgments

  • HackHarvard 2022 for the amazing hackathon experience
  • Google Cloud Platform for providing the infrastructure behind our "Best Use of Google Cloud" win
  • Open Source Community for the incredible tools and libraries
  • ESC-50 Dataset creators for the environmental sound data
  • Keras/TensorFlow team for the depth estimation models

πŸ“ž Contact


Made with ❀️ at HackHarvard 2022

Bringing memories to life, one photo at a time πŸŽ΅πŸ“Έ
