
🌀 Vortex: The Revenue Recovery Engine

AI-Powered Cart Abandonment Recovery Platform
Real-time streaming • LLM personalization • Semantic search • A/B testing

[Live Demo](https://vortex-the-revenue-recovery-engine.streamlit.app/)

Features • Tech Stack • Architecture • Skills • Quick Start


🎯 Problem Statement

With average cart abandonment rates at ~70%, e-commerce businesses lose the majority of potential sales.

Vortex solves this by:

| Challenge | Solution |
| --- | --- |
| Late detection | ⚡ Real-time event streaming via Azure Event Hub |
| Generic messages | 🧠 AI-personalized recovery using Cerebras LLaMA 3.1 |
| No optimization | 🧪 A/B testing with statistical significance |
| Hard to find patterns | 🔍 Semantic search with Voyage AI embeddings |

🚀 Live Demo

👉 https://vortex-the-revenue-recovery-engine.streamlit.app/

The demo runs on generated sample data (1000 events) to showcase the platform's capabilities:

  • 📊 Dashboard with KPIs and interactive charts
  • 🤖 AI message generation (Cerebras LLaMA 3.1-8B)
  • 🔍 Semantic session search (Voyage AI)
  • 🧪 A/B testing framework with z-score analysis
  • 🎮 Interactive cart abandonment simulator

✨ Features

📊 Executive Dashboard

KPI dashboard with dark theme and CSS animations. Uses generated sample data for demo.

| Metric | Description |
| --- | --- |
| 💰 Revenue at Risk | Total value from abandoned carts |
| 🎯 Recoverable | Projected recovery at optimal timing |
| 📈 Conversion Rate | Checkout success percentage |
| 📉 6-Hour Rolling Average | Simulated baseline projection (rolling mean, not a real forecast) |
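
The rolling-average baseline is deliberately simple. A minimal sketch of the idea (illustrative only; the dashboard's actual computation lives in `streamlit_app/app.py` and the sample values here are made up):

```python
import pandas as pd

# Hypothetical hourly revenue-at-risk values from the sample data
revenue = pd.Series([120.0, 95.5, 210.0, 180.25, 99.0, 130.0, 175.5, 160.0])

# A 6-period rolling mean, not a trained forecast;
# min_periods=1 lets the first hours still produce a value.
baseline = revenue.rolling(window=6, min_periods=1).mean()
```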

🧪 A/B Testing Engine

Experiment framework for recovery messages (demo with simulated variants):

  • 3 Experiments: Urgency vs Friendly, Discount vs Free Shipping, SMS vs Email
  • Statistical Significance: Z-score with 95% confidence intervals
  • Revenue Impact: Projected lift from winning variants
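
The significance check the A/B tab describes is a standard two-proportion z-test. A self-contained sketch (the variant counts below are hypothetical, and the app's own implementation may differ in detail):

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test with a two-sided 95% threshold (|z| >= 1.96)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled conversion rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return z, abs(z) >= 1.96

# Hypothetical counts: "Friendly" (A) vs "Urgency" (B), 500 sessions each
z, significant = two_proportion_z(conv_a=45, n_a=500, conv_b=72, n_b=500)
```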

🤖 AI Recovery Messages

Personalized outreach powered by Cerebras Cloud (LLaMA 3.1-8B):

  • Customer archetype detection (Bargain Hunter, Premium Shopper, etc.)
  • Context-aware tone and urgency matching
  • Multi-channel support (SMS, Email, Push)
  • Webhook delivery simulation with animated console
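
The archetype-to-prompt flow can be sketched roughly as below. This is illustrative only: the real logic lives in `streamlit_app/ai_recovery.py`, and the rules, `TONE` map, and `build_prompt` helper here are hypothetical. When an API key is configured, a prompt like this would be sent to Cerebras Cloud; without one, the app falls back to templates.

```python
def detect_archetype(cart_value: float, discount_views: int) -> str:
    # Hypothetical rules; the app's classifier may use more signals.
    if discount_views >= 3:
        return "Bargain Hunter"
    if cart_value >= 500:
        return "Premium Shopper"
    return "Casual Browser"

TONE = {
    "Bargain Hunter": "lead with the discount, add mild urgency",
    "Premium Shopper": "emphasize quality and service, no hard sell",
    "Casual Browser": "friendly reminder, low pressure",
}

def build_prompt(archetype: str, items: list[str], channel: str) -> str:
    return (
        f"Write a short {channel} recovery message for a {archetype}. "
        f"Tone: {TONE[archetype]}. Abandoned items: {', '.join(items)}."
    )

prompt = build_prompt(detect_archetype(120.0, 4), ["wireless earbuds"], "SMS")
```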

🔍 Semantic Search

Find similar sessions using Voyage AI embeddings:

  • Natural language queries ("high-value electronics abandonments")
  • Cosine similarity scoring with color-coded results
  • Session analytics with conversion breakdown
  • Fallback keyword matching when API unavailable
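
The ranking-with-fallback behavior can be sketched as follows (a simplified stand-in for `streamlit_app/semantic_search.py`; the `rank_sessions` helper and its fallback scoring are illustrative, not the app's exact code):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank_sessions(query_vec, session_vecs, query_text, session_texts):
    """Rank sessions by embedding similarity; fall back to keyword
    overlap when no embeddings are available (e.g. Voyage key missing)."""
    if query_vec is not None and session_vecs:
        scores = [cosine(query_vec, v) for v in session_vecs]
    else:
        q = set(query_text.lower().split())
        scores = [len(q & set(t.lower().split())) / max(len(q), 1)
                  for t in session_texts]
    return sorted(enumerate(scores), key=lambda p: p[1], reverse=True)

# No embeddings supplied, so the keyword fallback kicks in
ranked = rank_sessions(None, [], "electronics abandonment",
                       ["high-value electronics abandonment",
                        "grocery checkout success"])
```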

⏱️ Recovery Analytics

Deep-dive timing analysis with ROI modeling:

  • 5-min vs 24-hour recovery windows
  • Channel effectiveness comparison (SMS, Push, Email)
  • Interactive ROI calculator
  • Priority explainability with scoring breakdown
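
The ROI calculation behind the interactive calculator reduces to recovered revenue versus messaging cost. A minimal sketch (the formula and the example numbers below are illustrative; the dashboard's calculator may weight channels and timing differently):

```python
def recovery_roi(revenue_at_risk: float, recovery_rate: float,
                 cost_per_message: float, messages_sent: int) -> float:
    """Return ROI as (recovered - cost) / cost."""
    recovered = revenue_at_risk * recovery_rate
    cost = cost_per_message * messages_sent
    return (recovered - cost) / cost if cost else float("inf")

# e.g. $15,000 at risk, 32% recovery at the 5-minute window,
# 300 SMS messages at $0.05 each
roi = recovery_roi(15_000, 0.32, 0.05, 300)
```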

🎮 Interactive Simulator

Test the recovery engine yourself:

  • Build a shopping cart with real products
  • Simulate abandonment scenarios
  • Watch AI generate personalized recovery in real-time
  • Track session journey with webhook simulation

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                          DATA INGESTION                              │
├─────────────────────────────────────────────────────────────────────┤
│  E-commerce Events → Azure Event Hub → Delta Live Tables             │
│  (page_view, add_to_cart, checkout_success, cart_abandoned)          │
└─────────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    MEDALLION ARCHITECTURE                            │
├─────────────────────────────────────────────────────────────────────┤
│  Databricks + Delta Lake + dbt Core                                  │
│  ├── 🥉 Bronze: Raw event ingestion                                  │
│  ├── 🥈 Silver: Cleaned, validated, sessionized                      │
│  └── 🥇 Gold: Aggregated metrics + recovery candidates               │
└─────────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                         AI SERVICES                                  │
├─────────────────────────────────────────────────────────────────────┤
│  Cerebras Cloud (LLaMA 3.1-8B)  →  Recovery message generation       │
│  Voyage AI (voyage-2)           →  Semantic embeddings + search      │
└─────────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                       PRESENTATION LAYER                             │
├─────────────────────────────────────────────────────────────────────┤
│  Streamlit Dashboard (7 Interactive Tabs)                            │
│  ├── Dashboard       │  Recovery Analytics  │  A/B Testing           │
│  ├── Try It Yourself │  Recovery Queue      │  Semantic Search       │
│  └── Architecture                                                    │
└─────────────────────────────────────────────────────────────────────┘

🛠️ Tech Stack

Data Engineering

| Component | Technology | Purpose |
| --- | --- | --- |
| Streaming | Azure Event Hub | Real-time event ingestion |
| Lakehouse | Databricks + Delta Lake | ACID transactions, time travel |
| Transformation | dbt Core | SQL-based data modeling |
| Orchestration | Databricks Workflows | Pipeline scheduling |
| Architecture | Medallion (Bronze/Silver/Gold) | Data quality layers |

AI/ML

| Component | Technology | Purpose |
| --- | --- | --- |
| LLM | Cerebras Cloud (LLaMA 3.1-8B) | Recovery message generation |
| Embeddings | Voyage AI (voyage-2) | Semantic search vectors |
| Similarity | Cosine Distance | Session matching |
| Statistics | Z-Score Testing | A/B experiment significance |

Frontend

| Component | Technology | Purpose |
| --- | --- | --- |
| Framework | Streamlit 1.31+ | Interactive dashboard |
| Visualization | Plotly Express | Charts and graphs |
| Styling | Custom CSS | Animations, dark theme |
| State | Streamlit Session State | Cross-component data |

DevOps & Infrastructure

| Component | Technology | Purpose |
| --- | --- | --- |
| Hosting | Streamlit Cloud | App deployment |
| CI/CD | GitHub Actions | Linting and deploy |
| Version Control | Git + GitHub | Code management |
| Secrets | Streamlit Secrets / .env | Credential management |

📈 Skills Demonstrated

Data Engineering

  • ✅ Real-time event streaming (Azure Event Hub)
  • ✅ Lakehouse architecture (Databricks + Delta Lake)
  • ✅ Medallion pattern (Bronze → Silver → Gold)
  • ✅ SQL transformations with dbt Core
  • ✅ Delta Lake features (ACID, time travel, schema evolution)
  • ✅ Data quality validation and session-based event tracking (session IDs assigned at generation)

Analytics & Statistics

  • ✅ KPI dashboard development
  • ✅ A/B testing with statistical significance (z-score, 95% CI)
  • ✅ Conversion funnel analysis
  • ✅ Cohort analysis by customer archetype
  • ✅ Simulated forecast visualization
  • ✅ ROI modeling and revenue attribution

AI/ML Engineering

  • ✅ LLM integration (Cerebras Cloud API)
  • ✅ Prompt engineering for personalization
  • ✅ Vector embeddings (Voyage AI)
  • ✅ Semantic similarity search
  • ✅ Graceful fallback handling
  • ✅ Customer archetype classification

Full-Stack Development

  • ✅ Interactive web application (Streamlit)
  • ✅ Data visualization (Plotly)
  • ✅ Custom CSS animations and theming
  • ✅ Real-time data simulation

DevOps & Best Practices

  • ✅ CI/CD pipeline (GitHub Actions — lint + deploy)
  • ✅ Environment variable management
  • ✅ Secure credential handling

🚀 Quick Start

Prerequisites

  • Python 3.10+
  • (Optional) Cerebras API key for AI messages
  • (Optional) Voyage AI key for semantic search

Installation

```bash
# Clone the repository
git clone https://github.com/Mohith-akash/Vortex-The-Revenue-Recovery-Engine.git
cd Vortex-The-Revenue-Recovery-Engine

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the app
cd streamlit_app
streamlit run app.py
```

Environment Variables (Optional)

Create a .env file in the root directory:

```bash
# AI Services (optional - app works without these)
CEREBRAS_API_KEY=your_cerebras_key
VOYAGE_API_KEY=your_voyage_key

# Azure Event Hub (for production streaming)
AZURE_CONNECTION_STRING=your_connection_string
EVENT_HUB_NAME=vortex-events

# Databricks (for production data)
DBT_DATABRICKS_HOST=your_workspace.cloud.databricks.com
DBT_DATABRICKS_HTTP_PATH=/sql/1.0/warehouses/your_warehouse
DBT_DATABRICKS_TOKEN=your_token
```

Note: The app works without any API keys using sample data and template-based message generation.


📁 Project Structure

Vortex-The-Revenue-Recovery-Engine/
├── .github/workflows/          # CI/CD pipeline
│   └── ci.yml                  # GitHub Actions workflow
├── databricks/                 # Databricks configuration
│   └── databricks.yml          # Asset bundle config
├── notebooks/                  # Databricks notebooks
│   ├── 01_dlt_pipeline.py      # Delta Live Tables
│   ├── 02_recovery_orchestration.py
│   ├── 03_time_travel_demo.py
│   ├── 04_dashboard_queries.sql
│   ├── 05_streaming_pipeline_no_dlt.py
│   └── 06_sample_data_setup.py
├── scripts/                    # Utility scripts
│   ├── traffic_generator.py    # Event simulation
│   ├── recovery_tracker.py     # Recovery monitoring
│   ├── databricks_consumer.py  # Event consumer
│   └── heartbeat.py            # Health check
├── streamlit_app/              # Main application
│   ├── app.py                  # Dashboard (7 tabs)
│   ├── ai_recovery.py          # Cerebras integration
│   ├── semantic_search.py      # Voyage AI search
│   ├── data_generator.py       # Sample data
│   └── requirements.txt        # App dependencies
├── vortex_analytics/           # dbt project
│   ├── dbt_project.yml         # dbt configuration
│   ├── models/                 # SQL models
│   └── profiles.yml            # Connection profiles
├── pyproject.toml              # Python project config
├── requirements.txt            # Root dependencies
├── LICENSE                     # MIT License
└── README.md                   # This file

📊 Key Metrics (Demo Data)

| Metric | Value |
| --- | --- |
| Sample Events | 1,000+ |
| Abandoned Carts | ~300 |
| Recovery Rate | 32% (at 5-min response) |
| A/B Test Confidence | 95% |
| Customer Archetypes | 5 types |
| Product Categories | 7 categories |

🔐 Security

  • ✅ API keys stored in environment variables
  • ✅ .env file excluded from git
  • ✅ Streamlit secrets for cloud deployment
  • ✅ No hardcoded credentials in source code
  • ✅ Databricks secret scopes for production

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


👤 Author

Mohith Akash

GitHub LinkedIn


Try Vortex

⭐ Star this repo if you found it useful!
