AlphaStack

AI-powered project generator that transforms natural language descriptions into complete, production-ready codebases with Docker configurations and automated testing.

📄 Paper submitted to ICML 2026
A novel approach to autonomous code generation using multi-agent systems with iterative self-healing and comprehensive validation across diverse programming paradigms.

🎯 Key Features

Intelligent Multi-Agent Architecture

Planning Agent: Analyzes errors and generates comprehensive fix strategies using tool-augmented reasoning
Correction Agent: Executes fixes with code understanding and validation
Iterative Self-Healing: Automatically detects and resolves dependency conflicts, build errors, and test failures

Comprehensive Code Generation

Natural language to production-ready code
Multi-file project generation with proper structure
Support for modern languages and frameworks
Intelligent dependency resolution
Best practices and design patterns

Docker-Based Validation

Automated Docker container creation
Isolated build and test environments
Resource-managed execution (configurable CPU/memory limits)
Complete validation pipeline from build to test execution

Extensive Evaluation Framework

40 Programming Challenges across 4 languages:
- CUDA: GPU computing and parallel algorithms (10 challenges)
- Go: Concurrent systems and distributed computing (10 challenges)
- Rust: Memory-safe systems programming (10 challenges)
- TypeScript: Type-safe applications and frameworks (10 challenges)
4-Tier Difficulty System: From fundamentals to production systems
Comprehensive benchmarking and metrics collection

How It Works

graph LR
    A[Natural Language Input] --> B[AI Analysis & Blueprint]
    B --> C[Multi-File Code Generation]
    C --> D[Dependency Resolution]
    D --> E[Docker Configuration]
    E --> F[Build Validation]
    F --> G{Build Success?}
    G -->|No| H[Planning Agent]
    H --> I[Correction Agent]
    I --> F
    G -->|Yes| J[Test Execution]
    J --> K{Tests Pass?}
    K -->|No| H
    K -->|Yes| L[Production-Ready Project]

    style A fill:#4A90E2,stroke:#2E5C8A,stroke-width:2px,color:#fff
    style B fill:#9B59B6,stroke:#6C3483,stroke-width:2px,color:#fff
    style C fill:#E67E22,stroke:#A04000,stroke-width:2px,color:#fff
    style D fill:#3498DB,stroke:#1F618D,stroke-width:2px,color:#fff
    style E fill:#1ABC9C,stroke:#117A65,stroke-width:2px,color:#fff
    style F fill:#E74C3C,stroke:#922B21,stroke-width:2px,color:#fff
    style L fill:#27AE60,stroke:#186A3B,stroke-width:2px,color:#fff
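
The control flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not AlphaStack's implementation: `StepResult`, `run_pipeline`, and the four injected callables are hypothetical names, and a single shared iteration budget is assumed for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    ok: bool        # True when the build or test step succeeded
    log: str = ""   # captured output, handed to the planner on failure


def run_pipeline(
    build: Callable[[], StepResult],
    test: Callable[[], StepResult],
    plan: Callable[[str], List[str]],
    apply_fixes: Callable[[List[str]], None],
    max_iterations: int = 5,
) -> bool:
    """Return True once a build and the following test run both succeed."""
    for attempt in range(max_iterations + 1):
        result = build()
        if result.ok:
            result = test()
            if result.ok:
                return True
        # Any failure goes through plan -> correct, then back to the build step.
        if attempt < max_iterations:
            apply_fixes(plan(result.log))
    return False
```

Passing the agents in as plain callables keeps the loop testable without a model or a Docker daemon.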

Architecture Components

Core Generation Pipeline:

Blueprint Generation: Analyzes requirements and creates software architecture
Folder Structure: Generates project hierarchy with proper organization
File Generation: Creates all necessary files with content (source, config, tests, docs)
Metadata Management: Tracks dependencies, entry points, and test commands

Intelligent Error Resolution:

Error Tracking: Monitors all errors across build and test phases
Tool-Augmented Planning: Uses file operations, command execution, and analysis tools
Context-Aware Fixes: Understands project structure and dependencies
Iterative Refinement: Repeats the plan-and-fix cycle until the build and tests succeed or the maximum iteration count is reached

Validation & Testing:

Docker Isolation: Sandboxed build and test environments
Command Detection: Automatically identifies build/test commands
Log Analysis: Extracts and analyzes error messages
Success Verification: Validates complete pipeline execution
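
One plausible way to implement command detection is to look for a well-known manifest file in the project root. The sketch below assumes that approach; the `MANIFEST_COMMANDS` table and `detect_commands` function are hypothetical, and the commands listed are common defaults for each toolchain rather than what AlphaStack necessarily runs.

```python
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical mapping: manifest file -> (build command, test command).
MANIFEST_COMMANDS = {
    "Cargo.toml": ("cargo build", "cargo test"),
    "go.mod": ("go build ./...", "go test ./..."),
    "package.json": ("npm run build", "npm test"),
    "pyproject.toml": ("pip install .", "pytest"),
}


def detect_commands(project_dir) -> Optional[Tuple[str, str]]:
    """Return (build, test) commands for the first manifest found, else None."""
    root = Path(project_dir)
    for manifest, commands in MANIFEST_COMMANDS.items():
        if (root / manifest).is_file():
            return commands
    return None
```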

Installation

Requirements:

Python 3.9+
Google Gemini API Key
Docker (optional, for validation)

# Clone and install
git clone https://github.com/HyperKuvid-Labs/alpha-stack.git
cd alpha-stack
pip install .

# Configure API key
alphastack setup

Docker Installation (Recommended):

# Install Docker Engine (Linux)
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Or via package manager (Ubuntu/Debian); requires Docker's apt repository to be configured first
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io

Usage

Interactive Mode:

alphastack
# Follow the interactive prompts to generate your project

Command Line:

# Generate a project
alphastack generate "A Flask REST API with user authentication and JWT tokens"

# Specify output directory
alphastack generate "Python CLI tool for file processing" -o /path/to/output

# Generate a frontend project
alphastack generate "React TypeScript dashboard with charts"

# List generated projects
alphastack list

# Clean up projects
alphastack clean

Example Projects:

# Web Applications
alphastack generate "Express.js REST API with MongoDB and authentication"
alphastack generate "FastAPI service with PostgreSQL and async operations"

# CLI Tools
alphastack generate "Python CLI tool for image compression with progress bar"
alphastack generate "Go CLI for log analysis with concurrent processing"

# Data Processing
alphastack generate "Rust program for parallel CSV processing"
alphastack generate "Python script for web scraping with retry logic"

# System Programming
alphastack generate "CUDA kernel for matrix multiplication optimization"
alphastack generate "Go service with gRPC and protocol buffers"

🔬 Research & Evaluation

Evaluation Suite

AlphaStack includes a comprehensive evaluation framework with 40 carefully designed programming challenges across 4 modern languages, organized into 4 difficulty tiers:

CUDA (GPU Computing)

Focus: Parallel computing, memory management, kernel optimization
Challenges: Vector operations → Matrix operations → Sparse algorithms → Ray tracing engines
Tier 4 Example: Ray tracing engine with BVH acceleration structure

Go (Concurrent Systems)

Focus: Distributed systems, goroutines, channels, service architecture
Challenges: Worker pools → REST APIs → Load balancers → Raft consensus
Tier 4 Example: Full Raft consensus protocol implementation

Rust (Systems Programming)

Focus: Memory safety, ownership, lifetimes, zero-cost abstractions
Challenges: Custom iterators → HTTP parsers → Procedural macros → Custom allocators
Tier 4 Example: Custom bump allocator as global allocator with FFI

TypeScript (Type-Safe Applications)

Focus: Type system, generics, inference, compile-time safety
Challenges: Event emitters → Type-safe routers → DI containers → Full-stack RPC
Tier 4 Example: End-to-end type-safe RPC framework with inference

Difficulty Progression

Tier	Focus	Complexity	Lines of Code	Time
Tier 1	Fundamentals	Single concept, basic algorithms	150-400	2-4h
Tier 2	Architecture	Multiple modules, abstractions	400-700	4-8h
Tier 3	Advanced	Domain expertise, algorithms	500-900	8-16h
Tier 4	Production	Complete systems, optimization	800-1500	16-32h

Evaluation Metrics

Success Rate: Percentage of challenges solved correctly
Build Success: Projects that compile/build without errors
Test Pass Rate: Projects with passing test suites
Iteration Count: Average iterations needed for error resolution
Time to Solution: End-to-end generation time
Code Quality: Adherence to best practices and patterns

Evaluation Location: src/prompts/eval/ contains all challenge specifications and test cases.
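
As an illustration of how the rate and average metrics above could be aggregated, here is a small sketch. The `ChallengeResult` record and `summarize` function are hypothetical, and it assumes a challenge only counts toward the test pass rate when it both builds and passes its tests.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ChallengeResult:
    built: bool          # project compiled/built without errors
    tests_passed: bool   # test suite passed
    iterations: int      # fix iterations used
    seconds: float       # end-to-end generation time


def summarize(results: List[ChallengeResult]) -> Dict[str, float]:
    """Aggregate per-challenge results into rates and averages."""
    if not results:
        raise ValueError("no results to summarize")
    n = len(results)
    return {
        "build_success": sum(r.built for r in results) / n,
        "test_pass_rate": sum(r.built and r.tests_passed for r in results) / n,
        "avg_iterations": sum(r.iterations for r in results) / n,
        "avg_seconds": sum(r.seconds for r in results) / n,
    }
```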

🏗️ Project Structure

alpha-stack/
├── src/
│   ├── agents/                  # Multi-agent system
│   │   ├── planner.py          # Planning agent for error analysis
│   │   └── corrector.py        # Correction agent for fixes
│   ├── docker/                  # Docker integration
│   │   ├── generator.py        # Dockerfile generation
│   │   └── testing.py          # Docker-based validation
│   ├── prompts/                 # Jinja2 prompt templates
│   │   └── eval/               # Evaluation challenges
│   │       ├── cuda/           # 10 CUDA challenges
│   │       ├── go/             # 10 Go challenges
│   │       ├── rust/           # 10 Rust challenges
│   │       └── typescript/     # 10 TypeScript challenges
│   ├── utils/                   # Core utilities
│   │   ├── helpers.py          # Helper functions
│   │   ├── prompt_manager.py   # Template management
│   │   ├── error_tracker.py    # Error tracking
│   │   └── tools.py            # Tool definitions
│   ├── generator.py             # Main generation logic
│   ├── eval_generator.py        # Evaluation system
│   ├── cli.py                   # Command-line interface
│   ├── tui.py                   # Terminal UI
│   └── config.py                # Configuration management
├── website/                     # Project website
├── test_runner.py               # Development test runner
└── pyproject.toml              # Project metadata

🔧 Technical Details

AI Model

Primary Model: Google Gemini (configurable via MODEL_NAME)
Alternative Support: OpenRouter API for the evaluation framework
Context Management: Intelligent prompt engineering with Jinja2 templates

Multi-Agent System

Planning Agent (src/agents/planner.py):

Analyzes build/test errors using structured error tracking
Generates comprehensive fix plans with tool-based reasoning
Maintains project structure cache for efficient planning
Handles distinct error types (dependency, Docker, and common errors)
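
To make the three error types concrete, the sketch below routes a log excerpt to a category with regular expressions. It is an assumption about how such routing might work, not the planner's actual logic; `PATTERNS` and `classify_error` are hypothetical, and the patterns cover only a handful of well-known toolchain messages.

```python
import re

# Hypothetical patterns; real logs vary widely by toolchain and version.
PATTERNS = [
    ("dependency", re.compile(
        r"ModuleNotFoundError|Cannot find module|unresolved import", re.I)),
    ("docker", re.compile(r"failed to solve|Dockerfile", re.I)),
]


def classify_error(log: str) -> str:
    """Return 'dependency', 'docker', or the fallback category 'common'."""
    for category, pattern in PATTERNS:
        if pattern.search(log):
            return category
    return "common"
```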

Correction Agent (src/agents/corrector.py):

Executes planned fixes with code understanding
Validates code changes before application
Uses language-specific parsers for syntax validation
Tracks changes to prevent infinite loops
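
Two of these behaviours are easy to illustrate for Python sources. The sketch assumes that loop prevention works by remembering a hash of each proposed file content and that syntax validation uses the standard `ast` module; `ChangeTracker` and `is_valid_python` are hypothetical names, not taken from `corrector.py`.

```python
import ast
import hashlib


class ChangeTracker:
    """Remembers proposed (path, content) pairs so a fix is not retried."""

    def __init__(self) -> None:
        self._seen = set()

    def is_repeat(self, path: str, new_content: str) -> bool:
        digest = hashlib.sha256(new_content.encode("utf-8")).hexdigest()
        key = (path, digest)
        if key in self._seen:
            return True
        self._seen.add(key)
        return False


def is_valid_python(source: str) -> bool:
    """Syntax-check Python source without executing it."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True
```

A corrector built this way would discard an edit when `is_repeat` returns True or the new content fails the syntax check, and ask the planner for a different strategy.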

Docker Integration

Features:

Automatic Dockerfile generation based on project type
Multi-stage builds for optimized images
Resource management (configurable CPU/memory limits)
Network isolation and security
Support for custom base images
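
For a sense of what a generated multi-stage Dockerfile might look like, the sketch below renders one for a Go project. The `go_dockerfile` function, the default image tags, and the file paths are illustrative assumptions and do not reflect the output of `src/docker/generator.py`.

```python
def go_dockerfile(go_version: str = "1.22", base_image: str = "alpine:3.20") -> str:
    """Render a two-stage Dockerfile: compile in a Go image, run in a small base image."""
    return "\n".join([
        f"FROM golang:{go_version} AS build",
        "WORKDIR /src",
        "COPY . .",
        "RUN CGO_ENABLED=0 go build -o /out/app .",
        "",
        # Second stage: only the compiled binary is copied into the final image.
        f"FROM {base_image}",
        "COPY --from=build /out/app /usr/local/bin/app",
        'ENTRYPOINT ["/usr/local/bin/app"]',
        "",
    ])
```

Exposing `base_image` as a parameter mirrors the custom base image feature listed above.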

Testing Framework (src/docker/testing.py):

Command detection (build, test, run commands)
Real-time log capture and analysis
Iterative error resolution with max iteration limits
Success/failure validation with detailed reporting

Prompt Engineering

Template System:

Jinja2-based prompt templates for consistency
Context-aware prompt rendering
Specialized templates for different generation phases:
- Software blueprint generation
- Folder structure planning
- File content generation
- Error correction strategies
- Docker configuration

📊 Performance & Capabilities

Generation Capabilities

Languages: Python, JavaScript/TypeScript, Go, Rust, Java, C/C++, CUDA, and more
Frameworks: Flask, FastAPI, Express.js, React, Vue, Next.js, etc.
Project Types: Web APIs, CLI tools, data processors, system utilities, GPU kernels
File Types: Source code, configuration, tests, documentation, Docker files

Self-Healing Iterations

Dependency Resolution: Automatically resolves missing packages and version conflicts
Build Fixes: Corrects syntax errors, import issues, configuration problems
Test Fixes: Addresses failing tests, missing test dependencies, assertion errors
Max Iterations: Configurable (default: 5 per phase)

Docker Validation

Build Time: Typically 1-5 minutes depending on project complexity
Test Execution: Isolated environment with resource limits
Success Rate: Above 80% on Tier 1-2 challenges
Resource Usage: Configurable memory (default: 25% of system) and CPU (default: 50%)
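
The default fractions above translate directly into container limits. The sketch below converts them into `docker run` arguments using the standard `--memory` and `--cpus` flags; `docker_limit_flags` is a hypothetical helper, and whether AlphaStack applies limits through CLI flags or another interface is not stated here.

```python
from typing import List


def docker_limit_flags(
    total_memory_bytes: int,
    cpu_count: int,
    memory_fraction: float = 0.25,   # README default: 25% of system memory
    cpu_fraction: float = 0.5,       # README default: 50% of CPUs
) -> List[str]:
    """Build docker run arguments that cap memory and CPU for a container."""
    memory_mb = int(total_memory_bytes * memory_fraction) // (1024 * 1024)
    memory_mb = max(6, memory_mb)    # Docker rejects memory limits below 6 MB
    cpus = max(0.01, cpu_count * cpu_fraction)
    return ["--memory", f"{memory_mb}m", "--cpus", f"{cpus:.2f}"]
```

On a machine with 16 GiB of RAM and 8 CPUs, the defaults yield a 4096 MB memory cap and 4 CPUs.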

🎓 Academic Context

This work introduces a novel approach to autonomous code generation that addresses key challenges in AI-assisted software development:

Key Contributions

Multi-Agent Architecture: Separation of planning and correction concerns for better error resolution
Iterative Self-Healing: Autonomous error detection and correction without human intervention
Comprehensive Validation: End-to-end validation from build to test execution in isolated environments
Cross-Language Evaluation: Diverse evaluation suite spanning different programming paradigms
Tool-Augmented Reasoning: Integration of file operations and command execution for context-aware fixes

Research Questions

How effectively can multi-agent systems autonomously resolve software errors?
What is the success rate across different programming paradigms and difficulty levels?
How many iterations are typically required for convergence to a working solution?
What types of errors can be automatically resolved vs. requiring human intervention?

Evaluation Methodology

The evaluation framework (src/prompts/eval/) provides a standardized benchmark with:

40 challenges across 4 languages and 4 difficulty tiers
Clear success criteria (build success, test pass rate)
Reproducible evaluation in Docker containers
Metrics for iteration count, time to solution, and code quality

For more details on the evaluation suite, see src/prompts/eval/README.md

🤝 Contributing

We welcome contributions! Areas of interest:

Additional programming language support
New evaluation challenges
Performance optimizations
Documentation improvements
Bug fixes and error handling

📜 License

MIT License - see LICENSE file for details

🔗 Links

Repository: github.com/HyperKuvid-Labs/alpha-stack
Issues: github.com/HyperKuvid-Labs/alpha-stack/issues
Evaluation Suite: src/prompts/eval/

📧 Contact

For research collaborations or questions about the ICML 2026 submission, please open an issue or contact the AlphaStack Team.

AlphaStack - Transforming Ideas into Code

Submitted to ICML 2026
