Skip to content

Document Ada Lovelace support status and add official vLLM deployment guide for GPT-OSS-20B#4

Draft
Copilot wants to merge 4 commits into
mainfrom
copilot/add-ada-lovelace-support
Draft

Document Ada Lovelace support status and add official vLLM deployment guide for GPT-OSS-20B#4
Copilot wants to merge 4 commits into
mainfrom
copilot/add-ada-lovelace-support

Conversation

Copy link
Copy Markdown

Copilot AI commented Dec 13, 2025

Pull Request

Description

Ada Lovelace (RTX 6000 Ada) support for GPT-OSS-20B is in progress per vLLM team—not production-ready. Previous deployment failures were due to missing PyTorch +cu128 suffix and Ada architecture gaps.

This PR documents:

  • GPU support matrix (Ada Lovelace experimental, A100/H100 fully supported)
  • Critical version requirements: vLLM ≥0.10.2, PyTorch with +cu128 suffix, CUDA ≥12.8
  • Official vLLM flags: --async-scheduling, --tool-call-parser openai, --enable-auto-tool-choice
  • Qwen3-Coder-30B as stable production alternative (working on RTX 6000 Ada, 128K context)

Type of Change

  • Documentation update

Changes Made

New Documentation

  • docs/setup/GPT-OSS-VLLM-OFFICIAL-GUIDE.md - Official vLLM deployment guide
    • GPU support status matrix (fully supported vs experimental)
    • Version requirement table with verification commands
    • A100 vs Ada Lovelace deployment configurations
    • Troubleshooting for PyTorch +cu128 and CUDA version issues
  • docs/reference/GPT-OSS-VERIFICATION-CHECKLIST.md - Pre-deployment environment checks
    • Step-by-step verification commands
    • Pass/fail criteria for each requirement
    • Quick fixes for common issues

Configuration Updates

  • models/gptoss.sh - Conservative Ada Lovelace settings
    • Context reduced from 128K to 32K (stability)
    • GPU memory util set to 0.90 (official recommendation)
    • Ada Lovelace status warnings with version requirements
  • docs/troubleshooting/GPT-OSS-TROUBLESHOOTING.md - Enhanced error patterns
    • PyTorch CUDA version mismatch (torch::nvtoolsext linker error)
    • vLLM flag compatibility (requires ≥0.10.2 for --tool-call-parser)
    • Official configuration examples

User-Facing Updates

  • README.md - Ada Lovelace status warnings in Models section
  • QUICK-REFERENCE.md - Model comparison (GPT-OSS experimental vs Qwen3 stable)

Testing

  • Bash script syntax validated (bash -n)
  • Environment variable exports verified
  • Markdown formatting checked
  • No breaking changes to existing scripts

Checklist

  • My code follows the project's style guidelines
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings or errors
  • No secrets or API keys are hardcoded

Related Issues

Addresses issue documenting Ada Lovelace support status and vLLM deployment requirements.


Key Takeaway: Use Qwen3-Coder-30B for production until vLLM announces Ada Lovelace support. GPT-OSS deployment on RTX 6000 Ada may fail.

Original prompt

This section details on the original issue you should resolve

<issue_title>GPT-OSS-20B: Ada Lovelace support status and new deployment guide</issue_title>
<issue_description>## Summary
Official vLLM documentation now includes a comprehensive GPT-OSS deployment guide that may resolve our previous deployment issues.

Background

Previously attempted to deploy GPT-OSS-20B but encountered PyTorch dependency issues:

  • Error: torch==2.9.0.dev20250804+cu128 unavailable
  • Model was unavailable due to missing dependencies

New Information from vLLM Docs

Source: https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html

GPU Support Status

  • Fully Supported: H100, H200, B200, AMD MI300x/MI325x/MI355x
  • 🔄 In Progress: Ampere, Ada Lovelace (RTX 6000 Ada), RTX 5090
  • Note: vLLM team is "actively working" on Ada Lovelace support

Version Requirements

  • vLLM >= 0.10.2 required for --tool-call-parser openai
  • PyTorch must have +cu128 suffix
  • CUDA >= 12.8 required

Recommended Deployment (A100 Instructions)

Since Ada Lovelace isn't fully supported yet, try A100 configuration:

# For GPT-OSS-20B (should work on single GPU)
vllm serve openai/gpt-oss-20b \
  --async-scheduling \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.90

Key Flags

  • --async-scheduling: Higher performance, reduces host overheads (vLLM >= 0.11.1 recommended)
  • --tool-call-parser openai: Native OpenAI tool calling format
  • --enable-auto-tool-choice: Automatic tool selection

Known Issues to Check

From the troubleshooting section:

  1. PyTorch version check:

    # Verify PyTorch has +cu128 suffix
    python -c "import torch; print(torch.__version__)"
  2. CUDA version check:

    # Must be >= 12.8 and same for installation/serving
    nvcc --version
  3. Error: "The link interface of target 'torch::nvtoolsext' contains: CUDA::nvToolsExt but the target was not found"

    • Solution: Verify PyTorch version has +cu128 suffix

Recommendation

Wait for official Ada Lovelace support announcement OR try A100 configuration with --async-scheduling flag.

Action Items

  • Verify vLLM version on RunPod (need >= 0.10.2)
  • Check PyTorch version has correct suffix
  • Try A100 configuration with async-scheduling
  • Monitor vLLM releases for Ada Lovelace support announcement
  • Consider using Qwen3-Coder-30B as current stable alternative

Related

  • Original deployment attempt documented in archive/docs/GPT-OSS-*.md
  • Current production model: Qwen3-Coder-30B (working, 128K context)

References

Comments on the Issue (you are @copilot in this section)


💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Copilot AI and others added 3 commits December 13, 2025 02:00
…guide

Co-authored-by: jsirish <1163643+jsirish@users.noreply.github.com>
Co-authored-by: jsirish <1163643+jsirish@users.noreply.github.com>
Co-authored-by: jsirish <1163643+jsirish@users.noreply.github.com>
Copilot AI changed the title [WIP] Add support for Ada Lovelace deployment guide Document Ada Lovelace support status and add official vLLM deployment guide for GPT-OSS-20B Dec 13, 2025
Copilot AI requested a review from jsirish December 13, 2025 02:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GPT-OSS-20B: Ada Lovelace support status and new deployment guide

2 participants