
Feature Suggestion: RAG Mode for Large Documents - Cloud Implementation as Simpler Alternative (Cloud RAG Mode Suggestion) #52

@ghost

Description

Dear Chorus Development Team,

I'm writing to suggest adding RAG Mode functionality to help users process large documents more efficiently. After thinking through implementation approaches, I believe a cloud-based solution following your existing architecture patterns would be significantly simpler than local processing.


The Core Problem

I recently tried uploading a 40-50k token document through Chorus's multi-model interface. All three models failed simultaneously:

  • Claude Sonnet 4.5: Rate limit exceeded
  • Gemini 2.5 Flash: 429 Too Many Requests
  • GPT-5: Context limit reached

Large documents quickly hit API rate limits and become expensive to process repeatedly. The same file works fine when uploaded directly to claude.ai, but fails when it is sent through the model APIs via Chorus.

What users need: a way to extract only the relevant portions of large documents before sending them to AI models, reducing both token usage and cost by an estimated 80-85%.


The Solution: RAG Mode (Two Possible Approaches)

RAG (Retrieval-Augmented Generation) splits a document into chunks, retrieves the chunks most relevant to the user's query, and sends only those chunks to the AI instead of the entire document.
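To make the retrieval step concrete, here is a minimal JavaScript sketch of ranking already-embedded chunks against a query by cosine similarity and keeping the top k. The three-dimensional vectors are toy values standing in for real embeddings (which have hundreds of dimensions), and `retrieveTopK` is a hypothetical helper name, not part of Chorus or any provider's API.

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Hypothetical helper: return the k chunks whose embeddings are most
// similar to the query embedding, most relevant first.
function retrieveTopK(chunks, chunkVectors, queryVector, k) {
  return chunkVectors
    .map((vec, i) => ({ text: chunks[i], score: cosine(queryVector, vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((c) => c.text);
}

// Toy data: these vectors are made up for illustration, not real embeddings.
const chunks = [
  "Section discussing the Sharpe Ratio",
  "Section listing portfolio holdings",
  "Section on volatility and risk",
];
const chunkVectors = [
  [0.9, 0.1, 0.0],
  [0.0, 1.0, 0.1],
  [0.5, 0.1, 0.8],
];
// Stands in for the embedding of "What's the Sharpe Ratio?"
const queryVector = [1.0, 0.0, 0.1];

const top = retrieveTopK(chunks, chunkVectors, queryVector, 2);
console.log(top);
```

Only the returned chunks, not the whole document, would then be sent to the chat model.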

There are two ways to implement this:


Option 1: Local RAG Processing

How it works:

  • Vector database runs on user's computer
  • Document processing happens on user's device
  • Everything stays private and local

Implementation requirements:

  • ❌ Development time: 2-3 months
  • ❌ Complex codebase: ~5,000+ lines
  • ❌ Install and manage local vector database (ChromaDB/LanceDB)
  • ❌ Handle cross-platform compatibility (Windows/Mac/Linux)
  • ❌ Test on various hardware configurations
  • ❌ Ongoing maintenance for different OS versions
  • ❌ Requires significant disk space and RAM from users
  • ❌ More complicated user setup process
  • ❌ Breaks from Chorus's current architecture pattern

Benefits:

  • ✅ Completely private (data never leaves device)
  • ✅ Free for users after setup
  • ✅ Fast local retrieval

Option 2: Cloud-Based RAG Processing (RECOMMENDED)

How it works:

  • Outsource to a third-party embedding/RAG service (such as Voyage AI or Jina AI)
  • User provides an API key (exactly like the Web Search feature)
  • Chorus routes requests; the provider handles processing

Implementation requirements:

  • ✅ Development time: 1-2 weeks
  • ✅ Simple API integration: ~100-200 lines of code
  • ✅ Standard API calls (similar to Perplexity integration)
  • ✅ Works on any device immediately
  • ✅ No installation or setup complexity
  • ✅ Minimal ongoing maintenance
  • ✅ Follows Chorus's existing architecture exactly
  • ✅ Zero infrastructure costs for Chorus

Costs:

  • User pays: ~$0.005-0.02 per document processed
  • Chorus pays: $0 (user's API key)

Why Cloud RAG Follows Chorus's Existing Pattern Perfectly

I noticed Chorus already uses this exact approach for other features:

Current Architecture:

Web Search Feature:
├── User provides: Perplexity or OpenRouter API key
├── Chorus routes: User query → Perplexity API
├── Perplexity handles: Web search, content retrieval
├── Returns: Search results to Chorus
└── Cost: User pays Perplexity directly, $0 to Chorus

Web Fetching Feature:
├── Uses: Firecrawl.dev service
├── Chorus routes: URL → Firecrawl API
├── Firecrawl handles: Web scraping, content extraction
├── Returns: Page content to Chorus
└── Cost: Based on Firecrawl pricing, user pays

Image Generation:
├── User provides: OpenAI API key
├── Chorus routes: Prompt → OpenAI API
├── OpenAI handles: Image generation
└── Cost: User pays OpenAI directly

Proposed: RAG Mode (Same Pattern)

RAG Mode Feature:
├── User provides: Voyage AI or Jina AI API key
├── Chorus routes: Document → RAG service API
├── Service handles: Chunking, embedding, semantic search
├── Returns: Relevant chunks to Chorus
├── Chorus sends: Chunks → Claude/GPT (user's existing keys)
└── Cost: User pays RAG service directly, $0 to Chorus

It's literally the same architecture you already use successfully.


Comparison: Local vs Cloud

| Aspect | Local RAG | Cloud RAG |
| -- | -- | -- |
| Development time | 2-3 months | 1-2 weeks |
| Code complexity | ~5,000 lines | ~100-200 lines |
| Infrastructure cost | $0 | $0 (user pays) |
| User setup | Complex (install vector DB) | Simple (add API key) |
| Maintenance | High (cross-platform) | Low (standard API) |
| Compatibility | Device-dependent | Universal |
| Follows current pattern | ❌ No | ✅ Yes |
| Time to market | Months | Weeks |

How Cloud RAG Would Work (User Experience)

Setup (One-Time):

User goes to: Tools → RAG Mode → [Set up]

┌─────────────────────────────────────────┐
│  RAG Mode Setup                         │
├─────────────────────────────────────────┤
│  This works just like Web Search:       │
│  • Choose a provider                    │
│  • Add your API key                     │
│  • Start using immediately              │
│                                         │
│  Recommended Provider:                  │
│  ● Voyage AI                            │
│    Cost: ~$0.005 per 1k tokens          │
│    Quality: Industry-leading            │
│    [Get API Key →]                      │
│                                         │
│  Alternative:                           │
│  ○ Jina AI (Free tier available)        │
│  ○ OpenAI (use existing key)            │
│                                         │
│  API Key: [____________________]        │
│                                         │
│  [Save]                                 │
└─────────────────────────────────────────┘
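One way the provider choice in this dialog could map onto requests is a small table of endpoints. The sketch below assumes the public embeddings endpoints of Voyage AI, Jina AI and OpenAI, Bearer-token authentication, and an OpenAI-style `{ model, input }` request body. The model names and the helpers `buildEmbeddingRequest` and `extractEmbeddings` are illustrative assumptions, so each provider's documentation should be checked before relying on them.

```javascript
// Hypothetical provider table. Model names are assumptions; pick whichever
// models Chorus decides to support.
const PROVIDERS = {
  voyage: { url: "https://api.voyageai.com/v1/embeddings", model: "voyage-3" },
  jina: { url: "https://api.jina.ai/v1/embeddings", model: "jina-embeddings-v3" },
  openai: { url: "https://api.openai.com/v1/embeddings", model: "text-embedding-3-small" },
};

// Build (but do not send) an embeddings request for the chosen provider,
// using the API key the user saved in the setup dialog.
function buildEmbeddingRequest(provider, apiKey, texts) {
  const cfg = PROVIDERS[provider];
  if (!cfg) throw new Error(`Unknown RAG provider: ${provider}`);
  return {
    url: cfg.url,
    init: {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model: cfg.model, input: texts }),
    },
  };
}

// Assumes an OpenAI-style response: { data: [{ index, embedding }, ...] }.
// Sorting by index keeps vectors aligned with the input texts.
function extractEmbeddings(json) {
  return [...json.data].sort((a, b) => a.index - b.index).map((d) => d.embedding);
}

const req = buildEmbeddingRequest("voyage", "sk-test", ["chunk one", "chunk two"]);
// In the app: const vectors = extractEmbeddings(await (await fetch(req.url, req.init)).json());
console.log(req.url);
```

Keeping provider differences in one table means supporting another provider is mostly a matter of adding an entry, much like offering either Perplexity or OpenRouter for Web Search.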

Usage:

1. User uploads a 40k-token document
   → Chorus sends to Voyage AI for processing
   → Document chunked, embedded, stored
   → Takes 2-3 seconds
   
2. User asks: "What's the Sharpe Ratio?"
   → Chorus queries Voyage AI: "Find relevant info"
   → Receives 3-5k tokens (instead of 40k)
   → Sends to Claude with user's Claude API key
   → Claude answers based on relevant chunks only
   
3. Result: 
   → 85% cost reduction
   → Stays within API rate limits
   → Fast responses
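Step 2 relies on sending only a few thousand tokens instead of the full document. A minimal sketch of that selection, assuming the chunks arrive already sorted by relevance and using the rough "about 4 characters per token" heuristic in place of a real tokenizer (`buildPrompt` and `estimateTokens` are hypothetical names):

```javascript
// Rough token estimate: ~4 characters per token for English text.
// This is a heuristic, not a real tokenizer.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Take chunks in relevance order until the token budget is used up,
// then build the prompt that would be sent to the chat model.
function buildPrompt(rankedChunks, question, budgetTokens) {
  const selected = [];
  let used = estimateTokens(question);
  for (const chunk of rankedChunks) {
    const cost = estimateTokens(chunk);
    if (used + cost > budgetTokens) break; // stop at the first chunk that does not fit
    selected.push(chunk);
    used += cost;
  }
  return {
    prompt: `Context:\n${selected.join("\n---\n")}\n\nQuestion: ${question}`,
    chunksUsed: selected.length,
    estimatedTokens: used, // excludes the small "Context:"/"Question:" scaffolding
  };
}

// Three placeholder chunks of ~2,000 estimated tokens each, with a 5k budget.
const ranked = ["a".repeat(8000), "b".repeat(8000), "c".repeat(8000)];
const result = buildPrompt(ranked, "What's the Sharpe Ratio?", 5000);
console.log(result.chunksUsed, result.estimatedTokens);
```

Stopping at the first chunk that does not fit keeps the context strictly in relevance order; skipping it and trying later, smaller chunks instead would pack the budget more tightly at the cost of including less relevant text.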

Suggested Implementation: Following Web Search Pattern

This would integrate into Tools exactly like your other outsourced features:

BUILT-IN:
├── Web ✓
│   Search the web and read webpages
│   Requires: Perplexity/OpenRouter API key
├── Terminal
├── Image Generator ✓
│   Generate images. Powered by OpenAI.
│   Requires: OpenAI API key
├── GitHub
│   Manage repos, code, issues, and PRs
└── RAG Mode [Set up →] ← NEW
    Process large documents efficiently
    Requires: Voyage AI/Jina AI API key
    Reduces API costs by 80-85%

Technical implementation would be nearly identical to how you integrated Perplexity:
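```javascript
// Sketch only. Voyage AI's embeddings endpoint returns vectors rather than a
// stored document ID, so Chorus would keep the vectors in memory and rank
// chunks itself. userSettings, chunkDocument and sendToClaude are hypothetical
// Chorus helpers; "voyage-3" is an assumed model name.
const VOYAGE_EMBEDDINGS_URL = 'https://api.voyageai.com/v1/embeddings';

// Setup: User adds API key (same as Perplexity)
const ragApiKey = userSettings.rag_provider_key;

async function embed(texts, inputType) {
  const response = await fetch(VOYAGE_EMBEDDINGS_URL, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${ragApiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ input: texts, model: 'voyage-3', input_type: inputType }),
  });
  if (!response.ok) throw new Error(`Voyage AI request failed: ${response.status}`);
  const { data } = await response.json();
  return data.sort((a, b) => a.index - b.index).map((d) => d.embedding);
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Document upload (similar to Firecrawl for web fetching): embed each chunk once
async function processDocument(text) {
  const chunks = chunkDocument(text);
  return { chunks, vectors: await embed(chunks, 'document') };
}

// Query document (similar to Perplexity search): keep only the top-k chunks
async function queryDocument(doc, question, k = 5) {
  const [queryVector] = await embed([question], 'query');
  const relevantChunks = doc.vectors
    .map((v, i) => ({ text: doc.chunks[i], score: cosine(queryVector, v) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((c) => c.text);
  // Send to Claude (existing integration)
  return sendToClaude(`${relevantChunks.join('\n\n')}\n\nQuestion: ${question}`);
}
```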

It's standard REST API integration, nothing fundamentally different from your current tools.
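The integration example relies on a `chunkDocument` helper that the issue does not define. A minimal sketch, assuming the uploaded file has already been converted to plain text, splits it into overlapping word windows; the defaults of 200 words per chunk and 40 words of overlap are assumptions, not tuned values.

```javascript
// Hypothetical chunkDocument: split text into windows of `size` words, each
// overlapping the previous window by `overlap` words, so a sentence cut at a
// boundary is more likely to appear whole in one of the neighboring chunks.
function chunkDocument(text, size = 200, overlap = 40) {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let start = 0; start < words.length; start += size - overlap) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // last window reached the end
  }
  return chunks;
}

// Usage: 10 words, windows of 4 words overlapping by 2.
const sampleText = Array.from({ length: 10 }, (_, i) => `w${i}`).join(" ");
const sampleChunks = chunkDocument(sampleText, 4, 2);
console.log(sampleChunks);
```

Counting words only approximates counting tokens; a production version would more likely split on paragraph or sentence boundaries and measure chunks with the target model's tokenizer.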


Cost Example (My Real Use Case)

My document: 40-50k tokens (11 pages, 502 data entries)
Typical usage: 10 questions about the document

| Approach | Processing | 10 Queries | Total | Savings |
| -- | -- | -- | -- | -- |
| No RAG | $0 | $4.00 | $4.00 | - |
| With Cloud RAG | $0.20 | $0.60 | $0.80 | 80% |

In this example, even with the cloud processing cost included, users save 80%, and because each query sends only a few thousand tokens, large documents are far less likely to hit rate limits.
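The savings figure follows directly from the table. The sketch below recomputes it, taking the per-query costs ($0.40 without RAG, $0.06 with RAG) as the table's 10-query totals divided by 10; `totalCost` is a hypothetical helper name.

```javascript
// Total cost = one-time processing cost + per-query cost * number of queries.
function totalCost({ processing, perQuery }, queries) {
  return processing + perQuery * queries;
}

const queries = 10;
const noRag = totalCost({ processing: 0, perQuery: 0.40 }, queries);
const withRag = totalCost({ processing: 0.20, perQuery: 0.06 }, queries);
const savings = 1 - withRag / noRag; // fraction of the no-RAG cost saved

console.log(noRag.toFixed(2), withRag.toFixed(2), `${Math.round(savings * 100)}%`);
```

Because the processing cost is paid once per document, the relative savings grow as more questions are asked about the same document.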


Why This Alternative Approach Makes Sense

Main advantages of cloud-based RAG:

  1. Follows your proven architecture - Same pattern as Web Search/Firecrawl
  2. Much faster to implement - 1-2 weeks vs 2-3 months
  3. Zero infrastructure costs - User pays provider directly
  4. No maintenance burden - Provider handles updates
  5. Works immediately - No complex user setup
  6. Universal compatibility - No device requirements
  7. Professional quality - Specialized RAG services do this all day

The trade-off:

  • Users pay a small amount per document (~$0.20 for large docs)
  • Documents processed on third-party servers (like Perplexity for Web Search)

For most users, the convenience, simplicity, and 80% cost savings compared with full-context processing make this an excellent trade-off.


Recommended Next Steps

If this approach interests you:

  1. Week 1: Add RAG Mode toggle to Tools, implement API key storage
  2. Week 2: Integrate one provider (suggest Voyage AI), test with large documents
  3. Week 3: Beta test with interested users, gather feedback
  4. Week 4: Polish UI, add cost estimates, official release

Total timeline: ~1 month from start to release

Compare this to local RAG which would take 2-3 months of development plus ongoing cross-platform maintenance.


Conclusion

I believe cloud-based RAG following the Perplexity/Firecrawl pattern is the practical path forward:

  • ✅ Solves the core problem (large document processing)
  • ✅ Follows your existing, proven architecture
  • ✅ Minimal development time (1-2 weeks)
  • ✅ Zero infrastructure costs
  • ✅ Simple user experience
  • ✅ Professional implementation quality

This would make Chorus one of the few multi-model chat tools that can handle enterprise-scale documents efficiently, while keeping development simple and costs at zero for the company.

I'd be happy to beta test this feature and provide detailed feedback if you decide to implement it.

Thank you for considering this suggestion!

Best regards,
Ingvar
