
[CRITICAL] Extreme slowdown when scraping multiple platforms with 'hard' difficulty #56

@AVPthegreat

Description


Problem: When all platforms are selected together with 'hard' difficulty, response time becomes unacceptably slow (several minutes, or the request times out). The app appears to hang.

Impact: Users cannot scrape comprehensive problem sets and are forced to use limited queries or single platforms.

Symptoms:
- Selecting 1-2 platforms: completes in reasonable time (<30 s)
- Selecting all platforms + hard: takes 5+ minutes or times out
- No incremental feedback during long operations
- Browser may show "page unresponsive" warnings

Root Causes:
- Sequential scraping (no concurrency between platforms)
- No request rate limiting, which triggers IP blocks and throttling
- Large result sets processed in memory without streaming
- Blocking I/O on the web thread

Acceptance Criteria:
- ✅ All platforms + hard difficulty completes in <2 minutes
- ✅ Progress updates visible during scraping (not stuck at 0%)
- ✅ Implement concurrent scraping with MAX_CONCURRENT workers
- ✅ Add rate limiting per platform (avoid IP blocks)
- ✅ Show estimated time remaining based on progress
- ✅ Consider pagination/streaming for large result sets

Priority: P0 - Severely impacts usability
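One possible shape for the concurrency, per-platform rate limiting, and progress/ETA criteria, as a minimal asyncio sketch. It assumes the scrapers can be made async. `scrape_all`, `RateLimiter`, `Progress`, `fake_scrape`, the platform names, the per-platform rates, and the value of `MAX_CONCURRENT` are all hypothetical and not taken from the codebase; `fake_scrape` stands in for a real HTTP request.

```python
from __future__ import annotations

import asyncio
import time
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Optional

MAX_CONCURRENT = 4  # hypothetical value; the issue names the setting but not its value


class RateLimiter:
    """Spaces out request starts for one platform: at most `rate` per second."""

    def __init__(self, rate: float) -> None:
        self.interval = 1.0 / rate
        self._next_slot = 0.0

    async def wait(self) -> None:
        now = asyncio.get_running_loop().time()
        # No await between reading and updating _next_slot, so this
        # slot reservation is atomic within a single event loop.
        delay = self._next_slot - now
        self._next_slot = max(now, self._next_slot) + self.interval
        if delay > 0:
            await asyncio.sleep(delay)


@dataclass
class Progress:
    total: int
    done: int = 0
    started: float = field(default_factory=time.monotonic)

    def tick(self) -> None:
        self.done += 1

    def percent(self) -> float:
        return 100.0 * self.done / self.total if self.total else 100.0

    def eta_seconds(self) -> Optional[float]:
        # Naive estimate: average time per finished job times jobs remaining.
        if self.done == 0:
            return None
        elapsed = time.monotonic() - self.started
        return elapsed / self.done * (self.total - self.done)


async def scrape_all(
    jobs: list[tuple[str, int]],
    scrape_page: Callable[[str, int], Awaitable[list[str]]],
    rates: dict[str, float],
    on_progress: Callable[[Progress], None] = lambda p: None,
) -> list[str]:
    sem = asyncio.Semaphore(MAX_CONCURRENT)  # global cap across all platforms
    limiters = {name: RateLimiter(rate) for name, rate in rates.items()}
    progress = Progress(total=len(jobs))
    results: list[str] = []

    async def run(platform: str, page: int) -> None:
        async with sem:
            await limiters[platform].wait()  # per-platform request spacing
            items = await scrape_page(platform, page)
        results.extend(items)
        progress.tick()
        on_progress(progress)  # e.g. push to the UI instead of sitting at 0%

    await asyncio.gather(*(run(platform, page) for platform, page in jobs))
    return results


async def fake_scrape(platform: str, page: int) -> list[str]:
    # Stand-in for a real HTTP request (hypothetical).
    await asyncio.sleep(0.05)
    return [f"{platform}-hard-{page}"]


def print_progress(p: Progress) -> None:
    eta = p.eta_seconds()
    eta_text = f"{eta:.1f}s" if eta is not None else "?"
    print(f"{p.done}/{p.total} ({p.percent():.0f}%), ETA {eta_text}")


if __name__ == "__main__":
    platforms = ("platform_a", "platform_b", "platform_c")  # hypothetical names
    jobs = [(name, page) for name in platforms for page in range(3)]
    rates = {"platform_a": 5.0, "platform_b": 2.0, "platform_c": 5.0}
    problems = asyncio.run(scrape_all(jobs, fake_scrape, rates, print_progress))
    print(f"{len(problems)} problems")
```

The limiter is awaited while a semaphore slot is held, so a throttled platform occupies a worker during its backoff; this keeps the per-platform spacing exact at some cost in throughput. Errors are not handled here: one failing page makes `asyncio.gather` raise. Moving the scrape off the web thread (running it as a background task and polling `Progress` from the UI) is a separate change not shown.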

Metadata


Assignees

No one assigned

Labels

- P0: Critical priority
- backend: Server, API, orchestrator
- bug: Something isn't working
- performance: Performance improvements
- scraper: Platform scrapers

Projects

No projects

Relationships

None yet

Development

No branches or pull requests
