
Expand ingestion coverage for top cable brands (Shopify + non-Shopify) #96

@anand-testcompare

Why

Catalog quality is limited by source coverage. We need broader brand/vendor ingestion to improve identification accuracy and search relevance.

Outcome

Build a repeatable ingestion program for top cable companies across Shopify and non-Shopify sites.

In Scope

  • Define a target source list (top cable brands + key retailers).
  • Support both Shopify and non-Shopify source types.
  • Add source provenance fields (source URL, crawl/import timestamp, source identifier).
  • Add normalization + validation rules for core cable fields (connector, wattage, data, video, length).
  • Add quality scoring/reporting per source (coverage + parse success + critical-field completeness).
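
A minimal TypeScript sketch of the provenance fields and the per-source quality report. The field names and metric definitions are assumptions for illustration, not an existing schema: coverage is taken as attempted / discovered items, parse success as parsed / attempted, and critical-field completeness as the mean share of critical fields that parsed to a known value.

```typescript
// Hypothetical record shapes: names are illustrative, not an existing schema.
interface Provenance {
  sourceId: string;   // stable identifier for the source (e.g. a brand slug)
  sourceUrl: string;  // page or endpoint the record came from
  importedAt: string; // ISO-8601 crawl/import timestamp
}

// A field is either a parsed value or an explicit unknown that keeps the raw text.
type FieldValue<T> = { kind: "known"; value: T } | { kind: "unknown"; raw: string };

interface ParsedRecord {
  provenance: Provenance;
  fields: Record<string, FieldValue<unknown>>;
}

interface SourceRun {
  sourceId: string;
  discovered: number;                          // items found on the source (e.g. product URLs)
  parsed: ParsedRecord[];                      // items that parsed successfully
  failures: { url: string; reason: string }[]; // items that failed, recorded rather than dropped
}

interface QualityReport {
  sourceId: string;
  coverage: number;                  // attempted / discovered
  parseSuccess: number;              // parsed / attempted
  criticalFieldCompleteness: number; // mean share of critical fields that are "known"
}

// Ratio that reports 0 instead of NaN when the denominator is 0.
const ratio = (num: number, den: number): number => (den === 0 ? 0 : num / den);

function qualityReport(run: SourceRun, criticalFields: readonly string[]): QualityReport {
  // Every attempted item must end up either in `parsed` or in `failures`.
  const attempted = run.parsed.length + run.failures.length;
  const perRecord = run.parsed.map((rec) =>
    ratio(criticalFields.filter((f) => rec.fields[f]?.kind === "known").length, criticalFields.length),
  );
  const completenessSum = perRecord.reduce((sum, x) => sum + x, 0);
  return {
    sourceId: run.sourceId,
    coverage: ratio(attempted, run.discovered),
    parseSuccess: ratio(run.parsed.length, attempted),
    criticalFieldCompleteness: ratio(completenessSum, perRecord.length),
  };
}
```

Counting `attempted` as parsed plus recorded failures, rather than trusting a separate counter, means an extractor that silently skips items shows up as lower coverage instead of disappearing from the report.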

Out of Scope

  • Perfect extraction of every long-tail product variant in one pass.
  • New UI features unrelated to source quality/coverage.

Implementation Plan

  1. Build a prioritized source inventory (tier 1/2) with an owner and ingestion cadence for each source.
  2. Expand Shopify ingestion configs/connectors for selected Shopify brands.
  3. Add non-Shopify extractors (HTML/JSON-LD/API where available).
  4. Standardize the normalization pipeline for connector, wattage, data, video, and length fields.
  5. Add per-source validation + anomaly reporting.
  6. Add a replayable seed process for preview and local QA.
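
A minimal sketch of step 4, assuming each normalized field is stored as either a known value or an explicit unknown that keeps the raw source text. The function names (`normalizeWattage`, `normalizeLengthMeters`, `normalizeConnector`), the connector alias table, and the accepted unit spellings are hypothetical; data and video fields would follow the same pattern.

```typescript
// Each normalizer returns an explicit "unknown" (keeping the raw text) instead of
// coercing unparseable input to a default. Aliases and units are illustrative.
type Normalized<T> = { kind: "known"; value: T } | { kind: "unknown"; raw: string };

const unknownValue = (raw: string): Normalized<never> => ({ kind: "unknown", raw });

// Accepts e.g. "100W", "60 W", "240 watts".
function normalizeWattage(raw: string): Normalized<number> {
  const m = /^\s*(\d+(?:\.\d+)?)\s*(?:w|watts?)\s*$/i.exec(raw);
  return m ? { kind: "known", value: Number(m[1]) } : unknownValue(raw);
}

const METERS_PER_UNIT: Record<string, number> = { m: 1, cm: 0.01, ft: 0.3048, in: 0.0254 };

// Accepts e.g. "1m", "150 cm", "6 ft"; output is meters rounded to millimeters.
function normalizeLengthMeters(raw: string): Normalized<number> {
  const m = /^\s*(\d+(?:\.\d+)?)\s*(m|cm|ft|in)\s*$/i.exec(raw);
  if (!m) return unknownValue(raw);
  const meters = Number(m[1]) * METERS_PER_UNIT[m[2].toLowerCase()];
  return { kind: "known", value: Math.round(meters * 1000) / 1000 };
}

// Maps common spellings to one canonical connector id; anything else stays unknown.
const CONNECTOR_ALIASES = new Map<string, string>([
  ["usb-c", "usb-c"],
  ["usb c", "usb-c"],
  ["type-c", "usb-c"],
  ["usb type-c", "usb-c"],
  ["usb-a", "usb-a"],
  ["usb a", "usb-a"],
  ["lightning", "lightning"],
]);

function normalizeConnector(raw: string): Normalized<string> {
  const key = raw.trim().toLowerCase().replace(/\s+/g, " ");
  const value = CONNECTOR_ALIASES.get(key);
  return value !== undefined ? { kind: "known", value } : unknownValue(raw);
}
```

Rounding lengths to millimeters keeps imperial inputs from carrying float noise (`6 ft` becomes 1.829 m). Keeping the raw text on unknowns lets the per-source report show exactly which spellings a source uses that the alias table does not yet cover.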

Test Plan

  • convex: parser/normalizer fixture tests per source type.
  • convex: invariants for required fields + unknown handling (no silent coercion).
  • manual: run ingest for each tier-1 source and verify coverage + critical fields.
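
For the invariant tests, a check like the following could run against each parser fixture's output. It is a sketch under assumptions: the record shape, the `REQUIRED_FIELDS` list, and the rule that known numeric values must be finite and positive are hypothetical and would need to match the real schema.

```typescript
// Hypothetical normalized record shape; field names are illustrative.
type FieldValue = { kind: "known"; value: unknown } | { kind: "unknown"; raw: string };

interface CableRecord {
  sourceId: string;
  sourceUrl: string;
  importedAt: string; // ISO-8601
  fields: Record<string, FieldValue | undefined>;
}

// Assumed required fields; the real list would come from the source inventory.
const REQUIRED_FIELDS = ["connector", "wattage", "lengthMeters"] as const;

// Returns every violated invariant instead of stopping at the first one,
// so a fixture test can assert on the exact set of problems.
function checkInvariants(rec: CableRecord): string[] {
  const errors: string[] = [];
  if (rec.sourceId.trim() === "") errors.push("provenance: empty sourceId");
  if (rec.sourceUrl.trim() === "") errors.push("provenance: empty sourceUrl");
  if (Number.isNaN(Date.parse(rec.importedAt))) errors.push("provenance: unparseable importedAt");

  for (const name of REQUIRED_FIELDS) {
    const field = rec.fields[name];
    if (field === undefined) {
      // Absent is a bug: a missing value must be recorded as an explicit unknown.
      errors.push(`${name}: absent (expected known value or explicit unknown)`);
    } else if (
      field.kind === "known" &&
      typeof field.value === "number" &&
      !(Number.isFinite(field.value) && field.value > 0)
    ) {
      // Catches coercions such as NaN or 0 standing in for "could not parse".
      errors.push(`${name}: known number must be finite and positive, got ${field.value}`);
    }
  }
  return errors;
}
```

A fixture test would parse a saved page for each source type, normalize it, and assert that `checkInvariants` returns an empty array; a fixture known to lack a field should assert the specific error instead.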

Acceptance Criteria

  • Tier-1 source list exists and is implemented.
  • Both Shopify and non-Shopify ingestion paths are running in CI-usable workflows.
  • Source-level quality report exists with parse success and critical-field completeness.
  • Ingest failures are explicit (no silent fallback behavior).
