[Bug] Duplicate job URLs are not prevented in Postgres #79

@Divv1524

Description

The aggregator service is expected to avoid storing duplicate job postings, but duplicate job URLs are not currently prevented at the database level.

In aggregator-service/db.py, the insert_job() query uses ON CONFLICT DO NOTHING, and the comment says duplicate URLs are silently dropped. However, the jobs table does not define a UNIQUE constraint or unique index on the url column.

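ON CONFLICT only takes effect when an insert would violate a unique constraint or unique index. Without one on url, the clause never fires, so the same job URL can be inserted multiple times.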
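
A minimal sketch of the behavior, for reproduction purposes. It uses Python's built-in sqlite3 module as a stand-in for Postgres so it runs without a database server. The table columns and the simplified insert_job() signature are assumptions for illustration, not the actual code in aggregator-service/db.py. SQLite, like Postgres, only applies ON CONFLICT DO NOTHING when a uniqueness constraint would be violated.

```python
import sqlite3

# In-memory stand-in for the Postgres jobs table; the columns are assumed.
# Note that there is no UNIQUE constraint or unique index on url.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, title TEXT, url TEXT)")


def insert_job(conn, title, url):
    """Hypothetical simplified insert_job() that mirrors the ON CONFLICT clause."""
    conn.execute(
        "INSERT INTO jobs (title, url) VALUES (?, ?) ON CONFLICT DO NOTHING",
        (title, url),
    )


# Insert the same URL twice; nothing conflicts, so both rows are stored.
insert_job(conn, "Backend Engineer", "https://example.com/jobs/1")
insert_job(conn, "Backend Engineer", "https://example.com/jobs/1")

duplicate_count = conn.execute(
    "SELECT COUNT(*) FROM jobs WHERE url = ?",
    ("https://example.com/jobs/1",),
).fetchone()[0]
print(duplicate_count)
```

Both inserts succeed here because no uniqueness rule exists for the clause to react to.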

Expected behavior:

  • Non-null duplicate job URLs should be ignored.
  • The dashboard should not show the same job posting multiple times.
  • Stats and downstream workflows should not be affected by duplicate jobs.

Possible fix:

  • Add a partial unique index on jobs(url) where url IS NOT NULL.
  • Update/add integration tests for duplicate URL insertion.
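
A rough sketch of how the proposed fix and a duplicate-URL test could look. As an assumption for portability, it runs against Python's built-in sqlite3 module rather than Postgres; the CREATE UNIQUE INDEX statement shown is also valid Postgres syntax. The index name jobs_url_unique, the table columns, and the helper functions are hypothetical and would need to be adapted to the real schema and db.py.

```python
import sqlite3

# Partial unique index: uniqueness is enforced only for rows with a url.
# The index name is a hypothetical choice.
CREATE_INDEX_SQL = (
    "CREATE UNIQUE INDEX IF NOT EXISTS jobs_url_unique "
    "ON jobs (url) WHERE url IS NOT NULL"
)


def make_db():
    """Create an in-memory stand-in for the jobs table (columns are assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, title TEXT, url TEXT)")
    conn.execute(CREATE_INDEX_SQL)
    return conn


def insert_job(conn, title, url):
    """Hypothetical simplified insert_job(); the index makes duplicates conflict."""
    conn.execute(
        "INSERT INTO jobs (title, url) VALUES (?, ?) ON CONFLICT DO NOTHING",
        (title, url),
    )


def count_jobs(conn):
    """Return the total number of rows in jobs."""
    return conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]


conn = make_db()
insert_job(conn, "Backend Engineer", "https://example.com/jobs/1")
insert_job(conn, "Backend Engineer", "https://example.com/jobs/1")  # skipped
insert_job(conn, "Untracked role", None)
insert_job(conn, "Another untracked role", None)  # kept: NULL urls are exempt

total_rows = count_jobs(conn)
rows_with_url = conn.execute(
    "SELECT COUNT(*) FROM jobs WHERE url IS NOT NULL"
).fetchone()[0]
print(total_rows, rows_with_url)
```

Two points to check when porting this to Postgres. If the table already contains duplicate URLs, creating the unique index will fail until those rows are cleaned up. Also, if insert_job() names a conflict target, Postgres needs the index predicate repeated so it can match the partial index, for example ON CONFLICT (url) WHERE url IS NOT NULL DO NOTHING; a bare ON CONFLICT DO NOTHING applies to any unique index.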

Hi @sharmavaibhav31,

I would like to work on this issue as part of GSSoC 2026.
Please assign this issue to me. I’ll start working on it and provide updates accordingly.

Thank you!

Metadata

Assignees: No one assigned
Labels: No labels
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests