parallelize update github reads #89

@reobin

Description

the update job still fetches repositories from github one at a time. after DB writes are batched, github/network latency will likely become the next major bottleneck.

  • add a small worker pool for github repository reads, likely 5-10 workers
  • preserve existing 404 delete handling and error counts
  • handle primary and secondary github rate limits with backoff
  • keep database writes batched after results are collected
  • verify update duration and github error behavior in ecs logs
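a rough sketch of how the points above could fit together, in python only for illustration (the update job's actual language and github client are not shown here). every name in it (`FetchResult`, `fetch_with_backoff`, `collect_repositories`, the `fetch` callable) is hypothetical. it assumes `fetch` returns a status code instead of raising, and it treats every 403/429 as a rate limit, which is a simplification since a 403 can also mean a plain permission error.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class FetchResult:
    """Hypothetical result of one github repository read."""
    name: str
    status: int                          # HTTP status of the final attempt
    data: Optional[dict] = None          # repository payload on success
    retry_after: Optional[float] = None  # seconds from a retry-after header, if any


def fetch_with_backoff(fetch: Callable[[str], FetchResult], name: str,
                       max_attempts: int = 4, base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> FetchResult:
    """Retry a read while github answers 403/429, waiting between attempts."""
    result = fetch(name)
    for attempt in range(1, max_attempts):
        if result.status not in (403, 429):
            break
        # Prefer the server-provided wait; otherwise back off exponentially.
        if result.retry_after is not None:
            delay = result.retry_after
        else:
            delay = base_delay * (2 ** (attempt - 1))
        sleep(delay)
        result = fetch(name)
    return result


def collect_repositories(names: Iterable[str],
                         fetch: Callable[[str], FetchResult],
                         workers: int = 5,
                         sleep: Callable[[float], None] = time.sleep):
    """Read repositories with a small worker pool, then sort the results.

    Returns (updates, deletes, error_count) so the caller can do a single
    batched database write after all reads have finished.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the input order of names in its results.
        results = list(pool.map(
            lambda n: fetch_with_backoff(fetch, n, sleep=sleep), names))

    updates = [r for r in results if r.status == 200]
    deletes = [r.name for r in results if r.status == 404]  # existing 404 delete path
    error_count = sum(1 for r in results if r.status not in (200, 404))
    return updates, deletes, error_count
```

the workers only read; nothing is written until the pool is done, so the batched DB write stays in one place. a read that is still rate limited after `max_attempts` lands in the error count rather than being dropped silently.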

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request
