Problem
The current [ProcurementScout] iterates through 20+ local government sources sequentially. While this naturally spaces out requests between different domains, individual connectors that implement pagination (looping through multiple pages of results) can hit a single server with rapid-fire requests. This risks triggering Web Application Firewalls (WAF) or getting our crawler IP banned by smaller municipal servers.
Proposed Solution
Implement a centralized RateLimiter utility or decorator that enforces a mandatory "politeness delay" between HTTP requests to the same host.
Technical Specifications
- Algorithm: Token Bucket or simple "Sleep-until" mechanism.
- Default Policy: 1 request per 2.0 seconds per domain.
- Handling 429s: If a server responds with
HTTP 429 Too Many Requests, the connector should respect the Retry-After header or apply exponential backoff.
Tasks
Problem
The current [ProcurementScout] iterates through 20+ local government sources sequentially. While this naturally spaces out requests between different domains, individual connectors that implement pagination (looping through multiple pages of results) can hit a single server with rapid-fire requests. This risks triggering Web Application Firewalls (WAF) or getting our crawler IP banned by smaller municipal servers.
Proposed Solution
Implement a centralized
RateLimiterutility or decorator that enforces a mandatory "politeness delay" between HTTP requests to the same host.Technical Specifications
HTTP 429 Too Many Requests, the connector should respect theRetry-Afterheader or apply exponential backoff.Tasks
govsignal/utils/rate_limiter.py.@rate_limit(calls=1, period=2)decorator._fetch_datamethods in [govsignal/local_connectors.py].