feat: add configurable delay between same-provider accounts #856
shaiu wants to merge 1 commit into daniel-hauser:main
Conversation
📝 Walkthrough
The PR introduces a configurable delay mechanism (`sameProviderDelayMs`) between consecutive scrapes of accounts that share a provider, and ensures each account's browser context is closed after scraping.
Sequence Diagram

```mermaid
sequenceDiagram
    participant WorkQueue as Work Queue
    participant Scraper as scrapeAccounts
    participant Config as Configuration
    participant Browser as Browser/Context
    participant Logger as Logger
    WorkQueue->>Scraper: Process account (companyId=A)
    Scraper->>Config: Read sameProviderDelayMs
    Note over Scraper: lastCompanyId = undefined<br/>(no delay)
    Scraper->>Browser: Create context for A
    Scraper->>Browser: Scrape account
    Browser-->>Scraper: Success
    Scraper->>Browser: Close context
    Note over Scraper: lastCompanyId = A
    WorkQueue->>Scraper: Process account (companyId=B)
    Scraper->>Config: Read sameProviderDelayMs
    Note over Scraper: lastCompanyId=A, new companyId=B<br/>(different provider, no delay)
    Scraper->>Browser: Create context for B
    Scraper->>Browser: Scrape account
    Browser-->>Scraper: Success
    Scraper->>Browser: Close context
    Note over Scraper: lastCompanyId = B
    WorkQueue->>Scraper: Process account (companyId=A)
    Scraper->>Config: Read sameProviderDelayMs
    alt sameProviderDelayMs > 0 and lastCompanyId == A
        Scraper->>Logger: Log delay event
        Scraper->>Scraper: Wait sameProviderDelayMs ms
    end
    Scraper->>Browser: Create context for A
    Scraper->>Browser: Scrape account
    Browser-->>Scraper: Success
    Scraper->>Browser: Close context
```
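The delay check in the diagram can be sketched as follows. This is an illustrative reduction, not the PR's actual code: `shouldDelay` and `maybeDelay` are hypothetical helpers, while `lastCompanyId` and `sameProviderDelayMs` are the names used by the PR.

```typescript
// Hypothetical helper: decide whether a delay is needed before the next scrape.
// Per the diagram, a delay applies only when the previous scrape used the same
// provider (companyId) and a positive delay is configured.
function shouldDelay(
  lastCompanyId: string | undefined,
  companyId: string,
  sameProviderDelayMs: number,
): boolean {
  return sameProviderDelayMs > 0 && lastCompanyId === companyId;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper that applies the delay before scraping continues.
async function maybeDelay(
  lastCompanyId: string | undefined,
  companyId: string,
  sameProviderDelayMs: number,
): Promise<void> {
  if (shouldDelay(lastCompanyId, companyId, sameProviderDelayMs)) {
    await sleep(sameProviderDelayMs);
  }
}
```

The first account never waits (`lastCompanyId` is `undefined`), and switching providers resets the check, matching the diagram's three passes.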
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (warning)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/scraper/index.ts`:
- Around line 98-101: In the finally block where browserContext.close() is
awaited, forward any errors to the existing onError handler in addition to
logging: inside the catch for browserContext.close use onError?.(e,
"browserContext.close") so cleanup failures follow the same notification path as
browser.close(), while still calling logger("Failed to close browser context",
e); update the finally block that references browserContext.close, onError, and
logger accordingly.
- Around line 72-103: The current code only ensures browserContext.close() in an
inner finally, but if createSecureBrowserContext, scrapeAccount, or the status
callback throws the shared browser (variable browser) may never be closed; wrap
the entire scrape phase (the loggerContextStore.run(...) call and its await) in
an outer try/finally that always calls await browser.close().catch(...) to log
errors, while keeping the existing inner finally that closes browserContext;
reference createSecureBrowserContext, scrapeAccount, scrapeStatusChanged,
browserContext.close(), and browser.close() when making the change.
- Around line 57-70: The shared lastCompanyId marker in the parallelLimit worker
allows concurrent scrapes for the same company when parallelScrapers > 1;
replace it with a per-company serializer (e.g., a Map<string, Promise<void> |
Mutex/Queue) keyed by companyId) inside the worker created by parallelLimit so
each account task awaits that company's tail promise/lock before applying the
sameProviderDelayMs and running the scrape, then updates the company's tail
promise with a promise that resolves after the delay+scrape completes; keep
using parallelLimit for global concurrency control. Update the worker logic
referencing lastCompanyId, accounts.map(...) and sameProviderDelayMs to use the
per-company Map and add a regression test for parallelScrapers > 1 that enqueues
multiple accounts with the same companyId and asserts their scrapes do not
overlap.
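The per-company serializer the comment describes can be sketched with a `Map` of tail promises. The names here (`tails`, `runSerialized`) are illustrative, not from the PR; the idea is that each task for a given `companyId` chains onto that company's previous task, while `parallelLimit` still bounds global concurrency.

```typescript
// Hypothetical per-company serializer: one tail promise per companyId.
const tails = new Map<string, Promise<void>>();

function runSerialized<T>(companyId: string, task: () => Promise<T>): Promise<T> {
  // Start from this company's current tail (or an already-resolved promise).
  const previous = tails.get(companyId) ?? Promise.resolve();
  // Chain the task after the tail, swallowing prior failures so one bad
  // scrape does not wedge the rest of the company's queue.
  const result = previous.catch(() => {}).then(task);
  // The new tail settles when this task settles, successfully or not.
  tails.set(companyId, result.then(() => {}, () => {}));
  return result;
}
```

Each worker created by `parallelLimit` would wrap its delay-plus-scrape in `runSerialized(companyId, ...)`, guaranteeing that two accounts of the same provider never scrape concurrently even with `parallelScrapers > 1`.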
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8514a3b5-db31-4c04-ba70-9f0e92f0a35b
📒 Files selected for processing (4)
jest.config.js
src/config.schema.ts
src/scraper/index.test.ts
src/scraper/index.ts
```ts
const browserContext = await createSecureBrowserContext(browser, companyId);

return loggerContextStore.run({ prefix: `[#${i} ${companyId}]` }, async () => {
  try {
    return await scrapeAccount(
      account,
      {
        browserContext,
        startDate,
        companyId,
        futureMonthsToScrape: futureMonths,
        storeFailureScreenShotPath: getFailureScreenShotPath(companyId),
        additionalTransactionInformation,
        includeRawTransaction,
        ...scraperOptions,
      },
      async (message, append = false) => {
        status[i] = append ? `${status[i]} ${message}` : message;
        return scrapeStatusChanged?.(status);
      },
    );
  } finally {
    await browserContext.close().catch((e) => {
      logger("Failed to close browser context", e);
    });
  }
});
```
The browser still leaks on scrape failures.
This finally guarantees browserContext.close(), but if createSecureBrowserContext, scrapeAccount, or scrapeStatusChanged rejects, the function exits before the later browser.close() path runs. Please wrap the whole scrape phase in an outer try/finally so the shared browser is always closed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from aa6a42a to c22ca0c
Summary
- Close browser contexts in a `finally` block to free resources and prevent session bleed
- Add `sameProviderDelayMs` config option (default: 30000 ms)

Motivation
When multiple accounts use the same bank provider (e.g., two Cal credit cards), scraping them back-to-back can trigger rate limiting or cause session conflicts where the second login fails with a 400 error. A configurable delay between same-provider accounts mitigates this.
Additionally, browser contexts were not being closed after scraping, which could lead to resource leaks and session bleed between accounts using the same provider.
Changes
- `src/scraper/index.ts` — Track `lastCompanyId` to detect consecutive same-provider accounts. When detected, delay by `sameProviderDelayMs` before scraping. Wrap `scrapeAccount` in `try/finally` to ensure `browserContext.close()` is always called.
- `src/config.schema.ts` — New `sameProviderDelayMs` field in `ScrapingOptionsSchema` (min: 0, max: 120000, default: 30000)
- `jest.config.js` — Add `diagnostics: false` to the ts-jest config to work around pre-existing type mismatches with `israeli-bank-scrapers` types

Test plan
- `npm run lint` passes
- `npm run build` passes
- `npm run test` passes

Summary by CodeRabbit
New Features
Tests