feat: add configurable delay between same-provider accounts#856

Open
shaiu wants to merge 1 commit into daniel-hauser:main from shaiu:feat/same-provider-delay

Conversation

@shaiu (Contributor) commented Apr 8, 2026

Summary

  • Add configurable delay between consecutive accounts that use the same scraping provider to avoid rate limiting and session conflicts
  • Close browser contexts after each account scrape via finally block to free resources and prevent session bleed
  • New sameProviderDelayMs config option (default: 30000ms)

Motivation

When multiple accounts use the same bank provider (e.g., two Cal credit cards), scraping them back-to-back can trigger rate limiting or cause session conflicts where the second login fails with a 400 error. A configurable delay between same-provider accounts mitigates this.

Additionally, browser contexts were not being closed after scraping, which could lead to resource leaks and session bleed between accounts using the same provider.

Changes

  • src/scraper/index.ts — Track lastCompanyId to detect consecutive same-provider accounts. When detected, delay by sameProviderDelayMs before scraping. Wrap scrapeAccount in try/finally to ensure browserContext.close() is always called.
  • src/config.schema.ts — New sameProviderDelayMs field in ScrapingOptionsSchema (min: 0, max: 120000, default: 30000)
  • jest.config.js — Add diagnostics: false to ts-jest config to work around pre-existing type mismatches with israeli-bank-scrapers types
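The delay decision described above can be sketched roughly as follows. This is an illustrative sketch, not the code in src/scraper/index.ts; `shouldDelay` and `sleep` are hypothetical names:

```typescript
// Sketch of the same-provider delay decision (illustrative names,
// not the exact implementation in src/scraper/index.ts).
function shouldDelay(
  lastCompanyId: string | undefined,
  companyId: string,
  sameProviderDelayMs: number,
): boolean {
  // Delay only when the previous account used the same provider
  // and the option is enabled (0 disables it).
  return sameProviderDelayMs > 0 && lastCompanyId === companyId;
}

// The actual wait is just a timer-based sleep.
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));
```

The first account never delays (`lastCompanyId` is still undefined), and setting the option to 0 turns the feature off entirely.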

Test plan

  • Unit tests for delay logic (same provider, different providers, disabled with 0)
  • Unit tests for context cleanup (success, failure, close error handling)
  • npm run lint passes
  • npm run build passes
  • npm run test passes

Summary by CodeRabbit

  • New Features

    • Added configuration option to control delay between consecutive accounts using the same scraping provider. Default is 30 seconds; adjustable from 0-120 seconds or disabled by setting to 0.
  • Tests

    • Added comprehensive test coverage for account scraping functionality to ensure reliability.

@coderabbitai bot (Contributor) commented Apr 8, 2026

Warning

Rate limit exceeded

@shaiu has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 12 minutes and 36 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 12 minutes and 36 seconds.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c3a0a52b-adc0-4af9-a96f-612a1c91f88d

📥 Commits

Reviewing files that changed from the base of the PR and between aa6a42a and c22ca0c.

📒 Files selected for processing (4)
  • jest.config.js
  • src/config.schema.ts
  • src/scraper/index.test.ts
  • src/scraper/index.ts
📝 Walkthrough

The PR introduces a configurable delay mechanism (sameProviderDelayMs) between consecutive accounts from the same scraping provider, updates the scraper to track provider consistency and apply delays accordingly, ensures proper browser context cleanup via try-finally blocks, and adds comprehensive test coverage for the new behavior.

Changes

  • Build Configuration (jest.config.js): Added explicit transform configuration applying ts-jest with diagnostics: false to TypeScript files, supplementing the existing preset setup.
  • Schema Definition (src/config.schema.ts): Introduced the sameProviderDelayMs field to ScrapingOptionsSchema: a number between 0 and 120000 milliseconds with a default of 30000, controlling the wait time between consecutive accounts sharing the same provider.
  • Scraper Implementation (src/scraper/index.ts): Modified scraping logic to track lastCompanyId across work items, apply conditional delays when consecutive accounts share the same provider (if sameProviderDelayMs > 0), create browser contexts upfront per account, and ensure cleanup via try/finally blocks with failure logging.
  • Test Suite (src/scraper/index.test.ts): Added a comprehensive test suite covering browser context lifecycle (creation, closure, error handling), delay behavior across different and identical providers, and interaction with configuration values.
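As a rough sketch, and assuming ScrapingOptionsSchema is zod-based (the naming suggests it, but that is an assumption, and the real schema contains other fields), the new field might look like:

```typescript
import { z } from "zod";

// Sketch only: assumes a zod schema; the real ScrapingOptionsSchema
// in src/config.schema.ts defines many other options as well.
const ScrapingOptionsSchema = z.object({
  // Wait between consecutive accounts of the same provider, in ms.
  // 0 disables the delay; values are capped at two minutes.
  sameProviderDelayMs: z.number().min(0).max(120_000).default(30_000),
});
```

With a `.default(30_000)`, parsing a config object that omits the field would yield the documented 30-second default.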

Sequence Diagram

sequenceDiagram
    participant WorkQueue as Work Queue
    participant Scraper as scrapeAccounts
    participant Config as Configuration
    participant Browser as Browser/Context
    participant Logger as Logger
    
    WorkQueue->>Scraper: Process account (companyId=A)
    Scraper->>Config: Read sameProviderDelayMs
    Note over Scraper: lastCompanyId = undefined<br/>(no delay)
    Scraper->>Browser: Create context for A
    Scraper->>Browser: Scrape account
    Browser-->>Scraper: Success
    Scraper->>Browser: Close context
    Note over Scraper: lastCompanyId = A
    
    WorkQueue->>Scraper: Process account (companyId=B)
    Scraper->>Config: Read sameProviderDelayMs
    Note over Scraper: lastCompanyId=A, new companyId=B<br/>(different provider, no delay)
    Scraper->>Browser: Create context for B
    Scraper->>Browser: Scrape account
    Browser-->>Scraper: Success
    Scraper->>Browser: Close context
    Note over Scraper: lastCompanyId = B
    
    WorkQueue->>Scraper: Process account (companyId=A)
    Scraper->>Config: Read sameProviderDelayMs
    alt sameProviderDelayMs > 0 and lastCompanyId == A
        Scraper->>Logger: Log delay event
        Scraper->>Scraper: Wait sameProviderDelayMs ms
    end
    Scraper->>Browser: Create context for A
    Scraper->>Browser: Scrape account
    Browser-->>Scraper: Success
    Scraper->>Browser: Close context

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hopping through accounts with timely grace,
Delays between providers keep the pace,
Contexts close with care and cleanup might,
Tests ensure each leap is light!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title clearly summarizes the main change: adding a configurable delay for consecutive accounts sharing the same scraping provider.



@shaiu shaiu marked this pull request as draft April 8, 2026 21:44
@coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/scraper/index.ts`:
- Around line 98-101: In the finally block where browserContext.close() is
awaited, forward any errors to the existing onError handler in addition to
logging: inside the catch for browserContext.close use onError?.(e,
"browserContext.close") so cleanup failures follow the same notification path as
browser.close(), while still calling logger("Failed to close browser context",
e); update the finally block that references browserContext.close, onError, and
logger accordingly.
- Around line 72-103: The current code only ensures browserContext.close() in an
inner finally, but if createSecureBrowserContext, scrapeAccount, or the status
callback throws the shared browser (variable browser) may never be closed; wrap
the entire scrape phase (the loggerContextStore.run(...) call and its await) in
an outer try/finally that always calls await browser.close().catch(...) to log
errors, while keeping the existing inner finally that closes browserContext;
reference createSecureBrowserContext, scrapeAccount, scrapeStatusChanged,
browserContext.close(), and browser.close() when making the change.
- Around line 57-70: The shared lastCompanyId marker in the parallelLimit worker
allows concurrent scrapes for the same company when parallelScrapers > 1;
replace it with a per-company serializer (e.g., a Map<string, Promise<void> |
Mutex/Queue) keyed by companyId) inside the worker created by parallelLimit so
each account task awaits that company's tail promise/lock before applying the
sameProviderDelayMs and running the scrape, then updates the company's tail
promise with a promise that resolves after the delay+scrape completes; keep
using parallelLimit for global concurrency control. Update the worker logic
referencing lastCompanyId, accounts.map(...) and sameProviderDelayMs to use the
per-company Map and add a regression test for parallelScrapers > 1 that enqueues
multiple accounts with the same companyId and asserts their scrapes do not
overlap.
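The per-company serializer suggested above can be sketched with a map of promise tails. This is illustrative (the PR currently uses a shared lastCompanyId, and `runSerialized` is a hypothetical helper):

```typescript
// Per-company serializer sketch. Each companyId keeps a "tail" promise;
// a new task for that company chains onto the tail, so two accounts of
// the same provider never scrape concurrently even when the global
// parallelLimit allows parallelScrapers > 1.
const tails = new Map<string, Promise<void>>();

function runSerialized(
  companyId: string,
  task: () => Promise<void>,
): Promise<void> {
  const prev = tails.get(companyId) ?? Promise.resolve();
  // Chain after the previous task; swallow its error so one failure
  // does not block later accounts of the same company.
  const next = prev.catch(() => {}).then(task);
  // Store a never-rejecting tail so future tasks can always await it.
  tails.set(companyId, next.catch(() => {}));
  return next;
}
```

The `sameProviderDelayMs` wait would go inside the serialized task, so the delay itself is also applied once per queued account rather than racing across workers.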

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8514a3b5-db31-4c04-ba70-9f0e92f0a35b

📥 Commits

Reviewing files that changed from the base of the PR and between 079716c and aa6a42a.

📒 Files selected for processing (4)
  • jest.config.js
  • src/config.schema.ts
  • src/scraper/index.test.ts
  • src/scraper/index.ts

Comment thread src/scraper/index.ts Outdated
Comment thread src/scraper/index.ts
Comment on lines +72 to +103
```typescript
const browserContext = await createSecureBrowserContext(
  browser,
  companyId,
);

return loggerContextStore.run(
  { prefix: `[#${i} ${companyId}]` },
  async () => {
    try {
      return await scrapeAccount(
        account,
        {
          browserContext,
          startDate,
          companyId,
          futureMonthsToScrape: futureMonths,
          storeFailureScreenShotPath: getFailureScreenShotPath(companyId),
          additionalTransactionInformation,
          includeRawTransaction,
          ...scraperOptions,
        },
        async (message, append = false) => {
          status[i] = append ? `${status[i]} ${message}` : message;
          return scrapeStatusChanged?.(status);
        },
      );
    } finally {
      await browserContext.close().catch((e) => {
        logger("Failed to close browser context", e);
      });
    }
  },
);
```

⚠️ Potential issue | 🟠 Major

The browser still leaks on scrape failures.

This finally guarantees browserContext.close(), but if createSecureBrowserContext, scrapeAccount, or scrapeStatusChanged rejects, the function exits before the later browser.close() path runs. Please wrap the whole scrape phase in an outer try/finally so the shared browser is always closed.
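The nested cleanup the reviewer describes can be sketched as follows. `Closable`, `ContextFactory`, and `scrapeAllSafely` are illustrative stand-ins for the puppeteer-style objects the scraper uses, not the project's actual types:

```typescript
// Sketch of the requested cleanup shape: the outer finally guarantees
// the shared browser closes even when context creation or the scrape
// itself throws; the inner finally still owns the per-account context.
interface Closable {
  close(): Promise<void>;
}
interface ContextFactory extends Closable {
  newContext(): Promise<Closable>;
}

async function scrapeAllSafely(
  browser: ContextFactory,
  scrape: (ctx: Closable) => Promise<void>,
): Promise<void> {
  try {
    const ctx = await browser.newContext();
    try {
      await scrape(ctx);
    } finally {
      // Inner cleanup: the per-account context, as in the PR.
      await ctx.close().catch(() => {});
    }
  } finally {
    // Outer cleanup: the shared browser, which the PR's current
    // structure can skip when the scrape phase rejects.
    await browser.close().catch(() => {});
  }
}
```

The scrape error still propagates to the caller; the finally blocks only guarantee that both close calls run on the way out.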


@shaiu shaiu force-pushed the feat/same-provider-delay branch from aa6a42a to c22ca0c Compare April 8, 2026 21:58
@shaiu shaiu marked this pull request as ready for review April 8, 2026 21:59
