feat: patchright browser fallback to bypass Cloudflare #80

Merged
dreamiurg merged 4 commits into main from feat/patchright-cloudflare-bypass on May 22, 2026

@dreamiurg

Who's affected

End users of the peakbagger CLI — every peak/ascent command currently returns 403 because PeakBagger.com sits behind a Cloudflare managed challenge that cloudscraper cannot solve.

What changed

Before: cloudscraper requests to search.aspx, peak.aspx, /climber/ascent.aspx, etc. returned 403 with cf-mitigated: challenge (the "Just a moment..." Turnstile page). No tuning (User-Agent, headers, browser/platform, delays) helped — cloudscraper only solves the legacy IUAM challenge, not the modern managed one.
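
A minimal sketch of how such a response might be recognized, assuming the check keys on the 403 status plus the `cf-mitigated: challenge` header described above. The function name `is_cloudflare_challenge` is hypothetical and is not taken from this PR's code.

```python
from typing import Mapping


def is_cloudflare_challenge(status_code: int, headers: Mapping[str, str]) -> bool:
    """Return True when a response looks like a Cloudflare managed challenge.

    Hypothetical helper: the real client may inspect the response differently.
    """
    # HTTP header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    mitigated = lowered.get("cf-mitigated", "").strip().lower()
    return status_code == 403 and mitigated == "challenge"
```

Checking the header as well as the status avoids treating an ordinary 403 (for example, a genuinely forbidden page) as something a browser solve could fix.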

After: the HTTP client gains a lazy stealth-browser fallback. On a challenge response it drives a real, headful Chrome via patchright (a stealth-patched Playwright) to clear the challenge, harvests the cf_clearance cookie + matching User-Agent, injects them into the existing cloudscraper session, and retries. The clearance is cached to ~/.cache/peakbagger/ and reused across runs until it expires, so the browser launches only occasionally. Rate limiting, parsing, and output formatting are unchanged.
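
One way the clearance cache could work, shown as a sketch rather than the code in `peakbagger/browser_transport.py`. The helper names, the JSON field names, and the `expires_at` timestamp are assumptions for illustration; only the cache directory and the `cloudflare_clearance.json` file name come from this PR.

```python
import json
import os
import time
from pathlib import Path
from typing import Optional


def cache_dir() -> Path:
    """Resolve the cache directory, honouring XDG_CACHE_HOME when it is set."""
    base = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / "peakbagger"


def save_clearance(path: Path, cookie: str, user_agent: str, expires_at: float) -> None:
    """Persist the cookie together with the User-Agent that earned it."""
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"cf_clearance": cookie, "user_agent": user_agent, "expires_at": expires_at}
    path.write_text(json.dumps(payload), encoding="utf-8")


def load_clearance(path: Path, now: Optional[float] = None) -> Optional[dict]:
    """Return the cached clearance, or None if it is missing, corrupt, or expired."""
    now = time.time() if now is None else now
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None
    if not isinstance(payload, dict):
        return None
    expires_at = payload.get("expires_at")
    if not isinstance(expires_at, (int, float)) or expires_at <= now:
        return None
    return payload
```

A caller would try `load_clearance(cache_dir() / "cloudflare_clearance.json")` first and launch the browser only when it returns `None`. Storing the User-Agent next to the cookie matters because the two are harvested and injected as a pair.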

patchright is an optional browser extra (pip install 'peakbagger[browser]'); without it the CLI behaves as before and emits a clear install hint when a challenge is hit.
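
The optional-extra behavior can be sketched as a lazy import that converts a missing dependency into an actionable message. `BrowserExtraMissing`, `INSTALL_HINT`, and `load_browser_api` are hypothetical names, and the hint wording is an assumption; the module path `patchright.sync_api` and the install command are the ones mentioned in this PR.

```python
import importlib

# Assumed wording; the CLI's actual hint text may differ.
INSTALL_HINT = (
    "Cloudflare challenge encountered. Install the browser extra to solve it: "
    "pip install 'peakbagger[browser]'"
)


class BrowserExtraMissing(RuntimeError):
    """Raised when a challenge is hit but the optional extra is not installed."""


def load_browser_api(module_name: str = "patchright.sync_api"):
    """Import the browser API only when it is actually needed."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Chain the original error so the root cause stays visible in tracebacks.
        raise BrowserExtraMissing(INSTALL_HINT) from exc
```

Deferring the import to the moment a challenge is seen keeps the base install free of the browser dependency.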

Verified live end-to-end: peak search, peak show, peak stats, peak ascents, and ascent show all return real data for Mount Shuksan (pid 1630) — one browser solve, then all subsequent requests reuse the cached cookie.

Notable decisions

  • patchright over plain Playwright or browser-use. Vanilla Playwright (even with manual stealth flags) is detected by Cloudflare via the CDP Runtime.enable leak; browser-use works but is heavyweight (~100 deps + LLM agent layer). patchright is the minimal stealth engine that passes, with a standard Playwright API.
  • Headful only. Headless is detected regardless of engine (HeadlessChrome UA + other signals), so the solve runs with a visible window. On headless CI/servers it needs xvfb.
  • Hybrid, not full-browser. cf_clearance is not strictly fingerprint-bound here, so handing it to cloudscraper works — keeping requests fast and rate-limited rather than routing everything through a browser.
  • Removed the custom CLI User-Agent (it broke cloudscraper's browser fingerprint and was never necessary).
  • Drive-by: wrapped bs4 find() href/src lambda predicates in bool() to satisfy the type checker (pre-existing ty findings in scraper.py).
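
To illustrate the drive-by fix, here is a sketch of an attribute predicate with and without the `bool()` wrapper. It uses a named function instead of the lambdas in `scraper.py`, and `is_peak_link` is a hypothetical name; BeautifulSoup itself is not imported, so the example only shows the return-type difference.

```python
from typing import Optional


def is_peak_link_untyped(href: Optional[str]):
    # `href and ...` short-circuits, so this can return None or "" as well as
    # True/False, which a strict type checker does not accept as `bool`.
    return href and "peak.aspx" in href


def is_peak_link(href: Optional[str]) -> bool:
    # Wrapping the expression in bool() guarantees a real bool in every case.
    return bool(href and "peak.aspx" in href)


# Hypothetical usage with BeautifulSoup (not executed here):
#   soup.find("a", href=is_peak_link)
```

Both versions select the same elements at runtime; only the declared return type changes.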

Agent context

  • Contracts changed: PeakBaggerClient.get() now solves-and-retries once on a Cloudflare challenge (guarded by _allow_solve to prevent loops); new peakbagger/browser_transport.py mints/caches clearance.
  • Config: new optional browser extra (patchright); cache at $XDG_CACHE_HOME/peakbagger/ (cloudflare_clearance.json, chrome-profile/); env PEAKBAGGER_BROWSER_CHANNEL overrides the Chrome channel.
  • Scope boundary: does not touch scraping/parsing logic, models, formatters, statistics, or CLI command surface.
  • Known gaps: headful-only (no xvfb wiring); single-instance (concurrent solves contend on the Chrome profile dir); cached cookie lifetime bounded by Cloudflare (~minutes to ~1h).
  • Rollback risk: safe to revert — fallback is additive and gated behind the optional extra.
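
The solve-and-retry-once contract can be sketched as follows. This is an illustration under stated assumptions, not the code in `peakbagger/client.py`: `Response`, `Client`, `fetch`, and `solve` are stand-ins, and modelling `_allow_solve` as a keyword argument is a guess at how the guard is wired.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Response:
    """Stand-in for an HTTP response (hypothetical, for illustration only)."""
    status_code: int
    headers: Dict[str, str] = field(default_factory=dict)
    text: str = ""


def is_challenge(resp: Response) -> bool:
    return resp.status_code == 403 and resp.headers.get("cf-mitigated") == "challenge"


class Client:
    def __init__(self, fetch: Callable[[str], Response], solve: Callable[[str], None]):
        self._fetch = fetch  # performs the plain HTTP request
        self._solve = solve  # runs the browser solve and injects the cookie

    def get(self, url: str, _allow_solve: bool = True) -> Response:
        resp = self._fetch(url)
        if is_challenge(resp) and _allow_solve:
            self._solve(url)
            # Retry exactly once; the retry may not trigger another solve.
            return self.get(url, _allow_solve=False)
        return resp
```

If the retry is challenged again, the 403 is returned to the caller instead of launching the browser a second time, which is what prevents a solve loop.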

PeakBagger.com now serves a Cloudflare managed challenge (cf-mitigated:
challenge) on its data endpoints that cloudscraper cannot solve, so every
request returned 403. Add a stealth-browser transport: on a challenge,
lazily drive headful patchright to mint a cf_clearance cookie, cache it to
~/.cache/peakbagger, inject it into the existing cloudscraper session, and
retry. Subsequent requests reuse the cached cookie until it expires, so the
browser runs only occasionally and rate limiting/parsing are unchanged.

patchright is an optional "browser" extra; the custom CLI User-Agent (which
broke the browser fingerprint) is removed. Also wraps bs4 find() href/src
predicates in bool() to satisfy the type checker.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 22, 2026

📥 Commits

Reviewing files that changed from the base of the PR and between 8329d54 and dfef42b.

📒 Files selected for processing (9)
  • .pre-commit-config.yaml
  • README.md
  • peakbagger/browser_transport.py
  • peakbagger/client.py
  • peakbagger/scraper.py
  • pyproject.toml
  • tests/conftest.py
  • tests/test_browser_transport.py
  • tests/test_client.py

@codecov

codecov Bot commented May 22, 2026

Codecov Report

❌ Patch coverage is 96.77419% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.43%. Comparing base (5fb7cc9) to head (dfef42b).
⚠️ Report is 3 commits behind head on main.

Files with missing lines | Patch % | Lines
peakbagger/client.py | 95.52% | 0 Missing and 3 partials ⚠️
peakbagger/browser_transport.py | 97.40% | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #80      +/-   ##
==========================================
+ Coverage   93.07%   93.43%   +0.35%     
==========================================
  Files           9       10       +1     
  Lines        1718     1857     +139     
  Branches      224      239      +15     
==========================================
+ Hits         1599     1735     +136     
  Misses         84       84              
- Partials       35       38       +3     

dreamiurg and others added 3 commits May 22, 2026 01:37
The Complexity gate ran only in CI, so over-limit functions were caught
late. Mirror the CI invocation as a local pre-commit hook to catch them
before push.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Inject a fake patchright.sync_api to exercise the browser-driving paths
(page reuse, navigate-until-cleared success/timeout, cf_clearance
verification) without a real browser. Lifts browser_transport coverage
from ~64% to ~98% and resolves the codecov patch gap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
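
The fake-module technique described in this commit can be sketched with a small context manager that registers a stand-in under the real import name. `fake_module` is a hypothetical helper, not the fixture in `tests/conftest.py`, and the attributes a real fake would need (for example a `sync_playwright` callable) are assumptions about what the transport imports.

```python
import importlib
import sys
import types
from contextlib import contextmanager


@contextmanager
def fake_module(name: str, **attrs):
    """Temporarily register a stand-in module under `name` in sys.modules."""
    module = types.ModuleType(name)
    for key, value in attrs.items():
        setattr(module, key, value)
    previous = sys.modules.get(name)
    sys.modules[name] = module
    try:
        yield module
    finally:
        # Restore whatever was registered before so other tests are unaffected.
        if previous is None:
            sys.modules.pop(name, None)
        else:
            sys.modules[name] = previous
```

Because the import system consults `sys.modules` first, code that imports the named module inside the `with` block receives the fake, so the browser-driving paths can run without launching Chrome.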
@dreamiurg dreamiurg marked this pull request as ready for review May 22, 2026 09:00
@dreamiurg dreamiurg merged commit 44edc1e into main May 22, 2026
19 checks passed
@dreamiurg dreamiurg deleted the feat/patchright-cloudflare-bypass branch May 22, 2026 09:00