feat: patchright browser fallback to bypass Cloudflare #80
PeakBagger.com now serves a Cloudflare managed challenge (cf-mitigated: challenge) on its data endpoints, which cloudscraper cannot solve, so every request returned 403. Add a stealth-browser transport: on a challenge, lazily drive headful patchright to mint a cf_clearance cookie, cache it to ~/.cache/peakbagger, inject it into the existing cloudscraper session, and retry. Subsequent requests reuse the cached cookie until it expires, so the browser runs only occasionally and rate limiting/parsing are unchanged. patchright is an optional "browser" extra; the custom CLI User-Agent (which broke the browser fingerprint) is removed. Also wraps bs4 find() href/src predicates in bool() to satisfy the type checker.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
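The detect, solve, inject, retry loop described above can be sketched in plain Python. This is a minimal illustration, not the PR's code: `Response`, `Client`, `fetch`, and `solve` are hypothetical stand-ins for the `cloudscraper` session and the patchright solver, and the `_allow_solve` keyword borrows the guard name from the PR's agent notes but may not match the real signature. Detection keys on a `403` status plus the `cf-mitigated: challenge` header.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Mapping, Tuple


@dataclass
class Response:
    """Hypothetical stand-in for an HTTP response."""
    status: int
    headers: Mapping[str, str]
    text: str = ""


def is_cloudflare_challenge(resp: Response) -> bool:
    # HTTP header names are case-insensitive, so normalise them first.
    headers = {k.lower(): v for k, v in resp.headers.items()}
    return resp.status == 403 and headers.get("cf-mitigated", "").strip().lower() == "challenge"


@dataclass
class Client:
    # fetch(url, cookies, user_agent) -> Response: the plain HTTP path (cloudscraper in the PR).
    fetch: Callable[[str, Dict[str, str], str], Response]
    # solve(url) -> (cf_clearance, user_agent): the browser path (patchright in the PR).
    solve: Callable[[str], Tuple[str, str]]
    cookies: Dict[str, str] = field(default_factory=dict)
    user_agent: str = "initial-ua"
    solves: int = 0

    def get(self, url: str, _allow_solve: bool = True) -> Response:
        resp = self.fetch(url, self.cookies, self.user_agent)
        if _allow_solve and is_cloudflare_challenge(resp):
            # Mint a clearance, then inject the cookie and its matching UA...
            clearance, ua = self.solve(url)
            self.solves += 1
            self.cookies["cf_clearance"] = clearance
            self.user_agent = ua
            # ...and retry exactly once: the retry cannot trigger another solve.
            return self.get(url, _allow_solve=False)
        return resp


# Usage with a fake server that only accepts the solved cookie/UA pair.
def fake_fetch(url: str, cookies: Dict[str, str], ua: str) -> Response:
    if cookies.get("cf_clearance") == "tok" and ua == "Chrome-UA":
        return Response(200, {}, "peak data")
    return Response(403, {"CF-Mitigated": "challenge"})


client = Client(fetch=fake_fetch, solve=lambda url: ("tok", "Chrome-UA"))
first = client.get("peak.aspx?pid=1630")
second = client.get("peak.aspx?pid=1630")  # reuses the injected cookie, no new solve
print(first.status, second.status, client.solves)  # 200 200 1
```

Passing `_allow_solve=False` on the retry is what bounds the work: if a freshly minted cookie is still rejected, the caller gets the `403` back instead of relaunching the browser in a loop.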
Codecov Report
❌ Patch coverage is …

| Coverage diff | main | #80 | +/- |
|---|---|---|---|
| Coverage | 93.07% | 93.43% | +0.35% |
| Files | 9 | 10 | +1 |
| Lines | 1718 | 1857 | +139 |
| Branches | 224 | 239 | +15 |
| Hits | 1599 | 1735 | +136 |
| Misses | 84 | 84 | |
| Partials | 35 | 38 | +3 |
The Complexity gate ran only in CI, so over-limit functions were caught late. Mirror the CI invocation as a local pre-commit hook to catch them before push.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Inject a fake patchright.sync_api to exercise the browser-driving paths (page reuse, navigate-until-cleared success/timeout, cf_clearance verification) without a real browser. Lifts browser_transport coverage from ~64% to ~98% and resolves the codecov patch gap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
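The testing approach in the commit above, registering a fake `patchright.sync_api` before the transport imports it, can be sketched with `types.ModuleType` and `sys.modules`. The fake classes and the `harvest_clearance` function under test are hypothetical; only the names `sync_playwright`, `chromium.launch_persistent_context`, `context.pages`, `new_page`, `goto`, `evaluate`, and `cookies` mirror the real Playwright/patchright sync API. In a pytest suite, `monkeypatch.setitem(sys.modules, ...)` would undo the injection after each test.

```python
import sys
import types
from contextlib import contextmanager


class FakePage:
    def __init__(self, context: "FakeContext") -> None:
        self.context = context
        self.visited: list = []

    def goto(self, url: str, **kwargs) -> None:
        self.visited.append(url)
        # Pretend the challenge cleared: the browser now holds a cf_clearance cookie.
        self.context.jar.append({"name": "cf_clearance", "value": "fake-token", "domain": ".peakbagger.com"})

    def evaluate(self, expression: str) -> str:
        return "Mozilla/5.0 FakeChrome"


class FakeContext:
    def __init__(self) -> None:
        self.jar: list = []
        self.pages: list = []

    def new_page(self) -> FakePage:
        page = FakePage(self)
        self.pages.append(page)
        return page

    def cookies(self, urls=None) -> list:
        return list(self.jar)

    def close(self) -> None:
        pass


class FakeChromium:
    def launch_persistent_context(self, user_data_dir: str, **kwargs) -> FakeContext:
        return FakeContext()


@contextmanager
def fake_sync_playwright():
    # The real sync_playwright() is also used as a context manager yielding the driver.
    yield types.SimpleNamespace(chromium=FakeChromium())


def install_fake_patchright() -> None:
    pkg = types.ModuleType("patchright")
    sync_api = types.ModuleType("patchright.sync_api")
    sync_api.sync_playwright = fake_sync_playwright
    pkg.sync_api = sync_api
    sys.modules["patchright"] = pkg
    sys.modules["patchright.sync_api"] = sync_api


def harvest_clearance(url: str, profile_dir: str):
    """Hypothetical code under test: returns (cf_clearance, user_agent) or None."""
    from patchright.sync_api import sync_playwright  # resolved from sys.modules

    with sync_playwright() as p:
        context = p.chromium.launch_persistent_context(profile_dir, channel="chrome", headless=False)
        try:
            # Reuse the persistent profile's first tab when one already exists.
            page = context.pages[0] if context.pages else context.new_page()
            page.goto(url)
            user_agent = page.evaluate("navigator.userAgent")
            for cookie in context.cookies():
                if cookie["name"] == "cf_clearance":
                    return cookie["value"], user_agent
            return None
        finally:
            context.close()


install_fake_patchright()
print(harvest_clearance("https://www.peakbagger.com/", "/tmp/chrome-profile"))
# ('fake-token', 'Mozilla/5.0 FakeChrome')
```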
Who's affected
End users of the `peakbagger` CLI: every `peak`/`ascent` command currently returns `403` because PeakBagger.com sits behind a Cloudflare managed challenge that `cloudscraper` cannot solve.

What changed
Before: `cloudscraper` requests to `search.aspx`, `peak.aspx`, `/climber/ascent.aspx`, etc. returned `403` with `cf-mitigated: challenge` (the "Just a moment..." Turnstile page). No tuning (User-Agent, headers, browser/platform, delays) helped: `cloudscraper` only solves the legacy IUAM challenge, not the modern managed one.

After: the HTTP client gains a lazy stealth-browser fallback. On a challenge response it drives a real, headful Chrome via patchright (a stealth-patched Playwright) to clear the challenge, harvests the `cf_clearance` cookie and the matching User-Agent, injects them into the existing `cloudscraper` session, and retries. The clearance is cached to `~/.cache/peakbagger/` and reused across runs until it expires, so the browser launches only occasionally. Rate limiting, parsing, and output formatting are unchanged. `patchright` is an optional `browser` extra (`pip install 'peakbagger[browser]'`); without it the CLI behaves as before and emits a clear install hint when a challenge is hit.

Verified live end-to-end: `peak search`, `peak show`, `peak stats`, `peak ascents`, and `ascent show` all return real data for Mount Shuksan (pid 1630), with one browser solve followed by all subsequent requests reusing the cached cookie.

Notable decisions
- … `Runtime.enable` leak; `browser-use` works but is heavyweight (~100 deps + LLM agent layer). patchright is the minimal stealth engine that passes, with a standard Playwright API.
- … `HeadlessChrome` UA + other signals), so the solve runs with a visible window. On headless CI/servers it needs `xvfb`.
- `cf_clearance` is not strictly fingerprint-bound here, so handing it to `cloudscraper` works, keeping requests fast and rate-limited rather than routing everything through a browser.
- Removed the custom CLI User-Agent (it broke `cloudscraper`'s browser fingerprint and was never necessary).
- Wrapped `bs4` `find()` href/src lambda predicates in `bool()` to satisfy the type checker (pre-existing `ty` findings in `scraper.py`).

Agent context
- `PeakBaggerClient.get()` now solves and retries once on a Cloudflare challenge (guarded by `_allow_solve` to prevent loops); the new `peakbagger/browser_transport.py` mints and caches the clearance.
- … `browser` extra (patchright); cache at `$XDG_CACHE_HOME/peakbagger/` (`cloudflare_clearance.json`, `chrome-profile/`); the env var `PEAKBAGGER_BROWSER_CHANNEL` overrides the Chrome channel.
- … `xvfb` wiring); single-instance (concurrent solves contend on the Chrome profile dir); cached cookie lifetime is bounded by Cloudflare (~minutes to ~1h).
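The cache layout in the notes above could look roughly like the sketch below. It is built on assumptions: the JSON keys, the 30-second expiry margin, and the `"chrome"` default channel are illustrative, and only the `$XDG_CACHE_HOME/peakbagger/` location, the `cloudflare_clearance.json` file name, and the `PEAKBAGGER_BROWSER_CHANNEL` variable come from the PR. `cache_dir`, `browser_channel`, `save_clearance`, and `load_clearance` are hypothetical helper names.

```python
import json
import os
import time
from pathlib import Path
from typing import Optional

CACHE_FILE = "cloudflare_clearance.json"


def cache_dir() -> Path:
    # $XDG_CACHE_HOME if set and non-empty, else ~/.cache (the XDG Base Directory default).
    base = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / "peakbagger"


def browser_channel() -> str:
    # PEAKBAGGER_BROWSER_CHANNEL overrides the channel; the "chrome" default is assumed.
    return os.environ.get("PEAKBAGGER_BROWSER_CHANNEL") or "chrome"


def save_clearance(value: str, user_agent: str, expires: float, directory: Path) -> None:
    directory.mkdir(parents=True, exist_ok=True)
    payload = {"cf_clearance": value, "user_agent": user_agent, "expires": expires}
    (directory / CACHE_FILE).write_text(json.dumps(payload))


def load_clearance(directory: Path, now: Optional[float] = None, margin: float = 30.0) -> Optional[dict]:
    try:
        payload = json.loads((directory / CACHE_FILE).read_text())
    except (OSError, ValueError):
        return None  # missing or corrupt cache: caller falls back to a browser solve
    if not isinstance(payload, dict):
        return None
    now = time.time() if now is None else now
    # Treat a cookie expiring within `margin` seconds as already stale, so a
    # request is not sent with a clearance that lapses mid-flight.
    if float(payload.get("expires", 0)) - margin <= now:
        return None
    return payload
```

Playwright reports session cookies with an `expires` of `-1`, which this check treats as stale; that errs toward one extra browser solve rather than a request that fails with `403`.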