fix: use Playwright to screenshot pages with text overlay #1

Open
carlos9917 wants to merge 1 commit into axrona:main from carlos9917:fix/text-in-images

@carlos9917

The original approach downloaded the raw CDN background images (absimg), which lack Scribd's text layer: the text is rendered as absolutely positioned HTML elements on top of the image, not embedded in it.

Replace the image download path with a Playwright-based renderer that:

  • Launches a headless Chromium browser with a 1280x4000 viewport (tall enough to fit a full document page without clipping)
  • Blocks consent-management-platform scripts (OneTrust etc.) which inject their own absimg elements that would corrupt page ordering
  • Scrolls through the document to trigger lazy-loading of each page
  • At each scroll position, screenshots the parent element of every visible absimg (the parent holds both the background image and the text overlay)
  • Filters img.absimg elements to only those served from Scribd CDN domains
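
The steps above could be sketched roughly as follows with Playwright's Python sync API. This is an illustration under stated assumptions, not the code in this PR: the helper names (`is_cmp_request`, `is_scribd_cdn`, `render_pages`), the host lists, the scroll step, and the wait time are all hypothetical, and the real document may scroll inside a container rather than the window.

```python
from pathlib import Path
from urllib.parse import urlparse

# Assumption: host fragments commonly used by OneTrust; adjust as needed.
CMP_HOST_HINTS = ("onetrust", "cookielaw")
# Assumption: domain suffix of Scribd's asset CDN; not taken from the PR.
SCRIBD_CDN_SUFFIXES = ("scribdassets.com",)


def is_cmp_request(url: str) -> bool:
    """True if the URL's host looks like a consent-management platform."""
    host = (urlparse(url).hostname or "").lower()
    return any(hint in host for hint in CMP_HOST_HINTS)


def is_scribd_cdn(url: str) -> bool:
    """True if the URL's host is (a subdomain of) an assumed Scribd CDN."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == s or host.endswith("." + s) for s in SCRIBD_CDN_SUFFIXES)


def render_pages(doc_url: str, out_dir: str, max_scrolls: int = 200) -> list:
    """Screenshot the parent of each img.absimg while scrolling the page."""
    # Imported lazily so the pure helpers above work without Playwright.
    from playwright.sync_api import sync_playwright

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    seen, saved = set(), []

    def block_cmp(route):
        # Abort CMP requests so they cannot inject their own absimg elements.
        if is_cmp_request(route.request.url):
            route.abort()
        else:
            route.continue_()

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        # Tall viewport so a full document page fits without clipping.
        page = browser.new_page(viewport={"width": 1280, "height": 4000})
        page.route("**/*", block_cmp)
        page.goto(doc_url, wait_until="domcontentloaded")

        for _ in range(max_scrolls):
            for img in page.query_selector_all("img.absimg"):
                src = img.get_attribute("src") or ""
                # is_visible() checks rendering state, not viewport overlap.
                if src in seen or not is_scribd_cdn(src) or not img.is_visible():
                    continue
                # The parent holds both the background image and the text.
                parent = img.query_selector("xpath=..")
                if parent is None:
                    continue
                seen.add(src)
                path = out / f"page_{len(saved) + 1:04d}.png"
                parent.screenshot(path=str(path))
                saved.append(path)

            at_bottom = page.evaluate(
                "window.scrollY + window.innerHeight"
                " >= document.documentElement.scrollHeight"
            )
            if at_bottom:
                break
            page.mouse.wheel(0, 4000)    # scroll to trigger lazy-loading
            page.wait_for_timeout(500)   # assumed settle time, in ms

        browser.close()
    return saved
```

Deduplicating on the image `src` keeps a page from being captured twice when it stays in view across scroll positions, and numbering files in capture order preserves page ordering as long as no foreign absimg elements slip through the CDN filter.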

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@carlos9917 carlos9917 closed this Apr 6, 2026
@carlos9917 carlos9917 deleted the fix/text-in-images branch April 6, 2026 07:45
@carlos9917 carlos9917 restored the fix/text-in-images branch April 6, 2026 07:45
@carlos9917 carlos9917 reopened this Apr 6, 2026