Add Plasmate integration #467

Open
dbhurley wants to merge 1 commit into deepset-ai:main from dbhurley:add-plasmate

@dbhurley commented May 2, 2026

Adds the Plasmate integration to the showcase.

This follows @anakin87's suggestion in deepset-ai/haystack#11056. That PR proposed a PlasmateFetcher directly inside the Haystack core, and @anakin87 redirected it here. The integration package itself already lives at plasmate-labs/haystack-plasmate (Haystack 2.0 components: PlasmateWebFetcher and PlasmateSOMConverter).

What Plasmate is

Plasmate is an open-source (Apache 2.0) browser engine that produces the Semantic Object Model (SOM), a flat, typed JSON representation of a web page optimized for LLM consumption. Its components are a drop-in alternative to LinkContentFetcher / HTMLToDocument, with roughly an order of magnitude lower token cost per page.
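
To make "flat, typed JSON" concrete, here is a minimal sketch of the general idea using only the Python standard library. It is not Plasmate's code and not the SOM schema: the record fields (`type`, `text`), the `TYPE_BY_TAG` mapping, and the `flatten` helper are all invented for illustration.

```python
import json
from html.parser import HTMLParser

# Hypothetical tag-to-type mapping, invented for this sketch only.
# It is NOT the real SOM type vocabulary.
TYPE_BY_TAG = {"h1": "heading", "p": "paragraph", "a": "link"}


class FlatTypedExtractor(HTMLParser):
    """Collects text from a few known tags into a flat list of typed records."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._open = []  # stack of recognized tags that are currently open

    def handle_starttag(self, tag, attrs):
        if tag in TYPE_BY_TAG:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        # Text outside any recognized tag (layout wrappers etc.) is dropped.
        if text and self._open:
            self.records.append(
                {"type": TYPE_BY_TAG[self._open[-1]], "text": text}
            )


def flatten(html):
    """Return a flat list of {"type", "text"} records for the given HTML."""
    parser = FlatTypedExtractor()
    parser.feed(html)
    parser.close()
    return parser.records


html = (
    '<div class="wrap"><h1>Pricing</h1>'
    '<div><p>See the <a href="/docs">docs</a></p></div></div>'
)
records = flatten(html)
print(json.dumps(records, indent=2))
```

The nested `div` wrappers disappear and each remaining item carries an explicit type, which is the intuition behind a flat, typed representation costing fewer tokens than raw markup.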

Public benchmark (38 sites, updated weekly): https://webtaskbench.com. Current results: 29.6× average compression, 9.8× median, 118.5× peak (cloud.google.com).
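
The gap between the average and the median is what you would expect when a few pages compress far more than the rest. A small sketch of that arithmetic follows, assuming the compression ratio is defined as raw-HTML tokens divided by SOM tokens (the benchmark's exact definition is not stated here). The token counts are made up for illustration and are not WebTaskBench data.

```python
from statistics import mean, median


def compression_ratio(html_tokens, som_tokens):
    """Assumed definition: tokens in the raw HTML divided by tokens in the SOM."""
    if som_tokens <= 0:
        raise ValueError("som_tokens must be positive")
    return html_tokens / som_tokens


# Hypothetical (html_tokens, som_tokens) pairs, NOT real benchmark numbers.
pages = [
    (10_000, 2_000),  # 5x
    (8_000, 1_000),   # 8x
    (30_000, 3_000),  # 10x
    (60_000, 500),    # 120x, a single heavy outlier
]

ratios = [compression_ratio(h, s) for h, s in pages]
avg_ratio = mean(ratios)
median_ratio = median(ratios)
peak_ratio = max(ratios)

print(f"average {avg_ratio}x, median {median_ratio}x, peak {peak_ratio}x")
```

With one outlier the average lands well above the median, so the median is the better guide to what a typical page saves.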

Files changed

  • integrations/plasmate.md — integration markdown matching the existing pattern (apify.md, anthropic.md, etc.) with Overview / Installation / Components / RAG pipeline example / License sections.

Logo

/logos/plasmate.png is referenced but not added in this PR. Happy to add it in a follow-up commit if you let me know your size/format preference (most existing logos look like ~512px PNG with transparency). Or feel free to drop one in directly.

Thanks for the steer to the right venue.

Plasmate is an open-source (Apache 2.0) browser engine for AI agents that
produces the Semantic Object Model (SOM) — a flat, typed JSON document
representing a web page in a form optimized for LLM consumption.

This integration adds PlasmateWebFetcher and PlasmateSOMConverter components
for Haystack 2.0 RAG pipelines as a drop-in alternative to LinkContentFetcher
and HTMLToDocument, with ~17x average token reduction across the public
WebTaskBench benchmark.

Reopening the redirect from deepset-ai/haystack#11056 (closed Apr 13 with
@anakin87 suggesting this venue).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@dbhurley requested a review from a team as a code owner May 2, 2026 18:36