From 82915a5c9808c3542048e5f4d13e4fa073da89d7 Mon Sep 17 00:00:00 2001
From: Rhyannon Joy Rodriguez <rhyannonjoy@gmail.com>
Date: Fri, 8 May 2026 16:13:18 -0700
Subject: [PATCH] [add] platforms (Issue #27)

---
 README.md                       |  2 +-
 SPEC.md                         | 20 +----------
 site/config/_default/menus.toml |  5 +++
 site/content/platforms.md       | 63 +++++++++++++++++++++++++++++++++
 4 files changed, 70 insertions(+), 20 deletions(-)
 create mode 100644 site/content/platforms.md

diff --git a/README.md b/README.md
index f0c012e..a3254c5 100644
--- a/README.md
+++ b/README.md
@@ -98,7 +98,7 @@ This spec is open for community review. We welcome:
 - **Proposed changes**: Submit a pull request (open an issue first for
   significant changes)
 - **Platform data**: If you know a platform's truncation limits, contribute to
-  the [Known Platform Limits](SPEC.md#known-platform-limits) table
+  [the Platforms tables](./site/content/platforms.md)
 - **Real-world results**: If you've evaluated your docs against this spec,
   we'd love to hear what you found
 
diff --git a/SPEC.md b/SPEC.md
index cc85418..3c01912 100644
--- a/SPEC.md
+++ b/SPEC.md
@@ -1503,25 +1503,7 @@ becomes available.
 
 ### Known Platform Limits
 
-| Platform | Truncation Limit | Source | Confidence | Notes |
-| ---------- | ----------------- | -------- | ------------ | ------- |
-| Claude Code | ~100,000 chars | [Reverse engineering](https://giuseppegurgone.com/claude-webfetch) | High | Trusted sites serving `text/markdown` under 100K chars bypass summarization model entirely. Content over this threshold goes through a summarization model that may lose information. |
-| MCP Fetch (reference server) | 5,000 chars (default) | [Official docs](https://pypi.org/project/mcp-server-fetch/) | High | Default `max_length` is 5,000 chars. Configurable up to 1,000,000. Supports chunked reading via `start_index`. |
-| Claude API (web_fetch tool) | ~20,700 chars - default, unset | [empirical testing](https://rhyannonjoy.github.io/agent-ecosystem-testing/) | Medium | Optional `max_content_tokens` parameter can cap content length, but no default truncation limit is documented. Distinct implementation from Claude Code client-side tool. Default truncation ~20,700 chars when unset - ended mid-word. `max_content_tokens` is approximate — setting 5,000 returned 17,186 chars. Truncation occurs mid-token. CSS stripped effectively unlike Claude Code. HTML boilerplate 81–97.5% before first heading; Markdown reduces content 77%. JS-rendered pages return static shell only. |
-| Google Gemini (URL context) | Unknown | [empirical testing](https://rhyannonjoy.github.io/agent-ecosystem-testing/) | Medium | Docs state a 34 MB max fetch size per URL, but this is a retrieval ceiling, not a processing limit. How much content actually reaches the model after fetching is undocumented. 20 URL hard limit per request, `400 INVALID_ARGUMENT` if exceeded, zero tokens consumed. Truncation boundary unknown — retrieved content is injected into context without a testable field; `tool_use_prompt_token_count` is the only available size proxy, <1% variance across runs. PDF failed consistently despite being a documented supported type; YouTube succeeded despite being documented as unsupported. `url_context_metadata` order is non-deterministic. Tested on `gemini-2.5-flash` only — behavior may vary across supported models. |
-| OpenAI (web search) | Unknown | [empirical testing](https://rhyannonjoy.github.io/agent-ecosystem-testing/) | Medium | 128K token context window for web search. `search_context_size` parameter (low/medium/high) controls context amount but no per-page truncation limit is surfaced; when the tool invokes, any truncation of retrieved source content occurs before the model generates a response and isn't observable via the APIs. Consistent latency lever in Chat Completions API track, high ~1.5–1.7× slower, inconsistent in Responses API track. Source count stable at 12 regardless of context size. Tool invocation conditional and deterministic: static facts and trivial math don't invoke the tool. Domain filtering documented but non-functional via Python SDK — allow-list worked once on `web_search_preview`, never on `web_search`; block-list never succeeded across 6 runs, 2 tool types, 2 models. `search_queries_issued` appends training-era year strings despite running in 2026. Tested on `gpt-4o` + `gpt-4o-mini-search-preview` - behavior may vary across supported models. |
-| Cursor | Method-dependent | [empirical testing](https://rhyannonjoy.github.io/agent-ecosystem-testing/) | High | No documented truncation limit, behavior varies between backend methods `WebFetch MCP` ~28KB, `urllib` ~72KB, other routes 240KB+; `Auto` agent routing opaque; Cursor autonomously selects fetch mechanism. On timeout, falls back to `curl` (unfiltered HTML, 16MB+ observed). Requests `text/markdown` via `Accept` header. No token limit detected (tested 6.68M tokens). Perfect reproducibility for same URL; high variance for small files across sessions. |
-| GitHub Copilot | No fixed ceiling detected | [empirical testing](https://rhyannonjoy.github.io/agent-ecosystem-testing/) | Medium | No documented web fetch or truncation details; tool selection is non-deterministic and not controllable by prompt. `fetch_webpage` identified through logs only; performs relevance-ranked semantic excerpts with `...` elision markers in HTML-to-Markdown transformation with chunk-based reassembly; output order doesn't always reflect page reading order. No size limit detected across 55 runs; `curl` substitution delivers full retrieval, raw bytes in server format with no transformation layer. `Auto` model routing dispatches across multiple models with no documented routing logic. Tested on `Claude Haiku 4.5`, `Claude Sonnet 4.6`, `GPT-5.3-Codex`, `GPT-5.4`, `Grok Code Fast 1`, `Raptor mini (Preview)`. |
-| Windsurf Cascade | No fixed ceiling detected at retrieval stage, but agent-dependent write ceiling | [empirical testing](https://rhyannonjoy.github.io/agent-ecosystem-testing/) | High | Two-stage pipeline `read_url_content` returns chunk index with summaries, metadata, requires sequential `view_content_chunk` calls. Full retrieval agent, doc size dependent. Full retrieval doesn't guarantee full content delivery. Agents often retrieve fully ~<14 chunks, spotty ~35, sparse sampling 50+. Includes per-chunk trucation, some chunk summaries include byte-count loss notices. CSS-heavy, SPAs often retrieve ~20-35% expected rendered size. `@web` syntax redundant with URL. Read-write asymmetry: agents that self-report full retrieval frequently fail to reproduce semantically-meaningful content with `curl` HTML/JS shells, false completions, cross-agent file reuse. |
-
-**Thank you to contributors!**
-
-- Claude API (web_fetch tool) limitations contributed by [Rhyannon Rodriguez](https://rhyannonjoy.github.io/agent-ecosystem-testing/)
-- Cursor limitations contributed by [Rhyannon Rodriguez](https://rhyannonjoy.github.io/agent-ecosystem-testing/)
-- GitHub Copilot limitations contributed by [Rhyannon Rodriguez](https://rhyannonjoy.github.io/agent-ecosystem-testing/)
-- Google Gemini (URL context) limitations contributed by [Rhyannon Rodriguez](https://rhyannonjoy.github.io/agent-ecosystem-testing/)
-- OpenAI (web search) limitations contributed by [Rhyannon Rodriguez](https://rhyannonjoy.github.io/agent-ecosystem-testing/)
-- Windsurf Cascade limitations contributed by [Rhyannon Rodriguez](https://rhyannonjoy.github.io/agent-ecosystem-testing/)
+Compare platform architecture and truncation limits in [Platforms](./site/content/platforms.md).
 
 ### What This Means for Threshold Selection
 
diff --git a/site/config/_default/menus.toml b/site/config/_default/menus.toml
index 6a87b6b..1e41ba9 100644
--- a/site/config/_default/menus.toml
+++ b/site/config/_default/menus.toml
@@ -3,6 +3,11 @@ name = "Spec"
 pageRef = "spec"
 weight = 10
 
+[[main]]
+name = "Platforms"
+pageRef = "platforms"
+weight = 15
+
 [[main]]
 name = "GitHub"
 url = "https://github.com/agent-ecosystem/agent-docs-spec"
diff --git a/site/content/platforms.md b/site/content/platforms.md
new file mode 100644
index 0000000..51c27fc
--- /dev/null
+++ b/site/content/platforms.md
@@ -0,0 +1,63 @@
+---
+title: "Platforms"
+description: "Agent platform comparisons for retrieval, truncation, and summarization layers."
+---
+
+| **Section** | **Description** |
+| ----------- | ------------------ |
+| [Retrieval](#retrieval) | How and when an agent fetches content |
+| [Truncation](#truncation) | What gets lost and whether agents report it |
+| [Summarization](#summarization) | What happens to content between retrieval and generation |
+
+## Retrieval
+
+The web fetch gap isn't in retrieval, but in what follows: how agents attend to various content types
+during generation, whether that's context window handling, chunking losses, or summarization. Platform
+links lead to each tool's official documentation.
+
+| **Platform** | **Prompt Syntax** | **Invocation Pattern** | **Retrieval Behavior** |
+| ---------- | ------------ | -------------------- | ----------------------- |
+| [Claude API web fetch](https://platform.claude.com/docs/en/agents-and-tools/tool-use/web-fetch-tool) | Enable tool to augment Claude's context with URL | _Mid-generation deterministic_: tool requires enablement in API request, includes URL validation and results cache, may or may not provide live web content | _Visibility high_: only platform where response body includes raw tool result; no JavaScript execution, CSS-heavy pages and/or SPAs often return little to no prose |
+| [Claude Code](https://code.claude.com/docs/en/overview) | `WebFetch` invoked automatically with prompt URL | _Mid-generation deterministic_: returns cached result if available, otherwise fetches live | _Visibility high_: Markdown result returned directly if trusted, text/markdown, <100k; otherwise a smaller LLM extracts relevant content before passing to Claude; no JavaScript execution |
+| [Cursor](https://cursor.com/docs) | _No web fetch behavior publicly documented_, `@Web` context attachment redundant, agents don't correct misuse | _Mid-generation nondeterministic_: `Auto` default setting autonomous LLM and fetch method selection per request | _Visibility low_: fetch method not explicitly named, no JavaScript execution, CSS-heavy pages and/or SPAs often return little to no prose; prefers Markdown, content negotation documented with `Accept: text/markdown`; sends full browser fingerprint: `User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36` with Chrome client hints `Sec-Ch-Ua`, `Sec-Fetch-*` |
+| [Gemini API URL context](https://ai.google.dev/gemini-api/docs/url-context) | Enable tool to augment Gemini's context with URL, request requires `url_context` with full, unnested URLs | _Pre-generation injection deterministic_: two-step process, fetches from internal cache, if unsuccessful, then live fetch; documentation includes parsing limitations | _Visibility low_: retrieved content injected into context without a testable field, retrieval orchestration and generation process opaque; `url_context_metadata` order _nondeterministic_, authoritative signal `url_retrieval_status`, `tool_use_prompt_token_count` only size proxy |
+| [GitHub Copilot](https://code.visualstudio.com/docs/copilot/overview) | _No web fetch behavior publicly documented_, prompt with URL | _Mid-generation nondeterministic_: `Auto` default setting autonomous LLM and fetch method selection per request | _Visibility medium_: intermittently tools named via error, `fetch_webpage` returns relevance-ranked excerpts with elision markers, occasional nonlinear, inaccurate reassembly, `curl` byte-perfect retrieval, but no prose; content negotiation tool-dependent, presents as a browser, but overclaims `User-Agent`: `Mozilla/5.0`, `AppleWebKit` `Accept`: full HTML, `curl/8.7.1` no preference,`Accept`: `*/*` |
+| [MCP Fetch (reference server)](https://pypi.org/project/mcp-server-fetch/) | `url` required; `max_length`, `start_index`, `raw` optional | _Mid-generation deterministic_: fetch invoked automatically with URL | _Visibility high_: returns extracted contents as Markdown; supports chunked reading via `start_index`, allowing LLM to page through content until it finds what's needed; no JavaScript execution |
+| [OpenAI web search](https://developers.openai.com/api/docs/guides/tools-web-search) | Chat Completions API augments `GPT`'s search with URL, Responses API for `web_search` | _Mid-generation nondeterministic: integration and agent-dependent_: static facts and trivial math don't invoke the tool; Chat Completions search implicit, Responses `web_search_preview` conditional, control cached/indexed or live content `external_web_access` | _Visibility low_: Responses `response.output`'s `web_search_call` names tools, but search context not equal to LLM context window; no JavaScript execution; `search_context_size: low/medium/high` controls context amount, Chat Completions latency lever consistent, but Responses inconsistent |
+| [Windsurf Cascade](https://docs.windsurf.com/windsurf/cascade/web-search) | Web and docs search partially documented, `@web` directive redundant with URL, agents don't correct misuse | _Mid-generation deterministic_: autonomous two-stage pipeline designed to emulate human browsing and skimming, documentation acknowledges not all pages parseable | _Visibility medium_: `read_url_content` returns chunk index with summaries, metadata and requires sequential `view_content_chunk` calls; `curl` substitution for CSS-heavy pages, SPAs return ~20–35% of expected size, little or no prose; agents used `@web`'s `web_search` as verification once every ~60 turns; presentation transparent about using crawler-scaper, but underdelivers, `User-Agent`: [Colly](https://github.com/gocolly/colly) |
+
+## Truncation
+
+Pipelines are lossy by design in attempt to balance token cost, speed, and access to fresh content.
+Agents intermittently acknowledge architectural constraints, misattribute truncation causes, or
+self-report completeness when content is incomplete or unusable. Platform links lead to empirical
+testing analysis and/or tool documentation.
+
+
+| **Platform** | **Truncation Limit** | **Observations** |
+| ---------- | ----------------- | ------- |
+| [Claude API web fetch](https://rhyannonjoy.github.io/agent-ecosystem-testing/docs/anthropic-claude-api-web-fetch-tool/claude-interpreted-vs-raw) | ~20,700 chars and/or ~100 KB of rendered content _default unset_ | `max_content_tokens` approximate, setting 5,000 returned 17,186 chars, truncation occurs mid-token. Default limit identified in raw track, self-report attributed missing content to JavaScript rendering, masking character limit. |
+| [Claude Code](https://giuseppegurgone.com/claude-webfetch) | ~100,000 chars | Trusted sites serving `text/markdown` under 100K chars bypass summarization, while content over 100K chars are passed to a summarization LLM. |
+| [Cursor](https://rhyannonjoy.github.io/agent-ecosystem-testing/docs/anysphere-cursor/cursor-interpreted-vs-raw)** | 28 KB–240 KB+ _method-dependent_, _nondeterministic filtering_ | `WebFetch MCP` ~28 KB, `urllib` ~72 KB, unknown path 245 KB+, `curl` no ceiling detected; appears to apply structure-aware content filtering, navigation and CSS stripped, but content selection heuristic presents as complete, so agents don't report truncation. |
+| [Gemini API URL context](https://rhyannonjoy.github.io/agent-ecosystem-testing/docs/google-gemini-url-context-tool/gemini-interpreted-vs-raw) | _No fixed ceiling or silent dropping detected_, 20 URLs hard limit per request | API-layer rejection returns `400` and doesn't consume tokens; retrieval-layer failure completes the request, but records `URL_RETRIEVAL_STATUS_ERROR`. Format support inconsistent with documentation: PDF fails, YouTube succeeds, JSON nondeterministic; Google Docs fail consistently. |
+| [GitHub Copilot](https://rhyannonjoy.github.io/agent-ecosystem-testing/docs/microsoft-github-copilot/copilot-interpreted-vs-raw) | _No fixed ceiling detected, _nondeterministic excerpting_, tested 6.68M tokens | Pipeline with `fetch_webpage` discards whole sections or more granularly before generation, `curl` delivers all raw bytes but unreadable, chat rendering cutoff visible in output, not persisted as requested, but agents don't reliably report these results as truncation. |
+| [MCP Fetch (reference server)](https://pypi.org/project/mcp-server-fetch/) | Default 5,000 chars | Default `max_length` is 5,000 chars, but configurable up to 1,000,000; uniquely user-controlled truncation. |
+| [OpenAI web search](https://rhyannonjoy.github.io/agent-ecosystem-testing/docs/open-ai-web-search-tool/chatgpt-interpreted-vs-raw) | _No fixed ceiling or silent dropping detected_ | Raw source count stable at 12 regardless of `search_context_size` setting. Query construction not temporally aware, internal queries append training-era date strings despite running in 2026. Documented domain filtering limits not functional in Python SDK. |
+| [Windsurf Cascade](https://rhyannonjoy.github.io/agent-ecosystem-testing/docs/cognition-windsurf-cascade/cascade-interpreted-explicit-vs-raw) | _No fixed ceiling detected at retrieval stage_, _nondeterministic agent-dependent write ceiling_ | Full retrieval agent and doc-size-dependent. Agents often retrieve fully under ~14 chunks, spotty at ~35, sparse sampling at 50+. Chunk index summary population not guaranteed, those present often include byte-count loss notices. Unique read-write asymmetry. Agents often self-report full retrieval, but fail to prove it with a write task or report truncation. |
+
+## Summarization
+
+Processing layer observability vary by implementation. Platforms often offer user-configured subagents
+while turn-by-turn chat interactions abstract any default orchestrator-subagent relationships away.
+Observable outputs from default settings primarily inform the conclusions below.
+
+| **Platform** | **Processing Layer** | **Inference** |
+| ---------- | -------------------- | ------------ |
+| [Claude API web fetch](https://rhyannonjoy.github.io/agent-ecosystem-testing/docs/anthropic-claude-api-web-fetch-tool/methodology) | _Dynamic filtering optional_, `web_fetch_20260209` | Server-side tool called directly with inspectable tool result in response. [Dynamic filtering](https://platform.claude.com/docs/en/agents-and-tools/tool-use/web-fetch-tool) available with certain LLMs in which Claude writes, executes code to filter before content reaches the context window, but it's not default behavior. |
+| [Claude Code](https://giuseppegurgone.com/claude-webfetch) | _Summarization threshold-triggered_ | Content under ~100K chars from trusted text/markdown sources reaches the context window directly without intermediate processing, but content exceeding this threshold goes through a summarization LLM that may lose information. |
+| [Cursor](https://rhyannonjoy.github.io/agent-ecosystem-testing/docs/anysphere-cursor/methodology) | _Inferred via filtering, undocumented for web fetch_ | Codebase research, terminal commands, and browser automation requests trigger [built-in subagents](https://cursor.com/docs/subagents) `explore`, `bash`, and `browser`. Test prompts likely invoked `explore` and `bash` alongside web fetch. Backend routing and structure-aware content filtering suggest a pre-generation processing layer, not a passive, linear pipeline. |
+| [Gemini API URL context](https://rhyannonjoy.github.io/agent-ecosystem-testing/docs/google-gemini-url-context-tool/methodology) | _API layer pipeline, undocumented_ | Pre-generation injection suggests processing occurs before LLM invocation. No transformation layer between retrieval and generation; LLM receives content directly and any summarization occurs as part of generation, not as an intermediate pipeline stage. |
+| [GitHub Copilot](https://rhyannonjoy.github.io/agent-ecosystem-testing/docs/microsoft-github-copilot/methodology) | _Inferred via relevance-ranking, undocumented for web fetch_ | Reassembled excerpts, outputs that don't note discarded content, browser masquerading, and tool substitution patterns suggests an orchestrator-subagent relationship and not a linear, passive pipeline. Agent loop descriptions vary by implementation. [VS Code-Copilot docs](https://code.visualstudio.com/docs/copilot/agents/subagents) describe subagent delegation as _main agent-initiated_ for complex tasks with further config available, but [Copilot SDK docs](https://docs.github.com/en/copilot/how-tos/copilot-sdk/use-copilot-sdk/custom-agents) only describe subagents as configurable, and not default architecture. |
+| MCP Fetch (reference server) | _None_ hard truncation at `max_length` | Passive, linear pipeline without a processing layer. |
+| [OpenAI web search](./open-ai-web-search-tool/methodology.md) | _Differs by API surface, undocumented_ | Chat Completions autonomously retrieves, but Responses' LLM actively manages search in the chain of thought with `open_page` and `find_in_page`, suggesting a processing layer, but not explicitly documented or named in either API responses. |
+| [Windsurf Cascade](https://rhyannonjoy.github.io/agent-ecosystem-testing/docs/cognition-windsurf-cascade/methodology) | _Inferred via chunking, undocumented for web and docs search_ | Codebase research triggers [built-in subagent Fast Context](https://docs.windsurf.com/context-awareness/fast-context). Test prompts likely invoked Fast Context alongside web search. Chunk analysis, tool substitution, terminal execution, and workspace referencing suggest an extensive processing layer, not a passive, linear pipeline. |