Upstream sanitizer api by noamr · Pull Request #12395 · whatwg/html

noamr · 2026-04-21T13:16:44Z

Convert the incubated spec in https://wicg.github.io/sanitizer-api/ to the HTML format and make it part of the HTML standard.

At least two implementers are interested (and none opposed):
- Gecko/Chromium already shipping
Tests are written and can be reviewed and commented upon at:
- Remove "tentative" from sanitizer API tests. web-platform-tests/wpt#59400
Implementation bugs are filed:
- Chromium: already supported
- Gecko: already supported
- WebKit: https://bugs.webkit.org/show_bug.cgi?id=303808
Corresponding HTML AAM & ARIA in HTML issues & PRs: N/A
MDN issue is filed: already there: https://developer.mozilla.org/en-US/docs/Web/API/Sanitizer
The top of this comment includes a clear commit message to use.

(See WHATWG Working Mode: Changes for more details.)

/canvas.html ( diff )
/comms.html ( diff )
/dom.html ( diff )
/dynamic-markup-insertion.html ( diff )
/edits.html ( diff )
/embedded-content-other.html ( diff )
/form-elements.html ( diff )
/forms.html ( diff )
/grouping-content.html ( diff )
/iframe-embed-object.html ( diff )
/image-maps.html ( diff )
/imagebitmap-and-animations.html ( diff )
/index.html ( diff )
/indices.html ( diff )
/infrastructure.html ( diff )
/interaction.html ( diff )
/interactive-elements.html ( diff )
/microdata.html ( diff )
/parsing.html ( diff )
/references.html ( diff )
/rendering.html ( diff )
/sections.html ( diff )
/semantics.html ( diff )
/system-state.html ( diff )
/tables.html ( diff )
/text-level-semantics.html ( diff )
/timers-and-user-prompts.html ( diff )
/web-messaging.html ( diff )
/webstorage.html ( diff )
/workers.html ( diff )

noamr · 2026-04-21T20:11:47Z

@zcorpan @evilpie @mozfreddyb @otherdaniel
initial review? :)
this is quite a big PR...

evilpie · 2026-04-22T10:07:01Z

Amazing, thanks for working on this.

The built-in safe default configuration is pretty integral to the API, where did it go?

For anyone else looking at this, the gist of the changes are in dynamic-markup-insertion.html.

noamr · 2026-04-22T10:26:27Z

> Amazing, thanks for working on this.
>
> The built-in safe default configuration is pretty integral to the API, where did it go?

Oh you're right, I had it on my todo list and forgot. Getting to it. Thanks!

noamr · 2026-04-22T10:54:17Z

> > Amazing, thanks for working on this.
> > The built-in safe default configuration is pretty integral to the API, where did it go?
>
> Oh you're right, I had it on my todo list and forgot. Getting to it. Thanks!

Done.

annevk · 2026-04-22T20:43:22Z

I thought as part of moving this into the HTML standard we'd also address the parser integration issue?

noamr · 2026-04-23T07:33:37Z

> I thought as part of moving this into the HTML standard we'd also address the parser integration issue?

This is a huge PR, so I thought doing it in two stages, the first one being a purely technical upstream, would be easier to review.

I'm open and happy to incorporate the stream-while-parsing changes in this PR if you and @zcorpan are OK to review that in one go.

noamr · 2026-04-23T07:58:32Z

@zcorpan @annevk can we align on whether we upstream the sanitizer as is and then change it to stream-while-parsing, or do it in one go? I'm perfectly happy with both options.

zcorpan · 2026-04-23T08:59:38Z

I prefer doing the parser integration in a follow-up PR.

evilpie · 2026-04-23T11:29:55Z

I think these three PRs would be good to merge before merging into the HTML standard:

noamr · 2026-04-23T12:16:00Z

> I think these three PRs would be good to merge before merging into the HTML standard:
>
> - Remove custom element state when is attribute is blocked WICG/sanitizer-api#396
> - Clarify condition by splitting it up into explicitly named bools. WICG/sanitizer-api#385
> - Replace replace all with steps from replaceWith. WICG/sanitizer-api#375

Since some security-sensitive changes rely on "sanitizing while parsing", and that in turn relies on the current post-processing sanitizer being upstreamed, I don't think we should delay upstreaming any further.

Can we race it? If any of these go in before the upstream PR is in, I'll incorporate them into the HTML PR.

noamr · 2026-04-28T08:34:51Z

+  data-x="dom-SanitizerProcessingInstruction-target">target</code> member.</p>
+  </div>
+
+  <div algorithm>


These algorithms look like they belong in Infra... would people be open to adding an optional comparator predicate to those, or to the definition of list/ordered set?

@annevk @zcorpan

Moving this to Infra SGTM.

Opened whatwg/infra#709 for now.
I'm not sure about the comparator thing: Infra doesn't really say what it means for two items in a list to be the same. Would it be enough to mention here what makes items of attributes/elements/... lists "equal"?
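
For readers following the comparator question, here is a minimal sketch of what list and ordered-set operations parameterized by an equality predicate could look like. It is not the Infra or HTML definition. The `NamedItem` shape, the predicate, and all function names are hypothetical, and name-plus-namespace equality is an assumption made for illustration.

```typescript
// Hypothetical item shape: a name/namespace pair, as used in sanitizer config lists.
interface NamedItem {
  name: string;
  namespace: string | null;
}

// An equality predicate decides when two list items count as "the same".
type Equals<T> = (a: T, b: T) => boolean;

// Assumed notion of equality: both name and namespace match.
const sameNameAndNamespace: Equals<NamedItem> = (a, b) =>
  a.name === b.name && a.namespace === b.namespace;

// "Contains" for a list, using the supplied predicate instead of identity.
function listContains<T>(list: readonly T[], item: T, equals: Equals<T>): boolean {
  return list.some((entry) => equals(entry, item));
}

// "Append" for an ordered set: adds the item only if no equal item is present.
// Returns true if the set changed.
function orderedSetAppend<T>(set: T[], item: T, equals: Equals<T>): boolean {
  if (listContains(set, item, equals)) {
    return false;
  }
  set.push(item);
  return true;
}

// "Remove" for a list: drops every entry equal to the item, in place.
// Returns true if anything was removed.
function listRemove<T>(list: T[], item: T, equals: Equals<T>): boolean {
  const kept = list.filter((entry) => !equals(entry, item));
  const removed = kept.length !== list.length;
  list.splice(0, list.length, ...kept);
  return removed;
}

// Usage: two structurally equal dictionaries are distinct objects,
// so identity comparison alone would treat them as different items.
const EXAMPLE_NS = "http://www.w3.org/1999/xhtml";
const exampleElements: NamedItem[] = [];
orderedSetAppend(exampleElements, { name: "p", namespace: EXAMPLE_NS }, sameNameAndNamespace);
orderedSetAppend(exampleElements, { name: "p", namespace: EXAMPLE_NS }, sameNameAndNamespace);
```

The second append leaves the set unchanged because the predicate, not object identity, decides membership. That is the gap the comment points at when it says Infra does not define what makes two items the same.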

otherdaniel

Thank you, and I'm super happy to see this happening!

I wonder if we can link to the "Security Considerations" section in the current spec; or have them in a supplementary document somewhere?

noamr · 2026-04-28T13:22:15Z

> Thank you, and I'm super happy to see this happening!
>
> I wonder if we can link to the "Security Considerations" section in the current spec; or have them in a supplementary document somewhere?

I've upstreamed them instead into a security considerations subsection.

annevk · 2026-04-28T17:49:18Z

   <li><p>Return <var>document</var>.</p></li>
  </ol>
  </div>

  </div>

+  <!-- https://github.com/WICG/sanitizer-api/commit/c4e328037ab6cd9c753b12694f5dcfc14988dec5 -->
+
+  <h4>Safe HTML parsing methods</h4>


I don't think we should use "Safe". Just "HTML parsing methods" is fine. For the same reason we don't say "safe" in APIs.

Moved them together with the "unsafe" methods and explained the difference.

annevk · 2026-04-28T17:51:11Z

+  into an element's <code data-x="dom-Element-innerHTML">innerHTML</code> is fraught with risk, as
+  it can cause script execution in a number of unexpected ways.</p>
+
+  <p>Libraries like <cite>DOMPurify</cite> attempt to manage this problem by carefully parsing and


I think we should trim a bunch of this text. This is a standard, not a justification for the existence of this feature. We also can't assume familiarity with libraries so it's best to just not mention them.

Trimmed considerably

annevk · 2026-04-28T17:53:50Z

+   </li>
+  </ul>
+
+  <h4>Processing model</h4>


This section appears to define API. "Processing model" is generally reserved for something more abstract.

annevk · 2026-04-28T17:55:11Z

+   }</p></li>
+  </ul>
+
+  <h4 id="sanitizer-security-considerations">Security Considerations</h4>


A lot of the headings here don't appear to follow our title case convention.

noamr · 2026-04-30T12:25:29Z

I've refactored some of the sanitization constants to go into each element's definition instead of being in one huge table. I think that makes it less error-prone when we add new elements in the future. If that's undesirable, I'm happy to revert.

zcorpan · 2026-05-06T08:21:58Z

+  data-x="dom-SanitizerProcessingInstruction-target">target</code> member.</p>
+  </div>
+
+  <div algorithm>


Moving this to Infra SGTM.

zcorpan · 2026-05-06T08:24:12Z

+   <li><p>If <var>element</var> is a string, then return a new
+   <span>SanitizerElementNamespace</span> dictionary with its <code
+   data-x="dom-SanitizerElementNamespace-name">name</code> member set to <var>element</var> and its
+   <code data-x="dom-SanitizerElementNamespace-namespace">_namespace</code> member set to the


Remove the _ outside WebIDL syntax.
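
As background for this nit: the leading underscore only escapes the reserved word `namespace` in Web IDL, so script sees the member as plain `namespace`. The step quoted above could be sketched roughly as follows. The quoted hunk is cut off before it states which namespace is used, so the default here (the HTML namespace) is an assumption, as are the type and function names.

```typescript
// The HTML namespace URI. Using it as the default below is an assumption.
const HTML_NAMESPACE = "http://www.w3.org/1999/xhtml";

// Canonical form: a dictionary with both members present.
interface CanonicalElement {
  name: string;
  namespace: string | null;
}

// A config entry is either a bare string or a dictionary. Note the member is
// spelled "namespace" here; "_namespace" exists only in the Web IDL source.
type ElementEntry = string | { name: string; namespace?: string | null };

// Hypothetical helper mirroring the quoted step: a string becomes a dictionary
// whose name is the string and whose namespace is the default.
function canonicalizeElement(
  entry: ElementEntry,
  defaultNamespace: string | null = HTML_NAMESPACE,
): CanonicalElement {
  if (typeof entry === "string") {
    return { name: entry, namespace: defaultNamespace };
  }
  return {
    name: entry.name,
    // Assumption: a missing namespace member falls back to the default.
    namespace: entry.namespace === undefined ? defaultNamespace : entry.namespace,
  };
}
```

With this shape, `canonicalizeElement("div")` and `canonicalizeElement({ name: "div" })` produce equal dictionaries, while an explicit `namespace: null` is preserved.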

zcorpan · 2026-05-06T08:26:25Z

+     <li><dfn data-x-href="https://w3c.github.io/svgwg/svg2-draft/shapes.html#elementdef-line">SVG <code>line</code></dfn> element</li>
+     <li><dfn data-x-href="https://w3c.github.io/svgwg/svg2-draft/painting.html#elementdef-marker">SVG <code>marker</code></dfn> element</li>
+     <li><dfn data-x-href="https://w3c.github.io/svgwg/svg2-draft/struct.html#elementdef-metadata">SVG <code>metadata</code></dfn> element</li>
+     <li><dfn data-lt="SVG path" data-x="SVG path element" data-x-href="https://w3c.github.io/svgwg/svg2-draft/paths.html#elementdef-path">The SVG <code>path</code> element</dfn></li>


Why does this have "The"?

zcorpan · 2026-05-06T08:27:15Z

@@ -15939,7 +16036,7 @@ interface <dfn interface>DOMStringMap</dfn> {
   data-x="concept-element-accessibility-considerations">Accessibility considerations</span>:</dt>
   <dd><a href="https://w3c.github.io/html-aria/#el-html">For authors</a>.</dd>
   <dd><a href="https://w3c.github.io/html-aam/#el-html">For implementers</a>.</dd>
-   <dt><span data-x="concept-element-dom">DOM interface</span>:</dt>
+    <dt><span data-x="concept-element-dom">DOM interface</span>:</dt>


Remove extra indentation. (Also more cases below.)

zcorpan · 2026-05-06T08:35:01Z

+
+  <h5>Safe and unsafe</h5>
+
+  The "safe" methods will not generate any markup that executes script. That is, they are intended


Missing <p>

zcorpan · 2026-05-06T12:23:33Z

+  Mutated XSS or mXSS describes an attack based on parser context mismatches when parsing an HTML
+  snippet without the correct context. In particular, when a parsed HTML fragment has been


Not necessarily, mXSS can happen even when the context is the same. It's just that the resulting DOM is different after serializing and parsing it again, and the difference is exploited to bypass sanitization.

Thus, the strategy parse→sanitize→serialize→parse is not safe.

This is copied from the current sanitizer S&P section... Happy for a rephrase. Maybe @mozfreddyb?

zcorpan · 2026-05-06T12:30:10Z

+  when inserted into a different parent element. An example for carrying out such an attack is by
+  relying on the change of parsing behavior for foreign content or mis-nested tags. The Sanitizer
+  API offers only functions that turn a string into a node tree. The context is supplied implicitly
+  by all sanitizer functions: Element.setHTML() uses the current element; Document.parseHTML()


xref setHTML() (remove "Element." since it's not a static method) and Document.parseHTML()

zcorpan · 2026-05-06T12:30:45Z

+  by all sanitizer functions: Element.setHTML() uses the current element; Document.parseHTML()
+  creates a new document. Therefore Sanitizer API is not directly affected by mutated XSS. If a
+  developer were to retrieve a sanitized node tree as a string, e.g. via .innerHTML, and to then
+  parse it again then mutated XSS may occur. We discourage this practice. If processing or passing


Don't use "may"

zcorpan · 2026-05-06T12:31:39Z

+  by all sanitizer functions: Element.setHTML() uses the current element; Document.parseHTML()
+  creates a new document. Therefore Sanitizer API is not directly affected by mutated XSS. If a
+  developer were to retrieve a sanitized node tree as a string, e.g. via .innerHTML, and to then
+  parse it again then mutated XSS may occur. We discourage this practice. If processing or passing


Rephrase "We discourage this practice" (maybe "This practice is strongly discouraged.")

zcorpan · 2026-05-06T12:31:59Z

+  creates a new document. Therefore Sanitizer API is not directly affected by mutated XSS. If a
+  developer were to retrieve a sanitized node tree as a string, e.g. via .innerHTML, and to then
+  parse it again then mutated XSS may occur. We discourage this practice. If processing or passing
+  of HTML as a string should be necessary after all, then any string should be considered untrusted


Don't use "should"
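
To make the pattern under discussion concrete, here is a small sketch of the "treat any string as untrusted" approach: markup that has been serialized is sanitized again on the way in, rather than being assigned through `innerHTML`. The two interfaces are hypothetical structural stand-ins so the sketch does not depend on DOM typings, and only the first parameter of `setHTML()` is modeled.

```typescript
// Hypothetical stand-in for anything that can be serialized to markup.
interface HtmlSource {
  readonly innerHTML: string;
}

// Hypothetical stand-in for an element exposing setHTML().
interface SanitizingTarget {
  setHTML(html: string): void;
}

// Copies markup from one tree to another. The serialized string is handed to
// setHTML(), so it is parsed in the context of the target and sanitized again,
// instead of being trusted because the source tree was sanitized earlier.
function copyMarkupSafely(source: HtmlSource, target: SanitizingTarget): void {
  const serialized = source.innerHTML;
  target.setHTML(serialized);
}
```

The risky variant would be `target.innerHTML = source.innerHTML`, which reparses the string with no sanitization step, so any difference between the original tree and the reparsed one goes unchecked.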

zcorpan

Nits and reword text for mXSS

zcorpan · 2026-05-07T14:10:06Z

+  boolean <dfn dict-member for="SanitizerConfig" data-x="dom-SanitizerConfig-dataAttributes">dataAttributes</dfn>;
+};</code></pre>
+
+  <h5>Configuration invariants</h5>


Sorry, now the whole section is marked non-normative but I think part of it should be normative. Maybe split into a non-normative introduction and a normative section for the processing/algorithms (from the definition of equality onwards)? And replace normative keywords (may, should, must) in the non-normative part.

zcorpan · 2026-05-07T14:17:48Z

+
+  <div w-nodev>
+
+  <span data-x="list item">Items</span> of <code>SanitizerConfig</code>'s <code


Done (and changed a bit to use dictionary types)

zcorpan · 2026-05-07T14:20:34Z

+   <li><p>Duplicates and interactions between global and local lists:</p>
+    <ul>
+     <li><p>If a global <code data-x="dom-SanitizerConfig-attributes">attributes</code> allow list
+     exists, then all element's local lists:</p>


The <p>s should be on their own line (with a blank line after) here, it should only be "inline" when it's the only child of li. This applies in more places in the diff.

noamr · 2026-05-07T15:07:06Z

@zcorpan re #12395 (comment): I reordered things a bit so that the non-normative section only applies to the right part.

noamr marked this pull request as draft April 21, 2026 13:16

noamr changed the base branch from zcorpan/upstream-sanitizer-api to main April 21, 2026 13:17

noamr changed the title ~~WIP upstream sanitizer api~~ Upstream sanitizer api Apr 21, 2026

noamr force-pushed the zcorpan/upstream-sanitizer-api branch from 223a4d1 to d2034e5 Compare April 21, 2026 19:42

zcorpan reviewed Apr 22, 2026

noamr marked this pull request as ready for review April 22, 2026 10:56

zcorpan reviewed Apr 22, 2026

noamr added the agenda+ To be discussed at a triage meeting label Apr 23, 2026

evilpie reviewed Apr 23, 2026

noamr removed the agenda+ To be discussed at a triage meeting label Apr 23, 2026

noamr closed this Apr 23, 2026

noamr reopened this Apr 23, 2026

noamr commented Apr 28, 2026

otherdaniel reviewed Apr 28, 2026

otherdaniel reviewed Apr 28, 2026

noamr force-pushed the zcorpan/upstream-sanitizer-api branch from ea79a5b to 1e065df Compare April 28, 2026 13:07

annevk reviewed Apr 28, 2026

zcorpan requested changes May 6, 2026

noamr added 20 commits May 6, 2026 14:35

nits

b482512

Include built-ins

faa9083

Add intro h

d27807f

Use JSON also for the other built-ins

77fe17f

Use linked tables instead of json

d50cf19

refs

dce4487

nits

e949601

nits

ce8a03e

Cleanup algorithms

e6df589

Clean up algos

d562a6c

Clean up algos

13bf984

nits

a2db66f

nits

9074a1a

secpriv

7f311d7

Add mXSS biblio

7f95a3f

wip

2777efd

Reduce intro

f0f01f6

Refer to event handlers table

253d7b5

Explode sanitization constants into elements for HTML

a345029

Lotsa nits

46b964f

noamr force-pushed the zcorpan/upstream-sanitizer-api branch from 43519ba to 46b964f Compare May 6, 2026 13:36

noamr added 4 commits May 7, 2026 10:33

Move stuff to infra

931c908

nits

edf004a

equality

dbe6500

Link to infra

76254df

noamr requested a review from zcorpan May 7, 2026 09:59

zcorpan reviewed May 7, 2026

nits

1f9863f

may-

6259dfd


