Skip to content

Upstream sanitizer api#12395

Open
noamr wants to merge 42 commits intowhatwg:mainfrom
noamr:zcorpan/upstream-sanitizer-api
Open

Upstream sanitizer api#12395
noamr wants to merge 42 commits intowhatwg:mainfrom
noamr:zcorpan/upstream-sanitizer-api

Conversation

@noamr
Copy link
Copy Markdown
Collaborator

@noamr noamr commented Apr 21, 2026

Convert the incubated spec in https://wicg.github.io/sanitizer-api/ to the HTML format and make it part of the HTML standard.

(See WHATWG Working Mode: Changes for more details.)


/canvas.html ( diff )
/comms.html ( diff )
/dom.html ( diff )
/dynamic-markup-insertion.html ( diff )
/edits.html ( diff )
/embedded-content-other.html ( diff )
/form-elements.html ( diff )
/forms.html ( diff )
/grouping-content.html ( diff )
/iframe-embed-object.html ( diff )
/image-maps.html ( diff )
/imagebitmap-and-animations.html ( diff )
/index.html ( diff )
/indices.html ( diff )
/infrastructure.html ( diff )
/interaction.html ( diff )
/interactive-elements.html ( diff )
/microdata.html ( diff )
/parsing.html ( diff )
/references.html ( diff )
/rendering.html ( diff )
/sections.html ( diff )
/semantics.html ( diff )
/system-state.html ( diff )
/tables.html ( diff )
/text-level-semantics.html ( diff )
/timers-and-user-prompts.html ( diff )
/web-messaging.html ( diff )
/webstorage.html ( diff )
/workers.html ( diff )

@noamr noamr marked this pull request as draft April 21, 2026 13:16
@noamr noamr changed the base branch from zcorpan/upstream-sanitizer-api to main April 21, 2026 13:17
@noamr noamr changed the title WIP upstream sanitizer api Upstream sanitizer api Apr 21, 2026
@noamr noamr force-pushed the zcorpan/upstream-sanitizer-api branch from 223a4d1 to d2034e5 Compare April 21, 2026 19:42
@noamr
Copy link
Copy Markdown
Collaborator Author

noamr commented Apr 21, 2026

@zcorpan @evilpie @mozfreddyb @otherdaniel
initial review? :)
this is quite a big PR...

@evilpie
Copy link
Copy Markdown
Member

evilpie commented Apr 22, 2026

Amazing, thanks for working on this.

The built-in safe default configuration is pretty integral to the API, where did I go?

For anyone else looking at this, the gist of the changes are in dynamic-markup-insertion.html.

@noamr
Copy link
Copy Markdown
Collaborator Author

noamr commented Apr 22, 2026

Amazing, thanks for working on this.

The built-in safe default configuration is pretty integral to the API, where did I go?

Oh you're right I had it on my todo list and forgot. Getting to it. Thanks!

Comment thread source
Comment thread source Outdated
@noamr
Copy link
Copy Markdown
Collaborator Author

noamr commented Apr 22, 2026

Amazing, thanks for working on this.
The built-in safe default configuration is pretty integral to the API, where did I go?

Oh you're right I had it on my todo list and forgot. Getting to it. Thanks!

Done.

@noamr noamr marked this pull request as ready for review April 22, 2026 10:56
Comment thread source Outdated
@annevk
Copy link
Copy Markdown
Member

annevk commented Apr 22, 2026

I thought as part of moving this into the HTML standard we'd also address the parser integration issue?

@noamr
Copy link
Copy Markdown
Collaborator Author

noamr commented Apr 23, 2026

I thought as part of moving this into the HTML standard we'd also address the parser integration issue?

This is a huge PR so I thought doing it in two stages, the first one being a purely technical upstream, would be easier to review?

Open and happy to incorporate the stream-while-parsing changes in this PR if you and @zcorpan are ok to review that in one go.

@noamr
Copy link
Copy Markdown
Collaborator Author

noamr commented Apr 23, 2026

@zcorpan @annevk can we align on whether we upstream the sanitizer as is and then change it to stream-while-parsing, or do it in one go? I'm perfectly happy with both options.

@noamr noamr added the agenda+ To be discussed at a triage meeting label Apr 23, 2026
@zcorpan
Copy link
Copy Markdown
Member

zcorpan commented Apr 23, 2026

I prefer doing the parser integration in a follow-up PR.

Comment thread source Outdated
@noamr noamr removed the agenda+ To be discussed at a triage meeting label Apr 23, 2026
@evilpie
Copy link
Copy Markdown
Member

evilpie commented Apr 23, 2026

@noamr
Copy link
Copy Markdown
Collaborator Author

noamr commented Apr 23, 2026

I think these three PRs would be good to merge before merging into the HTML standard:

Since some security sensitive changes rely on "sanitizing while parsing", and that in turn relies on the current post-processing sanitizer being upstreamed, I don't think we should delay upstreaming any further.

Can we race it? If any of these go in before the upstream PR is in I'll incorporate them into the HTML PR.

@noamr noamr closed this Apr 23, 2026
@noamr noamr reopened this Apr 23, 2026
Comment thread source Outdated
data-x="dom-SanitizerProcessingInstruction-target">target</code> member.</p>
</div>

<div algorithm>
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These algorithms look like they belong in infra... would people be open to adding an optional comparator predicate to those, or to the definition of list/order set?

@annevk @zcorpan

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving this to Infra SGTM.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened whatwg/infra#709 for now.
I'm not sure about the comparator thing - infra doesn't really say what it means that two items in a list are the same. Would it be enough to mention here whaht makes items of attributes/elements/... lists "equal"

Copy link
Copy Markdown
Contributor

@otherdaniel otherdaniel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, and I'm super happy to see this happening!


I wonder if we can link to the "Security Considerations" section in the current spec; or have them in a supplementary document somewhere?

Comment thread source Outdated
Comment thread source Outdated
Comment thread source Outdated
Copy link
Copy Markdown
Contributor

@otherdaniel otherdaniel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, and I'm super happy to see this happening!


I wonder if we can link to the "Security Considerations" section in the current spec; or have them in a supplementary document somewhere?

@noamr noamr force-pushed the zcorpan/upstream-sanitizer-api branch from ea79a5b to 1e065df Compare April 28, 2026 13:07
@noamr
Copy link
Copy Markdown
Collaborator Author

noamr commented Apr 28, 2026

Thank you, and I'm super happy to see this happening!

I wonder if we can link to the "Security Considerations" section in the current spec; or have them in a supplementary document somewhere?

I've upstreamed them instead into a security consideration subsection

Comment thread source Outdated
<li><p>Return <var>document</var>.</p></li>
</ol>
</div>

</div>

<!-- https://github.com/WICG/sanitizer-api/commit/c4e328037ab6cd9c753b12694f5dcfc14988dec5 -->

<h4>Safe HTML parsing methods</h4>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should use "Safe". Just "HTML parsing methods" is fine. For the same reason we don't say "safe" in APIs.

Copy link
Copy Markdown
Collaborator Author

@noamr noamr Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved them together with the "unsafe" methods and explained the difference.

Comment thread source Outdated
into an element's <code data-x="dom-Element-innerHTML">innerHTML</code> is fraught with risk, as
it can cause script execution in a number of unexpected ways.</p>

<p>Libraries like <cite>DOMPurify</cite> attempt to manage this problem by carefully parsing and
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should trim a bunch of this text. This is a standard, not a justification for the existence of this feature. We also can't assume familiarity with libraries so it's best to just not mention them.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trimmed considerably

Comment thread source Outdated
</li>
</ul>

<h4>Processing model</h4>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section appears to define API. "Processing model" is generally reserved for something more abstract.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread source Outdated
}</p></li>
</ul>

<h4 id="sanitizer-security-considerations">Security Considerations</h4>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of the headings here don't appear to follow our title case convention.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@noamr
Copy link
Copy Markdown
Collaborator Author

noamr commented Apr 30, 2026

I've refactored some of the sanitization constants to go into each element's definition instead of being in one huge table. I think that makes it less error prone when we add new elements in the future. If that's undesirable I'm happy to revert.

Comment thread source Outdated
data-x="dom-SanitizerProcessingInstruction-target">target</code> member.</p>
</div>

<div algorithm>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving this to Infra SGTM.

Comment thread source Outdated
<li><p>If <var>element</var> is a string, then return a new
<span>SanitizerElementNamespace</span> dictionary with its <code
data-x="dom-SanitizerElementNamespace-name">name</code> member set to <var>element</var> and its
<code data-x="dom-SanitizerElementNamespace-namespace">_namespace</code> member set to the
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the _ outside WebIDL syntax.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment thread source Outdated
<li><dfn data-x-href="https://w3c.github.io/svgwg/svg2-draft/shapes.html#elementdef-line">SVG <code>line</code></dfn> element</li>
<li><dfn data-x-href="https://w3c.github.io/svgwg/svg2-draft/painting.html#elementdef-marker">SVG <code>marker</code></dfn> element</li>
<li><dfn data-x-href="https://w3c.github.io/svgwg/svg2-draft/struct.html#elementdef-metadata">SVG <code>metadata</code></dfn> element</li>
<li><dfn data-lt="SVG path" data-x="SVG path element" data-x-href="https://w3c.github.io/svgwg/svg2-draft/paths.html#elementdef-path">The SVG <code>path</code> element</dfn></li>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this have "The"?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment thread source Outdated
@@ -15939,7 +16036,7 @@ interface <dfn interface>DOMStringMap</dfn> {
data-x="concept-element-accessibility-considerations">Accessibility considerations</span>:</dt>
<dd><a href="https://w3c.github.io/html-aria/#el-html">For authors</a>.</dd>
<dd><a href="https://w3c.github.io/html-aam/#el-html">For implementers</a>.</dd>
<dt><span data-x="concept-element-dom">DOM interface</span>:</dt>
<dt><span data-x="concept-element-dom">DOM interface</span>:</dt>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove extra indentation. (Also more cases below.)

Comment thread source Outdated

<h5>Safe and unsafe</h5>

The "safe" methods will not generate any markup that executes script. That is, they are intended
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing <p>

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment thread source Outdated
Comment on lines +128432 to +128433
Mutated XSS or mXSS describes an attack based on parser context mismatches when parsing an HTML
snippet without the correct context. In particular, when a parsed HTML fragment has been
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily, mXSS can happen even when the context is the same. It's just that the resulting DOM is different after serializing and parsing it again, and the difference is exploited to bypass sanitization.

Thus, the strategy parse→sanitize→serialize→parse is not safe.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copied from the current sanitizer S&P section... Happy for a rephrase. Maybe @mozfreddyb?

Comment thread source Outdated
when inserted into a different parent element. An example for carrying out such an attack is by
relying on the change of parsing behavior for foreign content or mis-nested tags. The Sanitizer
API offers only functions that turn a string into a node tree. The context is supplied implicitly
by all sanitizer functions: Element.setHTML() uses the current element; Document.parseHTML()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xref setHTML() (remove "Element." since it's not a static method) and Document.parseHTML()

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment thread source Outdated
by all sanitizer functions: Element.setHTML() uses the current element; Document.parseHTML()
creates a new document. Therefore Sanitizer API is not directly affected by mutated XSS. If a
developer were to retrieve a sanitized node tree as a string, e.g. via .innerHTML, and to then
parse it again then mutated XSS may occur. We discourage this practice. If processing or passing
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use "may"

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment thread source Outdated
by all sanitizer functions: Element.setHTML() uses the current element; Document.parseHTML()
creates a new document. Therefore Sanitizer API is not directly affected by mutated XSS. If a
developer were to retrieve a sanitized node tree as a string, e.g. via .innerHTML, and to then
parse it again then mutated XSS may occur. We discourage this practice. If processing or passing
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rephrase "We discourage this practice" (maybe "This practice is strongly discouraged.")

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment thread source Outdated
creates a new document. Therefore Sanitizer API is not directly affected by mutated XSS. If a
developer were to retrieve a sanitized node tree as a string, e.g. via .innerHTML, and to then
parse it again then mutated XSS may occur. We discourage this practice. If processing or passing
of HTML as a string should be necessary after all, then any string should be considered untrusted
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use "should"

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@noamr noamr force-pushed the zcorpan/upstream-sanitizer-api branch from 43519ba to 46b964f Compare May 6, 2026 13:36
@noamr noamr requested a review from zcorpan May 7, 2026 09:59
Copy link
Copy Markdown
Member

@zcorpan zcorpan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nits and reword text for mXSS

Comment thread source Outdated
Comment thread source Outdated
Comment thread source Outdated
Comment thread source Outdated
boolean <dfn dict-member for="SanitizerConfig" data-x="dom-SanitizerConfig-dataAttributes">dataAttributes</dfn>;
};</code></pre>

<h5>Configuration invariants</h5>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, now the whole section is marked non-normative but I think part of it should be normative. Maybe split into a non-normative introduction and a normative section for the processing/algorithms (from the definition of equality onwards)? And replace normative keywords (may, should, must) in the non-normative part.

Comment thread source Outdated
Comment thread source Outdated

<div w-nodev>

<span data-x="list item">Items</span> of <code>SanitizerConfig</code>'s <code
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add <p>

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (and changed a bit to use dictionary types)

Comment thread source Outdated
Comment thread source Outdated
Comment on lines +126417 to +126420
<li><p>Duplicates and interactions between global and local lists:</p>
<ul>
<li><p>If a global <code data-x="dom-SanitizerConfig-attributes">attributes</code> allow list
exists, then all element's local lists:</p>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The <p>s should be on their own line (with a blank line after) here, it should only be "inline" when it's the only child of li. This applies in more places in the diff.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread source Outdated
Comment thread source Outdated
@noamr
Copy link
Copy Markdown
Collaborator Author

noamr commented May 7, 2026

@zcorpan re #12395 (comment): I reordered things a bit so that the non-normative section only applies to the right part.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

5 participants