Upstream sanitizer api#12395
Conversation
223a4d1 to
d2034e5
Compare
|
@zcorpan @evilpie @mozfreddyb @otherdaniel |
|
Amazing, thanks for working on this. The built-in safe default configuration is pretty integral to the API, where did I go? For anyone else looking at this, the gist of the changes are in dynamic-markup-insertion.html. |
Oh you're right I had it on my todo list and forgot. Getting to it. Thanks! |
Done. |
|
I thought as part of moving this into the HTML standard we'd also address the parser integration issue? |
This is a huge PR so I thought doing it in two stages, the first one being a purely technical upstream, would be easier to review? Open and happy to incorporate the stream-while-parsing changes in this PR if you and @zcorpan are ok to review that in one go. |
|
I prefer doing the parser integration in a follow-up PR. |
|
I think these three PRs would be good to merge before merging into the HTML standard: |
Since some security sensitive changes rely on "sanitizing while parsing", and that in turn relies on the current post-processing sanitizer being upstreamed, I don't think we should delay upstreaming any further. Can we race it? If any of these go in before the upstream PR is in I'll incorporate them into the HTML PR. |
| data-x="dom-SanitizerProcessingInstruction-target">target</code> member.</p> | ||
| </div> | ||
|
|
||
| <div algorithm> |
There was a problem hiding this comment.
Opened whatwg/infra#709 for now.
I'm not sure about the comparator thing - infra doesn't really say what it means that two items in a list are the same. Would it be enough to mention here whaht makes items of attributes/elements/... lists "equal"
otherdaniel
left a comment
There was a problem hiding this comment.
Thank you, and I'm super happy to see this happening!
I wonder if we can link to the "Security Considerations" section in the current spec; or have them in a supplementary document somewhere?
otherdaniel
left a comment
There was a problem hiding this comment.
Thank you, and I'm super happy to see this happening!
I wonder if we can link to the "Security Considerations" section in the current spec; or have them in a supplementary document somewhere?
ea79a5b to
1e065df
Compare
I've upstreamed them instead into a security consideration subsection |
| <li><p>Return <var>document</var>.</p></li> | ||
| </ol> | ||
| </div> | ||
|
|
||
| </div> | ||
|
|
||
| <!-- https://github.com/WICG/sanitizer-api/commit/c4e328037ab6cd9c753b12694f5dcfc14988dec5 --> | ||
|
|
||
| <h4>Safe HTML parsing methods</h4> |
There was a problem hiding this comment.
I don't think we should use "Safe". Just "HTML parsing methods" is fine. For the same reason we don't say "safe" in APIs.
There was a problem hiding this comment.
Moved them together with the "unsafe" methods and explained the difference.
| into an element's <code data-x="dom-Element-innerHTML">innerHTML</code> is fraught with risk, as | ||
| it can cause script execution in a number of unexpected ways.</p> | ||
|
|
||
| <p>Libraries like <cite>DOMPurify</cite> attempt to manage this problem by carefully parsing and |
There was a problem hiding this comment.
I think we should trim a bunch of this text. This is a standard, not a justification for the existence of this feature. We also can't assume familiarity with libraries so it's best to just not mention them.
There was a problem hiding this comment.
Trimmed considerably
| </li> | ||
| </ul> | ||
|
|
||
| <h4>Processing model</h4> |
There was a problem hiding this comment.
This section appears to define API. "Processing model" is generally reserved for something more abstract.
| }</p></li> | ||
| </ul> | ||
|
|
||
| <h4 id="sanitizer-security-considerations">Security Considerations</h4> |
There was a problem hiding this comment.
A lot of the headings here don't appear to follow our title case convention.
|
I've refactored some of the sanitization constants to go into each element's definition instead of being in one huge table. I think that makes it less error prone when we add new elements in the future. If that's undesirable I'm happy to revert. |
| data-x="dom-SanitizerProcessingInstruction-target">target</code> member.</p> | ||
| </div> | ||
|
|
||
| <div algorithm> |
| <li><p>If <var>element</var> is a string, then return a new | ||
| <span>SanitizerElementNamespace</span> dictionary with its <code | ||
| data-x="dom-SanitizerElementNamespace-name">name</code> member set to <var>element</var> and its | ||
| <code data-x="dom-SanitizerElementNamespace-namespace">_namespace</code> member set to the |
There was a problem hiding this comment.
Remove the _ outside WebIDL syntax.
| <li><dfn data-x-href="https://w3c.github.io/svgwg/svg2-draft/shapes.html#elementdef-line">SVG <code>line</code></dfn> element</li> | ||
| <li><dfn data-x-href="https://w3c.github.io/svgwg/svg2-draft/painting.html#elementdef-marker">SVG <code>marker</code></dfn> element</li> | ||
| <li><dfn data-x-href="https://w3c.github.io/svgwg/svg2-draft/struct.html#elementdef-metadata">SVG <code>metadata</code></dfn> element</li> | ||
| <li><dfn data-lt="SVG path" data-x="SVG path element" data-x-href="https://w3c.github.io/svgwg/svg2-draft/paths.html#elementdef-path">The SVG <code>path</code> element</dfn></li> |
| @@ -15939,7 +16036,7 @@ interface <dfn interface>DOMStringMap</dfn> { | |||
| data-x="concept-element-accessibility-considerations">Accessibility considerations</span>:</dt> | |||
| <dd><a href="https://w3c.github.io/html-aria/#el-html">For authors</a>.</dd> | |||
| <dd><a href="https://w3c.github.io/html-aam/#el-html">For implementers</a>.</dd> | |||
| <dt><span data-x="concept-element-dom">DOM interface</span>:</dt> | |||
| <dt><span data-x="concept-element-dom">DOM interface</span>:</dt> | |||
There was a problem hiding this comment.
Remove extra indentation. (Also more cases below.)
|
|
||
| <h5>Safe and unsafe</h5> | ||
|
|
||
| The "safe" methods will not generate any markup that executes script. That is, they are intended |
| Mutated XSS or mXSS describes an attack based on parser context mismatches when parsing an HTML | ||
| snippet without the correct context. In particular, when a parsed HTML fragment has been |
There was a problem hiding this comment.
Not necessarily, mXSS can happen even when the context is the same. It's just that the resulting DOM is different after serializing and parsing it again, and the difference is exploited to bypass sanitization.
Thus, the strategy parse→sanitize→serialize→parse is not safe.
There was a problem hiding this comment.
This is copied from the current sanitizer S&P section... Happy for a rephrase. Maybe @mozfreddyb?
| when inserted into a different parent element. An example for carrying out such an attack is by | ||
| relying on the change of parsing behavior for foreign content or mis-nested tags. The Sanitizer | ||
| API offers only functions that turn a string into a node tree. The context is supplied implicitly | ||
| by all sanitizer functions: Element.setHTML() uses the current element; Document.parseHTML() |
There was a problem hiding this comment.
xref setHTML() (remove "Element." since it's not a static method) and Document.parseHTML()
| by all sanitizer functions: Element.setHTML() uses the current element; Document.parseHTML() | ||
| creates a new document. Therefore Sanitizer API is not directly affected by mutated XSS. If a | ||
| developer were to retrieve a sanitized node tree as a string, e.g. via .innerHTML, and to then | ||
| parse it again then mutated XSS may occur. We discourage this practice. If processing or passing |
| by all sanitizer functions: Element.setHTML() uses the current element; Document.parseHTML() | ||
| creates a new document. Therefore Sanitizer API is not directly affected by mutated XSS. If a | ||
| developer were to retrieve a sanitized node tree as a string, e.g. via .innerHTML, and to then | ||
| parse it again then mutated XSS may occur. We discourage this practice. If processing or passing |
There was a problem hiding this comment.
Rephrase "We discourage this practice" (maybe "This practice is strongly discouraged.")
| creates a new document. Therefore Sanitizer API is not directly affected by mutated XSS. If a | ||
| developer were to retrieve a sanitized node tree as a string, e.g. via .innerHTML, and to then | ||
| parse it again then mutated XSS may occur. We discourage this practice. If processing or passing | ||
| of HTML as a string should be necessary after all, then any string should be considered untrusted |
43519ba to
46b964f
Compare
zcorpan
left a comment
There was a problem hiding this comment.
Nits and reword text for mXSS
| boolean <dfn dict-member for="SanitizerConfig" data-x="dom-SanitizerConfig-dataAttributes">dataAttributes</dfn>; | ||
| };</code></pre> | ||
|
|
||
| <h5>Configuration invariants</h5> |
There was a problem hiding this comment.
Sorry, now the whole section is marked non-normative but I think part of it should be normative. Maybe split into a non-normative introduction and a normative section for the processing/algorithms (from the definition of equality onwards)? And replace normative keywords (may, should, must) in the non-normative part.
|
|
||
| <div w-nodev> | ||
|
|
||
| <span data-x="list item">Items</span> of <code>SanitizerConfig</code>'s <code |
There was a problem hiding this comment.
Done (and changed a bit to use dictionary types)
| <li><p>Duplicates and interactions between global and local lists:</p> | ||
| <ul> | ||
| <li><p>If a global <code data-x="dom-SanitizerConfig-attributes">attributes</code> allow list | ||
| exists, then all element's local lists:</p> |
There was a problem hiding this comment.
The <p>s should be on their own line (with a blank line after) here, it should only be "inline" when it's the only child of li. This applies in more places in the diff.
|
@zcorpan re #12395 (comment): I reordered things a bit so that the non-normative section only applies to the right part. |
Convert the incubated spec in https://wicg.github.io/sanitizer-api/ to the HTML format and make it part of the HTML standard.
(See WHATWG Working Mode: Changes for more details.)
/canvas.html ( diff )
/comms.html ( diff )
/dom.html ( diff )
/dynamic-markup-insertion.html ( diff )
/edits.html ( diff )
/embedded-content-other.html ( diff )
/form-elements.html ( diff )
/forms.html ( diff )
/grouping-content.html ( diff )
/iframe-embed-object.html ( diff )
/image-maps.html ( diff )
/imagebitmap-and-animations.html ( diff )
/index.html ( diff )
/indices.html ( diff )
/infrastructure.html ( diff )
/interaction.html ( diff )
/interactive-elements.html ( diff )
/microdata.html ( diff )
/parsing.html ( diff )
/references.html ( diff )
/rendering.html ( diff )
/sections.html ( diff )
/semantics.html ( diff )
/system-state.html ( diff )
/tables.html ( diff )
/text-level-semantics.html ( diff )
/timers-and-user-prompts.html ( diff )
/web-messaging.html ( diff )
/webstorage.html ( diff )
/workers.html ( diff )