New ATTRIBUTES block #523
|
Marking as Ready for Review despite there being some TODOs remaining in the diff. Feedback requested from @gkatsev, @silviapfeiffer, @nigelmegitt, @eric-carlson, @chrisn, @bdougherty, et al. |
|
I think this PR is merge-ready now once a TTWG reviewer approves. The previously included TODO comments are now removed from this PR in favor of addressing that follow-on work in: |
nigelmegitt
left a comment
I think this would benefit from more clarity (i.e. more text) about the intended use of ATTRIBUTES and its relationship to attributes on the HTML <track> element, and any other potential usage context.
|
Needs a review from the Editor, @gkatsev . |
|
See also another operational practice described for doing this at #485 (comment) in which YouTube is doing it in the Header section. We should probably make sure that whatever we specify here aligns with practice, or if not, that folk using that alternative approach are happy to change. Update: @gkatsev had the action to draft a PR to resolve #485, and that looks to me as though it would overlap strongly with this PR. |
gkatsev
left a comment
Thanks for the work on this @cookiecrook and for your patience!
What YouTube has definitely seems to align with it. It would be great if we could find someone from there to comment on whether they could move to the new block once it exists.
I think ultimately, particularly with HLS's usage of the header, we do still want the header as a "point of extensibility" there. But we may want to discourage new uses of it, since the majority of these use cases could use the ATTRIBUTES block. As an example, the ECMAScript spec added the |
|
@gkatsev wrote:
Yes, I have taken that as an action. |
|
The Timed Text Working Group just discussed
The full IRC log of that discussion
<nigel> Subtopic: New ATTRIBUTES block #523
<nigel> github: https://github.com//pull/523
<nigel> jcraig: Need to assess, if there's a mismatch, which one wins.
<nigel> .. I don't have a strong opinion.
<nigel> .. Main reason we want it in the VTT is sometimes there's no HTML as an intermediary,
<nigel> .. so the track information is missing.
<nigel> gkatsev: That makes sense, my question would be would this also allow a change in HTML
<nigel> .. if we're adding precedence to those attributes.
<nigel> jcraig: I would expect that whatever we land on, we could have the same thing reflected on the track element
<nigel> eric_carlson: We also would want to add some text to the spec describing the rules of precedence,
<nigel> .. whichever way we go.
<nigel> Nigel: How are you going to work out which way to go?
<nigel> eric_carlson: We'll put a stick in the ground and we'll be right.
<nigel> gkatsev: My assumption is the track element takes precedence
<nigel> eric_carlson: That would be right, it would override what's in the file.
<nigel> gkatsev: Can still have the precedence rules in the VTT spec but would also need them in HTML
<nigel> eric_carlson: HTML doesn't say anything about how to process the contents of the VTT file,
<nigel> .. so maybe no change needed.
<nigel> jcraig: The other editorial suggestions seem fine to me.
<nigel> .. There was a section question, was just my unfamiliarity with bikeshed.
<nigel> gkatsev: Should we open an HTML issue now around this to get the conversation going.
<nigel> jcraig: I'll take that as an action and write it into PR 523 so I don't forget it.
<nigel> pal: On that last point, if you run into any issues, I can help, please don't hesitate to ask.
<nigel> jcraig: You mean pushback on HTML?
<nigel> pal: Sure, ultimately the question is who writes those tests.
<nigel> .. As Nigel pointed out there's a large body of tests to port to WPT.
<nigel> .. With reference renderers once we've overcome the issue of should we have that at all in WPT,
<nigel> .. I'm confident that we'll have the resources to make that happen.
<nigel> group confusion caused because Pierre misheard!
<nigel> gkatsev: With that we can, unless there's anything specific to WebVTT right now we can end
<nigel> SUMMARY: @cookiecrook to raise issue on HTML about track attribute precedence |
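For illustration, the precedence assumption voiced in the log above (the track element overrides what is in the file, and the file fills in when there is no HTML intermediary) might be sketched roughly like this. This is a minimal sketch and not spec text: the function name `resolve_track_attribute`, its parameters, and the `default` fallback are all hypothetical.

```python
from typing import Optional


def resolve_track_attribute(
    html_value: Optional[str],
    file_value: Optional[str],
    default: Optional[str] = None,
) -> Optional[str]:
    """Pick the effective value for one attribute such as kind or lang."""
    if html_value is not None:
        # Assumption from the discussion: the track element wins over the file.
        return html_value
    if file_value is not None:
        # No HTML intermediary, or no attribute set there: use the VTT file's value.
        return file_value
    # Neither source provided a value; fall back to a caller-supplied default.
    return default
```

Under these assumptions, `resolve_track_attribute("captions", "metadata")` would pick the track element's value, while `resolve_track_attribute(None, "metadata")` would fall back to the value from the file.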
|
I've also taken an action to file an issue to HTML to propose any reflected attributes from the VTT to HTML Track content attributes. |
|
Open Question @nigelmegitt wrote:
Both XML Name and HTML attribute name can be hyphenated... such as Regardless, I'm happy to take the working group's decision on this point, whatever that is. I'm collating a list of "Open Questions" here for an in-person discussion. Once we wrap up as much as we can in the thread, I'll research and summarize the remaining opens before scheduling more time with the WG. |
Not sure how big a deal this is? People generally work around this kind of pattern in their chosen programming language. For example there are many CSS properties including a hyphen, such as |
|
I still think this should be in the header instead of a new block (see #523 (review) ). |
|
I think we should not use XML Name. WebVTT already borrows HTML syntax and parsing for named character references; it would be inconsistent and unnecessary to use XML syntax for something else in WebVTT. HTML attribute name syntax prevents |
|
@zcorpan wrote:
There are several existing in-the-wild use cases that would become non-conforming if this wasn't a new block. Those existing ambiguous, loosely defined or entirely undefined uses of VTT metadata will continue to persist in perpetuity. I recall a core TTWG member (@gkatsev IIRC) suggested the [Update: @gkatsev proposed METADATA (# or #:~:text), but either @eric-carlson or I renamed it to ATTRIBUTES to avoid redundancy, since its primary use will be for |
It's true, the design choices made historically for WebVTT have been extremely XML-unfriendly, so maintaining separation from XML would be consistent. However, since we are primarily talking about metadata, and there may be use cases where folk want to transfer keys and values to/from other syntaxes, it would be good to be as open as possible within those existing constraints to minimise the need to rename keys.
Or remove the requirement for the e.g. |
They will not become non-conforming, they are already non-conforming. The space for headers in the spec was reserved for the standard's own future use; it was not intended for private use (hence it is currently non-conforming to have anything there). In fact, the opposite is true: if we start allowing custom headers, then existing content that has custom headers can, assuming the syntax matches what we allow, go from non-conforming to conforming. I think this shouldn't be a goal, though, only that we shouldn't shy away from using the spec's intended space for future extensibility because users of the spec have broken the contract and are using invalid WebVTT.
Yeah, the interesting question is whether it would break any existing usage if we add some standardized attributes in the header. If the keys used in the wild are different from the ones we're standardizing, then nothing will break, as far as I can tell. Even if the key is the same, so long as the value is different, still nothing would break. What is the web compat issue? |
IMO this makes it less clear that they are key/value pairs, and it's inconsistent with WebVTT cue settings. |
The I also just remembered that some classification schemas etc for metadata use URNs for keys, which often include |
|
That looks like a spec bug. The syntax for each individual cue setting requires the Edit: filed #541 |
Open Questions for TTWG Meeting Discussion (Date TBD)
|
|
@nigelmegitt @gkatsev @zcorpan Please help if I missed anything in the summary of open questions above. Thanks. I know that @eric-carlson would like to attend said meeting, and is out this week. My next availability at the normal TTWG meeting time (8am Pacific) is April 9th or April 23rd. |
|
My next availability is also April 9th. I still need to review the new discussion but should note that I'm leaning towards using the headers space, assuming that we allow non-conforming usage as mentioned here #523 (comment) |
|
Agenda for TTWG 2026-04-09 is w3c/ttwg#331, which I'm unable to attend, but have added this as a possible agenda item. 23rd April would work for me, alternatively. |
To clarify, I objected to the claim that existing WebVTT content would go from being conforming to being non-conforming, as it's currently invalid WebVTT to use non-blank lines immediately after the There could be a webcompat concern, but it's separate from whether existing content is conforming or non-conforming. I asked for examples where the new processing of headers would break existing content. The fact that existing content is using headers doesn't mean that that content will break if browsers start to process some known headers with some known values. I think there's only a potential issue if the standardized key and value are exactly the same as what already exists in the wild, but the semantics are different. |
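To make the web compat argument concrete, here is a rough sketch of the check being described: an existing header line could only change behavior if both its key and its value match something a parser would newly act on. The table `STANDARD_VALUES` and the function `may_change_behavior` are hypothetical names for illustration; the listed `kind` values are borrowed from the HTML track element, and whether the header would use the same set is an assumption.

```python
# Hypothetical table of standardized keys and the values a parser would act on.
# The kind values mirror the HTML track element; reusing them here is an assumption.
STANDARD_VALUES = {
    "kind": {"subtitles", "captions", "descriptions", "chapters", "metadata"},
}


def may_change_behavior(key: str, value: str) -> bool:
    """Return True only if an in-the-wild header line matches a standardized
    key AND a recognized value; anything else keeps doing nothing."""
    return value in STANDARD_VALUES.get(key, set())
```

Even a `True` result only flags a line worth inspecting; per the comment above, content breaks only if the in-the-wild semantics for that exact key and value differ from the standardized ones.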
If we don't need to allow characters outside of ASCII, an option is to align with HTML PI target: whatwg/html#12118 Right now it's |
Since the metadata keys may originate or relate to schemas external to WebVTT and to HTML, my view is that only allowing ASCII characters is likely to be overly restrictive. It's also unnecessary to be so limiting. |
|
Since Nigel can't make it on April 9th, can everyone make it on April 23rd? If not, let's go ahead with April 9th. |
|
Yeah, I think the 23rd works a bit better for me as well. We're going to cancel the April 9th meeting and have the agenda item be on the 23rd. |
…e subtitles example.
|
Some comments on this ahead of the meeting.
Originally, I thought a new block would be better to minimize issues, but if we can support the syntax we want while also making existing non-conforming usage conforming (albeit as non-standard usage), I think that would be best. I'm not sure if there's precedent for this type of thing in the W3C, but for JavaScript, there's some precedent when they standardized the previously non-standard
Is there a need to limit it to ASCII+ specifically? For example, the cue identifier says it's anything except --> and it also specifies that it must be unique, so, theoretically, unicode normalization and what not should already be happening. Lines 1547 to 1553 in 9966609 Also, the VTT Region identifier is similar and disallows spaces. Also, should this be case-insensitive?
I do prefer lang as well. Is there a case to support both and have it normalized internally? Being case-insensitive and supporting both would allow supporting the YT attr without work on their side.
Having a type could imply a metadata type, though, probably not worth adding. If we'd want to extend type for other kinds, it might make it a bit harder.
Yeah, I think I would prefer to err on the permissive side to have fewer requirements and fewer parser errors, if it doesn't cause any major issues.
could alternate kinds across different examples instead, maybe? Like, update some of the existing examples with the new attrs. |
|
The Timed Text Working Group just discussed
The full IRC log of that discussion
<nigel> Topic: WebVTT
<nigel> Gary: The main topic is the ATTRIBUTES block PR.
<nigel> .. Background is Apple wanting to add a new metadata type for their Dim Flashing Lights feature.
<nigel> James: Currently you might see just a warning at the beginning,
<nigel> .. it would be better to take action during the content.
<nigel> .. The metadata issue and PR are pre-requisites for shipping that feature to other platforms
<nigel> .. and to open up VTT metadata to being a broader more reusable metadata structure.
<nigel> .. Sounds like we're really close to a resolution here.
<nigel> Gary: I think we're close. Thanks for providing a summary of the main open questions.
<nigel> .. The main one is whether to be a new block or part of the header location.
<nigel> James: [shares window]
<nigel> i/Gary: The main/Subtopic: New ATTRIBUTES block
<nigel> github: https://github.com//pull/523
<nigel> James: If there's no concern of webcompat then I'm not concerned about not having a new block,
<nigel> .. and just moving the newly proposed data into the header.
<nigel> Simon: That's my preference, it's what that was intended for.
<nigel> Gary: I think it would be good to have the = permitted even if not recommended.
<nigel> .. I don't know if there's any precedent for this in W3C specs, but in Javascript the __proto__ property
<nigel> .. was added with a note to tell people not to use it.
<nigel> Simon: HTML has similar precedent, for various elements that were never standardised but browsers implemented
<nigel> .. were specified in HTML with the "obsolete" status immediately applied.
<atai> q+
<nigel> James: Is that a list, or a recommendation against using any prior non-standard way.
<nigel> Simon: Here it's using = instead of : as a separator.
<nigel> James: Use = as a separator but say it's not recommended.
<nigel> ack at
<nigel> Andreas: For context, is this the same issue that James presented in TPAC in Seville more than 2 years ago?
<nigel> James: Yes
<nigel> Andreas: I liked the proposal at the time but haven't followed it.
<nigel> .. We made some comments, but in this case will there be a 2 weeks review period so I can check it?
<nigel> q+
<zcorpan> q+
<nigel> James: I have some time off next week so can't promise 2 weeks.
<nigel> Nigel: [explains TTWG working mode]
<nigel> James: I'm planning to summarise the discussion from today and then add it as a comment to the PR
<nigel> .. So if people want to clarify any details or I've misunderstood then people can see that.
<nigel> .. I'll leave it as a draft PR until then.
<nigel> q?
<nigel> ack n
<nigel> ack zcorpan
<nigel> Simon: Case sensitivity: currently WebVTT is mainly case sensitive, so it would be consistent to
<nigel> .. be case sensitive here as well. Is there a reason not to be?
<nigel> Gary: For the attribute names?
<nigel> Simon: Yes
<nigel> James: Yes there is pre-existing use by YouTube and others who have capitalised attribute names.
<nigel> Simon: Are those attribute names the same as the standard ones we're trying to add?
<nigel> Gary: It depends on which one - like kind etc.
<nigel> .. The case-insensitive part comes up later in the summary of bullets.
<nigel> .. The next point is how we want to define parsing of the attribute names.
<nigel> .. Right now what is implemented is ASCII+, other proposals include XML Name or HTML attribute name.
<nigel> .. My question is: does it need to be ASCII only?
<nigel> .. In WebVTT we already have IDs like cue and region identifier, where any character except --> or space
<nigel> .. is permitted. It already specifies that the cue identifier needs to be unique in the file,
<nigel> .. so the parser would need to do unicode normalisation and other things to make sure the IDs are unique.
<nigel> q?
<nigel> James: The only concerns I brought up are bidi and reverse chars, look-alike characters, or zero-width joiners, or emoji variant combinations?
<nigel> Simon: At the higher level do we need to make it conforming to use non-standard metadata attributes
<nigel> .. in the first place. If not this is moot, we just list the permitted attribute names.
<nigel> James: The reason for this is, depending on the type that is chosen, I think it is appropriate for the TTWG
<nigel> .. to be the keyholder of what the type is, but once that type is defined it could be some other standards
<nigel> .. body that defines the keys.
<nigel> .. Maybe we don't need to do that.
<nigel> .. At the same time I put in the concept of a custom metadata block that is freeform, not standardisable,
<nigel> .. but parsed, so the implementation could use it.
<nigel> .. If that's the case we should allow freeform characters of whatever character set we define.
<nigel> Gary: I think it would be helpful to allow custom attributes.
<nigel> .. The reason we had the previous issue is because WebVTT didn't have an official way to include metadata.
<nigel> .. If there was something official then e.g. HLS could have used that for the timestamp map.
<nigel> .. Now I wonder if we want to limit the official names, with an x- prefix, to make it easier for us to add new attributes.
<nigel> James: Or use the HTML pattern data-
<nigel> Simon: Or just a - at the beginning, like in HTML.
<nigel> Gary: In that case would it make sense to point to the HTML attribute parsing, or do we still
<nigel> .. need to specify the - for custom names.
<nigel> Simon: It's not so much that, which we may need to define ourselves, but we can be aligned that
<nigel> .. no dash means it's defined by the standard.
<nigel> James: For Gary's question, do we want to point it to the HTML attribute name for parsing?
<nigel> Simon: I think no.
<nigel> .. We need to ban --> for example, although > is not allowed in attribute names, but it's a different rule.
<nigel> Gary: We probably want to be similar to how region identifiers are implemented.
<nigel> q+
<nigel> Simon: I don't think we need to worry about lookalike characters. These are all possible in identifiers right now
<nigel> .. I think.
<nigel> Nigel: +1 to aligning with HTML attribute name syntax, because we want to allow the attributes to be reused in HTML.
<gkatsev_cloud> q+
<nigel> ack n
<nigel> Simon: Non-standard attributes would be invalid HTML unless they begin with data-
<nigel> Nigel: OK, imagine people want to push WebVTT attributes into the HTML for their own reasons.
<nigel> James: Are you proposing some model for tunnelling these attributes through from VTT to HTML?
<nigel> Nigel: We haven't discussed that as a processing model, but I don't think we need to. But we should
<nigel> .. make the path easy in case people do want to do that.
<nigel> q?
<nigel> James: Are we saying any Unicode char except bidi and --> ?
<nigel> Gary: I think so, yes. Just remove the "ASCII only" part.
<nigel> ack gk
<nigel> .. Being case-sensitive is probably easiest. The potential benefit of being case-insensitive is if we were
<nigel> .. to allow language for the lang attribute, because that will allow the current YouTube metadata to work
<nigel> .. without any further work on their part.
<nigel> Simon: Making YouTube's content magically do something should be a non-goal.
<nigel> .. They can update that in an afternoon. We shouldn't complicate the language for that.
<nigel> .. Maybe that means we should not care about case-insensitive or the = sign either.
<nigel> .. If we were using the ATTRIBUTES block then the existing headers would still do nothing in the standard
<nigel> .. implementation, so we can decide that if the syntax doesn't match then it still does nothing.
<nigel> .. It's not breaking any more than it's broken today if we ignore it.
<nigel> James: Follow-on question: if any of these attribute lines is invalid do we invalidate the whole block or just that line?
<nigel> Gary: Just that line. Looking at the HLS timestamp map there is a colon used.
<nigel> .. It wouldn't match exactly, but the attribute name would not just be the [scribe missed]
<nigel> .. The main reason is that it's so ubiquitous in HLS content that having it be valid would be helpful.
<nigel> Simon: HLS compat is more compelling than YouTube compat.
<nigel> Gary: Yes. If and when we have a follow-up of a GetAttributes() method then we could decide how to handle
<nigel> .. non-standard attribute names from being returned.
<nigel> .. We could just not worry about them at the moment.
<nigel> Simon: If the intent is that they should be usable from Javascript then there needs to be a way for them
<nigel> .. to be usable other than fetching and parsing the file again.
<nigel> Gary: Yes but that doesn't need to be part of this PR.
<nigel> James: Sounds like we're aligned on 2, moving on, to the question of the reserved key name being lang or language.
<nigel> Nigel: I can't recall my original comment, and I can't find it!
<nigel> .. Matching HTML attributes in the track element makes sense, I have some recollection that care was needed
<nigel> .. for translation subtitles when indicating the language from which they were translated.
<nigel> James: I'll see if I can find that comment, but work on the basis of `lang` for now.
<nigel> .. Item 4 is about if `kind` is required, or should `kind: metadata` be required in order to use metadata type.
<nigel> .. I feel that we should require it but if we only require it with a metadata value then it seems complex,
<nigel> .. and I'd appreciate advice on how to do that, if that is what we decide.
<nigel> Gary: It doesn't seem worth requiring it, to me.
<nigel> .. If you have a type then potentially you could assume kind: metadata but even that seems unnecessary,
<nigel> .. because what if in the future we decide to have multiple types of captions, or something like that.
<nigel> .. We could extend it for other kinds as well.
<zcorpan> q+
<nigel> James: Then the parsing rule I will attempt to write will be:
<nigel> .. if there is a type, then we would need a kind as well, in order to use the type.
<nigel> .. if there is a type but no kind then the implementation wouldn't be sure how to use it.
<nigel> q+
<nigel> .. It might be some future type with an unidentified kind.
<nigel> Simon: Isn't there always a `kind` when sourced from HTML even if it is the default?
<nigel> James: Yes, so the implementation can fall back.
<nigel> Simon: Yes, the UA will have a kind even if it is the default.
<nigel> James: One of the primary use cases is in Apple's services that could be used on other non-Apple hardware
<nigel> .. such as the AppleTV app on another brand of TV, or HBO's content on AppleTV hardware.
<nigel> .. I feel we need to know this outside the context of just the web.
<gkatsev_cloud> q+
<nigel> .. I'll try to write it up logically in the PR.
<gkatsev_cloud> ack z
<gkatsev_cloud> ack n
<zcorpan> q+
<gkatsev_cloud> ack g
<nigel> Nigel: Until we define a processing model for all this metadata then we should not restrict anything.
<nigel> .. We can add the syntax but then add restrictions later if we define a processing model.
<nigel> Gary: I was going to say the same thing.
<nigel> Simon: Just to comment on the processing model, at least for the standard attributes that we're adding then
<nigel> .. it makes sense to specify the processing model in HTML, then you can use the file-provided attributes
<nigel> .. as fallback. Then in WebVTT all you need to do is make that data available so that other specs can look into it.
<nigel> James: To make sure I understand "the value provided", the kind attribute in HTML would override the value
<nigel> .. in the WebVTT if provided, otherwise the WebVTT value would be used?
<nigel> Simon: Potentially, yes.
<nigel> James: So there'd be no need for a parser warning?
<nigel> Gary: I agree
<nigel> Nigel: I'd be more concerned about errors than warnings. I'd only expect a warning if there's a SHOULD
<nigel> .. and I don't think there are any here.
<nigel> James: I'll check that.
<nigel> Gary: If the line is invalid from a parsing perspective we would skip it.
<nigel> Simon: But if --> is present then it would be invalid so we would expect some parser message
<nigel> Gary: Even then the parser might just ignore it and move on.
<nigel> Simon: Sure, sometimes the parser has errors even if it also ignores.
<nigel> James: The final point is about the examples, and I'm happy to concede to consensus.
<nigel> Gary: You could add subtitles to an existing example.
<nigel> James: Fair point. |
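As a rough illustration of the line-level parsing behavior discussed in the log above, here is a minimal sketch, assuming a `name: value` syntax, case-sensitive names, names that may contain any character except whitespace and bidi controls, and a rule that a line containing `-->` or otherwise failing to match is skipped on its own rather than invalidating its neighbours. None of this is the PR's final rule set; `parse_attribute_line`, `parse_attributes`, and `BIDI_CONTROLS` are hypothetical names, the exact list of excluded bidi characters is an assumption, and the `=` separator is deliberately not handled.

```python
from typing import Dict, Iterable, Optional, Tuple

# Assumption: these bidi control characters are the ones excluded from names.
BIDI_CONTROLS = frozenset(
    "\u200e\u200f\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"
)


def parse_attribute_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (name, value) for a valid line, or None if it should be skipped."""
    if "-->" in line:
        return None  # the arrow is reserved for cue timings
    name, separator, value = line.partition(":")
    if not separator or not name:
        return None  # no colon, or nothing before it
    if any(ch.isspace() or ch in BIDI_CONTROLS for ch in name):
        return None  # names may not contain whitespace or bidi controls
    return name, value.strip()


def parse_attributes(lines: Iterable[str]) -> Dict[str, str]:
    """Collect valid lines; an invalid line is ignored without affecting others."""
    attributes: Dict[str, str] = {}
    for line in lines:
        parsed = parse_attribute_line(line)
        if parsed is None:
            continue  # skip just this line, as suggested in the meeting
        name, value = parsed
        attributes[name] = value  # case-sensitive; a later duplicate overwrites (assumption)
    return attributes
```

Because names are compared case-sensitively in this sketch, `Kind` and `kind` would be stored as two distinct attributes, which matches the "case-sensitive is probably easiest" position but not the YouTube-compatibility alternative.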
Consensus from today's meeting as I understand it.
Consensus of other items discussed:
Please fill in if I've missed anything. Thanks everyone for your time today. |
|
I also plan to close this PR and create a new one with the rewrite because the thread is getting unmanageably long. |
|
Thanks for the summary! Seems accurate to what was discussed. For item 9, there's probably nothing needed on the WebVTT spec side as it'll be part of HTML or related specs. For item 11 follow-up, might be fine to not worry if it starts with a |
Closes #511
Add new ATTRIBUTES block parsing rules and examples as discussed in #511.
This is also a prerequisite for:
…and its duplicate in the WHATWG repo:
whatwg/html#11333