[css-syntax] Review requested of new Parsing text · Issue #8834 · w3c/csswg-drafts (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open

tabatkins opened this issue

May 12, 2023

· 21 comments

Comments

@tabatkins

Since we resolved on the new Nesting behavior (try to parse as a declaration, then parse as a rule if that's invalid), I had to do some decent rewriting of Syntax's algorithms, and went ahead and dove into a larger rewrite to clean it up in general. I've implemented the new text in my CSS parser library and the (fairly limited, admittedly) tests I've run look good, but I'd appreciate a larger review.

Significant changes from the previous version:


Aside from allowing the new nesting behavior, all of these changes should be only editorial, with one exception: blocks that previously only contained rules (@media, @keyframes, etc) previously used the "consume a list of rules", but now use the unified "consume a block's contents", which means their error-recovery in the face of semicolons changes.

For example, @media { garbage; bar {...} } previously would contain a style rule with a garbage; bar selector. (This is what happens at the top-level of a stylesheet, still.) Now the rule's selector will be just bar, since the garbage; part will get dropped as an invalid declaration. This means that rules which were accidentally invalid and dropped due to garbage might now be valid, if there's a semicolon separating them from preceding garbage.

I suspect this is fine, and I'd really like it to be, because it means the overall parsing behavior doesn't need to branch on grammar knowledge (and thus, whether a rule is known or unknown won't change its generic parsing). It used to be the case that parsing depended on this kind of knowledge, and it was super awkward to use in tooling. Also, it means that parsing doesn't change between a top-level @media and a nested one, except for declarations becoming valid instead of invalid; all the rules inside of the @media remain precisely the same.

But if necessary, we can hardcode some at-rules to trigger a different parsing behavior that preserves backwards compatibility more completely.

(Technically parsing in general depends on grammar knowledge anyway, since you need to know whether a declaration is valid in a given context to tell if you should try and redo parsing as a rule. But it turns out there's a simple and reliable rule you can use generically to get approximately the right behavior without having to know anything about grammars.)

@tabatkins

Agenda+ to get approval on this, and ask for a new publication of Syntax.

tabatkins added a commit that referenced this issue

May 18, 2023

@tabatkins

@tabatkins

@cdoublev Please don't delete your comments, it's very confusing reading an email, coming here to answer the questions, and finding there's no comment anymore.

Restoring the comment:

In examples 12 and 13, I do not understand why @keyframes is not defined with <qualified-rule-list>, and why @page is defined with <declaration-rule-list> (in which at-rules - like margin rules - are automatically invalid).

Ah, it's because I'm dumb. @keyframes definitely should be <qualified-rule-list>, and I misdefined <declaration-rule-list> - it should allow declarations and at-rules, and disallow qualified rules. Both fixed now.

I am not sure what type of item represents <block-contents> and its specific sub-productions: is it a <simple-block> (ie. a component) or an undefined structure (containing declarations and/or rules)? The return value of consume a block's contents is a bit handwavey. But this may be intentional...

The return value isn't at all handwavey - it's a list of declarations and a list of rules. <block-contents> represents the contents of a block; it is not the block itself (it doesn't represent the wrapping {}). These productions aren't substantially different from what was there before, I just reshuffled them a little bit.

@cdoublev

Please don't delete your comments, it's very confusing reading an email, coming here to answer the questions, and finding there's no comment anymore.

Sorry about that. My intent was to avoid hijacking the thread with a minor mistake and something I was wrong about, indeed.


<declaration-at-rule-list> is perhaps clearer than <declaration-rule-list>:

<declaration-rule-list>: declarations and at-rules are allowed; qualified-rules are automatically invalid.

...and more consistent with <at-rule-list>.


A link to consume a block's contents in the description of <block-contents> and its subproductions might also be usefull, perhaps in this paragraph:

The CSS parser is agnostic as to the contents of blocks—they’re all parsed with the same algorithm, and differentiate themselves solely by what constructs are valid.


5.4.5. Parse a list of declarations

To parse a block’s contents from input:

I think it should be "To parse a list of declarations from input:".

@tabatkins

My intent was to avoid hijacking the thread with a minor mistake and something I was wrong about, indeed.

If you'd been wrong about it I wouldn't have said anything, but I had to repost your comment because you did catch some mistakes.

A link to consume a block's contents

Done.

I think it should be "To parse a list of declarations from input:".

No, it's the heading that I didn't update. I fixed the algo name and the heading's own ID, but not the heading's text. ^_^ Fixed.

@emilio

I think we should treat error recovery in nested and unnested rules the same (#7961 (comment)), but other than that the new text looks good to me.

I've enabled this in Firefox Nightly in bug 1835066, fwiw.

@tabatkins

@emilio I'm a little confused - the error recovery is the same. I noted that as the only (non-Nesting related) normative change, because switching the algos around meant that I was changing the non-nested error-recovery (to match the nested error recovery).

@tabatkins

Unless you're referring to wanting to change top-level error-recovery to be the same as what you get inside of rule bodies? (afaict this isn't what you were asking for in your linked comment, but I can't tell what you're asking for otherwise)

@emilio

No, I meant that as long as we assure that all rule bodies recover errors the same regardless of whether we're nested on a style rule, we're good I believe.

Which is the case with the current spec text AFAICT, I just wanted to explicitly point out that, since that was one of the breaking changes I noticed while implementing this, sorry if I wasn't clear on that :)

@emilio

Sorry, I guess I shouldn't have worded it with "other than that" :)

@css-meeting-bot

The CSS Working Group just discussed [css-syntax] Review requested of new Parsing text, and agreed to the following:

@zstarvit

I don't know if this is particularly the place to make a public comment, but I wanted to make sure that y'all are aware for the spec that there is legacy CSS code floating around that leans on certain CSS property declarations being invalid as a load bearing compatibility """feature""".

For example, older versions of Bootstrap contain declarations like *width: 100%; with the full understanding that *width is an invalid property and won't do anything on most browsers, but does do something in some browsers, notably IE7.

@tabatkins

Yes, that's very specifically still supported. It will fail to parse as both a property and a rule, and stop at the semicolon, exactly as it does today.

@cdoublev

Can you please clarify (not necessarily in the spec) the meaning of valid in the current context in consume a declaration, consume an at-rule, consume a qualified rule.

I first thought it was about what rules/properties/descriptors are accepted in the style sheet or rule:

The CSS parser is agnostic as to the contents of blocks [...].

Accompanying prose must define what is valid and invalid in this context.

But then I wondered if their grammar must also be valid, and how these two validations would address the new needs for parsing nested style rules.

Does it correspond to the validations in isValidInContext() in your CSS parser?

@tabatkins

I'm not sure how to clarify it further than what's already there. The current parsing context (some declaration or rule, possible with parent rules, etc) determines whether any given thing is valid, according to specific grammars and/or prose in the relevant specs.

The function in my CSS parser library is just a generic, relatively dumb version of this that will usually correctly tell apart a rule from a declaration, without knowing anything about what is actually allowed. You just need something that can spot that a:hover { color: blue; } is not a valid declaration, so it can get parsed properly as a rule.

@cdoublev

You just need something that can spot that a:hover { color: blue; } is not a valid declaration, so it can get parsed properly as a rule.

Understood, thanks.

Assuming that consume an at-rule receives @import following a style rule, it would return nothing if valid in the context includes looking at the rules preceding @import. But CSSStyleSheet.insertRule() involves consume an at-rule and insert a CSS rule, which must throw HierarchyRequestError in this situation.

@tabatkins

Yeah. It's purposely somewhat handwavy about what "current context" is, because it's obvious in context what the necessary behavior is in any given algo and there's no real need to lock it down more precisely. It just needs a way to fail on invalid stuff.

@cdoublev

I am not sure what type of item represents <block-contents> and its specific sub-productions: is it a simple block (ie. a component) or an undefined structure (containing declarations and/or rules)? The return value of consume a block's contents is a bit handwavey. But this may be intentional...

The return value isn't at all handwavey - it's a list of declarations and a list of rules. <block-contents> represents the contents of a block; it is not the block itself (it doesn't represent the wrapping {}). These productions aren't substantially different from what was there before, I just reshuffled them a little bit.

I think my point was that I find a bit inconsistent that the algorithms for consuming rules no longer process one level at a time, leaving lower levels as a list of component values. I would expect to process the block with parse a block’s contents before checking its grammar:

The CSS parser is agnostic as to the contents of blocks—they’re all parsed with the same algorithm

But it expects a list of component values, as it is only intended for an independent block, eg. style.cssText. It cannot handle a block resulting from parse a rule.

When rule blocks were consumed with consume a simple block while consuming a rule, the resulting simple block value was a list of component values to consume with the appropriate algorithm before checking the grammar.

Anyway, it is probably not important.

@cdoublev

You just need something that can spot that a:hover { color: blue; } is not a valid declaration, so it can get parsed properly as a rule.

Isn't it possible to implement something that can spot this in consume a declaration and consume a list of component values?

The former would run the latter with a flag when the declaration name is not a custom property. When this flag is set, consume a list of component values would return nothing when it processes <{-token>. In turn, consume a declaration would consume the remnants of a bad declaration and return nothing.

edit: 77e9601 now fulfills this requirement, so there may no longer be a need for consume a declaration, consume an at-rule, consume a qualified rule, to require that the declaration or rule be valid in the current context.

@tabatkins

I think my point was that I find a bit inconsistent that the algorithms for consuming rules no longer process one level at a time, leaving lower levels as a list of component values.

The only reason the old spec did so was because I didn't have a unified parsing model for all possible block contents. You had to know whether to parse a block's contents as "at-rules and declarations" or "at-rules and qualified rules", and that depended on what type of rule the block was in; it was easier to write the spec to have it delay this decision until later, when doing grammar-checking.

The new rules don't actually require having any grammar knowledge, because we've baked some restrictions into the grammar itself, so I was able to unify the two algorithms into one. (Knowing the grammar just lets you parse more efficiently; it doesn't change the end result.) So there was no longer any need to delay the parsing, and we could just get everything done in one pass.

But it expects a list of component values, as it is only intended for an independent block, eg. style.cssText. It cannot handle a block resulting from parse a rule.

Correct, but I'm not sure why that's relevant. The "parse" algorithms are the high-level algos called by other specs, while the "consume" algorithms are the low-level things that actually do the parsing. "Parse" algorithms call into "consume" ones; not vice versa.

so there may no longer be a need for consume a declaration, consume an at-rule, consume a qualified rule, to require that the declaration or rule be valid in the current context.

In theory, you can indeed remove the check from all of these places. You could just check, in "consume a block's contents", whether the returned declaration contains an improperly-used {}-block, and then do all the rest of the validity checking after parsing. But that wouldn't be very efficient; every top-level rule that starts like div:hover would consume the entire rest of the stylesheet as a declaration, then the parser would realize it contains a {}-block in the wrong place, and re-parse a single rule from it. Then do the same thing again, and again, and again. The end result would be identical to one that checks validity as it goes, as the spec currently says, it would just be massively less efficient.

So, it's better to write the spec in a way that guides implementations to do the better thing. This also ensures that if we did accidentally make a change that would cause the "check validity now" and "check validity later" impls to diverge, the "check validity now" impls would stay correct; the spec wouldn't be accidentally requiring impls to change to the less efficient method.

@cdoublev

[_The "parse" algorithms do not expect high level objects as input_]

Correct, but I'm not sure why that's relevant. [...]

I thought that the "parse [a CSS]" algos could be used recursively. This is no longer possible with the unified model, because nested contents are already consumed.

You also cannot use parse something according to a CSS grammar with a <block-contents> subproduction anymore, because this entry point consumes a list of component values, which consumes simple blocks, whereas consume a block's contents expects tokens and consumes lists of rules and declarations.


every top-level rule that starts like div:hover would consume the entire rest of the stylesheet as a declaration, then the parser would realize it contains a {}-block in the wrong place

In my comment, I suggested to stop consuming a list of component values (for a non-custom property value) when encoutering the { following hover in the declaration value. However, I was wrong about consuming the remnants of the (bad) declaration: the algo would just return nothing, and consume a block's contents would backtrack to div and parse a rule.


But for both of these concerns, the current spec is fine with me. I understand that this is not a detailed implementation guide. When you try to conform to it as closely as possible, it can just be tricky to put pieces together sometimes.

Thank you for your clarifications.

@AtkinsSJ

I hope feedback is still welcome! I have a few notes from implementing this myself last week: