Rewrite the compiletest directive parser by tgross35 · Pull Request #128070 · rust-lang/rust (original) (raw)

Just some quick replies:

I have not yet looked at your PR specifics (intentionally as aforementioned)

Well you can look at it 😆 only thing is the code isn't as clean as it could be - functions not in logical ordering, rough namings, TODOs. itemlist.rs (the initial parser) is reasonably complete, prepare.rs (next stage) slightly less so.

but here are some of the thoughts I had previously (I will check against this when I actually look at your solution sketch):

* I think we want a clean **phase separation**:
 
 * _Explanation_: we probably want to separate out the various steps involved in eventually running a concrete test. That is, instead of the current entangled logic with test collection / building / running / output processing / status calculation etc. all kinda intermingled together, we may want to "quarantine" each step out. I really want the steps to be testable in isolation (as unit tests) as well as collectively (as smoke tests / integration tests). Currently, there are "early" props (directives) and "late" directives, which is extra confusing to follow. The late props might also only fail when you actually try to run the test, I'm not too sure.
 * _Key idea_: we want directive handling to be **easy to test** and **easy to follow**.

With respect to this, are you only talking about handling of the directives within compiletest or do you mean changes to the tests themselves? Is this just a separation of Config/Testprops-like structures?

At least some of this can probably be accomplished here as far as making it easier to split things out, but I think I might need an example to see exactly what you mean.

* I think we probably want test suites (including child test runners) to **declaratively register** which directives are supported by the test suite.
 
 * _Remark_: this may want to have **compiletest-generic** preset directives, **test mode preset** directives versus **test suite specific** directives

Totally possible, this can be filtered either at parse time or as a pass at validation time. At its basic level, this can just be a pass that populates test-specific Config from Items, and rejects anything that doesn't apply to that test.

* We want directive handling to be **robust**, **strict** and have **good error reporting**.
 
 * _Remark_: I've wanted to change the `ignore-*`/`only-*` directives for targets to become `ignore-target-*`, because it's possible that in `ignore-foo` `foo` could be both a target substring and another ignore kind, then you have ambiguity which is resolved by either failing or somehow having a precedence (neither of which are ideal).
 * _Remark_: some directive syntaxes are not documented at all, have ambiguities, or are not robust (consider revisions).

* We want to **fail-fast** and give **helpful suggestions** where possible.

I think that the prefix directives could probably use some thought about what should be accepted. Maybe they take some other form like //@ ignore: x86.

However, regarding fail-fast and helpful suggestions: yes, huge goal here! E.g. typing //@ rustc-en will recommend //@ rustc-env, the matcher for UI directives tries to find things that look like like directives but aren't (and otherwise would be skipped), and a list of all syntax errors in the file is reported early.

* We want directives to be **documented**: their syntax and behavior.

As in, autogenerated? I had a few ideas for this but didn't yet implement it.

* We might not want to use **regex** where not needed -- or change the syntax such that regex _isn't_ needed -- it's very hard to understand at first glance and is very subtle.

Here, regex is limited to a few places that have a somewhat wide range of valid inputs:

UI directives
Filecheck directives
Revision name validity checking

Everything else is just standard string operations.

 * _Remark_: as I understand it, we may have directives which want syntax that are not necessarily _regular_ (in the simple regular expression sense, not PCRE sense) that may warrant improved parsing because we may want to support escapes. A lot of the directives use split-by-whitespace, which does not handle string escapes!

Could you give an example of this, assuming there is more than just normalize? Plugging in a separate splitter function sounds easy enough, currently splitting on whitespace is only used for revision names (but I also haven't implemented a lot of the areas that may make use of it).