Test slice patterns more by matthewjasper · Pull Request #67467 · rust-lang/rust

…=matthewjasper

Stabilize #![feature(slice_patterns)] in 1.42.0

Stabilization report

The following is the stabilization report for #![feature(slice_patterns)]. This report is the collaborative effort of @matthewjasper and @Centril.

Tracking issue: rust-lang#62254
Version target: 1.42 (2020-01-30 => beta, 2020-03-12 => stable).

Backstory: slice patterns

It is already possible to use slice patterns on stable Rust to match on arrays and slices. For example, to match on a slice, you may write:

fn foo(slice: &[&str]) {
    match slice {
        [] => { dbg!() }
        [a] => { dbg!(a); }
        [a, b] => { dbg!(a, b); }
        _ => {}
    //  ^ Fallback -- necessary because the length is unknown!
    }
}

To match on an array, you may instead write:

fn bar([a, b, c]: [u8; 3]) {}
//     --------- Length is known, so pattern is irrefutable.

However, on stable Rust, it is not yet possible to match on a subslice or subarray.

A quick user guide: Subslice patterns

The ability to match on a subslice or subarray is gated under #![feature(slice_patterns)] and is what is proposed for stabilization here.

The syntax of subslice patterns

Subslice / subarray patterns come in two flavors syntactically.

Common to both flavors is that they use the token .., referred to as a "rest pattern" in a pattern context. This rest pattern functions as a variable-length pattern, matching however many elements have not already been matched by the patterns before and after it.

When .. is used syntactically as an element of a slice-pattern, either directly (1), or as part of a binding pattern (2), it becomes a subslice pattern.

On stable Rust, a rest pattern .. can also be used in a tuple or tuple-struct pattern with let (x, ..) = (1, 2, 3); and let TS(x, ..) = TS(1, 2, 3); respectively.

(1) Matching on a subslice without binding it

fn base(string: &str) -> u8 {
    match string.as_bytes() {
        [b'0', b'x', ..] => 16,
        [b'0', b'o', ..] => 8,
        [b'0', b'b', ..] => 2,
        _ => 10,
    }
}

fn main() {
    assert_eq!(base("0xFF"), 16);
    assert_eq!(base("0x"), 16);
}

In the function base, the pattern [b'0', b'x', ..] will match on any byte-string slice with the prefix 0x. Note that .. may match on nothing, so 0x is a valid match.

(2) Binding a subslice:

fn main() {
    #[derive(PartialEq, Debug)]
    struct X(u8);
    let xs: Vec<X> = vec![X(0), X(1), X(2)];

    if let [start @ .., end] = &*xs {
        //              --- bind on last element, assuming there is one.
        //  ---------- bind the initial elements, if there are any.
        assert_eq!(start, &[X(0), X(1)] as &[X]);
        assert_eq!(end, &X(2));
        let _: &[X] = start;
        let _: &X = end;
    }
}

In this case, [start @ .., end] will match any non-empty slice, binding the last element to end and any elements before that to start. Note in particular that, as above, start may match on the empty slice.

Only one `..` per slice pattern

In today's stable Rust, a tuple (struct) pattern (a, b, c) can only have one subtuple pattern (e.g., (a, .., c)). That is, if there is a rest pattern, it may only occur once. Any .. that follow, as in e.g., (a, .., b, ..) will cause an error, as there is no way for the compiler to know what b applies to. This rule also applies to slice patterns. That is, you may also not write [a, .., b, ..].
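As a small illustration of the rule, the sketch below uses a single .. in the middle of a slice pattern. The function name ends is hypothetical and chosen only for this example.

```rust
/// Returns the first and last elements of a slice with at least two elements.
/// The single `..` in the middle absorbs however many elements lie between.
fn ends(xs: &[i32]) -> Option<(i32, i32)> {
    match xs {
        [first, .., last] => Some((*first, *last)),
        // A second rest pattern, as in `[first, .., mid, ..]`, is rejected
        // by the compiler because the position of `mid` would be ambiguous.
        _ => None,
    }
}
```

Here ends(&[1, 2, 3, 4]) yields Some((1, 4)), while a one-element slice falls through to the fallback arm because both first and last need an element of their own.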

Motivation

[PR rust-lang#67569]: https://github.com/rust-lang/rust/pull/67569/files

Slice patterns provide a natural and efficient way to pattern match on slices and arrays. This is particularly useful as slices and arrays are quite a common occurrence in modern software targeting modern hardware. However, as mentioned above, it's not yet possible to perform incomplete matches, which is seen in fn base, an example taken from the rustc codebase itself. This is where subslice patterns come in and extend slice patterns with the natural syntax xs @ .. and .., where the latter is already used for tuples and tuple structs. As an example of how subslice patterns can be used to clean up code, we have [PR rust-lang#67569]. In this PR, slice patterns enabled us to improve readability and reduce unsafety, at no loss to performance.

Technical specification

Grammar

The following specification is the subset of the grammar necessary to explain what interests us here. Note that stabilizing subslice patterns does not alter the stable grammar. The stabilization contains purely semantic changes.

Binding = reference:"ref"? mutable:"mut"? name:IDENT;

Pat =
  | ... // elided
  | Rest: ".."
  | Binding:{ binding:Binding { "@" subpat:Pat }? }
  | Slice:{ "[" elems:Pat* %% "," "]" }
  | Paren:{ "(" pat:Pat ")" }
  | Tuple:{ path:Path? "(" elems:Pat* &% "," ")" }
  ;

Notes:

(..) is interpreted as a Tuple, not a Paren. This means that [a, (..)] is interpreted as Slice[Binding(a), Tuple[Rest]] and not Slice[Binding(a), Paren(Rest)].

Name resolution

A slice pattern is resolved as a product context and .. is given no special treatment.

Abstract syntax of slice patterns

The abstract syntax (HIR level) is defined like so:

enum PatKind {
    ... // Other unimportant stuff.
    Wild,
    Binding {
        binding: Binding,
        subpat: Option<Pat>,
    },
    Slice {
        before: List<Pat>,
        slice: Option<Pat>,
        after: List<Pat>,
    },
}

The executable definition is found in hir::PatKind.

Lowering to abstract syntax

Lowering a slice pattern to its abstract syntax proceeds by:

Lowering each element pattern of the slice pattern, where:
1. .. is lowered to _, recording that it was a subslice pattern,
2. binding @ .. is lowered to binding @ _, recording that it was a subslice pattern,
3. and all other patterns are lowered as normal, recording that it was not a subslice pattern.
Taking all lowered elements until the first subslice pattern.
Take all following elements.

If there are any,
1. The head is the sub-slice pattern.
2. The tail (after) must not contain a subslice pattern, or an error occurs.

The full executable definition can be found in LoweringContext::lower_pat_slice.
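The splitting step can be sketched as a toy model. This is not the compiler's code: the types Elem and SlicePat and the function lower_slice are hypothetical stand-ins that only mirror the before / slice / after split described above.

```rust
/// Toy stand-in for a lowered element pattern (hypothetical, not rustc's type).
#[derive(Debug, Clone, PartialEq)]
enum Elem {
    /// An ordinary element pattern, identified here only by a name.
    Plain(&'static str),
    /// `..` or `binding @ ..`; the optional name is the binding.
    Rest(Option<&'static str>),
}

/// Result of the split: `before`, optional `slice`, and `after`.
#[derive(Debug, PartialEq)]
struct SlicePat {
    before: Vec<Elem>,
    slice: Option<Elem>,
    after: Vec<Elem>,
}

fn lower_slice(elems: &[Elem]) -> Result<SlicePat, String> {
    // Find the first subslice pattern, if any.
    let split = elems.iter().position(|e| matches!(e, Elem::Rest(_)));
    match split {
        // No subslice pattern: everything goes into `before`.
        None => Ok(SlicePat { before: elems.to_vec(), slice: None, after: vec![] }),
        Some(i) => {
            let after = &elems[i + 1..];
            // The tail must not contain another subslice pattern.
            if after.iter().any(|e| matches!(e, Elem::Rest(_))) {
                return Err("`..` can only be used once per slice pattern".to_string());
            }
            Ok(SlicePat {
                before: elems[..i].to_vec(),
                slice: Some(elems[i].clone()),
                after: after.to_vec(),
            })
        }
    }
}
```

Under this model, [a, xs @ .., b] splits into before = [a], slice = xs @ .., and after = [b], while [.., a, ..] produces an error.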

Type checking slice patterns

Default binding modes

A slice pattern is a non-reference pattern as defined in is_non_ref_pat. This means that when type checking a slice pattern, as many immediate reference types are peeled off from the expected type as possible and the default binding mode is adjusted to by-reference before checking the slice pattern. See rust-lang#63118 for an algorithmic description.

See RFC 2359's guide-level explanation and the tests listed below for examples of what effect this has.
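As a minimal illustration of the effect, the sketch below matches a borrowed slice directly. The function name head_tail is hypothetical.

```rust
/// Splits a borrowed slice into its head and tail, if it is non-empty.
fn head_tail(names: &[String]) -> Option<(&String, &[String])> {
    // The expected type is `&[String]`. The reference is peeled off and the
    // default binding mode becomes by-reference, so nothing is moved:
    // `head` is a `&String` and `tail` is a `&[String]`.
    if let [head, tail @ ..] = names {
        Some((head, tail))
    } else {
        None
    }
}
```

No explicit ref keyword or dereference of names is needed; the bindings borrow from the original slice.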

Checking the pattern

Type checking a slice pattern proceeds as follows:

Resolve any type variables by a single level. If the result still is a type variable, error.
Determine the expected type for any subslice pattern (slice_ty) and for elements (inner_ty) depending on the expected type.
1. If the expected type is an array ([E; N]):
  1. Evaluate the length of the array. If the length couldn't be evaluated, error. This may occur when we have e.g., const N: usize. Now N is known.
  2. If there is no sub-slice pattern, check len(before) == N, and otherwise error.
  3. Otherwise, set S = N - len(before) - len(after), and check S >= 0 and otherwise error. Set slice_ty = [E; S].
  Set inner_ty = E.
2. If the expected type is a slice ([E]), set inner_ty = E and slice_ty = [E].
3. Otherwise, error.
Check each element in before and after against inner_ty.
If it exists, check slice against slice_ty.

For an executable definition, see check_pat_slice.
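The array case can be illustrated with a small sketch. The helper subslice_len is a hypothetical restatement of the length computation above, and split_array shows the resulting type of a subslice binding on an array.

```rust
/// Length `S` of the subslice when an array of length `n` is matched by a
/// pattern with `before` prefix elements and `after` suffix elements.
/// `None` models the error case where the pattern has too many elements.
fn subslice_len(n: usize, before: usize, after: usize) -> Option<usize> {
    n.checked_sub(before)?.checked_sub(after)
}

/// For an array, the subslice binding is itself an array of length `S`:
/// here N = 5, len(before) = 1, len(after) = 1, so `middle: [u8; 3]`.
fn split_array(arr: [u8; 5]) -> (u8, [u8; 3], u8) {
    let [first, middle @ .., last] = arr;
    (first, middle, last)
}
```

Because the length is known, the pattern in split_array is irrefutable and can be used in a plain let.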

Typed abstract syntax of slice and array patterns

The typed abstract syntax (HAIR level) is defined like so:

enum PatKind {
    ... // Other unimportant stuff.
    Wild,
    Binding {
        ... // Elided.
    },
    Slice {
        prefix: List<Pat>,
        slice: Option<Pat>,
        suffix: List<Pat>,
    },
    Array {
        prefix: List<Pat>,
        slice: Option<Pat>,
        suffix: List<Pat>,
    },
}

The executable definition is found in hair::pattern::PatKind.

Lowering to typed abstract syntax

Lowering a slice pattern to its typed abstract syntax proceeds by:

Lowering each pattern in before into prefix.
Lowering the slice, if it exists, into slice.
1. A Wild pattern in abstract syntax is lowered to Wild.
2. A Binding pattern in abstract syntax is lowered to Binding { .. }.
Lowering each pattern in after into suffix.
If the type is [E; N], construct PatKind::Array { prefix, slice, suffix }, otherwise PatKind::Slice { prefix, slice, suffix }.

The executable definition is found in PatCtxt::slice_or_array_pattern.

Exhaustiveness checking

Let E be the element type of a slice or array.

For array types, [E; N] with a known length N, the full set of constructors required for an exhaustive match is the sequence ctors(E)^N where ctors denotes the constructors required for an exhaustive match of E.
Otherwise, for slice types [E], or for an array type with an unknown length [E; ?L], the full set of constructors is the infinite sequence ⋃_i=0^∞ ctors(E)^i. This entails that an exhaustive match without a cover-all pattern (e.g. _ or binding) or a subslice pattern (e.g., [..] or [_, _, ..]) is impossible.
PatKind::{Slice, Array}(prefix, None, suffix @ []) covers a sequence of length len(prefix) whose elements are covered by the patterns in prefix. Note that suffix.len() > 0 with slice == None is unrepresentable.
PatKind::{Slice, Array}(prefix, Some(s), suffix) cover a sequence with prefix as the start and suffix as the end and where len(prefix) + len(suffix) <= len(sequence). The .. in the middle is interpreted as an unbounded number of _s in terms of exhaustiveness checking.
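A short example of these rules for a slice type follows. The function name describe is hypothetical.

```rust
/// An exhaustive match on a slice without a `_` arm.
fn describe(xs: &[u8]) -> &'static str {
    match xs {
        [] => "empty",
        [_] => "one element",
        [_, _] => "two elements",
        // Fixed-length arms alone can never cover every length of `[u8]`.
        // This subslice pattern covers all lengths of three or more, which
        // together with the arms above makes the match exhaustive.
        [_, _, _, ..] => "three or more",
    }
}
```

Removing the last arm would make the compiler reject the match as non-exhaustive, since no finite set of fixed-length patterns covers the infinite set of possible lengths.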

MIR representation

The relevant MIR representation for the lowering into MIR, which is discussed in the next section, includes:

enum Rvalue {
    // ...
    /// The length of a `[X]` or `[X; N]` value.
    Len(Place),
}

struct Place {
    base: PlaceBase,
    projection: List<PlaceElem>,
}

enum ProjectionElem {
    // ...
    ConstantIndex {
        offset: Nat,
        min_length: Nat,
        from_end: bool,
    },
    Subslice {
        from: Nat,
        to: Nat,
        from_end: bool,
    },
}

Lowering to MIR

For a slice pattern matching a slice, where the pattern has N elements specified, there is a check that the Rvalue::Len of the slice is at least N to decide if the pattern can match.
There are two kinds of ProjectionElem used for slice patterns:
1. ProjectionElem::ConstantIndex is an array or slice element with a known index. As a shorthand it's written base[offset of min_length] if from_end is false and base[-offset of min_length] if from_end is true. base[-offset of min_length] is the (len(base) - offset)th element of base.
2. ProjectionElem::Subslice is a subslice of an array or slice with known bounds. As a shorthand it's written base[from..to] if from_end is false and base[from:-to] if from_end is true. base[from:-to] is the subslice base[from..len(base) - to].
- Note that ProjectionElem::Index is used for indexing expressions, but not for slice patterns. It's written base[idx].
When binding an array pattern, any individual element binding is lowered to an assignment or borrow of base[offset of len] where offset is the element's index in the array and len is the array's length.
When binding a slice pattern, let N be the number of elements that have patterns. Elements before the subslice pattern (prefix) are lowered to base[offset of N] where offset is the element's index from the start. Elements after the subslice pattern (suffix) are lowered to base[-offset of N] where offset is the element's index from the end, plus 1.
Subslices of arrays are lowered to base[from..to] where from is the number of elements before the subslice pattern and to = len(array) - len(suffix) is the length of the array minus the number of elements after the subslice pattern.
Subslices of slices are lowered to base[from:-to] where from is the number of elements before the subslice pattern (len(prefix)) and to is the number of elements after the subslice pattern (len(suffix)).
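To make the index arithmetic concrete, the sketch below spells out by hand roughly what matching [a, b, rest @ .., z] on a slice amounts to, next to the pattern itself. This is ordinary surface Rust written for illustration, not actual MIR output; both function names are hypothetical, and the comments map each line to the shorthand used above.

```rust
/// Hand-written equivalent of matching `[a, b, rest @ .., z]` on a slice.
fn match_by_hand(base: &[i32]) -> Option<(i32, i32, &[i32], i32)> {
    let len = base.len(); // corresponds to Rvalue::Len(base)
    if len < 3 {
        return None; // the pattern specifies N = 3 elements
    }
    let a = base[0]; // base[0 of 3]
    let b = base[1]; // base[1 of 3]
    let z = base[len - 1]; // base[-1 of 3]
    let rest = &base[2..len - 1]; // base[2:-1]
    Some((a, b, rest, z))
}

/// The same match expressed with a subslice pattern.
fn match_by_pattern(base: &[i32]) -> Option<(i32, i32, &[i32], i32)> {
    match base {
        [a, b, rest @ .., z] => Some((*a, *b, rest, *z)),
        _ => None,
    }
}
```

Both functions agree on every input; for a three-element slice, rest is the empty slice.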

Safety and const checking

Subslice patterns do not introduce any new unsafe operations.
As subslice patterns for arrays are irrefutable, they are allowed in const contexts, as are [..] and [ref y @ ..] patterns for slices. However, ref mut bindings are only allowed with feature(const_mut_refs) for now.
As other subslice patterns for slices require a match, if let, or while let, they are only allowed with feature(const_if_match, const_fn) for now.
Subslice patterns may occur in promoted constants.
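A minimal sketch of the irrefutable array case in a const context is shown below. The names first_and_last and ENDS are hypothetical.

```rust
/// An irrefutable array subslice pattern used inside a `const fn`.
const fn first_and_last(arr: [u8; 4]) -> (u8, u8) {
    // The array length is known, so no `match` or `if let` is required.
    let [first, .., last] = arr;
    (first, last)
}

/// Evaluated at compile time.
const ENDS: (u8, u8) = first_and_last([1, 2, 3, 4]);
```

The element type here is Copy and has no destructor, which keeps the example clear of any questions about dropping values during const evaluation.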

Borrow and move checking

A subslice pattern can be moved from if it has an array type [E; N] and the parent array can be moved from.
Moving from an array subslice pattern moves from all of the elements of the array within the subslice.
- If the subslice contains at least one element, this means that dynamic indexing (arr[idx]) is no longer allowed on the array.
- The array can be reinitialized and can still be matched with another slice pattern that uses a disjoint set of elements.
A subslice pattern can be mutably borrowed if the parent array/slice can be mutably borrowed.
When determining whether an access conflicts with a borrow and at least one is a slice pattern:
- x[from..to] always conflicts with x and x[idx] (where idx is a variable).
- x[from..to] conflicts with x[idx of len] if from <= idx and idx < to (that is, idx ∈ from..to).
- x[from..to] conflicts with x[from2..to2] if from < to2 and from2 < to (that is, (from..to) ∩ (from2..to2) ≠ ∅).
- x[from:-to] always conflicts with x, x[idx], and x[from2:-to2].
- x[from:-to] conflicts with x[idx of len] if from <= idx.
- x[from:-to] conflicts with x[-idx of len] if to < idx.
A constant index from the end conflicts with other elements as follows:
- x[-idx of len] always conflicts with x and x[idx].
- x[-idx of len] conflicts with x[-idx2 of len2] if idx == idx2.
- x[-idx of len] conflicts with x[idx2 of len2] if idx + idx2 >= max(len, len2).
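Two of these rules can be seen from user code in the sketch below: disjoint mutable borrows through a single slice pattern, and moving out of an array through a subslice pattern. The function names are hypothetical.

```rust
/// Mutably borrows three disjoint parts of a slice at once.
fn swap_ends_and_bump(xs: &mut [i32]) {
    // `first`, `middle`, and `last` never overlap, so all three mutable
    // borrows are accepted together.
    if let [first, middle @ .., last] = xs {
        std::mem::swap(first, last);
        for m in middle.iter_mut() {
            *m += 10;
        }
    }
}

/// Moves out of an array through a subslice pattern.
fn take_tail(arr: [String; 3]) -> [String; 2] {
    // `arr` is owned, so its elements can be moved out; `tail` has the
    // array type `[String; 2]`. `_head` is dropped at the end of the scope.
    let [_head, tail @ ..] = arr;
    tail
}
```

Moving out as in take_tail only works for arrays; on a slice, a subslice binding can borrow the elements but not move them.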

Tests

The tests can be primarily seen in the PR itself. Here are some of them:

Parsing (3)

Testing that .. patterns are syntactically allowed in all pattern contexts (2)
- pattern/rest-pat-syntactic.rs
- ignore-all-the-things.rs
Slice patterns allow a trailing comma, including after .. (1)
- trailing-comma.rs

Lowering (2)

@ .. isn't allowed outside of slice patterns and only allowed once in each pattern (1)
- pattern/rest-pat-semantic-disallowed.rs
Multiple .. patterns are not allowed (1)
- parser/match-vec-invalid.rs

Type checking (5)

Default binding modes apply to slice patterns (2)
- rfc-2005-default-binding-mode/slice.rs
- rfcs/rfc-2005-default-binding-mode/slice.rs
Array patterns cannot have more elements in the pattern than in the array (2)
- match/match-vec-mismatch.rs
- error-codes/E0528.rs
Array subslice patterns have array types (1)
- array-slice-vec/subslice-patterns-pass.rs

Misc (1)

Exercising a case where const-prop caused an ICE (1)
- consts/const_prop_slice_pat_ice.rs

History

2012-12-08, commit rust-lang@1968cb3 Author: Jakub Wieczorek Reviewers: @graydon

This is where slice patterns were first implemented. It is particularly instructive to read the vec-tail-matching.rs test.
2013-08-20, issue rust-lang#8636 Author: @huonw Fixed by @mikhail-m1 in rust-lang#51894

The issue describes a problem wherein the borrow-checker would not consider disjointness when checking mutable references in slice patterns.
2014-09-03, RFC rust-lang/rfcs#164 Author: @brson Reviewers: The Core Team

The RFC decided to feature gate slice patterns due to concerns over lack of oversight and the exhaustiveness checking logic not having seen much love. Since then, the exhaustiveness checking algorithm, in particular for slice patterns, has been substantially refactored and tests have been added.
2014-09-03, RFC rust-lang/rfcs#202 Author: @krdln Reviewers: The Core Team

Change syntax of subslices matching from ..xs to xs.. to be more consistent with the rest of the language and allow future backwards compatible improvements.

In 2019, rust-lang/rfcs#2359 changed the syntax again, adopting .. and xs @ .. instead.
2014-09-08, PR rust-lang#17052 Author: @pcwalton Reviewers: @alexcrichton and @sfackler

This implemented the feature gating as specified in rust-lang/rfcs#164.
2015-03-06, RFC rust-lang/rfcs#495 Author: @P1start Reviewers: The Core Team

The RFC changed array and slice patterns like so:
- Made them only match on arrays ([T; N]) and slice types ([T]), not references to slice types (& mut? [T]).
- Made subslice matching yield a value of type [T; N] or [T], not & mut? [T].
- Allowed multiple mutable references to be made to different parts of the same array or slice in array patterns.
These changes were made to fit with the introduction of DSTs like [T] as well as with e.g. box [a, b, c] (Box<[T]>) in the future. All points remain true today, in particular with the advent of default binding modes.
2015-03-22, PR rust-lang#23361 Author: @petrochenkov Reviewers: Unknown

The PR adjusted codegen ("trans") such that let ref a = *"abcdef" would no longer ICE, paving the way for rust-lang/rfcs#495.
2015-05-28, PR rust-lang#23794 Author: @brson Reviewers: @nrc

The PR feature gated slice patterns in more contexts.
2016-06-09, PR rust-lang#32202 Author: @arielb1 Reviewers: @eddyb and @nikomatsakis

This implemented RFC rust-lang/rfcs#495 via a MIR based implementation fixing some bugs.
2016-09-16, PR rust-lang#36353 Author: @arielb1 Reviewers: @nagisa, @pnkfelix, and @nikomatsakis

The PR made move-checker improvements prohibiting moves out of slices.
2018-02-17, PR rust-lang#47926 Author: @mikhail-m1 Reviewers: @nikomatsakis

This added the UniformArrayMoveOut which converted move-out-from-array by Subslice and ConstIndex {.., from_end: true } to ConstIndex move out(s) from the beginning of the array. This fixed some problems with the MIR borrow-checker and drop-elaboration of arrays.

Unfortunately, the transformation ultimately proved insufficient for soundness and was removed and replaced in rust-lang#66650.
2018-02-19, PR rust-lang#48355 Author: @mikhail-m1 Reviewers: @nikomatsakis

After rust-lang#47926, this restored some MIR optimizations after drop-elaboration and borrow-checking.
2018-03-20, PR rust-lang#48516 Author: @petrochenkov Reviewers: @nikomatsakis

This stabilized fixed length slice patterns [a, b, c] without variable length subslices and moved subslice patterns into #![feature(slice_patterns)]. See rust-lang#48836 wherein the language team accepted the proposal to stabilize.
2018-07-06, PR rust-lang#51894 Author: @mikhail-m1 Reviewers: @nikomatsakis

rust-lang#8636 was fixed such that the borrow-checker would consider disjointness with respect to mutable references in slice patterns.
2019-06-30, RFC rust-lang/rfcs#2359 Author: @petrochenkov Reviewers: The Language Team

The RFC switched the syntax of subslice patterns to {$binding @}? .. as opposed to .. $pat? (which was what the RFC originally proposed). This RFC reignited the work towards finishing the implementation and the testing of slice patterns which eventually led to this stabilization proposal.
2019-06-30, RFC rust-lang/rfcs#2707 Author: @petrochenkov Reviewers: The Language Team

This RFC built upon rust-lang/rfcs#2359 turning .. into a full-fledged pattern (Pat |= Rest:".." ;), as opposed to a special part of slice and tuple patterns, moving previously syntactic restrictions into semantic ones.
2019-07-03, PR rust-lang#62255 Author: @Centril Reviewers: @varkor

This closed the old tracking issue (rust-lang#23121) in favor of the new one (rust-lang#62254) due to the new RFCs having been accepted.
2019-07-28, PR rust-lang#62550 Author: @Centril Reviewers: @petrochenkov and @eddyb

Implemented RFCs rust-lang/rfcs#2707 and rust-lang/rfcs#2359 by introducing the .. syntactic rest pattern form as well as changing the lowering to subslice and subtuple patterns and the necessary semantic restrictions as per the RFCs.

Moreover, the parser was cleaned up to use a more generic framework for parsing sequences of things. This framework was employed in parsing slice patterns.

Finally, the PR introduced parser recovery for half-open ranges (e.g., ..X, ..=X, and X..), demonstrating in practice that the RFCs' proposed syntax will enable half-open ranges if we want to add those (which is done in rust-lang#67258).
2019-07-30, PR rust-lang#63111 Author: @Centril Reviewers: @estebank

Added a test which comprehensively exercised the parsing of .. rest patterns. That is, the PR exercised the specification in rust-lang/rfcs#2707. Moreover, a test was added for the semantic restrictions noted in the RFC.
2019-07-31, PR rust-lang#63129 Author: @Centril Reviewers: @oli-obk

Hardened the test-suite for subslice and subarray patterns with a run-pass test. This test exercises both type checking and dynamic semantics.
2019-09-15, PR rust-lang/rust-analyzer#1848 Author: @ecstatic-morse Reviewers: @matklad

This implemented the syntactic change (rest patterns, ..) in rust-analyzer.
2019-11-05, PR rust-lang#65874 Author: @Nadrieril Reviewers: @varkor, @arielb1, and @Centril

Usefulness / exhaustiveness checking saw a major refactoring clarifying the analysis by emphasizing that each row of the matrix can be seen as a sort of stack from which we pop constructors.
2019-11-12, PR rust-lang#66129 Author: @Nadrieril Reviewers: @varkor, @Centril, and @estebank

Usefulness / exhaustiveness checking of slice patterns was refactored in favor of clearer code. Before the PR, variable-length slice patterns were eagerly expanded into a union of fixed-length slices. They now have their own special constructor, which allows expanding them more lazily. As a side-effect, this improved diagnostics. Moreover, the test suite for exhaustiveness checking of slice patterns was hardened.
2019-11-20, PR rust-lang#66497 Author: @Nadrieril Reviewers: @varkor and @Centril

Building on the previous PR, this one fixed a bug rust-lang#53820 wherein sufficiently large subarray patterns (match [0u8; 16*1024] { [..] => {}}) would result in crashing the compiler with a stack-overflow. The PR did this by treating array patterns in a more first-class way (using a variable-length mechanism also used for slices) rather than like large tuples. This also had the effect of improving diagnostics for non-exhaustive matches.
2019-11-28, PR rust-lang#66603 Author: @Nadrieril Reviewers: @varkor

Fixed a bug rust-lang#65413 wherein constants, slice patterns, and exhaustiveness checking interacted in a suboptimal way conspiring to suggest that a reachable arm was in fact unreachable.
2019-12-12, PR rust-lang#66650 Author: @matthewjasper Reviewers: @pnkfelix and @Centril

Removed the UniformArrayMoveOut MIR transformation pass in favor of baking the necessary logic into the borrow-checker, drop elaboration and MIR building itself. This fixed a number of bugs, including a soundness hole rust-lang#66502. Moreover, the PR added a slew of tests for borrow- and move-checking of slice patterns as well as a test for the dynamic semantics of dropping subslice patterns.
2019-12-16, PR rust-lang#67318 Author: @Centril Reviewers: @matthewjasper

Improved documentation for AST->HIR lowering + type checking of slice as well as minor code simplification.
2019-12-21, PR rust-lang#67467 Author: @matthewjasper Reviewers: @oli-obk, @RalfJung, and @Centril

Fixed bugs in the const evaluation of slice patterns and added tests for const evaluation as well as borrow- and move-checking.
2019-12-22, PR rust-lang#67439 Author: @Centril Reviewers: @matthewjasper

Cleaned up HAIR lowering of slice patterns, removing special-cased dead code for the unrepresentable [a, b] @ .. pattern. The PR also refactored type checking for slice patterns.
2019-12-23, PR rust-lang#67546 Author: @oli-obk Reviewers: @varkor and @RalfJung

Fixed an ICE in the MIR interpretation of slice patterns.
2019-12-24, PR rust-lang#66296 Author: @Centril Reviewers: @pnkfelix and @matthewjasper

This implemented #![feature(bindings_after_at)] which allows writing e.g. a @ Some([_, b @ ..]). This is not directly linked to slice patterns other than with patterns in general. However, the combination of the feature and slice_patterns received some testing in the PR.
2020-01-09, PR rust-lang#67990 Author: @Centril Reviewers: @matthewjasper

This hardened move-checker tests for match expressions in relation to rust-lang#53114.
This PR stabilizes slice_patterns.

There is on-going work to improve pattern matching in other ways (the relevance of some of these are indirect, and only by composition):

OR-patterns, pat_0 | .. | pat_n is almost implemented. Tracking issue: rust-lang#54883
Bindings after @, e.g., x @ Some(y) is implemented. Tracking issue: rust-lang#65490
Half-open range patterns, e.g., X.., ..X, and ..=X as well as exclusive range patterns, e.g., X..Y. Tracking issue: rust-lang#67264 and rust-lang#37854 The relevance here is that this work demonstrates, in practice, that there are no syntactic conflicts introduced by the stabilization of subslice patterns.

As for more direct improvements to slice patterns, some avenues could be:

Box patterns, e.g., box [a, b, .., c] to match on Box<[T]>. Tracking issue: rust-lang#29641 This issue currently has no path to stabilization.

Note that it is currently possible to match on Box<[T]> or Vec<T> by first dereferencing them to slices.
DerefPure, which would allow e.g., using slice patterns to match on Vec<T> (e.g., moving out of it).

Another idea which was raised by RFC 2707 and RFC 2359 was to allow binding a subtuple pattern. That is, we could allow (a, xs @ .., b). However, while we could allow by-value bindings to .. as in xs @ .. at zero cost, the same cannot be said of by-reference bindings, e.g. (a, ref xs @ .., b). The issue here becomes that for a reference to be legal, we have to represent xs contiguously in memory. In effect, we are forced into a HList based representation for tuples.