Test slice patterns more by matthewjasper · Pull Request #67467 · rust-lang/rust


Stabilize #![feature(slice_patterns)] in 1.42.0

Stabilization report

The following is the stabilization report for #![feature(slice_patterns)]. This report is the collaborative effort of @matthewjasper and @Centril.

Tracking issue: rust-lang#62254
Version target: 1.42 (2020-01-30 => beta, 2020-03-12 => stable).

Backstory: slice patterns

It is already possible to use slice patterns on stable Rust to match on arrays and slices. For example, to match on a slice, you may write:

fn foo(slice: &[&str]) {
    match slice {
        [] => { dbg!() }
        [a] => { dbg!(a); }
        [a, b] => { dbg!(a, b); }
        _ => {}
    //  ^ Fallback -- necessary because the length is unknown!
    }
}

To match on an array, you may instead write:

fn bar([a, b, c]: [u8; 3]) {}
//     --------- Length is known, so pattern is irrefutable.

However, on stable Rust, it is not yet possible to match on a subslice or subarray.

A quick user guide: Subslice patterns

The ability to match on a subslice or subarray is gated under #![feature(slice_patterns)] and is what is proposed for stabilization here.

The syntax of subslice patterns

Subslice / subarray patterns come in two flavors syntactically.

Common to both flavors is that they use the token .., referred to as a "rest pattern" in a pattern context. A rest pattern functions as a variable-length pattern: it matches however many elements have not already been matched by the patterns before and after it.

When .. is used syntactically as an element of a slice pattern, either directly (1) or as part of a binding pattern (2), it becomes a subslice pattern.

On stable Rust, a rest pattern .. can already be used in tuple and tuple-struct patterns, as in let (x, ..) = (1, 2, 3); and let TS(x, ..) = TS(1, 2, 3);.
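To put the two uses of .. side by side, here is a small sketch: the tuple and tuple-struct forms that are already stable, and the same token as an element of a slice pattern, where it absorbs whatever the surrounding patterns leave over. TS mirrors the tuple struct named in the text; split_ends is a hypothetical helper introduced only for illustration.

```rust
// `TS` mirrors the tuple struct from the text; `split_ends` is a hypothetical helper.
struct TS(u8, u8, u8);

/// Splits a four-element array into its first element, the middle subarray, and the last element.
fn split_ends(xs: [i32; 4]) -> (i32, [i32; 2], i32) {
    // `middle @ ..` absorbs every element not matched before or after it.
    let [first, middle @ .., last] = xs;
    (first, middle, last)
}

fn main() {
    // Rest patterns in tuple and tuple-struct patterns, already stable.
    let (x, ..) = (1, 2, 3);
    let TS(y, ..) = TS(1, 2, 3);
    assert_eq!((x, y), (1, 1));

    // The same `..` token used as an element of a slice pattern.
    let [a, ..] = [1, 2, 3];
    let [.., z] = [1, 2, 3];
    assert_eq!((a, z), (1, 3));
    assert_eq!(split_ends([1, 2, 3, 4]), (1, [2, 3], 4));
}
```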

(1) Matching on a subslice without binding it

fn base(string: &str) -> u8 {
    match string.as_bytes() {
        [b'0', b'x', ..] => 16,
        [b'0', b'o', ..] => 8,
        [b'0', b'b', ..] => 2,
        _ => 10,
    }
}

fn main() {
    assert_eq!(base("0xFF"), 16);
    assert_eq!(base("0x"), 16);
}

In the function base, the pattern [b'0', b'x', ..] matches any byte slice that starts with the prefix 0x. Note that .. may match zero elements, so "0x" itself is a valid match.

(2) Binding a subslice

fn main() {
    #[derive(PartialEq, Debug)]
    struct X(u8);
    let xs: Vec<X> = vec![X(0), X(1), X(2)];

    if let [start @ .., end] = &*xs {
        //              --- bind on last element, assuming there is one.
        //  ---------- bind the initial elements, if there are any.
        assert_eq!(start, &[X(0), X(1)] as &[X]);
        assert_eq!(end, &X(2));
        let _: &[X] = start;
        let _: &X = end;
    }
}

In this case, [start @ .., end] matches any non-empty slice, binding the last element to end and all elements before it to start. Note in particular that, as above, start may be bound to the empty slice.

Only one .. per slice pattern

In today's stable Rust, a tuple (struct) pattern such as (a, b, c) may contain only one subtuple pattern (e.g., (a, .., c)). That is, a rest pattern may occur at most once. Any further .., as in (a, .., b, ..), causes an error, because the compiler has no way to know which position b refers to. The same rule applies to slice patterns: you may not write [a, .., b, ..] either.

Motivation

[PR rust-lang#67569]: https://github.com/rust-lang/rust/pull/67569/files

Slice patterns provide a natural and efficient way to pattern match on slices and arrays. This is particularly useful because slices and arrays are very common in software targeting modern hardware. However, as mentioned above, stable Rust cannot yet match on only part of a slice, which is exactly what fn base does; that example is taken from the rustc codebase itself. Subslice patterns fill this gap by extending slice patterns with the natural syntax xs @ .. and .., the latter of which is already used for tuples and tuple structs. As an example of how subslice patterns can be used to clean up code, see [PR rust-lang#67569]. In that PR, slice patterns let us improve readability and reduce unsafe code at no cost to performance.

Technical specification

Grammar

The following specification is a subset of the grammar, covering only what is needed here. Note that stabilizing subslice patterns does not alter the stable grammar; the stabilization consists purely of semantic changes.

Binding = reference:"ref"? mutable:"mut"? name:IDENT;

Pat =
  | ... // elided
  | Rest: ".."
  | Binding:{ binding:Binding { "@" subpat:Pat }? }
  | Slice:{ "[" elems:Pat* %% "," "]" }
  | Paren:{ "(" pat:Pat ")" }
  | Tuple:{ path:Path? "(" elems:Pat* &% "," ")" }
  ;

Notes:

  1. (..) is interpreted as a Tuple, not a Paren. This means that [a, (..)] is interpreted as Slice[Binding(a), Tuple[Rest]] and not Slice[Binding(a), Paren(Rest)].
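The distinction in the note can be observed directly: (..) is one element (a tuple pattern that matches any tuple), whereas a bare .. is a subslice pattern. The sketch below uses hypothetical helper names; it only relies on standard pattern matching and the matches! macro.

```rust
/// `(..)` is a tuple pattern matching any pair, so this irrefutable
/// array pattern has exactly two elements.
fn first_of_two(pairs: [(i32, char); 2]) -> (i32, char) {
    let [a, (..)] = pairs;
    a
}

/// `(..)` counts as a single element: only length-2 slices match.
fn is_exactly_two(xs: &[(i32, char)]) -> bool {
    matches!(xs, [_, (..)])
}

/// A bare `..` is a subslice pattern: any slice with at least one element matches.
fn is_at_least_one(xs: &[(i32, char)]) -> bool {
    matches!(xs, [_, ..])
}

fn main() {
    let three = [(1, 'a'), (2, 'b'), (3, 'c')];
    assert_eq!(first_of_two([(1, 'a'), (2, 'b')]), (1, 'a'));
    assert!(!is_exactly_two(&three));
    assert!(is_at_least_one(&three));
}
```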

Name resolution

A slice pattern is resolved as a product context, just like a tuple pattern, so each name may be bound at most once across its elements. The .. token is given no special treatment.

Abstract syntax of slice patterns

The abstract syntax (HIR level) is defined like so:

enum PatKind {
    ... // Other unimportant stuff.
    Wild,
    Binding {
        binding: Binding,
        subpat: Option<Pat>,
    },
    Slice {
        before: List<Pat>,
        slice: Option<Pat>,
        after: List<Pat>,
    },
}

The executable definition is found in hir::PatKind.

Lowering to abstract syntax

Lowering a slice pattern to its abstract syntax proceeds as follows:

  1. Lower each element pattern of the slice pattern, where:

    1. .. is lowered to _, recording that it was a subslice pattern,

    2. binding @ .. is lowered to binding @ _, recording that it was a subslice pattern,

    3. and all other patterns are lowered as normal, recording that they were not subslice patterns.

  2. Take all lowered elements up to, but not including, the first subslice pattern. These form before.

  3. Take all remaining elements. If there are any,

    1. the head is the subslice pattern, which forms slice;
    2. the tail forms after and must not contain another subslice pattern, or an error occurs.

The full executable definition can be found in LoweringContext::lower_pat_slice.
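The splitting in steps 2 and 3 can be sketched as follows. This is a simplified, hypothetical model rather than the code in LoweringContext::lower_pat_slice: the Pat enum and split_slice_pattern are invented for illustration, and each input element is assumed to already be lowered and tagged with whether it came from a subslice pattern (step 1).

```rust
/// Hypothetical, simplified lowered element pattern.
#[derive(Debug, Clone, PartialEq)]
enum Pat {
    Wild,            // `_`, also what `..` lowers to
    Binding(String), // `name` or `name @ _`
    Lit(u8),         // a literal such as `b'0'`
}

/// Hypothetical counterpart of the `Slice { before, slice, after }` variant.
#[derive(Debug, PartialEq)]
struct SlicePat {
    before: Vec<Pat>,
    slice: Option<Pat>,
    after: Vec<Pat>,
}

/// Each element is `(lowered pattern, was it a subslice pattern?)`.
fn split_slice_pattern(elems: Vec<(Pat, bool)>) -> Result<SlicePat, String> {
    let mut iter = elems.into_iter();
    let mut before = Vec::new();
    let mut slice = None;
    // Step 2: collect elements until the first subslice pattern.
    for (pat, is_subslice) in iter.by_ref() {
        if is_subslice {
            slice = Some(pat);
            break;
        }
        before.push(pat);
    }
    // Step 3: everything left forms `after`; a second subslice pattern is an error.
    let mut after = Vec::new();
    for (pat, is_subslice) in iter {
        if is_subslice {
            return Err("at most one subslice pattern is allowed".to_string());
        }
        after.push(pat);
    }
    Ok(SlicePat { before, slice, after })
}

fn main() {
    // `[b'0', rest @ .., last]` after step 1.
    let lowered = split_slice_pattern(vec![
        (Pat::Lit(b'0'), false),
        (Pat::Binding("rest".to_string()), true),
        (Pat::Binding("last".to_string()), false),
    ])
    .unwrap();
    assert_eq!(lowered.before, vec![Pat::Lit(b'0')]);
    assert_eq!(lowered.slice, Some(Pat::Binding("rest".to_string())));
    assert_eq!(lowered.after, vec![Pat::Binding("last".to_string())]);

    // `[a, .., b, ..]` is rejected.
    let bad = vec![
        (Pat::Binding("a".to_string()), false),
        (Pat::Wild, true),
        (Pat::Binding("b".to_string()), false),
        (Pat::Wild, true),
    ];
    assert!(split_slice_pattern(bad).is_err());
}
```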

Type checking slice patterns

Default binding modes

A slice pattern is a non-reference pattern as defined in is_non_ref_pat. This means that when type checking a slice pattern, as many immediate reference types as possible are peeled off the expected type, and the default binding mode is adjusted to by-reference (ref or ref mut, depending on the references peeled) before the slice pattern is checked. See rust-lang#63118 for an algorithmic description.

See RFC 2359's guide-level explanation and the tests listed below for examples of what effect this has.
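As a small illustration of this effect, the sketch below matches a &[i32] and a &mut [i32] against slice patterns. The function names are hypothetical. Because the outer reference is peeled off, first and rest are bound by reference (&i32 and &[i32]) in the first function, and by mutable reference in the second.

```rust
/// Matching `&[i32]`: the `&` is peeled off and bindings default to `ref`,
/// so `first: &i32` and `rest: &[i32]`.
fn first_and_rest(v: &[i32]) -> Option<(&i32, &[i32])> {
    match v {
        [first, rest @ ..] => Some((first, rest)),
        [] => None,
    }
}

/// Matching `&mut [i32]`: the default binding mode becomes `ref mut`,
/// so `first: &mut i32` and can be written through.
fn bump_first(v: &mut [i32]) {
    if let [first, ..] = v {
        *first += 1;
    }
}

fn main() {
    let data = [1, 2, 3];
    let (first, rest) = first_and_rest(&data).unwrap();
    assert_eq!(*first, 1);
    assert_eq!(rest, &[2, 3][..]);

    let mut data = [10, 20];
    bump_first(&mut data);
    assert_eq!(data, [11, 20]);
}
```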

Checking the pattern

Type checking a slice pattern proceeds as follows:

  1. Resolve any type variables by a single level. If the result is still a type variable, error.

  2. Determine the expected type for any subslice pattern (slice_ty) and for elements (inner_ty) depending on the expected type.

    1. If the expected type is an array ([E; N]):

      1. Evaluate the length N of the array. If the length cannot be evaluated, error. This may happen, e.g., when the length is a const generic parameter const N: usize. Otherwise, N is now known.

      2. If there is no subslice pattern, check len(before) == N, and otherwise error.

      3. Otherwise, set S = N - len(before) - len(after), check S >= 0 (that is, len(before) + len(after) <= N), and otherwise error. Set slice_ty = [E; S].

      Set inner_ty = E.

    2. If the expected type is a slice ([E]), set inner_ty = E and slice_ty = [E].

    3. Otherwise, error.

  3. Check each element in before and after against inner_ty.

  4. If it exists, check slice against slice_ty.

For an executable definition, see check_pat_slice.
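The array and slice cases can be observed from user code: binding a subslice of an array yields an array of length S, while binding a subslice of a slice yields a slice (behind a reference, due to default binding modes). The helper check_array_pattern below is a hypothetical model of step 2.1 only, not the logic of check_pat_slice; it reports the subarray length S or an error when the pattern cannot fit.

```rust
/// Hypothetical model of step 2.1 for an array of length `n`:
/// `Ok(None)` if there is no subslice pattern and `before == n`,
/// `Ok(Some(s))` with `s = n - before - after` if there is one,
/// `Err(..)` if the pattern cannot fit the array.
fn check_array_pattern(
    n: usize,
    before: usize,
    has_subslice: bool,
    after: usize,
) -> Result<Option<usize>, String> {
    if !has_subslice {
        // Without `..`, every element sits in `before` and lengths must agree.
        return if before == n {
            Ok(None)
        } else {
            Err(format!("expected {} elements, found {}", n, before))
        };
    }
    // S = N - len(before) - len(after); `checked_sub` fails exactly when S would be negative.
    n.checked_sub(before)
        .and_then(|rest| rest.checked_sub(after))
        .map(Some)
        .ok_or_else(|| format!("pattern needs at least {} elements, array has {}", before + after, n))
}

fn main() {
    // Array case: N = 5, len(before) = 1, len(after) = 2, so S = 2 and
    // the subslice binding has type `[u8; 2]`.
    let [head, middle @ .., y, z] = [1u8, 2, 3, 4, 5];
    let middle: [u8; 2] = middle;
    assert_eq!((head, middle, y, z), (1, [2, 3], 4, 5));
    assert_eq!(check_array_pattern(5, 1, true, 2), Ok(Some(2)));

    // Slice case: slice_ty = [E], and default binding modes add the reference.
    let bytes: &[u8] = &[1, 2, 3];
    if let [first, rest @ ..] = bytes {
        let rest: &[u8] = rest;
        assert_eq!((*first, rest), (1, &[2, 3][..]));
    }
}
```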

Typed abstract syntax of slice and array patterns

The typed abstract syntax (HAIR level) is defined like so:

enum PatKind {
    ... // Other unimportant stuff.
    Wild,
    Binding {
        ... // Elided.
    },
    Slice {
        prefix: List<Pat>,
        slice: Option<Pat>,
        suffix: List<Pat>,
    },
    Array {
        prefix: List<Pat>,
        slice: Option<Pat>,
        suffix: List<Pat>,
    },
}

The executable definition is found in hair::pattern::PatKind.

Lowering to typed abstract syntax

Lowering a slice pattern to its typed abstract syntax proceeds by:

  1. Lowering each pattern in before into prefix.
  2. Lowering the slice, if it exists, into slice.
    1. A Wild pattern in abstract syntax is lowered to Wild.
    2. A Binding pattern in abstract syntax is lowered to Binding { .. }.
  3. Lowering each pattern in after into suffix.
  4. If the type is [E; N], constructing PatKind::Array { prefix, slice, suffix }, otherwise PatKind::Slice { prefix, slice, suffix }.

The executable definition is found in PatCtxt::slice_or_array_pattern.
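Step 4 is the only point where the expected type matters: the same prefix, slice, and suffix become either an Array or a Slice node. The sketch below is a hypothetical, heavily simplified model (Ty, TypedPat, and build_typed_pattern are invented names, not rustc's types or the signature of PatCtxt::slice_or_array_pattern), and it assumes the element patterns have already been lowered.

```rust
/// Hypothetical, simplified expected type.
#[derive(Debug, PartialEq)]
enum Ty {
    Array(usize), // `[E; N]`
    Slice,        // `[E]`
}

/// Hypothetical, simplified typed pattern, loosely following the enum above.
#[derive(Debug, PartialEq)]
enum TypedPat {
    Wild,
    Binding(String),
    Array { prefix: Vec<TypedPat>, slice: Option<Box<TypedPat>>, suffix: Vec<TypedPat> },
    Slice { prefix: Vec<TypedPat>, slice: Option<Box<TypedPat>>, suffix: Vec<TypedPat> },
}

/// Only the choice between `Array` and `Slice` depends on the type.
fn build_typed_pattern(
    ty: &Ty,
    prefix: Vec<TypedPat>,
    slice: Option<TypedPat>,
    suffix: Vec<TypedPat>,
) -> TypedPat {
    let slice = slice.map(Box::new);
    match ty {
        Ty::Array(_) => TypedPat::Array { prefix, slice, suffix },
        Ty::Slice => TypedPat::Slice { prefix, slice, suffix },
    }
}

fn main() {
    // `[x, rest @ ..]` checked against `[u8; 3]`, then against `[u8]`.
    let make = |ty: &Ty| {
        build_typed_pattern(
            ty,
            vec![TypedPat::Binding("x".to_string())],
            Some(TypedPat::Binding("rest".to_string())),
            vec![],
        )
    };
    assert!(matches!(make(&Ty::Array(3)), TypedPat::Array { .. }));
    assert!(matches!(make(&Ty::Slice), TypedPat::Slice { .. }));
}
```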

Exhaustiveness checking

Let E be the element type of a slice or array.
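For a feel of what exhaustiveness checking accepts, the following sketch (hypothetical function names) shows two matches that compile without a _ fallback. For a slice, the arms must cover every length: here [] covers 0, [_] covers 1, and [_, _, ..] covers everything from 2 upward. For an array, the length is fixed, so covering the element values of E is enough.

```rust
/// Exhaustive over all slice lengths without a `_` arm.
fn describe(xs: &[u8]) -> &'static str {
    match xs {
        [] => "empty",
        [_] => "one",
        [_, _, ..] => "two or more",
    }
}

/// For `[bool; 2]` the length is known, so covering the values suffices.
fn classify(pair: [bool; 2]) -> u8 {
    match pair {
        [false, false] => 0,
        [false, true] => 1,
        [true, _] => 2,
    }
}

fn main() {
    assert_eq!(describe(&[]), "empty");
    assert_eq!(describe(&[1, 2, 3]), "two or more");
    assert_eq!(classify([true, false]), 2);
}
```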

MIR representation

The relevant MIR representation for the lowering into MIR, which is discussed in the next section, includes:

enum Rvalue {
    // ...
    /// The length of a `[X]` or `[X; N]` value.
    Len(Place),
}

struct Place {
    base: PlaceBase,
    projection: List<PlaceElem>,
}

enum ProjectionElem {
    // ...
    ConstantIndex {
        offset: Nat,
        min_length: Nat,
        from_end: bool,
    },
    Subslice {
        from: Nat,
        to: Nat,
        from_end: bool,
    },
}
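To make the projection elements above more concrete, here is a sketch that hand-lowers [first, mid @ .., last] using plain Rust stand-ins. The functions constant_index and subslice are hypothetical models, written under our reading of the rustc documentation: a ConstantIndex with from_end counts back from the end (so offset 1 is the last element), min_length is the length guaranteed by a preceding Len check, and a Subslice with from_end denotes xs[from..len - to], otherwise xs[from..to].

```rust
/// Hypothetical stand-in for `ConstantIndex { offset, min_length, from_end }`.
fn constant_index<T>(xs: &[T], offset: usize, min_length: usize, from_end: bool) -> &T {
    // The preceding `Len` test guarantees at least `min_length` elements.
    debug_assert!(xs.len() >= min_length);
    if from_end {
        &xs[xs.len() - offset]
    } else {
        &xs[offset]
    }
}

/// Hypothetical stand-in for `Subslice { from, to, from_end }`.
fn subslice<T>(xs: &[T], from: usize, to: usize, from_end: bool) -> &[T] {
    if from_end {
        &xs[from..xs.len() - to]
    } else {
        &xs[from..to]
    }
}

/// Hand-lowered `[first, mid @ .., last]`: a length test, then projections.
fn first_mid_last<T>(xs: &[T]) -> Option<(&T, &[T], &T)> {
    // The pattern has two fixed elements, so it needs `Len(xs) >= 2`.
    if xs.len() < 2 {
        return None;
    }
    let first = constant_index(xs, 0, 2, false);
    let mid = subslice(xs, 1, 1, true);
    let last = constant_index(xs, 1, 2, true);
    Some((first, mid, last))
}

fn main() {
    let xs = [1, 2, 3, 4];
    // The hand-lowered version agrees with the real slice pattern.
    if let [first, mid @ .., last] = &xs[..] {
        assert_eq!(first_mid_last(&xs[..]), Some((first, mid, last)));
    }
    assert_eq!(first_mid_last(&[7][..]), None);
}
```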

Lowering to MIR

Safety and const checking

Borrow and move checking

Tests

The tests can primarily be found in the PR itself. Here are some of them, grouped by area:

Parsing (3)

Lowering (2)

Type checking (5)

Exhaustiveness and usefulness checking (20)

Borrow checking (28)

MIR lowering (1)

Evaluation (19)

Misc (1)

History

There is ongoing work to improve pattern matching in other ways (some of it is only indirectly relevant, through composition with slice patterns):

As for more direct improvements to slice patterns, some avenues could be:

Another idea, raised by RFC 2707 and RFC 2359, is to allow binding a subtuple pattern, as in (a, xs @ .., b). However, while by-value bindings such as xs @ .. could be allowed at zero cost, the same cannot be said of by-reference bindings, e.g. (a, ref xs @ .., b). The problem is that for such a reference to be valid, xs must be laid out contiguously in memory, which in effect forces an HList-based representation for tuples.