Builder in regex_automata::meta

pub struct Builder { /* private fields */ }

Available on crate feature meta only.

A builder for configuring and constructing a Regex.

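The builder permits configuring two different aspects of a Regex: the high-level, non-syntax behavior of the regex, set via Builder::configure, and the syntax options used when parsing pattern strings, set via Builder::syntax. The syntax options only apply when building a Regex from pattern strings.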

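Once configured, the builder can then be used to construct a Regex from one of 4 different inputs: a single pattern string (Builder::build), many pattern strings (Builder::build_many), a single Hir expression (Builder::build_from_hir) or many Hir expressions (Builder::build_many_from_hir).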

The latter two methods in particular provide a way to construct a fully featured regular expression matcher directly from an Hir expression without having to first convert it to a string. (This is in contrast to the top-level regex crate, which intentionally provides no such API in order to avoid making regex-syntax a public dependency.)

As a convenience, this builder may be created via Regex::builder, which may help avoid an extra import.

§Example: change the line terminator

This example shows how to enable multi-line mode by default and change the line terminator to the NUL byte:

use regex_automata::{meta::Regex, util::syntax, Match};

let re = Regex::builder()
    .syntax(syntax::Config::new().multi_line(true))
    .configure(Regex::config().line_terminator(b'\x00'))
    .build(r"^foo$")?;
let hay = "\x00foo\x00";
assert_eq!(Some(Match::must(0, 1..4)), re.find(hay));

Ok::<(), Box<dyn std::error::Error>>(())

§Example: disable UTF-8 requirement

By default, regex patterns are required to match UTF-8. This includes regex patterns that can produce matches of length zero. In the case of an empty match, by default, matches will not appear between the code units of a UTF-8 encoded codepoint.

However, it can be useful to disable this requirement, particularly if you’re searching things like &[u8] that are not known to be valid UTF-8.

use regex_automata::{meta::Regex, util::syntax, Match};

let mut builder = Regex::builder();
// Disables the requirement that non-empty matches match UTF-8.
builder.syntax(syntax::Config::new().utf8(false));
// Disables the requirement that empty matches match UTF-8 boundaries.
builder.configure(Regex::config().utf8_empty(false));

// We can match raw bytes via \xZZ syntax, but we need to disable
// Unicode mode to do that. We could disable it everywhere, or just
// selectively, as shown here.
let re = builder.build(r"(?-u:\xFF)foo(?-u:\xFF)")?;
let hay = b"\xFFfoo\xFF";
assert_eq!(Some(Match::must(0, 0..5)), re.find(hay));

// We can also match between code units.
let re = builder.build(r"")?;
let hay = "☃";
assert_eq!(re.find_iter(hay).collect::<Vec<Match>>(), vec![
    Match::must(0, 0..0),
    Match::must(0, 1..1),
    Match::must(0, 2..2),
    Match::must(0, 3..3),
]);

Ok::<(), Box<dyn std::error::Error>>(())

impl Builder

pub fn new() -> Builder

Creates a new builder for configuring and constructing a Regex.

pub fn build(&self, pattern: &str) -> Result<Regex, BuildError>

Builds a Regex from a single pattern string.

If there was a problem parsing the pattern or a problem turning it into a regex matcher, then an error is returned.

§Example

This example shows how to configure syntax options.

use regex_automata::{meta::Regex, util::syntax, Match};

let re = Regex::builder()
    .syntax(syntax::Config::new().crlf(true).multi_line(true))
    .build(r"^foo$")?;
let hay = "\r\nfoo\r\n";
assert_eq!(Some(Match::must(0, 2..5)), re.find(hay));

Ok::<(), Box<dyn std::error::Error>>(())

pub fn build_many<P: AsRef<str>>(&self, patterns: &[P]) -> Result<Regex, BuildError>

Builds a Regex from many pattern strings.

If there was a problem parsing any of the patterns or a problem turning them into a regex matcher, then an error is returned.

§Example: finding the pattern that caused an error

When a syntax error occurs, it is possible to ask which pattern caused the syntax error.

use regex_automata::{meta::Regex, PatternID};

let err = Regex::builder()
    .build_many(&["a", "b", r"\p{Foo}", "c"])
    .unwrap_err();
assert_eq!(Some(PatternID::must(2)), err.pattern());

§Example: zero patterns is valid

Building a regex with zero patterns results in a regex that never matches anything. Because this routine is generic, passing an empty slice usually requires a turbo-fish (or something else to help type inference).
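The inference problem described above is not specific to this crate. As a minimal std-only sketch, the hypothetical function total_pattern_len below stands in for any routine that is generic over P: AsRef<str> and takes a slice of patterns; it is not part of regex-automata.

```rust
// Hypothetical stand-in for a routine that, like `build_many`, is generic
// over the pattern type and accepts a slice of patterns.
fn total_pattern_len<P: AsRef<str>>(patterns: &[P]) -> usize {
    patterns.iter().map(|p| p.as_ref().len()).sum()
}

fn main() {
    // With a non-empty slice, `P` is inferred from the elements.
    assert_eq!(total_pattern_len(&["a", "bc"]), 3);
    // `String` also implements `AsRef<str>`, so it works as well.
    assert_eq!(total_pattern_len(&[String::from("abc")]), 3);

    // An empty slice literal has no elements to infer `P` from, so a bare
    // `total_pattern_len(&[])` is rejected by the compiler.
    // A turbofish names the type explicitly...
    assert_eq!(total_pattern_len::<&str>(&[]), 0);
    // ...and so does annotating the slice itself.
    let none: &[&str] = &[];
    assert_eq!(total_pattern_len(none), 0);
}
```

Either form of annotation should work equally well; the turbofish simply keeps the call on one line.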

use regex_automata::meta::Regex;

let re = Regex::builder()
    .build_many::<&str>(&[])?;
assert_eq!(None, re.find(""));

Ok::<(), Box<dyn std::error::Error>>(())

pub fn build_from_hir(&self, hir: &Hir) -> Result<Regex, BuildError>

Builds a Regex directly from an Hir expression.

This is useful if you needed to parse a pattern string into an Hir for other reasons (such as analysis or transformations). This routine permits building a Regex directly from the Hir expression instead of first converting the Hir back to a pattern string.

When using this method, any options set via Builder::syntax are ignored. Namely, the syntax options only apply when parsing a pattern string, which isn’t relevant here.

If there was a problem building the underlying regex matcher for the given Hir, then an error is returned.

§Example

This example shows how one can hand-construct an Hir expression and build a regex from it without doing any parsing at all.

use {
    regex_automata::{meta::Regex, Match},
    regex_syntax::hir::{Hir, Look},
};

// (?Rm)^foo$
let hir = Hir::concat(vec![
    Hir::look(Look::StartCRLF),
    Hir::literal("foo".as_bytes()),
    Hir::look(Look::EndCRLF),
]);
let re = Regex::builder()
    .build_from_hir(&hir)?;
let hay = "\r\nfoo\r\n";
assert_eq!(Some(Match::must(0, 2..5)), re.find(hay));

Ok::<(), Box<dyn std::error::Error>>(())

pub fn build_many_from_hir<H: Borrow<Hir>>(&self, hirs: &[H]) -> Result<Regex, BuildError>

Builds a Regex directly from many Hir expressions.

This is useful if you needed to parse pattern strings into Hir expressions for other reasons (such as analysis or transformations). This routine permits building a Regex directly from the Hir expressions instead of first converting the Hir expressions back to pattern strings.

When using this method, any options set via Builder::syntax are ignored. Namely, the syntax options only apply when parsing a pattern string, which isn’t relevant here.

If there was a problem building the underlying regex matcher for the given Hir expressions, then an error is returned.

Note that unlike Builder::build_many, this can only fail as a result of building the underlying matcher. In that case, there is no single Hir expression that can be isolated as a reason for the failure. So if this routine fails, it’s not possible to determine which Hir expression caused the failure.

§Example

This example shows how one can hand-construct multiple Hir expressions and build a single regex from them without doing any parsing at all.

use {
    regex_automata::{meta::Regex, Match},
    regex_syntax::hir::{Hir, Look},
};

// (?Rm)^foo$
let hir1 = Hir::concat(vec![
    Hir::look(Look::StartCRLF),
    Hir::literal("foo".as_bytes()),
    Hir::look(Look::EndCRLF),
]);
// (?Rm)^bar$
let hir2 = Hir::concat(vec![
    Hir::look(Look::StartCRLF),
    Hir::literal("bar".as_bytes()),
    Hir::look(Look::EndCRLF),
]);
let re = Regex::builder()
    .build_many_from_hir(&[&hir1, &hir2])?;
let hay = "\r\nfoo\r\nbar";
let got: Vec<Match> = re.find_iter(hay).collect();
let expected = vec![
    Match::must(0, 2..5),
    Match::must(1, 7..10),
];
assert_eq!(expected, got);

Ok::<(), Box<dyn std::error::Error>>(())

pub fn configure(&mut self, config: Config) -> &mut Builder

Configure the behavior of a Regex.

This configuration controls non-syntax options related to the behavior of a Regex. This includes things like whether empty matches can split a codepoint, prefilters, line terminators and a long list of options for configuring which regex engines the meta regex engine will be able to use internally.

§Example

This example shows how to disable UTF-8 empty mode. This will permit empty matches to occur between the UTF-8 encoding of a codepoint.

use regex_automata::{meta::Regex, Match};

let re = Regex::new("")?;
let got: Vec<Match> = re.find_iter("☃").collect();
// Matches only occur at the beginning and end of the snowman.
assert_eq!(got, vec![
    Match::must(0, 0..0),
    Match::must(0, 3..3),
]);

let re = Regex::builder()
    .configure(Regex::config().utf8_empty(false))
    .build("")?;
let got: Vec<Match> = re.find_iter("☃").collect();
// Matches now occur at every position!
assert_eq!(got, vec![
    Match::must(0, 0..0),
    Match::must(0, 1..1),
    Match::must(0, 2..2),
    Match::must(0, 3..3),
]);

Ok::<(), Box<dyn std::error::Error>>(())
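The offsets in the two assertions above follow from where the UTF-8 encoding of the snowman has codepoint boundaries. As a rough std-only model of the empty pattern only (the function name is hypothetical, and this is not how the meta regex engine is implemented), the allowed offsets can be computed with str::is_char_boundary:

```rust
// Offsets at which the empty pattern may report a match. When `utf8_empty`
// is true, offsets that would split a codepoint are filtered out.
fn empty_match_offsets(hay: &str, utf8_empty: bool) -> Vec<usize> {
    (0..=hay.len())
        .filter(|&i| !utf8_empty || hay.is_char_boundary(i))
        .collect()
}

fn main() {
    let hay = "☃";
    // The snowman (U+2603) is encoded as three bytes in UTF-8.
    assert_eq!(hay.len(), 3);
    // Default behavior: only the boundaries before and after the codepoint.
    assert_eq!(empty_match_offsets(hay, true), vec![0, 3]);
    // With the requirement disabled: every byte offset.
    assert_eq!(empty_match_offsets(hay, false), vec![0, 1, 2, 3]);
}
```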

pub fn syntax(&mut self, config: syntax::Config) -> &mut Builder

Configure the syntax options when parsing a pattern string while building a Regex.

These options only apply when Builder::build or Builder::build_many are used. The other build methods accept Hir values, which have already been parsed.

§Example

This example shows how to enable case insensitive mode.

use regex_automata::{meta::Regex, util::syntax, Match};

let re = Regex::builder()
    .syntax(syntax::Config::new().case_insensitive(true))
    .build(r"δ")?;
assert_eq!(Some(Match::must(0, 0..2)), re.find(r"Δ"));

Ok::<(), Box<dyn std::error::Error>>(())
