Builder in regex_automata::meta - Rust
pub struct Builder { /* private fields */ }
Available on crate feature meta only.
A builder for configuring and constructing a Regex.
The builder permits configuring two different aspects of a Regex:
- Builder::configure will set high-level configuration options as described by a Config.
- Builder::syntax will set the syntax level configuration options as described by a util::syntax::Config. This only applies when building a Regex from pattern strings.
Once configured, the builder can then be used to construct a Regex from one of 4 different inputs:
- Builder::build creates a regex from a single pattern string.
- Builder::build_many creates a regex from many pattern strings.
- Builder::build_from_hir creates a regex from a regex-syntax::Hir expression.
- Builder::build_many_from_hir creates a regex from many regex-syntax::Hir expressions.
The latter two methods in particular provide a way to construct a fully featured regular expression matcher directly from an Hir expression without having to first convert it to a string. (This is in contrast to the top-level regex crate, which intentionally provides no such API in order to avoid making regex-syntax a public dependency.)
As a convenience, this builder may be created via Regex::builder, which may help avoid an extra import.
§Example: change the line terminator
This example shows how to enable multi-line mode by default and change the line terminator to the NUL byte:
use regex_automata::{meta::Regex, util::syntax, Match};
let re = Regex::builder()
    .syntax(syntax::Config::new().multi_line(true))
    .configure(Regex::config().line_terminator(b'\x00'))
    .build(r"^foo$")?;
let hay = "\x00foo\x00";
assert_eq!(Some(Match::must(0, 1..4)), re.find(hay));
Ok::<(), Box<dyn std::error::Error>>(())
§Example: disable UTF-8 requirement
By default, regex patterns are required to match UTF-8. This includes regex patterns that can produce matches of length zero. In the case of an empty match, by default, matches will not appear between the code units of a UTF-8 encoded codepoint.
However, it can be useful to disable this requirement, particularly if you’re searching things like &[u8] that are not known to be valid UTF-8.
use regex_automata::{meta::Regex, util::syntax, Match};
let mut builder = Regex::builder();
// Disables the requirement that non-empty matches match UTF-8.
builder.syntax(syntax::Config::new().utf8(false));
// Disables the requirement that empty matches match UTF-8 boundaries.
builder.configure(Regex::config().utf8_empty(false));
// We can match raw bytes via \xZZ syntax, but we need to disable
// Unicode mode to do that. We could disable it everywhere, or just
// selectively, as shown here.
let re = builder.build(r"(?-u:\xFF)foo(?-u:\xFF)")?;
let hay = b"\xFFfoo\xFF";
assert_eq!(Some(Match::must(0, 0..5)), re.find(hay));
// We can also match between code units.
let re = builder.build(r"")?;
let hay = "☃";
assert_eq!(re.find_iter(hay).collect::<Vec<Match>>(), vec![
    Match::must(0, 0..0),
    Match::must(0, 1..1),
    Match::must(0, 2..2),
    Match::must(0, 3..3),
]);
Ok::<(), Box<dyn std::error::Error>>(())
Creates a new builder for configuring and constructing a Regex.
Builds a Regex from a single pattern string.
If there was a problem parsing the pattern or a problem turning it into a regex matcher, then an error is returned.
§Example
This example shows how to configure syntax options.
use regex_automata::{meta::Regex, util::syntax, Match};
let re = Regex::builder()
    .syntax(syntax::Config::new().crlf(true).multi_line(true))
    .build(r"^foo$")?;
let hay = "\r\nfoo\r\n";
assert_eq!(Some(Match::must(0, 2..5)), re.find(hay));
Ok::<(), Box<dyn std::error::Error>>(())
Builds a Regex from many pattern strings.
If there was a problem parsing any of the patterns or a problem turning them into a regex matcher, then an error is returned.
§Example: finding the pattern that caused an error
When a syntax error occurs, it is possible to ask which pattern caused the syntax error.
use regex_automata::{meta::Regex, PatternID};
let err = Regex::builder()
    .build_many(&["a", "b", r"\p{Foo}", "c"])
    .unwrap_err();
assert_eq!(Some(PatternID::must(2)), err.pattern());
§Example: zero patterns is valid
Building a regex with zero patterns results in a regex that never matches anything. Because this routine is generic, passing an empty slice usually requires a turbo-fish (or something else to help type inference).
use regex_automata::meta::Regex;
let re = Regex::builder()
    .build_many::<&str>(&[])?;
assert_eq!(None, re.find(""));
Ok::<(), Box<dyn std::error::Error>>(())
Builds a Regex directly from an Hir expression.
This is useful if you needed to parse a pattern string into an Hir for other reasons (such as analysis or transformations). This routine permits building a Regex directly from the Hir expression instead of first converting the Hir back to a pattern string.
When using this method, any options set via Builder::syntax are ignored. Namely, the syntax options only apply when parsing a pattern string, which isn’t relevant here.
If there was a problem building the underlying regex matcher for the given Hir, then an error is returned.
§Example
This example shows how one can hand-construct an Hir expression and build a regex from it without doing any parsing at all.
use {
    regex_automata::{meta::Regex, Match},
    regex_syntax::hir::{Hir, Look},
};
// (?Rm)^foo$
let hir = Hir::concat(vec![
    Hir::look(Look::StartCRLF),
    Hir::literal("foo".as_bytes()),
    Hir::look(Look::EndCRLF),
]);
let re = Regex::builder()
    .build_from_hir(&hir)?;
let hay = "\r\nfoo\r\n";
assert_eq!(Some(Match::must(0, 2..5)), re.find(hay));
Ok::<(), Box<dyn std::error::Error>>(())
Builds a Regex directly from many Hir expressions.
This is useful if you needed to parse pattern strings into Hir expressions for other reasons (such as analysis or transformations). This routine permits building a Regex directly from the Hir expressions instead of first converting the Hir expressions back to pattern strings.
When using this method, any options set via Builder::syntax are ignored. Namely, the syntax options only apply when parsing a pattern string, which isn’t relevant here.
If there was a problem building the underlying regex matcher for the given Hir expressions, then an error is returned.
Note that unlike Builder::build_many, this can only fail as a result of building the underlying matcher. In that case, there is no single Hir expression that can be isolated as a reason for the failure. So if this routine fails, it’s not possible to determine which Hir expression caused the failure.
§Example
This example shows how one can hand-construct multiple Hir expressions and build a single regex from them without doing any parsing at all.
use {
    regex_automata::{meta::Regex, Match},
    regex_syntax::hir::{Hir, Look},
};
// (?Rm)^foo$
let hir1 = Hir::concat(vec![
    Hir::look(Look::StartCRLF),
    Hir::literal("foo".as_bytes()),
    Hir::look(Look::EndCRLF),
]);
// (?Rm)^bar$
let hir2 = Hir::concat(vec![
    Hir::look(Look::StartCRLF),
    Hir::literal("bar".as_bytes()),
    Hir::look(Look::EndCRLF),
]);
let re = Regex::builder()
    .build_many_from_hir(&[&hir1, &hir2])?;
let hay = "\r\nfoo\r\nbar";
let got: Vec<Match> = re.find_iter(hay).collect();
let expected = vec![
    Match::must(0, 2..5),
    Match::must(1, 7..10),
];
assert_eq!(expected, got);
Ok::<(), Box<dyn std::error::Error>>(())
Configure the behavior of a Regex.
This configuration controls non-syntax options related to the behavior of a Regex. This includes things like whether empty matches can split a codepoint, prefilters, line terminators and a long list of options for configuring which regex engines the meta regex engine will be able to use internally.
§Example
This example shows how to disable UTF-8 empty mode. This will permit empty matches to occur between the code units of the UTF-8 encoding of a codepoint.
use regex_automata::{meta::Regex, Match};
let re = Regex::new("")?;
let got: Vec<Match> = re.find_iter("☃").collect();
// Matches only occur at the beginning and end of the snowman.
assert_eq!(got, vec![
    Match::must(0, 0..0),
    Match::must(0, 3..3),
]);
let re = Regex::builder()
    .configure(Regex::config().utf8_empty(false))
    .build("")?;
let got: Vec<Match> = re.find_iter("☃").collect();
// Matches now occur at every position!
assert_eq!(got, vec![
    Match::must(0, 0..0),
    Match::must(0, 1..1),
    Match::must(0, 2..2),
    Match::must(0, 3..3),
]);
Ok::<(), Box<dyn std::error::Error>>(())
Configure the syntax options when parsing a pattern string while building a Regex.
These options only apply when Builder::build or Builder::build_many are used. The other build methods accept Hir values, which have already been parsed.
§Example
This example shows how to enable case insensitive mode.
use regex_automata::{meta::Regex, util::syntax, Match};
let re = Regex::builder()
    .syntax(syntax::Config::new().case_insensitive(true))
    .build(r"δ")?;
assert_eq!(Some(Match::must(0, 0..2)), re.find(r"Δ"));
Ok::<(), Box<dyn std::error::Error>>(())