Parsing.String.Replace - purescript-parsing - Pursuit (original) (raw)

Package

Repository

This module is for finding patterns in a String, and also replacing or splitting on the found patterns. This activity is traditionally done withRegex, but this module uses parsers instead for the pattern matching.

Functions in this module are ways to run a parser on an input String, like runParser or runParserT.

Why would we want to do pattern matching and substitution with parsers instead of regular expressions?

Monadic parsers have a nicer syntax thanregular expressions, which are notoriouslydifficult to read. With monadic parsers we can perform textual pattern-matching in plain PureScript rather than using a special regex domain-specific programming language.
Regular expressions can do “group capture” on sections of the matched pattern, but they can only return stringy lists of the capture groups. Parsers can construct typed data structures based on the capture groups, guaranteeing no disagreement between the pattern rules and the rules that we're using to build data structures based on the pattern matches.
For example, consider scanning a string for numbers. A lot of different things can look like a number, and can have leading plus or minus signs, or be in scientific notation, or have commas, or whatever. If we try to parse all of the numbers out of a string using regular expressions, then we have to make sure that the regular expression and the string-to-number conversion function agree about exactly what is and what isn't a numeric string. We can get into an awkward situation in which the regular expression says it has found a numeric string but the string-to-number conversion function fails. A typed parser will perform both the pattern match and the conversion, so it will never be in that situation.Parse, don't validate.
Regular expressions are only able to pattern-matchregular grammars. Monadic parsers are able pattern-match context-free (by recursion) or context-sensitive (by monad transformer) grammars.
The replacement expression for a traditional regular expression-based substitution command is usually just a string template in which the Nth “capture group” can be inserted with the syntax \N. With this library, instead of a template, we can perform any replacement computation, including Effects.

Implementation Notes

All of the functions in this module work by calling runParserTwith the anyTill combinator. We can expect the speed of parser-based pattern matching to be about 10× worse than regex-based pattern matching in a JavaScript runtime environment. This module is based on the Haskell packagesreplace-megaparsecandreplace-attoparsec.

#breakCap Source

breakCap :: forall a. String -> Parser String a -> Maybe (T3 String a String)

Break on and capture one pattern

Find the first occurence of a pattern in the input String, capture the found pattern, and break the input String on the found pattern.

This function can be used instead ofData.String.indexOforData.String.Regex.searchorData.String.Regex.replaceand it allows using a parser for the pattern search.

This function can be used instead ofData.String.takeWhileorData.String.dropWhileand it is predicated beyond more than just the next single CodePoint.

Output

Nothing when no pattern match was found.
Just (prefix /\ parse_result /\ suffix) for the result of parsing the pattern match, and the prefix string before and the suffix string after the pattern match. prefix and suffix may be zero-length strings.

Access the matched section of text

To capture the matched string combine the pattern parser sep with the match combinator.

With the matched string, we can reconstruct the input string. For all input, sep, if

let (Just (prefix /\ (infix /\ _) /\ suffix)) =
      breakCap input (match sep)

then

input == prefix <> infix <> suffix

Example

Find the first pattern match and break the input string on the pattern.

breakCap "hay needle hay" (string "needle")

Result:

Just ("hay " /\ "needle" /\ " hay")

Example

Find the first pattern match, capture the matched text and the parsed result.

breakCap "abc 123 def" (match intDecimal)

Result:

Just ("abc " /\ ("123" /\ 123) /\ " def")

#splitCap Source

splitCap :: forall a. String -> Parser String a -> NonEmptyList (Either String a)

Split on and capture all patterns

Find all occurences of the pattern parser sep, split the input String, capture all the matched patterns and the splits.

This function can be used instead ofData.String.Common.splitorData.String.Regex.splitorData.String.Regex.matchorData.String.Regex.search.

The input string will be split on every leftmost non-overlapping occurence of the pattern sep. The output list will contain the parsed result of input string sections which match the sep pattern in Right a, and non-matching sections in Left String.

Access the matched section of text

To capture the matched strings combine the pattern parser sep with the match combinator.

With the matched strings, we can reconstruct the input string. For all input, sep, if

let output = splitCap input (match sep)

then

input == fold (either identity fst <$> output)

Example

Split the input string on all Int pattern matches.

splitCap "hay 1 straw 2 hay" intDecimal

Result:

[Left "hay ", Right 1, Left " straw ", Right 2, Left " hay"]

Example

Find the beginning positions of all pattern matches in the input.

catMaybes $ hush <$> splitCap ".𝝺...\n...𝝺." (position <* string "𝝺")

Result:

[ Position {index: 1, line: 1, column: 2 }
, Position { index: 9, line: 2, column: 4 }
]

Example

Find groups of balanced nested parentheses. This pattern is an example of a “context-free” grammar, a pattern thatcan't be expressed by a regular expression. We can express the pattern with a recursive parser.

balancedParens :: Parser String Unit
balancedParens = do
  void $ char '('
  void $ manyTill (balancedParens <|> void anyCodePoint) (char ')')

rmap fst <$> splitCap "((🌼)) (()())" (match balancedParens)

Result:

[Right "((🌼))", Left " ", Right "(()())"]

#splitCapT Source

splitCapT :: forall m a. Monad m => MonadRec m => String -> ParserT String m a -> m (NonEmptyList (Either String a))

Monad transformer version of splitCap. The sep parser will run in the monad context.

Example

Count the pattern matches.

Parse in a State monad to remember state in the parser. This stateful letterCount parser counts the number of pattern matches which occur in the input, and also tags each match with its index.

letterCount :: ParserT String (State Int) (Tuple Char Int)
letterCount = do
  x <- letter
  i <- modify (_+1)
  pure (x /\ i)

flip runState 0 $ splitCapT "A B" letterCount

Result:

[Right ('A' /\ 1), Left " ", Right ('B' /\ 2)] /\ 2

#replace Source

replace :: String -> Parser String String -> String

Find-and-replace

Also called “match-and-substitute”. Find all of the leftmost non-overlapping sections of the input string which match the pattern parser sep, and replace them with the result of the parser. The sep parser must return a result of type String for the replacement.

This function can be used instead ofData.String.replaceAllorData.String.Regex.replace'.

Access the matched section of the input string

To get access to the matched string for calculating the replacement, combine the pattern parser sepwith the match combinator. This allows us to write a sep parser which can choose to not replace the match and just leave it as it is.

So, for all sep:

replace input (fst <$> match sep) == input

Example

Find and uppercase the "needle" pattern.

replace "hay needle hay" (toUpper <$> string "needle")

Result:

"hay NEEDLE hay"

Example

Find integers and double them.

replace "1 6 21 107" (show <$> (_*2) <$> intDecimal)

Result:

"2 12 42 214"

#replaceT Source

replaceT :: forall m. Monad m => MonadRec m => String -> ParserT String m String -> m String

Monad transformer version of replace.

Example

Find an environment variable in curly braces and replace it with its value from the environment. We can read from the environment with lookupEnv because replaceT is running the sep parser in Effect.

replaceT "◀ {HOME} ▶" do
  _ <- string "{"
  Tuple variable _ <- anyTill (string "}")
  lift (lookupEnv variable) >>= maybe empty pure

Result:

"◀ /home/jbrock ▶"