Parsing.String.Replace - purescript-parsing - Pursuit

Package: purescript-parsing

Repository: purescript-contrib/purescript-parsing

This module is for finding patterns in a String, and also replacing or splitting on the found patterns. This activity is traditionally done with Regex, but this module uses parsers instead for the pattern matching.

Functions in this module are ways to run a parser on an input String, like runParser or runParserT.

Why would we want to do pattern matching and substitution with parsers instead of regular expressions?

Implementation Notes

All of the functions in this module work by calling runParserT with the anyTill combinator. We can expect parser-based pattern matching to be roughly 10× slower than regex-based pattern matching in a JavaScript runtime environment. This module is based on the Haskell packages replace-megaparsec and replace-attoparsec.
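To make the anyTill approach concrete, here is a minimal sketch of a breakCap-like function built from anyTill and rest in Parsing.String. The name breakCapSketch is hypothetical, and the body is an assumption about how such a function can be written on top of runParser; it is not claimed to be the library's actual source.

```purescript
module BreakCapSketch where

import Prelude

import Data.Either (hush)
import Data.Maybe (Maybe(..))
import Data.Tuple (Tuple(..))
import Data.Tuple.Nested (T3, (/\))
import Parsing (Parser, runParser)
import Parsing.String (anyTill, rest, string)

-- | Hypothetical stand-in for breakCap (illustration only).
-- | `anyTill sep` skips input until `sep` succeeds and returns the skipped
-- | prefix together with the result of `sep`. `rest` then consumes the
-- | remaining input as the suffix. If `sep` never matches, anyTill fails
-- | and `hush` turns the parse error into Nothing.
breakCapSketch :: forall a. String -> Parser String a -> Maybe (T3 String a String)
breakCapSketch input sep = hush $ runParser input do
  Tuple prefix result <- anyTill sep
  suffix <- rest
  pure (prefix /\ result /\ suffix)

-- Usage: mirrors the "needle" example for breakCap.
example :: Maybe (T3 String String String)
example = breakCapSketch "hay needle hay" (string "needle")
```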

breakCap

breakCap :: forall a. String -> Parser String a -> Maybe (T3 String a String)

Break on and capture one pattern

Find the first occurrence of a pattern in the input String, capture the found pattern, and break the input String on the found pattern.

This function can be used instead of Data.String.indexOf, Data.String.Regex.search, or Data.String.Regex.replace, and it allows using a parser for the pattern search.

This function can be used instead of Data.String.takeWhile or Data.String.dropWhile, and its break condition can depend on more than just the next single CodePoint.
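For example, splitting text on its first blank line depends on two consecutive newlines, which a single-CodePoint predicate for takeWhile or dropWhile cannot express. The helper splitHeaderBody below is hypothetical, and the header/body use case is an assumption chosen for illustration.

```purescript
module SplitHeaderBody where

import Prelude

import Data.Maybe (Maybe(..))
import Data.Tuple (Tuple(..))
import Data.Tuple.Nested ((/\))
import Parsing.String (string)
import Parsing.String.Replace (breakCap)

-- | Hypothetical helper: split text on its first blank line.
-- | The pattern is two consecutive newlines, so the break point depends
-- | on more than one CodePoint. The matched "\n\n" itself is discarded.
splitHeaderBody :: String -> Maybe (Tuple String String)
splitHeaderBody input = case breakCap input (string "\n\n") of
  Just (header /\ _ /\ body) -> Just (Tuple header body)
  Nothing -> Nothing

-- Usage: everything before the blank line is the header.
example :: Maybe (Tuple String String)
example = splitHeaderBody "Subject: hi\nFrom: me\n\nHello"
```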

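Output

The result is Nothing when no match for the pattern is found in the input. Otherwise it is Just (prefix /\ result /\ suffix), where prefix is the String before the first match, result is the value returned by the pattern parser, and suffix is the String after the match.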

Access the matched section of text

To capture the matched string, combine the pattern parser sep with the match combinator.

With the matched string, we can reconstruct the input string. For all input and sep, if

let (Just (prefix /\ (matched /\ _) /\ suffix)) =
      breakCap input (match sep)

then

input == prefix <> matched <> suffix
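The reconstruction law can be checked for a particular input and pattern. This sketch defines a hypothetical roundTrips function; it binds the matched text to a variable named matched, and it treats "no match" as trivially satisfying the law, which is an assumption of this example.

```purescript
module BreakCapRoundTrip where

import Prelude

import Data.Maybe (Maybe(..))
import Data.Tuple.Nested ((/\))
import Parsing (Parser)
import Parsing.String (match)
import Parsing.String.Basic (intDecimal)
import Parsing.String.Replace (breakCap)

-- | Hypothetical check of the reconstruction law for one input and pattern.
-- | With `match sep`, the captured value is a Tuple of the matched text and
-- | the parse result, so prefix <> matched <> suffix should equal the input.
-- | When there is no match at all, the check passes.
roundTrips :: forall a. String -> Parser String a -> Boolean
roundTrips input sep = case breakCap input (match sep) of
  Nothing -> true
  Just (prefix /\ (matched /\ _) /\ suffix) -> input == prefix <> matched <> suffix

-- Usage: the intDecimal input from the examples in this section.
example :: Boolean
example = roundTrips "abc 123 def" intDecimal
```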

Example

Find the first pattern match and break the input string on the pattern.

breakCap "hay needle hay" (string "needle")

Result:

Just ("hay " /\ "needle" /\ " hay")

Example

Find the first pattern match, capture the matched text and the parsed result.

breakCap "abc 123 def" (match intDecimal)

Result:

Just ("abc " /\ ("123" /\ 123) /\ " def")

splitCap

splitCap :: forall a. String -> Parser String a -> NonEmptyList (Either String a)

Split on and capture all patterns

Find all occurrences of the pattern parser sep, split the input String, and capture all the matched patterns and the splits.

This function can be used instead of Data.String.Common.split, Data.String.Regex.split, Data.String.Regex.match, or Data.String.Regex.search.

The input string will be split on every leftmost non-overlapping occurrence of the pattern sep. The output list will contain the parsed result of input string sections which match the sep pattern in Right a, and non-matching sections in Left String.
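Because matches arrive as Right values and the text between them as Left values, collecting only the parsed matches is a matter of keeping the Right side. The helper findAll below is hypothetical; it converts the NonEmptyList to an Array and drops the Left sections with hush.

```purescript
module FindAll where

import Prelude

import Data.Array as Array
import Data.Either (hush)
import Parsing (Parser)
import Parsing.String.Basic (intDecimal)
import Parsing.String.Replace (splitCap)

-- | Hypothetical helper: keep only the Right values (the parsed matches)
-- | from splitCap, discarding the Left sections of unmatched text.
findAll :: forall a. String -> Parser String a -> Array a
findAll input sep = Array.mapMaybe hush (Array.fromFoldable (splitCap input sep))

-- Usage: every run of decimal digits becomes an Int.
example :: Array Int
example = findAll "a1b22c333" intDecimal
```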

Access the matched section of text

To capture the matched strings, combine the pattern parser sep with the match combinator.

With the matched strings, we can reconstruct the input string. For all input and sep, if

let output = splitCap input (match sep)

then

input == fold (either identity fst <$> output)
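The same law can be packaged as a function that rebuilds the input. The name reconstruct is hypothetical; the body is the right-hand side of the law, applied to splitCap with match.

```purescript
module SplitCapRoundTrip where

import Prelude

import Data.Either (either)
import Data.Foldable (fold)
import Data.Tuple (fst)
import Parsing (Parser)
import Parsing.String (match)
import Parsing.String.Basic (intDecimal)
import Parsing.String.Replace (splitCap)

-- | Hypothetical helper: rebuild the input from splitCap's output.
-- | Left sections are kept as-is, and each Right match is replaced by its
-- | matched text (the fst of the Tuple returned by `match`).
reconstruct :: forall a. String -> Parser String a -> String
reconstruct input sep = fold (either identity fst <$> splitCap input (match sep))

-- Usage: the result should equal the original input string.
example :: String
example = reconstruct "hay 1 straw 2 hay" intDecimal
```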

Example

Split the input string on all Int pattern matches.

splitCap "hay 1 straw 2 hay" intDecimal

Result:

[Left "hay ", Right 1, Left " straw ", Right 2, Left " hay"]

Example

Find the beginning positions of all pattern matches in the input.

catMaybes $ hush <$> splitCap ".𝝺...\n...𝝺." (position <* string "𝝺")

Result:

[ Position { index: 1, line: 1, column: 2 }
, Position { index: 9, line: 2, column: 4 }
]

Example

Find groups of balanced nested parentheses. This pattern is an example of a “context-free” grammar, a pattern that can't be expressed by a regular expression. We can express the pattern with a recursive parser.

balancedParens :: Parser String Unit
balancedParens = do
  void $ char '('
  void $ manyTill (balancedParens <|> void anyCodePoint) (char ')')

rmap fst <$> splitCap "((🌼)) (()())" (match balancedParens)

Result:

[Right "((🌼))", Left " ", Right "(()())"]

splitCapT

splitCapT :: forall m a. Monad m => MonadRec m => String -> ParserT String m a -> m (NonEmptyList (Either String a))

Monad transformer version of splitCap. The sep parser will run in the monad context.

Example

Count the pattern matches.

Parse in a State monad to remember state in the parser. This stateful letterCount parser counts the number of pattern matches which occur in the input, and also tags each match with its index.

letterCount :: ParserT String (State Int) (Tuple Char Int)
letterCount = do
  x <- letter
  i <- modify (_+1)
  pure (x /\ i)

flip runState 0 $ splitCapT "A B" letterCount

Result:

[Right ('A' /\ 1), Left " ", Right ('B' /\ 2)] /\ 2

replace

replace :: String -> Parser String String -> String

Find-and-replace

Also called “match-and-substitute”. Find all of the leftmost non-overlapping sections of the input string which match the pattern parser sep, and replace them with the result of the parser. The sep parser must return a result of type String for the replacement.

This function can be used instead of Data.String.replaceAll or Data.String.Regex.replace'.
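For a literal pattern, a parser-based replacement can stand in for Data.String.replaceAll. The helper replaceAllViaParser below is hypothetical and assumes a non-empty search string; the usage values compare it against Data.String.replaceAll on one input.

```purescript
module ReplaceAllViaParser where

import Prelude

import Data.String (Pattern(..), Replacement(..), replaceAll)
import Parsing.String (string)
import Parsing.String.Replace (replace)

-- | Hypothetical helper: literal find-and-replace with a parser.
-- | `new <$ string old` matches the literal `old` and returns `new` as the
-- | replacement text. Assumes `old` is non-empty.
replaceAllViaParser :: String -> String -> String -> String
replaceAllViaParser old new input = replace input (new <$ string old)

-- Usage: the parser version and Data.String.replaceAll on the same input.
viaParser :: String
viaParser = replaceAllViaParser "cat" "dog" "cat and catalog"

viaReplaceAll :: String
viaReplaceAll = replaceAll (Pattern "cat") (Replacement "dog") "cat and catalog"
```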

Access the matched section of the input string

To get access to the matched string for calculating the replacement, combine the pattern parser sep with the match combinator. This allows us to write a sep parser which can choose not to replace the match and just leave it as it is.

So, for all sep:

replace input (fst <$> match sep) == input
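As a sketch of a sep parser that replaces only some of its matches, the hypothetical capLarge below rewrites integers greater than 10 and returns the original matched text for the rest, so those matches pass through unchanged. The threshold and replacement word are arbitrary choices for illustration.

```purescript
module ReplaceLarge where

import Prelude

import Data.Tuple (Tuple(..))
import Parsing.String (match)
import Parsing.String.Basic (intDecimal)
import Parsing.String.Replace (replace)

-- | Hypothetical example: replace integers greater than 10 with "many",
-- | but return the original matched text for smaller integers so that
-- | those matches are left unchanged in the output.
capLarge :: String -> String
capLarge input = replace input do
  Tuple matched n <- match intDecimal
  pure (if n > 10 then "many" else matched)

-- Usage
example :: String
example = capLarge "3 apples and 12 pears"
```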

Example

Find and uppercase the "needle" pattern.

replace "hay needle hay" (toUpper <$> string "needle")

Result:

"hay NEEDLE hay"

Example

Find integers and double them.

replace "1 6 21 107" (show <$> (_*2) <$> intDecimal)

Result:

"2 12 42 214"

replaceT

replaceT :: forall m. Monad m => MonadRec m => String -> ParserT String m String -> m String

Monad transformer version of replace.

Example

Find an environment variable in curly braces and replace it with its value from the environment. We can read from the environment with lookupEnv because replaceT is running the sep parser in Effect.

replaceT "◀ {HOME} ▶" do
  _ <- string "{"
  Tuple variable _ <- anyTill (string "}")
  lift (lookupEnv variable) >>= maybe empty pure

Result:

"◀ /home/jbrock ▶"
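replaceT can also run the sep parser in a pure monad. This sketch, with a hypothetical numberPlaceholders function, uses State Int to rewrite each "?" placeholder as "$1", "$2", and so on; the placeholder syntax is an assumption chosen for illustration. The counter is only incremented after char '?' succeeds, so positions where the pattern fails do not change the state.

```purescript
module NumberPlaceholders where

import Prelude

import Control.Monad.State (State, evalState, modify)
import Parsing (ParserT)
import Parsing.String (char)
import Parsing.String.Replace (replaceT)

-- | Hypothetical example: rewrite each "?" placeholder as "$1", "$2", ...
-- | The sep parser runs in State Int, so the counter persists across
-- | matches. evalState starts it at 0 and discards the final count.
numberPlaceholders :: String -> String
numberPlaceholders input = evalState (replaceT input placeholder) 0
  where
  placeholder :: ParserT String (State Int) String
  placeholder = do
    _ <- char '?'
    i <- modify (_ + 1)
    pure ("$" <> show i)

-- Usage
example :: String
example = numberPlaceholders "WHERE a = ? AND b = ?"
```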

Perl Problems