[SOLVED] Parsing: how to get all available matches?
October 10, 2024, 10:49pm #1
I’m trying to find all matches in a text, which seems like basic stuff, but I can’t get it to work. Or rather, the code I expect to do the job keeps throwing an EOF parsing error.
Given this:
module Main where

import Prelude

import Effect (Effect)
import Effect.Console (logShow)
import Parsing (runParser)
import Parsing.Combinators (many, manyTill)
import Parsing.String (anyChar, string)

main :: Effect Unit
main = do
  let matchFoo = manyTill anyChar (string "foo")
  logShow $ runParser "foo foo bar" (many matchFoo)
I keep getting
(ParseError "Unexpected EOF" (Position { column: 12, index: 11, line: 1 }))
I tried inserting various combinations of try and optional, to no avail. This is also a minimal test case from a somewhat larger parser I built that kept randomly failing.
Honestly, it feels like a bug, because the documentation for many says, quoting:
Match the phrase p as many times as possible.
So is it not correct for me to expect it to match matchFoo 2 times here? What am I missing?
October 11, 2024, 2:59am #2
Have you looked at manyTill_ and try?
Could you please elaborate? I’m not quite getting how these help. Perhaps I could apply manyTill_ by replacing the manyTill I am using, but that doesn’t help with the error. And try “[on fail] backtracks the input stream to the unconsumed state”, which would result in an infinite loop: at some point, with no matches left, the stream would stop advancing and the parser would never exit.
Hm, I didn’t expect Discourse to show removed posts. I removed it because I apparently pressed the wrong “reply” button, so the other user wasn’t tagged. I do that sometimes on StackOverflow: if you posted a comment 5 seconds ago, nobody has read it anyway, so you remove it and repost it. Well, apparently that doesn’t work so well here…
The purescript-parsing library includes the design decision that when faced with a choice between two different parses, the second option is only tried if the first fails having consumed no input, and otherwise the failure (or success) of the first option is raised up the parser stack. many inherits this decision, as it is effectively a choice between running one more copy of its argument and stopping. So if the argument to many fails after consuming some tokens, the entire many will fail instead of backtracking to the last completed inner parse.
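To make the choice rule above concrete, here is a minimal sketch using the alternative operator directly. The parsers ab and ac are hypothetical names introduced only for this illustration, and the behaviour described in the comments assumes the consumed-input semantics explained above.

```purescript
module Main where

import Prelude

import Control.Alt ((<|>))
import Effect (Effect)
import Effect.Console (logShow)
import Parsing (Parser, runParser)
import Parsing.Combinators (try)
import Parsing.String (char)

-- Hypothetical parsers, named here only for illustration.
ab :: Parser String Char
ab = char 'a' *> char 'b'

ac :: Parser String Char
ac = char 'a' *> char 'c'

main :: Effect Unit
main = do
  -- ab consumes 'a' before failing on 'c', so ac is never attempted
  -- and the whole choice fails with a Left.
  logShow $ runParser "ac" (ab <|> ac)
  -- try ab rewinds the input when ab fails, so ac gets its turn
  -- and the choice succeeds with 'c'.
  logShow $ runParser "ac" (try ab <|> ac)
```

many behaves like the first case: once its argument has consumed input and then failed, there is no second option left to fall back on.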
matchFoo will eat any character it sees (via anyChar), so when many runs matchFoo for the third time, after the second "foo", it consumes " bar" and then fails because it ran out of tokens to eat without seeing a "foo". Because matchFoo consumed a non-zero number of tokens before failing, many will propagate this failure up.
try is what you’re looking for because try matchFoo is a version of matchFoo that, when it fails, will always act as if it never consumed any tokens, because it backtracks the stream to where it was before matchFoo ran. However, it still propagates the failure up to many, so many knows not to run its argument any more.
Oh, I see, thank you for the elaboration! Okay, so many $ try matchFoo it is. Indeed, this is not obvious: if I were looking at this line without knowing about this library nuance, I’d be wondering why someone inserted try here. Will try (no pun intended) to apply it to my other code, thanks!
For this kind of “all available matches” parsing you might also be interested in Parsing.String.Replace.splitCap and Parsing.String.anyTill.
For example
logShow $ runParser "foo foo bar" (many $ try $ anyTill $ string "foo")
(Right ((Tuple "" "foo") : (Tuple " " "foo") : Nil))
logShow $ splitCap "foo foo bar" (string "foo")
(NonEmptyList (NonEmpty (Right "foo") ((Left " ") : (Right "foo") : (Left " bar") : Nil)))