Chido Parse (original) (raw)

8.15

William Hatch <william@hatch.uno>

1 Guideđź”—â„ą

NOTE - chido-parse is NOT YET STABLE, and APIs may change.

Chido Parse is an expressive, extensible, composable parsing system. It allows users to build and compose parsers using any mix of ad-hoc procedural parsing, parser combinators, and other parsing abstractions and interfaces, such as readtables and BNF. It allows ambiguous grammars, but provides tools to filter ambiguity, including declarative operator associativity and precidence rules in readtables and BNF rules. Chido Parse supports context-sensitive grammars, including data-dependent grammars, in addition to supporting the entire class of context-free grammars.

Chido Parse is also... slow. So until and unless it can be optimized about 1 order of magnitude faster, I don’t expect people to really use it except for quick prototyping. If it gets 1 order of magnitude faster it will still be slow, but probably faster than macro expansion (at least for s-expression based languages), which will make it viable for projects where extensible and composable concrete syntax extension is as important as macro-based extension. But if you want fast compilation... look elsewhere.

You can use Chido Parse by constructing parser objects and then using whole-parse to parse your input. The simplest kind of parser to use is a literal string, and you can easily use combinators like sequence.

(define abc-d (whole-parse "abc" (sequence "a" "b" "c")))

The above code will return a derivation object. We can get a semantic result out with parse-derivation-result:

(parse-derivation-result abc-d)

Which will return '("a" "b" "c").

One of the key features of Chido Parse is its support for procedural parsers. Parsing procedures take a port and return either a derivation or a parse failure. To use them as parsers, you need to wrap them with the proc-parser struct. We can easily re-use existing parsers this way.

(proc-parser (λ (port) (make-parse-derivation (read-syntax (object-name port) port))))

Actually, procedural parsers can return a stream tree of parse derivations and failures. This allows them to support ambiguous grammars. However, result streams for Chido Parse should be constructed with stream-cons or for/parse, unless you know what you’re doing.

Procedural parsers can recur back into the parsing system by using parse*.parse* is like whole-parse, except that it doesn’t have to consume the whole input and it returns a stream of derivations. Note that whole-parse actually returns a parse failure if there is more than one derivation. If you want multiple results from a top-level parse, you can use whole-parse*. While parsers can return a tree of streams, they are flattened and parse* and whole-parse* return a stream whose members are all parse-derivation? objects.

For example:

(define plus
(proc-parser
(λ (port)
(for/parse ([l (parse* port expression)])
(for/parse ([op (parse* port "+" #:start l)])
(for/parse ([r (parse* port expression #:start op)])
(make-parse-derivation
`(+ ,(parse-derivation-result l)
,(parse-derivation-result r))
#:derivations (list l op r))))))))

Note that parse* does not actually consume input from the port, so successive parses need the #:start keyword. But there’s an easier way. Instead of parse*, you can use parse*-direct, which uses delimited continuation magic to loop over a stream while looking like straight-line code:

(define plus
(proc-parser
(λ (port)
(define l (parse*-direct port expression))
(define op (parse*-direct port "+"))
(define r (parse*-direct port expression))
(make-parse-derivation
`(+ ,(parse-derivation-result l)
,(parse-derivation-result r))
#:derivations (list l op r)))))

Note that if we define the expression parser like so:

It turns out that our procedural plus parser is left-recursive! Everybody knows that if you make a left-recursive procedural parser you get infinite recursion! But not in Chido Parse. Chido Parse dynamically detects left-recursive dependency cycles and resolves them by capturing and descheduling first-class delimited continuations. When a cycle occurs, Chido Parse saves the continuation, finds other work, and then applies the continuation to a result when one is available from a different alternate. If there is no alternate that can make progress, Chido Parse breaks the cycle by returning a failure result to the left-recursive parser. Failing cycles this way is mostly so the parser can compute a good failure message, but conceivably you could implement a biased alternate this way. However, at present Chido Parse doesn’t guarantee any order to cycle breaking, so don’t.

Chido Parse also has BNF forms, of course:

(define-bnf dumb-statement-parser
[statement ["pass"]
["for" id "in" expression "do" statement]
["{" [: #:repeat-min 0 #:splice 1 statement] "}" #:name "block"]
[expression]]
[expression [number-parser]
[id]
[expression "+" expression #:associativity 'left]
[expression "*" expression
#:associativity 'right
#:precedence-greater-than '("+")]]
; Obviously this is a bad identifier parser, but I'm making dumb examples in a hurry.
[id [[: #:repeat-min 1 (char-range "az")]]])

The define-bnf form has a bunch of optional arguments for specifying automatic layout insertion (IE allowing whitespace between things). Note that these things are all just parsers. You can use a procedural parser anywhere in your BNF definition, you can stick a BNF parser in a combinator, etc.

Of course, the real reason I started Chido Parse was because I wanted better readtables. So Chido Parse has a readtable-style abstraction for making and extending fancy s-expression parsers.

(define cool-s-exp
(chido-readtable-add-list-parser "$(" ")" basic-chido-readtable
#:wrap '#%dollar-paren))

Now cool-s-exp can parse

(foo bar $(quux flub) zonkers)

as '(foo bar (#%dollar-paren quux flub) zonkers)

Chido readtables support infix/prefix/postfix operators as well as the kinds of extensions that Racket’s normal readtables support. Chido readtables support arbitrary-length prefixes unlike Racket readtables with their single character prefix..

Chido readtables and BNF parsers can be extended, eg. with extend-chido-readtable and extend-bnf. BNF parsers are actually made out of readtables, and you can do extend-readtable-as-bnf.

Parsers can be extended, modified, or loaded dynamically. Eg. you can use extend-readtable during parsing, and you can create parsers like Racket’s built-in #reader form. All these fancy parsing abstractions are built out of procedural parsers. If there is another parsing abstraction you know of and like, you can build that abstraction too!

This is clearly a bad guide, but I wanted to write at least something real quick.

TODO - write a better guide.

2 Referenceđź”—â„ą

NOTE - chido-parse is NOT YET STABLE, and APIs may change.

2.1 Gotchasđź”—â„ą

Don’t use parameter?s, use chido-parse-parameter?s. Don’t use stream-cons or other stream constructors, use parse-stream-cons and its derivatives provided by Chido Parse.

2.2 Core APIđź”—â„ą

2.2.1 Parsing Functionsđź”—â„ą

Parse the input with parser. If start is #f, start at the current port position (at the beginning if input is a string). Returns a stream of parse derivations. The empty stream element is a parse-failure? instead of the vanilla empty-stream.

If this is the first call into the parsing system, the input port is actually replaced for inner parses with a wrapper that can be rewound for multiple ambiguous parses. In any event, parse* does NOT change the current position of the port. If you do a series of parses with parse*, you should pass the previous derivation as the start position to the next call of parse*. Realistically, within a parser parse* should only be used in tail position or in combination with for/parse. For a more convenient function for use within parsers, see parse*-direct.

This is technically the core parsing function, but you probably want to use whole-parse or whole-parse* for the initial parse, and parse*-direct inside parsers.

Like parse*, but:

Like parse*, but only returns results that consume the entire input (starting at start position).

whole-parse is to whole-parse* as parse is to parse*. Always starts at the current port position, only returns results that consume the entire input, and fails if the result is ambiguous.

(parse*-direct input parser [#:start start #:failure failure-arg]) → parse-derivation?
input : input-port?
parser : parser?
start : (or/c #f number? parse-derivation?) = #f
failure-arg : (or/c #f (-> parse-failure? parse-failure?)) = #f

Can only be used inside a parser?. Like parse*, but it uses delimited continuation magic. The return to the call-site is always a parse-derivation?. However, parse*-direct captures the current (delimited) continuation of the parser and returns a stream which applies the continuation to every derivation returned by parse*. The overall return of the parser will be the stream created by parse*-direct. You can use multiple calls to parse*-direct within a function and it will nest the streams appropriately. When parse*-direct returns, the port argument of the parsing procedure will be set to the end-position of the derivation returned, so you don’t need to supply start positions to future calls of parse*-direct.

For example, here is a parser that parses the string "abc".

However, if you are doing some OTHER delimited continuation stuff or creating a stream of results using something other than parse-stream-cons (or for/parse), you may need to take care to use delimit-parse*-direct so the overall parse result is correct.

The failure-arg lets you synthesize a custom parse-failure?, giving the inner failure from the end of the parse* stream to your function.

(delimit-parse*-direct expression)

This is automatically done upon executing a parsing procedure and by parse-stream-cons, as well as by for/parse. If you use something else to construct a result stream, you need to manually use this to prevent parse*-direct from aborting out of your stream constructor and returning its own stream in its place.

But really just don’t use something other than parse-stream-cons (and its derivatives) to construct result streams!

2.2.2 Parse Derivationsđź”—â„ą

Make a parse derivation. Can only be called during the dynamic extent of a parser?. If end is #f, it determines the end by calculating the maximum end position of the subderivations supplied. If no subderivations are supplied and end is #f, it raises an error.

Result can be anything, but if it is a procedure it is assumed to be a delayed result. The procedure should be of the form:

(λ (src line col pos span derivations) (datum->syntax #f 'my-result (list src line col pos span)))

(parse-derivation? x) → bool/c
x : any/c

Predicate for parse-derivations.

Returns the result of a parse-derivation. May force the supplied procedure, but the result is cached and only forced once.

Return the parser? that was used to create pd.

Returns the source name of the input used for parsing.

Returns the line number of the derivation.

Returns the column number of the derivation.

Returns the start position number of the derivation.

Returns the end position number of the derivation.

Returns the list of subderivations used in constructing pd. Useful for filtering operator precedence and such.

2.2.3 Parse Failuresđź”—â„ą
(make-parse-failure [#:message message #:position position #:end end #:made-progress? made-progress? #:inner-failure inner-failure #:failures failures])
→ parse-failure?
message : (or/c #f string?) = #f
position : (or/c number? #f) = #f
end : (or/c number? #f) = #f
made-progress? : any/c = progress-default
inner-failure : (or/c parse-failure? #f) = #f
failures : (or/c (listof parse-failure?) #f) = #f

Make a parse-failure?. Can only be called during the dynamic extent of a parser?. The message and position are used to help the user figure out what went wrong. Inner failures are also used to improve the message, and provide a chain of failures. Everything aside from the message is used in comparing parse failures when there are ambiguous failures to try to find the best failure message to display to the user.

When the chido-parse-keep-multiple-failures? is true, all inner failures are kept (IE inner-failure and failures), though there isn’t currently a very useful way of looking at them. When chido-parse-keep-multiple-failures? is false, only the best inner failure is kept. If inner-failure is supplied, it is always used as the best inner failure. If inner-failure is not supplied, a heuristic sorting function is used on failures to choose one. Of course, a terminal parser will have no inner failures, and that’s fine too.

(parse-failure? x) → bool/c
x : any/c

Predicate for parse failure objects.

Note that parse failures implement the gen:stream interface as empty streams.

Convenience function for turning an exception into a parse failure. This is not done automatically – an unhandled exception during parsing will kill the entire parser. But if you want to run an existing parsing function that raises exceptions upon failure, catch the exception and use this.

Returns the location triple of a parse failure as a string. Eg. ::.

Returns a string for the parse failure. Includes location triple, name of the parser that failed, and the message if one is on the failure object.

Like parse-failure->string/simple, but searches the failure chain to find the first message if one is not on the outermost failure.

Formats the list of failures that cascaded into the creation of pf, showing the info from parse-failure->string/simple for each.

Like parse-failure->string/chain, except it shows ALL failures, not just the chain of best failures. It indents each message to show the tree relationship of failures. This is only useful if chido-parse-keep-multiple-failures? is set to true. Be prepared for a lot of text.

Parameter used to determine whether to store all sub-failures within a failure object. Keeping all failures is mostly an academic exercise.

2.2.4 Chido Parse Parametersđź”—â„ą
(chido-parse-parameter? x) → any/c
x : any/c

Predicate for chido-parse-parameters.

Don’t use parameter?s inside parsing procedures. Use chido-parse-parameters instead.

They behave similarly, but chido-parse-parameters are a key to the parsing cache (so you don’t accidentally pull out a cached result that was constructed by referring to a different parameter value, such as a different value for current-chido-readtable) and are preserved by parse-stream-cons, whereas regular parameters are NOT preserved by stream-cons and friends.

(chido-parse-parameter default) → chido-parse-parameter?
default : any/c

Construct a chido parse parameter with the given default value.

(chido-parse-parameterize ([cp-param value] ...) body ...)

2.2.5 Stream constructionđź”—â„ą

(parse-stream-cons head tail)

Use this instead of stream-cons. Like stream-cons, but plays nicely with chido-parse-parameters and parse*-direct.

(for/parse ([binding-id stream-expression]) body ...)

Similar to other for forms. But it returns a stream created using parse-stream-cons and the stream-expression must return a stream.

2.2.6 Parsers and core parser constructorsđź”—â„ą
(parser? x) → any/c
x : any/c

Don’t really use this. This should maybe be replaced by a higher-order contract. Right now it’s just a first-order check that things seem OK.

BUT - here I’ll at least document what this is meant to check, and what you can expect from functions that return parser?s.

Parsers are either proc-parsers, alt-parsers, string?s, structs that implement prop:custom-parser, non-cached-parser-thunk, or a thunk that returns one of those.

Thunks are allowed because parsers often need to recursively refer to each other. If parsers are mutually recursive, you have an ordering issue where neither can be constructed without the other. To break the cycle, put one in a thunk! Thunks that contain parsers are only forced once and cached, so you can rely on the results of the thunk being eq? (eg.for filtering purposes). Because sometimes you really do want the thunk to be re-evaluated, thunks wrapped with the non-cached-parser-thunk struct are NOT cached. Additionally, chido-parse-parameter?s used as parser thunks are not cached, because their whole point is to be this dynamically extensible thing to allow you to implement parsers independently then have them dynamically reference the right parser when recurring.

If you use a procedure as a parser that is NOT a thunk that returns (perhaps eventually through multiple layers of thunks) a primitive parser, you will get an exception when the scheduler actually tries to force it and use it as a parser.

(proc-parser proc [#:prefix prefix #:name name #:preserve-prefix? preserve-prefix? #:promise-no-left-recursion? promise-no-left-recursion?])
→ proc-parser?
proc : (-> input-port? (or/c any/c stream?))
prefix : string? = ""
name : string? = (format "~a" (object-name proc))
preserve-prefix? : any/c = #f
promise-no-left-recursion? : any/c = #f

Make a procedural parser. The procedure should accept an input port. It should return a parse-derivation? or parse-failure?. If it’s an ambiguous parser (or recurs with parse*, therefore inheriting any ambiguity of the recursive parser used), it should return a stream tree that ultimately contains parse-derivation?s and parse-failure?s.

If prefix is given, it is a literal prefix to match before the procedure is called. Unless preserve-prefix? is true, the port is set to just after the prefix, so your parsing function can focus on the interesting part of the parse.preserve-prefix? is mostly there as a convenience for wrapping existing parsing procedures.

The prefix may be queried and used in a semantically meaningful way, as by chido-readtable?s.

Parsers with prefixes (that don’t use preserve-prefix?) are statically known not to be left-recursive. The scheduler is optimized to try non-left-recursive parsers first. In RacketCS continuation capture is really cheap, but it’s somewhat expensive in RacketBC, so an additional optimization is that parsers that are marked as non-left-recursive DON’T get their continuations captured and aborted when recurring! If you really promise that your procedure is not left-recursive, you can mark it as such with promise-no-left-recursion. But if you lie your parser might go into an infinite loop!

(proc-parser? p) → any/c
p : any/c

Predicate for parsers constructed with proc-parser. If this is true, parser? is also true.

(alt-parser [#:name name] p ...) → alt-parser?
name : (or/c string? #f) = #f
p : parser?

Construct a parser that returns the results of all of the given parsers. IE it’s an alternation combinator.

(alt-parser? p) → any/c
p : any/c

Predicate for parsers constructed with alt-parser. If this is true, parser? is also true.

For when you want the parser to be computed fresh every time you use it. Note that if the thunk returns ANOTHER thunk, the inner thunk WILL be cached.

You can define your own structs that act as parsers.

TODO - I should document this. But really it should probably be gen:custom-parser and support multiple methods, eg. for name, prefix, etc.

IE this is not stable.

At any rate, while this is undocumented you can also just use a function that takes your struct and returns a parser, though that’s less satisfying.

Get the name of a parser.

Note that this has to force thunks.

If you try to use sub-parsers when constructing the name of a parser, make sure you don’t have a mutually recursive cycle going on, or you will have a bad time.

Get the prefix of a parser.

Note that this has to force thunks.

If you are making a combinator, you might want to use this to determine whether your whole result will be left recursive and use it when constructing a proc-parser.

Note that this has to force thunks.

If you are making a combinator, you might want to use this to determine whether your whole result will be left recursive and use it when constructing a proc-parser.

Note that this has to force thunks.

2.3 Literal Parsersđź”—â„ą

You can use string?s as literal parsers. You can always create proc-parsers that parse literals. The other literal parsers here are just proc-parsers that are convenient enough to be in the standard parser library.

Parser that succeeds when reading a port would return an eof-object?.

A parser that parses any single character.

2.4 Combinatorsđź”—â„ą

The alternation combinator alt-parser is documented in another section.

(sequence [#:name name #:derive derive #:result/bare make-result/bare #:result/stx make-result/stx #:between between #:before before #:after after] parser ...) → parser?
name : (or/c #f string?) = #f
derive : procedure? = #f
make-result/bare : procedure? = #f
make-result/stx : procedure? = #f
between : (or/c #f parser?) = #f
before : (or/c #f parser?) = #f
after : (or/c #f parser?) = #f
parser : parser?

Sequence combinator. You can add optional parsers between members of the sequence with between. The between parser is not put between the first and last normal parsers and the before and after parsers. For example (sequence #:before "<" "a" "b" #:after ">" #:between "-") would parse "" but not "<-a-b->".

The derive, result/bare, and result/stx arguments are for building the result derivation, and only one may be given (a non-#f value). They are all procedures that must accept one value for each of parser, but not the before, between, or after parsers.

derive should be a function that accepts one derivation for each parser, and should return a parse-derivation?.

result/bare should be a function that accepts the result values (IE using parse-derivation-result) from each parser and should return a value to be used as the result field of make-parse-derivation. You can also provide #t instead of a procedure to just get the results in a list.

result/stx is like result/bare, but it wraps the result you return as a syntax object with appropriate source location info. You can also provide #t instead of a procedure to just get the results in a syntax list.

If none of the result options are given, the default is as if result/stx is #t.

TODO - example usage.

(repetition [#:name name #:derive derive #:result/bare make-result/bare #:result/stx make-result/stx #:min min #:max max #:greedy? greedy? #:between between #:before before #:after after] parser) → parser?
name : (or/c #f string?) = #f
derive : procedure? = #f
make-result/bare : procedure? = #f
make-result/stx : procedure? = #f
min : number? = 0
max : number? = +inf.0
greedy? : any/c = #f
between : (or/c #f parser?) = #f
before : (or/c #f parser?) = #f
after : (or/c #f parser?) = #f
parser : parser?

Repetition combinator. Parse parser repeatedly.

The optional argument API is like sequence.between is used between iterations of parser, before and after are parsed before/after the first/last repetition of parser.

derive, result/bare, and result/stx act like sequence but should accept a single argument which will be a list of derivations returned by parser (in order). Again, result/bare and result/stx may be #t instead of a procedure, in which case the results will be put in a list or syntax list. If none of the options are non-false, result/stx is treated as #t.

You can specify the minimum and maximum number of acceptible repetitions with min and max. If greedy? is non-false, derivations will only be returned if they parsed the max number of repetitions or no more repetitions can be parsed after the last one. Note that the result may still be ambiguous depending on whether parser or before/between/after are ambiguous. If greedy? is false, all derivations with a number of repetitions between min and max, inclusive, are returned. Frankly, for any realistic use, you want greedy? to be true, because the extra ambiguity of non-greedy makes a lot of work for the system, even though most of it is immediately thrown away.

TODO - usage example.

UNSTABLE - I might change the default option for greedy?. On the one hand, false gives the more general parser and what most people would probably expect at first. But non-greedy repetition is just so much slower that any practical use really needs to reach for greedy.

(kleene-star [#:name name #:derive derive #:result/bare make-result/bare #:result/stx make-result/stx #:greedy? greedy? #:between between #:before before #:after after] parser) → parser?
name : (or/c #f string?) = #f
derive : procedure? = #f
make-result/bare : procedure? = #f
make-result/stx : procedure? = #f
greedy? : any/c = #f
between : (or/c #f parser?) = #f
before : (or/c #f parser?) = #f
after : (or/c #f parser?) = #f
parser : parser?

Like repetition with min 0 and max infinity.

UNSTABLE - I might change the default option for greedy?.

(kleene-plus [#:name name #:derive derive #:result/bare make-result/bare #:result/stx make-result/stx #:greedy? greedy? #:between between #:before before #:after after] parser) → parser?
name : (or/c #f string?) = #f
derive : procedure? = #f
make-result/bare : procedure? = #f
make-result/stx : procedure? = #f
greedy? : any/c = #f
between : (or/c #f parser?) = #f
before : (or/c #f parser?) = #f
after : (or/c #f parser?) = #f
parser : parser?

Like repetition with min 1 and max infinity.

UNSTABLE - I might change the default option for greedy?.

(kleene-question [#:name name #:derive derive #:result/bare make-result/bare #:result/stx make-result/stx #:greedy? greedy? #:between between #:before before #:after after] parser) → parser?
name : (or/c #f string?) = #f
derive : procedure? = #f
make-result/bare : procedure? = #f
make-result/stx : procedure? = #f
greedy? : any/c = #f
between : (or/c #f parser?) = #f
before : (or/c #f parser?) = #f
after : (or/c #f parser?) = #f
parser : parser?

Like repetition with min 0 and max 1.

UNSTABLE - I might change the default option for greedy?.

(permutation [#:name name #:derive derive #:result/bare make-result/bare #:result/stx make-result/stx #:between between #:before before #:after after] parser ...) → parser?
name : (or/c #f string?) = #f
derive : procedure? = #f
make-result/bare : procedure? = #f
make-result/stx : procedure? = #f
between : (or/c #f parser?) = #f
before : (or/c #f parser?) = #f
after : (or/c #f parser?) = #f
parser : parser?

Permutation combinator. The interface is essentially the same as sequence, but the parser accepts all permutations of the sequence of parsers.

(epsilon-parser [#:name name #:result result]) → parser?
name : string? = "epsilon"
result : (or/c #f any/c) = #f

Parser that parses the empty string. You can provide a custom name and result (IE the result for make-derivation).

(not-parser parser [#:name name #:result result]) → parser?
parser : parser?
name : string? = (format "not_~a" (parser-name parser))
result : any/c = #f

Create a parser that succeeds when parser fails. In other words, if the result of parser is a stream with any derivations, the not-parser fails. If the result of parser is a parse failure (IE empty stream), this parser succeeds.

You can supply result as the result field for make-parse-derivation.

(peek-parser parser) → parser?
parser : parser?

Return a parser that parses with parser but wraps the resulting derivation in another derivation whose end point is its start point. The wrapped derivation will have the same result as the inner derivation returned by parser.

You could use this with eg. parse*-direct to peek ahead without actually effecting the input port.

Parses characters in the given range. You can supply a string of length 2 for min and don’t provide max, or you can supply two characters or strings of length 1. The first character must have a lower code point value than the second.

Returns a parser that parses a regexp. The derivation’s result field is the result returned by regexp-match.

(as-syntax p) → parser?
p : parser?

Returns a parser whose derivation result is the same as p’s, but wrapped up as a syntax object with appropriate source location info.

(wrap-derivation p wrap-func [#:name name]) → parser?
p : parser?
wrap-func : (-> parse-derivation? parse-derivation?)
name : (or/c #f string?) = #f

Creates a parser that applies wrap-func to the derivations returned by p.

(binding-sequence elem ... global-option ...)
elem = parser | [: parser elem-optional ...] global-option = #:name parser-name #:derive derive #:result/bare result/bare #:result/stx result/stx #:splice splice-global #:inherit-between inherit-between #:between between-parser elem-optional = #:bind binding-name #:ignore ignore #:splice splice-number #:repeat-min repeat-min #:repeat-max repeat-max #:repeat-greedy? repeat-greedy?

Produce a sequence parser that can bind intermediate results and reference them in parsers later in the sequence. Each element is evaluated to construct parsers after the previous parser has returned a derivation, so binding-sequence can be used to construct data-dependent parsers.

To bind the derivation as a result, use the #:bind keyword to bind the derivation to name. Then just refer to name later!

Global options affect the entire sequence.between-parser is inserted between each element in the sequence. If between-parser is not specified but inherit-between is truthy, then the sequence inherits the between-parser from any binding-sequence that surrounds this one. If splice-global is a number greater than 0, the resulting list is spliced that many times (which is only useful if your sequence only has a single non-ignored element). The derive, result/bare, and result/stx arguments are similar to the similar arguments in sequence, though note that splices and ignores affect the list of arguments that will be passed to them. The default derivation result will be a syntax list of the results of the elements (taking into account splices and ignores).

Each element in the sequence may have extra options about binding, splicing, or repetition. The binding-name argument is bound to the derivation of that element for future elements in the sequence. The splice-number argument should be a positive integer, and the result list of that parser will be spliced that number of times into the outer result list. If the ignore argument is true, that element is not included in the result list. The repetition arguments act as in repetition, though the between-parser is inserted between repetitions.

2.4.1 Filtersđź”—â„ą
(parse-filter p filter-func [#:replace-derivation? replace-derivation? #:include-failures? include-failures?])
→ parser?
p : parser?
filter-func : (-> port derivation (or/c any/c parse-derivation?))
replace-derivation? : any/c = #f
include-failures? : any/c = #t

This is probably not your go-to filter combinator, but it is the most general.

The filter-func receieves the input port at the position after the parser’s derivation, as well as the derivation itself. If the filter-func returns false, that derivation is removed from the result stream of the filtered parser. If replace-derivation? is true, the filter-func’s non-false results must be new parse derivations, which replace the original derivations in the parse stream. Otherwise the original parse derivations are used even if the truthy value returned by filter-func is itself a parse derivation.

If include-failures? is true, each filtered derivation will be replaced in the stream by a parse-failure? with a message about the filtered derivation. It’s really only useful if you want to track all failures with chido-parse-keep-multiple-failures?.

(not-follow-filter main-parser not-follow-parser [#:include-failures? include-failures?])
→ parser?
main-parser : parser?
not-follow-parser : parser?
include-failures? : any/c = #t

Filters out derivations if not-follow-parser can succeed immediately after the derivation.

Eg. (not-follow-filter "abc" "d") parses the strings "abc" and (the first three characters of) "abce" but not the string "abcd"

(must-follow-filter main-parser must-follow-parser [#:include-failures? include-failures?])
→ parser?
main-parser : parser?
must-follow-parser : parser?
include-failures? : any/c = #t

Filters out derivations if must-follow-parser can not succeed immediately after the derivation.

Eg. (must-follow-filter "abc" "d") parses the first three characters of the string "abcd", but does not parse the strings "abc" or "abce".

(derivation-filter p filter-func [#:replace-derivation? replace-derivation? #:include-failures? include-failures?])
→ parser?
p : parser?
filter-func : (-> parse-derivation? (or/c any/c parse-derivation?))
replace-derivation? : any/c = #f
include-failures? : any/c = #t

Filter out derivations that don’t pass filter-func, also replace the derivations if replace-derivation? is true.

(result-filter p filter-func [#:replace-result? replace-result? #:include-failures? include-failures?]) → parser?
p : parser?
filter-func : (-> any/c any/c)
replace-result? : any/c = #f
include-failures? : any/c = #t

Filter out derivations whose derivation-result does not pass filter-func. If replace-result? is true, derivations are replaced with derivations whose result is the value returned by filter-func.

2.5 Chido-Readtable Parsersđź”—â„ą

Chido-readtables have an interface similar to Rackets readtables (see Readtables in the Racket Guide)

Chido-readtables are a fancy kind of alternative parser. Among their alternatives, chido-readtables have a built-in symbol parser whose behavior is modified by the other parsers. Chido-readtables are extended with parsers and a symbol denoting an effect on the built-in symbol parser, such as 'terminating or 'nonterminating-layout.

When the built-in symbol parser is used by a chido-readtable, the symbol continues until reaching the prefix of a 'terminating or 'terminating-layout parser or a successful parse from a 'soft-terminating or 'soft-terminating-layout parser.

Layout parsers are whitespace or comment parsers that delimit symbols and are generally allowed between other parsers inside lists. Results from layout parsers are ignored when creating parse results from readtable parsers, only the fact that they succeed at parsing is relevant. They come in 'terminating-layout, 'soft-terminating-layout, and 'nonterminating-layout varieties, which match the symbol effects of 'terminating, 'soft-terminating, and 'nonterminating parsers, respectively.

Terminating parsers are parsing alternatives that also delimit symbols (IE they terminate the built-in symbol parsing). In other words no space is necessary between a symbol and a following terminating parser.'soft-terminating parsers terminate a symbol only of they parse successfully. Hard terminating ('terminating) parsers terminate a symbol when their prefix matches, whether or not they parse successfully. Therefore the prefixes of hard terminating parsers are disallowed anywhere within symbols.

Nonterminating parsers are alternatives that do not delimit a symbol. Layout (whitespace) is required between a symbol and a following nonterminating parser.

Terminating and nonterminating parsers can follow each other with no layout in between, though layout is allowed between them, and usually good style to have.

Note that when using a chido-readtable as a parser directly, it parses only ONE form, equivalent to using chido-readtable->read1. When you want to parse a sequence using chido-readtables, instead of using a kleene star combinator, you should use chido-readtable->read*.

Ambiguous results are possible from parsing with a chido-readtable when terminating/nonterminating parsers themselves return ambiguous results or when multiple terminating/nonterminating parsers are successful at a given position. If another parser is successful, the symbol parser is not tried. The symbol parser never returns an ambiguous result.

One final parser flag is 'left-recursive-nonterminating. Parsers with the 'left-recursive-nonterminating flag are run after the symbol parser and do not effect symbol parsing. While chido-parse doesn’t normally need a flag to handle left-recursive parsers, because most readtable parsers are run before the symbol parser and used to effect the symbol parser, left-recursive parsers need a special flag in the readtable parser.

2.5.1 chido-readtable objectsđź”—â„ą

A chido-parse-parameter holding a chido-readtable?. Use this in any parser that needs to recur by parsing a chido readtable. Using it allows extension parsers to be written independently and combined while allowing each one to reference the combined readtable, rather than having to re-construct each extension with a reference to the proper extended readtable.

A chido-readtable? with NO parsers of ANY kind attached, except the built-in symbol parser.

TODO - list the flags that the table starts with, eg. chido-readtable-symbol-support?.

(chido-readtable? x) → any/c
x : any/c

Predicate for chido-readtables. Note that if chido-readtable? is true, then parser? is true too. When used directly as parsers, chido-readtables parse once with no leading or trailing layout. To properly parse multiple times with a chido-readtable, use chido-readtable->read* to use the readtable’s layout configuration.

2.5.2 Variations on chido-readtable parsersđź”—â„ą

Returns an opaque parser that parses a single form using rt (with NO leading whitespace allowed). The parser is equivalent to using the readtable directly as a parser.

Like chido-readtable->read1, but with leading layout (using the layout parsers within the readtable).

Returns an opaque parser that parses a sequence of forms using rt, returning a list. This is preferable to using a chido-readtable directly inside a kleene-star combinator because it consistently handles layout parsing and allows trailing layout. The parser allows leading, middle, and trailing layout per the layout parser configuration in rt, and can successfully parse an empty list of forms (which may still have layout).

Note that this does NOT parse parentheses at the start/end of a list – rather, this is a procedure that could be used on the inside of a list. Rather, if you want to parse several (potentially extended) s-expressions within a file that potentially has layout (whitespace) before, between, and after the forms, you would use this parser. Of course, you can also make a list parser with (sequence "(" (chido-readtable->read* rt) ")"). But chido-readtable-add-list-parser also handles the extra details of adding the list parser to the readtable for recursive parses, adding a terminating and failing parser for the right delimiter, properly using the current-chido-readtable, and other conveniences, so probably just use it instead.

Like chido-readtable->read*-parser but it requires at least one form. In other words, it fails on empty lists.

Returns an opaque parser that parses a single piece of layout using rt.

Returns an opaque parser that parses any amount of layout using rt.

Returns an opaque parser that parses one or more pieces of layout using rt.

Parser that applies chido-readtable->read1 to current-chido-readtable.

Parser that applies chido-readtable->layout* to current-chido-readtable.

Parser that applies chido-readtable->layout+ to current-chido-readtable.

Parses symbols using the symbol parser of current-chido-readtable. IE it uses all the extensions to determine what symbols can be parsed, but it is ONLY the symbol parser.

UNSTABLE - chido readtables don’t yet support symbols with escapes (IE with backslash) or as literals with pipes around them as Racket’s built-in readtable does. At some point there will be options to set on the readtable itself to toggle options about those.

2.5.3 Extending and modifying chido-readtablesđź”—â„ą

All readtable “modifiers” are functional. They do not actually mutate a readtable, they merely return a new modified readtable.

The same as

(or/c 'terminating
'soft-terminating
'nonterminating
'terminating-layout
'soft-terminating-layout
'nonterminating-layout
'left-recursive-nonterminating)
(extend-chido-readtable mode parser rt [#:operator operator-type #:precedence-less-than precedence-less-than #:precedence-greater-than precedence-greater-than #:associativity associativity])
→ chido-readtable?
mode : chido-readtable-symbol-effect/c
parser : parser/c
rt : chido-readtable?
operator-type : (or/c #f 'infix 'prefix 'postfix) = #f
precedence-less-than : (or/c any/c (listof any/c)) = #f
precedence-greater-than : (or/c any/c (listof any/c)) = #f
associativity : (or/c #f 'left 'right) = #f

Extend rt with parser and given mode. To understand the various modes, see Chido-Readtable Parsers.

Also, chido-readtables support operators. You can use the optional arguments to do so, but better yet see chido-readtable-add-mixfix-operator.

(extend-chido-readtable* rt arg ...) → chido-readtable?
rt : chido-readtable?
arg : any/c

Applies extend-chido-readtable multiple times. Eg.

(extend-chido-readtable* my-rt
'terminating at-reader-parser
'nonterminating some-other-parser)
(chido-readtable-add-list-parser l-delim r-delim rt [#:wrapper wrapper #:inside-readtable inside-readtable #:readtable-symbol-effect readtable-symbol-effect])
→ chido-readtable?
l-delim : string?
r-delim : string?
rt : chid-readtable?
wrapper : (or/c #f symbol? (-> syntax? syntax?)) = #f
inside-readtable : (or/c #f chido-readtable? (-> chido-readtable?)) = #f
readtable-symbol-effect : chido-readtable-symbol-effect/c = 'terminating

Adds a list parser with the given delimiters. Eg. the default parenthesis parser in Racket is similar to

(chido-readtable-add-list-parser "(" ")" my-rt)

.

If inside-readtable is #f, the current-readtable is queried for recursive parses (of both forms and layout). You can instead set inside-readtable to a particular readtable to use it, or to a thunk that provides a readtable, in which case the thunk will be called (with no caching) to get an inner readtable each time.

If wrapper is a symbol, that symbol is prepended to the list. Eg. if you use

(chido-readtable-add-list-parser "$(" ")" mostly-normal-rt #:wrapper '#%dollar-paren)

, then the string "$(foo bar)" would be parsed as '(#%dollar-paren foo bar).

You can also use a procedure for wrapper, in which case the procedure takes the result syntax as an argument and must return a syntax object.

Note that if l-delim and r-delim are the same character, you can’t nest lists. It’s a constant annoyance to me that people ever choose to use the same (likely single character) string as both left and right delimiters for things, because of course you can’t nest such things without some kind of escaping.

The readtable-symbol-effect should generally be 'terminating, but another reasonable choice is 'terminating-layout to get an s-expression comment form. I’m not sure it’s a terribly useful comment form, but it’s interesting at least.

UNSTABLE - these list forms do not yet support dots for improper lists or for operator moving (a la '(op l r)). I will eventually add them as optional keyword arguments (without using the keyword, no dots will be supported).

(chido-readtable-add-raw-string-parser l-delim r-delim rt [#:wrapper wrapper #:readtable-symbol-effect readtable-symbol-effect])
→ chido-readtable?
l-delim : string?
r-delim : string?
rt : chid-readtable?
wrapper : (or/c #f symbol? (-> syntax? syntax?)) = #f
readtable-symbol-effect : chido-readtable-symbol-effect/c = 'terminating

Adds a parser for raw strings with balanced inner delimiters. In other words, there are no escapes like "\n", every character in the source string exists in the parsed string, including newlines and embedded delimiters. If l-delim appears in the string before r-delim, it will cause an r-delim to be part of the string instead of the string terminator. This means that not all strings are representable with a given raw string parser added with this procedure. But if you really need a string that contains the same delimiters without being balanced, you can use a different string notation.

Raw strings are convenient for embedding source code as strings. For example, if we use

, we can parse the following as a single string:

TODO - this example is messing up my auto-indentation, so I’m commenting it out for the moment.

I mean, that’s a pretty dumb example, but I think you get the picture. When you have to embed HTML inside Javascript inside PHP inside ... escaping is super annoying. But realistically raw strings of source code are useful in Racket in particular because you can write macros to parse those string at compile time. And sometimes you just want to write strings that contain quotes or backslashes without a bunch of escaping nonsense. In other words, they are also nice for writing regular expressions (IE a special case of embedding source code as a string).

Note that if l-delim and r-delim are the same character, you can’t nest raw strings. It’s a constant annoyance to me that people ever choose to use the same (likely single character) string as both left and right delimiters for things, because of course you can’t nest such things without some kind of escaping.

The readtable-symbol-effect should generally be 'terminating, but another reasonable choice is 'terminating-layout to get a nestable multi-line comment form.

(chido-readtable-add-mixfix-operator name rt [#:layout layout #:precedence-greater-than precedence-greater-than #:precedence-less-than precedence-less-than #:associativity associativity])
→ chido-readtable?
name : (or/c symbol? string?)
rt : chido-readtable?
layout : (or/c 'required 'optional 'none) = 'required
precedence-greater-than : list? = '()
precedence-less-than : list? = '()
associativity : (or/c #f 'left 'right) = #f

Yes, that’s right, chido-readtables can have infix, prefix, and postfix operators. Whether or not that’s a good idea you can decide for yourself. But they’re supported!

This code:

adds a <+> operator to the readtable, so you could write eg:"(string-length foo) <+> (some-thunk)"and it would be read as:'(#%chido-readtable-infix-operator <+> (string-length foo) (some-thunk)).

The name must have an underscore everywhere you want a recursive parse to happen, thus determining whether the operator is infix, prefix, postfix, or “notfix” (meaning that it neither begins nor ends with a recursive parse). The fixity determines the implicit symbol that starts the result form, either '#%chido-readtable-infix-operator, '#%chido-readtable-prefix-operator, '#%chido-readtable-postfix-operator, or '#%chido-readtable-notfix-operator.

The second symbol in the result is the operator name, with any leading and trailing underscores stripped but any interior underscores preserved.

Example with inner holes (though the specific example is kind of dumb):

(chido-readtable-add-mixfix-operator 'IF_THEN_ELSE_ my-rt)

would parse the string"IF 5 THEN (foobar) ELSE quux"as'(#%chido-readtable-prefix-operator IF_THEN_ELSE 5 (foobar) quux).

The layout argument determines whether layout is required, optional, or disallowed between the operator components and the recursive parses. Obviously the only good answer here is 'required, but if you love poor language design where you can’t allow identifiers to include the characters used as operator names you can choose 'optional, just know that I think poorly of your choices. To round things out, you can also choose 'none, which I hope everyone can see is just a dumb choice. But I added it for the sake of completeness.

The associativity argument should not need much explanation. But it should be extended to allow associativity groups, which it currently doesn’t support.

The precedence-greater-than and precedence-less-than allow you to specify relative precedence of operators. Using these, your operator precedence forms a partial order, and any operators with no ordering between them require disambiguation (IE it’s an error to put them together). (I could have left it ambiguous, but really who wants that?) I could have written this with numeric precedence instead, but I like partial-order precedence a lot better, so that’s what you get in this implementation. (I mean, it could be extended to have the option of doing either, or maybe both! But that’s more work than I currently want to bother with.)

The symbol pieces of the operator name are blacklisted from the symbol parser. So if you add the operator '<>=<_>>#>_<$+¢>_ (channeling Haskell), the symbols '<>=<, '>>#>, and '<$+¢> would be unreadable by the readtable. This prevents ambiguous parses where you have a parse with the operator as an operator and a parse where you just have the operator name as a symbol in a list of forms.

(set-chido-readtable-symbol-result-transformer crt transformer)
→ chido-readtable?
crt : chido-readtable?
transformer : (-> syntax? any/c)

Parse results from the built-in symbol parsers are syntax objects by default. By setting a transformer, you can change the result (eg, perhaps you want datums instead of syntax objects).

The default transformer is the identify function.

Turn off or on the built-in symbol parser. Why would you want to turn it off? Mostly to re-use the declarative operator precedence and associativity support built into chido-readtables for alternates where you actually don’t want a symbol included.

In particular, the BNF DSL is implemented on top of readtables that (generally) have symbol support turned off.

Predicate for whether symbol support is on or off.

Produce a readtable where certain symbols are disallowed. In other words, the built-in symbol parser has a filter on it that checks whether the symbols read are in the blacklist.

Get the name of the readtable.

Set the name of the chido-readtable (functionally).

TODO - other APIs or parsers?

2.6 BNF DSLđź”—â„ą

Frankly, the BNF DSL is a little rougher than the rest of the system. It’s more likely to see some minor changes than the rest of the system.

(define-bnf-arm arm-name top-level-optional ... arm-alt-spec ...)
top-level-optional = #:layout layout-flag | #:layout-parsers layout-parsers #:ignore-arm-name? ignore-arm-name? #:result/stx result/stx #:result/bare result/bare layout-flag = (quoterequired) (quoteoptional) (quotenone) bnf-arm-alt-spec = binding-sequence-elem ... arm-alt-optional ... arm-alt-optional = #:name name #:associativity associativity #:precedence-less-than plt #:precedence-greater-than pgt #:result/stx result/stx #:result/bare result/bare

Construct a chido-readtable? that acts like one arm of a BNF grammar. The result has chido-readtable-symbol-support? off. If layout-flag is (quote required), layout parsers are automatically inserted between parser elements. If layout-flag is (quote optional), layout-or-epsilon parsers are automatically inserted between parser elements. The default layout-flag is ’optional.

In reality, a parser is always inserted between, but it checks a flag stored on the readtable...

The layout-parsers argument should be a list of parsers, and if unspecified defaults to the list (quote (" " "\t" "\r" "\n")). Note that these are simply layout parsers on the resulting readtable, so this list can be extended later.

The result/stx and result/bare arguments act as in the various combinators, but both a global default for the arm and per-alt overrides can be supplied. If ignore-arm-name? is false, the name of the arm is prepended to the result list by the default result constructior.

Each alternative in the “arm” is essentially a binding-sequence, but with different whole-sequence options. In particular, if the leftmost or rightmost sequence members are arm-name, the alternate is determined to be an operator, and the associativity and precedence arguments are used.

TODO - example.

(readtable-extend-as-bnf-arm rt option ... bnf-arm-alt-spec ...)
option = #:arm-name arm-name | #:ignore-arm-name? ignore-arm-name? #:result/stx result/stx #:result/bare result/bare

Extend rt in the manner of define-bnf-arm. Each bnf-arm-alt-spec is as defined for define-bnf-arm.

TODO - example.

(define-bnf name global-options ... [arm-name bnf-arm-option ... bnf-arm-alt-spec ...] ...)
global-options = #:layout-parsers layout-parsers | #:layout global-layout-flag bnf-arm-option = #:main-arm main-arm #:layout layout-flag-for-arm #:ignore-arm-name? ignore-arm-name #:result/stx result/stx #:result/bare result/bare

Define a bnf-parser? (which is also a parser?) using a BNF grammar. When the result parser is used directly, the main-arm is the parser that is actually used. The main-arm defaults to the top arm. Note that if you use the parser directly it does NOT parse surrounding layout. If you want a parser that accepts leading and trailing layout, use bnf-parser->with-surrounding-layout?

The bnf-arm-alt-specs you can use in each arm are the same as in define-bnf-arm. The layout flag can be set globally but overridden per-arm, with values of 'required, 'optional, or 'none as per define-bnf-arm. The layout-parsers again default to '(" " "\t" "\r" "\n")?

Each arm is a chido-readtable, as built with define-bnf-arm, and they can be accessed with bnf-parser->arm-parser.

TODO - example.

Predicate for parsers defined with define-bnf.

Returns a parser like p but that allows leading/trailing layout.

Returns the parser for the given arm of the BNF. It’s a chido-readtable, though probably not one that’s necessarily like you might make directly (except through define-bnf-arm).

2.7 TODOđź”—â„ą

TODO - document these

3 Code and Licenseđź”—â„ą

The code is availableon github.

This library is distributed under the MIT license and the Apache version 2.0 license, at your option.