[Python-Dev] Re: pre-PEP [corrected]: Complete, Structured Regular Expression Group Matching

Erik Heneryd erik at heneryd.com
Thu Aug 12 12:44:26 CEST 2004

Previous message: [Python-Dev] Re: pre-PEP [corrected]: Complete, Structured Regular Expression Group Matching
Next message: [Python-Dev] Re: pre-PEP [corrected]: Complete, Structured Regular Expression Group Matching

Mike Coleman wrote:

"Stephen J. Turnbull" <stephen at xemacs.org> writes:

Re maintenance, yeah regexp is pretty terse and ugly. Generally, though, I'd rather deal with a reasonably well-considered 80 char regexp than 100 lines of code that does the same thing.

ditto

It's not obvious to me how to make grammar rules pretty in Python, but implementing an SLR parser-generator should be at most a couple of days' work to get something you could live with.

There are several of these packages available and I'm all in favor of their use, particularly for problems of sufficient complexity. These are currently hindered, though, by the fact that none have been elected for inclusion in the standard library. Furthermore, since we have 're', and it's not going away, I'd really like to fix this repeated match deficiency, one way or another.
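For readers unfamiliar with the "repeated match deficiency" mentioned above: when a capturing group sits inside a repetition, the standard re module keeps only the last text that group matched. A minimal sketch of the behavior, using an illustrative pattern and input that are not from the thread:

```python
import re

# A group inside a repetition: each pass over the loop overwrites group 1.
m = re.match(r"(?:(\w+),?)+", "a,b,c")

whole_match = m.group(0)   # the full matched text
last_capture = m.group(1)  # only the final repetition of the group survives

# A common workaround is a second pass over the matched text.
all_items = re.findall(r"\w+", whole_match)

print(whole_match, last_capture, all_items)
```

The second pass works here, but it duplicates part of the pattern, which is the kind of extra code the proposal is trying to avoid.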

Well, I guess that if you want structmatch in the stdlib you'll have to show that it's better than its alternatives, including those parser packages.

BTW, do you have a sample implementation for re.structmatch? Fredrik seems to have some doubt about ease of implementation.

Erik Heneryd posted an intriguing Python prototype, which I'm still looking at.

You'd still have to do a real implementation. If it can't be done without rewriting a whole lot of code, that would be a problem.

My objection is that it throws away a lot of structure, and therefore is liable to return the wrong parse, or simply an error with no hint as to where the data is malformed.

Hmm. Regarding the lack of error diagnosis, I'm not too concerned about this, for the reason I mention above. When 're.structmatch' does fail, though, it returns a "farthest match position", which will usually be of some aid, I would think. Regarding getting the parse wrong, sure, this could happen. Users will have to be judicious. The answer here is to write RE's with some care, realize that some matches may require further checking after being returned by re.structmatch, and further realize that some parsing problems are too complex for this method and require a grammar-based approach instead. This doesn't really seem worse than the situation for re.match, though, to me.

Hmm... think this is the wrong approach. Your PEP is not just about "structured matching", it tries to deal with a couple of issues and I think it would be better to address them separately, one by one:

* Parsing/scanning - this is mostly what's been discussed so far...
* Capturing repeated groups - IMO nice-to-have (tm) but not something I would lose sleep over. Hard to do.
* Partial matches - would be great for debugging more complex regexes. Why not a general re.RAISE flag raising an exception on failure?
* ... ?
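The "raise on failure" idea in the list above can be approximated today with a small wrapper. This is only a sketch: re.RAISE does not exist in the re module, and the names MatchError and match_or_raise are hypothetical, introduced here for illustration. It also cannot report how far the match got, since the standard re module does not expose that.

```python
import re


class MatchError(ValueError):
    """Hypothetical exception type; not part of the re module."""

    def __init__(self, pattern, string):
        super().__init__(f"pattern {pattern!r} did not match {string!r}")
        self.pattern = pattern
        self.string = string


def match_or_raise(pattern, string, flags=0):
    """Like re.match, but raise MatchError instead of returning None."""
    m = re.match(pattern, string, flags)
    if m is None:
        raise MatchError(pattern, string)
    return m


# Usage: a successful match behaves as usual, a failure raises.
ok = match_or_raise(r"\d+", "123")
try:
    match_or_raise(r"\d+", "abc")
    failed = False
except MatchError as exc:
    failed = True
    failed_string = exc.string
```

An exception carrying the pattern and input makes a failure harder to ignore than a None return, though a real flag would presumably also want to carry position information.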

Erik
