[Python-Dev] Re: pre-PEP [corrected]: Complete, Structured Regular Expression Group Matching

Mike Coleman mkc at mathdogs.com
Fri Aug 13 03:43:01 CEST 2004


Erik Heneryd <erik at heneryd.com> writes:

Well, I guess that if you want structmatch into the stdlib you'll have to show that it's better than its alternatives. Including those parser packages.

As a practical matter, this may well be the case. I look at it kind of like

'structmatch' is to (a grammar parser) 
as 
'sed' is to (a full scripting language)

That is, it has its niche, but it is certainly not as general or powerful as a full parsing package. I'm not sure, either, exactly how to show that it's better. That said, I use sed all the time, not because it's better than full scripting languages, but because it nicely fits the problem I'm addressing.

You'd still have to do a real implementation. If it can't be done without rewriting a whole lot of code, that would be a problem.

I agree. Being busy and lazy, I'm trying to get a bead first on whether this would be a wasted effort.

Hmm... think this is the wrong approach. Your PEP is not just about "structured matching", it tries to deal with a couple of issues and I think it would be better to address them separately, one by one:

* Parsing/scanning - this is mostly what's been discussed so far...
* Capturing repeated groups - IMO nice-to-have (tm) but not something I would lose sleep over. Hard to do.
* Partial matches - would be great for debugging more complex regexes. Why not a general re.RAISE flag raising an exception on failure?

This is true. For me, the second is fundamental--it's why I'm bothering with this. The third is a useful add-on, and as you suggest could probably be added orthogonally to several of the existing methods.
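
To illustrate the "raise on failure" idea as an orthogonal add-on, here is a minimal sketch. Note the assumptions: there is no re.RAISE flag in the re module, and both MatchFailure and match_or_raise below are hypothetical names invented for this example. The sketch only covers the "exception instead of None" half of the suggestion; it cannot report how far a partial match got, since the existing API does not expose that.

```python
import re


class MatchFailure(ValueError):
    """Hypothetical exception type; not part of the re module."""


def match_or_raise(pattern, string, flags=0):
    """Behave like re.match, but raise MatchFailure instead of returning None."""
    m = re.match(pattern, string, flags)
    if m is None:
        raise MatchFailure("pattern %r did not match %r" % (pattern, string))
    return m
```

The same wrapping could presumably be applied to re.search or to a compiled pattern's methods, which is one way to read "added orthogonally to several of the existing methods".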

The first I'm not sure about. I don't think re.structmatch does scanning--that's not really the problem it tries to solve. As for "parsing", I guess it depends on what you mean by that.

Certainly it would be possible to address the "repeated groups" point without the whole structured return value thing, but I'm not seeing what would be better.
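
For concreteness, here is a small sketch of the "repeated groups" limitation using only the existing re module. The sample input and the variable names are made up for illustration. A group inside a repetition keeps only its last iteration, so the usual workaround is to validate the line with one pattern and then scan it again to collect the pieces.

```python
import re

line = "10 20 30"

# A capturing group inside a repetition retains only its final iteration.
m = re.match(r"(?:(\d+)\s*)+$", line)
last_only = m.groups()

# Common workaround: validate the whole line first, then re-scan for the items.
all_items = re.findall(r"\d+", line) if m else []

print(last_only)   # ('30',)
print(all_items)   # ['10', '20', '30']
```

The second pass works here because the items are flat; with nested repetitions the re-scan has to be repeated at each level, which is the case a structured return value is meant to address.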

Mike


