[Python-Dev] pre-PEP [corrected]: Complete, Structured Regular Expression Group Matching

Stephen J. Turnbull stephen at xemacs.org
Mon Aug 9 09:19:31 CEST 2004


"Mike" == Mike Coleman <mkc at mathdogs.com> writes:

Mike> Motivation
Mike> ==========

Mike> A notable limitation of the ``re.match`` method is that it
Mike> fails to capture all group match information for repeatedly
Mike> matched groups.  For example, in a call like this ::

Mike>     m0 = re.match(r'([A-Z]+|[a-z]+)*', 'XxxxYzz')

Mike> one would like to see that the group which matched four
Mike> times matched the strings ``'X'``, ``'xxx'``, ``'Y'`` and
Mike> ``'zz'``.
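
For readers following along, here is a minimal sketch of the behaviour being described, using only the standard ``re`` module. The variable names ``last`` and ``pieces`` are illustrative and are not part of the proposal; the ``re.findall`` call is just one existing workaround, not the API the pre-PEP suggests.

```python
import re

# The pattern and subject string from the quoted example.
m0 = re.match(r'([A-Z]+|[a-z]+)*', 'XxxxYzz')

# A repeated group keeps only its final repetition.
last = m0.group(1)

# One existing workaround: drop the outer repetition and let
# findall collect each piece separately.
pieces = re.findall(r'[A-Z]+|[a-z]+', 'XxxxYzz')
```

Here ``last`` is ``'zz'``, while ``pieces`` holds all four substrings. The workaround only applies when the repeated group is the whole pattern, which is the gap the proposal is aimed at.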

Sure, but regexp syntax is a horrible way to express that. This feature would be an attractive nuisance, IMHO. For example:

Mike> Parsing ``/etc/group``
Mike> ----------------------

Mike> If ``/etc/group`` contains the following lines ::

Mike>     root:x:0:
Mike>     daemon:x:1:
Mike>     bin:x:2:
Mike>     sys:x:3:

Mike> then it can be parsed as follows ::

Mike>     p = r'((?:(?:^|:)([^:\n]*))*\n)*\Z'

This is an easy one, but even it absolutely requires being written with (?xm) and lots of comments, don't you think? If you're going to be writing a multiline, verbose regular expression, why not write a grammar instead, which (assuming a modicum of library support) will be shorter and self-documenting?
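
To make the comparison concrete, here is a rough sketch of both styles, assuming nothing beyond the standard ``re`` module. The first half is the quoted pattern transcribed with the x and m flags (passed as ``re.X | re.M``, equivalent to a leading ``(?xm)``). The second half is a hand-rolled stand-in for a grammar; ``parse_line`` and ``parse_file`` are hypothetical helper names, not functions from any parser library.

```python
import re

# The quoted pattern, spelled out in verbose, multiline form.
GROUP_FILE = re.compile(r'''
    (                   # group 1: one line of the file
      (?:
        (?: ^ | : )     # a field starts at a line start or after a colon
        ( [^:\n]* )     # group 2: the field text, possibly empty
      )*
      \n                # each line ends with a newline
    )*
    \Z                  # the lines must account for the whole input
''', re.X | re.M)


# The same structure written as a small grammar (hypothetical helpers):
#   file  := line*
#   line  := field (':' field)* newline
#   field := any run of characters other than ':' and newline
def parse_line(line):
    """Split one line into its colon-separated fields."""
    return line.split(':')


def parse_file(text):
    """Parse every line of the input into a list of fields."""
    return [parse_line(line) for line in text.splitlines()]


text = 'root:x:0:\ndaemon:x:1:\nbin:x:2:\nsys:x:3:\n'
matched = GROUP_FILE.match(text)
rows = parse_file(text)
```

With the sample lines from the quoted text, ``rows`` has one list per line, each ending in an empty string for the empty member field after the trailing colon. The grammar comments carry the same information as the regex comments, but the code under them needs no decoding.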

-- 
Institute of Policy and Planning Sciences
http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba
Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.


