[Python-Dev] Re: pre-PEP [corrected]: Complete, Structured Regular Expression Group Matching

Fredrik Lundh fredrik at pythonware.com
Thu Aug 12 20:18:27 CEST 2004


Mike Coleman wrote:

> Re maintenance, yeah regexp is pretty terse and ugly. Generally, though, I'd rather deal with a reasonably well-considered 80 char regexp than 100 lines of code that does the same thing.

well, the examples in your PEP can be written as:

data = [line[:-1].split(":") for line in open(filename)]

and

import ConfigParser

c = ConfigParser.ConfigParser()
c.read(filename)

data = []
for section in c.sections():
    data.append((section, c.items(section)))

both of which are shorter than your structparse examples.
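
For readers who want to try the two snippets without files on disk, here is a minimal sketch under a few assumptions: it uses the Python 3 module name configparser (the post uses the Python 2 name ConfigParser), it reads from in-memory strings, and the sample data and the helper names parse_colon_lines and parse_ini are invented for illustration. It also uses splitlines() instead of line[:-1], which avoids dropping a character when the last line has no trailing newline.

```python
import configparser  # Python 3 name of the ConfigParser module used above

# Hypothetical sample inputs, invented for illustration only.
PASSWD_TEXT = "root:x:0:0\nguest:x:405:100\n"
INI_TEXT = "[server]\nhost = example.org\nport = 8080\n\n[client]\nretries = 3\n"


def parse_colon_lines(text):
    """Split every line on ':' and return a list of field lists."""
    return [line.split(":") for line in text.splitlines()]


def parse_ini(text):
    """Return a list of (section, [(key, value), ...]) pairs."""
    c = configparser.ConfigParser()
    c.read_string(text)  # in-memory counterpart of c.read(filename)
    return [(section, c.items(section)) for section in c.sections()]


if __name__ == "__main__":
    print(parse_colon_lines(PASSWD_TEXT))
    print(parse_ini(INI_TEXT))
```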

and most of the one-liners in your pre-PEP can be handled with a combination of "match" and "finditer". here's a 16-line helper that parses strings matching the "a(b)*c" pattern into a prefix/list/tail tuple.

import re

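def parse(string, pat1, pat2):
    """Parse a string having the form pat1(pat2)*"""
    # the prefix must match at the start; group 1 is the captured head
    m = re.match(pat1, string)
    i = m.end()
    a = m.group(1)
    b = []
    # the "|." alternative matches any other character without setting a
    # group, so lastindex is None and the group lookup below fails
    for m in re.compile(pat2 + "|.").finditer(string, i):
        try:
            token = m.group(m.lastindex)
        except IndexError:
            break  # first character that is not part of a pat2 item
        b.append(token)
        i = m.end()
    # whatever follows the last pat2 match is returned as the tail
    return a, b, string[i:]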

>>> parse("hello 1 2 3 4 # 5", "(\w+)", "\s*(\d+)")
('hello', ['1', '2', '3', '4'], ' # 5')

tweak as necessary.
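
One possible tweak, sketched here as an illustration rather than as part of the original post: the helper above raises AttributeError when pat1 does not match, and it relies on the "|." fallback to detect the end of the repeated part. The variant below (the name parse_strict is made up) returns None when the prefix is missing and walks the string with an anchored Pattern.match(string, pos) loop instead. It assumes each pat2 match consumes at least one character.

```python
import re


def parse_strict(string, pat1, pat2):
    """Parse pat1(pat2)* into (head, items, tail); None if pat1 fails."""
    m = re.match(pat1, string)
    if m is None:
        return None
    head, pos = m.group(1), m.end()
    items = []
    item_re = re.compile(pat2)
    while True:
        # match() with a start position only succeeds exactly at pos
        m = item_re.match(string, pos)
        if m is None or m.end() == pos:
            break  # no further item, or an empty match that cannot advance
        # fall back to the whole match when pat2 defines no group
        items.append(m.group(m.lastindex or 0))
        pos = m.end()
    return head, items, string[pos:]


if __name__ == "__main__":
    print(parse_strict("hello 1 2 3 4 # 5", r"(\w+)", r"\s*(\d+)"))
```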


