[Python-Dev] Re: Discordance in documentation...

Terry Reedy tjreedy at udel.edu
Thu Sep 4 21:43:31 EDT 2003


"gminick" <gminick at hacker.pl> wrote in message news:20030904200448.GA1164 at hannibal...

...or is this just me?

Let's take a look. Reference Lib, 4.2.1 Regular Expression Syntax, says:

It is the Library Reference, not Ref Lib.

"|" A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B. [...] REs separated by "|" are tried from left to right, and the first one that allows the complete pattern to match is considered the accepted branch. This means that if A matches, B will never be tested, even if it would produce a longer overall match. [...]

I think the following version of the last four lines is correct and clearer.

As the target string is scanned, REs separated by "|" are tried from left to right. When one pattern completely matches, that branch is accepted. This means that once A matches, B will not be tested further, even if it would produce a longer overall match.
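The following lines show the behaviour this wording describes. This is a sketch that assumes a modern Python 3 interpreter, which postdates this thread. At any one starting position, the alternatives are tried left to right. The first one that lets the whole pattern match is kept, even when a later alternative would match more text. The last case shows why "allows the complete pattern to match" matters: the engine backtracks into the alternation when the rest of the pattern fails.

```python
import re

# At the same start position, the first alternative that lets the whole
# pattern match wins, even though "abc" would give a longer match.
first_short = re.search("ab|abc", "xabcx").group()
first_long = re.search("abc|ab", "xabcx").group()
print(first_short)  # ab
print(first_long)   # abc

# "a" is tried first, but the following "c" cannot match "b", so the
# engine backtracks into the group, tries "ab", and then "c" matches.
backtracked = re.search("(?:a|ab)c", "abc").group()
print(backtracked)  # abc
```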

And now a little test:

import re
a = "Fuentes Rushdie Marquez"
print(re.search("Rushdie|Fuentes", a).group())  # returns "Fuentes"

From the documentation I expected it to return "Rushdie" rather than "Fuentes", but it seems to return the first part of the string that matches rather than the first alternative of the regular expression.

As I hope my rewrite makes clearer, the trying of alternatives is nested within the scanning of the target string, and not vice versa as you inferred from the current doc.
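The nesting can be modelled with a toy search. This is a simplified sketch that assumes every alternative is a plain literal string. The helper `search_alternatives` is hypothetical and is not part of the `re` module. The outer loop scans the target string, and the inner loop tries the alternatives left to right at each position. That is why "Fuentes" is found before "Rushdie" is ever considered at position 8.

```python
import re


def search_alternatives(alternatives, text):
    """Hypothetical toy model of re.search for a pattern "A|B|..." whose
    alternatives are all plain literals (no metacharacters)."""
    for start in range(len(text) + 1):      # outer loop: scan the target
        for alt in alternatives:            # inner loop: try A, then B, ...
            if text.startswith(alt, start):
                return start, alt           # first hit wins
    return None                             # no alternative matched anywhere


text = "Fuentes Rushdie Marquez"
toy = search_alternatives(["Rushdie", "Fuentes"], text)
print(toy)                  # (0, 'Fuentes')

# The real engine agrees for this literal pattern.
m = re.search("Rushdie|Fuentes", text)
print(m.start(), m.group())  # 0 Fuentes
```

The model deliberately ignores backtracking into the rest of a larger pattern. That is enough to show the order of the two loops, but it is not a full regex engine.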

Doc bugs like this can (and should) be reported on SourceForge just like program bugs. http://sourceforge.net/tracker/?group_id=5470 That way, the report stays on the open list until someone decides either to make a fix or that no fix is needed.

Terry J. Reedy


