[Python-Dev] why we have both re.match and re.string?

Steven D'Aprano steve at pearwood.info
Wed Feb 10 18:05:51 EST 2016


On Wed, Feb 10, 2016 at 10:59:18PM +0100, Luca Sangiacomo wrote:

> Hi, I hope the question is not too silly, but I would like to understand the advantages of having both re.match() and re.search(). Wouldn't it be clearer to have just one function with one additional parameter, like this:
>
> re.search(regexp, text, frombeginning=True|False) ?

I guess the most important reason now is backwards compatibility. The oldest Python I have installed here is version 1.5, and it has the brand new "re" module (intended as a replacement for the old "regex" module). Both have search() and match() top-level functions. So my guess is that you would have to track down the author of the original "regex" module.

But a more general answer is the principle, "Functions shouldn't take constant bool arguments". It is an API design principle which (if I remember correctly) Guido has stated a number of times. Functions should not take a boolean argument which (1) exists only to select between two different modes and (2) is nearly always given as a constant.

Do you ever find yourself writing code like this?

    if some_calculation():
        result = re.match(regex, string)
    else:
        result = re.search(regex, string)

If you do, that would be a hint that perhaps match() and search() should be combined so you can write:

    result = re.search(regex, string, some_calculation())

But I expect that you almost never do. I would expect that if we combined the two functions into one, we would nearly always call them with a constant bool:

    # I always forget whether True means match from the start or not,
    # and which is the default...
    result = re.search(regex, string, False)

which suggests that search() is actually two different functions, and should be split into two, just as we have now.
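
To make that concrete, here is a rough sketch of what the combined API might look like. The function find() and its from_beginning parameter are hypothetical names made up for illustration; nothing like this exists in the re module.

```python
import re

def find(pattern, string, from_beginning=False):
    """Hypothetical combined function; NOT part of the re module."""
    if from_beginning:
        return re.match(pattern, string)
    return re.search(pattern, string)

# Typical call sites pass a literal flag, so the reader has to
# remember what True and False mean:
a = find(r"\d+", "abc 123", False)
b = find(r"\d+", "abc 123", True)

# The two separate functions carry the same information in their names:
c = re.search(r"\d+", "abc 123")
d = re.match(r"\d+", "abc 123")
```

At every call site the flag is a constant, which is the pattern the principle warns about.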

It's a general principle, not a law of nature, so you may find exceptions in the standard library. But if I were designing the re module from scratch, I would either keep the two distinct functions, or just provide search() and let users use ^ to anchor the search to the beginning.
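
A minimal sketch of the search-plus-anchor alternative, using made-up sample strings. One caveat worth labelling: with re.MULTILINE, ^ also matches just after each newline, so \A is the closer equivalent of match() in that mode.

```python
import re

text = "spam and eggs"

# match() only succeeds if the pattern matches at position 0.
at_start = re.match(r"spam", text)
not_at_start = re.match(r"eggs", text)

# search() scans forward and finds the pattern anywhere.
found_anywhere = re.search(r"eggs", text)

# Anchoring search() with ^ gives the same answers as match() here.
anchored_hit = re.search(r"^spam", text)
anchored_miss = re.search(r"^eggs", text)

# With re.MULTILINE, ^ also matches at the start of each line,
# while \A still means the very start of the string.
lines = "spam\neggs"
multiline_caret = re.search(r"^eggs", lines, re.MULTILINE)
multiline_strict = re.search(r"\Aeggs", lines, re.MULTILINE)
```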

> In this way we prevent, as written in the documentation, people writing ".*" in front of the regexp used with re.match()

I only see one example that does that:

https://docs.python.org/3/library/re.html#checking-for-a-pair

Perhaps it should be changed.
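
For what it's worth, here is a small illustration of why the ".*" prefix is not quite the same as calling search(). The sample string and pattern are invented for this example and are not taken from the documentation.

```python
import re

text = "order 66 shipped"

# match() with a leading ".*" succeeds, but the match object covers
# everything from the start of the string up to the digits.
with_prefix = re.match(r".*\d+", text)

# search() reports only the part you were actually looking for.
plain_search = re.search(r"\d+", text)
```

Both calls find something, but with_prefix.group() includes the leading "order " while plain_search.group() is just the digits.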

-- Steve
