[Python-Dev] why we have both re.match and re.string?
Steven D'Aprano steve at pearwood.info
Wed Feb 10 18:05:51 EST 2016
On Wed, Feb 10, 2016 at 10:59:18PM +0100, Luca Sangiacomo wrote:
> Hi, I hope the question is not too silly, but I would like to understand the advantages of having both re.match() and re.search(). Wouldn't it be clearer to have just one function with one additional parameter, like this:
> re.search(regexp, text, frombeginning=True|False) ?
I guess the most important reason now is backwards compatibility. The oldest Python I have installed here is version 1.5, and it has the brand new "re" module (intended as a replacement for the old "regex" module). Both have search() and match() top-level functions. So my guess is that you would have to track down the author of the original "regex" module.
But a more general answer is the principle, "Functions shouldn't take constant bool arguments". It is an API design principle which (if I remember correctly) Guido has stated a number of times. A function should not take a boolean argument which (1) exists only to select between two different modes and (2) is nearly always given as a constant.
Do you ever find yourself writing code like this?
    if some_calculation():
        result = re.match(regex, string)
    else:
        result = re.search(regex, string)
If you do, that would be a hint that perhaps match() and search() should be combined so you can write:
    result = re.search(regex, string, some_calculation())
But I expect that you almost never do. I would expect that if we combined the two functions into one, we would nearly always call it with a constant bool:
    # I always forget whether True means match from the start or not,
    # and which is the default...
    result = re.search(regex, string, False)
which suggests that search() is actually two different functions, and should be split into two, just as we have now.
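To make the comparison concrete, here is a rough sketch of what a merged function might look like next to the two existing ones. The name combined_search and its from_beginning parameter are made up for illustration; nothing like them exists in the re module.

```python
import re

def combined_search(pattern, string, from_beginning=False):
    """Hypothetical merged API: the flag only selects one of two modes."""
    if from_beginning:
        return re.match(pattern, string)
    return re.search(pattern, string)

text = "spam and eggs"

# At the call site the flag is a bare constant whose meaning is not obvious.
a = combined_search("eggs", text, False)
b = combined_search("eggs", text, True)

# The two existing functions express the same choice through their names.
c = re.search("eggs", text)
d = re.match("eggs", text)
```

Here a and c find "eggs" in the middle of the string, while b and d are None because the string does not start with "eggs". The function name carries the information that the bare True or False hides.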
It's a general principle, not a law of nature, so you may find exceptions in the standard library. But if I were designing the re module from scratch, I would either keep the two distinct functions, or just provide search() and let users use ^ to anchor the search to the beginning.
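As a small illustration of the second option, anchoring search() with ^ behaves like match() as long as re.MULTILINE is not in play. The sample text below is made up for the example.

```python
import re

text = "first line\nsecond line"

# Without flags, search() anchored with ^ agrees with match().
assert re.match(r"first", text) is not None
assert re.search(r"^first", text) is not None
assert re.match(r"second", text) is None
assert re.search(r"^second", text) is None

# With re.MULTILINE, ^ also matches right after each newline,
# so an anchored search() can succeed where match() does not.
multi_search = re.search(r"^second", text, re.MULTILINE)
multi_match = re.match(r"second", text, re.MULTILINE)

# \A anchors at the very start of the string regardless of flags.
strict_search = re.search(r"\Asecond", text, re.MULTILINE)
```

So the two spellings are interchangeable only in the default mode. In multiline mode \A is the anchor that keeps search() equivalent to match().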
> In this way we prevent, as written in the documentation, people writing ".*" in front of the regexp used with re.match()
I only see one example that does that:
https://docs.python.org/3/library/re.html#checking-for-a-pair
Perhaps it should be changed.
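For what it's worth, here is a sketch of how that example might look with search() instead. The pattern with the leading ".*" is the one from the linked docs section; the search() version and the sample hands are my own illustration, not a proposed patch.

```python
import re

# Pattern from the docs example: the leading ".*" lets match() find a pair anywhere.
pair_match = re.compile(r".*(.).*\1")

# The same check written for search(), which scans the string by itself.
pair_search = re.compile(r"(.).*\1")

hands = ["717ak", "354aa", "718ak"]

# Collect the repeated card found by each approach (None when there is no pair).
by_match = [m.group(1) if (m := pair_match.match(h)) else None for h in hands]
by_search = [m.group(1) if (m := pair_search.search(h)) else None for h in hands]
```

Both lists report the same repeated card for each hand. The overall matched text can differ, though, since the match() version also consumes whatever precedes the pair.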
-- Steve