[Python-Dev] Misc re.match() complaint

Guido van Rossum guido at python.org
Tue Jul 16 19:21:36 CEST 2013


On Tue, Jul 16, 2013 at 12:55 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> Is there a strong enough use case to change it? I can't say the current behaviour seems very useful either, but some people may depend on it.

This is the crucial question. I personally see the current behavior as an artifact of the (lack of) design process, not as a conscious decision. Given that we also have m.string, m.start(grp) and m.end(grp), those who need something matching the original type (or even something that is known to be a reference into the original object) can use that API; for most use cases, all you care about is the selected group as a string, and it is more useful if that is always an immutable string (bytes or str).
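As a minimal sketch of the API mentioned above (the buffer contents and the pattern are made up for illustration): m.string is the object that was passed to re.match(), and slicing it with m.start(grp) and m.end(grp) yields a value of the original type. What m.group() itself returns for a bytearray target depends on the Python version, which is the question under discussion here, so the sketch does not rely on it.

```python
import re

# Illustrative data only; any bytes-like target behaves the same way.
buf = bytearray(b"key=value")
m = re.match(rb"(\w+)=(\w+)", buf)

# m.string is the very object that was passed to re.match().
same_object = m.string is buf

# Offsets of group 2 within the original object.
start, end = m.start(2), m.end(2)

# Slicing the original object preserves its type (bytearray here).
value_slice = m.string[start:end]
```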

The situation is most egregious if the target string is a bytearray, where there is currently no way to get the result as an immutable bytes object without an extra copy. (There's no API that lets you create a bytes object directly from a slice of a bytearray.)
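The extra copy can be sketched as follows (offsets and data are invented for the example). Slicing a bytearray builds a new bytearray, and converting that to bytes copies the data a second time. Going through a memoryview, which is not something the message proposes, is one way to limit this to a single copy, assuming the view is released before the bytearray is resized.

```python
# Illustrative buffer and offsets; in practice these would come from
# m.start() and m.end() on a match object.
buf = bytearray(b"header:payload")
start, end = 7, 14

# Two copies: the slice creates a temporary bytearray,
# and bytes() copies that temporary again.
tmp = buf[start:end]
payload = bytes(tmp)

# One copy: slicing a memoryview does not copy the data,
# so only bytes() copies. The with-block releases the view afterwards.
with memoryview(buf) as view:
    payload_one_copy = bytes(view[start:end])
```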

In terms of backwards compatibility, I wouldn't want to do this in a bugfix release, but for a feature release I think it's fine -- the number of applications that could be bitten by this must be extremely small (and the work-around is backward-compatible: just use m.string[m.start() : m.end()]).
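A sketch of that work-around wrapped in a helper; the name group_slice is hypothetical and not part of the re module. One detail worth handling: for a group that did not participate in the match, m.span(grp) is (-1, -1), so the helper returns None in that case rather than slicing with negative indices.

```python
import re

def group_slice(m, grp=0):
    """Hypothetical helper: return group `grp` as a slice of m.string.

    The result has the same type as the object passed to the matching
    function. Returns None if the group did not participate in the match.
    """
    start, end = m.span(grp)
    if start == -1:
        return None
    return m.string[start:end]

# Illustrative use with a bytearray target and an alternation in which
# only the second group matches.
m = re.match(rb"(a)|(b)", bytearray(b"b"))
whole = group_slice(m)        # slice for the whole match
unmatched = group_slice(m, 1)  # group 1 did not participate
matched = group_slice(m, 2)    # group 2 matched
```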

> I already find it a bit weird that you're passing a bytearray or memoryview to re.match(), to be honest :-)

Yes, this is somewhat of an odd corner, but actually most built-in APIs taking bytes also take anything else that can be coerced to bytes (io.open() seems to be the exception, and it feels like an accident -- os.open() does accept bytearray and friends). This is quite useful for code that interacts with C code or system calls -- often you have a large buffer shared between C and Python code for efficiency, and being able to do pretty much anything to the buffer that you can do to a bytes object (apart from using it as a dict key) helps a lot.
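A few standard-library calls that accept a bytearray where bytes would also work, as a rough illustration of the point (the buffer contents are invented, and the selection of functions is ours rather than taken from the message). The last part shows the dict-key exception: a bytearray is mutable and therefore not hashable.

```python
import binascii
import re
import struct
import zlib

# Illustrative shared buffer: a little-endian 16-bit value, then text.
buf = bytearray(b"\x01\x00hello")

# struct can unpack directly from the mutable buffer.
(first,) = struct.unpack_from("<H", buf)

# Checksums and hex encoding give the same result as for bytes.
crc_same = zlib.crc32(buf) == zlib.crc32(bytes(buf))
hex_text = binascii.hexlify(buf)

# Regular expressions with a bytes pattern also accept the buffer.
found = re.search(rb"hel+o", buf) is not None

# The exception mentioned above: a bytearray cannot be a dict key.
try:
    {buf: 1}
    usable_as_key = True
except TypeError:
    usable_as_key = False
```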

--
--Guido van Rossum (python.org/~guido)


