[Python-Dev] Misc re.match() complaint
Guido van Rossum guido at python.org
Tue Jul 16 04:20:29 CEST 2013
On Mon, Jul 15, 2013 at 7:03 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
Guido van Rossum writes:
> And I still think that any return type for group() except bytes or str
> is wrong. (Except possibly a subclass of these.)
I'm not sure I understand. Do you mean in the context of the match object API, where constructing "(target, match.start(), match.end())" to get a group-like object that refers to the target rather than copying the text is simple? (Such objects are very useful in the restricted application of constructing a programmable text editor.)
I'm not sure I understand you. :-(
The group() method on the match object returned by re.match() and re.search() returns a string-ish object representing the matched substring. (I'm using "string-ish" to allow for both unicode and bytes, which are exactly the two matching modes supported by the re module.) In most contexts (text editors excluded) the program will use this string just as it would use any other string, perhaps using it to open a file, perhaps as a key into some cache, and so on.
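A minimal sketch of the two matching modes described above, assuming a current Python 3. The file name and the cache dict are made up for illustration; they are not from the thread.

```python
import re

# Text mode: a str pattern matched against a str target.
text_match = re.match(r"(\w+)\.txt", "notes.txt")
# Bytes mode: a bytes pattern matched against a bytes target.
bytes_match = re.match(rb"(\w+)\.txt", b"notes.txt")

text_name = text_match.group(1)    # a str
bytes_name = bytes_match.group(1)  # a bytes object

# Both results are ordinary immutable strings, so they can be used
# like any other string, for example as keys into a cache.
cache = {text_name: "text entry", bytes_name: "bytes entry"}
```

The str and bytes keys are distinct, so the hypothetical cache ends up with two entries.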
I can clearly see the reasons why you want the target string to allow other types besides str and bytes, in particular other things that are known to represent sequences of bytes, such as bytearray and memoryview. These reasons primarily have to do with optimizing the representation of the target string in case it takes up a large amount of memory, or other situations where we'd like to reduce the number of times each byte is copied before we see it.
But I don't see as much of a use case for group() returning an object of the same type as the target string. In particular in the case of a target string that is a bytearray, group() has to copy the bytes regardless of whether it creates a bytes or a bytearray instance. And I do see a use case for group() returning an immutable object.
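The point about copying and immutability can be illustrated roughly as follows. This is a sketch, not a statement of what any particular Python version returns from group(): the explicit bytes() call is used so the result is immutable whatever type group() produces, and the "key=value" data is invented for the example.

```python
import re

# A mutable target buffer, as might be filled by reading from a socket.
target = bytearray(b"key=value")
m = re.match(rb"(\w+)=", target)

# Normalize to an immutable bytes object. If group() already returned
# bytes this is a no-op; if it returned a bytearray this makes a copy.
key = bytes(m.group(1))

# Mutating the target afterwards does not affect the extracted group,
# because the group is an independent copy of the matched bytes.
target[0:3] = b"XXX"

# An immutable result is hashable, so it works as a dict key.
lookup = {key: 1}

# A bytearray, being mutable, cannot be hashed at all.
try:
    hash(bytearray(b"key"))
    bytearray_hashable = True
except TypeError:
    bytearray_hashable = False
```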
Or is this something deeper, that a group is a new object in principle?
No, I just think of it as returning "a string" and I think it's most useful if that is always an immutable object, even if the target string is some other bytes buffer.
FWIW, it feels as if the change in behavior is probably just due to how slices work.
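For reference, slicing in Python returns an object of the same type as the thing being sliced, which is consistent with the guess above if group() is built on slicing the target. The sketch below only demonstrates the slicing behavior itself; it makes no claim about the re module's internals. The memoryview line corresponds to the non-copying "(target, match.start(), match.end())" idea from the quoted question.

```python
import re

data = bytearray(b"hello world")
m = re.search(rb"world", data)
start, end = m.span()

# Slicing preserves the type of the sliced object.
bytes_slice = bytes(data)[start:end]   # bytes in, bytes out
bytearray_slice = data[start:end]      # bytearray in, bytearray out (a copy)
view = memoryview(data)[start:end]     # memoryview in, memoryview out (no copy)

# The view refers to the target's memory, so it sees later changes,
# while the copied slices do not.
data[start] = ord("W")
view_bytes = view.tobytes()
```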
--
--Guido van Rossum (python.org/~guido)