[Python-Dev] Should we move to replace re with regex?
Guido van Rossum guido at python.org
Sun Aug 28 05:54:13 CEST 2011
On Sat, Aug 27, 2011 at 5:48 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> Many of the things regex does differently might be called either bug fixes or feature changes, depending on one's viewpoint. Regex should definitely not be 'bug-compatible'.
Well, as you said, it depends on one's viewpoint. If there's a bug in the treatment of non-BMP character ranges, that's a bug, and fixing it shouldn't break anybody's code (unless it was worth breaking :-). But if there's a change that e.g. (hypothetical example) makes a different choice about how empty matches are treated in some edge case, and the old behavior was properly documented, that's a feature change, and I'd rather introduce a flag to select the new behavior (or, if we have to, a flag to preserve the old behavior, if the new behavior is really considered much better and much more useful).
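To make the "flag to select the new behavior" idea concrete, here is a minimal sketch of one substitution routine that supports two policies for an empty-match edge case. Everything in it is hypothetical: `sub_all` and its `allow_adjacent_empty` keyword are invented for illustration and are not part of re or regex. The sketch only handles literal replacement strings and tries the pattern afresh at each position.

```python
import re

def sub_all(pattern, repl, text, allow_adjacent_empty=False):
    """Replace every match of `pattern` in `text` with the literal `repl`.

    Hypothetical policy switch (not a real re/regex flag):
    allow_adjacent_empty=False skips an empty match that starts exactly
    where the previous match ended; True replaces it as well.
    """
    rx = re.compile(pattern)
    out = []
    pos = 0
    prev_end = -1  # end offset of the last match that was replaced
    while pos <= len(text):
        m = rx.match(text, pos)
        if m is not None:
            empty = m.start() == m.end()
            suppressed = (empty and not allow_adjacent_empty
                          and m.start() == prev_end)
            if not suppressed:
                out.append(repl)
                prev_end = m.end()
            if not empty:
                # Non-empty match: resume scanning right after it.
                pos = m.end()
                continue
        # No match, or an empty one: copy one character and move on.
        if pos < len(text):
            out.append(text[pos])
        pos += 1
    return "".join(out)

old_style = sub_all("x*", "-", "abxd")
new_style = sub_all("x*", "-", "abxd", allow_adjacent_empty=True)
print(old_style)
print(new_style)
```

The default keeps the documented behavior, so existing callers see no change; only code that opts in gets the other treatment of the empty match after "x".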
> I think regex should be unicode-standard compliant as much as possible, and let the chips fall where they may.
In most cases the Unicode improvements in regex are not where it is incompatible; e.g. adding \X and named ranges are fine new additions and IIUC the syntax was carefully designed not to introduce any incompatibilities (within the limitations of \-escapes).
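A new escape such as \X can only be added compatibly if existing patterns did not already rely on that escape meaning something else. As a rough probe (a sketch, not a statement about any particular release), the helper below asks the standard re module whether it accepts a given pattern. The name `escape_is_accepted` is made up here, and the answer for \X or \p{Lu} depends on the Python version you run it on.

```python
import re
import warnings

def escape_is_accepted(pattern):
    """Return True if the stdlib re module compiles `pattern` silently.

    Hypothetical helper: a warning raised during compilation is treated
    the same as an outright error.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("error")  # turn warnings into exceptions
        try:
            re.compile(pattern)
        except (re.error, Warning):
            return False
    return True

for candidate in (r"\d", r"\X", r"\p{Lu}"):
    print(candidate, escape_is_accepted(candidate))
```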
It's the many other "improvements" to the regex module that sometimes make it incompatible. There's a comprehensive list here: http://pypi.python.org/pypi/regex . Somebody should just go over it and for each difference make a recommendation for whether to treat this as a bugfix, a compatible new feature, or an incompatibility that requires some kind of flag. (We could have a single flag for all incompatibilities, or several flags.)
> If so, it would be like the decimal module, which closely tracks the IEEE decimal standard, rather than the binary float standard.
Well, I would hope that for each "major" Python version (i.e. 3.2, 3.3, 3.4, ...) we would pick a specific version of the Unicode standard and declare our desire to be compliant with that Unicode standard version, and not switch allegiances in some bugfix version (e.g. 3.2.3, 3.3.1, ...).
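For reference, a running interpreter can report which Unicode database version it ships, which is the pairing being discussed here. A small sketch using only the standard library; the function name `unicode_pin` is made up for this example.

```python
import sys
import unicodedata

def unicode_pin():
    """Return (Python major.minor, Unicode database version) as strings."""
    python_series = "%d.%d" % sys.version_info[:2]
    return python_series, unicodedata.unidata_version

series, unicode_version = unicode_pin()
print("Python %s ships Unicode database %s" % (series, unicode_version))
```

Under the policy suggested above, the second value would stay the same across all bugfix releases of a given series.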
> Regex is already much more compliant than re, as shown by Tom Christiansen.
Nobody disagrees with this or thinks it's a bad thing. :-)
> This is pretty obviously intentional on MB's part.
That's also clear.
> It is also probably intentional that re not match today's Unicode TR18 specifications.
That I'm not so sure of. I think it's more the case that TR18 evolved and that the re modules didn't -- probably mostly because nobody had the time and nobody was aware of the TR18 changes.
> These are reasons why both Ezio and I suggested on the tracker adding regex without deleting re. (I personally would not mind just replacing re with regex, but then I have no legacy re code to break. So I am not suggesting that out of respect for those who do.)
That option is definitely still on the table. At the very least a thorough review of the stated differences between re and regex should be done -- I trust that MR has been very thorough in his listing of those differences. The issues regarding maintenance and stability of MR's code can be solved in a number of ways -- if MR doesn't mind I would certainly be willing to give him core committer access (though I'd still recommend that he use his time primarily to train others in maintaining this important code base).
--Guido van Rossum (python.org/~guido)