[Python-Dev] what is happening with the regex module going into Python 3.3?

Nick Coghlan ncoghlan at gmail.com
Sat Jun 2 07:14:02 CEST 2012


On Sat, Jun 2, 2012 at 10:37 AM, Mark Lawrence <breamoreboy at yahoo.co.uk> wrote:
> On 01/06/2012 18:27, Brett Cannon wrote:
>> About the only thing I can think of from the language summit that we discussed doing for Python 3.3 that has not come about is accepting the regex module and getting it into the stdlib. Is this still being worked towards?
>
> Umpteen versions of regex have been available on pypi for years. Umpteen bugs against the original re module have been fixed. If regex can't now go into the standard library, what on earth can?

That's why it's approved in principle already. However, it's not a simple matter of dropping something into the standard library and calling it done, especially an extension module as complex as regex. Even integrating a simple pure Python module like ipaddr took substantial effort:

  1. The API had to be reviewed to see if it was suitable for someone who was not familiar with the problem domain, but was instead learning about it from the standard library documentation. This isn't a big concern for regex, since it is replacing the existing re module, but this is the main reason ipaddr became ipaddress before PEP 3144 was approved (the ipaddr API plays fast and loose with network terminology in a way that someone who already knows that terminology can easily grasp, but that would have been incredibly confusing to someone discovering those terms for the first time).

  2. The code had to actually be added to the standard library (not a big effort for PEP 3144: saving ipaddress.py into Lib/ and test_ipaddress.py into Lib/test/ pretty much covered it).

  3. Redundant 2.x cruft needed to be weeded out (ongoing)

  4. The howto guide needed to be incorporated into the documentation (and rewritten to be more suitable for genuine beginners)

  5. An API module reference still needs to be incorporated into the standard library reference
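As a small illustration of the terminology point in item 1, here is a sketch using the ipaddress module as it shipped in 3.3. The sample addresses are arbitrary documentation-range values chosen for this example; the point is only that addresses, networks, and interfaces are kept as distinct concepts, and that a "network" with host bits set is rejected unless you explicitly ask for lenient parsing.

```python
import ipaddress

# Three distinct concepts, each with its own factory function.
addr = ipaddress.ip_address("192.0.2.1")        # a single host address
net = ipaddress.ip_network("192.0.2.0/24")      # a network (no host bits set)
iface = ipaddress.ip_interface("192.0.2.1/24")  # an address on a network

print(addr in net)              # membership test: address within network
print(iface.ip, iface.network)  # an interface exposes both parts

# By default, a network definition with host bits set is an error.
try:
    ipaddress.ip_network("192.0.2.1/24")
except ValueError as exc:
    print("rejected:", exc)

# Lenient parsing has to be requested explicitly; host bits are masked off.
loose = ipaddress.ip_network("192.0.2.1/24", strict=False)
print(loose)
```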

The effort to integrate regex is going to be substantially higher, since it's a significantly more complicated module:

  1. A new, non-trivial C extension needs to be incorporated into both the autotools and Windows build processes.
  2. Due to PEP 393, there's a major change to the string implementation in 3.3. Does regex still build against that? Even if it builds, it should probably be ported to the new API for performance reasons.
  3. Does regex build cleanly on all platforms supported by CPython? If not, do we need to keep the existing re module around as a fallback mechanism?
  4. How do we merge the test suites? Do we keep the existing test suite, add the regex test suite, then filter for duplication afterwards?
  5. What, precisely, are the backwards incompatibilities between regex and re? Does the standard library trigger any of them? Does the test suite?
  6. How will the PyPI backport be maintained in the future? The amount of backwards compatibility cruft in standard library code should be minimised, but that potentially makes backports more difficult.
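Questions 3 and 5 lend themselves to some mechanical checking. The following is only a rough sketch, not a proposal for how the integration should actually be tested: it assumes the PyPI module is importable as `regex` (and carries on with `re` alone if it is not), and the helper names `compare_engines` and `find_differences` are made up for this example. A PEP author could feed it patterns harvested from the standard library and its test suite to get a first list of behavioural differences.

```python
import re

try:
    import regex  # third-party module from PyPI; may not be installed
except ImportError:
    regex = None  # fall back to the existing re module only

# Whichever engine is available, preferring regex (the fallback idea in item 3).
engine = regex if regex is not None else re


def compare_engines(pattern, text):
    """Run one pattern through re and (if available) regex.

    Returns a pair (re_result, regex_result). Each result is a list of
    match spans, an "error: ..." string if compilation or matching failed,
    or None if that engine is not installed.
    """
    def run(module):
        if module is None:
            return None
        try:
            return [m.span() for m in module.finditer(pattern, text)]
        except module.error as exc:
            return "error: %s" % exc

    return run(re), run(regex)


def find_differences(cases):
    """Return the (pattern, text, re_result, regex_result) cases that differ."""
    diffs = []
    for pattern, text in cases:
        old, new = compare_engines(pattern, text)
        if new is not None and old != new:
            diffs.append((pattern, text, old, new))
    return diffs


if __name__ == "__main__":
    sample = [(r"\d+", "a1b22"), (r"(?i)python", "Python-Dev")]
    for diff in find_differences(sample):
        print(diff)
```

Comparing match spans is a deliberately crude measure; differences in group contents, flags, or error messages would need their own checks.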

ipaddress is in the 3.3 standard library because Peter Moody cared enough about the concept to initially submit it for inclusion, and because I volunteered to drive the review and integration process forward and to be the final arbiter of what counted as "good enough" for inclusion. That hasn't happened yet for regex: either nobody has cared enough to write a PEP for it, or the bystander effect has kicked in and everyone who cares is assuming someone else will take up the burden of being the PEP champion.

So that's the first step: someone needs to take http://bugs.python.org/issue2636 and turn it into a PEP. Searching the python-dev and python-ideas archives for references to previous discussions of the topic would also be good, along with summarising the open Unicode-related re bugs reported by Tom Christiansen where the answer is currently "use regex from PyPI instead of the standard library's re module" [1].

[1] http://bugs.python.org/issue?%40search_text=&ignore=file%3Acontent&title=&%40columns=title&id=&%40columns=id&stage=&creation=&creator=tchrist&activity=&%40columns=activity&%40sort=activity&actor=&nosy=&type=&components=&versions=&dependencies=&assignee=&keywords=&priority=&%40group=priority&status=1&%40columns=status&resolution=&nosy_count=&message_count=&%40pagesize=50&%40startwith=0&%40queryname=&%40old-queryname=&%40action=search

Cheers, Nick.

-- Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


