[Python-Dev] Difference in RE between 3.2 and 3.3 (or Aaron Swartz memorial)

Victor Stinner victor.stinner at gmail.com
Wed Mar 6 19:34:10 CET 2013


Hi,

In short, Unicode was rewritten in Python 3.3 for PEP 393. It's not surprising that minor details like singletons differ. You should not use "is" to compare strings in Python, or your program will fail on other Python implementations (like PyPy, IronPython, or Jython) or even on a different CPython version.
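
A minimal sketch of the difference, not taken from the thread: the helper name same_text and the sample strings are made up for illustration. Comparing by value with "==" is what the language guarantees; the result of "is" on two equal strings depends on implementation details such as interning and singletons.

```python
def same_text(a, b):
    """Compare two strings by value (hypothetical helper for illustration)."""
    return a == b


prefix = "ab"
built = "".join([prefix, "c"])  # string constructed at run time
literal = "abc"                 # string literal with the same content

# Value comparison: equal content compares equal on any implementation.
print(same_text(built, literal))

# Identity comparison: an implementation detail, may be True or False
# depending on the interpreter and version, so do not rely on it.
print(built is literal)
```

Only the first comparison is safe to depend on; the second can change between interpreters or releases, as happened here between 3.2 and 3.3.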

Anyway, you spotted a missed optimization: it's now "fixed" in Python 3.3 and 3.4 by the following commits. Copy/paste of the CIA IRC bot:

19:30 < irker555> cpython: Victor Stinner 3.3 * 82517:3dd2fa78fb89 / Objects/unicodeobject.c: _PyUnicode_Writer() now also reuses Unicode singletons: empty string and latin1 single character http://hg.python.org/cpython/rev/3dd2fa78fb89
19:30 < irker032> cpython: Victor Stinner default * 82518:fa59a85b373f / Objects/unicodeobject.c: (Merge 3.3) _PyUnicode_Writer() now also reuses Unicode singletons: empty string and latin1 single character http://hg.python.org/cpython/rev/fa59a85b373f

Victor

2013/3/6 Amaury Forgeot d'Arc <amauryfa at gmail.com>:

So, in the end, I went the long way and bisected cpython to find the commit which broke my tests, and it seems that the culprit is http://hg.python.org/cpython/rev/123f2dc08b3e so it is clearly something Unicode related.

Unfortunately, it really doesn't tell me what exactly is broken (is it a known regression?) and whether there is a known workaround. Could anybody suggest a way to find bugs on http://bugs.python.org related to a particular commit? (A plain search for 123f2dc0 didn’t find anything.)

I strongly suspect an incorrect usage of the "is" operator: https://github.com/mcepl/html2text/blob/master/html2text.py#L95
Identity of strings is not guaranteed...

Does it change something if you use "==" instead?

--
Amaury Forgeot d'Arc


