[Python-Dev] Difference in RE between 3.2 and 3.3 (or Aaron Swartz memorial)

Xavier Morel catch-all at masklinn.net
Thu Mar 7 11:31:03 CET 2013


On 2013-03-07, at 11:08, Matej Cepl wrote:

> On 2013-03-06, 18:34 GMT, Victor Stinner wrote:
>
>> In short, Unicode was rewritten in Python 3.3 for the PEP 393. It's not surprising that minor details like singleton differ. You should not use "is" to compare strings in Python, or your program will fail on other Python implementations (like PyPy, IronPython, or Jython) or even on a different CPython version.
>
> I am sorry, I don't understand what you are saying. Even though this has been changed to https://github.com/mcepl/html2text/blob/fixtests/html2text.py#L90 the tests still fail.
>
> But, Amaury is right: the function doesn't make much sense. However, ... when I have “fixed it” from https://github.com/mcepl/html2text/blob/master/html2text.py#L95
>
>     def onlywhite(line):
>         """Return true if the line does only consist of whitespace characters."""
>         for c in line:
>             if c is not ' ' and c is not '  ':
>                 return c is ' '
>         return line
>
> to https://github.com/mcepl/html2text/blob/fixtests/html2text.py#L90
>
>     def onlywhite(line):
>         """Return true if the line does only consist of whitespace characters."""
>         for c in line:
>             if c != ' ' and c != '  ':
>                 return c == ' '
>         return line

The second test looks like some kind of corruption: the loop is supposedly iterating over the characters of a line, yet it tests each one against a two-space string, which a single character can never equal. Is it possible that the original was a literal tab embedded in the source code (instead of '\t') and that it got broken at some point?
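To make the guess concrete, here is a small sketch (not the html2text code itself; the names onlywhite_two_spaces and onlywhite_tab are made up for this example, and the tab variant is only my assumption about what the original may have intended):

```python
def onlywhite_two_spaces(line):
    """The check as posted, with a two-space string as the second literal."""
    for c in line:
        # c is a single character, so it can never equal the two-space
        # string: the second comparison is always true and has no effect.
        if c != ' ' and c != '  ':
            return c == ' '  # always False at this point
    return line


def onlywhite_tab(line):
    """Hypothetical variant, assuming the second literal was meant to be a tab."""
    for c in line:
        if c != ' ' and c != '\t':
            return c == ' '  # always False at this point
    return line


print(repr(onlywhite_two_spaces(' \t ')))  # False: the tab is treated as non-white
print(repr(onlywhite_tab(' \t ')))         # ' \t ': the line itself, which is truthy
```

If the tab reading is right, the two versions only disagree on lines that contain tabs.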

According to its name and docstring, the implementation of this function should really be replaced by return line and line.isspace(). The first part handles the case of an empty line: in the current implementation the line is returned directly if no non-whitespace character is found, which is "negative" for an empty line, and ''.isspace() -> False. Does that fix the failing tests?
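Spelled out, the replacement I have in mind would look roughly like this (a sketch only, I have not run it against the html2text test suite):

```python
def onlywhite(line):
    """Return a true value if the line consists only of whitespace characters."""
    # `line and ...` short-circuits on the empty string and returns '' (falsy),
    # matching what the loop-based version returns for an empty line.
    # For any non-empty line the result is the bool from str.isspace().
    return line and line.isspace()


for sample in ['', '   ', ' \t\n', ' a ']:
    print(repr(sample), '->', repr(onlywhite(sample)))
```

Note that str.isspace() also accepts tabs, newlines and other Unicode whitespace, so this is slightly more permissive than a check against ' ' alone. Whether that matters depends on what the callers expect.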


