msg187414
Author: Baptiste Mispelon (bmispelon)
Date: 2013-04-20 10:58
When trying to parse the string `a&b`, the parser raises an UnboundLocalError:

{{{
>>> from html.parser import HTMLParser
>>> p = HTMLParser()
>>> p.feed('a&b')
>>> p.close()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.3/html/parser.py", line 149, in close
    self.goahead(1)
  File "/usr/lib/python3.3/html/parser.py", line 252, in goahead
    if k <= i:
UnboundLocalError: local variable 'k' referenced before assignment
}}}

Granted, the HTML is invalid, but this error looks like it might have been an oversight.
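A minimal regression check for this report might look like the sketch below. It assumes a current Python 3 interpreter in which the fix has landed (not the 3.3 interpreter in the traceback), and the helper name `close_without_error` is hypothetical; it only relies on the public `HTMLParser.feed()` and `HTMLParser.close()` methods.

```python
from html.parser import HTMLParser


def close_without_error(text):
    """Hypothetical helper: True if feeding `text` and closing raises nothing."""
    parser = HTMLParser()
    parser.feed(text)
    try:
        parser.close()  # close() flushes buffered input through goahead()
    except UnboundLocalError:
        # The failure mode reported against Python 3.3's goahead()
        return False
    return True


print(close_without_error("a&b"))
```

On the unpatched 3.3 parser the `close()` call raises as shown above, so the helper would return False for `'a&b'`; on a fixed interpreter it should return True.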
|
|
msg187416
Author: R. David Murray (r.david.murray)
Date: 2013-04-20 11:43
Thanks for the report. Yes, that's in a complicated bit of error-recovery code, and clearly you found a path through it that doesn't have a corresponding test :)
|
|
msg187582
Author: Thomas Barlow (Thomas.Barlow)
Date: 2013-04-22 19:26
I'm adding a patch with a few unit tests that demonstrate the issue; comments are welcome. This is my first patch, and I believe I have put the tests in the correct place.

The problem appears to occur only when there is an incomplete entity reference whose name consists of valid name characters running right up to the end of the input. The test case for "a&b " passes, because the parser detects the space as an illegal character in the entity name.
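The condition described above can be illustrated with a small sketch. The regular expression `NAME_RUNS_TO_EOF` is an assumption: it is modeled on the characters an entity name is assumed to contain (a letter followed by letters, digits, `.` or `-`) and is not copied from `html.parser`. The `DataCollector` subclass and `parse_text` helper are also hypothetical, and the sketch assumes Python 3.5+, where `convert_charrefs` defaults to True.

```python
import re
from html.parser import HTMLParser

# Hypothetical pattern (not html.parser's own): an '&' followed by an
# entity-like name that runs all the way to the end of the input.
NAME_RUNS_TO_EOF = re.compile(r"&[a-zA-Z][-.a-zA-Z0-9]*\Z")


class DataCollector(HTMLParser):
    """Collect every chunk of text passed to handle_data."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default on 3.5+
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_text(text):
    """Feed `text`, close the parser and return the concatenated data."""
    parser = DataCollector()
    parser.feed(text)
    parser.close()  # the call that raised UnboundLocalError for 'a&b' in 3.3
    return "".join(parser.chunks)


for case in ["a&b", "a&b ", "a&", "&b", "a&b.c-d"]:
    at_eof = bool(NAME_RUNS_TO_EOF.search(case))
    print(repr(case), "name runs to EOF:", at_eof, "->", repr(parse_text(case)))
```

Only the inputs where the name reaches the end (`'a&b'`, `'&b'`, `'a&b.c-d'`) match the pattern; `'a&b '` does not, which mirrors why that test case passed before the fix.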
|
|
msg187608
Author: Ezio Melotti (ezio.melotti)
Date: 2013-04-23 05:32
Thanks for the patch, Thomas! Starting from your work I made an updated patch that fixes the bug, but the tests also revealed another possible issue: for invalid character references, HTMLParser still calls handle_entityref instead of reporting them as 'data'. I'm not sure what the preferable behavior is, but in any case that is a separate issue.
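To see which callback receives a given piece of input, a parser subclass can record events. This is a sketch: `EventRecorder` and `record` are hypothetical names. It passes `convert_charrefs=False` (available since Python 3.4) so that references reach `handle_entityref` and `handle_charref` instead of being folded into `handle_data`. The exact events produced for malformed input may differ between Python versions.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Record which callback receives each piece of input."""

    def __init__(self):
        # Keep references unconverted so handle_entityref/handle_charref fire.
        super().__init__(convert_charrefs=False)
        self.events = []

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        self.events.append(("entityref", name))

    def handle_charref(self, name):
        self.events.append(("charref", name))


def record(text):
    """Feed `text`, close the parser and return the recorded events."""
    parser = EventRecorder()
    parser.feed(text)
    parser.close()
    return parser.events


print(record("a&amp;b"))
```

Passing the malformed inputs from this thread, such as `record("a&b")`, shows which handlers the running interpreter invokes for them.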
|
|
msg188222
Author: Roundup Robot (python-dev)
Date: 2013-05-01 13:20
New changeset 9cb90c1a1a46 by Ezio Melotti in branch '3.3':
#17802: Fix an UnboundLocalError in html.parser. Initial tests by Thomas Barlow.
http://hg.python.org/cpython/rev/9cb90c1a1a46

New changeset 20be90a3a714 by Ezio Melotti in branch 'default':
#17802: merge with 3.3.
http://hg.python.org/cpython/rev/20be90a3a714
|
|
msg188224
Author: Ezio Melotti (ezio.melotti)
Date: 2013-05-01 13:25
Fixed, thanks for the report!
|
|