Issue 17802: html.HTMLParser raises UnboundLocalError

Created on 2013-04-20 10:58 by bmispelon, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name, Uploaded, Description
issue17802-unittest.patch, Thomas.Barlow, 2013-04-22 19:26, Patch for unit tests to reproduce issue 17802
issue17802.diff, ezio.melotti, 2013-04-23 05:32
Messages (6)
msg187414 - Author: Baptiste Mispelon (bmispelon) Date: 2013-04-20 10:58
When trying to parse the string `a&b`, the parser raises an UnboundLocalError:

{{{
>>> from html.parser import HTMLParser
>>> p = HTMLParser()
>>> p.feed('a&b')
>>> p.close()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.3/html/parser.py", line 149, in close
    self.goahead(1)
  File "/usr/lib/python3.3/html/parser.py", line 252, in goahead
    if k <= i:
UnboundLocalError: local variable 'k' referenced before assignment
}}}

Granted, the HTML is invalid, but this error looks like it might have been an oversight.
msg187416 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2013-04-20 11:43
Thanks for the report. Yes, that's in a complicated bit of error recovery code, and clearly you found a path through it that doesn't have a corresponding test :)
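For context, the traceback shows `k` being read on a path where it was never assigned. The sketch below reproduces that general pattern in isolation. The function `skip_incomplete` and its parameters are hypothetical names made up for illustration; this is not the code in `html/parser.py`.

```python
def skip_incomplete(rawdata, i, at_end):
    """Hypothetical illustration of the failure pattern; NOT html.parser code."""
    n = len(rawdata)
    if not at_end:
        # k is bound only on this branch.
        k = n
    # When at_end is true, k was never assigned, so reading it here raises
    # UnboundLocalError.
    if k <= i:
        k = n
    return k


def triggers_unbound_local():
    """Return True if the unassigned path raises UnboundLocalError."""
    try:
        skip_incomplete("a&b", 1, at_end=True)
    except UnboundLocalError:
        return True
    return False


if __name__ == "__main__":
    print(triggers_unbound_local())
```

A test that exercises every branch of such a function is what catches this kind of bug, which is presumably why the missing test mattered here.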
msg187582 - Author: Thomas Barlow (Thomas.Barlow) Date: 2013-04-22 19:26
Just adding a patch here with a few unit tests to demonstrate the issue; comments are welcome. This is my first patch, and I believe I have put the tests in the correct place. It appears the problem only occurs when there is an incomplete XML entity in which a sequence of characters valid for an entity name runs to the end-of-file. The test case for "a&b " passes, because the parser detects the space as an illegal character for the entity name.
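The two inputs described above can be compared with a small script. This is a minimal sketch, not the contents of the attached patch: `DataCollector` and `parse_to_end` are hypothetical helper names, and the parser is created with its default settings, which on current Python 3 releases differ from the 3.3 parser discussed in this issue.

```python
from html.parser import HTMLParser


class DataCollector(HTMLParser):
    """Hypothetical helper that records every chunk passed to handle_data."""

    def __init__(self):
        super().__init__()  # default parser settings (an assumption)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_to_end(text):
    """Feed text, close the parser, and return all text data joined together."""
    parser = DataCollector()
    parser.feed(text)
    parser.close()  # the call that raised UnboundLocalError in this report
    return "".join(parser.chunks)


if __name__ == "__main__":
    # The first input ends inside the entity name; the second ends it with a space.
    for sample in ("a&b", "a&b "):
        print(repr(sample), "->", repr(parse_to_end(sample)))
```

On an interpreter that includes the fix, both calls should return without raising.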
msg187608 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2013-04-23 05:32
Thanks for the patch Thomas! Starting from your work I made an updated patch that fixes the bug, but at the same time the tests revealed another possible issue. In case of invalid character references, HTMLParser still calls handle_entityref instead of reporting them as 'data'. I'm not sure what the preferable behavior should be, but in any case this is a separate issue.
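One way to observe which callback fires is to record the handler calls. The sketch below makes several assumptions: `EventRecorder` and `record_events` are hypothetical names, the input `a&b c` is only an example of a reference whose name is not a defined entity, and `convert_charrefs=False` is passed because newer Python 3 releases otherwise fold references into text data (that parameter did not exist in 3.3).

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper that logs which handler receives each piece of input."""

    def __init__(self):
        # Assumption: disable charref conversion so references reach the
        # handle_entityref / handle_charref callbacks separately.
        super().__init__(convert_charrefs=False)
        self.events = []

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        self.events.append(("entityref", name))

    def handle_charref(self, name):
        self.events.append(("charref", name))


def record_events(text):
    """Parse text to completion and return the recorded (kind, value) pairs."""
    parser = EventRecorder()
    parser.feed(text)
    parser.close()
    return parser.events


if __name__ == "__main__":
    # "b" is not a defined entity name, yet it may still be reported
    # through handle_entityref rather than handle_data.
    print(record_events("a&b c"))
```

Whether an unknown name should go to `handle_entityref` or `handle_data` is the open question raised above; the script only makes the current behavior visible.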
msg188222 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-05-01 13:20
New changeset 9cb90c1a1a46 by Ezio Melotti in branch '3.3':
#17802: Fix an UnboundLocalError in html.parser. Initial tests by Thomas Barlow.
http://hg.python.org/cpython/rev/9cb90c1a1a46

New changeset 20be90a3a714 by Ezio Melotti in branch 'default':
#17802: merge with 3.3.
http://hg.python.org/cpython/rev/20be90a3a714
msg188224 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2013-05-01 13:25
Fixed, thanks for the report!
History
Date User Action Args
2022-04-11 14:57:44 admin set github: 62002
2013-05-01 13:25:05 ezio.melotti set status: open -> closed; resolution: fixed; messages: +; stage: patch review -> resolved
2013-05-01 13:20:15 python-dev set nosy: + python-dev; messages: +
2013-04-23 05:33:00 ezio.melotti set files: + issue17802.diff; messages: +; stage: needs patch -> patch review
2013-04-22 19:26:41 Thomas.Barlow set files: + issue17802-unittest.patch; nosy: + Thomas.Barlow; messages: +; keywords: + patch
2013-04-20 11:48:08 ezio.melotti set assignee: ezio.melotti
2013-04-20 11:43:20 r.david.murray set type: crash -> behavior; versions: + Python 3.4; keywords: + easy; nosy: + r.david.murray, ezio.melotti; messages: +; stage: needs patch
2013-04-20 10:58:16 bmispelon create