Issue 8885: markupbase declaration errors aren't recoverable

Created on 2010-06-03 11:14 by mnot, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name         Uploaded                 Description
testcase_8885.py  mnot, 2010-06-11 01:48   test case
Messages (12)
msg106938 - (view) Author: Mark Nottingham (mnot) Date: 2010-06-03 11:14
In markupbase.py's ParserBase.parse_declaration, an unexpected character is caught like this:

    else:
        self.error(
            "unexpected %r char in declaration" % rawdata[j])

However, the position (j) isn't updated, which means that error() will be called again once it returns. For example, this declaration: (which I think is generated by MS Office) will trigger this behaviour.

Two possible resolutions:
1) increment j and try the next character in this case
2) document that error() is not recoverable; i.e., it MUST raise an exception.

My preference is strongly for #1 (as HTML parsing should be forgiving, and HTMLParser is based upon markerbase).
msg106996 - (view) Author: Mark Nottingham (mnot) Date: 2010-06-03 22:39
Just to be clear -- if error() returns, it will cause an infinite loop.
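The behaviour described above can be sketched with a simplified stand-in for the scanning loop. This is not the real markupbase code: the class name `CountingScanner`, the `advance_on_error` flag, the `max_steps` safety cap, and the sample input are all hypothetical and exist only to illustrate the point. With an `error()` that returns, the loop never moves past the bad character unless `j` is incremented.

```python
# Illustrative sketch only; NOT the actual ParserBase.parse_declaration logic.
NAME_CHARS = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")


class CountingScanner:
    """Hypothetical scanner whose error() returns instead of raising."""

    def __init__(self, advance_on_error):
        self.advance_on_error = advance_on_error
        self.errors = 0

    def error(self, message):
        # Mirrors a non-raising override: record the problem and return.
        self.errors += 1

    def scan_declaration(self, rawdata, max_steps=1000):
        """Return the index just past '>', or -1 if no progress is possible."""
        j = 0
        steps = 0
        while j < len(rawdata):
            steps += 1
            if steps > max_steps:
                return -1  # safety cap standing in for "loops forever"
            c = rawdata[j]
            if c == ">":
                return j + 1
            if c in NAME_CHARS or c.isspace():
                j += 1
            else:
                self.error("unexpected %r char in declaration" % c)
                if self.advance_on_error:
                    j += 1  # resolution 1: skip the offending character


        return -1


sample = "DOCTYPE html ? x>"  # made-up input containing an unexpected '?'

stuck = CountingScanner(advance_on_error=False)
stuck_result = stuck.scan_declaration(sample)

fixed = CountingScanner(advance_on_error=True)
fixed_result = fixed.scan_declaration(sample)

print(stuck_result, stuck.errors)
print(fixed_result, fixed.errors)
```

Without the increment, the scanner only stops because of the artificial step cap, and `error()` is called on every remaining iteration. With the increment, the bad character is reported once and scanning reaches the closing `>`.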
msg107109 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-06-04 23:11
Neither markerbase nor markupbase are in the list of 2.6 stdlib modules at http://docs.python.org/modindex.html even with all packages [+] listings expanded to [-]. So I have to guess this is a third party module. If so, please close and report to *its* authors, not here.
msg107114 - (view) Author: Mark Nottingham (mnot) Date: 2010-06-05 00:18
http://svn.python.org/view/python/trunk/Lib/markupbase.py?view=log
msg107457 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-06-10 12:02
"This module is used as a foundation for the HTMLParser and sgmllib modules (indirectly, for htmllib as well). It has no documented public API and should not be used directly." So, #2 is not relevant unless you are talking about a docstring update or comment in ParserBase. Do you have a test case using one of the consumer modules that demonstrates a bug? markupbase has no test suite of its own (which probably should be fixed someday :)
msg107518 - (view) Author: Mark Nottingham (mnot) Date: 2010-06-11 01:45
I'm using it from HTMLParser; try to parse a document with the DTD given when error is something like:

    def error(self, msg):
        self.errors += 1

and it will loop.
msg107519 - (view) Author: Mark Nottingham (mnot) Date: 2010-06-11 01:48
Attaching test case.
msg124525 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-12-23 00:59
I verified the looping behavior of the testcase in both 2.7.1 and, with minor mods, 3.1.3 and 3.2b1, so this is a valid issue.

The HTMLParser docs (2.7, 3.2) do not mention the .error method. The default is

    def error(self, message):
        raise HTMLParseError(message, self.getpos())

If this is *not* intended to be part of the api and overridden, the name should be changed to ._error and .error deprecated. If it is, it should be documented.

I think the self.error call should be followed either by j+=1 so parsing continues with the next char or by a raise statement so it is definitely stopped.
msg158786 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-04-20 00:30
HTMLParser shouldn't raise errors anymore, so the "error" method (and probably the HTMLParseError exception too) should be deprecated along with the non-strict mode on 3.3.
msg158789 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-04-20 00:43
s/non-strict/strict/
msg158836 - (view) Author: Mark Nottingham (mnot) Date: 2012-04-20 15:17
Why remove 2.7? It'd be an easy bug fix if j is incremented.
msg158853 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-04-20 17:22
Because even on 2.7 the parser is now able to handle broken markup, so "error" won't be called anymore.
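As a rough illustration of that tolerant behaviour, the sketch below feeds a malformed doctype to the parser on a current Python 3, where the module is named `html.parser`. The subclass name `DeclCollector` and the sample markup are made up for this example; the declaration from the original report is not reproduced here.

```python
from html.parser import HTMLParser


class DeclCollector(HTMLParser):
    """Hypothetical subclass that records every declaration it is handed."""

    def __init__(self):
        super().__init__()
        self.decls = []

    def handle_decl(self, decl):
        # Called by the parser with the text between '<!' and '>'.
        self.decls.append(decl)


collector = DeclCollector()
# Invented input: a doctype containing a stray '?' character.
collector.feed('<!DOCTYPE html PUBLIC "x" ?><p>text</p>')
collector.close()

print(collector.decls)
```

The parser is expected to hand the declaration text to `handle_decl` and carry on with the rest of the document rather than raising or looping.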
History
Date User Action Args
2022-04-11 14:57:01 admin set github: 53131
2012-06-20 19:27:39 ezio.melotti set status: open -> closed; resolution: out of date; stage: needs patch -> resolved
2012-04-20 17:22:19 ezio.melotti set messages: +
2012-04-20 15:17:51 mnot set messages: +
2012-04-20 00:43:19 ezio.melotti set messages: +
2012-04-20 00:30:15 ezio.melotti set versions: + Python 3.3, - Python 3.1, Python 2.7, Python 3.2; nosy: + ezio.melotti; messages: +; assignee: ezio.melotti; type: behavior -> enhancement
2010-12-23 00:59:27 terry.reedy set nosy: terry.reedy, mnot, eric.araujo, r.david.murray; messages: +; versions: + Python 3.1, Python 3.2
2010-12-22 08:54:56 eric.araujo set nosy: + eric.araujo; title: markerbase declaration errors aren't recoverable -> markupbase declaration errors aren't recoverable; versions: + Python 2.7, - Python 2.6; resolution: not a bug -> (no value); stage: needs patch
2010-06-11 01:48:58 mnot set files: + testcase_8885.py; messages: +
2010-06-11 01:45:45 mnot set messages: +
2010-06-10 12:02:41 r.david.murray set nosy: + r.david.murray; messages: +
2010-06-05 00:18:15 mnot set status: pending -> open; messages: +
2010-06-04 23:11:27 terry.reedy set status: open -> pending; nosy: + terry.reedy; messages: +; resolution: not a bug
2010-06-03 22:39:45 mnot set messages: +
2010-06-03 11:14:09 mnot create