Issue 8885: markupbase declaration errors aren't recoverable

Created on 2010-06-03 11:14 by mnot, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description
testcase_8885.py	mnot,2010-06-11 01:48	test case

Messages (12)
msg106938 - (view)	Author: Mark Nottingham (mnot)	Date: 2010-06-03 11:14
In markupbase.py's ParserBase.parse_declaration, an unexpected character is caught like this:

    else:
        self.error(
            "unexpected %r char in declaration" % rawdata[j])

However, the position (j) isn't updated, which means that error() will be called again once it returns.

For example, this declaration: (which I think is generated by MS Office) will trigger this behaviour.

Two possible resolutions:
1) increment j and try the next character in this case
2) document that error() is not recoverable; i.e., it MUST raise an exception.

My preference is strongly for #1 (as HTML parsing should be forgiving, and HTMLParser is based upon markerbase).
msg106996 - (view)	Author: Mark Nottingham (mnot)	Date: 2010-06-03 22:39
Just to be clear -- if error() returns, it will cause an infinite loop.
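The loop described above can be reproduced with a toy scanner. This is only a sketch of the pattern, not the stdlib code: the class `ToyDeclarationScanner`, its `advance_on_error` flag, and the `max_steps` guard are all hypothetical names introduced for illustration. The guard exists so the demonstration terminates instead of actually spinning forever.

```python
class ToyDeclarationScanner:
    """Hypothetical stand-in for the character loop in parse_declaration."""

    def __init__(self, advance_on_error):
        # advance_on_error=True models resolution 1 (increment j after error()).
        self.advance_on_error = advance_on_error
        self.errors = 0

    def error(self, message):
        # A "recoverable" handler: count the problem and return to the caller.
        self.errors += 1

    def scan(self, rawdata, max_steps=50):
        j = 0
        steps = 0
        while j < len(rawdata):
            steps += 1
            if steps > max_steps:
                # Demo-only guard; the real loop has no such limit.
                return "stuck"
            c = rawdata[j]
            if c == ">":
                return "done"
            if c.isalnum() or c.isspace():
                j += 1
            else:
                self.error("unexpected %r char in declaration" % c)
                if self.advance_on_error:
                    j += 1  # skip the offending character and carry on
        return "incomplete"


if __name__ == "__main__":
    for flag in (False, True):
        scanner = ToyDeclarationScanner(advance_on_error=flag)
        print(flag, scanner.scan("ab!c>"), scanner.errors)
```

With `advance_on_error=False` the index never moves past the bad character, so `error()` is called on every iteration until the guard trips; with `True` it is called once and scanning reaches the closing `>`.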
msg107109 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2010-06-04 23:11
Neither markerbase nor markupbase are in the list of 2.6 stdlib modules at http://docs.python.org/modindex.html even with all packages [+] listings expanded to [-]. So I have to guess this is a third party module. If so, please close and report to its authors, not here.
msg107114 - (view)	Author: Mark Nottingham (mnot)	Date: 2010-06-05 00:18
http://svn.python.org/view/python/trunk/Lib/markupbase.py?view=log
msg107457 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2010-06-10 12:02
"This module is used as a foundation for the HTMLParser and sgmllib modules (indirectly, for htmllib as well). It has no documented public API and should not be used directly." So, #2 is not relevant unless you are talking about a docstring update or comment in ParserBase. Do you have a test case using one of the consumer modules that demonstrates a bug? markupbase has no test suite of its own (which probably should be fixed someday :)
msg107518 - (view)	Author: Mark Nottingham (mnot)	Date: 2010-06-11 01:45
I'm using it from HTMLParser; try to parse a document with the DTD given when error is something like:

    def error(self, msg):
        self.errors += 1

and it will loop.
msg107519 - (view)	Author: Mark Nottingham (mnot)	Date: 2010-06-11 01:48
Attaching test case.
msg124525 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2010-12-23 00:59
I verified the looping behavior of the testcase in both 2.7.1 and, with minor mods, 3.1.3 and 3.2b1, so this is a valid issue. The HTMLParser docs (2.7, 3.2) do not mention the .error method. The default is

    def error(self, message):
        raise HTMLParseError(message, self.getpos())

If this is not intended to be part of the API and overridden, the name should be changed to ._error and .error deprecated. If it is, it should be documented. I think the self.error call should be followed either by j += 1 so parsing continues with the next char or by a raise statement so it is definitely stopped.
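One way the rename-and-deprecate suggestion could look is sketched below. Everything here is an assumption made for illustration: `SketchParser` and `SketchParseError` are hypothetical names, not stdlib classes, and the thread does not say this approach was ever adopted. The private `_error` hook always raises, so a caller can never resume at the same position, while the public `error` name survives only as a deprecated alias.

```python
import warnings


class SketchParseError(Exception):
    """Hypothetical exception standing in for HTMLParseError."""


class SketchParser:
    """Hypothetical parser showing only the error-reporting hooks."""

    def _error(self, message):
        # Private hook: always raises, so parsing is definitely stopped.
        raise SketchParseError(message)

    def error(self, message):
        # Public name kept as a deprecated alias that forwards to _error().
        warnings.warn(
            "error() is deprecated; this sketch uses _error() internally",
            DeprecationWarning,
            stacklevel=2,
        )
        self._error(message)


if __name__ == "__main__":
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            SketchParser().error("unexpected '!' char in declaration")
        except SketchParseError as exc:
            print("raised:", exc)
        print("warnings recorded:", len(caught))
```

A subclass that still overrides `error()` to return quietly would no longer affect the internal call path in this sketch, which is the point of moving the real behaviour behind the private name.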
msg158786 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2012-04-20 00:30
HTMLParser shouldn't raise errors anymore, so the "error" method (and probably the HTMLParseError exception too) should be deprecated along with the non-strict mode on 3.3.
msg158789 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2012-04-20 00:43
s/non-strict/strict/
msg158836 - (view)	Author: Mark Nottingham (mnot)	Date: 2012-04-20 15:17
Why remove 2.7? It'd be an easy bug fix if j is incremented.
msg158853 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2012-04-20 17:22
Because even on 2.7 the parser is now able to handle broken markup, so "error" won't be called anymore.
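A quick way to check the lenient behaviour on a current Python 3 is to feed a declaration containing a stray character to `html.parser.HTMLParser` and confirm that parsing returns. This is a minimal sketch: the subclass `DeclCollector` and the sample markup are made up for illustration and are not the attached test case.

```python
from html.parser import HTMLParser


class DeclCollector(HTMLParser):
    """Hypothetical subclass that records every declaration it is handed."""

    def __init__(self):
        super().__init__()
        self.decls = []

    def handle_decl(self, decl):
        self.decls.append(decl)


if __name__ == "__main__":
    parser = DeclCollector()
    # The '?' is an arbitrary stray character inside the declaration.
    parser.feed("<!DOCTYPE html ?><p>hi</p>")
    parser.close()
    print(parser.decls)
```

If `feed()` returns and a declaration was recorded, the parser got past the odd character without involving any error hook.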

History
Date	User	Action	Args
2022-04-11 14:57:01	admin	set	github: 53131
2012-06-20 19:27:39	ezio.melotti	set	status: open -> closed; resolution: out of date; stage: needs patch -> resolved
2012-04-20 17:22:19	ezio.melotti	set	messages: +
2012-04-20 15:17:51	mnot	set	messages: +
2012-04-20 00:43:19	ezio.melotti	set	messages: +
2012-04-20 00:30:15	ezio.melotti	set	versions: + Python 3.3, - Python 3.1, Python 2.7, Python 3.2; nosy: + ezio.melotti; messages: +; assignee: ezio.melotti; type: behavior -> enhancement
2010-12-23 00:59:27	terry.reedy	set	nosy: terry.reedy, mnot, eric.araujo, r.david.murray; messages: +; versions: + Python 3.1, Python 3.2
2010-12-22 08:54:56	eric.araujo	set	nosy: + eric.araujo; title: markerbase declaration errors aren't recoverable -> markupbase declaration errors aren't recoverable; versions: + Python 2.7, - Python 2.6; resolution: not a bug -> (no value); stage: needs patch
2010-06-11 01:48:58	mnot	set	files: + testcase_8885.py; messages: +
2010-06-11 01:45:45	mnot	set	messages: +
2010-06-10 12:02:41	r.david.murray	set	nosy: + r.david.murray; messages: +
2010-06-05 00:18:15	mnot	set	status: pending -> open; messages: +
2010-06-04 23:11:27	terry.reedy	set	status: open -> pending; nosy: + terry.reedy; messages: +; resolution: not a bug
2010-06-03 22:39:45	mnot	set	messages: +
2010-06-03 11:14:09	mnot	create