Issue 12008: HtmlParser non-strict goes wrong with unquoted attributes

Created on 2011-05-05 10:47 by svilend, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description
html.parser.diff	svilend, 2011-05-05 10:47	patch to limit nonstrict regexp's span
test-htmlparser-attrs.py	svilend, 2011-05-05 10:48	standalone test

Messages (5)
msg135182	Author: svilen dobrev (svilend)	Date: 2011-05-05 10:47
Non-strict mode seems to eat too much into the data and gets past the endpos of the chunk being processed, so the parser gets confused and treats everything that follows as data. I didn't work out how to fix the regexp as such; instead I limited its span to :endpos so it does not eat too much. This seems to happen with unquoted attributes.
msg135183	Author: svilen dobrev (svilend)	Date: 2011-05-05 10:51
(The nonstrict regexp came with Issue1046092.)
msg143472	Author: Piet van Oostrum (pietvo)	Date: 2011-09-03 19:23
I was bitten by this bug today. I hope it will be solved in the next release of Python 3. It is also possible to use the third argument of search in line 285: m = attrfind_tolerant.search(rawdata, k, endpos). This seems to me to be a more 'natural' solution.
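
To illustrate the two workarounds discussed so far (slicing to :endpos and passing endpos to search), here is a minimal sketch. The pattern greedy_attr is a deliberately over-greedy stand-in invented for this example; it is not the real attrfind_tolerant from html.parser, and the sample rawdata, k, and endpos values are assumptions chosen to make the overrun visible.

```python
import re

# Hypothetical stand-in for a tolerant attribute regexp (NOT the real
# attrfind_tolerant): the value part \S* does not stop at '>', so it can
# run past the end of the start tag.
greedy_attr = re.compile(r'\s*(\w+)=(\S*)')

rawdata = '<a href=foo>bar</a><b id=x>'
k = 2                          # position just after '<a'
endpos = rawdata.index('>')    # end of the first tag's attribute area

# Unbounded search: the value swallows text that belongs to later markup.
unbounded = greedy_attr.search(rawdata, k)

# Bounded search: the third argument makes the pattern behave as if the
# string ended at endpos, so the match cannot run past the tag.
bounded = greedy_attr.search(rawdata, k, endpos)

# Slicing the string to [:endpos] has the same effect here, at the cost
# of copying the data.
sliced = greedy_attr.search(rawdata[:endpos], k)

print(repr(unbounded.group(2)), unbounded.end() > endpos)
print(repr(bounded.group(2)), bounded.end() == endpos)
print(repr(sliced.group(2)))
```

Passing endpos avoids building a temporary substring, which is presumably why it reads as the more natural option; either way the underlying pattern remains too greedy and is only being fenced in.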
msg146772	Author: Roundup Robot (python-dev)	Date: 2011-11-01 12:44
New changeset 6107a84e3c44 by Ezio Melotti in branch '3.2':
#12008: add a test.
http://hg.python.org/cpython/rev/6107a84e3c44

New changeset 495b31a8b280 by Ezio Melotti in branch 'default':
#12008: merge with 3.2.
http://hg.python.org/cpython/rev/495b31a8b280
msg146773	Author: Ezio Melotti (ezio.melotti) *	Date: 2011-11-01 12:46
This seems to be already fixed in 3.2/3.3, so I extracted the test from your script and added it to the test suite. If you can find a way to break the parser, let me know.
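
For readers who want to check the behaviour themselves, the following is a rough sketch of such a test. It is not the test committed in the changesets above, nor the contents of test-htmlparser-attrs.py; the EventCollector class, the collect helper, and the sample markup are made up for illustration. It assumes a current Python 3, where HTMLParser is always tolerant and no longer accepts a strict argument.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records every parser callback in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(('starttag', tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(('endtag', tag))

    def handle_data(self, data):
        self.events.append(('data', data))


def collect(html):
    """Feed html to a fresh parser and return the recorded events."""
    parser = EventCollector()
    parser.feed(html)
    parser.close()
    return parser.events


# Unquoted attribute values followed by more markup: if the attribute
# regexp overran the tag, the second <a> would be reported as data.
events = collect('<a href=one.html class=x>first</a>'
                 '<a href=two.html>second</a>')

for event in events:
    print(event)
```

If the bug were present, the events after the first start tag would collapse into a single data event instead of separate start tag, data, and end tag entries.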

History
Date	User	Action	Args
2022-04-11 14:57:16	admin	set	github: 56217
2011-11-01 12:46:40	ezio.melotti	set	status: open -> closed; assignee: ezio.melotti; nosy: + ezio.melotti; messages: +; resolution: out of date; stage: resolved
2011-11-01 12:44:12	python-dev	set	nosy: + python-dev; messages: +
2011-09-03 19:23:14	pietvo	set	nosy: + pietvo; messages: +
2011-05-06 17:04:48	eric.araujo	set	nosy: + eric.araujo, r.david.murray; versions: + Python 3.3
2011-05-05 10:51:48	svilend	set	messages: +
2011-05-05 10:48:12	svilend	set	files: + test-htmlparser-attrs.py; type: behavior; components: + Library (Lib); versions: + Python 3.2
2011-05-05 10:47:27	svilend	create