Issue 7114: HTMLParser doesn't handle CDATA sections
Created on 2009-10-12 21:32 by ggbaker, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (4) | ||
---|---|---|
msg93905 | Author: Greg Baker (ggbaker) | Date: 2009-10-12 21:32 |
I believe what I'm seeing here is somewhat related to issue 670664, but is easier to handle because of the CDATA structure. Basically, HTMLParser doesn't recognize CDATA sections at all, so their content is incorrectly parsed like normal data. The following is an attempt to parse (a snippet of) valid XHTML, but it raises an HTMLParseError: `data = """"""`, `from HTMLParser import HTMLParser`, `parser = HTMLParser()`, `parser.feed(data)` | ||
msg96164 | Author: Denis (Denis) | Date: 2009-12-09 01:16 |
CDATA sections are part of the XML specification (http://www.w3.org/TR/REC-xml/#sec-cdata-sect). HTML is not XML, so HTMLParser does the right thing here. | ||
msg99604 | Author: Florent Xicluna (flox) | Date: 2010-02-19 23:50 |
There's no bug here, afaict. | ||
msg100852 | Author: Fredrik Lundh (effbot) | Date: 2010-03-11 13:56 |
And to clarify, XHTML is a reformulation of HTML4 using XML syntax, so you should use an XML parser to parse it, not an HTML parser. The formats are related, but not identical. |
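
As an illustration of the advice above, here is a minimal sketch of reading a CDATA section from XHTML with an XML parser. It assumes Python 3 and the standard-library `xml.etree.ElementTree` module; the XHTML snippet is hypothetical and is not the data from the original report.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML snippet (an assumption for illustration only).
# The script body contains '<' and '&&', which are only legal as
# literal text here because they sit inside a CDATA section.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <script type="text/javascript"><![CDATA[
      if (a < b && b > c) { run(); }
    ]]></script>
  </head>
  <body><p>Hello</p></body>
</html>"""

# XHTML elements live in this namespace, so lookups must be qualified.
NS = {"x": "http://www.w3.org/1999/xhtml"}

root = ET.fromstring(XHTML)

# An XML parser hands the CDATA content back as ordinary element text.
script = root.find("x:head/x:script", NS)
print(script.text.strip())
```

The namespace mapping is needed because the parser qualifies every element name with the XHTML namespace URI. Without it, `find("head/script")` would return `None`.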
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:53 | admin | set | github: 51363 |
2010-03-11 13:56:58 | effbot | set | messages: + |
2010-02-26 11:23:41 | flox | set | status: pending -> closed |
2010-02-20 00:57:36 | flox | set | status: open -> pending |
2010-02-20 00:57:13 | flox | set | status: pending -> open; assignee: effbot -> |
2010-02-19 23:50:06 | flox | set | status: open -> pending; priority: normal; components: + XML, - Library (Lib); assignee: effbot; nosy: + effbot, flox; messages: +; resolution: not a bug; stage: resolved |
2009-12-09 01:16:11 | Denis | set | nosy: + Denismessages: + |
2009-10-12 21:32:50 | ggbaker | create |