[Python-Dev] cpython: #15114: the strict mode of HTMLParser and the HTMLParseError exception are
Ezio Melotti ezio.melotti at gmail.com
Sat Jun 23 17:20:34 CEST 2012
- Previous message: [Python-Dev] cpython: #15114: the strict mode of HTMLParser and the HTMLParseError exception are
- Next message: [Python-Dev] Empty directory is a namespace?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Sat, Jun 23, 2012 at 3:29 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Sat, 23 Jun 2012 15:28:00 +0200 ezio.melotti <python-checkins at python.org> wrote:
>> + .. deprecated-removed:: 3.3 3.5
>> +    The strict argument and the strict mode have been deprecated.
>> +    The parser is now able to accept and parse invalid markup too.
>
> What if people want to accept only valid markup?
The problem with the "strict" mode is that it is not really strict. Originally the parser tried to work around some common errors (e.g. missing quotes around attribute values), but gave up when it encountered other markup errors. When the non-strict mode was introduced, the old behavior was named "strict" and left unchanged for backward compatibility, even though it wasn't strict enough to be used for validation and it happily parsed some kinds of broken markup (but not others). At the same time, the non-strict mode accepted some markup errors but not others, and parsing valid markup sometimes yielded different results in strict and non-strict modes.
Then HTML5 was announced, with specific algorithms to parse both valid and invalid markup, so I improved the non-strict mode to 1) be able to parse everything; 2) be as close to the HTML5 standard as possible (I don't claim HTML5 conformance though). Parsing a valid HTML page should now give the same result in strict and non-strict mode, so the strict mode is only useful if you want an HTMLParseError for an arbitrary subset of markup errors.
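To illustrate the tolerant behavior described above, here is a minimal sketch. It assumes Python 3.5 or later, where the strict argument no longer exists and the parser always runs in the tolerant mode. The EventCollector class and the parse() helper are hypothetical names made up for this example; they are not part of html.parser.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records every parser callback as a tuple."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse(markup):
    """Feed markup to a fresh parser and return the recorded events."""
    parser = EventCollector()
    parser.feed(markup)
    parser.close()  # flush any text the parser is still buffering
    return parser.events


# Missing quotes around an attribute value: the value is still reported.
unquoted = parse("<p class=x>hello</p>")

# Elements that are never closed: no exception is raised, and the parser
# simply never reports end tags for them.
unclosed = parse("<p>hello<b>world")

print(unquoted)
print(unclosed)
```

Note that the parser only reports what it sees; it does not insert the missing end tags. Anyone who wants to accept only valid markup would need a separate validator, since neither mode was designed for that.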
As someone already suggested, I should write a blog post explaining all this, but I'm still working on ironing out the last things in the code, so the blog post has yet to reach the top of my todo list.
Best Regards, Ezio Melotti
> Regards
> Antoine.