[Python-Dev] cpython (2.7): #14538: HTMLParser can now parse correctly start tags that contain a bare /.

Brian Curtin brian at python.org
Tue Apr 24 21:41:54 CEST 2012

On Tue, Apr 24, 2012 at 14:34, Éric Araujo <merwok at netwok.org> wrote:
> On 24/04/2012 15:02, Georg Brandl wrote:
>> On 24.04.2012 20:34, Benjamin Peterson wrote:
>>> 2012/4/24 Georg Brandl <g.brandl at gmx.net>:
>>>> I think that's misleading: there's no way to "correctly" parse malformed HTML.
>>>
>>> There is in the sense that you can follow the HTML5 algorithm, which can "parse" any junk you throw at it.
>>
>> Ah, good. Then I hope we are following the algorithm here (and are slowly coming to use it for htmllib in general).
>
> Yes, Ezio’s commits on html.parser/HTMLParser in the last months have been following the HTML5 spec. Ezio, RDM and I have had some discussion about that on some bug reports, IRC and private mail and reached the agreement to do the useful thing, that is follow HTML5 and not pretend that the stdlib parser is strict or validating. Ezio was thinking about a blog.python.org post to advertise this.

Please do this, and I welcome anyone else who wants to write about their work on the blog to do so. Contact me for info.
