cpython: 1850f45f6169

--- a/Doc/library/html.parser.rst
+++ b/Doc/library/html.parser.rst
@@ -19,14 +19,15 @@ parsing text files formatted in HTML (Hy
 .. class:: HTMLParser(strict=True)
    Create a parser instance. If strict is True (the default), invalid

+
+Example HTML Parser Application
+-------------------------------
+
+As a basic example, below is a simple HTML parser that uses the
+:class:`HTMLParser` class to print out start tags, end tags, and data
+as they are encountered::
+
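A minimal sketch of what such a parser could look like, assuming a current Python 3 where ``HTMLParser()`` is created without a ``strict`` argument. The class name ``TagPrinter``, the ``events`` list, and the sample markup are illustrative and not taken from the commit:

```python
from html.parser import HTMLParser


class TagPrinter(HTMLParser):
    """Print start tags, end tags, and data as they are encountered."""

    def __init__(self):
        super().__init__()
        self.events = []  # hypothetical helper: keeps a record of each callback

    def _emit(self, kind, value):
        self.events.append((kind, value))
        print(kind, value)

    def handle_starttag(self, tag, attrs):
        self._emit("start", tag)

    def handle_endtag(self, tag):
        self._emit("end", tag)

    def handle_data(self, data):
        self._emit("data", data)


parser = TagPrinter()
parser.feed("<html><head><title>Test</title></head>"
            "<body><h1>Parse me!</h1></body></html>")
parser.close()
```

Each handler is an ordinary method override; the base class decides when to call it.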

+

+
+The output will then be::
+

+
+:class:`.HTMLParser` Methods
+----------------------------
 :class:`HTMLParser` instances have the following methods:
-.. method:: HTMLParser.reset()
-

-
 .. method:: HTMLParser.feed(data)
    Feed some text to the parser. It is processed insofar as it consists of
    complete elements; incomplete data is buffered until more data is fed or

 .. method:: HTMLParser.close()
@@ -68,6 +105,12 @@ An exception is defined as well:
    the :class:`HTMLParser` base class method :meth:`close`.
+.. method:: HTMLParser.reset()
+

+
 .. method:: HTMLParser.getpos()
    Return current line number and offset.
@@ -81,23 +124,35 @@ An exception is defined as well:
    attributes can be preserved, etc.).
+The following methods are called when data or markup elements are encountered
+and they are meant to be overridden in a subclass. The base class
+implementations do nothing (except for :meth:`~HTMLParser.handle_startendtag`):
+
+
 .. method:: HTMLParser.handle_starttag(tag, attrs)

+.. method:: HTMLParser.handle_endtag(tag)
+

+
 .. method:: HTMLParser.handle_startendtag(tag, attrs)
    Similar to :meth:`handle_starttag`, but called when the parser encounters an
@@ -106,57 +161,46 @@ An exception is defined as well:
    implementation simply calls :meth:`handle_starttag` and :meth:`handle_endtag`.
-.. method:: HTMLParser.handle_endtag(tag)
-

-
 .. method:: HTMLParser.handle_data(data)

-
-.. method:: HTMLParser.handle_charref(name)
-

.. method:: HTMLParser.handle_entityref(name)

+
+.. method:: HTMLParser.handle_charref(name)
+

.. method:: HTMLParser.handle_comment(data)

.. method:: HTMLParser.handle_decl(decl)

-.. method:: HTMLParser.unknown_decl(data)
-

 .. method:: HTMLParser.handle_pi(data)
@@ -174,29 +218,123 @@ An exception is defined as well:
    cause the '?' to be included in data.
-.. _htmlparser-example:
+.. method:: HTMLParser.unknown_decl(data)
+

-Example HTML Parser Application
--------------------------------
+
+.. _htmlparser-examples:
-As a basic example, below is a simple HTML parser that uses the
-:class:`HTMLParser` class to print out start tags, end tags, and data
-as they are encountered::
+Examples
+--------
+
+The following class implements a parser that will be used to illustrate more
+examples::
    from html.parser import HTMLParser

+

+Parsing a doctype::
+
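A minimal sketch of the doctype case, assuming a current Python 3. The class name ``DeclCollector`` and the sample input are illustrative and not part of the commit:

```python
from html.parser import HTMLParser


class DeclCollector(HTMLParser):
    """Collect the text passed to handle_decl()."""

    def __init__(self):
        super().__init__()
        self.decls = []

    def handle_decl(self, decl):
        # decl is the declaration without the leading "<!" and trailing ">"
        self.decls.append(decl)


parser = DeclCollector()
parser.feed("<!DOCTYPE html>")
parser.close()
print(parser.decls)
```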

+Parsing an element with a few attributes and a title::
+

+The content of script and style elements is returned as is, without
+further parsing::
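A minimal sketch of this behaviour, assuming a current Python 3. The class name ``ScriptCollector`` and the sample input are illustrative; the point is that the ``<strong>`` markup inside the script never reaches ``handle_starttag``:

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Record start tags and raw data separately."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.chunks.append(data)


parser = ScriptCollector()
parser.feed('<script>alert("<strong>hello!</strong>");</script>')
parser.close()

script_text = "".join(parser.chunks)
print(parser.tags, script_text)
```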

+Parsing comments::
+
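A minimal sketch of comment handling, assuming a current Python 3. The class name ``CommentCollector`` and the sample input are illustrative:

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the text between "<!--" and "-->"."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


parser = CommentCollector()
parser.feed("<!-- a comment --><p>text</p>")
parser.close()
print(parser.comments)
```

Surrounding whitespace inside the comment is passed through unchanged.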

+Parsing named and numeric character references and converting them to the
+correct char (note: these 3 references are all equivalent to '>')::
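A minimal sketch of the conversion, with two assumptions beyond the text above: it targets a current Python 3, where ``convert_charrefs=False`` must be passed for ``handle_entityref`` and ``handle_charref`` to be called at all, and the class name ``CharRefDecoder`` is illustrative:

```python
from html.entities import name2codepoint
from html.parser import HTMLParser


class CharRefDecoder(HTMLParser):
    """Turn named and numeric character references into characters."""

    def __init__(self):
        # Assumption: recent Python versions convert references themselves
        # unless convert_charrefs is switched off.
        super().__init__(convert_charrefs=False)
        self.chars = []

    def handle_entityref(self, name):
        # Named reference such as "gt" for "&gt;"
        self.chars.append(chr(name2codepoint[name]))

    def handle_charref(self, name):
        # Numeric reference: "62" for "&#62;" or "x3E" for "&#x3E;"
        if name.startswith(("x", "X")):
            code = int(name[1:], 16)
        else:
            code = int(name)
        self.chars.append(chr(code))


parser = CharRefDecoder()
parser.feed("&gt;&#62;&#x3E;")
parser.close()
print(parser.chars)
```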

+Feeding incomplete chunks to :meth:`~HTMLParser.feed` works, but
+:meth:`~HTMLParser.handle_data` might be called more than once::
+
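A minimal sketch of chunked feeding, assuming a current Python 3. The class name ``ChunkRecorder`` and the chunk boundaries are illustrative. Tags split across chunks are buffered until complete, while text may arrive in several pieces:

```python
from html.parser import HTMLParser


class ChunkRecorder(HTMLParser):
    """Record tags and every separate handle_data() call."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.data_calls = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(("start", tag))

    def handle_endtag(self, tag):
        self.tags.append(("end", tag))

    def handle_data(self, data):
        self.data_calls.append(data)


parser = ChunkRecorder()
for chunk in ["<sp", "an>buff", "ered ", "text</s", "pan>"]:
    parser.feed(chunk)
parser.close()

full_text = "".join(parser.data_calls)
print(parser.tags, parser.data_calls)
```

Joining the pieces, as ``full_text`` does here, is one way for a subclass to recover the complete text regardless of how the input was split.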

+Parsing invalid HTML (e.g. unquoted attributes) also works::
+
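A minimal sketch of the unquoted-attribute case, assuming a current Python 3. The class name ``AttrRecorder`` and the sample input are illustrative:

```python
from html.parser import HTMLParser


class AttrRecorder(HTMLParser):
    """Record each start tag together with its attribute list."""

    def __init__(self):
        super().__init__()
        self.starttags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        self.starttags.append((tag, attrs))


parser = AttrRecorder()
parser.feed("<a class=link href=#main>tag soup</a>")
parser.close()
print(parser.starttags)
```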

.. rubric:: Footnotes