cpython: 7052eb923fb8

--- a/Doc/library/htmlparser.rst
+++ b/Doc/library/htmlparser.rst
@@ -22,7 +22,7 @@
 --------------
-This module defines a class :class:`HTMLParser` which serves as the basis for
+This module defines a class :class:`.HTMLParser` which serves as the basis for
 parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML.
 Unlike the parser in :mod:`htmllib`, this parser is not based on the SGML parser
 in :mod:`sgmllib`.
@@ -30,11 +30,12 @@ in :mod:`sgmllib`.
 .. class:: HTMLParser()

 An exception is defined as well:
-
 .. exception:: HTMLParseError

-:class:`HTMLParser` instances have the following methods:
+
+Example HTML Parser Application
+-------------------------------
+
+As a basic example, below is a simple HTML parser that uses the
+:class:`.HTMLParser` class to print out start tags, end tags and data
+as they are encountered::
+

+

+
+The output will then be::
+

-.. method:: HTMLParser.reset()
+:class:`.HTMLParser` Methods
+----------------------------

 .. method:: HTMLParser.feed(data)
@@ -73,7 +111,13 @@ An exception is defined as well:
    Force processing of all buffered data as if it were followed by an end-of-file
    mark.  This method may be redefined by a derived class to define additional
    processing at the end of the input, but the redefined version should always call

+
+.. method:: HTMLParser.reset()
+

 .. method:: HTMLParser.getpos()
@@ -89,22 +133,34 @@ An exception is defined as well:
    attributes can be preserved, etc.).
+The following methods are called when data or markup elements are encountered
+and they are meant to be overridden in a subclass.  The base class
+implementations do nothing (except for :meth:`~HTMLParser.handle_startendtag`):
+
+
 .. method:: HTMLParser.handle_starttag(tag, attrs)

+
+
+.. method:: HTMLParser.handle_endtag(tag)
+

 .. method:: HTMLParser.handle_startendtag(tag, attrs)
@@ -115,94 +171,175 @@ An exception is defined as well:
    implementation simply calls :meth:`handle_starttag` and :meth:`handle_endtag`.
-.. method:: HTMLParser.handle_endtag(tag)
+.. method:: HTMLParser.handle_data(data)

-.. method:: HTMLParser.handle_data(data)
+.. method:: HTMLParser.handle_entityref(name)

.. method:: HTMLParser.handle_charref(name)

-
-.. method:: HTMLParser.handle_entityref(name)
-

.. method:: HTMLParser.handle_comment(data)

.. method:: HTMLParser.handle_decl(decl)

-.. method:: HTMLParser.unknown_decl(data)
-

.. method:: HTMLParser.handle_pi(data)

-.. _htmlparser-example:
+.. method:: HTMLParser.unknown_decl(data)
+

-Example HTML Parser Application
--------------------------------
+
+.. _htmlparser-examples:
-As a basic example, below is a simple HTML parser that uses the
-:class:`HTMLParser` class to print out start tags, end tags and data
-as they are encountered::
+Examples
+--------
+
+The following class implements a parser that will be used to illustrate more
+examples::
    from HTMLParser import HTMLParser

-

parser = MyHTMLParser()
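The body of the example class is not shown in this view. As a rough illustration of the handler-overriding pattern the changed text describes, here is a minimal sketch. The class name `EventCollector` and its `events` list are hypothetical and are not the changeset's `MyHTMLParser`. The fallback import is an assumption added so that the sketch also runs on Python 3.

```python
try:
    from HTMLParser import HTMLParser       # Python 2 module, as documented here
except ImportError:
    from html.parser import HTMLParser      # Python 3 name (portability assumption)


class EventCollector(HTMLParser):
    """Hypothetical stand-in: records handler calls instead of printing them."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


collector = EventCollector()
collector.feed('<h1 class="title">Python</h1>')
collector.close()
for event in collector.events:
    print(event)
```

Recording the events in a list rather than printing them makes the handler order easy to inspect or compare in a test.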

+
+Parsing a doctype::
+

+Parsing an element with a few attributes and a title::
+

+The content of script and style elements is returned as is, without
+further parsing::
+

+Parsing comments::
+

+Parsing named and numeric character references and converting them to the
+correct char (note: these 3 references are all equivalent to '>')::
+
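The example code for this step is elided here. One possible way to convert the references is sketched below, assuming a hypothetical `CharRefDecoder` class. It uses `name2codepoint` from `htmlentitydefs` (`html.entities` on Python 3, an assumption for portability). Recent Python 3 versions may hand already-decoded text to `handle_data` instead of calling the reference handlers, so the sketch collects from all three.

```python
try:                                        # Python 2 names, as in this document
    from HTMLParser import HTMLParser
    from htmlentitydefs import name2codepoint
    to_char = unichr
except ImportError:                         # Python 3 equivalents (assumption)
    from html.parser import HTMLParser
    from html.entities import name2codepoint
    to_char = chr


class CharRefDecoder(HTMLParser):
    """Hypothetical helper: turns character references into characters."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.chars = []

    def handle_entityref(self, name):
        # Named reference such as &gt; (raises KeyError for unknown names)
        self.chars.append(to_char(name2codepoint[name]))

    def handle_charref(self, name):
        # Numeric reference: hexadecimal (&#x3E;) or decimal (&#62;)
        if name[0] in "xX":
            self.chars.append(to_char(int(name[1:], 16)))
        else:
            self.chars.append(to_char(int(name)))

    def handle_data(self, data):
        # Plain text, or text the parser has already decoded
        self.chars.append(data)


decoder = CharRefDecoder()
decoder.feed("&gt;&#62;&#x3E;")
decoder.close()
text = "".join(decoder.chars)
print(text)
```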

+Feeding incomplete chunks to :meth:`~HTMLParser.feed` works, but
+:meth:`~HTMLParser.handle_data` might be called more than once::
+
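Because the text may arrive in several `handle_data` calls, a subclass that needs the whole string can accumulate the pieces and join them at the end. The sketch below assumes a hypothetical `DataCollector` class; the chunk boundaries chosen are arbitrary, and the fallback import is a portability assumption.

```python
try:
    from HTMLParser import HTMLParser       # Python 2 module, as documented here
except ImportError:
    from html.parser import HTMLParser      # Python 3 name (portability assumption)


class DataCollector(HTMLParser):
    """Hypothetical helper: accumulates text delivered in several calls."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


collector = DataCollector()
for piece in ["<h1>buff", "ered ", "text</h1>"]:
    collector.feed(piece)                   # feed the document in fragments
collector.close()                           # flush anything still buffered

whole = "".join(collector.chunks)           # reassemble regardless of call count
print(whole)
```

The number of `handle_data` calls is left unasserted on purpose, since it depends on how the input happens to be split.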

+Parsing invalid HTML (e.g. unquoted attributes) also works::
+
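As a small illustration of the unquoted-attribute case, the following sketch records the attributes passed to `handle_starttag`. The class name `AttrCollector` and the sample markup are assumptions made for this sketch, and the fallback import is only there for Python 3 portability.

```python
try:
    from HTMLParser import HTMLParser       # Python 2 module, as documented here
except ImportError:
    from html.parser import HTMLParser      # Python 3 name (portability assumption)


class AttrCollector(HTMLParser):
    """Hypothetical helper: records the attributes of every start tag."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        self.attrs.extend(attrs)


collector = AttrCollector()
collector.feed("<a class=link href=#main>tag soup</a>")   # attribute values unquoted
collector.close()
print(collector.attrs)
```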