Client-Side Web Programming

Libraries

µTidylib and mxTidy -- Python interfaces to the HTML Tidy library, which cleans up HTML documents.
html5lib -- an HTML5-compliant library for parsing arbitrarily broken HTML into a range of tree formats, including minidom, ElementTree (including lxml), and BeautifulSoup.
BeautifulSoup -- a permissive HTML parser.
Don't use HTMLParser (Python 2.x) or html.parser (Python 3.x) on HTML that might be invalid! That way lies pain. Either clean it up (using tidy), or use a different parser.
urllib, urllib2, and httplib in the standard library (Python 2.x); in Python 3.x these are urllib.request, urllib.parse, urllib.error, and http.client.
ClientCookie, ClientForm, and Mechanize are higher-level libraries for writing a web client.
mechanoid -- a mechanize fork.
libxml2dom can parse HTML by employing libxml2's liberal HTML parser.
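
As a rough illustration of how the standard-library pieces listed above fit together, the sketch below collects the links from an HTML page using html.parser and urllib.parse (Python 3.x). It assumes the input is well-formed; for markup that might be invalid, the warning above still applies and one of the more permissive parsers is the safer choice. The names LinkCollector and extract_links are made up for this example and do not come from any of the libraries listed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the links found in html, resolved against base_url."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, calling extract_links(page_text, "https://www.python.org/about/") on a page containing a link to "/doc/" would yield "https://www.python.org/doc/" for that link.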

Resources

Grab a document from the web - from the Python Cookbook
Python web-client programming general FAQs.
urllib -- Open arbitrary resources by URL
urllib2 -- extensible library for opening URLs
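
In the spirit of the "grab a document from the web" recipe, here is a minimal sketch for Python 3.x using urllib.request. It is not the Cookbook's code: the function name fetch_text, the 10-second timeout, and the UTF-8 fallback are all assumptions chosen for illustration.

```python
import urllib.request


def fetch_text(url, timeout=10, default_charset="utf-8"):
    """Fetch url and return its body decoded as text (hypothetical helper).

    The charset is taken from the Content-Type header when the server
    declares one; otherwise default_charset is assumed.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
        charset = response.headers.get_content_charset() or default_charset
    return raw.decode(charset, errors="replace")
```

A call such as fetch_text("https://www.python.org/") returns the page source as a string. Network failures surface as urllib.error.URLError (or its subclass HTTPError), which callers would normally want to catch.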