[Python-Dev] sgmllib Comments

Sam Ruby rubys at intertwingly.net
Mon Jun 12 06:05:06 CEST 2006


Terry Reedy wrote:

"Fred L. Drake, Jr." <fdrake at acm.org> wrote in message news:200606112039.37834.fdrake at acm.org...

On Sunday 11 June 2006 16:26, Sam Ruby wrote:

Planet is a feed aggregator written in Python. It depends heavily on SGMLLib. A recent bug report turned out to be a deficiency in sgmllib, and I've submitted a test case and a patch[1] (use or discard the patch, it is the test that I care about). ... and which are original. (Note: feeds often contain such abominations as &amp;copy; which the new code will treat indistinguishably from &copy;)

It really sounds like sgmllib is the wrong foundation for this. ... Have you looked at HTMLParser as an alternative to sgmllib? It has better support for XHTML constructs.

Have you (the OP) checked how related Python projects, such as Mark Pilgrim's feed parser, http://www.feedparser.org/, handle the same sort of input? (I have only looked at docs and tests, not code.)

Just to be clear: Planet uses Mark's feed parser, which uses SGMLlib.

I'm a committer on that project:

http://sourceforge.net/project/memberlist.php?group_id=112328

I was investigating a bug in sgmllib which affected the feed parser (and therefore Planet), and noticed that there were changes in the SVN head of Python which broke three feed parser unit tests.

It is my belief that these changes will break other existing users of sgmllib.
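
To illustrate the distinction at stake in the note about double-escaped entities, here is a minimal sketch. It is not the patch or the test case mentioned above, and it does not use sgmllib (which no longer exists in Python 3). It assumes Python 3's html.parser.HTMLParser as a stand-in, and the names EntityLogger and log_entities are hypothetical. It only shows that a parser can report an escaped ampersand followed by literal text separately from a real entity reference, which is the information a consumer needs to tell the two apart.

```python
from html.parser import HTMLParser


class EntityLogger(HTMLParser):
    """Record entity references and text as separate events."""

    def __init__(self):
        # convert_charrefs=False keeps entity references out of handle_data,
        # so they are reported through handle_entityref instead.
        super().__init__(convert_charrefs=False)
        self.events = []

    def handle_entityref(self, name):
        # Called for named references such as &amp; or &copy;
        self.events.append(("entityref", name))

    def handle_data(self, data):
        # Called for ordinary text between markup and references
        self.events.append(("data", data))


def log_entities(text):
    """Hypothetical helper: parse text and return the recorded events."""
    parser = EntityLogger()
    parser.feed(text)
    parser.close()
    return parser.events


# A double-escaped entity next to a genuine one.
events = log_entities("&amp;copy; vs &copy;")
for kind, value in events:
    print(kind, repr(value))
```

In this sketch the double-escaped form arrives as an "amp" reference followed by text beginning with "copy;", while the genuine entity arrives as a single "copy" reference. Code that decodes the first form a second time loses that difference.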


