Issue 711632: htmllib.HTMLParser.anchorlist problem - Python tracker


Created on 2003-03-29 00:26 by cpgray, last changed 2022-04-10 16:07 by admin. This issue is now closed.

Messages (3)
msg15290 - Author: Chris Gray (cpgray) Date: 2003-03-29 00:26
htmllib.HTMLParser.anchorlist is cleared when __init__() is called but not when reset() is called. Processing more than one document with the same instance therefore accumulates the anchors from every document in the list. Arguably this is a feature rather than a bug, but it makes sense for reset() to clear whatever __init__() initializes.

Here is an illustrative IDLE session:

Python 2.2.2 (#37, Oct 14 2002, 17:02:34) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
IDLE 0.8 -- press F1 for help
>>> import htmllib
>>> import formatter
>>> p = htmllib.HTMLParser(formatter.NullFormatter())
>>> p.feed('<a href="http://www.python.org">Python</a>')
>>> p.anchorlist
['http://www.python.org']
>>> p.reset()
>>> p.feed('<a href="http://sourceforge.net/">Sourceforge</a>')
>>> p.anchorlist
['http://www.python.org', 'http://sourceforge.net/']
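
The behavior requested above can be sketched in modern Python. This is a minimal illustration, not the content of the patch referenced in this thread: htmllib exists only in Python 2, so the sketch assumes html.parser.HTMLParser as a stand-in, and AnchorCollector is a hypothetical class name. It relies on HTMLParser.__init__() calling reset(), so state set up in reset() is both initialized at construction and cleared on every later reset().

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Hypothetical stand-in for htmllib.HTMLParser that records href values."""

    def reset(self):
        # HTMLParser.__init__() calls reset(), so the list created here
        # exists after construction and is emptied on each later reset().
        super().reset()
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        # Collect the href of every <a> start tag, in document order.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.anchorlist.append(value)


p = AnchorCollector()
p.feed('<a href="http://www.python.org">Python</a>')
first = list(p.anchorlist)

p.reset()  # clears anchorlist along with the rest of the parser state
p.feed('<a href="http://sourceforge.net/">Sourceforge</a>')
second = list(p.anchorlist)
```

With this arrangement, the second document's list holds only its own anchor, which is what the report argues reset() should guarantee.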
msg15291 - Author: Andrew Gaul (gaul) Date: 2003-08-22 08:44
See patch 793021 for a fix.
msg15292 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2003-09-12 16:38
Fixed with #793021.
History
Date User Action Args
2022-04-10 16:07:56 admin set github: 38229
2003-03-29 00:26:11 cpgray create