Issue 912410: HTMLParser should support entities in attributes

Created on 2004-03-09 01:20 by aaronsw, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
replacement.py	aaronsw,2004-03-09 01:21	replacement unescape function for HTMLParser.py

Messages (4)
msg45480 - (view)	Author: Aaron Swartz (aaronsw)	Date: 2004-03-09 01:20
HTMLParser doesn't currently support entities in attributes, like this: foo
This patch fixes that. Simply replace the unescape in HTMLParser.py with:

    import htmlentitydefs

    def unescape(self, s):
        def replaceEntities(s):
            s = s.groups()[0]
            if s[0] == "#":
                s = s[1:]
                if s[0] in ['x','X']:
                    c = int(s[1:], 16)
                else:
                    c = int(s)
                return unichr(c)
            else:
                return unichr(htmlentitydefs.name2codepoint[c])
        return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s)
msg45481 - (view)	Author: Aaron Swartz (aaronsw)	Date: 2004-03-09 01:21
Oops. The replacement function is attached.
msg45482 - (view)	Author: Aaron Swartz (aaronsw)	Date: 2004-03-09 01:21
Argh. Hopefully now.
msg45483 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2007-03-06 14:46
Thanks for the patch. Committed as r54165, with the following changes:
- added documentation changes
- added testsuite changes
- fixed incorrect usage of c in name2codepoint[c] (should be [s])
- included &apos; in the list of supported entities, for compatibility with older versions of HTMLParser
- fall back to replacing an unsupported entity reference with &name;
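
For readers who want to try the behaviour described in the review, here is a minimal sketch of the patched logic with the listed code fixes applied. It is not the code committed in r54165: it is a standalone Python 3 port (so `chr` and `html.entities` replace `unichr` and `htmlentitydefs`), and the names `unescape_attr`, `_ENTITIES` and `_ENTITY_RE` are hypothetical, chosen only for illustration.

```python
import re
from html.entities import name2codepoint

# Named entities known to this sketch: the standard table plus "apos",
# which the review added for compatibility with older HTMLParser versions.
_ENTITIES = {name: chr(cp) for name, cp in name2codepoint.items()}
_ENTITIES["apos"] = "'"

# Same pattern as in the submitted patch.
_ENTITY_RE = re.compile(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));")


def unescape_attr(s):
    """Replace entity and character references in an attribute value."""

    def replace_entity(match):
        ref = match.group(1)
        if ref[0] == "#":
            # Numeric character reference: hexadecimal (&#x41;) or decimal (&#65;).
            ref = ref[1:]
            if ref[0] in "xX":
                return chr(int(ref[1:], 16))
            return chr(int(ref))
        # Named reference: look up the name itself (the [s] fix from the
        # review); unknown names fall back to the literal "&name;" text.
        return _ENTITIES.get(ref, "&" + ref + ";")

    return _ENTITY_RE.sub(replace_entity, s)


if __name__ == "__main__":
    print(unescape_attr("a&amp;b"))
    print(unescape_attr("it&apos;s"))
    print(unescape_attr("&bogus;"))
```

Like the submitted patch, this sketch does not guard against malformed numeric references such as `&#abc;`; only the well-formed cases above are covered.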