Issue 513840: entity unescape for sgml/htmllib

Created on 2002-02-06 17:55 by glchapman, last changed 2022-04-10 16:04 by admin. This issue is now closed.

Messages (4)
msg61076 - (view) Author: Greg Chapman (glchapman) Date: 2002-02-06 17:55
The parsers defined in htmllib and sgmllib do not provide any facilities for unescaping a tag attribute which has an embedded html entityref (i.e., they do not provide a way to convert "a&amp;b" to "a&b"). The parser in HTMLParser unescapes all tag attributes automatically. I'm not sure that's the right approach for sgmllib and htmllib (since it might break existing code), but it seems to me that one of the modules ought to provide a function or method which can do the unescaping if needed. (I'm not familiar with either the SGML or the HTML specification, but I assume one of them mandates the escaping of '&' (e.g.) in tag attributes. If so, then it seems appropriate for one of the modules to provide a function which undoes the mandated transformation.)
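For readers arriving at this report today, the conversion requested above can be illustrated with `html.unescape`, which is in the Python 3 standard library (3.4 and later). This is only a sketch of the desired behaviour, not something htmllib or sgmllib offered; the variable names are illustrative.

```python
from html import unescape

# An attribute value as it appears in the raw markup, with '&' escaped.
raw_attribute = "a&amp;b"

# unescape() converts entity and character references back to text.
decoded = unescape(raw_attribute)

print(decoded)  # a&b
```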
msg61077 - (view) Author: Fred Drake (fdrake) (Python committer) Date: 2006-06-22 03:57
This request is making me reconsider some other changes that have already been made on the trunk (and are now in 2.5b1). Reading this, I thought "Doesn't it already do that?" Turns out that in Python 2.4, it doesn't. Both versions handle this in parsed character data; the difference is confined to attribute values. I'd like to propose adding a Boolean configuration attribute on the parser instance that, when set, causes the parser to decode entity and character references. By default, it would be unset. This would support backward compatibility and make it easier to get attribute value decoding. Another possibility would be to revert the new feature and add a separate method to perform the decoding.
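A rough sketch of how such an opt-in flag might look, written against modern Python 3 rather than sgmllib. The class `AttributeHandler`, the attribute `decode_attribute_refs`, and the method `prepare_attrs` are all hypothetical names made up for illustration; none of them is an actual sgmllib or htmllib API, and the real parser internals are not shown.

```python
from html import unescape


class AttributeHandler:
    """Hypothetical stand-in for the attribute-handling part of a parser."""

    # Hypothetical flag: unset by default so existing code sees raw values.
    decode_attribute_refs = False

    def prepare_attrs(self, attrs):
        """Return (name, value) pairs, decoding references only if enabled."""
        if not self.decode_attribute_refs:
            return list(attrs)
        return [
            (name, unescape(value) if value is not None else None)
            for name, value in attrs
        ]


handler = AttributeHandler()
raw = [("href", "page?a=1&amp;b=2"), ("checked", None)]

unchanged = handler.prepare_attrs(raw)   # flag unset: values left as-is

handler.decode_attribute_refs = True     # opt in on this instance only
decoded = handler.prepare_attrs(raw)     # values now decoded
```

The alternative mentioned above, a separate decoding method, would amount to calling something like `unescape` explicitly on each value instead of toggling a flag.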
msg114175 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-08-17 21:41
Is anyone aware whether this was implemented in 2.5 or later, as hinted at in ? If yes, please close this. If no, is there any point in putting this into 3.2?
msg185129 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2013-03-24 11:33
See also #2927.
History
Date User Action Args
2022-04-10 16:04:57 admin set github: 36039
2013-11-18 09:54:25 ezio.melotti set status: open -> closed; assignee: ezio.melotti; superseder: expose html.parser.unescape; resolution: duplicate; stage: test needed -> resolved
2013-03-24 11:33:06 ezio.melotti set messages: +; versions: + Python 3.4, - Python 3.2
2013-03-23 22:22:01 ezio.melotti set nosy: + ezio.melotti
2010-08-17 21:41:06 BreamoreBoy set nosy: + BreamoreBoy; messages: +; versions: + Python 3.2, - Python 2.7
2009-02-12 20:03:12 ajaksu2 set keywords: + easy; stage: test needed; versions: + Python 2.7
2002-02-06 17:55:02 glchapman create