Issue 513840: entity unescape for sgml/htmllib
Created on 2002-02-06 17:55 by glchapman, last changed 2022-04-10 16:04 by admin. This issue is now closed.
Messages (4) | ||
---|---|---|
msg61076 - (view) | Author: Greg Chapman (glchapman) | Date: 2002-02-06 17:55 |
The parsers defined in htmllib and sgmllib do not provide any facilities for unescaping a tag attribute which has an embedded html entityref (i.e., they do not provide a way to convert "a&amp;b" to "a&b"). The parser in HTMLParser unescapes all tag attributes automatically. I'm not sure that's the right approach for sgmllib and htmllib (since it might break existing code), but it seems to me that one of the modules ought to provide a function or method which can do the unescaping if needed. (I'm not familiar with either the SGML or the HTML specification, but I assume one of them mandates the escaping of '&' (e.g.) in tag attributes. If so, then it seems appropriate for one of the modules to provide a function which undoes the mandated transformation.) | ||
msg61077 - (view) | Author: Fred Drake (fdrake) | Date: 2006-06-22 03:57 |
This request is making me reconsider some other changes that have already been made on the trunk (and are now in 2.5b1). Reading this, I thought "Doesn't it already do that?" Turns out that in Python 2.4, it doesn't. Both versions handle this in parsed character data; the difference is confined to attribute values. I'd like to propose adding a Boolean configuration attribute on the parser instance that, when set, causes the parser to decode entity and character references. By default, it would be unset. This would support backward compatibility and make it easier to get attribute value decoding. Another possibility would be to revert the new feature and add a separate method to perform the decoding. | ||
msg114175 - (view) | Author: Mark Lawrence (BreamoreBoy) * | Date: 2010-08-17 21:41 |
Is anyone aware if this was implemented in 2.5 or later as hinted at in ? If yes please close this. If no any point in putting this into 3.2? | ||
msg185129 - (view) | Author: Ezio Melotti (ezio.melotti) * | Date: 2013-03-24 11:33 |
See also #2927. |
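
For readers arriving at this closed issue today, the behaviour discussed above can be sketched with the Python 3 standard library. This is a minimal illustration, not the resolution recorded in this thread: it assumes Python 3.4 or later, where `html.unescape` is available, and it relies on `html.parser.HTMLParser` decoding entity references in attribute values before calling `handle_starttag`. The `AttrCollector` class and `collect_attrs` function are hypothetical names introduced only for this example.

```python
from html import unescape
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records the attributes of every start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser has already
        # decoded entity references in each value by the time we get here.
        self.seen.append((tag, dict(attrs)))


def collect_attrs(markup):
    """Parse markup and return a list of (tag, attribute dict) pairs."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.seen


# Standalone decoding of an escaped string, as requested in the first message.
plain = unescape("a&amp;b")

# Attribute values as delivered by the parser.
attrs = collect_attrs('<a href="x?a=1&amp;b=2" title="a&amp;b">link</a>')
```

Here `plain` holds the decoded string and `attrs` holds the tag name with its already-decoded attribute values, so code built on `HTMLParser` normally does not need to call `unescape` on attributes itself.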
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-10 16:04:57 | admin | set | github: 36039 |
2013-11-18 09:54:25 | ezio.melotti | set | status: open -> closed; assignee: ezio.melotti; superseder: expose html.parser.unescape; resolution: duplicate; stage: test needed -> resolved |
2013-03-24 11:33:06 | ezio.melotti | set | messages: +; versions: + Python 3.4, - Python 3.2 |
2013-03-23 22:22:01 | ezio.melotti | set | nosy: + ezio.melotti |
2010-08-17 21:41:06 | BreamoreBoy | set | nosy: + BreamoreBoy; messages: +; versions: + Python 3.2, - Python 2.7 |
2009-02-12 20:03:12 | ajaksu2 | set | keywords: + easy; stage: test needed; versions: + Python 2.7 |
2002-02-06 17:55:02 | glchapman | create |