Issue 1459279: sgmllib.SGMLparser and hexadecimal numeric character refs
Created on 2006-03-27 12:51 by nerby, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (3) | ||
---|---|---|
msg60894 - (view) | Author: Francesco Ricciardi (nerby) | Date: 2006-03-27 12:51 |
According to the HTML 4.0 specification, it is possible to have hexadecimal numeric character references, not only decimal ones (see http://www.w3.org/TR/REC-html40/charset.html#h-5.3.1). However, sgmllib.SGMLParser does not recognize the hexadecimal form. More and more HTML pages now use entities with a high code point that are not in the official HTML entity list, so proper handling of these references should be implemented. A possible solution could be: - improving the "charref" regular expression so that it includes hexadecimal values; - considering all numeric references valid: those with n < 255 should be converted to the corresponding characters, those above 255 should be left as numerical charrefs. | ||
msg109853 - (view) | Author: Mark Lawrence (BreamoreBoy) * | Date: 2010-07-10 11:21 |
sgmllib has been removed from py3k. | ||
msg114670 - (view) | Author: Mark Lawrence (BreamoreBoy) * | Date: 2010-08-22 10:45 |
sgmllib has been deprecated since 2.6 and has been removed from py3k. |
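The two-part proposal in the first message (a charref pattern that also accepts hexadecimal values, and conversion only for low code points) can be sketched roughly as follows. This is a standalone illustration, not the sgmllib code: the names `CHARREF`, `replace_charrefs`, and `limit` are hypothetical, and the cutoff is assumed to be inclusive at 255, which the message leaves open.

```python
import re

# Hypothetical pattern: matches decimal (&#233;) and hexadecimal (&#xE9;)
# numeric character references. Group 1 holds hex digits, group 2 decimal.
CHARREF = re.compile(r'&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));')


def replace_charrefs(text, limit=255):
    """Convert numeric charrefs up to `limit`; leave higher ones untouched.

    The inclusive cutoff at 255 is an assumption made for this sketch.
    """
    def _sub(match):
        hex_digits, dec_digits = match.groups()
        if hex_digits is not None:
            codepoint = int(hex_digits, 16)
        else:
            codepoint = int(dec_digits)
        if codepoint <= limit:
            return chr(codepoint)
        # Above the cutoff: keep the reference exactly as written.
        return match.group(0)

    return CHARREF.sub(_sub, text)


print(replace_charrefs("caf&#xE9; &#x2014; caf&#233;"))
```

Since sgmllib is not available in Python 3, code that only needs to decode references there can probably rely on `html.unescape`, which handles both the decimal and the hexadecimal form.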
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:16 | admin | set | github: 43097 |
2010-08-22 10:45:52 | BreamoreBoy | set | status: open -> closed; resolution: out of date; messages: +; versions: + Python 3.2, - Python 2.7 |
2010-07-10 11:21:22 | BreamoreBoy | set | nosy: + BreamoreBoy; messages: +; versions: - Python 3.1 |
2009-04-22 12:45:50 | ajaksu2 | set | keywords: + easy |
2009-03-21 02:02:53 | ajaksu2 | set | stage: test needed; type: enhancement; versions: + Python 3.1, Python 2.7, - Python 2.4 |
2006-03-27 12:51:59 | nerby | create |