Issue 15156: Refactor HTMLParser.unescape to use html.entities.html5
Created on 2012-06-24 02:45 by ezio.melotti, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Messages (5)
msg163702 - (view)
Author: Ezio Melotti (ezio.melotti) *
Date: 2012-06-24 02:45
HTMLParser has an internal method called unescape [0] that converts named character references to the equivalent characters. It does so by using html.entities.name2codepoint to recreate the equivalent of html.entities.entitydefs, with the addition of &apos;. Now that the HTML5 entities have been added to html.entities, the parser should use them instead of name2codepoint.
[0]: see Lib/html/parser.py:500
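To make the difference concrete, here is a minimal sketch of the two lookup strategies. It is not the code from Lib/html/parser.py or from the attached patch: the names build_legacy_table, unescape_named, and the use_html5 flag are hypothetical, and the regular expression only handles named references that end in a semicolon.

```python
import re
from html.entities import html5, name2codepoint


def build_legacy_table():
    """Rebuild an entitydefs-like mapping from name2codepoint, plus 'apos'.

    Hypothetical helper: it mirrors the approach described above, not the
    exact code in the standard library.
    """
    table = {"apos": "'"}
    for name, codepoint in name2codepoint.items():
        table[name] = chr(codepoint)
    return table


# Simplified pattern (an assumption of this sketch): a named reference
# that is terminated by a semicolon.
_NAMED_REF = re.compile(r"&([a-zA-Z][a-zA-Z0-9]*);")


def unescape_named(text, use_html5=True):
    """Replace named character references, leaving unknown names untouched."""
    legacy = build_legacy_table()

    def replace(match):
        name = match.group(1)
        if use_html5:
            # Keys in html5 include the trailing semicolon, e.g. 'gt;'.
            return html5.get(name + ";", match.group(0))
        return legacy.get(name, match.group(0))

    return _NAMED_REF.sub(replace, text)
```

Both strategies agree on HTML 4 names such as &lt; and on &apos;, but only the html5 dictionary knows names introduced by HTML5, for example &CounterClockwiseContourIntegral;. With use_html5=False that reference is returned unchanged.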
msg163790 - (view)
Author: Ezio Melotti (ezio.melotti) *
Date: 2012-06-24 14:26
Here's a patch, please review.
msg163811 - (view)
Author: Ezio Melotti (ezio.melotti) *
Date: 2012-06-24 17:35
Patch updated after the review.
msg163837 - (view)
Author: Roundup Robot (python-dev)
Date: 2012-06-24 20:04
New changeset 0d53703b1a99 by Ezio Melotti in branch 'default':
#15156: HTMLParser now uses the new "html.entities.html5" dictionary.
http://hg.python.org/cpython/rev/0d53703b1a99
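As a rough illustration of the user-visible effect, the sketch below feeds an attribute value containing an HTML5-only reference to the standard parser. The AttrCollector class is a hypothetical helper written for this example, and the internals of recent Python versions may differ from the 2012 changeset, so treat it as a demonstration of behaviour rather than of the patch itself.

```python
from html.entities import html5
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper that records the attributes of every start tag."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        # Attribute values arrive with character references already resolved.
        self.attrs.extend(attrs)


collector = AttrCollector()
collector.feed('<a title="&CounterClockwiseContourIntegral;">')
collector.close()
```

After the call, collector.attrs holds a single ("title", value) pair whose value is the character that html5 maps the reference to, rather than the raw reference text.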
msg163838 - (view)
Author: Ezio Melotti (ezio.melotti) *
Date: 2012-06-24 20:05
Fixed, thanks for the reviews!
History
Date | User | Action | Args
2022-04-11 14:57:31 | admin | set | github: 59361
2012-06-24 20:05:35 | ezio.melotti | set | status: open -> closed; resolution: fixed; messages: +; stage: patch review -> resolved
2012-06-24 20:04:09 | python-dev | set | nosy: + python-dev; messages: +
2012-06-24 17:35:31 | ezio.melotti | set | files: + issue15156-2.diff; messages: +
2012-06-24 14:26:50 | ezio.melotti | set | files: + issue15156.diff; keywords: + patch; messages: +; stage: needs patch -> patch review
2012-06-24 02:45:42 | ezio.melotti | create |