Issue 15156: Refactor HTMLParser.unescape to use html.entities.html5

Created on 2012-06-24 02:45 by ezio.melotti, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded
issue15156.diff ezio.melotti, 2012-06-24 14:26
issue15156-2.diff ezio.melotti, 2012-06-24 17:35
Messages (5)
msg163702 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-06-24 02:45
HTMLParser has an internal method called unescape [0] that converts named character references to the equivalent characters. It does so by using html.entities.name2codepoint to recreate the equivalent of html.entities.entitydefs, with the addition of &apos;. Now that the HTML5 entities have been added to html.entities, the parser should use them instead of name2codepoint.
[0]: see Lib/html/parser.py:500
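For illustration only, here is a minimal sketch of the two lookup strategies the message contrasts. It is not the code from the attached patches; the helper names resolve_legacy and resolve_html5 are hypothetical, and the sketch assumes only the public html.entities.name2codepoint and html.entities.html5 dictionaries.

```python
import html.entities

# Older approach described above: derive a name -> character table from
# name2codepoint and add 'apos', which name2codepoint does not contain.
legacy_table = {"apos": "'"}
for name, codepoint in html.entities.name2codepoint.items():
    legacy_table[name] = chr(codepoint)


def resolve_legacy(name):
    """Hypothetical helper: look up a bare entity name such as 'amp'."""
    return legacy_table.get(name)


def resolve_html5(name):
    """Hypothetical helper: look up the same name in the html5 table.

    Keys in html.entities.html5 include the trailing ';' (for example
    'amp;'), and the values are already strings, so no chr() call or
    special case for 'apos' is needed.
    """
    return html.entities.html5.get(name + ";")


if __name__ == "__main__":
    for entity in ("amp", "apos", "no_such_entity"):
        print(entity, repr(resolve_legacy(entity)), repr(resolve_html5(entity)))
```

Because the html5 dictionary maps names directly to characters and already covers apos, a parser that uses it no longer has to build and cache its own table.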
msg163790 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-06-24 14:26
Here's a patch, please review.
msg163811 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-06-24 17:35
Patch updated after the review.
msg163837 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-06-24 20:04
New changeset 0d53703b1a99 by Ezio Melotti in branch 'default': #15156: HTMLParser now uses the new "html.entities.html5" dictionary. http://hg.python.org/cpython/rev/0d53703b1a99
msg163838 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-06-24 20:05
Fixed, thanks for the reviews!
History
Date User Action Args
2022-04-11 14:57:31 admin set github: 59361
2012-06-24 20:05:35 ezio.melotti set status: open -> closed; resolution: fixed; messages: +; stage: patch review -> resolved
2012-06-24 20:04:09 python-dev set nosy: + python-dev; messages: +
2012-06-24 17:35:31 ezio.melotti set files: + issue15156-2.diff; messages: +
2012-06-24 14:26:50 ezio.melotti set files: + issue15156.diff; keywords: + patch; messages: +; stage: needs patch -> patch review
2012-06-24 02:45:42 ezio.melotti create