Issue 18020: html.escape 10x slower than cgi.escape

Created on 2013-05-20 08:21 by flox, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name: htmlescape.patch
Uploaded: Teh Matt, 2013-05-20 23:05
Description: Speed up html.escape()
Messages (8)
msg189641 - Author: Florent Xicluna (flox) * (Python committer) Date: 2013-05-20 08:21
I noticed the convenient ``html.escape`` in Python 3.2, and ``cgi.escape`` is marked as deprecated. However, the former is an order of magnitude slower than the latter.

$ python3 --version
Python 3.3.2

With html.escape:

$ python3 -m timeit -s "from html import escape as html; from cgi import escape; s = repr(copyright)" "h = html(s)"
10000 loops, best of 3: 48.7 usec per loop
$ python3 -m timeit -s "from html import escape as html; from cgi import escape; s = repr(copyright) * 19" "h = html(s)"
1000 loops, best of 3: 898 usec per loop

With cgi.escape:

$ python3 -m timeit -s "from html import escape as html; from cgi import escape; s = repr(copyright)" "h = escape(s)"
100000 loops, best of 3: 7.42 usec per loop
$ python3 -m timeit -s "from html import escape as html; from cgi import escape; s = repr(copyright) * 19" "h = escape(s)"
10000 loops, best of 3: 21.5 usec per loop

Since this kind of function is called frequently in template engines, it makes a difference.

Of course C replacements are available on PyPI: MarkupSafe or Webext.
But it would be nice to restore the performance of cgi.escape with a pragmatic `.replace(` approach.
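The comparison above can be approximated from inside a script. The following is a minimal sketch, not the reporter's benchmark: `SAMPLE`, `escape_translate`, and `best_usec` are hypothetical names, the translation table is an assumed stand-in for a table-driven escape, and `html.escape` is whatever the running interpreter ships. Absolute numbers will differ by machine and Python version.

```python
import html
import timeit

# Hypothetical sample text; the report timed repr(copyright) instead.
SAMPLE = '<p class="note">Tom & Jerry\'s "show" > the rest</p>\n' * 20

# Assumed stand-in for a table-driven escape; not copied from any CPython release.
_ESCAPE_TABLE = {
    ord("&"): "&amp;",
    ord("<"): "&lt;",
    ord(">"): "&gt;",
    ord('"'): "&quot;",
    ord("'"): "&#x27;",
}


def escape_translate(s):
    """Escape HTML special characters with a single str.translate() pass."""
    return s.translate(_ESCAPE_TABLE)


def best_usec(func, text, number=2000, repeat=3):
    """Return the best per-call time in microseconds over `repeat` runs."""
    timer = timeit.Timer(lambda: func(text))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6


if __name__ == "__main__":
    for name, func in (("translate", escape_translate),
                       ("html.escape", html.escape)):
        print(f"{name:12s} {best_usec(func, SAMPLE):8.2f} usec per call")
```

Taking the minimum over several repeats mirrors the "best of 3" figure that `python -m timeit` printed in the report.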
msg189643 - Author: Graham Dumpleton (grahamd) Date: 2013-05-20 08:53
Importing the cgi module the first time even in Python 2.X was always very expensive. I would suggest you redo the test using timing done inside of the script after modules have been imported so as to properly separate module import time in both cases from execution time of the specific function.
msg189644 - Author: Florent Xicluna (flox) * (Python committer) Date: 2013-05-20 09:06
> I would suggest you redo the test using timing done inside of the script after modules have been imported.

The -s switch takes care of this.
msg189647 - Author: Graham Dumpleton (grahamd) Date: 2013-05-20 10:14
Whoops. Missed the quoting.
msg189711 - Author: Matt Bryant (Teh Matt) * Date: 2013-05-20 23:05
I did a few more tests and am seeing the same speed differences Florent noticed. It seems reasonable to use .replace() instead, as it does the same thing significantly faster. I've attached a patch doing just this.
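For readers who have not opened the patch, a chained `.replace()` escape could look roughly like the sketch below. This is an illustration written for this page, not the contents of htmlescape.patch: the function names are hypothetical, and the `quote` parameter is assumed to behave like the one on `html.escape`. The one ordering constraint is that `&` has to be handled first, which the deliberately broken second function demonstrates.

```python
import html


def escape_with_replace(s, quote=True):
    """Escape &, < and > (plus quotes if `quote` is true) via str.replace()."""
    s = s.replace("&", "&amp;")  # must run first, see escape_wrong_order()
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
        s = s.replace("'", "&#x27;")
    return s


def escape_wrong_order(s):
    """Deliberately broken: replacing '&' last re-escapes earlier output."""
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    return s.replace("&", "&amp;")


if __name__ == "__main__":
    text = '<a href="x">Tom & Jerry\'s</a>'
    print(escape_with_replace(text))
    print(escape_with_replace(text) == html.escape(text))
    print(escape_wrong_order("<b>"))
```

Each `.replace()` call scans the whole string, so this makes up to five passes, yet the timings in this issue suggest those passes are still much cheaper than the approach being replaced.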
msg190267 - Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2013-05-29 02:30
Matt's patch looks good to me. It removes two module-level dicts, but they're marked as internal, so that's OK. There's already a test case that exercises html.escape(), so I don't think any additional tests are needed.
msg192527 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-07-07 09:11
New changeset db5f2b74e369 by Ezio Melotti in branch 'default': #18020: improve html.escape speed by an order of magnitude. Patch by Matt Bryant. http://hg.python.org/cpython/rev/db5f2b74e369
msg192528 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2013-07-07 09:12
Fixed, thanks for the report and the patch!
History
Date User Action Args
2022-04-11 14:57:45 admin set github: 62220
2013-07-07 09:12:39 ezio.melotti set status: open -> closed; versions: + Python 3.4, - Python 3.2, Python 3.3; messages: +; resolution: fixed; stage: patch review -> resolved
2013-07-07 09:11:36 python-dev set nosy: + python-dev; messages: +
2013-06-01 13:33:53 ezio.melotti set assignee: ezio.melotti; stage: patch review
2013-05-29 02:30:26 akuchling set nosy: + akuchling; messages: +
2013-05-25 15:42:51 jwilk set nosy: + jwilk
2013-05-20 23:05:30 Teh Matt set files: + htmlescape.patch; nosy: + Teh Matt; messages: +; keywords: + patch
2013-05-20 10:14:05 grahamd set messages: +
2013-05-20 09:06:38 flox set messages: +
2013-05-20 08:53:52 grahamd set nosy: + grahamd; messages: +
2013-05-20 08:21:38 flox create