Issue 19646: Use PyUnicodeWriter in repr(dict)

Created on 2013-11-18 21:15 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
dict_repr_writer.patch vstinner,2013-11-18 21:15 review
bench_dict_repr.py vstinner,2013-11-18 21:15
Messages (5)
msg203322 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-18 21:15
The attached patch modifies the dict_repr() function to use the _PyUnicodeWriter API instead of building a list of short strings with PyUnicode_AppendAndDel() and calling PyUnicode_Join() at the end to join the list.

PyUnicode_Append() is inefficient because it has to allocate a new string instead of reusing the same buffer. The _PyUnicodeWriter API has a different design: it overallocates a buffer to write Unicode characters into, then shrinks the buffer at the end. It is faster according to my micro-benchmark.

$ ./python ~/prog/HG/misc/python/benchmark.py compare_to pyaccu writer

Common platform:
CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Python unicode implementation: PEP 393
CFLAGS: -Wno-unused-result -Werror=declaration-after-statement -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
Timer precision: 40 ns
Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
Platform: Linux-3.9.4-200.fc18.x86_64-x86_64-with-fedora-18-Spherical_Cow
Bits: int=32, long=64, long long=64, size_t=64, void*=64
Timer: time.perf_counter

Platform of campaign pyaccu:
Date: 2013-11-18 21:37:44
Python version: 3.4.0a4+ (default:fc7ceb001eec, Nov 18 2013, 21:29:41) [GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
SCM: hg revision=fc7ceb001eec tag=tip branch=default date="2013-11-18 21:11 +0100"

Platform of campaign writer:
Date: 2013-11-18 22:10:40
Python version: 3.4.0a4+ (default:fc7ceb001eec+, Nov 18 2013, 22:10:12) [GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
SCM: hg revision=fc7ceb001eec+ tag=tip branch=default date="2013-11-18 21:11 +0100"

--------------------------------------+-------------+--------------
Tests                                 |      pyaccu |        writer
--------------------------------------+-------------+--------------
{"a": 1}                              |  603 ns (*) | 496 ns (-18%)
dict(zip("abc", range(3)))            | 1.05 us (*) | 904 ns (-14%)
{"%03d":"abc" for k in range(10)}     |  631 ns (*) | 501 ns (-21%)
{"%100d":"abc" for k in range(10)}    |  660 ns (*) | 484 ns (-27%)
{k:"a" for k in range(10**3)}         |  235 us (*) | 166 us (-30%)
{k:"abc" for k in range(10**3)}       |  245 us (*) | 177 us (-28%)
{"%100d":"abc" for k in range(10**3)} |  668 ns (*) | 478 ns (-28%)
{k:"a" for k in range(10**6)}         |  258 ms (*) | 186 ms (-28%)
{k:"abc" for k in range(10**6)}       |  265 ms (*) | 184 ms (-31%)
{"%100d":"abc" for k in range(10**6)} |  652 ns (*) | 489 ns (-25%)
--------------------------------------+-------------+--------------
Total                                 |  523 ms (*) | 369 ms (-29%)
--------------------------------------+-------------+--------------
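The overallocate-then-shrink design described in this message can be modelled in a few lines of Python. This is only a toy sketch, not the C implementation: the class name OverallocatingWriter, the function dict_repr_sketch, and the 25% growth ratio are all assumptions made for illustration, and the recursion guard that the real dict_repr() needs is left out.

```python
class OverallocatingWriter:
    """Toy model of a single growable buffer (hypothetical, not CPython's code)."""

    def __init__(self):
        self._buf = []          # character slots; len(self._buf) is the capacity
        self._size = 0          # number of slots actually written
        self.reallocations = 0  # how many times the buffer had to grow

    def write(self, text):
        needed = self._size + len(text)
        if needed > len(self._buf):
            # Grow past what is needed (25% here, an assumed ratio) so that
            # the next few writes fit without another reallocation.
            new_capacity = needed + needed // 4
            self._buf.extend([""] * (new_capacity - len(self._buf)))
            self.reallocations += 1
        # Same-length slice assignment: overwrite the free slots in place.
        self._buf[self._size:needed] = text
        self._size = needed

    def finish(self):
        # Shrink the buffer to the exact size, then build the final string.
        del self._buf[self._size:]
        return "".join(self._buf)


def dict_repr_sketch(d):
    """Build a dict repr through one writer (no recursion guard)."""
    if not d:
        return "{}"
    writer = OverallocatingWriter()
    writer.write("{")
    first = True
    for key, value in d.items():
        if not first:
            writer.write(", ")
        first = False
        writer.write(repr(key))
        writer.write(": ")
        writer.write(repr(value))
    writer.write("}")
    return writer.finish()
```

The reallocations counter makes the point of the design visible: a thousand short writes trigger only a few dozen buffer growths, whereas appending to an immutable string may allocate on every call.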
msg203367 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-11-19 12:11
New changeset 3a354b879d1f by Victor Stinner in branch 'default':
Issue #19646: repr(dict) now uses _PyUnicodeWriter API for better performances
http://hg.python.org/cpython/rev/3a354b879d1f
msg203368 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-19 12:13
I added a new _PyUnicodeWriter_WriteASCIIString() function in response to Serhiy's comment on Rietveld: "Perhaps it will be worth to add a helper function or macros _PyUnicodeWriter_WriteTwoAsciiChars()?"

changeset: 87263:d1ca05428c38
user: Victor Stinner <victor.stinner@gmail.com>
date: Tue Nov 19 12:54:53 2013 +0100
files: Include/unicodeobject.h Objects/listobject.c Objects/unicodeobject.c Python/formatter_unicode.c
description: Add _PyUnicodeWriter_WriteASCIIString() function

Using this function, there is no need to create temporary colon (": ") or sep (", ") strings, performances are a little better with the final commit.

Common platform:
Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
CFLAGS: -Wno-unused-result -Werror=declaration-after-statement -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
Timer: time.perf_counter
Platform: Linux-3.9.4-200.fc18.x86_64-x86_64-with-fedora-18-Spherical_Cow
Python unicode implementation: PEP 393
CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Bits: int=32, long=64, long long=64, size_t=64, void*=64

Platform of campaign pyaccu:
Python version: 3.4.0a4+ (default:99141ab08e21, Nov 19 2013, 13:10:27) [GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
Timer precision: 39 ns
Date: 2013-11-19 13:10:28
SCM: hg revision=99141ab08e21 branch=default date="2013-11-19 12:59 +0100"

Platform of campaign writer:
Python version: 3.4.0a4+ (default:3a354b879d1f, Nov 19 2013, 13:08:42) [GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
Timer precision: 46 ns
Date: 2013-11-19 13:09:20
SCM: hg revision=3a354b879d1f tag=tip branch=default date="2013-11-19 13:07 +0100"

--------------------------------------+-------------+--------------
Tests                                 |      pyaccu |        writer
--------------------------------------+-------------+--------------
{"a": 1}                              |  613 ns (*) | 338 ns (-45%)
dict(zip("abc", range(3)))            | 1.05 us (*) | 640 ns (-39%)
{"%03d":"abc" for k in range(10)}     |  635 ns (*) | 447 ns (-30%)
{"%100d":"abc" for k in range(10)}    |  651 ns (*) | 424 ns (-35%)
{k:"a" for k in range(10**3)}         |  233 us (*) | 132 us (-44%)
{k:"abc" for k in range(10**3)}       |  251 us (*) | 154 us (-39%)
{"%100d":"abc" for k in range(10**3)} |  668 ns (*) | 412 ns (-38%)
{k:"a" for k in range(10**6)}         |  268 ms (*) | 158 ms (-41%)
{k:"abc" for k in range(10**6)}       |  276 ms (*) | 163 ms (-41%)
{"%100d":"abc" for k in range(10**6)} |  658 ns (*) | 422 ns (-36%)
--------------------------------------+-------------+--------------
Total                                 |  544 ms (*) | 321 ms (-41%)
--------------------------------------+-------------+--------------
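The timings in this thread come from the author's own benchmark.py tool and the attached bench_dict_repr.py, neither of which is reproduced here. As a rough stand-in, the same kind of measurement can be sketched with the standard timeit module. The names CASES, time_repr and run_cases are hypothetical, and only three of the dict shapes from the table are included.

```python
import timeit

# A few of the dict shapes from the benchmark table, built by small factories.
# The labels mirror the table rows; the selection is an assumption.
CASES = {
    '{"a": 1}': lambda: {"a": 1},
    'dict(zip("abc", range(3)))': lambda: dict(zip("abc", range(3))),
    '{k:"a" for k in range(10**3)}': lambda: {k: "a" for k in range(10**3)},
}


def time_repr(d, repeat=5, number=1000):
    """Return the best observed time, in seconds, for one repr(d) call."""
    timer = timeit.Timer("repr(d)", globals={"d": d})
    # Take the minimum over several runs to reduce noise from the system.
    return min(timer.repeat(repeat=repeat, number=number)) / number


def run_cases(repeat=5, number=1000):
    """Time repr() for every case and return {label: seconds per call}."""
    return {
        label: time_repr(make(), repeat=repeat, number=number)
        for label, make in CASES.items()
    }
```

Running run_cases() under two interpreter builds and comparing the results per label gives a comparison in the spirit of the table, although absolute numbers will differ from those reported here.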
msg203372 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-19 13:18
> Using this function, there is no need to create temporary colon (": ") or sep (", ") strings, performances are a little better with the final commit.

I'm surprised that this has given such large effect. ;) I hoped only on more clear code.
msg203373 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-19 13:21
> I'm surprised that this has given such large effect. ;) I hoped only on more clear code.

To be honest, I expected shorter code but worse performance with _PyUnicodeWriter_WriteASCIIString(). dict_repr() was not really fast before: it called PyUnicode_FromString() on every call to decode ": " and ", " from UTF-8, whereas list_repr() and tuplerepr() kept the ", " separator cached in a static variable. This is probably why the code is now faster.
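The difference described here, decoding the separators on every call versus keeping them cached, has a loose Python analogue. The sketch below is only an illustration of the idea under that assumption. It does not reproduce the C code, and both function names are hypothetical.

```python
def format_pairs_decoding_each_call(pairs):
    """Decode the separators from UTF-8 bytes on every call."""
    colon = b": ".decode("utf-8")  # new str object created each time
    sep = b", ".decode("utf-8")    # same here
    return "{" + sep.join(repr(k) + colon + repr(v) for k, v in pairs) + "}"


# Separators created once at import time, similar in spirit to a cached
# static variable.
_COLON = ": "
_SEP = ", "


def format_pairs_cached(pairs):
    """Reuse the module-level separators instead of rebuilding them."""
    return "{" + _SEP.join(repr(k) + _COLON + repr(v) for k, v in pairs) + "}"
```

Both functions return the same text. The first pays for two extra decode operations per call, which matters most for small dicts, where fixed per-call overhead dominates the total time.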
History
Date User Action Args
2022-04-11 14:57:53 admin set github: 63845
2013-11-19 13:21:33 vstinner set messages: +
2013-11-19 13:18:33 serhiy.storchaka set messages: +
2013-11-19 12:13:58 vstinner set status: open -> closed; resolution: fixed
2013-11-19 12:13:52 vstinner set messages: +
2013-11-19 12:11:31 python-dev set nosy: + python-dev; messages: +
2013-11-18 21:15:14 vstinner set files: + bench_dict_repr.py
2013-11-18 21:15:06 vstinner create