Issue 19646: Use PyUnicodeWriter in repr(dict)

Created on 2013-11-18 21:15 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded
dict_repr_writer.patch	vstinner, 2013-11-18 21:15
bench_dict_repr.py	vstinner, 2013-11-18 21:15

Messages (5)
msg203322	Author: STINNER Victor (vstinner)	Date: 2013-11-18 21:15
The attached patch modifies the dict_repr() function to use the _PyUnicodeWriter API instead of building a list of short strings with PyUnicode_AppendAndDel() and calling PyUnicode_Join() at the end to join the list. PyUnicode_Append() is inefficient because it has to allocate a new string instead of reusing the same buffer. The _PyUnicodeWriter API has a different design: it overallocates a buffer to write Unicode characters into and shrinks the buffer at the end. It is faster according to my micro benchmark:

$ ./python ~/prog/HG/misc/python/benchmark.py compare_to pyaccu writer

Common platform:
CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Python unicode implementation: PEP 393
CFLAGS: -Wno-unused-result -Werror=declaration-after-statement -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
Timer precision: 40 ns
Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
Platform: Linux-3.9.4-200.fc18.x86_64-x86_64-with-fedora-18-Spherical_Cow
Bits: int=32, long=64, long long=64, size_t=64, void*=64
Timer: time.perf_counter

Platform of campaign pyaccu:
Date: 2013-11-18 21:37:44
Python version: 3.4.0a4+ (default:fc7ceb001eec, Nov 18 2013, 21:29:41) [GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
SCM: hg revision=fc7ceb001eec tag=tip branch=default date="2013-11-18 21:11 +0100"

Platform of campaign writer:
Date: 2013-11-18 22:10:40
Python version: 3.4.0a4+ (default:fc7ceb001eec+, Nov 18 2013, 22:10:12) [GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
SCM: hg revision=fc7ceb001eec+ tag=tip branch=default date="2013-11-18 21:11 +0100"

--------------------------------------+-------------+--------------
Tests                                 |      pyaccu |        writer
--------------------------------------+-------------+--------------
{"a": 1}                              |  603 ns (*) | 496 ns (-18%)
dict(zip("abc", range(3)))            | 1.05 us (*) | 904 ns (-14%)
{"%03d":"abc" for k in range(10)}     |  631 ns (*) | 501 ns (-21%)
{"%100d":"abc" for k in range(10)}    |  660 ns (*) | 484 ns (-27%)
{k:"a" for k in range(10**3)}         |  235 us (*) | 166 us (-30%)
{k:"abc" for k in range(10**3)}       |  245 us (*) | 177 us (-28%)
{"%100d":"abc" for k in range(10**3)} |  668 ns (*) | 478 ns (-28%)
{k:"a" for k in range(10**6)}         |  258 ms (*) | 186 ms (-28%)
{k:"abc" for k in range(10**6)}       |  265 ms (*) | 184 ms (-31%)
{"%100d":"abc" for k in range(10**6)} |  652 ns (*) | 489 ns (-25%)
--------------------------------------+-------------+--------------
Total                                 |  523 ms (*) | 369 ms (-29%)
--------------------------------------+-------------+--------------
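A rough Python model of the two strategies described in this message may make the difference easier to see. It is not CPython's C code: `OverallocWriter`, `repr_pairs_accumulate()` and `repr_pairs_writer()` are hypothetical names, and the doubling growth policy is an assumption chosen for illustration, not the actual `_PyUnicodeWriter` policy.

```python
from typing import Iterable, List, Tuple


class OverallocWriter:
    """Hypothetical model of an overallocating string writer.

    Characters go into a preallocated list that grows geometrically
    (assumed doubling policy) when full; finish() trims the unused
    tail once. This only mimics the idea behind _PyUnicodeWriter.
    """

    def __init__(self, min_capacity: int = 16) -> None:
        self._buf: List[str] = [""] * min_capacity  # preallocated slots
        self._len = 0     # number of slots actually filled
        self.resizes = 0  # how many times the buffer had to grow

    def _ensure(self, extra: int) -> None:
        needed = self._len + extra
        if needed > len(self._buf):
            # Overallocate: at least double, so most writes need no resize.
            new_cap = max(needed, 2 * len(self._buf))
            self._buf.extend([""] * (new_cap - len(self._buf)))
            self.resizes += 1

    def write(self, s: str) -> None:
        self._ensure(len(s))
        # Same-length slice assignment overwrites the slots in place.
        self._buf[self._len:self._len + len(s)] = s
        self._len += len(s)

    def finish(self) -> str:
        # "Shrink" step: keep only the filled part of the buffer.
        return "".join(self._buf[:self._len])


def repr_pairs_accumulate(items: Iterable[Tuple[object, object]]) -> str:
    """Old approach (modelled): many short temporary strings, one join."""
    pieces = [repr(k) + ": " + repr(v) for k, v in items]
    return "{" + ", ".join(pieces) + "}"


def repr_pairs_writer(items: Iterable[Tuple[object, object]]) -> str:
    """New approach (modelled): stream every fragment into one buffer."""
    w = OverallocWriter()
    w.write("{")
    for i, (k, v) in enumerate(items):
        if i:
            w.write(", ")
        w.write(repr(k))
        w.write(": ")
        w.write(repr(v))
    w.write("}")
    return w.finish()


d = {"a": 1, "b": [2, 3]}
assert repr_pairs_accumulate(d.items()) == repr(d)
assert repr_pairs_writer(d.items()) == repr(d)
```

Both functions produce the same text; the point of the writer version is that the number of buffer reallocations grows only logarithmically with the output size, instead of one new string per append.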
msg203367	Author: Roundup Robot (python-dev)	Date: 2013-11-19 12:11
New changeset 3a354b879d1f by Victor Stinner in branch 'default':
Issue #19646: repr(dict) now uses _PyUnicodeWriter API for better performances
http://hg.python.org/cpython/rev/3a354b879d1f
msg203368	Author: STINNER Victor (vstinner)	Date: 2013-11-19 12:13
I added a new _PyUnicodeWriter_WriteASCIIString() function in response to Serhiy's comment on Rietveld: "Perhaps it will be worth to add a helper function or macros _PyUnicodeWriter_WriteTwoAsciiChars()?"

changeset:   87263:d1ca05428c38
user:        Victor Stinner <victor.stinner@gmail.com>
date:        Tue Nov 19 12:54:53 2013 +0100
files:       Include/unicodeobject.h Objects/listobject.c Objects/unicodeobject.c Python/formatter_unicode.c
description:
Add _PyUnicodeWriter_WriteASCIIString() function

Using this function, there is no need to create temporary colon (": ") or sep (", ") strings, and performance is a little better with the final commit:

Common platform:
Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
CFLAGS: -Wno-unused-result -Werror=declaration-after-statement -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
Timer: time.perf_counter
Platform: Linux-3.9.4-200.fc18.x86_64-x86_64-with-fedora-18-Spherical_Cow
Python unicode implementation: PEP 393
CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Bits: int=32, long=64, long long=64, size_t=64, void*=64

Platform of campaign pyaccu:
Python version: 3.4.0a4+ (default:99141ab08e21, Nov 19 2013, 13:10:27) [GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
Timer precision: 39 ns
Date: 2013-11-19 13:10:28
SCM: hg revision=99141ab08e21 branch=default date="2013-11-19 12:59 +0100"

Platform of campaign writer:
Python version: 3.4.0a4+ (default:3a354b879d1f, Nov 19 2013, 13:08:42) [GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
Timer precision: 46 ns
Date: 2013-11-19 13:09:20
SCM: hg revision=3a354b879d1f tag=tip branch=default date="2013-11-19 13:07 +0100"

--------------------------------------+-------------+--------------
Tests                                 |      pyaccu |        writer
--------------------------------------+-------------+--------------
{"a": 1}                              |  613 ns (*) | 338 ns (-45%)
dict(zip("abc", range(3)))            | 1.05 us (*) | 640 ns (-39%)
{"%03d":"abc" for k in range(10)}     |  635 ns (*) | 447 ns (-30%)
{"%100d":"abc" for k in range(10)}    |  651 ns (*) | 424 ns (-35%)
{k:"a" for k in range(10**3)}         |  233 us (*) | 132 us (-44%)
{k:"abc" for k in range(10**3)}       |  251 us (*) | 154 us (-39%)
{"%100d":"abc" for k in range(10**3)} |  668 ns (*) | 412 ns (-38%)
{k:"a" for k in range(10**6)}         |  268 ms (*) | 158 ms (-41%)
{k:"abc" for k in range(10**6)}       |  276 ms (*) | 163 ms (-41%)
{"%100d":"abc" for k in range(10**6)} |  658 ns (*) | 422 ns (-36%)
--------------------------------------+-------------+--------------
Total                                 |  544 ms (*) | 321 ms (-41%)
--------------------------------------+-------------+--------------
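To illustrate what an ASCII-string write path saves, here is a small Python model of a writer with a separate method that copies raw ASCII bytes straight into the buffer, so the ": " and ", " separators never become temporary string objects. `CodepointWriter`, `write_str()`, `write_ascii()` and this `dict_repr()` are hypothetical names; the array of code points only loosely imitates a PEP 393 buffer, and the sketch ignores recursive dicts, which the real dict_repr() guards against with Py_ReprEnter().

```python
from array import array


class CodepointWriter:
    """Hypothetical writer storing one code point per array slot."""

    def __init__(self) -> None:
        # "L" is an unsigned type of at least 4 bytes, enough for U+10FFFF.
        self._buf = array("L")

    def write_str(self, s: str) -> None:
        # Generic path: the caller already holds a str object.
        self._buf.extend(ord(ch) for ch in s)

    def write_ascii(self, data: bytes) -> None:
        # Model of an ASCII fast path: each byte is already a valid code
        # point, so it is copied in directly and no str object is created.
        if not data.isascii():
            raise ValueError("write_ascii() expects pure ASCII bytes")
        self._buf.extend(data)

    def finish(self) -> str:
        return "".join(map(chr, self._buf))


def dict_repr(d: dict) -> str:
    """Sketch of repr(dict) on top of CodepointWriter (no recursion guard)."""
    w = CodepointWriter()
    w.write_ascii(b"{")
    for i, (key, value) in enumerate(d.items()):
        if i:
            w.write_ascii(b", ")  # separator written from raw bytes
        w.write_str(repr(key))
        w.write_ascii(b": ")
        w.write_str(repr(value))
    w.write_ascii(b"}")
    return w.finish()
```

The design choice mirrored here is that constant ASCII fragments are written as bytes rather than being turned into string objects first, which is the cost the commit message says it avoids.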
msg203372	Author: Serhiy Storchaka (serhiy.storchaka)	Date: 2013-11-19 13:18
> Using this function, there is no need to create temporary colon (": ") or sep (", ") strings, and performance is a little better with the final commit.

I'm surprised that this had such a large effect. ;) I was only hoping for clearer code.
msg203373	Author: STINNER Victor (vstinner)	Date: 2013-11-19 13:21
> I'm surprised that this had such a large effect. ;) I was only hoping for clearer code.

To be honest, I expected shorter code but worse performance using _PyUnicodeWriter_WriteASCIIString(). dict_repr() was not particularly fast: it called PyUnicode_FromString() on every call to decode ": " and ", " from UTF-8, whereas list_repr() and tuplerepr() kept the ", " separator cached in a static variable. This is probably why the code is now faster.
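The difference between decoding a separator on every call and caching it can be modelled in Python as well. In this hypothetical sketch, `sep_decoded_each_call()` stands in for calling PyUnicode_FromString(", ") on every dict_repr() call, and `sep_cached()` stands in for the static variable kept by list_repr() and tuplerepr(); the function names and the module-level cache are illustrative assumptions, and any timings it prints depend on the machine.

```python
import timeit
from typing import Optional

# Module-level cache, a stand-in for a C "static PyObject *sep" variable.
_cached_sep: Optional[str] = None


def sep_decoded_each_call() -> str:
    # Old dict_repr() behaviour (modelled): decode ", " from UTF-8 every call.
    return b", ".decode("utf-8")


def sep_cached() -> str:
    # list_repr()/tuplerepr() behaviour (modelled): decode once, then reuse.
    global _cached_sep
    if _cached_sep is None:
        _cached_sep = b", ".decode("utf-8")
    return _cached_sep


if __name__ == "__main__":
    n = 200_000
    t_decode = timeit.timeit(sep_decoded_each_call, number=n)
    t_cached = timeit.timeit(sep_cached, number=n)
    print(f"decode each call: {t_decode:.4f} s, cached: {t_cached:.4f} s")
```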

History
Date	User	Action	Args
2022-04-11 14:57:53	admin	set	github: 63845
2013-11-19 13:21:33	vstinner	set	messages: +
2013-11-19 13:18:33	serhiy.storchaka	set	messages: +
2013-11-19 12:13:58	vstinner	set	status: open -> closed, resolution: fixed
2013-11-19 12:13:52	vstinner	set	messages: +
2013-11-19 12:11:31	python-dev	set	nosy: + python-dev, messages: +
2013-11-18 21:15:14	vstinner	set	files: + bench_dict_repr.py
2013-11-18 21:15:06	vstinner	create