Issue 11707: Create C version of functools.cmp_to_key() (original) (raw)

process

Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: georg.brandl Nosy List: daniel.urban, eric.smith, georg.brandl, gruszczy, ncoghlan, python-dev, rhettinger, stutzbach, terry.reedy, vstinner
Priority: low Keywords: easy, patch

Created on 2011-03-29 00:22 by rhettinger, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
11707.patch gruszczy,2011-04-02 20:33 review
11707_2.patch gruszczy,2011-04-02 20:53 review
11707_3.patch gruszczy,2011-04-03 21:18 review
time_cmp_to_key.py rhettinger,2011-04-05 00:35 Timing code
11707_4.patch rhettinger,2011-04-05 00:43 Cleaned-up patch (passes test suite) review
Messages (26)
msg132448 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-03-29 00:22
For cases where the underlying comparison function is in C (strcoll for example), the pure Python dispatch in cmp_to_key dominates running time. This can be reduced considerably by writing cmp_to_key in C, making its overhead as low or lower than the cost of bound method dispatch.
msg132811 - (view) Author: Filip Gruszczyński (gruszczy) Date: 2011-04-02 20:33
Here is a patch that passes functools tests. I'll be happy to work on it further.
msg132814 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2011-04-02 20:47
I haven't had time to look at this in detail, but the concept appears sound. I'll try to devote some time to it, but hopefully someone else on core-mentorship with more familiarity with this code will also be able to review it. One thing: You don't want to delete the pure Python version, as that can be used by alternate implementations. You want to leave it in place, and then after it try to import the C version. If the import fails, the Python version will be used, if it succeeds, the C version will be used. I'm reasonably sure there are examples of testing both implementations elsewhere in the stdlib. Maybe ask on core-mentorship for pointers. Thanks for working on this!
msg132815 - (view) Author: Filip Gruszczyński (gruszczy) Date: 2011-04-02 20:53
Ok, here is patch that keep python implementation of cmp_to_key if C version can't be imported. Thanks again for you help :-)
msg132817 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-04-02 22:19
Nice work. There are a few nits (dealloc and traverse need to address both python objects in the struct). I will look at the patch in detail when I get a chance (in the next week or so). It would be great if more tests were added (in general, C equivalents require more testing than their pure Python counterparts). If you get a chance, it would be great if we could see some timings to demonstrate that we're getting a significant improvement (that being the primary motivation for the patch).
msg132833 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-04-03 05:44
Filip, can you please submit a contributor agreement? http://docs.python.org/devguide/coredev.html#sign-a-contributor-agreement
msg132886 - (view) Author: Filip Gruszczyński (gruszczy) Date: 2011-04-03 21:18
I worked a little on the tests. Only then I have noticed, that cmp_to_key tests weren't run, so I have encountered some problems, when I finally turned them on. I have added some of my tests, fixed some things in the patch and now it seems to work fine. I have added traverse method and decrease reference counts in dealloc. I haven't had yet time to run some performance tests, but I'll try to do that. I'll try to submit contributor agreement. It's a pity, that I can't send a signed copy through email. It'll take some time, before it comes from Poland.
msg132888 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-04-03 21:26
FWIW, the PSF is working towards getting electronic signatures for contributor agreements. In the mean time, a scan or photo would work just fine.
msg132954 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-04-04 17:22
[Question from python-list] Would a C version only be for 3.3 or backported to 3.2 and 2.7. The rationale for backport to 2.7 is that .cmp_to_key was added to 2.7 to aid transition to 3.x. A faster version would make use in 2.7 more inviting. I understand that there is concern that performance enhancements may have unforseen effects.
msg132956 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-04-04 18:05
I was only aiming for Py3.3. If someone wanted to push for a backport to 3.2, it would be up to the release manager to decide whether a performance booster would be worth the risk of introducing a bug in a point release. ISTM that if someone really cared about performance, they would probably already be using an O(n) key-function approach. This patch eliminates most of the overhead for calling a cmp-function, but it can't do anything about the body of the user-supplied cmp-function which will dominate the running time if it does anything useful.
msg132969 - (view) Author: Filip Gruszczyński (gruszczy) Date: 2011-04-04 21:07
Here are some example performance results: def cmp(x, y): return y - x sorted(range(1, 10000000), key=cmp_to_key(cmp)) ''' C version: real 0m19.994s user 0m8.053s sys 0m1.044s Python version: real 0m28.825s user 0m28.046s sys 0m0.728s ''' def cmp(x, y): x = int(x) y = int(y) return (x > y) - (y > x) sorted([str(i) for i in reversed(range(1, 2000000))], key=cmp_to_key(cmp)) ''' Python version real 0m15.930s user 0m15.629s sys 0m0.284s C version real 0m10.880s user 0m10.585s sys 0m0.284s ''' There is some performance gain. I don't know however, if it's enough to use C version instead of Python, that's for Raymond to decide.
msg132972 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-04-04 21:15
I'm working on cleaning-up (and speeding-up) the patch. I'll post new timings once that's done.
msg132996 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-04-05 00:35
Attaching timing code. Results with and without patch (as cleaned-up): 7.1 seconds without patch 3.2 seconds with patch
msg132997 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-04-05 00:43
Attaching cleaned-up version of the patch, making it more like the pure python version: * Made 'obj' a member so it is an accessible field * Accept keyword arguments * Create arg tuple like was done in the Py2.7 code * Create the zero only once * Expanded tests to cover all code paths
msg133010 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-04-05 09:34
New changeset a03fb2fc3ed8 by Raymond Hettinger in branch 'default': Issue #11707: Fast C version of functools.cmp_to_key() http://hg.python.org/cpython/rev/a03fb2fc3ed8
msg133012 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-04-05 09:55
Thanks for the patch Filip. Closing this. If anyone wants to lobby the RM for a 3.2 backport, feel free to re-open and assign to the release manager for consideration.
msg133014 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-04-05 10:21
New changeset 76ed6a061ebe by Victor Stinner in branch 'default': Issue #11707: Fix compilation errors with Visual Studio http://hg.python.org/cpython/rev/76ed6a061ebe
msg133015 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-04-05 10:33
functools_cmp_to_key() doesn't check that the argument is a callable. >>> table=list(range(3)) >>> table.sort(key=functools.cmp_to_key(3)) Traceback (most recent call last): File "", line 1, in TypeError: 'int' object is not callable PyCallable_Check() should be used, something like: if (!PyCallable_Check(cmp)) { PyErr_SetString(PyExc_TypeError, "parameter must be callable"); return NULL; }
msg133016 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-04-05 10:54
> functools_cmp_to_key() doesn't > check that the argument is a callable. I don't think that is necessary; it will fail with a TypeError when the attempt is made to call it. This is the approach we take elsewhere (look at the built-in filter() function for example). That being said, if you really think the early check is necessary, feel free to check it in.
msg133085 - (view) Author: Filip Gruszczyński (gruszczy) Date: 2011-04-05 20:38
I have a follow-up question: why keyobject_type needed traverse function? From what I read in docs I assumed it is required for GC tracked types. Why was it required here and how it is used?
msg133090 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-04-05 20:59
> why keyobject_type needed traverse function? I used to know this but don't any more. It might not be necessary. Most of the types I write all use GC so it may just be habit.
msg133091 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-04-05 21:03
Georg, do you care to opine on whether this should go into 3.2.1? (I don't have any preference). There is a big thread on comp.lang.python where a number of people seem to be very concerned about the efficiency of cmp_to_key(). OTOH, we almost never backport performance patches unless the speed is so slow as to be unusable.
msg133092 - (view) Author: Filip Gruszczyński (gruszczy) Date: 2011-04-05 21:06
I see. Thanks :-)
msg133403 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-04-09 17:04
Georg, would you opine on whether this should to into 3.2.1?
msg133413 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2011-04-09 18:59
I would have to say that this looks hardly a trivial speed patch, and chances are we cannot guarantee 100% behavior compatibility with the pure-Python version. If you disagree with these two points, then I'm okay with it going in.
msg133414 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-04-09 19:14
I agree (and was going back and forth between +0 and -0).
History
Date User Action Args
2022-04-11 14:57:15 admin set github: 55916
2011-04-09 19:14:48 rhettinger set status: open -> closedmessages: +
2011-04-09 18:59:25 georg.brandl set messages: +
2011-04-09 17:04:31 rhettinger set status: closed -> openmessages: +
2011-04-05 21:06:12 gruszczy set messages: +
2011-04-05 21:03:13 rhettinger set nosy: + georg.brandlmessages: + assignee: rhettinger -> georg.brandlresolution: accepted
2011-04-05 20:59:56 rhettinger set messages: +
2011-04-05 20:38:53 gruszczy set messages: +
2011-04-05 10:54:26 rhettinger set messages: +
2011-04-05 10:33:55 vstinner set nosy: + vstinnermessages: +
2011-04-05 10:21:51 python-dev set messages: +
2011-04-05 09:55:14 rhettinger set status: open -> closedmessages: +
2011-04-05 09:34:07 python-dev set nosy: + python-devmessages: +
2011-04-05 00:43:55 rhettinger set files: + 11707_4.patchmessages: +
2011-04-05 00:35:37 rhettinger set files: + time_cmp_to_key.pymessages: +
2011-04-04 21:15:05 rhettinger set messages: +
2011-04-04 21:07:17 gruszczy set messages: +
2011-04-04 18:05:32 rhettinger set messages: +
2011-04-04 17:22:07 terry.reedy set nosy: + terry.reedymessages: +
2011-04-03 21:26:21 rhettinger set messages: +
2011-04-03 21🔞16 gruszczy set files: + 11707_3.patchmessages: +
2011-04-03 05:44:15 rhettinger set messages: +
2011-04-03 03:08:11 ncoghlan set nosy: + ncoghlan
2011-04-02 22:19:53 rhettinger set messages: +
2011-04-02 20:53:24 gruszczy set files: + 11707_2.patchmessages: +
2011-04-02 20:47:18 eric.smith set nosy: + eric.smithmessages: +
2011-04-02 20:33:29 gruszczy set files: + 11707.patchkeywords: + patchmessages: +
2011-04-01 20:33:43 stutzbach set nosy: + stutzbach
2011-04-01 16:52:19 daniel.urban set nosy: + daniel.urban
2011-03-29 13:56:41 gruszczy set nosy: + gruszczy
2011-03-29 00:22:27 rhettinger create