[Python-Dev] Optimizing list.sort() by checking type in advance (original) (raw)

Elliot Gorokhovsky elliot.gorokhovsky at gmail.com
Mon Oct 10 22:15:25 EDT 2016


Right - sorry about that last bit!

I couldn't figure out how to use timeit/perf for this because the list has to be reinitialized each time it gets sorted. So with time.time() I can just wrap the sorting and cut the initialization out (and I could always throw it in a for loop to do it as many times as necessary). But with timeit/perf, as I understand, you just give it a number of iterations and a code snippet and it loops for you. So I don't see how I could get it to cut out the initialization. There's an option to provide setup code, of course, but I need to set up before each trial, not just before the loop.

Regardless, one could always wrap the whole contribution with a length test and disable for lists with less than, say, 1000 elements. And though for the above reason I couldn't figure out how to benchmark properly for small lists, it's clear that for large lists this gives a real improvement with very little change in the code: the fact is that it really doesn't make sense to do all this typechecking nlogn times when we can just get it over with once at the beginning. The cost is very, very small, as demonstrated by the last benchmark (which is for a 1e7 list, so it is probably more valid with my crude technique), so heterogeneous lists don't get slowed down perceptibly.

On Mon, Oct 10, 2016 at 8:06 PM Guido van Rossum <guido at python.org> wrote:

So maybe someone should explain to Elliott why his own benchmarks are not trustworthy, rather than just repeat "use perf or timeit". Actually, there are two things: (a) when something new comes along it always needs to prove beyond a shadow of a doubt that it is actually an improvement and not a timing artifact or a trick; (b) you can't time sorting 10 values once and get a useful result. You have to do it many times. And you have to make sure that creating a list of 10 random values isn't taken as part of your test -- that's tricky since random() isn't all that fast; but it has to be done.

Although Elliott had it coming when he used needlessly offensive language in his first post. On Mon, Oct 10, 2016 at 5:09 PM, Steven D'Aprano <steve at pearwood.info> wrote: > On Mon, Oct 10, 2016 at 09:16:32PM +0000, Elliot Gorokhovsky wrote: > >> Anyway, benchmarking technique aside, the point is that it it works well >> for small lists (i.e. doesn't affect performance). > > You've been shown that there is something suspicious about your > benchmarking technique, something that suggests that the timing results > aren't trustworthy. Until you convince us that your timing results are > reliable and trustworthy, you shouldn't be drawing any conclusions > about your fastsort versus the standard sort. > > > -- > Steve _> ________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org

-- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20161011/da51ba70/attachment-0001.html>



More information about the Python-Dev mailing list