[Python-Dev] memcmp performance

Antoine Pitrou solipsis at pitrou.net
Fri Oct 21 21:18:58 CEST 2011


On Fri, 21 Oct 2011 18:23:24 +0000 (GMT) Richard Saunders <richismyname at me.com> wrote:

If both strings are the same unicode kind, we can add memcmp to unicode_compare for an optimization:

    Py_ssize_t len = (len1 < len2) ? len1 : len2;
    /* use memcmp if both the same kind */
    if (kind1 == kind2) {
        int result = memcmp(data1, data2, ((int)kind1) * len);
        if (result != 0)
            return result < 0 ? -1 : +1;
    }

Hmm, you have to be a bit subtler than that: on a little-endian machine, you can't compare two characters by comparing their byte representations in memory order. So memcmp() can only be used for the one-byte representation. (actually, it can also be used for equality comparisons on any representation)
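
For concreteness, here is a rough sketch of the guard described above; the identifiers (data1, kind1, len1, ...) follow the snippet quoted earlier, and PyUnicode_1BYTE_KIND is the PEP 393 one-byte kind. This is only an illustration of the constraint, not the actual CPython patch:

    /* Hypothetical fast path inside unicode_compare() -- a sketch, not
     * the real code.  memcmp() compares byte by byte in memory order, so
     * on a little-endian machine it only yields a correct *ordering* for
     * the 1-byte (UCS1) kind.  For wider kinds it can still decide
     * *equality*, since equal strings have identical byte patterns. */
    Py_ssize_t len = (len1 < len2) ? len1 : len2;

    if (kind1 == kind2) {
        if (kind1 == PyUnicode_1BYTE_KIND) {
            /* UCS1: byte order equals character order. */
            int result = memcmp(data1, data2, len);
            if (result != 0)
                return result < 0 ? -1 : +1;
            /* common prefix is equal: the shorter string sorts first */
            return (len1 < len2) ? -1 : (len1 > len2) ? 1 : 0;
        }
        /* UCS2/UCS4: memcmp can only prove equality, not order. */
        if (memcmp(data1, data2, (size_t)kind1 * len) == 0)
            return (len1 < len2) ? -1 : (len1 > len2) ? 1 : 0;
        /* not equal: fall through to the existing per-character loop */
    }

The key point is the kind check: without it, memcmp would report an ordering based on the little-endian byte layout rather than on code point values.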

Rerunning the test with this small change to unicode_compare:

    17.84 seconds:  -fno-builtin-memcmp
    36.25 seconds:  STANDARD memcmp

The standard memcmp is WORSE than the original unicode_compare code, but if we compile using memcmp with -fno-builtin-memcmp, we get that wonderful 2x performance increase again.

The standard memcmp being worse is a bit puzzling. Intuitively, it should have roughly the same performance as the original function. I also wonder whether the slowdown could materialize on non-glibc systems.

I am still rooting for -fno-builtin-memcmp in both Python 2.7 and 3.3 ... (after we put memcmp in unicode_compare)

A patch for unicode_compare would be a good start. Its performance can then be checked on other systems (such as Windows).
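
As a stand-alone way to check the effect on another system (a sketch added here for illustration, not something from the thread), one could time memcmp() in isolation and build the same file twice, with and without -fno-builtin-memcmp:

    /* Hypothetical micro-benchmark for the builtin vs. library memcmp.
     * Build and compare, e.g.:
     *   cc -O2 bench.c -o bench
     *   cc -O2 -fno-builtin-memcmp bench.c -o bench-nobuiltin
     */
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    #define N   (1000 * 1000)
    #define LEN 64

    int main(void)
    {
        static char a[LEN], b[LEN];
        memset(a, 'x', LEN);
        memset(b, 'x', LEN);
        b[LEN - 1] = 'y';          /* strings differ only in the last byte */

        clock_t t0 = clock();
        long total = 0;            /* keep the loop from being optimized away */
        for (int i = 0; i < N; i++)
            total += memcmp(a, b, LEN);
        clock_t t1 = clock();

        printf("%ld comparisons, %.3f s (checksum %ld)\n",
               (long)N, (double)(t1 - t0) / CLOCKS_PER_SEC, total);
        return 0;
    }

Running both binaries on the same machine should show whether the inlined builtin or the library memcmp wins for short, nearly-equal buffers like the ones unicode_compare sees.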

Regards

Antoine.


