[Python-Dev] memcmp performance (original) (raw)

Richard Saunders richismyname at me.com
Fri Oct 21 20:23:24 CEST 2011


Richard Saunders I have been doing some performance experiments with memcmp, and I was surprised that memcmp wasn't faster than it was in Python. I did a whole, long analysis and came up with some very simple results.

Antoine Pitrou, 20.10.2011 23:08: Thanks for the analysis. Non-bugfix work now happens on Python 3, where the str type is Python 2's unicode type. Your recommendations would have to be revisited under that light. Stefan Behnel <stefanml at behnel.de> Well, Py3 is quite a bit different now that PEP393 is in. It appears to use  memcmp() or strcmp() a lot less than before, but I think unicodecompare()  should actually receive an optimisation to use a fast memcmp() if both  string kinds are equal, at least when their character unit size is less  than 4 (i.e. especially for ASCII strings). Funny enough, tailmatch() has  such an optimisation.

I started looking at the most recent 3.x baseline: a lot of places,  the memcmp analysis appears relevant (zlib, arraymodule, datetime, xmlparse): all still use memcmp in about the same way.  But I agree that there are  some major differences in the unicode portion.

As long as the two strings are the same unicode "kind", you can use a  memcmp to compare.  In that case, I would almost argue some memcmp optimization is even more important: unicode strings are potentially 2 to 4 times larger, so the amount of time spent in memcmp may be more (i.e., I am still rooting for -fno-builtin-memcmp on the compile lines).

I went ahead a quick string_test3.py for comparing strings (similar to what I did in Python 2.7)

Simple python string comparison test for Python 3.3

a = []; b = []; c = []; d = [] for x in range(0,1000) :     a.append("the quick brown fox"+str(x))     b.append("the wuick brown fox"+str(x))     c.append("the quick brown fox"+str(x))     d.append("the wuick brown fox"+str(x)) count = 0 for x in range(0,200000) :     if a==c : count += 1     if a==c : count += 2     if a==d : count += 3     if b==c : count += 5     if b==d : count += 7     if c==d : count += 11 print(count)

Timings on On My FC14 machine (Intel Xeon W3520 at 2.67Ghz):

29.18 seconds:  Vanilla build of Python 3.3  29.17 seconds: Python 3.3 compiled with -fno-builtin-memcmp: 

No change: a little investigation shows unicode_compare is where all the work is: Here's currently the main loop inside unicode_compare:

    for (i = 0; i < len1 && i < len2; ++i) {         Py_UCS4 c1, c2;         c1 = PyUnicode_READ(kind1, data1, i);         c2 = PyUnicode_READ(kind2, data2, i);

        if (c1 != c2)             return (c1 < c2) ? -1 : 1;     }

    return (len1 < len2) ? -1 : (len1 != len2);

If both loops are the same unicode kind, we can add memcmp to unicode_compare for an optimization:        Py_ssize_t len = (len1<len2) ? len1: len2;

    /* use memcmp if both the same kind */     if (kind1==kind2) {       int result=memcmp(data1, data2, ((int)kind1)*len);       if (result!=0)  return result<0 ? -1 : +1;      }

Rerunning the test with this small change to unicode_compare:

17.84 seconds:  -fno-builtin-memcmp  36.25 seconds:  STANDARD memcmp

The standard memcmp is WORSE that the original unicode_compare code, but if we compile using memcmp with -fno-builtin-memcmp, we get that wonderful 2x performance increase again.

I am still rooting for -fno-builtin-memcmp in both Python 2.7 and 3.3 ... (after we put memcmp in unicode_compare)

  Gooday,

  Richie

-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20111021/35f4473c/attachment-0001.html>



More information about the Python-Dev mailing list