[Python-Dev] memcmp performance (original) (raw)

Richard Saunders richismyname at me.com
Fri Oct 21 20:23:24 CEST 2011

Previous message: [Python-Dev] Summary of Python tracker Issues
Next message: [Python-Dev] memcmp performance
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Richard Saunders I have been doing some performance experiments with memcmp, and I was surprised that memcmp wasn't faster than it was in Python. I did a whole, long analysis and came up with some very simple results.

Antoine Pitrou, 20.10.2011 23:08: Thanks for the analysis. Non-bugfix work now happens on Python 3, where the str type is Python 2's unicode type. Your recommendations would have to be revisited under that light. Stefan Behnel <stefanml at behnel.de> Well, Py3 is quite a bit different now that PEP393 is in. It appears to use memcmp() or strcmp() a lot less than before, but I think unicodecompare() should actually receive an optimisation to use a fast memcmp() if both string kinds are equal, at least when their character unit size is less than 4 (i.e. especially for ASCII strings). Funny enough, tailmatch() has such an optimisation.

I started looking at the most recent 3.x baseline: a lot of places, the memcmp analysis appears relevant (zlib, arraymodule, datetime, xmlparse): all still use memcmp in about the same way. But I agree that there are some major differences in the unicode portion.

As long as the two strings are the same unicode "kind", you can use a memcmp to compare. In that case, I would almost argue some memcmp optimization is even more important: unicode strings are potentially 2 to 4 times larger, so the amount of time spent in memcmp may be more (i.e., I am still rooting for -fno-builtin-memcmp on the compile lines).

I went ahead a quick string_test3.py for comparing strings (similar to what I did in Python 2.7)

Simple python string comparison test for Python 3.3

a = []; b = []; c = []; d = [] for x in range(0,1000) : a.append("the quick brown fox"+str(x)) b.append("the wuick brown fox"+str(x)) c.append("the quick brown fox"+str(x)) d.append("the wuick brown fox"+str(x)) count = 0 for x in range(0,200000) : if a==c : count += 1 if a==c : count += 2 if a==d : count += 3 if b==c : count += 5 if b==d : count += 7 if c==d : count += 11 print(count)

Timings on On My FC14 machine (Intel Xeon W3520 at 2.67Ghz):

29.18 seconds: Vanilla build of Python 3.3 29.17 seconds: Python 3.3 compiled with -fno-builtin-memcmp:

No change: a little investigation shows unicode_compare is where all the work is: Here's currently the main loop inside unicode_compare:

for (i = 0; i < len1 && i < len2; ++i) { Py_UCS4 c1, c2; c1 = PyUnicode_READ(kind1, data1, i); c2 = PyUnicode_READ(kind2, data2, i);

if (c1 != c2) return (c1 < c2) ? -1 : 1; }

return (len1 < len2) ? -1 : (len1 != len2);

If both loops are the same unicode kind, we can add memcmp to unicode_compare for an optimization: Py_ssize_t len = (len1<len2) ? len1: len2;

/* use memcmp if both the same kind */ if (kind1==kind2) { int result=memcmp(data1, data2, ((int)kind1)*len); if (result!=0) return result<0 ? -1 : +1; }

Rerunning the test with this small change to unicode_compare:

17.84 seconds: -fno-builtin-memcmp 36.25 seconds: STANDARD memcmp

The standard memcmp is WORSE that the original unicode_compare code, but if we compile using memcmp with -fno-builtin-memcmp, we get that wonderful 2x performance increase again.

I am still rooting for -fno-builtin-memcmp in both Python 2.7 and 3.3 ... (after we put memcmp in unicode_compare)

Gooday,

Richie

-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20111021/35f4473c/attachment-0001.html>

Previous message: [Python-Dev] Summary of Python tracker Issues
Next message: [Python-Dev] memcmp performance
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the Python-Dev mailing list