[Python-Dev] memcmp performance
Richard Saunders richismyname at me.com
Fri Oct 21 20:23:24 CEST 2011
Richard Saunders wrote:
> I have been doing some performance experiments with memcmp, and I was
> surprised that memcmp wasn't faster than it was in Python. I did a whole
> long analysis and came up with some very simple results.
Antoine Pitrou, 20.10.2011 23:08:
>> Thanks for the analysis. Non-bugfix work now happens on Python 3, where
>> the str type is Python 2's unicode type. Your recommendations would have
>> to be revisited in that light.

Stefan Behnel <stefanml at behnel.de> wrote:
> Well, Py3 is quite a bit different now that PEP 393 is in. It appears to
> use memcmp() or strcmp() a lot less than before, but I think
> unicode_compare() should actually receive an optimisation to use a fast
> memcmp() if both string kinds are equal, at least when their character
> unit size is less than 4 (i.e. especially for ASCII strings). Funny
> enough, tailmatch() has such an optimisation.
I started looking at the most recent 3.x baseline: in a lot of places the memcmp analysis still appears relevant (zlib, arraymodule, datetime, and xmlparse all use memcmp in about the same way as before). But I agree that there are some major differences in the unicode portion.
As long as the two strings are the same unicode "kind", you can use memcmp to compare them. In that case, I would almost argue that a memcmp optimization is even more important: unicode strings are potentially 2 to 4 times larger, so the amount of time spent in memcmp can be greater (i.e., I am still rooting for -fno-builtin-memcmp on the compile lines).
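Since the whole argument hinges on which memcmp the compiler actually emits, here is a minimal standalone micro-benchmark (an illustrative sketch, not code from this thread; the file name, buffer size, and loop count are arbitrary) that makes the effect visible outside of Python. Build it twice and compare the times:

    /* bench_memcmp.c -- illustrative micro-benchmark, not from the thread.
     * Build both ways and compare:
     *   gcc -O2 bench_memcmp.c -o bench_builtin
     *   gcc -O2 -fno-builtin-memcmp bench_memcmp.c -o bench_libc
     */
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    int main(int argc, char **argv)
    {
        static char a[4096], b[4096];
        (void)argv;
        memset(a, 'x', sizeof a);
        memset(b, 'x', sizeof b);
        b[sizeof b - 1] = 'y';            /* differ only in the final byte */

        /* A runtime-dependent length keeps the compiler from folding
           the comparison away at -O2. */
        size_t n = sizeof a - (argc > 1);

        long sink = 0;                    /* consume results so calls aren't elided */
        clock_t t0 = clock();
        for (long i = 0; i < 2000000; i++)
            sink += memcmp(a, b, n);
        printf("%.2f seconds (sink=%ld)\n",
               (double)(clock() - t0) / CLOCKS_PER_SEC, sink);
        return 0;
    }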
I went ahead and wrote a quick string_test3.py for comparing strings (similar to what I did for Python 2.7):

    # Simple Python string comparison test for Python 3.3
    a = []; b = []; c = []; d = []
    for x in range(0, 1000):
        a.append("the quick brown fox" + str(x))
        b.append("the wuick brown fox" + str(x))
        c.append("the quick brown fox" + str(x))
        d.append("the wuick brown fox" + str(x))

    count = 0
    for x in range(0, 200000):
        if a == b: count += 1
        if a == c: count += 2
        if a == d: count += 3
        if b == c: count += 5
        if b == d: count += 7
        if c == d: count += 11
    print(count)
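Note where the time goes in that loop: a==c and b==d compare element-wise equal lists, so each performs 1000 full-length string comparisons per pass, while the four mismatched pairs return False after comparing only their first elements (which already differ at index 4, 'q' vs 'w'). The equal-list comparisons therefore dominate the runtime, which is exactly the case where memcmp speed matters.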
Timings on my FC14 machine (Intel Xeon W3520 at 2.67 GHz):
29.18 seconds: vanilla build of Python 3.3
29.17 seconds: Python 3.3 compiled with -fno-builtin-memcmp
No change. A little investigation shows unicode_compare() is where all the work happens. Here is the current main loop inside unicode_compare():
    for (i = 0; i < len1 && i < len2; ++i) {
        Py_UCS4 c1, c2;
        c1 = PyUnicode_READ(kind1, data1, i);
        c2 = PyUnicode_READ(kind2, data2, i);

        if (c1 != c2)
            return (c1 < c2) ? -1 : 1;
    }

    return (len1 < len2) ? -1 : (len1 != len2);
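That last line is terse: once every code point in the common prefix has matched, the ordering falls back to the lengths. A tiny standalone check (my illustration, not CPython code) spells out the three cases:

    #include <assert.h>
    #include <stddef.h>

    /* Same expression as above: -1 if the left string is a proper prefix,
       0 if the lengths (hence the strings) are equal, 1 otherwise. */
    static int length_tail(size_t len1, size_t len2)
    {
        return (len1 < len2) ? -1 : (len1 != len2);
    }

    int main(void)
    {
        assert(length_tail(3, 5) == -1);  /* "abc" < "abcde" */
        assert(length_tail(5, 5) ==  0);  /* equal strings   */
        assert(length_tail(5, 3) ==  1);  /* "abcde" > "abc" */
        return 0;
    }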
If both strings are the same unicode kind, we can add a memcmp to unicode_compare() as an optimization:

    Py_ssize_t len = (len1 < len2) ? len1 : len2;
    /* use memcmp if both the same kind */
    if (kind1 == kind2) {
        int result = memcmp(data1, data2, ((int)kind1) * len);
        if (result != 0)
            return result < 0 ? -1 : +1;
    }
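The byte count works out because the PEP 393 kind values are literally the code-unit widths in bytes (PyUnicode_1BYTE_KIND == 1, PyUnicode_2BYTE_KIND == 2, PyUnicode_4BYTE_KIND == 4), so kind1*len is exactly the storage holding the first len code points. One caveat: memcmp() compares bytes, so for the 2- and 4-byte kinds its sign matches code-point order only on big-endian machines; on little-endian builds the shortcut is strictly safe for the 1-byte kind (as in the ASCII test above) or for pure equality checks. A small standalone illustration (not CPython code):

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    int main(void)
    {
        /* 0x0100 (256) is greater than 0x00FF (255) as a code point, but
           little-endian memory holds them as {00,01} and {FF,00}, and
           memcmp() looks at the low-order bytes first. */
        uint16_t hi = 0x0100, lo = 0x00FF;
        if (*(unsigned char *)&hi == 0)              /* little-endian? */
            assert(memcmp(&hi, &lo, sizeof hi) < 0); /* sign is "wrong" */
        return 0;
    }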
Rerunning the test with this small change to unicode_compare:
17.84 seconds: compiled with -fno-builtin-memcmp
36.25 seconds: STANDARD memcmp
The standard memcmp is WORSE than the original unicode_compare code, but if we compile with -fno-builtin-memcmp, we get that wonderful 2x performance increase again. (As in the Python 2.7 experiments, GCC's builtin memcmp expands inline to a simple byte-at-a-time rep cmpsb sequence on x86, which is much slower on long runs of equal bytes than the optimized memcmp in glibc; -fno-builtin-memcmp forces the real library call.)
I am still rooting for -fno-builtin-memcmp in both Python 2.7 and 3.3 ... (after we put memcmp in unicode_compare)
Gooday,
Richie