[Python-3000] Making more effective use of slice objects in Py3k
Josiah Carlson jcarlson at uci.edu
Mon Aug 28 21:49:39 CEST 2006
"Guido van Rossum" <guido at python.org> wrote:
> Josiah (and other supporters of string views),
>
> You seem to be utterly convinced of the superior performance of your proposal without having done any measurements. You appear to have a rather naive view on what makes code execute fast or slow (e.g. you don't seem to appreciate the savings due to a string object header and its data being consecutive in memory). Unless you have serious benchmark data (for realistic Python code) I can't continue to participate in this discussion, where you have said nothing new in many posts.
Put up or shut up, eh?
I have written a simple extension module using Pyrex (my manual C extension writing is awful). Here are some sample interactions showing that string views are indeed quite fast. In all of these examples, a naive implementation using only stringview.partition() was able to beat Python 2.5 str.partition, str.split, and re.finditer.
Attached you will find the implementation of stringview I used, along with sufficient build scripts to get it working using Python 2.3 and Pyrex 0.9.3. Aside from replacing int usage with Py_ssize_t for 2.5, and *nix users running dos2unix on the files, it should work without change with the most recent Python and Pyrex versions.
- Josiah
Using 2.3 :
>>> x = stringview(40000*' ')
>>> if 1:
...     t = time.time()
...     while x:
...         _1, _2, x = x.partition(' ')
...     print time.time()-t
...
0.18700003624
>>>

Compared with Python 2.5 beta 2
>>> x = 40000*' '
>>> if 1:
...     t = time.time()
...     while x:
...         _1, _2, x = x.partition(' ')
...     print time.time()-t
...
0.625
>>>
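For readers without the attachments, here is a rough sketch of the idea behind the first benchmark. It is not the attached stringview.pyx: the class name StringView, its fields, and the choice of Python 3 are all assumptions made for illustration. The point it shows is that partition() on a plain str copies the remainder on every iteration, while a view only moves two offsets over one shared base string.

```python
class StringView:
    """Hypothetical read-only view: a base str plus start/stop offsets."""
    __slots__ = ("_base", "_start", "_stop")

    def __init__(self, base, start=0, stop=None):
        self._base = base
        self._start = start
        self._stop = len(base) if stop is None else stop

    def __len__(self):
        # Truthiness falls back to __len__, so "while view:" works.
        return self._stop - self._start

    def __str__(self):
        # The only place where characters are actually copied.
        return self._base[self._start:self._stop]

    def partition(self, sep):
        # Search inside the view's bounds without slicing the base string.
        i = self._base.find(sep, self._start, self._stop)
        if i < 0:
            empty = StringView(self._base, self._stop, self._stop)
            return self, empty, empty
        return (StringView(self._base, self._start, i),
                StringView(self._base, i, i + len(sep)),
                StringView(self._base, i + len(sep), self._stop))


# Same loop shape as the session above; no remainder is ever copied.
x = StringView(40000 * " ")
count = 0
while x:
    _1, _2, x = x.partition(" ")
    count += 1
```

With a plain str the same loop copies roughly 40000 + 39999 + ... + 1 characters in total, which is why this case is about as bad as it gets for str.partition.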
But that's about as bad for Python 2.5 as it can get. What about something else? Like a mail file? In my 21.5 meg archive of py3k, which contains 3456 messages, I wanted to discover all messages.
Python 2.3.5 (#62, Feb 8 2005, 16:23:02) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from stringview import *
>>> rest = stringview(open('mail', 'rb').read())
>>> import time
>>> if 1:
...     x = []
...     t = time.time()
...     while rest:
...         cur, found, rest = rest.partition('\r\n.\r\n')
...         x.append(cur)
...     print time.time()-t, len(x)
...
0.0780000686646 3456
What about Python 2.5 using split? That should be fast...
Python 2.5b2 (r25b2:50512, Jul 11 2006, 10:16:14) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> rest = open('mail', 'rb').read()
>>> import time
>>> if 1:
...     t = time.time()
...     x = rest.split('\r\n.\r\n')
...     print time.time()-t, len(x)
...
0.109999895096 3457
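The split() count is one higher than the partition loop's (3457 versus 3456). One plausible reading, assuming the archive ends with the separator, is that split() returns a trailing empty string while the "while rest:" loop stops as soon as the remainder is empty. A small sketch in Python 3 with made-up data (the two messages below are hypothetical) shows the off-by-one:

```python
SEP = "\r\n.\r\n"

# Hypothetical two-message archive that ends with the separator.
data = "message one" + SEP + "message two" + SEP

# split() keeps the empty piece after the final separator.
parts_split = data.split(SEP)

# The partition loop exits once nothing is left, so no empty tail.
parts_loop = []
rest = data
while rest:
    cur, found, rest = rest.partition(SEP)
    parts_loop.append(cur)
```

Here parts_split has three entries, the last one empty, and parts_loop has two, so both approaches find the same messages.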
Hrm...what about using re?
>>> import re
>>> pat = re.compile('\r\n.\r\n')
>>> rest = open('mail', 'rb').read()
>>> import time
>>> if 1:
...     x = []
...     t = time.time()
...     for i in pat.finditer(rest):
...         x.append(i)
...     print time.time()-t, len(x)
...
0.125 3456
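A side note on the pattern: in a regular expression an unescaped "." matches any character except a newline, so the pattern above is slightly looser than the literal separator used with partition() and split(). The counts agree on this archive, but for an exact comparison the separator could be escaped. A minimal Python 3 sketch, where the sample text is invented for illustration:

```python
import re

SEP = "\r\n.\r\n"

loose = re.compile(SEP)              # "." matches any non-newline character
strict = re.compile(re.escape(SEP))  # "." matches only a literal dot

# Hypothetical text: one real separator, plus a lone "X" line.
text = "a" + SEP + "b\r\nX\r\nc"

loose_hits = len(loose.findall(text))
strict_hits = len(strict.findall(text))
```

On this sample the loose pattern reports two matches and the escaped one reports one. Also note that finditer() collects match objects for the separators, not the message bodies, so that loop does a little less work than the other two.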
Even that's not as good as Python 2.3 + string views.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stringview_build.py
Type: application/octet-stream
Size: 654 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20060828/916e8238/attachment.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stringview.pyx
Type: application/octet-stream
Size: 2639 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20060828/916e8238/attachment-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stringview_helper.h
Type: application/octet-stream
Size: 1656 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20060828/916e8238/attachment-0002.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: _setup.py
Type: application/octet-stream
Size: 255 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20060828/916e8238/attachment-0003.obj