[Python-Dev] RE: [Python-checkins] python/dist/src/Objects unicodeobject.c,2.139,2.140

Tim Peters tim.one@comcast.net
Sun, 21 Apr 2002 00:22:39 -0400


I expect Martin checked in this change because of the unhappy hours he spent determining that the previous two versions of this function wrote beyond the memory they allocated. Since the most recent version still didn't bother to assert that it wasn't writing out of bounds, I can't blame Martin for checking in a version that does so assert; since I spent hours on this too, and this function has a repeated history of bad memory behavior, I viewed the version Martin replaced as unacceptable.

However, the slowdown on large strings isn't attractive, and the previous version could easily enough have asserted its memory correctness.
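For readers following the thread, here is a minimal sketch in C of the two allocation strategies under discussion: counting the required UTF-8 bytes before allocating, versus over-allocating for the worst case and shrinking afterwards. It is not the code from unicodeobject.c. The function names (encode_count_first, encode_overalloc, utf8_len, utf8_put) are hypothetical, the input is assumed to be an array of code points no larger than 0xFFFF, and surrogate handling and size overflow checks are left out. Both variants assert that they stay inside the buffer they allocated, which is the kind of check asked for above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Bytes needed to encode one code point (sketch: assumes ch <= 0xFFFF). */
static size_t utf8_len(uint32_t ch)
{
    if (ch < 0x80) return 1;
    if (ch < 0x800) return 2;
    return 3;
}

/* Write one code point at p; return the pointer just past the bytes written. */
static unsigned char *utf8_put(unsigned char *p, uint32_t ch)
{
    if (ch < 0x80) {
        *p++ = (unsigned char)ch;
    } else if (ch < 0x800) {
        *p++ = (unsigned char)(0xC0 | (ch >> 6));
        *p++ = (unsigned char)(0x80 | (ch & 0x3F));
    } else {
        *p++ = (unsigned char)(0xE0 | (ch >> 12));
        *p++ = (unsigned char)(0x80 | ((ch >> 6) & 0x3F));
        *p++ = (unsigned char)(0x80 | (ch & 0x3F));
    }
    return p;
}

/* Strategy A: count first, then allocate exactly. Two passes over the input. */
unsigned char *encode_count_first(const uint32_t *s, size_t n, size_t *out_len)
{
    size_t need = 0;
    for (size_t i = 0; i < n; i++)
        need += utf8_len(s[i]);

    unsigned char *buf = malloc(need ? need : 1);
    if (buf == NULL)
        return NULL;

    unsigned char *p = buf;
    for (size_t i = 0; i < n; i++) {
        p = utf8_put(p, s[i]);
        assert((size_t)(p - buf) <= need);   /* never past the counted size */
    }
    *out_len = (size_t)(p - buf);
    return buf;
}

/* Strategy B: allocate the worst case (3 bytes per code point here),
   encode in one pass, then shrink the buffer to the bytes actually used. */
unsigned char *encode_overalloc(const uint32_t *s, size_t n, size_t *out_len)
{
    size_t cap = n * 3;                      /* overflow check omitted in this sketch */
    unsigned char *buf = malloc(cap ? cap : 1);
    if (buf == NULL)
        return NULL;

    unsigned char *p = buf;
    for (size_t i = 0; i < n; i++) {
        /* check before writing, so a bad estimate fails loudly in debug builds */
        assert((size_t)(p - buf) + utf8_len(s[i]) <= cap);
        p = utf8_put(p, s[i]);
    }

    size_t len = (size_t)(p - buf);
    unsigned char *shrunk = realloc(buf, len ? len : 1);
    if (shrunk != NULL)
        buf = shrunk;                        /* on failure the larger buffer is still valid */
    *out_len = len;
    return buf;
}
```

The trade-off argued about in the quoted message is visible in the shape of the code: strategy A reads the input twice but never holds more memory than the result needs, while strategy B reads it once and temporarily holds up to three times the final size in this sketch.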

-----Original Message-----
From: python-checkins-admin@python.org [mailto:python-checkins-admin@python.org] On Behalf Of M.-A. Lemburg
Sent: Saturday, April 20, 2002 11:26 AM
To: loewis@sourceforge.net
Cc: python-checkins@python.org
Subject: Re: [Python-checkins] python/dist/src/Objects unicodeobject.c,2.139,2.140

loewis@sourceforge.net wrote:

> Update of /cvsroot/python/python/dist/src/Objects
> In directory usw-pr-cvs1:/tmp/cvs-serv30961
>
> Modified Files:
>     unicodeobject.c
> Log Message:
> Patch #495401: Count number of required bytes for encoding UTF-8 before
> allocating the target buffer.

Martin, please back out this change again. We have discussed this quite a few times and I am against using your strategy since it introduces a performance hit which does not relate to the gained advantage of (temporarily) using less memory.

Your timings also show this, so I wonder why you checked in this patch, e.g. from the patch log:

"""
For the current CVS (unicodeobject.c 2.136: MAL's change to use a variable overalloc), I get

10 spaces                   20.060
100 spaces                   2.600
200 spaces                   2.030
1000 spaces                  0.930
10000 spaces                 0.690
10 spaces, 3 bytes          23.520
100 spaces, 3 bytes          3.730
200 spaces, 3 bytes          2.470
1000 spaces, 3 bytes         0.980
10000 spaces, 3 bytes        0.690
30 bytes                    24.800
300 bytes                    5.220
600 bytes                    3.830
3000 bytes                   2.480
30000 bytes                  2.230

With unicode3.diff (that's the one you checked in), I get

10 spaces                   19.940
100 spaces                   3.260
200 spaces                   2.340
1000 spaces                  1.650
10000 spaces                 1.450
10 spaces, 3 bytes          21.420
100 spaces, 3 bytes          3.410
200 spaces, 3 bytes          2.420
1000 spaces, 3 bytes         1.660
10000 spaces, 3 bytes        1.450
30 bytes                    22.260
300 bytes                    5.830
600 bytes                    4.700
3000 bytes                   3.740
30000 bytes                  3.540
"""

The only case where your patch is faster is for very short strings and then only by a few percent, whereas for all longer strings you get worse timings, e.g. 3.74 seconds compared to 2.48 seconds -- that's a 50% increase in run-time!

Thanks,
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
Company & Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/

